Sentiment Analysis using GCP NLP API
GCP Natural Language API:
The Google Natural Language API is an easy-to-use interface to a set of powerful NLP models that have been pre-trained by Google to perform various tasks. Because these models have been trained on enormously large document corpora, their performance is usually quite good as long as they are applied to datasets that do not use highly idiosyncratic language.
The biggest advantage of using these pre-trained models via the API is that no training dataset is needed. The API allows the user to start making predictions immediately, which can be very valuable in situations where little labeled data is available.
The Natural Language API comprises five different services:
- Syntax Analysis
- Sentiment Analysis
- Entity Analysis
- Entity Sentiment Analysis
- Text Classification
In this article I will walk through a recipe for performing sentiment analysis, using the Google Natural Language API, on data stored in a BigQuery table.
Sentiment Analysis: Sentiment analysis attempts to determine the overall attitude (positive or negative) expressed within the text. Sentiment is represented by numerical score and magnitude values.
- The score indicates the overall emotion of a document. It ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text.
- The magnitude indicates how much emotional content is present within the document, and this value is often proportional to the length of the document. It ranges between 0.0 and +inf.
Ingredients: GCP Project, GCP BigQuery, GCP NLP API, Python, Pandas, Service Account, Cloud Shell / Google Cloud SDK.
We’ll use Google Cloud’s Python client libraries to read data from a BigQuery table, pass it to the NLP API, and write the results to another BigQuery table.
Step 1: Write a Python script that takes project_id, dataset_id, table_id, and column_name as input and returns a DataFrame with a column containing the text data to be passed to the NLP API.
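A minimal sketch of Step 1 using the google-cloud-bigquery client library; the function name read_text_column is illustrative and not from the original post:

```python
from google.cloud import bigquery

def read_text_column(project_id, dataset_id, table_id, column_name):
    """Read a single text column from a BigQuery table into a pandas DataFrame."""
    client = bigquery.Client(project=project_id)
    query = f"SELECT {column_name} FROM `{project_id}.{dataset_id}.{table_id}`"
    # to_dataframe() requires pandas (and pyarrow or db-dtypes) to be installed
    return client.query(query).to_dataframe()
```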
Step 2: Write a Python script that takes text as input and returns a list containing the sentiment_score and sentiment_magnitude obtained from the NLP API.
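A minimal sketch of Step 2 using the google-cloud-language client library; the helper name get_sentiment is illustrative:

```python
from google.cloud import language_v1

def get_sentiment(text):
    """Return [sentiment_score, sentiment_magnitude] for the given text."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    # document_sentiment carries the overall score and magnitude described above
    sentiment = client.analyze_sentiment(document=document).document_sentiment
    return [sentiment.score, sentiment.magnitude]
```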
Step 3: Write a Python script that takes a DataFrame as input and writes it to a BigQuery table.
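A minimal sketch of Step 3, again with the google-cloud-bigquery client; the helper name write_dataframe and the WRITE_TRUNCATE disposition are illustrative choices:

```python
from google.cloud import bigquery

def write_dataframe(df, project_id, dataset_id, table_id):
    """Write a pandas DataFrame to a BigQuery table, replacing any existing rows."""
    client = bigquery.Client(project=project_id)
    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")
    # Block until the load job finishes so errors surface immediately
    client.load_table_from_dataframe(df, table_ref, job_config=job_config).result()
```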
Step 4: Write a driver program that takes project_id, dataset_id, input_table_id, output_table_id, and column_name as input and uses the scripts created above.
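A minimal sketch of the driver in Step 4, wiring together the helpers sketched above; all helper names and the example project, dataset, table, and column names are hypothetical:

```python
def main(project_id, dataset_id, input_table_id, output_table_id, column_name):
    # Read the text column, score each row, and write the enriched table back
    df = read_text_column(project_id, dataset_id, input_table_id, column_name)
    results = df[column_name].astype(str).apply(get_sentiment)
    df["sentiment_score"] = results.apply(lambda r: r[0])
    df["sentiment_magnitude"] = results.apply(lambda r: r[1])
    write_dataframe(df, project_id, dataset_id, output_table_id)

if __name__ == "__main__":
    # Example invocation; these argument values are placeholders
    main("my-project", "my_dataset", "reviews", "reviews_with_sentiment", "review_text")
```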
To view the above-mentioned code on GitHub, click here.