What is sentiment analysis?
Sentiment analysis is a natural language processing (NLP) technique used to determine the emotional tone behind a piece of text, and whether that tone is positive, negative or neutral.
It is commonly applied to text such as customer reviews, social media posts, blogs and news articles, and is often used by businesses to understand how customers feel about their products or brands.
Why is sentiment analysis so important?
In an age when people express themselves online more openly than ever, the internet holds a treasure trove of data, and sentiment analysis is quickly becoming an essential tool for analysing it at scale.
Say a company has 10,000 reviews for one of its products: with sentiment analysis, it could automatically determine how its customers feel. How long would that take to do by hand? Businesses could also use sentiment analysis to track reactions to news or a new product on social media platforms in real time, allowing them to respond immediately.
Building a sentiment text analysis tool in C#
To analyse customer feedback and reviews, we first need some data to work with. I don't happen to carry customer data around with me, so for this exercise I turned to Amazon, which kindly offers a public dataset of over 100 million customer reviews spanning two decades. The dataset can be found here:
Next, we create a class to represent a single review, with one property per column in the dataset:
using System;

class Review
{
/// <summary>
/// 2 letter country code of the marketplace where the review was written.
/// </summary>
public string Marketplace { get; set; }
/// <summary>
/// Random identifier that can be used to aggregate reviews written by a single author.
/// </summary>
public string CustomerId { get; set; }
/// <summary>
/// The unique ID of the review.
/// </summary>
public string ReviewId { get; set; }
/// <summary>
/// The unique Product ID the review pertains to.
/// </summary>
public string ProductId { get; set; }
/// <summary>
/// Random identifier that can be used to aggregate reviews for the same product.
/// </summary>
public string ProductParent { get; set; }
/// <summary>
/// Title of the product.
/// </summary>
public string ProductTitle { get; set; }
/// <summary>
/// Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
/// </summary>
public string ProductCategory { get; set; }
/// <summary>
/// The 1-5 star rating of the review.
/// </summary>
public int StarRating { get; set; }
/// <summary>
/// Number of helpful votes.
/// </summary>
public int HelpfulVotes { get; set; }
/// <summary>
/// Number of total votes the review received.
/// </summary>
public int TotalVotes { get; set; }
/// <summary>
/// Review was written as part of the Vine program.
/// </summary>
public string Vine { get; set; }
/// <summary>
/// The review is on a verified purchase.
/// </summary>
public string VerifiedPurchase { get; set; }
/// <summary>
/// The title of the review.
/// </summary>
public string ReviewHeadline { get; set; }
/// <summary>
/// The review text.
/// </summary>
public string ReviewBody { get; set; }
/// <summary>
/// The date the review was written.
/// </summary>
public DateTime ReviewDate { get; set; }
}
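The Vine and VerifiedPurchase columns are kept as strings above, and ReviewDate is parsed with the machine's current culture. If you would rather work with booleans and culture-independent dates, a couple of small helpers can do the conversion. This is a sketch under assumptions: it assumes the dataset encodes the flags as "Y"/"N" and dates as yyyy-MM-dd, so check this against your own copy of the data. ReviewParsing, ParseFlag and ParseDate are hypothetical names, not part of any library.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers for converting raw TSV column values.
static class ReviewParsing
{
    // Assumes flags are encoded as "Y" or "N"; anything other than "Y"
    // (case-insensitive, surrounding whitespace ignored), including null, is false.
    public static bool ParseFlag(string value)
    {
        return string.Equals(value?.Trim(), "Y", StringComparison.OrdinalIgnoreCase);
    }

    // Assumes dates are written as yyyy-MM-dd. Parsing with the invariant culture
    // gives the same result regardless of the machine's regional settings.
    public static DateTime ParseDate(string value)
    {
        return DateTime.ParseExact(value.Trim(), "yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}
```

With these helpers, the loading loop that follows could use `ReviewParsing.ParseFlag(cols[11])` for VerifiedPurchase (after changing that property's type to `bool`) and `ReviewParsing.ParseDate(cols[14])` for ReviewDate.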
Now that we have our data and a class to house it, we can start writing our program. For this example, I've opted for a console application, as it is quick and easy to get up and running.
First, we read all lines into a string array (you will need to change the path to match the location of your data) and skip the first line, which contains the column headers. The subset of data I downloaded from Amazon contained over 2 million records, so I've limited the program to the first 30 using .Take(30). As the data is tab-delimited, we split each line on \t.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

List<Review> reviews = new List<Review>();

// Read every line of the tab-separated file (change the path to suit your setup).
string[] lines = File.ReadAllLines(@"C:\Dev\Solutions…");

// Skip the header row.
lines = lines.Skip(1).ToArray();

// Only process the first 30 reviews.
foreach (string line in lines.Take(30))
{
    // Columns are tab-delimited, in the order described by the Review class.
    string[] cols = line.Split('\t');
    Review review = new Review
    {
        Marketplace = cols[0],
        CustomerId = cols[1],
        ReviewId = cols[2],
        ProductId = cols[3],
        ProductParent = cols[4],
        ProductTitle = cols[5],
        ProductCategory = cols[6],
        StarRating = Convert.ToInt32(cols[7]),
        HelpfulVotes = Convert.ToInt32(cols[8]),
        TotalVotes = Convert.ToInt32(cols[9]),
        Vine = cols[10],
        VerifiedPurchase = cols[11],
        ReviewHeadline = cols[12],
        ReviewBody = cols[13],
        ReviewDate = DateTime.Parse(cols[14])
    };
    reviews.Add(review);
}
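File.ReadAllLines loads the entire file into memory before Skip and Take run, which is wasteful when only 30 of more than 2 million rows are needed. A sketch of an alternative: File.ReadLines returns a lazily evaluated sequence, so when it is combined with Skip and Take the rest of the file is never read. ReviewRows and ReadRows are hypothetical names, and skipping rows with fewer than 15 columns is an added assumption to guard against malformed lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper that streams rows instead of loading the whole file.
static class ReviewRows
{
    // Number of tab-separated columns per data row (one per Review property).
    private const int ExpectedColumns = 15;

    // Lazily yields the split columns of up to maxRows well-formed data rows.
    // Malformed rows are filtered out before Take, so they do not count towards maxRows.
    public static IEnumerable<string[]> ReadRows(IEnumerable<string> lines, int maxRows)
    {
        return lines
            .Skip(1)                                        // header row
            .Select(line => line.Split('\t'))
            .Where(cols => cols.Length >= ExpectedColumns)
            .Take(maxRows);
    }

    // Convenience overload that streams straight from disk.
    public static IEnumerable<string[]> ReadRows(string path, int maxRows)
    {
        return ReadRows(File.ReadLines(path), maxRows);
    }
}
```

In the loop above, `foreach (string[] cols in ReviewRows.ReadRows(path, 30))` (with `path` pointing at your data file) would replace the ReadAllLines, Skip and Take lines as well as the Split call.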
Now that we have a list of reviews in code, we can analyse the sentiment behind each one. To do this, we will use Google's Natural Language API, which applies machine learning to natural language understanding (NLU). This means we can rely on Google's pre-trained machine learning models rather than building and training our own.
Before we can use the Natural Language API, we must enable it in the Google Cloud Console. Full instructions can be found here:
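Once a private key (a service account JSON key file) has been obtained, the client library needs to be able to find it. One way to do this is to set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the key file's path in the Debug settings of the project properties.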
Now that we’re all set up to use the Natural Language API, we can consult the Natural Language reference documentation for .NET for instructions on how to use the API.
First, we create an instance of LanguageServiceClient (from the Google.Cloud.Language.V1 NuGet package). Then, for each review, we wrap the review body in a Document object and pass it to the AnalyzeSentiment method. The method returns an AnalyzeSentimentResponse, so we add a property of that type to our Review class to hold the result:
/// <summary>
/// Sentiment analysis response from Google's Natural Language API.
/// </summary>
public AnalyzeSentimentResponse AnalyzeSentimentResponse { get; set; }
With that property in place, the updated loading loop looks like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Google.Cloud.Language.V1;

List<Review> reviews = new List<Review>();
string[] lines = File.ReadAllLines(@"C:\Dev\Solutions…");

// Skip the header row.
lines = lines.Skip(1).ToArray();

// Create the client once and reuse it for every request.
LanguageServiceClient client = LanguageServiceClient.Create();

foreach (string line in lines.Take(30))
{
    string[] cols = line.Split('\t');
    Review review = new Review
    {
        Marketplace = cols[0],
        CustomerId = cols[1],
        ReviewId = cols[2],
        ProductId = cols[3],
        ProductParent = cols[4],
        ProductTitle = cols[5],
        ProductCategory = cols[6],
        StarRating = Convert.ToInt32(cols[7]),
        HelpfulVotes = Convert.ToInt32(cols[8]),
        TotalVotes = Convert.ToInt32(cols[9]),
        Vine = cols[10],
        VerifiedPurchase = cols[11],
        ReviewHeadline = cols[12],
        ReviewBody = cols[13],
        ReviewDate = DateTime.Parse(cols[14])
    };

    // Send the review body to the API and keep the response on the review.
    Document document = Document.FromPlainText(review.ReviewBody);
    AnalyzeSentimentResponse sentimentResponse = client.AnalyzeSentiment(document);
    review.AnalyzeSentimentResponse = sentimentResponse;

    reviews.Add(review);
}
Inspecting the AnalyzeSentimentResponse of each review, we can see that the document-level sentiment (DocumentSentiment) consists of two values: a score and a magnitude.
The score indicates the overall emotion of the document (a customer review in our case). It ranges from -1.0, indicating a negative emotion, to 1.0, indicating a positive emotion.
The magnitude indicates the overall strength of emotion, both positive and negative. It ranges from 0 to +infinity and is not normalized: every expression of emotion in the text adds to it, so longer documents tend to have larger magnitudes.
So a score of 0 with a magnitude close to 0 indicates a neutral sentiment, whereas a score of 0 with a higher magnitude, such as 4, indicates a mixed sentiment: strong positive and negative emotions that cancel each other out.
Next, we will write a couple of methods to make the sentiment analysis values easier to interpret. First, we create an enum to represent each possible outcome: clearly positive, clearly negative, mixed, neutral and undetermined:
enum SentimentValues
{
ClearlyPositive,
ClearlyNegative,
Neutral,
Mixed,
Undetermined
}
Next, we create a static class of extension methods for AnalyzeSentimentResponse that interpret the sentiment values. I've set the positive and negative thresholds to 0.25 and -0.25 respectively, but you can tweak these depending on your own results:
using System;
using Google.Cloud.Language.V1;

static class AnalyzeSentimentResponseExtensions
{
    /// <summary>
    /// Interprets the document-level sentiment of an API response.
    /// </summary>
    /// <seealso href="https://cloud.google.com/natural-language/docs/basics#interpreting_sentiment_analysis_values"/>
    /// <param name="analyzeSentimentResponse">The response returned by AnalyzeSentiment.</param>
    /// <returns>The sentiment category for the whole document.</returns>
    public static SentimentValues Sentiment(this AnalyzeSentimentResponse analyzeSentimentResponse)
    {
        return DetermineSentiment(analyzeSentimentResponse.DocumentSentiment);
    }

    private static SentimentValues DetermineSentiment(Sentiment sentiment)
    {
        // No sentiment was returned for this document.
        if (sentiment == null)
        {
            return SentimentValues.Undetermined;
        }
        else if (sentiment.Score >= 0.25)
        {
            return SentimentValues.ClearlyPositive;
        }
        else if (sentiment.Score <= -0.25)
        {
            return SentimentValues.ClearlyNegative;
        }
        // The score is close to zero: a higher magnitude means strong emotions
        // that cancel each other out, i.e. mixed sentiment.
        else if (sentiment.Magnitude > 0.25)
        {
            return SentimentValues.Mixed;
        }
        return SentimentValues.Neutral;
    }
}
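Because the thresholds are hard-coded, experimenting with them means editing the method each time. Below is a sketch of a variant that takes the thresholds as parameters and works on raw score and magnitude values, so it can be tried out without calling the API. SentimentLabel, SentimentThresholds and Classify are hypothetical names; the default values mirror the 0.25 thresholds used above, and the decision order matches DetermineSentiment.

```csharp
// Hypothetical label type mirroring SentimentValues, minus Undetermined,
// which only applies when the API returns no sentiment at all.
enum SentimentLabel { ClearlyPositive, ClearlyNegative, Neutral, Mixed }

static class SentimentThresholds
{
    // Same decision order as DetermineSentiment, with tunable thresholds.
    public static SentimentLabel Classify(
        float score,
        float magnitude,
        float positive = 0.25f,
        float negative = -0.25f,
        float mixedMagnitude = 0.25f)
    {
        if (score >= positive) return SentimentLabel.ClearlyPositive;
        if (score <= negative) return SentimentLabel.ClearlyNegative;

        // Score near zero: a high magnitude means strong but opposing emotions.
        if (magnitude > mixedMagnitude) return SentimentLabel.Mixed;

        return SentimentLabel.Neutral;
    }
}
```

With the defaults, `Classify(0f, 4f)` returns Mixed and `Classify(0f, 0f)` returns Neutral, matching the score-and-magnitude examples given earlier. Passing, say, `positive: 0.5f` makes the positive category stricter without touching the rest of the code.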
Next, we add two more methods to the same extension class to calculate an overall score. Multiplying the score by the magnitude produces a single value that reflects both the direction of the emotion and its strength:
    /// <summary>
    /// Calculates an overall sentiment score by multiplying the score by the magnitude.
    /// </summary>
    /// <param name="analyzeSentimentResponse">The response returned by AnalyzeSentiment.</param>
    /// <returns>The overall score, or 1000 if no sentiment was returned.</returns>
    public static float OverallSentimentScore(this AnalyzeSentimentResponse analyzeSentimentResponse)
    {
        return DetermineOverallScore(analyzeSentimentResponse.DocumentSentiment);
    }

    private static float DetermineOverallScore(Sentiment sentiment)
    {
        // No sentiment returned: use a large sentinel value so that these
        // reviews sort to the end when ordering by overall score.
        if (sentiment == null)
        {
            return 1000;
        }

        // Magnitude is never negative, so the product keeps the sign of the score:
        // positive reviews score above 0, negative reviews below 0,
        // and a score of exactly 0 gives 0.
        return sentiment.Score * sentiment.Magnitude;
    }
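Since the magnitude is never negative, the sign of the overall score always follows the score. One consequence worth keeping in mind is that the magnitude grows with document length, so a long, mildly positive review can outrank a short, strongly positive one. The standalone sketch below reproduces the calculation on plain numbers to illustrate this; OverallScoreExample and Compute are hypothetical names, and the inputs in the note afterwards are made-up values, not real API output.

```csharp
// Hypothetical standalone version of the overall-score calculation.
static class OverallScoreExample
{
    // score is in [-1, 1] and magnitude in [0, +inf),
    // so the result always has the same sign as the score.
    public static float Compute(float score, float magnitude) => score * magnitude;
}
```

For example, `Compute(0.25f, 8f)` (a long, mildly positive review) returns 2, while `Compute(0.75f, 1f)` (a short, strongly positive review) returns only 0.75, so the milder review ranks higher when ordering by overall score.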
Now all that is left to do is display our findings. First, we add two read-only properties to the Review class that expose our extension methods:
public SentimentValues Sentiment
{
get
{
return AnalyzeSentimentResponse.Sentiment();
}
}
public float OverallSentimentScore
{
get
{
return AnalyzeSentimentResponse.OverallSentimentScore();
}
}
Finally, we print the number of reviews in each sentiment category (percentages could also be calculated from these counts), followed by each review with its sentiment, score, magnitude and overall score, ordered from most negative to most positive.
Console.WriteLine($"Clearly positive: {reviews.Count(r => r.Sentiment == SentimentValues.ClearlyPositive)}.");
Console.WriteLine($"Clearly negative: {reviews.Count(r => r.Sentiment == SentimentValues.ClearlyNegative)}.");
Console.WriteLine($"Mixed: {reviews.Count(r => r.Sentiment == SentimentValues.Mixed)}.");
Console.WriteLine($"Neutral: {reviews.Count(r => r.Sentiment == SentimentValues.Neutral)}.");
Console.WriteLine($"Undetermined: {reviews.Count(r => r.Sentiment == SentimentValues.Undetermined)}.");
Console.WriteLine();

// Most negative reviews first; reviews with no sentiment (overall score 1000) come last.
foreach (Review review in reviews.OrderBy(r => r.OverallSentimentScore))
{
    Console.WriteLine(review.ProductTitle);
    Console.WriteLine(review.ReviewBody);
    Console.WriteLine($"Sentiment: {review.Sentiment}.");
    Console.WriteLine($"Score: {review.AnalyzeSentimentResponse.DocumentSentiment?.Score}.");
    Console.WriteLine($"Magnitude: {review.AnalyzeSentimentResponse.DocumentSentiment?.Magnitude}.");
    Console.WriteLine($"Overall: {review.OverallSentimentScore}.");
    Console.WriteLine("-----------------------------------------------------");
    Console.WriteLine();
}
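Percentages can be derived from the same data in a single pass by grouping the labels. The sketch below is written as a generic helper so it works with SentimentValues or any other label type; SentimentSummary and Percentages are hypothetical names, not part of LINQ or the Google client library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for turning a list of labels into percentages.
static class SentimentSummary
{
    // Returns the share (0-100) of items carrying each distinct label.
    public static Dictionary<TLabel, double> Percentages<TLabel>(IEnumerable<TLabel> labels)
        where TLabel : notnull
    {
        List<TLabel> list = labels.ToList();
        if (list.Count == 0)
        {
            return new Dictionary<TLabel, double>(); // avoid dividing by zero
        }

        return list
            .GroupBy(label => label)
            .ToDictionary(group => group.Key, group => 100.0 * group.Count() / list.Count);
    }
}
```

In the program above, this could be used as `foreach (var pair in SentimentSummary.Percentages(reviews.Select(r => r.Sentiment))) Console.WriteLine($"{pair.Key}: {pair.Value:F1}%");`. Categories with no reviews simply do not appear in the dictionary.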
Inspecting the results, the ±0.25 threshold appears to produce fairly accurate classifications, at least for this small sample of 30 reviews.
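One way to put a number on "fairly accurate" is to compare the predicted sentiment with each review's StarRating. The sketch below assumes a hypothetical mapping in which 4 and 5 stars count as positive, 1 and 2 stars as negative, and 3-star reviews are excluded as ambiguous. Star ratings are only a rough proxy for the tone of the text, so treat the result as a sanity check rather than a true accuracy measure. StarAgreement and AgreementRate are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sanity check comparing sentiment scores with star ratings.
static class StarAgreement
{
    // Fraction of non-3-star reviews whose score clears the threshold in the
    // direction the stars suggest. Assumed mapping: 4-5 stars positive, 1-2 negative.
    public static double AgreementRate(
        IEnumerable<(int Stars, float Score)> reviews, float threshold = 0.25f)
    {
        var rated = reviews.Where(r => r.Stars != 3).ToList();
        if (rated.Count == 0)
        {
            return double.NaN; // nothing to compare against
        }

        int agreeing = rated.Count(r =>
            (r.Stars >= 4 && r.Score >= threshold) ||
            (r.Stars <= 2 && r.Score <= -threshold));

        return (double)agreeing / rated.Count;
    }
}
```

Applied to the program above, the input could be built with `reviews.Select(r => (r.StarRating, r.AnalyzeSentimentResponse.DocumentSentiment?.Score ?? 0f))`, which treats a missing sentiment as a score of 0. Running it with a few different thresholds gives a rough way to compare settings.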
And there we have it. I hope this exercise has shown not only how quickly and easily sentiment analysis can be performed, but also how easy it can be to incorporate artificial intelligence (AI) and machine learning into our applications.