Analyze sentiment with Opinion Text Mining (PART 1)

    Sentiment analysis, also called opinion analysis, is a data mining technique that attempts to determine how a person or a group of people perceives a product, a company, an event, or pretty much anything else for which you can gather feedback in text format.

    One popular use is to analyze social media comments, especially from Twitter, where the text is short and concise. Another popular use is to analyze hotel guest evaluations on travel websites and restaurant reviews.

    Whatever your use case for sentiment analysis, you will soon discover that it can provide extremely helpful insight without your having to read hundreds, if not thousands, of lines of text and determine the opinion or opinions of the author for each one.

    It also shields the data analyst from the emotional tone of these comments by focusing on the content rather than on the way it was written.

    There are, however, challenges with this type of data mining, as you can imagine. Opinion mining is a text analysis technique that uses computational linguistics and natural language processing, and it is still a developing field in Machine Learning. There are numerous complexities to take into consideration, such as language, slang, abbreviations and context.

    As with most disciplines in Machine Learning, context is critical to understanding the subject of the analysis, so in opinion mining it is important to determine what the text is about. If the text contains multiple opinions, the work naturally becomes a little harder. For example:

    [Image: sentiment analysis example]

    In the phrase above we have three subjects for which an opinion was given. Would you agree that the overall sentiment about this restaurant is positive? Or would the service trump the other subjects and turn the overall sentiment negative? Using weights you could adjust the overall sentiment, but then you have to deal with synonyms for every single subject. For example, instead of “the service was slow” they could have written “the staff was slow”, “the waiter was rude”, “the waitress messed up our orders”, “the manager was smoking in the kitchen”, and so on, all leading to the same opinion.
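
    The weighting and synonym problem can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the synonym table, the subject names (food, service, atmosphere), the weights and the scoring scale from -1.0 to 1.0 are all hypothetical and would need to be tuned for a real data set.

```python
# Hypothetical synonym table: each surface term maps to one canonical subject.
SUBJECT_SYNONYMS = {
    "service": "service",
    "staff": "service",
    "waiter": "service",
    "waitress": "service",
    "manager": "service",
    "food": "food",
    "atmosphere": "atmosphere",
}

# Hypothetical weights: how much each subject counts toward the overall opinion.
SUBJECT_WEIGHTS = {"food": 0.5, "service": 0.3, "atmosphere": 0.2}


def overall_sentiment(opinions):
    """Combine (term, score) pairs into one weighted score.

    Scores run from -1.0 (negative) to 1.0 (positive). Terms that are not
    in the synonym table are ignored.
    """
    scores_by_subject = {}
    for term, score in opinions:
        subject = SUBJECT_SYNONYMS.get(term.lower())
        if subject is not None:
            scores_by_subject.setdefault(subject, []).append(score)

    weighted_sum = 0.0
    weight_used = 0.0
    for subject, scores in scores_by_subject.items():
        weight = SUBJECT_WEIGHTS[subject]
        weighted_sum += weight * (sum(scores) / len(scores))
        weight_used += weight
    # Normalize by the weights actually present, so a review that mentions
    # only one subject is not diluted by the subjects it left out.
    return weighted_sum / weight_used if weight_used else 0.0


review = [("food", 1.0), ("atmosphere", 1.0), ("waiter", -1.0)]
score = overall_sentiment(review)
```

    With these made-up weights the review above stays mildly positive despite the negative remark about the waiter. Raising the weight for service would tip it the other way, which is exactly the judgment call described above.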

    Taking the same example and assuming that the reviewer gave only one or two opinions of the same weight, how would you assess it then?

    And are those real-life examples? Wouldn’t the following be more in line with what we would expect from user reviews?

    The food and service were unacceptable, but the concierge were nice. After talking to them about the quality of the food and the process to get room service they refunded the money we spent at the restaurant and gave us a voucher for nearby restaurants.

    The rooms were beautiful. The AC was good and quiet. The breakfast was good too with good options and good servicing times. The thing we didn’t like was that the toilet in our bathroom was smelly. It could have been that the toilet was not cleaned before we arrived. Either way it was very uncomfortable. Once we notified the staff, they came and cleaned it and left candles.

    Purpose

    By implementing an automated sentiment analysis process for your company, product, service or location, you will be able to monitor feedback as it arrives and act quickly. There is nothing worse than an unsatisfied customer remaining angry at you for a long time; it only breeds more negativity. By quickly reaching out to a customer and offering at least some mediation, however, you help the customer feel better about their experience.

    To do the above efficiently, you will need tools to automate the entire process. If it takes you hours or days each time a new batch of comments arrives, the effort will eventually be self-defeating. Ideally, you would automate the tools to process each comment immediately after it is made.

    [Image: data flows]

    Tools

    There are a number of out-of-the-box proprietary and open-source Machine Learning engines that you can use with ClicData to perform sentiment analysis. They vary by the functionality they offer, such as language detection, sentiment analysis, and opinion mining.

    • Microsoft Azure Text Analytics
    • Google Cloud Natural Language
    • IBM Watson Natural Language Understanding
    • Amazon Comprehend (AWS)

    These APIs typically require you to send a request body containing the text in JSON format using the POST method, and they return a response containing the results of the analysis. You can then transform the response into a data structure suitable for visualizations.
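
    As a rough sketch of this request pattern outside of ClicData, the Python standard library is enough to build such a POST request. The URL, the `X-Api-Key` header name and the payload shape below are placeholders of our own, not the actual values of any of the services listed above; each provider documents its own endpoint, authentication header and body format.

```python
import json
import urllib.request


def build_json_post(url, payload, headers=None):
    """Return a POST request whose body is `payload` encoded as JSON."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


# Hypothetical endpoint, header and payload, for illustration only.
request = build_json_post(
    "https://example.invalid/analyze",
    {"text": "The food was great but the service was slow."},
    headers={"X-Api-Key": "YOUR-KEY"},
)

# Sending the request would look like this (not executed here):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```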

    Sentiment analysis with ClicData

    For the example below, we use the Azure Text Analytics API to demonstrate preparing the text in the format required by the API, sending the request, post-processing the response into a data structure and visualizing the analysis in ClicData. We use the Web Service connector to set up the API.

    Prerequisites:

    • A Microsoft Azure account. Create a free account or sign in.
    • A Cognitive Services API account with the Text Analytics API. If you don’t have one, you can sign up and use the free tier for 5,000 transactions.
    • The Text Analytics access key that was generated for you during sign-up.
    • A ClicData account.

    Approach

    We are looking to analyze the overall sentiment as well as opinions on specific aspects, such as food and atmosphere, of a particular restaurant from over 600 Yelp reviews. Below is a sample of the raw data:

    [Image: sample of the collected review data]

    The Azure Text Analytics API can determine the overall sentiment of each text as well as the sentiment toward items within the text, using opinion mining.

    The API does require the input text to be formatted to a JSON structure as below:

    [Image: JSON structure]
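
    For readers who want to prototype the transformation outside of ClicData, here is a minimal Python sketch. It assumes the publicly documented Text Analytics v3 request layout, a top-level `documents` list whose entries carry `id`, `language` and `text`; check the current Azure documentation before relying on it. The function name `to_documents` is our own.

```python
import json


def to_documents(texts, language="en"):
    """Wrap raw review texts in a {"documents": [...]} request structure.

    Each text gets a sequential string id starting at "1", so results can
    be matched back to the original reviews.
    """
    return {
        "documents": [
            {"id": str(index), "language": language, "text": text}
            for index, text in enumerate(texts, start=1)
        ]
    }


reviews = [
    "The rooms were beautiful.",
    "The food and service were unacceptable.",
]
payload = to_documents(reviews)
print(json.dumps(payload, indent=2))
```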

    Once the raw text is transformed into the prescribed format above, it can be sent to the Azure ML engine via the REST API endpoint using a POST request. We should get a response containing sentiment scores for each text and for items within each text, as shown below.

    [Image: sentiment scores]

    The API currently has a limit of 10 texts per call. The Web Service connector in ClicData allows you to send multiple calls, so if you need to process more than 10 texts you can make multiple POST requests containing up to 10 texts each. You should get a result for each text you send in a request.
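
    The batching itself is simple to express in code. The sketch below is plain Python and independent of ClicData; the helper name `batches` and the placeholder review texts are our own, and the default size of 10 simply mirrors the limit mentioned above.

```python
def batches(items, size=10):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


reviews = [f"review {n}" for n in range(1, 26)]  # 25 placeholder texts
sizes = [len(batch) for batch in batches(reviews)]
print(sizes)  # [10, 10, 5]
```

    Each batch would then be wrapped in its own request body and sent as a separate POST call.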

    You can do some further transformations on the response to create data structures that suit the widgets you want to add to the dashboard.
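
    One such transformation is flattening the nested response into rows, one row per opinion target, which is a shape most table and chart widgets can consume. The Python sketch below is an assumption-laden illustration: the field names (`documents`, `sentiment`, `confidenceScores`, `sentences`, `targets`) follow our reading of the Text Analytics v3.1 response format and may differ in other API versions, and the sample response is hand-written with made-up scores, not real API output.

```python
def flatten_response(response):
    """Turn a nested sentiment response into flat rows, one per target.

    A document without any opinion target still yields one row, with
    target and target_sentiment set to None.
    """
    rows = []
    for document in response.get("documents", []):
        base = {
            "id": document["id"],
            "overall": document["sentiment"],
            "positive": document["confidenceScores"]["positive"],
            "negative": document["confidenceScores"]["negative"],
        }
        targets = [
            target
            for sentence in document.get("sentences", [])
            for target in sentence.get("targets", [])
        ]
        if not targets:
            rows.append({**base, "target": None, "target_sentiment": None})
        for target in targets:
            rows.append(
                {
                    **base,
                    "target": target["text"],
                    "target_sentiment": target["sentiment"],
                }
            )
    return rows


# Hand-written sample with made-up scores; field names are assumptions.
sample = {
    "documents": [
        {
            "id": "1",
            "sentiment": "mixed",
            "confidenceScores": {"positive": 0.6, "neutral": 0.1, "negative": 0.3},
            "sentences": [
                {
                    "text": "The food was great but the service was slow.",
                    "targets": [
                        {"text": "food", "sentiment": "positive"},
                        {"text": "service", "sentiment": "negative"},
                    ],
                }
            ],
        }
    ]
}

rows = flatten_response(sample)
for row in rows:
    print(row)
```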

    Below is an example of a dashboard we have created using Yelp reviews.

    [Image: dashboard built from the Yelp reviews]

    A detailed description of setting up ClicData to do all of the above, using the Web Service connector to connect to the Azure Text Analytics API and transforming the raw text into a compatible format, is covered in Part 2 of this blog.

    Summary

    In this article we have given a brief description of what Sentiment Analysis and Opinion Mining are and what they can do for you and your business. We have also looked at some existing cloud tools on the market that can perform Sentiment Analysis in multiple languages. Finally, we gave a quick intro on how to build the entire process in ClicData from beginning to end. The full detailed description will be covered in Part 2 of this blog.