Sentiment analysis has been around for a while. The basic idea revolves around measuring and classifying emotions (often boiled down to a single positive/negative measurement from -1.0 to 1.0) based on lexical analysis. A good few years ago, when Facebook and WhatsApp (I don’t remember exactly whether this was before the merge) made it possible to download your text conversations en masse, I immediately did so and was amazed at the amount of data I had accumulated and the possibilities of extracting insights about my life and the patterns therein. Unfortunately, it ended up as a post-it that was never really picked up.
Two weeks ago, I found myself with a free weekend. Downloading my data again, I found almost exponential growth in the conversations I had been having online - for someone who prefers face-to-face communication over text, I now had almost 100,000 messages exchanged on Messenger alone, spanning over half a decade. The right kind of analysis could reveal how I had changed as a person, how my conversational partners had changed, whether I’m prone to cyclic behavior and emotional states, and so much more.
I decided to start with sentiment analysis. I had put off working on this for so long that I expected there to be existing suites I could leverage: libraries that could directly process information collected about your online presence, along with standard graphing and analytics suites that connected to them. Unfortunately, other than a few tutorials for sentiment analysis libraries, I couldn’t really find any.
That left me with two options: building a general purpose library, or answering the questions I had through direct scripting. I’m not proud of the fact that I usually choose the latter, so I decided to go for the first one this time around. Converse was the result, and in its current state it is a sentiment analysis and plotting library for Messenger in Python. My primary objectives were ease of use - I wanted something that could come out of the box and provide useful graphs - and a high degree of configurability. I’ll explain some of the issues, solutions and possible vectors for analysis later, but in standard reverse-sear fashion, let’s start with how to use it.
Install and Setup
pip install converse
That’s it. Or, if you’re the kind to prefer an isolated, fully configured system with a demo application and sample conversations,
docker pull hrishioa/converse
should do the trick.
If you’re using the library directly from Python, you’ll also need to download the corpora for TextBlob, the sentiment analysis library we’re using, as follows:
python -m textblob.download_corpora lite
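To give a feel for what a polarity score in the -1.0 to 1.0 range means, here is a toy lexicon-based scorer. It is only a sketch and not how TextBlob actually computes sentiment: the word lists and the toy_polarity function are made up for illustration.

```python
# Toy word lists, invented for this illustration; real lexicons are far larger
# and also weigh words by intensity.
POSITIVE = {"good", "great", "love", "happy", "awesome"}
NEGATIVE = {"bad", "awful", "hate", "sad", "terrible"}


def toy_polarity(text: str) -> float:
    """Return a score in [-1.0, 1.0]: (positive hits - negative hits) / all hits."""
    # Lowercase each whitespace-separated token and strip common punctuation.
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    hits = pos + neg
    # Text with no known words is treated as neutral.
    return 0.0 if hits == 0 else (pos - neg) / hits
```

A message with only positive words scores 1.0, one with only negative words scores -1.0, and mixed or unknown text lands somewhere in between.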
Once you’ve got the library, you need to download your data from Facebook.
Disclaimer: Before you do so, let me confirm that Converse is an offline library (the extremely paranoid can disable network connectivity on the Docker container), and I don’t retain (nor do I have any interest in) any of your information. The code is quite short and on GitHub, where you can also find the Dockerfile and sample conversations. That being said, it makes use of several large libraries for sentiment analysis, plotting and data management (primarily plotly, pandas and TextBlob), and I take no responsibility for your data.
Here are a few guides on how to download your data, but here’s a quick overview:
- Facebook.com -> General Account Settings page -> Download a copy of your Facebook data
(Make sure to select JSON instead of HTML when choosing the format.)
- Wait a while (possibly a few hours) until Facebook notifies you that your data is ready for download, and then grab it.
- Unzip the archive to any directory you’d prefer. We’re interested in the
That’s it. For the next step, run jupyter notebook where you’d like the code to live, and add the following code:
from converse import Conversation
from tqdm import tqdm_notebook as tqdm
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)

convo = Conversation()
convo.load("path-to-message.json")  # structure is usually archive-root/convo-name/message.json
iplot(convo.plot())
Replace path-to-message.json with the actual path to any conversation you’d like (the conversations are stored as a message.json file in folders marked by conversation names). Note that we’re using Plotly’s offline plotting library to keep things off the server.
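If you’re not sure which path to use, a small standard-library sketch like the one below can list the conversation files in your archive and give a quick count of who said how much. It doesn’t use Converse at all. The helper names (list_conversations, summarize_conversation) are hypothetical, and the JSON field names ("messages", "sender_name") are assumptions about the export format that may not match your download.

```python
import json
from collections import Counter
from pathlib import Path


def list_conversations(archive_root):
    """Find message.json files laid out as archive-root/convo-name/message.json."""
    return sorted(Path(archive_root).glob("*/message.json"))


def summarize_conversation(path):
    """Return (total message count, Counter of messages per sender) for one file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # ASSUMPTION: a top-level "messages" list whose items may carry "sender_name".
    messages = data.get("messages", [])
    counts = Counter(m.get("sender_name", "<unknown>") for m in messages)
    return len(messages), counts
```

Calling list_conversations on the directory you unzipped to and passing any of the returned paths to summarize_conversation is a cheap way to pick a conversation with enough messages to be worth plotting.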
And you should have a basic plot! Below is a sample plot of one of my conversations with the default settings: