This year the polls were off. Hopefully the coming years of research will tell us why. Finding solutions w/o understanding the problem can be very harmful. Here are some limitations of sentiment analysis (SA), as I understand them, when it is applied to predicting voting behaviour https://on.wsj.com/32p4g3j
Social media is rich with sentiment. Companies gauge it to understand how people relate to their products. In academic research, computational social science has been embracing sentiment analysis to study all sorts of phenomena, such as community well-being: https://whr.tn/2Il9F3S
If accurate, SA can tell us whether a text has a positive or negative sentiment. Often sentiment alone is not enough to capture a user’s message, because statistical methods that sum up words (e.g. bag-of-words models) may lose part of a text’s meaning: https://www.aclweb.org/anthology/D18-1141.pdf
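To make the bag-of-words point concrete, here is a minimal Python sketch. It assumes a tiny hypothetical polarity lexicon (`TOY_LEXICON`) and simple regex tokenization; it is an illustration, not the method from the linked paper. Two sentences with opposite meanings end up with the same word counts, and therefore the same summed score.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: word -> polarity score (illustration only).
TOY_LEXICON = {"good": 1, "bad": -1}

def bag_of_words(text: str) -> Counter:
    """Lowercase, split into word tokens, and count them; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def lexicon_score(text: str) -> int:
    """Sum the polarity of every token found in the toy lexicon."""
    bag = bag_of_words(text)
    return sum(TOY_LEXICON.get(word, 0) * count for word, count in bag.items())

a = "The debate was good, not bad."
b = "The debate was bad, not good."

print(bag_of_words(a) == bag_of_words(b))  # True: same bag, opposite meanings
print(lexicon_score(a), lexicon_score(b))  # 0 0
```

Because the bag keeps only counts, the negation "not" and the order of "good" and "bad" vanish, so no downstream sum over the bag can tell these two sentences apart.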
Apart from meaning, there’s also context. SA does not by itself explain what lies behind a sentiment; for more insight, other approaches are necessary. For example, @daniel_preotiuc, Mihaela Gaman & @nikaletras looked at complaints vs. sentiment (table from paper): https://www.aclweb.org/anthology/P19-1495.pdf
Social media is also rife with sarcasm, pranks, trolling, parody (https://www.aclweb.org/anthology/2020.acl-main.403.pdf) and Internet culture that is sometimes difficult to understand even for seasoned users (have you seen the sheep pics in Trump’s comments?).
All these measurements depend on data & labels. Change the pool of tweets and a method may become less accurate. We saw this with @gerasimoss & our students when looking at hate speech on YouTube: comments on music videos are different from comments on influencer & scandal videos.
In academic research, these methods are explored out in the open, and transparency is becoming a mantra. @emilymbender & Batya Friedman’s workshop on data statements was one of my Zoom highlights of the year: https://www.aclweb.org/anthology/Q18-1041/ (thank you Emily & Batya for the revelations!)
Business solutions proposed by companies to understand voter behaviour are a slippery slope. What are the models? What data are the models trained on, and how? What are the limitations? Without transparent auditing, such solutions *may* be snake oil: https://twitter.com/random_walker/status/1196870349574623232?s=20
That’s exactly the Cambridge Analytica model: understand voters so you can predict & manipulate their behaviour. With Stephan Mulders, I've written in @eu_cml about the legal issues of using social media data at such a scale without informing platform users: https://twitter.com/eu_cml/status/1165910076676345856?s=20
Really curious what NLP folks think about these developments, and about the pros & cons of sentiment analysis for predicting voting behavior @gerasimoss @emilymbender @ionandrou @nikaletras @daniel_preotiuc @SeeTedTalk
You can follow @CatalinaGoanta.