After coding hundreds of tweets & using a machine learning tool, I have preliminary evidence suggesting that the PRC government is conducting a new effort on Twitter to amplify genocidal language & generate the perception of Chinese ultranationalism toward Taiwan.
Findings below:
[1]
Brief rewind: a week ago, I shared the trend below, showing a sudden increase in 2020 in the use on Twitter of the genocidal phrase 留岛不留人 (“keep the island, get rid of its people”).
It raises the question: is the trend state-backed or an organic outpouring of ultranationalism?
[2]
To get at the answer, I divided all 314 tweets from Feb 2020 into two groups: (1) advocating for the genocidal idea or (2) criticizing/responding to the idea. Overall, there were 261 unique accounts. About 62% (163) were advocating and about 38% (98) were responding.
[3]
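A minimal sketch of how the tally above could be reproduced from coded tweets. The handles and labels are made-up placeholders, not the real data, and counting each account under the label of its first coded tweet is an assumption; the thread does not say how accounts with tweets in both groups were handled.

```python
from collections import Counter

# Hypothetical coded tweets: (account handle, label). Placeholder data only.
coded_tweets = [
    ("acct_a", "advocating"),
    ("acct_a", "advocating"),   # same account tweeting twice counts once
    ("acct_b", "responding"),
    ("acct_c", "advocating"),
    ("acct_d", "responding"),
]


def unique_accounts_by_group(tweets):
    """Count each account once, under the label of its first coded tweet."""
    first_label = {}
    for account, label in tweets:
        first_label.setdefault(account, label)
    return Counter(first_label.values())


counts = unique_accounts_by_group(coded_tweets)
total = sum(counts.values())
shares = {group: n / total for group, n in counts.items()}

# Shares implied by the counts reported in the thread (163 and 98 of 261).
thread_shares = {"advocating": 163 / 261, "responding": 98 / 261}
```

With the reported counts, the split works out to roughly 62% advocating and 38% responding.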
Using @botometer, a machine learning tool, I found that accounts advocating for the genocidal phrase had a much greater prevalence of suspicious or bot-like traits.
Below is a distribution of scores for all 261 accounts.
Blue = less suspicious account traits
Red = more
[4]
50% of the accounts advocating for the genocidal phrase had two or more traits with a score between 4 and 5, meaning very suspicious or bot-like characteristics.
In contrast, only 4% of accounts criticizing/responding to the phrase had two or more traits scored between 4 and 5.
[5]
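The "two or more traits scored between 4 and 5" rule can be sketched as below. This is an illustration under assumptions: the trait names (trait_1, trait_2, ...) and the sample scores are hypothetical, and treating the 4 and 5 bounds as inclusive is my reading, not something the thread spells out.

```python
def is_highly_suspicious(trait_scores, low=4.0, high=5.0, min_traits=2):
    """True if at least `min_traits` trait scores fall within [low, high]."""
    hits = sum(1 for score in trait_scores.values() if low <= score <= high)
    return hits >= min_traits


def share_flagged(accounts):
    """Fraction of accounts meeting the rule; 0.0 for an empty group."""
    if not accounts:
        return 0.0
    flagged = sum(1 for traits in accounts if is_highly_suspicious(traits))
    return flagged / len(accounts)


# Hypothetical per-account trait scores on a 0-5 scale. Placeholder data only.
sample_group = [
    {"trait_1": 4.5, "trait_2": 4.1, "trait_3": 1.0},  # two high traits
    {"trait_1": 4.8, "trait_2": 2.0, "trait_3": 0.5},  # only one high trait
    {"trait_1": 0.3, "trait_2": 1.2, "trait_3": 0.9},  # no high traits
    {"trait_1": 4.0, "trait_2": 5.0, "trait_3": 3.9},  # two at the bounds
]

sample_share = share_flagged(sample_group)
```

Running share_flagged separately on each group's accounts gives the two percentages being compared.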
Botometer also estimates the probability that an account is completely automated. The discrepancy here is striking too (see chart).
A third of accounts advocating for the genocidal phrase are 20% or more likely to be completely automated, compared to only 4 accounts (about 4%) in the other group.
[6]
So what does this mean?
We know that since Jan 2020, there’s been a sharp increase in accounts putting out ultranationalist & genocidal posts toward Taiwan in simplified Chinese. Now, the sample above indicates a large number of these accounts have suspicious traits.
[7]
These Chinese-language tweets are not all linked to one or two events, but most are directed at Taiwan. Common targets on Twitter include Tsai Ing-wen’s account, news about Taiwan, or news of PLA military exercises to intimidate Taiwan.
[8]
If this initial analysis reflects a larger state-backed effort, it could indicate, for example, that the PRC wants to create the perception, especially among Taiwanese, that there is a popular outpouring of Chinese support for military and genocidal action against Taiwan.
[9]
As mentioned previously, the trend seems to have taken off right around Tsai’s re-election. It’s possible the CCP wants to stoke greater fear of Chinese hypernationalism among the Taiwanese public in order to reduce support for Tsai’s state-building.
[10]
We know the PRC engages in disinformation campaigns on Twitter. For example, see the @ASPI_org report from Sept 2019: https://www.aspi.org.au/report/tweeting-through-great-firewall
Twitter suspended a large number of suspicious accounts thought to have links to the PRC govt: https://blog.twitter.com/en_us/topics/company/2019/information_operations_directed_at_Hong_Kong.html
[11]
With limited time and no access to large data sets, what I’ve found is likely just part of the story. Hopefully this will spur others to dig further.
Ultimately @TwitterSupport is best positioned to look into whether the PRC is amplifying genocidal language on this platform.
[12]
Methodology:
Examining the 314 tweets with the phrase “留岛不留人” from the month of Feb 2020, I coded each as either advocating for the phrase or criticizing/responding to the phrase. To reduce bias, I coded each account before obtaining its score from Botometer.
[13]
Next I entered each of the 261 unique accounts into Botometer: https://botometer.iuni.iu.edu/#!/
Below is an example of what the tool returns. I recorded the scores for each account.
[14]
I also recorded the date on which I obtained the data. Scores can change over time because the Botometer tool changes its algorithm as it learns. See their FAQ for more on how it works: https://botometer.iuni.iu.edu/#!/faq
[15]
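The record-keeping described in the methodology (code the account first, then look up its scores and note the retrieval date) could be sketched like this. The AccountRecord class, its field names, the trait name, and the example date are all hypothetical illustrations, not the layout of the actual spreadsheet.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class AccountRecord:
    handle: str
    label: Optional[str] = None                 # "advocating" or "responding"
    scores: Dict[str, float] = field(default_factory=dict)
    retrieved_on: Optional[date] = None         # scores can drift over time

    def attach_scores(self, scores, retrieved_on):
        """Store scores only after the account has been coded by hand."""
        if self.label is None:
            raise ValueError("code the account before looking up its scores")
        self.scores = dict(scores)
        self.retrieved_on = retrieved_on


# Hypothetical usage: label first, then attach scores with the lookup date.
record = AccountRecord(handle="acct_a")
record.label = "advocating"
record.attach_scores({"trait_1": 4.2}, date(2020, 4, 1))
```

Refusing to store scores for an uncoded account enforces the coding-before-scoring order, and keeping the date next to the scores makes later re-checks comparable.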
I decided to use Botometer because of the variety of data it provides and the positive reviews it has received. For example, Pew Research has used this tool to identify bots:
https://www.pewresearch.org/fact-tank/2018/04/19/qa-how-pew-research-center-identified-bots-on-twitter/
[16]
Because coding and recording the data takes a lot of time, I decided to examine only one month as a sample. A shortcoming of this analysis is that it could be missing patterns in other months. It assumes the trend is similar across Jan-April 2020.
[17]
The image below contains some of the basic stats of the two groups (Advocacy v Response).
Here is a link to raw data I collected:
https://www.dropbox.com/s/vqpnksxpskwzagk/Feb2020-data-set.xlsx?dl=0
[END]
Pinging some people who may want to see this follow-up analysis (pardon repeat @-ing you):
@WilliamYang120 @catielila @pybaubry
@audreyt @heguisen @jessicacweiss
@jessicadrun @JMichaelCole1
@lnachman32 @wearytolove
@MargaretKLewis @nathanattrill @tculpan