Data Access and Disinformation: How the Politics of Deletion Benefits Influence Operations
Marco T. Bastos
Post based on the keynote address to the CHIST-ERA Programme for European Coordinated Research, 19 May 2021
Social media platforms operate in an open market with limited regulation and limited public-facing governance. These platforms flourished in an environment that supported the continuous upscaling of social infrastructure and positioned the platforms as centralized gatekeepers. These changes in the social infrastructure are driving anxieties about the prevalence of mis- and disinformation on social media, but mitigation strategies are implemented with inadequate appreciation of the policies driving data privacy, access, surveillance, and microtargeting. In the following, we discuss five key challenges in the detection and mitigation of disinformation in a context marked by the politics of deletion implemented by social platforms.
We cannot know what we do not know
Effective mitigation strategies require tracking disinformation in real time, and social platforms have allocated considerable resources to this effect. Unfortunately, the research community is not privy to such internal operations and therefore does not know, indeed cannot know, what it does not know. Our study of the Brexit referendum campaign offers a cautionary tale. We analyzed tweet decay in three million posts leading up to the vote, compared it with data from the same period, and finally compared it with a database of four years of Brexit activity on Twitter. While studies prior to 2016 found that on average less than 4% of tweets would disappear from the platform, tweet decay in the Brexit referendum campaign was remarkably higher. Indeed, 33% of the tweets leading up to the Brexit referendum vote have been removed from Twitter. Deletion is not restricted to tweets but extends to accounts as well: only about half of the most active accounts that tweeted the Brexit referendum continue to operate publicly. From the entire universe of accounts that tweeted the referendum, 20% are no longer active and 20% of these accounts were actively blocked by Twitter. These studies offer evidence that social platforms are removing more content and systematically purging accounts. In sharp contrast to the 4% baseline of tweet decay reported before 2016, even non-political messages are being purged at a rate three times higher.
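The decay figures above come down to simple arithmetic. The sketch below is an illustration only, not the procedure used in the study. It assumes a researcher holds the IDs of tweets collected at the time of the campaign and a second set of IDs that can still be retrieved today; how that second set is obtained is left open. The names `tweet_decay` and `DecayReport`, and the toy ID ranges, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecayReport:
    collected: int            # tweets captured during the original collection
    missing: int              # tweets that can no longer be retrieved
    decay_rate: float         # missing / collected
    ratio_to_baseline: float  # decay_rate divided by the reference baseline


def tweet_decay(collected_ids, available_ids, baseline=0.04):
    """Share of originally collected tweets that is no longer retrievable.

    `baseline` defaults to the roughly 4% decay reported by pre-2016 studies.
    """
    collected = set(collected_ids)
    if not collected:
        raise ValueError("collected_ids must not be empty")
    # IDs seen at collection time that are absent from the later check.
    missing = collected - set(available_ids)
    rate = len(missing) / len(collected)
    return DecayReport(len(collected), len(missing), rate, rate / baseline)


# Toy data (hypothetical): 100 tweets collected, 67 still retrievable.
report = tweet_decay(range(100), range(67))
print(f"decay rate: {report.decay_rate:.0%}")
print(f"multiple of the baseline: {report.ratio_to_baseline:.2f}")
```

A calculation of this kind only tells us that content has disappeared. It cannot distinguish deletion by the user from removal by the platform, which is precisely the information researchers lack.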
Back engineering social platforms
Social platforms deserve some measure of sympathy for trying to juggle the conflicting priorities of privacy, transparency, and safety. These competing pressures offered no incentives for increasing access to the data they collected, or for increasing the transparency of their policies for content removal. Public accountability is thus constrained by digital privacy concerns that feed the platforms’ increasing opaqueness, which in turn further skews the balance of power between social platforms and the public. This situation forced researchers and journalists monitoring disinformation campaigns to work with fragmentary evidence and second-guess the algorithmic decisions that resulted in the purging or downranking of content. The reverse engineering of social platforms, commonly referred to as ‘algorithmic auditing,’ requires digging deep into disinformation as it happens in real time and with limited support from social platforms. Even when users and journalists report potential disinformation campaigns, social platforms rarely disclose content that was flagged for removal. Studying influence operations on social media therefore becomes an exercise in reverse engineering at multiple levels, the most prominent being the interplay between the strategies and intentions of malicious state and non-state actors and the limited amount of evidence (data) made available by social platforms.
Gaslighted by social platforms
Influence operations routinely daisy-chain multiple harassment and disinformation campaigns that are phased out and disappear as soon as rectifying information emerges. The low persistence of social media posts is leveraged to transition from one contentious and unverified frame to the next before mechanisms for checking and correcting false information are in place. As such, influence operations can easily exploit the opaqueness and inscrutability of social platforms by offloading problematic content that is removed from platforms before the relentless, but ultimately time-consuming, news cycle has successfully corrected misleading narratives. The absence of oversight mechanisms, and a context where influence operations can easily leverage the firehose of falsehood model, maximizes the vulnerability of those targeted by influence operations. Individuals find themselves unable to tell whether mass harassment and brigading are coordinated or not, and the decision-making process regarding content that has been reported or flagged for removal is restricted to social platforms’ content policy teams, which decide on individual cases with little external input. The opaqueness and the politics of deletion implemented by social platforms are beneficial to influence operations because disinformation performs well in short timeframes. Even when content is routinely removed, the high-volume posting is effective because individuals are more likely to be persuaded if a story, however confusing, appears to have been reported repeatedly and by multiple sources.
On social media, no one knows you are a bot
Substantial resources have been allocated to scalable solutions based on machine learning and predictive analytics (i.e., ‘artificial intelligence’). Data analytics and machine learning algorithms present a point of departure from statistical analyses based on probability distributions that grounded much of the social sciences in the past century. But it is uncertain whether machine learning can bring a measure of control to this information ecosystem, not least because machine learning is also available to, and can be promptly leveraged by, influence operations. Indeed, the use of computer-generated profile images has become a staple in influence operations on social media and is also becoming the de facto standard in state propaganda. There are also long-established challenges in bot identification. Even state-of-the-art classifiers are imprecise when it comes to identifying social bots and estimating automated activity. The scores are prone to variance and likely to lead to false negatives (bots classified as humans) and false positives (humans classified as bots), particularly for accounts posting content in languages other than English. In our own study, in which a large botnet was identified, we found that even though bots rely on trivial computing routines, bot detection was not an exact science, and neither human annotators nor machine-learning algorithms performed particularly well.
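The trade-off between false negatives and false positives can be made concrete with a toy example. The sketch below is a minimal illustration and does not reproduce any particular bot classifier. It assumes each account has a hand-assigned label (bot or human) and a classifier score between 0 and 1, and it counts the outcomes when every account at or above a chosen threshold is called a bot. The function name `confusion_counts`, the labels, and the scores are all hypothetical.

```python
def confusion_counts(is_bot, scores, threshold):
    """Count outcomes when accounts scoring >= threshold are labelled bots.

    fp = humans classified as bots, fn = bots classified as humans.
    """
    if len(is_bot) != len(scores):
        raise ValueError("is_bot and scores must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, score in zip(is_bot, scores):
        predicted_bot = score >= threshold
        if predicted_bot and truth:
            counts["tp"] += 1      # bot correctly flagged
        elif predicted_bot and not truth:
            counts["fp"] += 1      # human wrongly flagged as a bot
        elif truth:
            counts["fn"] += 1      # bot that slipped through as human
        else:
            counts["tn"] += 1      # human correctly left alone
    return counts


# Hypothetical hand-labelled accounts and classifier scores.
labels = [True, True, True, False, False, False]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]

for threshold in (0.5, 0.8):
    print(threshold, confusion_counts(labels, scores, threshold))
```

With these toy numbers, raising the threshold removes the false positive but lets a second bot pass as human. No threshold removes both kinds of error, which is why estimates of automated activity depend heavily on where the line is drawn.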
If you cannot see it, did it happen at all?
The lockdown of social platforms’ APIs, especially those of Facebook and Instagram, has hindered research on influence operations in meaningful ways. Mitigation strategies are inadequate because independent source attribution is nearly impossible in the absence of digital forensics. The monitoring tools currently available, including data access facilitated by Social Science One and CrowdTangle, which is owned by Facebook, are fundamentally imperfect because no direct access to the data is possible. Similarly, while Twitter has offered archives of disinformation campaigns that the company identified and removed, such sanctioned archives offer only a partial glimpse into the extent of influence operations and may prevent researchers from examining organic contexts of manipulation. With no access to Facebook and Instagram data, arguably the most important platforms for propaganda and influence operations, independent source attribution and the monitoring of disinformation are nearly impossible. As such, our understanding of what constitutes disinformation and how widespread the problem is on social platforms is tied to, and depends on, the fragmented data that platforms release to limited research groups, usually institutions that have struck an agreement with the companies. This limited sample of disinformation campaigns effectively shapes our understanding of what strategies are in place, how large these networks of disinformation are, and what mitigation strategies can be employed.
References
- Acker, A. and Donovan, J. (2019), ‘Data craft: a theory/methods package for critical internet studies’, Information, Communication & Society, 22(11), 1590-1609.
- Ausloos, J. (2012), ‘The ‘right to be forgotten’–Worth remembering?’, Computer law & security review, 28(2), 143-152.
- Bagdouri, M. and Oard, D. W. (2015) On predicting deletions of microblog posts, ACM, 1707-1710.
- Bastos, M. T. (2021c), ‘This Account Doesn’t Exist: Tweet Decay and the Politics of Deletion in the Brexit Debate’, American Behavioral Scientist, 65(5), 757-773.
- Bastos, M. T. and Mercea, D. (2018), ‘The public accountability of social platforms: lessons from a study on bots and trolls in the Brexit campaign’, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
- Bastos, M. T. and Mercea, D. (2019), ‘The Brexit Botnet and User-Generated Hyperpartisan News’, Social Science Computer Review, 37(1), 38-54.
- Elections Integrity (2018) ‘Data archive’, available at: https://about.twitter.com/en_us/values/elections-integrity.html
- Paul, C. and Matthews, M. (2016), ‘The Russian “Firehose of Falsehood” Propaganda Model: Why It Might Work and Options to Counter It’, Rand Corporation, 2-7.
- Rauchfleisch, A. and Kaiser, J. (2020), ‘The False positive problem of automatic bot detection in social science research’, PLoS ONE, 15(10), e0241045.
- Satariano, A. (2021) ‘Inside a Pro-Huawei Influence Campaign’, The New York Times
- Xu, J.-M., Burchfiel, B., Zhu, X. and Bellmore, A. (2013) An examination of regret in bullying tweets, 697-702.