Data Access and Disinformation: How the Politics of Deletion Benefits Influence Operations

Marco T. Bastos

Post based on the keynote address to the CHIST-ERA Programme for European Coordinated Research, 19 May 2021

Social media platforms operate in an open market with limited regulation and little public-facing governance. These platforms flourished in an environment that supported the continuous upscaling of social infrastructure and positioned the platforms as centralised gatekeepers. These changes in the social infrastructure are driving anxieties about the prevalence of mis- and disinformation on social media, but mitigation strategies are implemented with inadequate appreciation of the policies driving data privacy, access, surveillance, and microtargeting. In the following, we discuss five key challenges in the detection and mitigation of disinformation in a context marked by the politics of deletion implemented by social platforms.

We cannot know what we do not know

Effective mitigation strategies require tracking disinformation in real time, and social platforms have allocated considerable resources to this effect. Unfortunately, the research community is not privy to such internal operations and therefore does not know, indeed cannot know, what it does not know. Our study of the Brexit referendum campaign offers a cautionary tale. We analysed tweet decay in three million posts leading up to the vote, compared it with data from the same period, and finally with a database of four years of Brexit activity on Twitter. While studies prior to 2016 found that on average less than 4% of tweets would disappear from the platform, tweet decay in the Brexit referendum campaign was remarkably higher. Indeed, 33% of the tweets leading up to the Brexit referendum vote have been removed from Twitter. Deletion is not restricted to tweets but extends to accounts as well: only about half of the most active accounts that tweeted about the Brexit referendum continue to operate publicly. From the entire universe of accounts that tweeted about the referendum, 20% are no longer active and 20% of these accounts were actively blocked by Twitter. These studies offer evidence that social platforms are removing more content and systematically purging accounts. In sharp contrast to the 4% baseline of tweet decay reported before 2016, even non-political messages are being purged at a rate three times higher than that.
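To make the notion of tweet decay concrete, the following is a minimal sketch of how a decay rate could be computed, assuming a researcher holds the identifiers of posts collected at the time and a later list of those that are still retrievable. The function name, the toy identifiers, and the way availability is determined are all illustrative assumptions and do not describe the procedure used in the study.

```python
def decay_rate(collected_ids, still_available_ids):
    """Share of originally collected posts that can no longer be retrieved."""
    collected = set(collected_ids)
    if not collected:
        raise ValueError("no collected posts to compare against")
    # Posts seen at collection time but absent from the later check
    missing = collected - set(still_available_ids)
    return len(missing) / len(collected)


# Toy identifiers, invented for illustration only
collected = [101, 102, 103, 104, 105, 106]
available_later = [101, 103, 104, 106]

rate = decay_rate(collected, available_later)
baseline = 0.04  # pre-2016 figure of roughly 4% cited in the text

print(f"decay rate: {rate:.0%}")
print(f"ratio to baseline: {rate / baseline:.1f}x")
```

In this toy case two of six posts have disappeared, which gives a decay rate of one third, the same order of magnitude as the 33% reported for the referendum campaign.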

Back engineering social platforms

Social platforms deserve some measure of sympathy for trying to juggle the conflicting priorities of privacy, transparency, and safety. These competing pressures offered no incentives for increasing access to the data they collected, or for increasing the transparency of their policies for content removal. Public accountability is thus constrained by digital privacy concerns that feed their increasing opaqueness, which in turn further skews the balance of power between social platforms and the public. This situation forced researchers and journalists monitoring disinformation campaigns to work with fragmentary evidence and second-guess the algorithmic decisions that resulted in the purging or downranking of content. The reverse engineering of social platforms, commonly referred to as ‘algorithmic auditing,’ requires extensive digging into disinformation as it happens in real time and with limited support from social platforms. Even when users and journalists report potential disinformation campaigns, social platforms rarely disclose content that was flagged for removal. Studying influence operations on social media therefore becomes an exercise in reverse engineering at multiple levels, the most prominent being the interplay between the strategies and intentions of malicious state and non-state actors and the limited amount of evidence (data) made available by social platforms.

Gaslighted by social platforms

Influence operations routinely daisy-chain multiple harassment and disinformation campaigns that are phased out and disappear as soon as rectifying information emerges. The low persistence of social media posts is leveraged to transition from one contentious and unverified frame to the next before mechanisms for checking and correcting false information are in place. As such, influence operations can easily exploit the opaqueness and inscrutability of social platforms by offloading problematic content that is removed from platforms before the relentless, but ultimately time-consuming, news cycle has successfully corrected misleading narratives. The absence of oversight mechanisms, and a context where influence operations can easily leverage the firehose of falsehood model, maximises the vulnerability of those targeted by influence operations. Individuals find themselves unable to tell whether mass harassment and brigading are coordinated or not, and the decision-making process regarding content that has been reported or flagged for removal is restricted to social platforms’ content policy teams, which decide on individual cases with little external input. The opaqueness and the politics of deletion implemented by social platforms are beneficial to influence operations because disinformation performs well in short timeframes. Even when content is routinely removed, the high-volume posting is effective because individuals are more likely to be persuaded if a story, however confusing, appears to have been reported repeatedly and by multiple sources.

On social media, no one knows you are a bot

Substantial resources have been allocated to scalable solutions based on machine learning and predictive analytics (i.e., ‘artificial intelligence’). Data analytics and machine learning algorithms present a point of departure from statistical analyses based on probability distributions that grounded much of the social sciences in the past century. But it is uncertain whether machine learning can bring a measure of control to this information ecosystem, not least because machine learning is also available and can be promptly leveraged by influence operations. Indeed, the use of computer-generated profile images has become a staple in influence operations on social media and is also becoming the de facto standard in state propaganda. There are also long-established challenges in bot identification. Even state-of-the-art classifiers are imprecise when it comes to identifying social bots and estimating automated activity. The scores are prone to variance and likely to lead to false negatives (bots classified as humans) and false positives (humans classified as bots), particularly for accounts posting content in languages other than English. In our own study where a large botnet was identified, we found that even though bots rely on trivial computing routines, bot detection was not an exact science and neither human annotators nor machine-learning algorithms would perform particularly well.
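The two kinds of error mentioned above can be illustrated with a short sketch that tallies false positives and false negatives for a hypothetical bot classifier. The labels and predictions below are invented for illustration, and the function is an assumption made for this example; it does not reproduce any particular bot-detection tool or the evaluation used in our study.

```python
def confusion_counts(labels, predictions):
    """Tally classifier outcomes, treating 'bot' as the positive class."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, guess in zip(labels, predictions):
        if truth == "bot" and guess == "bot":
            counts["tp"] += 1  # bot correctly flagged
        elif truth == "human" and guess == "bot":
            counts["fp"] += 1  # human classified as bot
        elif truth == "bot" and guess == "human":
            counts["fn"] += 1  # bot classified as human
        else:
            counts["tn"] += 1  # human correctly left alone
    return counts


# Invented ground truth and classifier output for six accounts
labels = ["bot", "bot", "human", "human", "bot", "human"]
predictions = ["bot", "human", "human", "bot", "bot", "human"]

counts = confusion_counts(labels, predictions)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])

print(counts)
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

Even in this small example the classifier misses one bot and wrongly flags one human, so any estimate of automated activity built on its output inherits both errors.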

If you cannot see it, did it happen at all?

The lockdown of social platforms’ APIs, especially those of Facebook and Instagram, has hindered research on influence operations in meaningful ways. Mitigation strategies are inadequate because independent source attribution is near impossible in the absence of digital forensics. The monitoring tools currently available, including data access facilitated by Social Science One and CrowdTangle, which is owned by Facebook, are fundamentally imperfect because no direct access to the data is possible. Similarly, while Twitter has offered archives of disinformation campaigns that the company identified and removed, such sanctioned archives offer only a partial glimpse into the extent of influence operations and may prevent researchers from examining organic contexts of manipulation. With no access to Facebook and Instagram data, arguably the most important platforms for propaganda and influence operations, independent source attribution and the monitoring of disinformation are near impossible. As such, our understanding of what constitutes disinformation and how widespread the problem is on social platforms is tied to, and depends on, the fragmented data that platforms release to limited research groups, usually institutions that have struck an agreement with the companies. This limited sample of disinformation campaigns effectively shapes our understanding of what strategies are in place, how large these networks of disinformation are, and what mitigation strategies can be employed.
