Introducing an SMS course to prepare for US election misinformation
One of the things we’ve learned from running so many face-to-face and online training sessions in the last five years is that people find it hard to carve out the time in their busy working lives. As November draws near, we’re trying a new format to help people be prepared for election misinformation in a way that fits into their daily schedules.
“Protection from deception,” our free two-week text message course delivers daily nuggets of training via SMS.
It’s designed to be quick and easy enough to appeal to everyone. Every day, at a time of your choosing, you will receive a little nugget of learning by text message, with some extra video and article links if you want to dive deeper.
The course will give you the knowledge and understanding you need to protect yourself and your community from online misinformation. You’ll learn why people create and share false and misleading content, commonly used tactics for spreading it, what you can do to outsmart it, and how to talk to family and friends about it.
We’re hoping to translate the course into multiple languages, but want to give it a thorough test in English first.
The course is focused on preparation, as there is a growing body of research that shows the importance of inoculating audiences against the tactics and techniques used by those creating and disseminating disinformation. Coronavirus has shown us how damaging misinformation can be, and in the US, with the election around the corner, it’s time to prepare everyone for what they might face online. Understanding the psychology of misinformation is important in fighting it, so please share this far and wide.
How to sign up
Step 1 – Head to bit.ly/protection-from-deception
Step 2 – Create a free account.
Step 3 – Once you’ve authenticated your phone number, choose when you want to receive your daily text.
Step 4 – Follow along with each daily lesson and learn how to be prepared for misinformation.
For a sneak preview of the course, check out this video on how emotional skepticism can help protect vulnerable communities.
How to use network analysis to explore social media and disinformation
Network analysis has become an important tool for disinformation experts. An increasing number of journalists and researchers are using the practice to analyze the social web and gain insight into the hidden networks and communities that drive information — and disinformation — online.
Take, for instance, open-source intelligence expert Benjamin Strick’s Bolivian Info Op Case Study for Bellingcat, or his uncovering of a pro-Chinese government information operation on Twitter and Facebook. Thanks to the analysis of these networks, the author was able to disclose coordination and manipulation, and to shed light on some of the most common tactics behind a disinformation campaign.
Online connections can influence how political opinions take shape, so analyzing these networks has become fundamental.
But there are challenges. Not all datasets identify relationships and it’s up to the journalist or researcher to define these connections. Sometimes you might end up with a visualization that, despite its beauty or complex nature, reveals nothing of interest about your data.
Although they might seem to be impressive visual content to share with your audience, network visualizations are first and foremost great tools for exploration. They are by no means conclusive charts.
And even though it’s an increasingly common tool in the field, network analysis is a complex discipline that even the world’s top academics are still working to fully understand, and its application requires caution.
A little graph theory
But first of all, how do you define a network? In graph theory, a network is a complex system of actors (nodes) interconnected by some sort of relationship (edges).
A relationship could mean different things — especially on social media, which are fundamentally made of connections. Network analysis means focusing on understanding the connections rather than the actors.
One of the first examples of an online network was the ‘Bacon number.’ The idea emerged in 1994, when the actor Kevin Bacon said in an interview that he had worked with “everyone in Hollywood.” Today, Google can calculate a Bacon number for any actor in the world by tracing their connections back to that first node: actors who have worked directly with Kevin Bacon have a Bacon number of 1, actors who have worked with those actors have a Bacon number of 2, and so on.
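The Bacon number is simply a shortest-path distance in a graph. As a toy sketch (the actor names and the use of the networkx library here are my own illustrative assumptions, not part of the original example), it can be computed like this:

```python
import networkx as nx

# Hypothetical co-starring network: an edge means two actors shared a film.
G = nx.Graph([
    ("Kevin Bacon", "Actor A"),  # Actor A has a Bacon number of 1
    ("Actor A", "Actor B"),      # Actor B has a Bacon number of 2
])

print(nx.shortest_path_length(G, "Actor B", "Kevin Bacon"))  # 2
```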
On Facebook, networks take shape based on friendships, Pages, and groups in common.
On Twitter, meanwhile, you can also investigate things like hashtags, retweets, mentions, or quotes, as well as whether users follow each other.
For example, two accounts on Twitter are the nodes of a network. One retweets the other, and this is called the edge. If they retweet each other multiple times, the weight of their relationship will be higher.
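That retweet example can be sketched in a few lines of Python. Gephi itself needs no code, but the networkx library (my choice here, purely for illustration, with made-up account names) makes the node, edge, and weight vocabulary concrete:

```python
import networkx as nx

# Hypothetical retweet events: (retweeter, original_author)
retweets = [("alice", "bob"), ("alice", "bob"), ("carol", "bob")]

G = nx.DiGraph()  # a retweet is a directed connection
for src, dst in retweets:
    if G.has_edge(src, dst):
        G[src][dst]["weight"] += 1  # repeated retweets raise the edge weight
    else:
        G.add_edge(src, dst, weight=1)

print(G["alice"]["bob"]["weight"])  # 2: alice retweeted bob twice
```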
The weight of a relationship is only one of the attributes of nodes and edges.
Another important attribute is the degree (or connectivity) of a node, which describes the number of connections.
A third is the direction, which helps us understand the nature of a relationship. When an account retweets another, it creates what is called a directed connection. In directed networks, a node has an in-degree value, counting how many times it is retweeted or mentioned by others, and an out-degree value, counting how many times it retweets or mentions others. Other connections have no direction, such as a friendship between two Facebook accounts.
The density shows how well-connected the graph is, dividing the number of connections the nodes have by the total possible connections a node could have. The closeness is the position of a node within the network (or how close it is to all other nodes in the network); a high value might indicate an actor holds authority over several clusters in the network.
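These attributes are all standard graph metrics. A minimal sketch with networkx (an assumption of mine, used only to illustrate the definitions above on a four-node toy network):

```python
import networkx as nx

# Toy undirected network: "a" is connected to every other node
G = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")])

print(G.degree("a"))                    # degree: 3 connections
print(nx.density(G))                    # 4 actual edges / 6 possible edges
print(nx.closeness_centrality(G)["a"])  # 1.0: "a" reaches every node in one hop
```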
The position of the nodes is calculated by complex graph layout algorithms that allow you to immediately see the structure within the network.
The benefit of network visualizations is that they let you look at a system as a whole, not only at a part of it. The challenge is to identify what kinds of relationships you want to investigate, and to interpret what they mean.
Tools for analyzing and visualizing networks
Several tools are available for sketching and analyzing the networks that you produce in the course of your investigations, and some allow you to gather the necessary data directly.
Here are some of the best free tools:
Neo4j is a powerful graph database management technology, well-known for being used by the International Consortium of Investigative Journalists (ICIJ) for investigations such as the Panama Papers.
Gephi is an open-source program that allows you to visualize and explore graphs. It doesn’t require any programming knowledge. It is stronger for visualizations than for analysis, but it can handle relatively large datasets (the actual size will always depend on your infrastructure).
The trick about visualization in general is to find the clearest way to communicate the data.
Prepare your data
First you will need data to analyze, of course, and you can obtain network data in several ways.
The easiest way is to download the Twitter Streaming Importer plug-in directly on Gephi. It connects to the Twitter API and streams live data in a Gephi-friendly format based on words, users, or location, allowing you to navigate and visualize the network in real time.
But if you want to use historical Twitter data, you need a scraper — read our tutorial on how to collect Twitter data using Python’s Tweepy library — and then convert the scraped data into a network-friendly file format using the tool Table 2 Net.
The tricky part can be identifying which column in the data is for nodes and which is for edges. Not all datasets include connections, and you need to make sure the columns selected contain mentions, hashtags, and accounts, in a clean, tidy format.
Once you have selected the columns to create your network, you can download the network file as .GEXF, the Gephi file format.
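If you prefer to script the conversion instead of using Table 2 Net, the same result can be approximated in Python with networkx (the account names and rows below are hypothetical stand-ins for your scraped data):

```python
import networkx as nx

# Hypothetical cleaned rows from a scraper: (author, mentioned_account)
rows = [
    ("user1", "BillGates"),
    ("user2", "BillGates"),
    ("user2", "user1"),
]

G = nx.DiGraph()
G.add_edges_from(rows)             # one directed edge per mention
nx.write_gexf(G, "mentions.gexf")  # a .GEXF file that Gephi opens directly
```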
Alternatively, you can use the programming language Groovy to generate a GEXF network graph from a JSON file in your terminal, choosing from among mentions, retweets, and replies. Run the following command:

groovy export.groovy [options] <input files> <output file>
Explore a bit of Gephi
In this example, let’s use a sample that collects mentions of Bill Gates on Twitter, as he has been a constant subject of misinformation and conspiracy theories throughout the pandemic.
Once you upload the GEXF file into Gephi, you will see the raw, unstyled network.
You have to play around with filters, parameters, and layouts to visually explore your network and turn this amorphous mass of dots into a meaningful shape.
The program will also give some initial information about the network. In the @BillGates file, for example, there are 19,036 nodes (individual accounts) and 207,780 edges (connections between accounts) — a fairly high number of edges.
Let’s have a quick overview of the menu options in the software at the top:
- Overview is where you work on the network’s functionalities;
- Data Laboratory is a display of your dataset in a table format;
- Preview is where you can customize your final network visualization.
Here are a few quick actions to start investigating the network:
What is the average degree of connections?
The average degree will show the average number of connections between the nodes. Run ‘Average degree’ under Statistics on the right-hand side to receive a ‘Degree Report’ of the network.
The average degree of the accounts that mentioned @BillGates works out to about 10.9 (207,780 edges shared among 19,036 nodes). In other words, each account in this sample is involved in roughly 11 mention connections on average.
Which are the main nodes in the network?
You now want to discover which nodes have the highest number of connections. These could be either directed or undirected connections. For example, it could be interesting to see which Twitter accounts a divisive handle retweets the most, or the top accounts that mention a divisive account.
Change the size of the nodes in Gephi by clicking on ‘Ranking’ on the left and by choosing ‘Degree’ as an attribute. Set the minimum and maximum sizes, and the nodes’ dimensions will change according to their number of connections.
In the Bill Gates example, we are interested in knowing which accounts tag him the most. To do that, choose the ‘out-degree’ attribute: a mention is an outgoing connection from the account that makes it, so this enlarges the accounts that mentioned Gates the most.
To improve the network’s visibility, choose the Force Atlas 2 layout, which uses a force-directed algorithm to spread the network out and better visualize the communities that are taking shape.
Are the actors connected to different communities?
The ‘modularity’ algorithm is often used to detect the number of clusters, or communities, within a network, by grouping the nodes that are more densely connected.
Hit the Run button next to Modularity under the Statistics panel. Doing so will yield a modularity report, which often has quite interesting details for analysis.
At this point, use the partition coloring panel to color the nodes based on modularity class and apply a palette of colors to them.
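Outside Gephi, the same community-detection idea can be sketched with networkx’s greedy modularity implementation (a stand-in for Gephi’s modularity algorithm, not the identical routine, on a made-up toy network):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two tight triangles joined by a single bridge edge ("c" - "x")
G = nx.Graph([
    ("a", "b"), ("b", "c"), ("a", "c"),  # cluster 1
    ("x", "y"), ("y", "z"), ("x", "z"),  # cluster 2
    ("c", "x"),                          # the bridge
])

communities = greedy_modularity_communities(G)
print([sorted(c) for c in communities])  # two communities of three nodes each
```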
The next step is to learn something from the graph you have created. By right-clicking on a node, you can see its position in the Data Laboratory, along with the newly created columns displaying degree and modularity values. While the visualization helps you see the global picture, here you can manually explore the data.
You can also apply text to the graph and visualize the names of the nodes.
In the preview tab, you can choose different options to visualize the network. You can add labels, change the background color, or play with sizes of nodes and edges.
If the graph is too crowded, use filters to show fewer nodes. For example, you can filter by number of in-degree or out-degree connections, depending on what you are interested in highlighting.
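A degree filter is easy to mimic in code as well. This hypothetical snippet keeps only nodes with at least two inbound connections, the equivalent of an in-degree filter in Gephi:

```python
import networkx as nx

# Hypothetical mention network: three accounts all mention "hub"
G = nx.DiGraph([("u1", "hub"), ("u2", "hub"), ("u3", "hub"), ("u1", "u2")])

# Keep only nodes mentioned at least twice (in-degree >= 2)
keep = [n for n in G if G.in_degree(n) >= 2]
print(keep)  # ['hub']
```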
Remember that data visualizations are a great way to make complicated topics more accessible and engaging, but they can be misleading if they are not well-designed and easy to understand.
Make sure your network graph is visualizing exactly what you are trying to express. Add clear annotation layers to aid in reading it. And don’t feel obliged to use one at all costs: sometimes a simpler chart that conveys its meaning more effectively is worth thousands of nodes.
The US protests have shown just how easily our information ecosystem can be manipulated
Those who study misinformation are often asked to attribute misleading content to particular actors — whether foreign governments, domestic hoaxers, or hyper-partisan media outlets. Actors deliberately promoting disinformation should not be ignored; however, the recent US protests have demonstrated that focusing on specific examples of disinformation can fail to capture the complexity of what is occurring.
Following the killing of George Floyd, who died after three Minneapolis police officers kneeled on his handcuffed body, hundreds of thousands of demonstrators took to the streets around the world. At the center of their demands was justice for Black people who have died in police custody, and alternative criminal justice systems.
In the US, several accompanying narratives were heavily discussed online. Piles of bricks near protest sites had social media users across the ideological spectrum speculating as to their origin; false rumors spread that protesters in DC were targeted by an internet blackout. Elsewhere, images of police officers who joined marches or kneeled with protesters were distributed to and published by several news outlets uncritically and without examination of law enforcement’s motivation. These examples — along with the deeper investigations into the involvement of the white supremacist “Boogaloo” movement, or questions around what exactly is happening in the Seattle Autonomous Zone — all distracted from the reason for the protests.
When demonstrators began to organize in early June, one of the narratives debated online was the emphasis on “outside agitators” infiltrating the protests. Social media was full of posts that claimed undercover police officers, white nationalist militia members, or organized “antifa” members were responsible for instigating property damage or violence at the protests. Users posted photos of graffiti, claiming the wording was suspicious, and questioned whether it was intentionally placed to sow division.
But focusing on protest attendees who do not care about addressing police brutality distracted from the demands of organizers who do. The “outside agitators” narrative served a double purpose in attacking and undermining the protests. In the first instance, it defamed the protesters by attributing some of the violent behavior and destruction to their cause. And in the second, it distracted from that cause by turning attention away from the reasons behind it.
Misinformation researchers have coined the term “source hacking” to describe the process by which “media manipulators target journalists and other influential public figures to pick up falsehoods and unknowingly amplify them to the public.” The nature of the news cycle and the way news is reported mean many outlets could not avoid covering these narratives. Internal and external pressures, both financial and professional, would not allow it. And some of the accounts spreading these narratives, but by no means all, would have had malicious intent.
What these narratives demonstrated is that the story of the swirling misinformation surrounding the protests is not one with a central villain or organized network of insidious actors. Instead it is a story of how the modern information landscape, made up of news media, social media, and the people who consume media, is vulnerable to manipulation that influences the ways in which events are shaped and discussed. The “source hacking” that occurred in many of these instances was an organic side effect of the complex information landscape, rather than an intentional ploy. This is perhaps even more difficult for decision makers to navigate, and requires careful consideration on the part of news outlets and journalists to determine how to most effectively center the audience’s needs in what is reported.
In today’s lesson of, “The cops probably did this.”
Can you guys tell me what is wrong with this image and who could have done this? pic.twitter.com/BinKwnZYjm
— Mightykeef (@MightyKeef) June 1, 2020
After all of our social media monitoring during the protests, it is not possible to blame the “outside agitator” narrative on one bad actor. Our analysis is still ongoing, but as with any moment of shared online attention, bots and sock puppet accounts were very likely to have been pushing out content related to those narratives of protest infiltration. And journalistic mistakes were made: There are examples of outlets poorly framing or mis-contextualizing rumors, giving “outside actors” more legitimacy than the evidence indicated. But identifying insidious networks and media missteps is futile without a simultaneous examination of how our current information landscape is so easily influenced by these disturbances.
Social media platforms and their algorithms, editorial decision making, and determinations about what to post and share on an individual level all contribute to the visibility of certain narratives, and they work unintentionally in synchrony — often to undesirable results. For example, news outlets used valuable resources investigating a 75-year-old police brutality victim’s ties — or lack thereof — to “antifa,” thanks in large part to the promotion of this false rumor by President Donald Trump. This is just one example from the protests when news outlets spent many hours having to investigate and debunk claims from politicians, police authorities and video evidence from the streets. When newsrooms, particularly local newsrooms where staff are being laid off and furloughed, are focused on this type of work, they are less able to focus on stories that reflect the experiences and needs of their communities. And yet it is difficult to argue that topics exploding on social media are not newsworthy. The feedback loop between social media and traditional media is broken, and the protests exhibit how damaging that has become.
While the process of misinformation monitoring — locating a piece of false or misleading content and finding the originator of it — is still useful, and fact checkers play an important role in maintaining a healthy information ecosystem, what is becoming increasingly clear is that we must tackle the problem with misinformation from a macro perspective. It’s no longer enough to tackle each ‘atom’ of misinformation. Misleading narratives sometimes flourish in the modern information ecosystem because of a confluence of circumstances, not because of a well-executed plan. Mitigating misinformation whack-a-mole style will be ineffective if we do not address the infrastructural problems that define the way people receive, process, and share information with their own networks. For journalists, this means carefully examining the urge to report on and debunk specific pieces of misinformation. Any effort to do so should be balanced by robust solutions journalism, emphasizing social issues affected or manipulated by misinformation.
Historically, media criticism has focused on gatekeepers. Do legacy news outlets foster sources that understand the communities on which they report? Are they irresponsibly or unintentionally amplifying particular voices? These questions are still relevant, but the internet has dramatically widened the pool of who can disseminate information to the public, and media scholars must adjust their lens. As a result, it’s more difficult to understand the reasons behind poor story framing. An individual Twitter user sharing information about suspicious protest attendees or promoting the “outside agitators” narrative does not obfuscate the reason for the protest by itself. And yet, as part of a groundswell covered by prominent news outlets, their tweets likely contributed to that happening. The questions journalists need to ask are not only “who is responsible” or “how do we stop misleading narratives.” Now, perhaps more than ever, newsrooms need to think about how we ensure our audiences and journalists are prepared to navigate a media landscape so susceptible to gaming and manipulation.
Jacquelyn Mason, Diara J. Townes, Shaydanay Urbani, and Claire Wardle contributed to this report.
The psychology of misinformation: Why we’re vulnerable
The psychology of misinformation — the mental shortcuts, confusions, and illusions that encourage us to believe things that aren’t true — can tell us a lot about how to prevent its harmful effects. Our psychology is what affects whether corrections work, what we should teach in media literacy courses, and why we’re vulnerable to misinformation in the first place. It’s also a fascinating insight into the human brain.
Though psychological concepts originate in academia, many have found their way into everyday language. Cognitive dissonance, first described in 1957, is one; confirmation bias is another. And this is part of the problem. Just as we have armchair epidemiologists, we can easily become armchair cognitive scientists, and mischaracterization of these concepts can create new forms of misinformation.
If reporters, fact checkers, researchers, technologists, and influencers working with misinformation (which, let’s face it, is almost all of them) don’t understand these distinctions, it isn’t simply a case of mistaking an obscure academic term. It risks becoming part of the problem.
We list the major psychological concepts that relate to misinformation, its correction, and prevention. They’re intended as a starting point rather than the last word — use the suggested further reading to dive deeper.
This is the first in a three-part series. The second will be on the psychology of misinformation correction; the last on its prevention.
Cognitive miserliness

The psychological feature that makes us most vulnerable to misinformation is that we are ‘cognitive misers’: we prefer to use simpler, easier ways of solving problems rather than ones requiring more thought and effort. We’ve evolved to use as little mental effort as possible.
This is part of what makes our brains so efficient: You don’t want to be thinking really hard about every single thing. But it also means we don’t put enough thought into things when we need to — for example, when deciding whether something we see online is true.
What to read next: “How the Web Is Changing the Way We Trust” by Dario Taraborelli of the University of London, published in Current Issues in Computing and Philosophy in 2008.
Dual process theory
Dual process theory is the idea that we have two basic ways of thinking: System 1, an automatic process that requires little effort; and System 2, an analytical process that requires more effort. Because we are cognitive misers, we generally will use System 1 thinking (the easy one) when we think we can get away with it.
Automatic processing creates the risk of misinformation for two reasons. First, the easier something is to process, the more likely we are to think it’s true, so quick, easy judgments often feel right even when they aren’t. Second, its efficiency can miss details — sometimes crucial ones. For example, you might recall something you read on the internet, but forget that it was debunked.
What to read next: “A Perspective on the Theoretical Foundation of Dual Process Models” by Gordon Pennycook, published in Dual Process Theory 2.0 in 2017.
Heuristics

Heuristics are indicators we use to make quick judgments. We use heuristics because it’s easier than conducting complex analysis, especially on the internet where there’s a lot of information.
The problem with heuristics is that they often lead to incorrect conclusions. For example, you might rely on a ‘social endorsement heuristic’ — that someone you trust has endorsed (e.g., retweeted) a post on social media — to judge how trustworthy it is. But however much you trust that person, it’s not a completely reliable indicator and could lead you to believe something that isn’t true.
As our co-founder and US director Claire Wardle explains in our Essential Guide to Understanding Information Disorder, “On social media, the heuristics (the mental shortcuts we use to make sense of the world) are missing. Unlike in a newspaper where you understand what section of the paper you are looking at and see visual cues which show you’re in the opinion section or the cartoon section, this isn’t the case online.”
What to read next: “Credibility and trust of information in online environments: The use of cognitive heuristics” by Miriam J. Metzger and Andrew J. Flanagin, published in Journal of Pragmatics, Volume 59 (B) in 2013.
Cognitive dissonance

Cognitive dissonance is the negative experience that follows an encounter with information that contradicts your beliefs. This can lead people to reject credible information to alleviate the dissonance.
What to read next: “‘Fake News’ in Science Communication: Emotions and Strategies of Coping with Dissonance Online” by Monika Taddicken and Laura Wolff, published in Media and Communication, Volume 8 (1), 206–217 in 2020.
Confirmation bias

Confirmation bias is the tendency to believe information that confirms your existing beliefs, and to reject information that contradicts them. Disinformation actors can exploit this tendency to amplify existing beliefs.
Confirmation bias is just one of a long list of cognitive biases.
What to read next: “Confirmation Bias: A Ubiquitous Phenomenon in Many Guises” by Raymond Nickerson, published in Review of General Psychology, 2(2), 175–220 in 1998.
Motivated reasoning

Motivated reasoning is when people use their reasoning skills to believe what they want to believe, rather than to determine the truth. The crucial point here is the idea that people’s rational faculties, rather than lazy or irrational thinking, can cause misinformed belief.
Motivated reasoning is a key point of current debate in misinformation psychology. In a 2019 piece for The New York Times, David Rand and Gordon Pennycook, two cognitive scientists based at MIT and the University of Regina, respectively, argued strongly against it. Their claim is that people simply aren’t being analytical enough when they encounter information. As they put it:
“One group claims that our ability to reason is hijacked by our partisan convictions: that is, we’re prone to rationalization. The other group — to which the two of us belong — claims that the problem is that we often fail to exercise our critical faculties: that is, we’re mentally lazy.”
Rand and Pennycook are continuing to build a strong body of evidence that lazy thinking, not motivated reasoning, is the key factor in our psychological vulnerability to misinformation.
What to read next: “Why do people fall for fake news?” by Gordon Pennycook and David Rand, published in The New York Times in 2019.
Pluralistic ignorance

Pluralistic ignorance is a lack of understanding about what others in society think and believe. This can make people incorrectly think others are in a majority when it comes to a political view, when it is in fact a view held by very few people. This can be made worse by rebuttals of misinformation (e.g., conspiracy theories), as they can make those views seem more popular than they really are.
A variant of this is the false consensus effect: when people overestimate how many other people share their views.
What to read next: “The Loud Fringe: Pluralistic Ignorance and Democracy” by Stephan Lewandowsky, published in Shaping Tomorrow’s World in 2011.
The third-person effect

The third-person effect describes the way people tend to assume misinformation affects other people more than themselves.
Nicoleta Corbu, professor of communications at the National University of Political Studies and Public Administration in Romania, recently found that there is a significant third-person effect in people’s perceived ability to spot misinformation: People rate themselves as better at identifying misinformation than others. This means people can underestimate their vulnerability, and don’t take appropriate actions.
What to read next: “Fake News and the Third-Person Effect: They are More Influenced than Me and You” by Oana Ștefanita, Nicoleta Corbu, and Raluca Buturoiu, published in the Journal of Media Research, Volume 11, Issue 3 (32), 5-23, in 2018.
Fluency

Fluency refers to how easily people process information. People are more likely to believe something to be true if they can process it fluently — it feels right, and so seems true.
This is why repetition is so powerful: if you’ve heard it before, you process it more easily, and therefore are more likely to believe it. Repeat it multiple times, and you increase the effect. So even if you’ve heard something as a debunk, the sheer repetition of the original claim can make it more familiar, fluent, and believable.
It also means that easy-to-understand information is more believable, because it’s processed more fluently. As Stephan Lewandowsky and his colleagues explain:
“For example, the same statement is more likely to be judged as true when it is printed in high- rather than low-color contrast … presented in a rhyming rather than non-rhyming form … or delivered in a familiar rather than unfamiliar accent … Moreover, misleading questions are less likely to be recognized as such when printed in an easy-to-read font.”
What to read next: “The Epistemic Status of Processing Fluency as Source for Judgments of Truth” by Rolf Reber and Christian Unkelbach, published in Rev Philos Psychol. Volume 1 (4): 563–581 in 2010.
Bullshit receptivity

Bullshit receptivity describes how receptive you are to statements constructed with no regard for the truth: a meaningless, pseudo-profound cliché, for example. Bullshit is different from a lie, which intentionally contradicts the truth.
Pennycook and Rand use the concept of bullshit receptivity to examine susceptibility to false news headlines. They found that the more likely we are to accept a pseudo-profound sentence (i.e., bullshit) such as, “Hidden meaning transforms unparalleled abstract beauty,” the more susceptible we are to false news headlines.
This provides evidence for Pennycook and Rand’s broader theory that susceptibility to false news comes from insufficient analytical thinking, rather than motivated reasoning. In other words, we’re too stuck in automatic System 1 thinking, and not enough in analytic System 2 thinking.
What to read next: “Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking” by Gordon Pennycook and David Rand, published in Journal of Personality in 2019.
The importance of local context in taking on misinformation: Lessons from Africa Check
The coronavirus pandemic is, by definition, a global problem, and so is the wave of misinformation that has accompanied it. But that doesn’t mean that the forms misinformation and disinformation take, or the ways we can combat them, are the same across the planet.
As part of a webinar series looking at information disorder in different regions, First Draft spoke with Africa Check chief editor Lee Mwiti and the organization’s founder, Peter Cunliffe-Jones, to explore the way the “infodemic” has manifested in countries across the continent.
As Cunliffe-Jones explained, the idea for Africa Check came from a much earlier public health crisis: “In the early 2000s, I was a reporter for AFP [Agence France-Presse] in Nigeria. There was a polio vaccination campaign. There was misinformation. And as a result the authorities in northern Nigeria put in place a ban on the vaccine, and polio cases, which had been declining, surged.
“I felt that as a reporter I hadn’t done my job. When a governor stands up and says, ‘I’m banning polio vaccination because it’s a Western plot to cause harm to Muslims,’ in this case, the media, we reported that. We didn’t investigate those claims. That’s a failure of the process of reporting. You see misinformation has real victims.”
Now eight years old, AfricaCheck has offices in Johannesburg, Nairobi, Dakar, and Lagos, employing 35 people, most of them journalists. But while its focus is the African continent, Mwiti says it has found its audience reflects the global nature of online misinformation. “We thought the audience would be mainly Africa, but it is anything but. … It is a global problem.”
😷 [ANALYSIS] Due to the #coronavirus outbreak, many countries now require people to wear face masks in public.
We examine how these regulations differ & whether it’s a punishable crime to be caught unmasked. https://t.co/rjac5B1YeK
— Africa Check (@AfricaCheck) May 22, 2020
The audience may be global, but that physical presence across different African countries is incredibly important to ensure that AfricaCheck’s journalists are able to spot what those based elsewhere might miss. “It’s an attempt to understand the local contexts. … Because this is where we find a lot of misinformation finds its roots,” says Mwiti. “Being physically present gives us a grasp of the everyday conditions in these countries and where to find misinformation.”
The organization is also working with community groups — “everyone from thespians to farmers,” Mwiti said — to better understand the problems and ensure it reaches beyond those who consume its online output.
One of the reasons Mwiti says it is important to have that local knowledge is that it aids understanding of why misinformation takes on specific characteristics and spreads in specific forms. For instance, decontextualized content, such as images shorn of the additional information that gives the true picture, is more prevalent in Kenya than more labor-intensive examples, such as deepfakes, because there is a “low barrier to entry.” Unemployment in Kenya is also high, and young people see making money online, including by spreading disinformation, as a path worth pursuing.
Similarly, Mwiti says AfricaCheck has found that conspiracy theories are particularly common in South Africa, something he puts down partly to racial and economic inequality exacerbating fears of elites and groups that are not part of a person’s own “in-group.”
“In South Africa it is a story of inequality. It has the highest inequality in the world. … There is underlying intergroup conflict, between different races, access to the economy, access to resources.”
“The thing with race is you have people identifying race. You have an in-group identity, and you look to uphold that identity. … Part of that then involves seeing threats from other groups who identify differently. That’s the element we find in South Africa. … Access to the economy is still very dependent on race. It’s a rich ground for conspiracy theories. South Africa is one of the few countries [where] we see this in the continent.”
That local knowledge is just as important to effectively combat misinformation, and part of that is understanding who is going to be the most compelling voice to counter it. “If you are going to use certain channels, make sure the people you are speaking to have trust in them,” says Mwiti.
Each week we’ll show you how to get your facts in order before you share something online.
Episode 1 starts with the basics: 5️⃣ questions to ask yourself before forwarding a WhatsApp message. pic.twitter.com/84ON462zKI
— Africa Check (@AfricaCheck) June 17, 2020
It also means trying a range of digital tools, in some cases following purveyors of misinformation onto the platforms where they are reaching the public. For instance, AfricaCheck runs WhatsCrap, an audio show collecting tips from the public about examples of misinformation, which are then addressed via voice messages. It currently has around 6,000 subscribers.
“There should be no one size fits all,” says Mwiti. “The solution should be targeted to the community.”
Accessing local or community-specific knowledge can also help journalists avoid the kind of stereotyping that is easy to fall into, such as the overuse of images of people from Asia wearing masks in the early days of the pandemic. The key, says Cunliffe-Jones, is having diverse teams.
“The strongest defense against stereotypes … is your colleague who says, ‘Well, it’s not like that, because I am from that part of the country, I am of that race, I am of that gender.’
“Fighting our own stereotypes, our own biases, is hard to do.”
Check out First Draft on YouTube to see all the recent webinars concerning coronavirus reporting.
It matters how platforms label manipulated media. Here are 12 principles designers should follow
Manipulated photos and videos flood our fragmented, polluted, and increasingly automated information ecosystem, ranging from synthetically generated “deepfakes” to the far more common problem of older images resurfacing and being shared in a different context.
While research is still limited, there is some empirical support to show visuals tend to be both more memorable and more widely shared than text-only posts, heightening their potential to cause real-world harm, at scale. Consider the numerous audiovisual examples in this running list of hoaxes and misleading posts about police brutality protests in the United States. In response, what should social media platforms do? Twitter, Facebook, and others have been working on new ways to identify and label manipulated media.
Label designs aren’t applied in a vacuum, as the divisive reactions to Twitter’s recent decision to label two of Trump’s misleading and incendiary tweets demonstrate. In one instance, the platform appended a label under a tweet that contained incorrect information about California’s absentee ballot process, encouraging readers to click through and “Get the facts about mail-in ballots.” In the other, Twitter replaced the text of a tweet referring to the shooting of looters with a label that read: “This tweet violates the Twitter Rules about glorifying violence.” Twitter determined that it was in the public interest for the tweet to remain accessible, and the label gave users the option to click through to the original text.
These examples show that the world notices and reacts to the way platforms label content, including President Trump himself, who has directly responded to the labels. With every labeling decision, the ‘fourth wall’ of platform neutrality is breaking down. Behind it, we can see that every label — whether for text, images, video, or a combination — comes with a set of assumptions that must be independently tested against clear goals and transparently communicated to users. Each platform uses its own terminology, visual language, and interaction design for labels, with application informed respectively by their own detection technologies, internal testing, and theories of harm (sometimes explicit, sometimes ad hoc). The end result is a largely incoherent landscape of labels leading to unknown societal effects.
At the Partnership on AI (PAI) and First Draft, we’ve been collaboratively studying how digital platforms might address manipulated media with empirically tested and responsible design solutions. Though definitions of “manipulated media” vary, we define it as any image or video with content or context edited along the “cheap fake” to “deepfake” spectrum with the potential to mislead and cause harm.
In particular, we’re interrogating the potential risks and benefits of labels: language and visual indicators that notify users about manipulation. What are best practices for digital platforms applying (or not applying) labels to manipulated media to reduce mis- and disinformation’s harms?
What we’ve found
To reduce mis/disinformation’s harm to society, we have compiled the following set of principles for designers at platforms to consider for labeling manipulated media. We also include design ideas for how platforms might explore, test, and adapt these principles for their own contexts. These principles and ideas draw from hundreds of studies and many interviews with industry experts, as well as our own ongoing user-centric research at PAI and First Draft.
Ultimately, labeling is just one way of addressing mis/disinformation: how does labeling compare to other approaches, such as removing or downranking content? Platform interventions are so new that there isn’t yet robust public data on their effects. But researchers and designers have been studying topics of trust, credibility, information design, media perception, and effects for decades, providing a rich starting point of human-centric design principles for manipulated media interventions.
1. Don’t attract unnecessary attention to the mis/disinformation
Although assessing harm from visual mis/disinformation can be difficult, there are cases where the severity or likelihood of harm is so great (for example, when the manipulated media poses a threat to someone’s physical safety) that it warrants an intervention stronger than a label. Research shows that even brief exposure has a “continued influence effect,” where the memory trace of the initial post cannot be unremembered. With images and video especially sticky in memory due to the “picture superiority effect,” the best tactic for reducing belief in such instances of manipulated media may be outright removal or downranking.
Another way of reducing exposure may be to add an overlay. This approach is not without risk, however: the extra interaction to remove the overlay should not be framed in a way that creates a “curiosity gap,” tempting the user to click to reveal the content. Visual treatments could also make such posts less noticeable, interesting, and visually salient compared with other content, for example by using grayscale or reducing the size of the content.
2. Make labels noticeable and easy to process
A label can only be effective if it’s noticed in the first place. How well a fact-check works depends on attention and timing: debunks are most effective when noticed simultaneously with the mis/disinformation, and are much less effective if noticed or displayed after exposure. Applying this lesson, labels need to be at least as noticeable as the manipulated media, so as to inform the user’s initial gut reaction.
From a design perspective, this means starting with accessible graphics and language. An example of this is how The Guardian prominently highlights the age of old articles with labels.
The risk of label prominence, however, is that it could draw the user’s attention to misinformation that they may not have noticed otherwise. One potential way to mitigate that risk would be to highlight the label through color or animation, for example, only if the user dwells on or interacts with the content, indicating interest. Indeed, Facebook appears to have embraced this approach by animating their context button only if a user dwells on a post.
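The dwell-triggered approach described above can be sketched as a small piece of feed logic: escalate a label’s prominence only once the user signals interest in the post. This is a hypothetical illustration, not any platform’s actual implementation; the function names and threshold are assumptions.

```typescript
// Hypothetical sketch: escalate a label's visual prominence only once the
// user signals interest in the post (dwell time or direct interaction),
// so the label itself does not draw fresh attention to the misinformation.
type LabelState = "subtle" | "prominent";

// Illustrative threshold; a real platform would tune this through user testing.
const DWELL_THRESHOLD_MS = 1500;

function labelStateFor(dwellMs: number, userInteracted: boolean): LabelState {
  return userInteracted || dwellMs >= DWELL_THRESHOLD_MS ? "prominent" : "subtle";
}
```

In a browser feed, the dwell signal might come from visibility tracking (e.g., an IntersectionObserver plus a timer), with the label animating only when the state flips to “prominent.”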
Another opportunity for making labels easy to process might be to minimize competing cues, such as social media reactions and endorsements. Research has found that exposure to social engagement metrics increases vulnerability to misinformation. In minimizing other cues, platforms may be able to nudge users to focus on the critical task at hand: accurately assessing the credibility of the media.
3. Encourage emotional deliberation and skepticism
In addition to being noticeable and easily understood, a label should encourage a user to evaluate the media at hand. Multiple studies have shown that the more critically and skeptically one engages in assessing content, the more accurately one can judge the information. In general, research indicates people are more likely to trust misleading media due to a “lack of reasoning, rather than motivated reasoning.” Thus, deliberately nudging people to engage in reasoning may actually increase accurate recollection of claims.
One tactic to encourage deliberation, already employed by platforms like Facebook, Instagram, and Twitter, requires users to engage in an extra interaction, such as a click, before they can see the visual misinformation. This additional friction builds in time for reflection and may help the user shift into a more skeptical mindset, especially if used alongside prompts that prepare the user to critically assess the information before viewing the content. For example, a label could ask, “Is the post written in a style that I expect from a professional news organization?” before the content is shown.
4. Offer flexible access to more information
During deliberation, different people may have different questions about the media. Access to these additional details should be provided without compromising the readability of the initial label. Platforms should consider how a label can progressively disclose more detail as the user interacts with it, enabling users to follow flexible analysis paths according to their own lines of critical consideration.
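One way to structure such progressive disclosure is as ordered tiers of detail, each revealed by a further interaction. The data model below is a hypothetical sketch for illustration, not any platform’s API; the field names and example strings are assumptions.

```typescript
// Hypothetical sketch: a label that discloses one more tier of detail
// per user interaction, keeping the initial label short and readable.
interface LabelDetail {
  summary: string;  // always visible, e.g. "Manipulated media"
  tiers: string[];  // deeper detail: edit trail, source history, tactic explainers
}

function visibleDetail(label: LabelDetail, interactions: number): string[] {
  // Reveal at most `interactions` extra tiers beyond the summary.
  return [label.summary, ...label.tiers.slice(0, Math.max(0, interactions))];
}
```

The ordering of tiers is itself a design decision worth testing: which detail (edit trail, source information, tactic explainer) users want first likely varies by context.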
What kind of detail might people want? This question warrants further design exploration and testing. Some possible details may include the capture and edit trail of media; information on a source and its activity; and general information on media manipulation tactics, ratings, and mis/disinformation. Interventions like Facebook’s context button can provide this information through multiple tabs or links to pages with additional details.
5. Use a consistent labeling system across contexts
On platforms it is crucial to consider not just the effects of a label for a particular piece of content, but across all media encountered — labeled and unlabeled.
Recent research suggests that labeling only a subset of fact-checkable content on social media may do more harm than good by increasing users’ belief in the accuracy of unlabeled claims. A missing label can imply accuracy in cases where content is false but simply unchecked, a phenomenon known as the “implied truth effect.” This is perhaps the most profound challenge for labeling, as it is impossible for fact-checkers to check all media posted to platforms. Because of this limitation, fact-checked media on platforms will always be a subset, and labeling that subset will always risk boosting the perceived credibility of all other unchecked media, regardless of its accuracy.
Platform designs should, therefore, aim to minimize this “implied truth effect” at a cross-platform, ecosystem level. This demands much more exploration and testing across platforms. For example, what might be the effects of labeling all media as “unchecked” by default?
Further, consistent systems of language and iconography across platforms enable users to form a mental model of media that they can extend to accurately judge media across contexts: understanding, for example, that a “manipulated” label means the same thing whether it appears on YouTube, Facebook, or Twitter. And if a design language is consistent across platforms, it may help users recognize issues more rapidly as it becomes familiar, and thus easier to process and trust (see principle two, above).
6. Repeat the facts, not the falsehoods
It’s important not just to debunk what is false, but to ensure that users come away with a clear understanding of what is accurate. It’s risky to display labels that describe media in terms of the false information without emphasizing what is true. Familiar things seem more true, meaning every repetition of the mis/disinformation claim associated with a visual risks imprinting that false claim deeper in memory, a phenomenon known as the “illusory truth effect” or the “myth-familiarity boost.”
Rather than frame the label in terms of what is inaccurate, platforms should elevate the accurate details that are known. Describe the accurate facts rather than simply rating the media in unspecific terms, and avoid repeating the falsehoods. Where possible, surface clarifying metadata, such as the accurate location, age, and subject matter of a visual.
7. Use non-confrontational, empathetic language
The language on a manipulated media label should be non-confrontational, so as not to challenge an individual’s identity, intelligence, or worldview. Though the research is mixed, some studies have found that in rare cases, confrontational language risks triggering the “backfire effect,” where an identity-challenging fact-check further entrenches belief in a false claim. Regardless, it helps to adapt and translate the label to the person. As much as possible, meet people where they are by framing the correction in a way that is consistent with the user’s worldview. Highlighting accurate aspects of the media that are consistent with their preexisting opinions and cultural context may make the label easier to process.
Design-wise, this has implications for the language of the label and its description. While the specific wording should be tested and refined with audiences, language choice does matter. A 2019 study found that when adding a tag to false headlines on social media, the more descriptive “Rated False” tag was more effective at reducing belief in the headline’s accuracy than a tag that said “Disputed.”
Additionally, there is an opportunity to build trust by using adaptive language and terminology consistent with other sources and framings of an issue that a user follows and trusts. For example, “one study found that Republicans were far more likely to accept an otherwise identical charge as a ‘carbon offset’ than as a ‘tax,’ whereas the wording has little effect on Democrats or Independents (whose values are not challenged by the word ‘tax;’ Hardisty, Johnson, & Weber, 2010).”
8. Emphasize credible refutation sources that the user trusts
The effectiveness of a label in influencing user beliefs may depend on the source of the correction. To be most effective, a correction should come from people or groups the user knows and trusts, rather than from a central authority the user may distrust. Research indicates that when it comes to the source of a fact-check, people value trustworthiness over expertise: a favorite celebrity or YouTube personality might be a more resonant messenger than an unfamiliar fact-checker with deep expertise.
If a platform has access to correction information from multiple sources, it should highlight the sources the user trusts, for example by promoting sources the user has seen or interacted with before. Platforms could also highlight related, accurate articles from publishers the user follows and interacts with, or promote comments from friends that are consistent with the fact-check. This strategy may have the additional benefit of minimizing retaliation threats against publishers and fact-checkers, i.e., preventing users from personally harassing sources they distrust.
9. Be transparent about the limitations of the label and provide a way to contest it
Given the difficulty of rating and labeling highly subjective, contextualized user-generated content at scale, a user may have reasonable disagreements with a label’s conclusion about a post. For example, Dr. Safiya Noble, a professor of Information Studies at UCLA and author of the book “Algorithms of Oppression,” recently shared a Black Lives Matter-related post on Instagram that she felt was unfairly flagged as “Partly False Information.”
I was trying to post on IG that folks should not put up the Tuesday tag alongside the BLM tag and I just grabbed an image and reposted. And then it got flagged in 15 min. #algorithmsofoppression pic.twitter.com/dkALdH2fG9
— Safiya Umoja Noble PhD (@safiyanoble) June 3, 2020
As a result, platforms should offer swift recourse to contest a label and provide feedback if users feel it has been inappropriately applied, similar to the interactions for reporting harmful posts.
Additionally, platforms should share the reasoning and process behind the label’s application. If an instance of manipulated media was identified and labeled through automation, for example, without context-specific vetting by humans, that should be made clear to users. This also means linking the label to an explanation that not only describes how the media was manipulated, but who made that determination and should be held accountable. Facebook’s labels, for example, will say that the manipulated media was identified by third-party fact-checkers, and clicking the “See Why” button opens up a preview of the actual fact check.
10. Fill in missing alternatives with multiple visual perspectives
Experiments have shown that a debunk is more effective at dislodging incorrect information from a person’s mind when it provides an alternative explanation to take the place of the faulty information. Labels for manipulated media should present alternatives that preempt the user’s inevitable question: What really happened, then?
Recall that the “picture superiority effect” means images are more likely to be remembered than words. If possible, fight misleading visuals with accurate visuals. This can be done by displaying multiple photo and video perspectives to visually reinforce the accurate event. Platforms could also explore displaying “related images,” for example by surfacing reverse image search results, to provide visually similar images and videos, possibly from the same event.
11. Help users identify and understand specific manipulation tactics
Research shows that people are poor at identifying manipulations in photos and videos. In one study measuring perception of photos doctored in various ways, such as airbrushing and perspective distortion, people could identify only 60 per cent of images as manipulated. They were even worse at telling where exactly a photo was edited, accurately locating the alterations only 45 per cent of the time.
Even side-by-side presentation of manipulated and original visuals may not be enough without extra indications of what has been edited. This is due to “change blindness,” a phenomenon in which people fail to notice major changes to visual stimuli: “It often requires a large number of alternations between the two images before the change can be identified… [which] persists when the original and changed images are shown side by side.”
These findings indicate that manipulations need to be highlighted in a way that is easy to process in comparison with any available accurate visuals. Showing manipulations beside an unmanipulated original may be useful, especially when accompanied by annotations of where an image or video has been edited, and explanations of the manipulations to aid future recognition.
In general, platforms should provide specific corrections, placing the factual information in close proximity to where the manipulations occur, for example by highlighting alterations in the context of the specific, relevant points of the video. Or, if a video clip has been taken out of context, show how. When an edited video of Bloomberg’s Democratic debate performance was misleadingly cut together with long pauses and cricket audio to suggest that his opponents were left speechless, platforms could have provided access to the original unedited debate clip and described how and where the video was edited, to increase user awareness of this tactic.
12. Adapt and measure labels according to the use case
In addition to individual user variances, platforms should consider use case variances: interventions for YouTube, where users may encounter manipulated media through searching specific topics or through the auto-play feature, might look different from an intervention on Instagram for users encountering media through exploring tags, or an intervention on TikTok, where users passively scroll through videos. Before designing, it is critical to understand how the user might land on manipulated media, and what actions the user should take as a result of the intervention.
Is the goal to have the user click through to more information about the context of the media? Is it to prevent users from accessing the original media (which Facebook has cited as its own metric for success)? Are you trying to reduce exposure, engagement, or sharing? Are you trying to prompt searches for facts on other platforms? Are you trying to educate the user about manipulation techniques?
Finally, labels do not exist in isolation. Platforms should consider how labels interact with the other context and credibility indicators around a post (e.g., social cues, source verification). Clarity about the use case and goals is crucial to designing meaningful interventions.
These principles are just a starting point
While these principles can serve as a starting point, they are no replacement for continued and rigorous user-experience testing across diverse populations, tested in actual contexts of use. Much of this research was conducted in artificial contexts that are not ecologically valid, and as this is a nascent research area, some principles have been demonstrated more robustly than others.
Given this, there are inevitable tensions and contradictions in the research literature around interventions. Ultimately, every context is different: abstract principles can get you to an informed concept or hypothesis, but the design details of specific implementations will surely offer new insights and opportunities for iteration toward the goals of reducing exposure to and engagement with mis/disinformation.
Moving forward in understanding the effects of labeling requires that platforms publicly commit to the goals and intended effects of their design choices, and share their findings against those goals so they can be held accountable. For this reason, PAI and First Draft are collaborating with researchers and industry partners to explore the costs and benefits of labeling in upcoming user experience research on manipulated media labels.
Stay tuned for more insights into best practices around labeling (or not labeling) manipulated media, and get in touch to help us thoughtfully address this interdisciplinary wicked problem space together.
If you want to speak to us about this work, email [email protected] or direct message us on Twitter. You can stay up to date with all of First Draft’s work by becoming a subscriber and following us on Facebook and Twitter.
- Swire, Briony, and Ullrich KH Ecker. “Misinformation and its correction: Cognitive mechanisms and recommendations for mass communication.” Misinformation and mass audiences (2018): 195–211.
- The Legal, Ethical, and Efficacy Dimensions of Managing Synthetic and Manipulated Media https://carnegieendowment.org/2019/11/15/legal-ethical-and-efficacy-dimensions-of-managing-synthetic-and-manipulated-media-pub-80439
- Cook, John, and Stephan Lewandowsky. The debunking handbook. Nundah, Queensland: Sevloid Art, 2011.
- Infodemic: Half-Truths, Lies, and Critical Information in a Time of Pandemics https://www.aspeninstitute.org/events/infodemic-half-truths-lies-and-critical-information-in-a-time-of-pandemics/
- The News Provenance Project, 2020 https://www.newsprovenanceproject.com/
- Avram, Mihai, et al. “Exposure to Social Engagement Metrics Increases Vulnerability to Misinformation.” arXiv preprint arXiv:2005.04682 (2020).
- Bago, Bence, David G. Rand, and Gordon Pennycook. “Fake news, fast and slow: Deliberation reduces belief in false (but not true) news headlines.” Journal of experimental psychology: general (2020).
- Pennycook, Gordon, and David G. Rand. “Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning.” Cognition 188 (2019): 39–50.
- Lutzke, L., Drummond, C., Slovic, P., & Árvai, J. (2019). Priming critical thinking: Simple interventions limit the influence of fake news about climate change on Facebook. Global Environmental Change, 58, 101964.
- Karduni, Alireza, et al. “Vulnerable to misinformation? Verifi!.” Proceedings of the 24th International Conference on Intelligent User Interfaces. 2019.
- Metzger, Miriam J. “Understanding credibility across disciplinary boundaries.” Proceedings of the 4th workshop on Information credibility. 2010.
- Nyhan, Brendan. Misinformation and fact-checking: Research findings from social science. New America Foundation, 2012.
- Wood, Thomas, and Ethan Porter. “The elusive backfire effect: Mass attitudes’ steadfast factual adherence.” Political Behavior 41.1 (2019): 135–163.
- Nyhan, Brendan, and Jason Reifler. “When corrections fail: The persistence of political misperceptions.” Political Behavior 32.2 (2010): 303–330.
- Clayton, Katherine, et al. “Real solutions for fake news? Measuring the effectiveness of general warnings and fact-check tags in reducing belief in false stories on social media.” Political Behavior (2019): 1–23.
- Badrinathan, Sumitra, Simon Chauchard, and D. J. Flynn. “I Don’t Think That’s True, Bro!.”
- Ticks or It Didn’t Happen: Confronting Key Dilemmas In Authenticity Infrastructure For Multimedia, WITNESS (2019) https://lab.witness.org/ticks-or-it-didnt-happen/
- Nightingale, Sophie J., Kimberley A. Wade, and Derrick G. Watson. “Can people identify original and manipulated photos of real-world scenes?.” Cognitive research: principles and implications 2.1 (2017): 30.
- Shen, Cuihua, et al. “Fake images: The effects of source, intermediary, and digital media literacy on contextual assessment of image credibility online.” new media & society 21.2 (2019): 438–463.
- Diakopoulos, Nicholas, and Irfan Essa. “Modulating video credibility via visualization of quality evaluations.” Proceedings of the 4th workshop on Information credibility. 2010.
First Draft recognizes Black lives matter
As an organization that supports the work of newsrooms globally, First Draft recognizes that Black lives matter, and that journalism is not a crime. In this moment, we want to acknowledge the particular experiences of Black journalists who have to live daily with the impact of systemic racism and now must also contend with the degradation of their profession. The events of the past two weeks underline the critical importance of eyewitness media, and the damage that can be done when harmful misinformation flourishes and feeds narratives that stoke division. Rigorous verification has never been more important and so we are committed to offering as much training as possible, so that every journalist feels equipped to identify, verify and effectively report on misinformation.
How to talk to family and friends about that misleading WhatsApp message
Combating misinformation about the coronavirus can be thought of in a similar way to the test-and-trace plan for stopping the spread of the virus itself. Test a claim for accuracy, trace it back to the source, and isolate it to stop it spreading further.
On messaging apps such as WhatsApp, however, tracing is almost impossible. Memes, posts, videos and audio clips are forwarded to contacts and chat groups with a tap or swipe and no easy way to see how the content might have travelled between communities.
This is why it is important to talk to our contacts, especially those closest to us, about misinformation they might have shared.
India has a history of violent incidents sparked by rumours or accusations on WhatsApp, used by 400 million people across the country. Some 52 per cent of Indian respondents surveyed for the 2019 Digital News Report said they get their news from WhatsApp, compared to just four per cent in the United States.
First Draft spoke to Pratik Sinha, editor and co-founder of Indian fact-checking organization Alt News, on how best to talk to friends and family about something false they might have shared on the messaging app.
Misleading messages peddling false preventives and cures, such as hot weather and ‘gaumutra’ (cow urine), have gone viral in India. Screenshots by author.
Don’t shame your loved ones
The last thing you want to do is turn this into an ugly confrontation. As Sinha says, “Very often we end up getting into a conflict situation and then we see that one side doesn’t want to listen to the other side.”
Replying to the message and calling it out in public could shame the person who shared the claim, potentially making them double down on their views. A private message focusing on their motivations rather than the content is more likely to work. Ask them who they received the message from, if they know where it originated, and why they decided to pass it on to you.
The pandemic has created levels of uncertainty and anxiety that are not easy to handle. It is in this climate that your loved ones are sharing things, not only because they want to spread a message, but also because they are afraid. We’re all afraid. Recognize this in your response and put yourself in their shoes.
In an interview with Canada’s CBC News, Claire Wardle, co-founder of First Draft and a recognized expert in misinformation, said that reacting emotionally and taking a tone of “you’re wrong and I’m right” does not work. If anything, it strengthens the other person’s views, and pushes the two of you further apart. Approaching this with a “we’re all in this together” attitude is advisable.
As tempting as it is to ignore the message, this sends the wrong signal — that you accept false or misleading content in your inbox. We all have a responsibility to call out our contacts, especially those closest to us, for spreading a false message. In the age of the coronavirus, discouraging your friends from sending these untruths any further could be the difference between life and death, as untested “cures” and “remedies” are flooding social media.
Do your research, and check with established fact-checking organizations like Alt News or WebQoof in India, or one of their fellow signatories of the Poynter Institute’s IFCN code of principles, to verify a message’s accuracy.
Platforms have made it extremely easy to forward messages — it only takes two taps on WhatsApp — which is why this behavior has become part of our digital culture. The platforms are introducing measures to prevent the spread of false information; but we can all play a role in calling it out.
As Wardle told First Draft, there was once a time when you would simply hope for the best when your drunk friend decided to drive home. Now, you take away his car keys and make sure he gets back safely with someone else.
Don’t expect immediate change
No one forms or changes their views overnight, so don’t expect your loved ones to do so. The process takes courteousness and patience. The more you politely challenge them, the more likely they are to think about the things they share and to question the source.
Speaking up clearly, using concise language and providing sources works, according to a 2016 study on misinformation surrounding the 2015-16 Zika virus outbreak.
Misleading messages with an audio clip falsely attributed to Dr. Devi Shetty and a ‘coronavirus vaccine’ were popular on WhatsApp. Screenshots by author.
Misinformation is often spread with an agenda in mind. India is a strong example of this; misinformation has thrived in the country’s WhatsApp groups and messages for several years and could hold lessons on how it can be countered in other parts of the world.
Sinha points out that the situation in India is a product of its polarized politics, and the problem is that misinformation often comes from the top — government officials and political parties. This makes it all the more difficult to challenge, and thus requires more patience. “It is a long process [to change one’s mind],” he says.
Misinformation has a very real impact. False cures, unscientific preventative measures, and conspiracy theories abound on WhatsApp, and all have a tangible influence. From Indians drinking cow urine to Americans consuming cleaning liquid in hope of curing the virus, the misinformation has proven as dangerous and disruptive as the virus itself. Breaking the chain of misleading messages on WhatsApp and elsewhere is a small but crucial step everyone can take.
Find out more about closed messaging groups, as well as online ads and Facebook groups in our recent Essential Guide.
Covering coronavirus: Privacy and security tips for journalists
As the coronavirus outbreak brings with it a surge in online misinformation and conspiracy theories, and with a number of elections just around the corner, there has never been a better time to brush up on digital investigative skills. But whether you’re looking into certain closed groups or contacting sensitive sources, it’s critical to implement privacy and security measures to stay safe online.
First Draft’s co-founder and US Director, Claire Wardle, spoke to Christopher Dufour on May 7 for a webinar about privacy and security tips when reporting on coronavirus. Dufour is a noted national security consultant, disinformation researcher, and speaker.
As the digital landscape is constantly changing, the aim is to provide best practices and good habits that can be used to make decisions about your privacy and security online. Read on for the key takeaways from the session, or watch the full webinar below or on our YouTube channel.
‘Hardening’ your browser
Your browser is a primary window onto the online world, so it’s crucial to think about how it is set up. As Dufour highlighted, “most identity security measures are invalidated by bad browser habits.” While Dufour’s session did not focus on specific tools or software, he did emphasize that Internet Explorer was an “absolute no-go.” (It’s also no longer recommended as a default browser by Microsoft.)
Taking the time to go through the privacy and security settings on your browser is the first step to increasing online security. For a guide to “hardening” a browser, Dufour went through the setup of his own installation of Firefox during the webinar, which you can follow here. Similar measures also apply to other browsers, including Chrome and Safari.
Also central to any journalistic investigation is a search engine. On this, Dufour explained:
“Not all search engines are created equal — almost all of them exist for free because they log your search habits.
“They want to understand the keywords you’re using so that they can serve relevant ads or sell that behavior someplace.”
As such, you could opt for search engines like DuckDuckGo, which bills itself as “privacy focused.” But even then, Dufour said he would not recommend a particular search engine.
“Everyone should use what they think is great, and have that baked into your browser.”
Your data is the target
The main way journalists become victims of online attacks is through data leaks, so it’s important to be aware of your digital trail.
“We’re living in an unprecedented time where the sharing of personal information and data has moved so quickly that it is really difficult for us to wrap our minds around all the places where we have leaked information,” Dufour said.
As such, the best place to start is by trying to understand the data you leave in your wake.
A pro tip from Dufour is to “map your footprint.” He encourages journalists to write down all the email addresses and phone numbers and the places they may have been used for any online activity: apps, social media, loyalty programs, etc.
From there, log into each place and perform a security audit. Is two-factor authentication switched on? Is there old data that can be deleted? Are all the settings restricted?
If you’re no longer using the service, delete the account, he said.
Defining your digital identity
Whether you’re using LinkedIn to approach expert sources for your story, Facebook to keep up with friends, or Twitter to connect with other journalists, you’re not going to necessarily be the same person on different platforms.
“As journalists, we have a responsibility to have a public-facing profile,” Dufour said. “We need to be associated with our journalism organization, we also need the public to trust us and reach out to us in a public way.”
He recommended going back to basics with this course from the Berkman Klein Center for Internet & Society at Harvard University. Although aimed at high school students, it contains a number of steps to help individuals think critically about their online presence, as well as practical tips, such as help changing privacy settings on social media accounts.
“Think about ways to rewrite your identity in a way that’s helpful for you in all the ways you want to represent yourself online,” Dufour said.
A recurrent theme was the idea of “trade-offs.” Having all the privacy and security settings activated on a browser will, for instance, mean that you might not be able to access certain websites and pages. As such, weighing up these trade-offs is an integral part of online reporting.
Dufour offered some words of advice. First, if you work in an organization that has such a team, “work with your IT people, not against them.”
“If you work for a company that has resources, dedicated IT security managers, they probably already have policies involved,” he explained.
“Get to know what those policies are, so that they can form in some way with your personal risk tolerance for how you want to use the company’s devices, accounts that they provide and those types of things.”
A frequent trade-off is security and privacy versus convenience. A rule of thumb: “If it’s less convenient, it’s usually more secure,” he said.
“If it’s more convenient, then usually someone is making it more convenient to pull information out.”
People do, however, need some element of convenience, Dufour acknowledged. This is why best practices and good habits should be at the heart of online reporting.
Dufour offered some “dos and don’ts” on browsing habits as well as some digital security tips.
“All this takes is a little bit of time,” said Dufour.
“One of the things that I had to learn myself in doing this is: ‘Hey, if I stop looking at pictures of laughing cats all day, and just sat down and got really serious about doing this I would build better habits so that I can return to looking at laughing cats — just in a more secure way.’”
Ethical questions for covering coronavirus online: Key takeaways for journalists
The coronavirus outbreak has brought with it a number of new ethical challenges for newsrooms and reporters.
Social media users are sharing their experiences with the virus, which can have a real and damaging impact on their lives. There has been an uptick in conspiracy theories and misinformation as the world attempts to follow the ever-evolving science around the virus. And much of the misleading information is flourishing in private spaces that are difficult for researchers and reporters to monitor and even harder to trace.
First Draft’s co-founder and US director Claire Wardle and ethics and standards editor Victoria Kwan discussed the tricky ethical terrain of covering coronavirus in a recent webinar session on May 14. Read on for key takeaways from the session, or watch the full webinar below or on our YouTube channel.
Traditional journalistic ethics still apply online
Facebook’s pivot to privacy last year saw the platform recognise a growing preference from users to communicate one-to-one or with limited groups of friends online.
“We’ve seen people move into smaller spaces with people that they trust, people that they know, people that believe similar things to them,” said Claire Wardle.
But this has had important implications for those covering and researching misinformation during the coronavirus outbreak. While it’s crucial to be aware of and publicly counter rumours and hoaxes shared in private spaces, there are a number of ethical questions reporters and newsrooms should set out to answer before entering them.
In many ways, these considerations aren’t so different to those commonly found in traditional journalism: balancing privacy versus public interest, as well as security versus transparency.
“If we’re thinking about reporting from these spaces, what does it mean if ultimately these people believe that they are in a space that is private?” asked Dr Wardle.
“If you can join that group, you can get access to that information and as ever with journalism, it’s that tension with privacy versus public interest.”
Sometimes the public interest case is clear: gaining knowledge from a group discussing future anti-lockdown demonstrations could be of vital importance to the health and safety of the public. In contrast, a closed community where individuals are sharing their experiences of being affected by Covid-19 may not pass the same public interest test.
This also brings up questions of security versus transparency. “For certain types of journalistic investigations, you have to consider this as if you were going undercover,” said Wardle.
“Of course, wherever possible, be transparent but also think about your own security.”
And the burden of considering these questions should not rest solely on the individual reporter — it necessitates a top-down approach from newsrooms, according to Dr Wardle:
“The things we’re talking about today really need to be enshrined as part of a newsroom’s editorial guidelines.”
Adopt a people-first approach
The coronavirus outbreak is a news story with a profound impact on people’s lives. For this reason it is crucial that journalists covering aspects of the crisis adopt a people-first approach, even when venturing online in their reporting.
When seeking to use online testimony from users in reporting, First Draft’s ethics and standards editor Victoria Kwan highlighted the importance of considering the intent of the person posting — especially when they’re not a public figure.
“Sometimes people don’t realise that their posts are public, and sometimes they do but they haven’t thought through the potential consequences,” she said.
Giving the example of a nurse who published a Facebook post about lack of PPE and tagged a number of media outlets in the caption, Kwan illustrated that there are instances where the poster clearly wishes to get publicity. But, she added, even in these examples it’s vital to talk through the potential impacts of your coverage with the individual, outlining the benefits and drawbacks of receiving media attention.
The theme of adopting a people-first approach is also an important aspect of covering conspiracy theories, which have experienced a considerable boost since the beginning of the crisis according to researchers.
For this, Dr Wardle pointed to the human psychology behind such theories.
“When people feel powerless and vulnerable, it makes them much more susceptible to conspiracy theories,” she said.
Newsrooms and journalists should avoid using derogatory language when referring to conspiracy theories and the people who share them online. Prioritising empathetic reporting is key, she said. Reporters should seek to explain why certain theories take root, rather than simply debunking them.
“As societies, we need to have more conversations around why conspiracies are taking off, what is it in society that makes them so appealing, and to talk about the psychological mechanics of conspiracies rather than: ‘these people are crazy, that’s stupid, let’s debunk the stupidity,’” said Dr Wardle.
Be aware of the ‘tipping point’
From false links between 5G and the coronavirus to conspiracy theories around a potential vaccine, a huge ethical consideration for journalists is to avoid amplifying dangerous or misleading information.
Wardle cited the recent case of the viral “Plandemic” video, which contained a number of falsehoods. While some newsrooms initially avoided reporting on it for fear of “giving it oxygen”, the video eventually gained so much traction that reporters realised it needed mainstream coverage.
It’s not always straightforward to decide when to cover mis- or disinformation. For this, Dr Wardle has coined the idea of the “tipping point”. On a case-by-case basis, there are a number of questions to ask before covering misleading content.
The idea of the tipping point is also key when investigating closed groups or messaging apps, where it may be difficult — sometimes impossible — to retrieve data about the spread of a piece of misinformation. How can you quantify the spread of a rumour when it leaves no trace?
Dr Wardle advised looking for examples of whether the content is being shared to other platforms as a way of determining whether or not it is being circulated widely.
Finally, she warned that sometimes mis- and disinformation is deliberately created by certain actors with the aim of reaching coverage from mainstream news outlets.
“Often things are seeded within closed messaging apps with the hope that it will then grow, move into social media and then be picked up,” said Dr Wardle.
When it comes to scientific information, WhatsApp users in Argentina are not fools
This study was carried out by the author, María Celeste Wagner (University of Pennsylvania, USA), Eugenia Mitchelstein (Universidad de San Andrés, Argentina), and Pablo J. Boczkowski (Northwestern University, USA), acting as external consultant.
Do people believe the information they receive through messaging apps? Does the identity of the sender influence the credibility of that information? When do people feel more inclined to share content they have received? In the midst of the ongoing pandemic, when rampant misinformation has exploded on social media and closed messaging services, these questions are more pertinent than ever.
Evidence from a December 2019 online experiment in Argentina, in which respondents read news stories about vaccines and climate change and then assessed their quality, could help answer some of these questions.
Contrary to narratives underscoring the persuasive power of some misinformation, evidence from our experiment suggests that Argentine audiences — at least — are good at differentiating facts from falsehoods and do not blindly accept information from personal contacts.
In other words, we observed that many people were highly critical when reading news about various scientific topics. This could be partly related to a generalized skepticism in Argentina and distrust of information that is thought to be sent by personal contacts.
- Contrary to widely held beliefs, participants were more distrustful of news they believed had been sent by a personal contact than of news where they lacked information about the sender.
- Participants were more distrustful of stories about vaccines and climate change which turned out to be false than they were of factually true stories on the same subject.
- It is unclear how effective fact-checking labels are in correcting for misinformation.
In recent years, there has been growing scholarly and journalistic attention given to issues relating to mis- and disinformation. A large part of this work has focused on analyzing how false content flows online, mostly during political or electoral processes. It has also centered on how political disinformation could change opinions and, eventually, voting preferences. Although valuable, most of this work has focused primarily on the Global North, and on two platforms in particular — Facebook and Twitter.
We sought to counter this dominant trend by examining how people from a country in the Global South (in this case Argentina) consume misinformation on health and the environment on messaging apps.
To do this we conducted an online survey with 1,066 participants in December 2019. The survey was representative of the Argentine population in terms of socioeconomic status, age and gender.
During the online survey, participants were shown news stories about vaccines and global warming and asked to imagine a situation in which they had received these stories on WhatsApp from either a relative, a colleague or a friend. Half of our participants read four real news stories about the effectiveness of flu and measles vaccines, the role of human activity in climate change, and the role of carbon dioxide in global warming.
Factually correct stories ran with the following headlines:
- Vacunarse anualmente previene la gripe y sus complicaciones (Annual vaccination prevents the flu and its complications)
- Si bien el dióxido de carbono compone una parte pequeña de la atmósfera es una de las causas del calentamiento global (Although carbon dioxide makes up a small part of the atmosphere, it is one of the causes of global warming)
- El calentamiento global de los últimos 70 años se debe en gran medida a la actividad humana (Global warming over the past 70 years is largely due to human activity)
- El continente americano ya no es más una región libre de sarampión (The American continent is no longer a measles-free region)
The other half of the participants read false news that had been circulating online in Argentina about these same topics. In order to make everything as comparable as possible, we only changed minimal aspects of the false and real news stories.
False stories ran with the following headlines:
- Vacunarse anualmente no previene la gripe y sus complicaciones (Annual vaccination does not prevent the flu and its complications)
- El dióxido de carbono compone una parte muy pequeña de la atmósfera y por ende no puede ser una de las causas del calentamiento global (Carbon dioxide makes up a very small part of the atmosphere and therefore cannot be one of the causes of global warming)
- El calentamiento global de los últimos 70 años se debe en gran medida a causas naturales (Global warming of the past 70 years is largely due to natural causes)
- El continente americano sigue siendo una región libre de sarampión (The American continent remains a measles-free region)
The assignment of these two factors — the person who was sharing the news, and the veracity of the news — was random. All participants judged the veracity and credibility of each news story they had read and expressed how willing they would be to share each news item.
At the end of the study, those participants exposed to false stories were notified that they had been shown false content and the factually correct stories were then shared with them.
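The two-factor random assignment described above can be sketched in a few lines of Python. This is only an illustration: the condition labels and the assumption of independent uniform assignment per participant are ours, not the study’s actual randomization procedure.

```python
import random

def assign_conditions(n_participants, seed=0):
    """Randomly assign each participant to one cell of a 2 x 4 design:
    story veracity (true/false) crossed with sender information
    (relative/colleague/friend/no information). Labels are illustrative."""
    rng = random.Random(seed)
    veracity_levels = ["true", "false"]
    sender_levels = ["relative", "colleague", "friend", "no_info"]
    assignments = []
    for participant_id in range(n_participants):
        assignments.append({
            "id": participant_id,
            "veracity": rng.choice(veracity_levels),
            "sender": rng.choice(sender_levels),
        })
    return assignments

# The survey had 1,066 participants.
participants = assign_conditions(1066)
```

Because assignment is random with respect to both factors, differences in credibility or sharing across cells can be attributed to the manipulated factors rather than to participant characteristics.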
Credibility, trust, and sharing
We found that people who had read false news stories about vaccines and global warming before being told they were false assessed them as less credible and less truthful than those who read real news stories about those same topics.
In other words, participants distrusted stories with misinformation about vaccines and climate change more than those comparable stories with true information. They also expressed less willingness to share stories with misinformation in comparison to the news stories that were true. These findings were especially positive given the current concern around the spread of Covid-related falsehoods and rumors.
Our study also showed that those reading stories containing misinformation were significantly less likely to believe that others would trust those stories than those reading accurate news. This suggests that participants extended their critical stance to others. In addition, people showed a general level of distrust towards their contacts. For example, having information about the person who forwarded a story significantly lowered both the credibility of the message and the receiver’s willingness to share it, compared to a control group that lacked information about the source. In other words, people were more distrustful of news they believed had been sent by a personal contact than of news where they lacked information about the sender.
We found no significant gender differences. Receiving news from either a female or male relative, friend or colleague did not result in any significant differences in our measured outcomes of credibility and sharing. Furthermore, giving participants information about the specific source of the message — a relative, a colleague or a friend — increased their later perception that other friends, colleagues or relatives would get upset if the participant were to forward them the news. Overall this suggests that participants do not blindly trust their personal contacts. On the contrary, they are distrustful of what people might send them regardless of the relationship.
We also tested the effect of fact-checking labels on evaluating the accuracy of information. After reading false content on global warming and vaccines, half of the participants who had been exposed to and had evaluated these false stories were told that fact checkers had determined the content to be false. At the end of the survey, after a series of distracting measures, all participants were asked to assess the veracity of a number of statements, some of which related to the vaccine and global warming topics they had read about earlier.
We found evidence suggesting that reading false news does have negative effects on participants’ ability to accurately assess information in those stories. Those who read factually correct news about measles vaccines and the role of carbon dioxide in global warming were able to more accurately assess the validity of statements related to those topics than those who had been exposed to false stories. This suggests that reading false news has a negative effect on how people assess information. We did not find significant effects in the other two stories: those related to the flu vaccine and the human role in global warming.
We also found discouraging results around the effectiveness of fact-checking labels. A label is effective if a reader who is told that fact checkers deemed a story false later remembers that the information was inaccurate and adjusts their views accordingly.
However, we found this to be the case only for the news story about the impact of humans on global warming. Those participants who read a fact-checking label were later able to more accurately assess information related to that issue than those people who had been exposed to a false story without the fact-checking label.
When it came to stories about measles vaccines, flu vaccines and carbon dioxide, we found no significant difference in later accuracy assessment between those participants who read a story labeled false and those who read a story without a label. These findings might be due to features that are specific to our study design. It could also be that fact-checking labels that do not correct misinformation with accurate information are less effective, or that receiving a fact-checking label confuses participants in a way that makes it harder to cognitively correct for misinformation later on.
In summary, the Argentine public rated false content on vaccines and the environment as significantly less credible, and was less likely to share false information than true information. Including data on the identity of the sender — friend, family member, colleague — decreased both the credibility and willingness to share the stories.
Despite narratives about the destabilizing power of misinformation on social media during democratic political processes and global events such as the current Covid-19 pandemic, our findings contribute to the debate by showing that, in a Global South context, even when false news seems to have some negative impact on subsequent opinion formation, Argentines are good at identifying false information and are wary of what their personal contacts share with them on messaging apps.
This research was independently designed by the research team, but was supported by a grant from WhatsApp.
How to analyze Facebook data for misinformation trends and narratives
As coronavirus misinformation continues to spread, knowing some basic methods for collecting and analyzing data is essential for journalists and researchers who want to dive under the surface of online information patterns.
There is a mountain of data that can help us examine topics such as the spread of 5G conspiracy theories or where false narratives around Covid-19 cures came from. It can help us analyze cross-border narratives and identify which online communities most frequently discuss certain issues.
While Twitter’s public data is accessible through its Application Programming Interface (API), it can be much more complicated for researchers to access platforms such as Facebook and Instagram.
Facebook-owned platform CrowdTangle is the most easily accessible tool to handle three of the most important social networks — Facebook, Instagram, and Reddit — and it is free for journalists and researchers.
What is CrowdTangle?
CrowdTangle is an enormous archive of social media data that allows us to search through public Instagram, Facebook and Reddit posts or organise public accounts and communities into lists.
Here are a few things we can do easily with CrowdTangle:
- Monitor what is trending around a certain topic
- Search for combinations of words and phrases to discover trends and patterns.
- Track the activity of public accounts and communities.
- See which accounts are posting most and who is getting the most interactions around certain issues.
Read our Newsgathering and Monitoring guide to learn more on how to monitor disinformation on CrowdTangle.
There are some limitations however. The data is not exhaustive and you can only access content that is already publicly available — meaning we can’t see posts published within private groups or by private profiles.
We can also only track the activity of accounts that are already included in the CrowdTangle database. CrowdTangle has made great efforts to ensure the database is comprehensive, but it is not perfect. Nevertheless, it’s the best option we have to access historical data on these platforms, measure trends and find patterns.
It is also useful in helping us decide whether a rumor has crossed the tipping point and entered into the public eye, making it a prime candidate for reporting to stem the spread of misinformation, rather than amplifying something few have seen.
Download the data
If you don’t already have access to CrowdTangle, you can request it through its website. It is free for journalists and researchers.
There are a few different options available for accessing public posts in a data-friendly format, mainly through ‘Historical Data’ (under ‘General Settings’, on the top-right corner) or through the ‘Saved Searches’ feature. We’ll illustrate the latter as it allows us to tailor the search according to a broader range of keywords or phrases.
On the CrowdTangle dashboard page, go to ‘Saved Searches’, click on ‘New Search’ and then on ‘Edit Search’ to input more advanced queries. Unfortunately, we can’t search Facebook Pages and Groups at the same time, so we have to choose where we want to search or repeat the same steps twice.
We can use a list of keywords and the CrowdTangle version of Boolean expressions.
In the following example, we used the ‘Saved Searches’ feature to search trends around the false claim that 5G causes Covid-19 when it started trending on social media.
In the first search query, below, we are asking CrowdTangle to search its database of Facebook Pages for posts containing one of a number of variations on “coronavirus” and “5G”. We can input here any word we think people might use instead of “Coronavirus”, including misspellings. Remember it is important to think about the language that people use on social media.
We can then add similar queries an unlimited number of times.
Doing this as a normal Boolean search, of the kind we might use on Twitter, would look something like this:
(coronavirus OR corona OR caronavirus OR carona OR covid OR covid-19) AND (5g OR fivegee OR “5 gee” OR “five G”)
While CrowdTangle’s new Search feature does accept Boolean queries, the dashboard and saved searches like this do not.
So for dashboard and saved searches we have to repeat the query for each variation:
(coronavirus OR corona OR caronavirus OR carona OR covid OR covid-19) AND 5g
(coronavirus OR corona OR caronavirus OR carona OR covid OR covid-19) AND “5 gee”
And so on. With CrowdTangle you can just replace all the “OR” operators with a comma.
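If it helps to see this expansion in code, here is a rough Python sketch of the same idea: turn one Boolean query of the form (A OR B) AND (X OR Y) into the list of repeated, comma-separated queries described above. The helper name and term lists are just for illustration.

```python
# Sketch: expand a Boolean query of the form
# (c1 OR c2 OR ...) AND (t1 OR t2 OR ...)
# into one repeated query per AND variant, with commas
# standing in for OR, as CrowdTangle's saved searches expect.

def expand_queries(or_terms, and_terms):
    """Return one CrowdTangle-style query per AND variant."""
    or_part = ", ".join(or_terms)
    return [f"({or_part}) AND {term}" for term in and_terms]

covid = ["coronavirus", "corona", "caronavirus",
         "carona", "covid", "covid-19"]
five_g = ["5g", "fivegee", '"5 gee"', '"five G"']

for query in expand_queries(covid, five_g):
    print(query)
```

Each printed line is one query to paste into a separate saved-search slot.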
After inputting the search queries, it’s time to set filters, like language, timeframe, media type and more.
In our 5G search, for example, we set a timeframe from Jan 1 up to April 16.
By default CrowdTangle will sort the results by ‘Overperforming’, or how well a particular post is doing compared to the average for that page or group. To download the whole dataset we have to change ‘Overperforming’ to ‘Total Interactions’, which would retrieve every public post sorted by total likes, comments, shares, and other reactions.
Click the download button (like a cloud with a down arrow, on the right) to save the data. After a few minutes, depending on the amount of data we are requesting, we will receive an email with a CSV file.
Repeat the steps to search within Facebook Groups.
Making patterns visible
Before moving forward to explore and analyze the data collected, it’s helpful to see what a public post means in terms of the available data. This annotated screenshot shows which details are accessible.
And below is what the CSV file looks like once it is downloaded, based on our search, with columns for different fields like the name of the Page which published the post, the time and date the post was published, and more. We can also easily see the text in the post and the link to the original on the relevant platform.
Data journalism is often about patterns, trends or outliers and looking at social media data is no different. Are there lots of identical posts from different accounts? Or do many posts appear on the same day? Which posts received the most interactions, and why?
It’s important to remember that we are not seeing the whole picture – we are only accessing public data – but it is enough to pull out general trends.
Import, clean and analyze the data on a Google Sheet
Once we have downloaded the CSV file, we can import it into spreadsheet software such as Microsoft Excel or Google Sheets. We’ve used the latter in this example as it is free to use; the steps, however, are similar in Excel.
After starting a new spreadsheet, we just need to click ‘File’ on the menu bar, then ‘Import’ to find the CSV file in our downloads. It might take a few moments depending on the size of our file.
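If you would rather explore the export in code than in a spreadsheet, the CSV can be read with Python’s standard library. The headers and rows below are a made-up miniature of an export, not real CrowdTangle output.

```python
import csv
import io

# A made-up miniature of an exported CSV (headers and rows
# are illustrative, not real CrowdTangle data).
csv_text = """Group Name,Created,Total Interactions
Stop 5G UK,2020-03-30 14:02:11,120
Truth Seekers,2020-04-01 09:10:33,94
"""

# DictReader maps each row to a dict keyed by the column headers.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(len(rows))              # number of posts
print(rows[0]["Group Name"])  # first post's group
```

In practice you would pass a real file object from `open()` instead of the `io.StringIO` stand-in.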
Before going any further, we need to lock the top row in our spreadsheet by clicking on ‘View’, then ‘Freeze’, then ‘1 row’. By doing this, we make sure the column headers remain visible as we scroll down and don’t get mixed up when we play around with the data.
First of all, we want to make sure the data is ‘clean’ and ready to be analyzed.
Data cleaning is a fundamental process to prepare our dataset for analysis and ensure accuracy and reliability. We might need to fix spelling and syntax errors, for example, or remove empty fields and duplicates to standardize the dataset.
To handle big datasets, we suggest downloading OpenRefine, an open-source tool to clean messy data. But even on Google Sheets there are a few quick actions you can take.
We sometimes end up with non-printable characters in our ‘Group Name’ column that could interfere with our next steps. Non-printable characters are control characters, such as the first 32 codes in the American Standard Code for Information Interchange (ASCII), that we might wish to remove.
We can easily remove them by adding a new column and writing the function =CLEAN(A2) on the second row of the new column. The CLEAN function removes any non-printable characters present in our worksheet.
To apply this to the whole column, select the cell with the formula, press Ctrl+Shift+Down to highlight the rest of the column, then press Ctrl+D to fill the formula down.
We can also remove specific characters: select a column, press Cmd+F on a Mac or Ctrl+F on a PC, then click the three dots for ‘More options’. Type the character we want to remove, leave the replacement field blank, and click ‘Replace all’.
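The two cleaning steps above can be approximated in code as well. The function below is a rough Python stand-in, not an exact replica of Sheets’ behavior: it drops ASCII control characters (roughly what CLEAN removes) and then strips any specific characters we name (the find-and-replace step).

```python
# Rough stand-in for the spreadsheet cleaning steps above:
# drop non-printable control characters (ASCII codes 0-31),
# then remove any explicitly unwanted characters.

def clean_cell(text, strip_chars=""):
    # Keep only characters at or above the printable ASCII range.
    cleaned = "".join(ch for ch in text if ord(ch) >= 32)
    # Mimic 'Replace all' with an empty replacement field.
    for ch in strip_chars:
        cleaned = cleaned.replace(ch, "")
    return cleaned

print(clean_cell("Group\x00 Name\x1f!", strip_chars="!"))  # Group Name
```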
We also want to make sure the dates in the ‘Created’ column are formatted correctly.
We select the ‘Created’ column, then go to ‘Format’, ‘Number’ and click on the date format we need for our analysis. We can also click on ‘More Formats’ and customize our own format. This is particularly important if we want to visualize the data, which often requires specific date formats.
Once we’ve cleaned up the data, we can move to look at some general patterns in our dataset.
Type the function =MEDIAN(X2:X10001), where X is the column letter, at the end of each column to calculate the median. For example, in our 5G dataset the median of the ‘Total Interactions’ column is 94, which means half of the 5G posts received fewer than 94 interactions and half received more. This is a figure we would need to investigate further to understand whether it is significant.
We can also look more closely at the different reactions from users to each post, such as ‘Haha’ and ‘Angry’, to see which was the most popular in terms of average reactions per post.
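Outside the spreadsheet, Python’s statistics module produces the same summary figures. The interaction counts below are invented, but they illustrate why the median is a safer summary than the mean when a handful of posts go viral.

```python
import statistics

# Invented interaction counts, one per post, with one viral outlier.
total_interactions = [12, 40, 94, 94, 150, 3000]

print(statistics.median(total_interactions))  # robust to the outlier
print(statistics.mean(total_interactions))    # dragged up by the outlier
```

Here the median is 94.0 while the mean jumps to 565.0, so the choice of summary statistic really matters.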
Let’s explore some of the posts which attracted the highest number of shares by sorting the data by the ‘Shares’ column. We look at shares because we consider it the most significant interaction in terms of the tipping point.
Click the small arrow at the top of the column then select ‘Sort sheet Z → A’ and the whole worksheet will change order: from posts with the highest number of shares on the top, down to the lowest.
By clicking through the URLs here we can analyze the content of the top ten most shared posts in the dataset and their source.
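In code, the same Z → A sort is a one-line `sorted()` call. The rows below are invented examples, not real posts.

```python
# Invented rows standing in for the exported posts.
rows = [
    {"Page Name": "A", "Shares": 120},
    {"Page Name": "B", "Shares": 9800},
    {"Page Name": "C", "Shares": 45},
]

# Sort descending by shares, mirroring 'Sort sheet Z -> A'.
most_shared = sorted(rows, key=lambda r: r["Shares"], reverse=True)

print([r["Page Name"] for r in most_shared])  # ['B', 'A', 'C']
```

Slicing `most_shared[:10]` would give the top ten most shared posts to click through.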
We can also focus on the messages of the top posts and analyze their content. For example, we can get a quick overview of which languages people are using most to spread the rumor: add a column to the right of the ‘Message’ column and type the function =DETECTLANGUAGE(U2), where U2 is the first cell of the ‘Message’ column. If many of the top pages in our dataset use different languages, it might mean a rumor or misleading narrative has spread beyond a specific region and become global.
A similar analysis, but from a different perspective, involves looking at the most common Groups in our dataset. To do this we need to create a pivot table. A pivot table lets us segment the sheet and make a huge amount of data more manageable, and there are numerous valuable things we can do with it.
First, select all of our data by clicking on the little rectangle on the top left side of the sheet. Then go to ‘Data’ on the top menu and select ‘Pivot Table’.
On the new page, we’ll have a few options to determine the rows, columns, values, and filters for our pivot table. Select ‘Group Names’ under both ‘Rows’ and ‘Values’, because we want to see how many times each Group name recurs in the dataset.
Then, under ‘Values’, we summarize by ‘COUNTA’. This will quickly summarize the entire dataset by unique Group names and tell us how many times each name appears.
Click on ‘Sort by’ and select the column ‘COUNTA of Group Names’ and descending order. This gives us a quick overview of who is posting the most about this topic.
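The same count-and-sort pivot can be done with Python’s `collections.Counter`. The group names below are invented for illustration.

```python
from collections import Counter

# Invented group names standing in for the 'Group Name' column.
group_names = ["Stop 5G UK", "Truth Seekers", "Stop 5G UK",
               "Stop 5G UK", "Truth Seekers", "Local News"]

# most_common() tallies each name and sorts descending by count,
# just like the COUNTA pivot sorted Z -> A.
for name, count in Counter(group_names).most_common():
    print(name, count)
```

Swapping in the ‘Links’, ‘Media types’ or ‘Language’ column gives the other pivots described below.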
We can repeat the same operation with the ‘Links’ column to see the most common links in the dataset:
Or with the ‘Media types’ column or ‘Language’:
It is very important to look at trends over time, especially if we want to know whether or not we should publish a debunk or amplify a rumor.
We can do that using a pivot table again which will sum up all shares for each day in the dataset.
By default, CrowdTangle tells us the exact second a post was published, so we need to change the timestamp in our spreadsheet to show just the date.
For this we need to add two blank columns to the right of the ‘Created’ column (the split needs the extra space), paste a copy of the ‘Created’ column into the first of them, and highlight that new column. Then click on ‘Data’ on the top navigation menu, choose ‘Split text to columns’ and select ‘Space’ as the separator. This splits the date from the time, leaving a clean column with dates only.
Now we can create a new pivot table. Select the new ‘Created’ column under ‘Rows’, and the ‘Shares’ column under ‘Values’. We then select ‘SUM’ under ‘Summarize by’. We can also create a chart out of our data to see the trend at a glance: click on ‘Insert’, then ‘Chart’, in the top menu.
As we see in this case, 5G related posts started increasing around the end of March and spiked at more than 40,000 shares per day at the beginning of April.
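The timestamp-splitting and per-day pivot above amount to a simple group-by, sketched here in Python with invented rows.

```python
from collections import defaultdict
from datetime import datetime

# Invented (Created, Shares) pairs standing in for the real rows.
rows = [
    ("2020-03-30 14:02:11", 120),
    ("2020-03-30 19:45:00", 80),
    ("2020-04-01 09:10:33", 41000),
]

# Truncate each timestamp to its date and sum shares per day,
# mirroring the 'Split text to columns' plus SUM pivot.
shares_per_day = defaultdict(int)
for created, shares in rows:
    day = datetime.strptime(created, "%Y-%m-%d %H:%M:%S").date()
    shares_per_day[day] += shares

for day in sorted(shares_per_day):
    print(day, shares_per_day[day])
```

Plotting `shares_per_day` over time would reproduce the spike chart described above.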
We can also decide to simply sort the data by the ‘Created’ column, which could lead us to reconstruct a timeline of how and when the content was amplified and reveal the spread of disinformation.
This is just a quick demonstration of some basic actions we can do with CrowdTangle’s Facebook data. We can also use CrowdTangle for Instagram and Reddit, which will give us slightly different — and less detailed — datasets.
Obviously there are many more ways we can clean and analyze the data. What we do will be shaped by the questions we want to ask of it. Understanding the spread of mis- and disinformation is hard, but using data to ask the right questions is a solid first step.