Regardless of your beat or area of focus, understanding how to use the social web to discover, monitor and research stories is an essential skill.
It allows you to find sources, monitor conversations, understand behaviors, track events and find the issues that affect a community. But you have to know where, when and how to look online.
In this book we will highlight the best free tools and techniques in newsgathering (active search) and monitoring (passive search). We will also look at the best practices and applications across major platforms and online services so that you can effectively surface the most useful content for your reporting and research. Verifying the information and sources that you find, which we address in a separate guide, is also critical.
There are a few key concepts that are fundamental to every online newsgathering operation. And while we will touch on these in more detail throughout the book, it is worth outlining them briefly at the start.
As with journalism before the internet, up-to-date lists of pertinent sources are the backbone of every beat. But unlike the pre-digital age, we can now listen in on millions of conversations happening in real time. This is where searches for keywords come in — terms, phrases or hashtags used to discuss topics or events, which can help us identify what might be a good source for a story.
For hard news, the relevant sources and keywords might be more apparent. A local reporter needs to know community leaders, subject experts, politicians, charities, academics, influencers, campaign groups, eyewitnesses, celebrities, business leaders and emergency services, to name a few.
If they were on the lookout for breaking news events from those sources, they might want to listen for words and phrases such as shooting, stabbing, crash, collision, attack, assault, shots fired, knife, pistol, explosion, died, body, serious, critical, life-changing, life-threatening, terrorist, extremism, casualty or injuries. Together, this process of identifying sources and keywords forms the core of online newsgathering.
When it comes to niche topics, it can be harder to get started. But identifying some more specific keywords can help us to identify more sources, creating a feedback loop of new sources and new keywords across new platforms, which can get us up to speed in no time.
The types of sources and the relevant keywords will be different for each topic and patch, but the fundamental approach remains the same.
Monitoring is an iterative process that involves the constant collection of new information relevant to the topics, sources and conversations you are tracking. As your monitoring operation evolves, you will continually surface new and relevant accounts, keywords and hashtags. It’s a good idea to maintain a central document or spreadsheet where you can aggregate this information and add new content as it arises. Keeping this record will also save time on future monitoring projects that track similar themes.
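One way to keep such a central record is a plain CSV file that any spreadsheet program can open. The sketch below is only an illustration: the column names and the add_entry helper are assumptions made for this example, not part of any tool covered in this guide.

```python
import csv
from pathlib import Path

# Hypothetical column layout for a monitoring log; adjust to your own beat.
FIELDS = ["kind", "value", "platform", "notes"]


def add_entry(path, kind, value, platform, notes=""):
    """Append a source, keyword or hashtag to the log unless it is already there.

    Returns True if a new row was written, False if it was a duplicate.
    """
    path = Path(path)
    existing = set()
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                existing.add((row["kind"], row["value"].lower(), row["platform"]))

    # Compare case-insensitively so "#Brexit" and "#brexit" count as one entry.
    key = (kind, value.lower(), platform)
    if key in existing:
        return False

    write_header = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(
            {"kind": kind, "value": value, "platform": platform, "notes": notes}
        )
    return True
```

Calling add_entry("monitoring_log.csv", "hashtag", "#californiacannabis", "twitter") would add a row the first time and skip it on later calls, which keeps the log free of duplicates as new terms surface.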
It’s important to note that there is no one-size-fits-all way of organizing and managing online research — the nature of what you are reporting on very much dictates the sources, keywords and process necessary — and we encourage you to think creatively and remain flexible when designing your own monitoring systems. There are some powerful paid tools available for this kind of work. This book will cover the free tools and techniques that anyone with an internet connection can use for newsgathering and monitoring on the social web.
Whether you are looking for a new story, investigating the web for more evidence to support an ongoing investigation, or digging up keywords and hashtags for your monitoring operation, the effective use of search engines can make it much easier to get started. Here we will focus on Google, where more than 90 per cent of online searches take place worldwide, but many of the same practices can be applied to other search engines, including Bing, Yahoo, Baidu, Yandex, DuckDuckGo and more.
Boolean search operators are among the simplest yet most powerful tools to streamline your searches. George Boole was a 19th-century mathematician whose legacy of Boolean logic, where every value is either TRUE/FALSE or ON/OFF, still underpins modern computing.
Similarly, his definition of ways to refine searches into logical, almost algebraic constructs is still the best way to search many online databases. Boolean searches mean combining the search terms with certain operators to expand, narrow or exclude the information returned in the search. They are:
• AND : Return results with all specified terms
• OR : Return results with any specified terms
• NOT /- : Return results without specified terms
• “ ” : Return results with the exact phrase contained in quote marks
• ( ) : Group the terms contained in parentheses to clarify search strings with multiple operators
Remembering and applying these five operators is fundamental to finding information online.
So if you are hunting for information related to information disorder, for example, you could search for disinformation OR misinformation OR “fake news”.
This would return a list of results that contain any of the referenced terms. Replace OR with AND (they must always be capitalized) and the search engine will only return results that include all three search terms, although the order in which they are returned would still depend on the filtering option or complex algorithm of the search engine in question.
You can combine any number of Boolean operators when looking for content. If you are looking into reports on the funding behind climate change deniers, for example, you could use the following search.
(“climate change deniers” OR “climate change denial” OR “climate denial”) AND (lobbying OR “dark money” OR funding)
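If you maintain long keyword lists, a search string like the one above can be assembled from them rather than typed by hand. This is a minimal sketch; the helper names (quote, any_of, all_of) are made up for illustration, and it uses straight quotes, which is what search boxes expect.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes so they match exactly."""
    return f'"{term}"' if " " in term else term


def any_of(terms):
    """Join terms with OR and group them in parentheses."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def all_of(groups):
    """Join already-built groups with AND."""
    return " AND ".join(groups)


subjects = ["climate change deniers", "climate change denial", "climate denial"]
money = ["lobbying", "dark money", "funding"]

query = all_of([any_of(subjects), any_of(money)])
print(query)
# ("climate change deniers" OR "climate change denial" OR "climate denial") AND (lobbying OR "dark money" OR funding)
```

Keeping the terms in lists makes it easy to add a newly discovered keyword to one group and regenerate the whole string.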
If searching on Google, you can then use some of the additional filtering options to narrow the results. One of the most powerful yet often overlooked Google search tools is the date search. Choose “custom range” from the drop-down menu above the results and you can filter what you see to include pages from a particular day, week, month or any other window of time you decide.
Harnessing the power of Google’s advanced operators can significantly refine your search results, allowing you to find that needle in Google’s ever-expanding haystack.
If you are investigating unregulated exchanges for bitcoin, you may want to first identify a series of websites reporting on the cryptocurrency. You could use the related: search operator with the URL of a website you have already identified, such as coindesk.com, which reports on digital currencies, to get a list of similar websites.
The real power of advanced Google search comes from combining multiple search operators. If you wanted to find all the pages from coindesk.com that mention unregulated exchange or bitcoin, you could use the site: operator with your search terms.
site:coindesk.com “unregulated exchange” OR bitcoin
If you want to find instances of unregulated exchange and bitcoin within multiple sites, such as coindesk.com and another website that you found using related:coindesk.com, you could search “unregulated exchange” AND bitcoin (site:coindesk.com OR site:bitcointalk.org).
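The multi-site search described above can also be generated from a list of sites, which helps once you are tracking more than two or three. The sketch below is illustrative only: site_query and google_url are hypothetical helper names, and grouping the site: restrictions in parentheses is a choice made here so that the OR applies only to the sites.

```python
from urllib.parse import urlencode


def site_query(terms, sites):
    """Combine search terms with one or more site: restrictions."""
    term_part = " AND ".join(terms)
    site_part = " OR ".join(f"site:{s}" for s in sites)
    if len(sites) > 1:
        site_part = f"({site_part})"
    return f"{term_part} {site_part}"


def google_url(query):
    """Return a Google search URL; the q parameter carries the search string."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q = site_query(
    ['"unregulated exchange"', "bitcoin"],
    ["coindesk.com", "bitcointalk.org"],
)
print(q)
# "unregulated exchange" AND bitcoin (site:coindesk.com OR site:bitcointalk.org)
print(google_url(q))
```

The URL can be bookmarked or pasted into a browser, so the same search can be rerun later without retyping it.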
These examples represent a small sample of what is possible with advanced Google searches. Nonetheless, they illustrate how, with the addition of a few keywords and a bit of creative thinking, you can quickly streamline your online research and monitoring, surfacing relevant content that might have remained invisible otherwise.
Google hosts some online courses for the more advanced search operators to help users super-power their searches, but this table includes some of our favorites.
Google isn’t the only place to look, however, and sometimes you need to try other search engines to get different results. For example, if you want to search for content in China, Baidu is a good place to start. Alternatively, if you are looking into activity in Russia and Eastern Europe, Yandex is probably the best option. If you want to make sure your results aren’t influenced by your previous search history, use a search engine like DuckDuckGo that doesn’t track your activities.
Establishing a system of alerts is an important ingredient in any monitoring, research or reporting operation, allowing journalists to stay apprised of new and relevant content surfacing on the web and social media. Google has multiple options.
Google Alerts sends notifications — on a rolling basis, once a day or once a week — on new content it has found across the web. You can use the Boolean operators and advanced search features to structure and hone your alerts.
For example, if you want to track new content related to either Brexit or Boris Johnson but only information published by specific UK news organizations, you could set up the following alert.
Google Trends also offers a little-known alert service, which is hidden away in its Subscriptions feature. While not as timely as Google Alerts (it only allows you to receive updates once a week or once a month for keyword searches), the service still provides trends feedback for topics you may be investigating or reporting on.
Additional resources: There are numerous tutorials online on how to use advanced Google search, including:
Twitter is one of the easiest social media platforms for journalists to explore, with a wide range of tools to scan the constant stream of information. Its ease of use is also why journalists can be too reliant on the platform, so make sure it’s one of many social networks you are monitoring for breaking news or relevant conversations.
Twitter is renowned as being the go-to platform for breaking news in many languages. Despite having fewer regular users than Facebook and Instagram, its simple interface, character limitations and option to organize your timeline chronologically make it perfect for short, sharp bursts of information.
When looking for relevant tweets and sources, it can be tempting to search for the words that first spring to mind. As journalists, they are often the words we would use in a headline. But people are more likely to post updates in the same manner they would speak in real life or in a hurry — using swear words, abbreviations and internet-speak.
Whole guides have been written on the best way to pick keywords in breaking news situations, but one of the most important things to remember is the first person. People personally affected by an event are much more likely to refer to themselves; people commenting on an event from a distance are not.
Bearing this in mind, an effective search for people attending a protest or other political event could be:
(I OR me OR my OR we OR our OR us OR “just saw” OR “just seen”) AND (coup OR demonstration OR protests OR enough)
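The same first-person logic can be applied to posts you have already collected, for example in a spreadsheet export. The following is a rough sketch rather than a tested classifier: the function names and sample posts are invented for illustration, and simple word matching like this will produce false positives.

```python
import re

FIRST_PERSON = ["i", "me", "my", "we", "our", "us", "just saw", "just seen"]
EVENT_TERMS = ["coup", "demonstration", "protests", "enough"]


def contains_any(text, terms):
    """True if any term appears in the text as a whole word or phrase."""
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def looks_like_eyewitness(text):
    """Mirror the (first person) AND (event) logic of the search string."""
    return contains_any(text, FIRST_PERSON) and contains_any(text, EVENT_TERMS)


# Invented sample posts, not real tweets.
posts = [
    "We are at the demonstration outside parliament right now",
    "Analysts say the protests could last for weeks",
    "just saw police moving in, people shouting enough",
]
for post in posts:
    print(looks_like_eyewitness(post), "-", post)
```

The first and third posts pass because they pair a first-person term with an event term; the second mentions the protests but only from a distance.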
Don’t forget about Twitter’s advanced search option when looking for content relevant to your research interests or reporting beat. It gives you an easy interface to make very specific queries, like only searching for tweets from or to specific accounts, during certain time periods, or containing particular types of content, such as videos or links.
One of the easiest and most effective ways of navigating Twitter is with TweetDeck, a free and easy-to-use dashboard owned by Twitter. With TweetDeck, you can display an unlimited number of columns containing tweets from Twitter lists, search strings and specific accounts or activity all side by side, updating in real time.
You can organize these columns any way you wish and many of the advanced search options are included as filters in each column to better narrow the search results. If you were tracking marijuana legislation in California, for example, you could create one column searching for the hashtag #californiacannabis while another might contain a more advanced search, such as: (marijuana OR cannabis) AND (california OR cali OR ca) AND (government OR legislature OR law OR laws). Clicking on the filter icon in the top right of a search column will open a new set of drop-down menus to better refine the search results.
The resulting stream of tweets will likely unearth new, interesting and relevant accounts to follow, which you can then add to particular lists around different topics. These lists can then be turned into their own columns within TweetDeck.
Tweetbeaver is another dynamic research tool, effectively giving you sweeping Twitter powers, such as the ability to find common friends or conversations between two accounts, as well as the ability to download a user’s followers list, among many other options.
You can either create new Twitter lists or subscribe to public lists created by others.
So if you cover climate change in South America, you might decide to create a list of non-governmental organizations, activists and journalists covering the topic in the region to ensure that you have the most topical information related to your beat. You can make them either public or private. If they are public, other people — journalists from your news organization, for example — can follow your lists.
However, Twitter users included in a public list will be informed that they are being followed, so it’s advisable to make lists private if you want to remain anonymous while monitoring or if you don’t want your sources to know you are following them.
Unfortunately, Twitter doesn’t have a directory of Twitter lists, but one of the easiest ways of finding relevant lists is using Google’s advanced search operators.
Type in your keywords and then site:twitter.com/*/lists
The results will show any public lists with the chosen keywords in the title, which you can then subscribe to.
Another useful trick for finding Twitter lists is to identify a good source and then see what other lists they have been placed on. Simply add “/memberships” to the URL of a particular user.
For example, if we wanted a Twitter list for emergency services in New York, we could start with the New York Fire Department:
Then scan the results for a good list to subscribe to. Another tool is Scoutzen. You can search for lists and it shows you the most popular results by subscriber number and members (how many accounts are in a list), which can be a good shortcut for seeing how useful a list is.
Once you have built out your searches and lists in TweetDeck, it’s time to monitor their activity.
If you are covering a general election, for example, you can set up a series of different columns in TweetDeck to track not only officials and pundits, but also activists and constituents across the spectrum, giving you a sweeping view of the political landscape on one screen.
TweetDeck offers numerous ways to streamline your feeds, including an array of filters. You can set your feed to only stream tweets with a certain number of retweets or likes. You can filter the stream by location or language. Or you may want to see tweets that contain an image or a video. There are many ways to customize this versatile and powerful tool. The best way to learn and familiarize yourself is to jump into the platform and get your hands dirty.
You can also set up notifications so that each new tweet that arrives to a feed will elicit a sound or a desktop pop-up, depending on how you have configured the alerts.
Facebook and Instagram
In 2013, Facebook introduced Graph Search, a new architecture for how the vast amounts of data on the world’s biggest social network were organized and connected. Many journalists, human rights workers and investigators spent the following six years learning how Graph Search worked, and how to make it work in their favor. Some created additional tools and websites that went above and beyond what was possible in the native search on Facebook itself, bypassing the algorithm that gives users search results based on their profile and activity.
Then, in June 2019, Facebook fundamentally changed Graph Search and effectively killed all the tools that had sprung up to take advantage of it, denting the output of everyone who used it for investigative work. What’s more, Facebook has continued to change how Graph Search works, hindering efforts to maintain many of the tools and understand how the new system works.
This is important context for understanding that while some tools still exist, their position is fragile and depends entirely on the hard work of the community that built them. We will point to some resources at the end of this chapter, but at the time of writing, they do little more than Facebook’s native search.
At the time of publication, Facebook’s native search includes a host of filters, including the ability to search for public posts in public Groups and Pages, for example.
You can also narrow your search by date and by tagged location, as well as by media type, such as posts, photos, videos or livestreams.
You can also use advanced Google searches to sharpen your results when searching for particular content on Facebook. If you were investigating anti-immigrant groups and wanted to find Facebook pages opposed to immigration, you could use the following search.
site:facebook.com/pages “stop immigration”
If you wanted to surface Facebook groups instead of pages, you could use the following search in Google.
site:facebook.com/groups “stop immigration”
Instagram’s search function can be messy, returning accounts and hashtags in no particular order, so it’s better to use either advanced Google searches, third-party applications or some combination of both, depending on the task at hand.
If you are searching for particular kinds of people or accounts, you can use a tool like searchmybio, which allows you to search for keywords within accounts’ bios. For keywords, we recommend that you use Google advanced searches. If you were looking for posts mentioning the Brexit Party, for example, you could search for inurl:instagram.com/p/ “brexit party”.
The best tool for monitoring lists of Facebook and Instagram accounts is CrowdTangle, a platform owned by Facebook.
You will need to request access from Facebook, but then you can set up dashboards in the three platforms it supports (Facebook, Instagram and Reddit) and build lists of relevant Pages and Groups in Facebook, public accounts for Instagram and subreddits for Reddit.
You can import lists manually from a CSV file, add individual pages if you know the URL, or search CrowdTangle for pages already logged in its database.
Whether you are tracking an election or digging up disinformation around wedge issues such as immigration, CrowdTangle’s Saved Search function allows you to find pages, groups and accounts on Facebook and Instagram where your particular keywords, key phrases and hashtags appear. All the relevant results will be displayed in a feed, with the only caveat being that it will only return results for pages already in the database.
Lists in CrowdTangle can produce a deluge of information, so CrowdTangle gives users a number of options to navigate the posts, based on time, the total number of interactions, and how much better or worse a post is doing than the average for that page.
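The idea of comparing a post with the average for its page can be illustrated with a simple ratio. This is an assumption made for explanation only; it is not CrowdTangle’s own scoring, and the function name and numbers are hypothetical.

```python
from statistics import mean


def overperformance(post_interactions, page_history):
    """Ratio of a post's interactions to the page's average.

    A value above 1.0 means the post is doing better than the page's
    average; below 1.0 means worse. This simple ratio is an assumption
    made for illustration, not CrowdTangle's own scoring.
    """
    baseline = mean(page_history)
    if baseline == 0:
        return None  # no meaningful baseline to compare against
    return post_interactions / baseline


# Hypothetical interaction counts for a page's recent posts.
history = [100, 200, 300]
print(overperformance(600, history))  # 3.0
print(overperformance(100, history))  # 0.5
```

Measuring against each page’s own baseline means a small page with an unusually busy post can stand out just as clearly as a large page.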
With Facebook, you can refine the results even further using the different types of reactions available to Facebook users when they interact with a post. If you are monitoring for disinformation, which is often quite emotive, seeing the posts from a list with the most “angry” or “love” reactions may give different and more useful results.
CrowdTangle offers a number of options for notifications, including digests and viral alerts, once you have established your lists and saved searches. If you are looking for a daily or weekly roundup of content related to your beat, then opt for the Digest feature, which serves up a list of overperforming content or top posts — depending on your configuration — to your email.
If you are covering an upcoming event — say, the Super Bowl — and need to be closer to the action, you can use CrowdTangle’s viral alerts to get real-time notifications about posts. Having already built a series of lists using Facebook and Instagram accounts of players, teams and pundits, you could then set up alerts, which will trigger and arrive in your email or through Slack when a post from one of the accounts reaches a certain threshold of virality.
Reddit is one of the world’s largest social news aggregators and discussion boards. Referred to (by itself) as the “front page of the internet,” it is a trove of online discussions and potentially newsworthy material.
Reddit is composed of different subreddits pertaining to particular topics, such as r/HongKong, r/sports, or r/Worldnews. Users submit or post material to each subreddit; it can be commented on and either upvoted or downvoted by the community. To find content relevant to your beat, you can use the search function in the top navigation bar to sweep over all of Reddit’s posts looking for key terms. A search of Hong Kong, for example, will give you the top communities or subreddits as well as the top posts that include Hong Kong.
Similar to Google and Twitter, Reddit also has a series of search operators you can use to specify and narrow your search results, including Boolean operators. So if you wanted to look for posts with BBC content, you could search site:bbc.com. You can also find particular subreddits by navigating to https://www.reddit.com/subreddits/search. You can set filters on your results as well, such as sorting the content by popularity, or you can look within certain timeframes.
CrowdTangle works for Reddit as well as Facebook and Instagram and you can build lists of subreddits to watch in the same fashion. If you are looking for the latest posts on politics, you could set up a list of various politically affiliated subreddits, such as r/antifascistsofreddit, r/progressive and r/conservative.
Once you have customized your lists, you can set up alerts to get a digest of top posts from, for example, the last 24 hours, sent to your email. If you don’t have access to CrowdTangle, you can also track subreddits by linking them to an RSS reader. If you are covering European politics, you might want to keep watch of the subreddit r/europe, in which case you can add .rss to the end of the URL, like this:
https://www.reddit.com/r/europe/.rss
Then add it to your RSS feed reader, such as Feedly.
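If you would rather process a subreddit feed yourself than read it in Feedly, the .rss pattern can be built and parsed with Python’s standard library. The sketch below assumes the feed is delivered in Atom format and uses a made-up sample document instead of a live request; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def subreddit_feed_url(name):
    """Return the feed URL for a subreddit, following the .rss pattern above."""
    return f"https://www.reddit.com/r/{name}/.rss"


def entry_titles(feed_xml):
    """Extract (title, link) pairs from an Atom feed document."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title", default="")
        link = entry.find(f"{ATOM}link")
        href = link.get("href") if link is not None else ""
        results.append((title, href))
    return results


# Made-up sample in Atom format, used instead of a live request.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example feed</title>
  <entry>
    <title>First example post</title>
    <link href="https://www.reddit.com/r/europe/comments/example1/"/>
  </entry>
  <entry>
    <title>Second example post</title>
    <link href="https://www.reddit.com/r/europe/comments/example2/"/>
  </entry>
</feed>"""

print(subreddit_feed_url("europe"))
for title, href in entry_titles(SAMPLE):
    print(title, href)
```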
You can find more information on what keyword operators are available on Reddit and how to use them at https://www.reddit.com/wiki/search.
While many of today’s online debates and key discussions take place over social media, these are often in response to articles or other material originating on blogs and websites. Staying on top of new material relevant to your beat is important for any journalist, regardless of whether you are working on breaking news or awaiting the publication of a new dataset from a government agency to inform your investigation.
You could monitor these websites by continually scanning your lists and search results or frantically refreshing your browser, but we recommend an RSS reader and a system of alerts to keep you up to date.
RSS stands for Really Simple Syndication and is a way of monitoring multiple websites in one aggregated feed. There are a number of RSS readers, but we recommend Feedly. Once you set up an account, you can add new content by topic, website or RSS feed, creating lists of interesting websites or blogs in a similar way to Twitter or CrowdTangle lists.
Once added, new posts will appear. It’s easy to use the interface to monitor the output once or twice a day to see new articles that have been published since you last checked.
Sometimes you might just be looking for a change to a webpage rather than a whole new article or post, and that’s where Klaxon comes in. Developed by The Marshall Project and offered to the public for free, Klaxon scans specific parts of public webpages and pings you a message over email or Slack if any of them change. You can configure Klaxon to scan your sites every 10 minutes (not advisable for webpages that update frequently, such as homepages), every few hours or every few days. If Klaxon detects a change on a URL during a scan, you will immediately receive an alert.
If you are monitoring particular webpages or parts of a page, such as the privacy policies of a tech company, you can combine the superpowers of Klaxon with the Wayback Machine’s “Changes” feature, which gives you a side-by-side comparison of different versions of the same URL in order to pinpoint exactly where the change in the document occurred.
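The general idea behind this kind of change monitoring can be sketched in a few lines: store a fingerprint of the text you are watching, then compare it with a fresh fingerprint on the next scan. This is a simplified illustration of the concept, not Klaxon’s actual implementation, and the sample text is invented.

```python
import hashlib


def fingerprint(text):
    """Hash the watched text, ignoring differences in whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, current_text):
    """Compare a stored fingerprint with the text seen on the latest scan."""
    return fingerprint(current_text) != previous_fingerprint


# Hypothetical snapshots of the same page section on two scans.
old = "We may share data with partners."
new = "We may share data with partners and advertisers."

stored = fingerprint(old)
print(has_changed(stored, old))  # False
print(has_changed(stored, "We may  share data\nwith partners."))  # False
print(has_changed(stored, new))  # True
```

Watching only a specific part of a page, as Klaxon does, matters for the same reason the whitespace is normalized here: it reduces alerts triggered by changes that have nothing to do with the content you care about.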
With hundreds of hours of video uploaded to the platform every minute, YouTube represents one of the biggest repositories of information the world has ever seen. Many third-party YouTube services require paid subscriptions, but there are a few ways of searching and monitoring the platform that are free.
YouTube’s search filters are your best friend when it comes to finding the videos you want. Pressing the “Filter” button under the search bar will provide more options, allowing you to fine-tune results either by upload date, type of result, video duration or specific features, and then sort the results in different ways as well.
Once you have compiled a catalog of relevant YouTube channels, you can add them to Feedly to get a real-time feed of new videos that are published from those channels alongside any blogs or websites you are also keeping an eye on.
Sometimes specific YouTube channels don’t appear in the Feedly search function. A workaround is to subscribe to those channels of interest in your YouTube account, navigate to your subscription manager and then export them.
You can then import the OPML file, which is generated by the subscription manager, directly into Feedly by visiting https://feedly.com/i/cortex.
Once your channels have been uploaded to Feedly, you can categorize them to fit the particular topic you are monitoring and reporting on.
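An OPML file is a small XML document listing feeds, so it can be inspected before you import it. The sketch below shows one way to list the feeds it contains with Python’s standard library. The sample file, channel names and channel IDs are placeholders, and the exact attributes in a real export may differ.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return (title, feed URL) pairs for every outline that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("title") or outline.get("text") or ""
            feeds.append((title, url))
    return feeds


# Made-up OPML sample; the channel names and IDs are placeholders.
SAMPLE_OPML = """<opml version="1.1">
  <body>
    <outline text="YouTube Subscriptions" title="YouTube Subscriptions">
      <outline text="Example Channel A" title="Example Channel A" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=EXAMPLE_ID_A"/>
      <outline text="Example Channel B" title="Example Channel B" type="rss"
               xmlUrl="https://www.youtube.com/feeds/videos.xml?channel_id=EXAMPLE_ID_B"/>
    </outline>
  </body>
</opml>"""

for title, url in feeds_from_opml(SAMPLE_OPML):
    print(title, url)
```

Listing the feeds first makes it easier to remove channels that are irrelevant to your beat before they clutter your reader.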
You can then either have Feedly open, organizing your lists from latest to oldest, to clearly see the most up-to-date content, or you can sync your Feedly with other web-based services, such as If This Then That (IFTTT), to get alerts pinged to your phone as soon as new content enters your feeds.
The big social media companies — Facebook (including Instagram), Twitter, Reddit, YouTube — are crucial drivers of content and communication around the world. But it’s important not to forget other platforms, such as Snapchat, TikTok, 4chan, Gab and others, some of which have become the preferred messaging hubs for extremist or fringe groups. Here is a brief overview of some of these apps and platforms. Closed messaging apps, including WhatsApp, WeChat and Discord, are discussed in a separate guide.
Snapchat allows users to send each other short texts or video clips and post some clips publicly as “stories.” Now with more than 200 million users, Snapchat is making some stories publicly available to search. In 2017, Snapchat introduced Snap Map, showing the locations of publicly posted material in a heat map, but not showing the user who posted it. Newsworthy footage is occasionally posted to the map, although it is next to impossible to contact the uploader, and designers have been praised for keeping news and social interactions separate.
TikTok is a social video app popular among teenagers and owned by Chinese web giant ByteDance. It became the world’s fourth-most downloaded app in 2018, ahead of Instagram and Snapchat, after starting life as a lip-sync app where users filmed themselves performing their favorite songs. TikTok has since evolved to become a hub for many different memes and themes. But by late 2018, it was described as having a “Nazi problem” and being a “minefield” of hate speech, with particular reference to India, as well as reports of online predators targeting children on the app.
Message boards such as 4chan and 8chan have gained notoriety over the years for becoming a breeding ground for extremism, violence and conspiracy theories. Divisive hashtags, messages and internet hoaxes often originate on message boards, where they first gain oxygen before being amplified across other social media platforms. While message boards are good places to monitor for newsworthy stories, be warned that you might encounter disturbing content. (In early August 2019, 8chan was taken down from the publicly accessible web.)
Gab is a social media website with very close functionality to Twitter. It has become closely linked to far-right groups, especially in the US, as users who have been banned from Twitter migrate to the platform. Gab first attracted controversy in 2017, when Google banned Gab’s app from the Google Play Store for violating a ban on hate speech. It was in the news again in the wake of the Pittsburgh synagogue shooting in October 2018. The shooter used Gab as a platform to air his antisemitic views.
BitChute is a video-hosting site akin in many ways to YouTube. But unlike YouTube, BitChute was created to avoid the kind of content regulation rules in place on other platforms. It emphasizes “free speech” above all else. Many of the channels that have been banned or demonetized on YouTube migrate to BitChute.
Brighteon, previously called Real.Video, is another video-hosting site that prides itself on free speech and the absence of censorship. Alex Jones’ InfoWars, which was removed from YouTube, has since taken up residence on Brighteon.
MeWe is a social networking site similar to Facebook, but founded with a strong commitment to user privacy. While much smaller than Facebook, it has been increasing in popularity, especially given the privacy breaches witnessed on Facebook in recent years. Similar to BitChute and Brighteon, many conspiracy theorists and anti-vaxxers who were kicked off Facebook have opened up shop on MeWe.
VKontakte, commonly referred to as VK or the “Russian Facebook,” is a popular Russian social networking platform, boasting over 500 million individual accounts. As its nickname suggests, the platform is extremely similar to Facebook, both visually and functionally. Features such as groups, public pages, a news feed and direct messaging all can be found on the site. The vast majority of users are Russian but VK’s recent growth has led it to become prominent in Eastern European countries, the Baltic states and to a lesser extent in China and parts of Western Europe.
This article is the most up-to-date version of this Essential Guide.
Last updated October 2019
About the authors
Carlotta Dotto is a research reporter at First Draft, specializing in data-led investigations into global information disorder and coordinated networks of amplification. She previously worked with The Times’ data team and la Repubblica’s Visual Lab, and has written for a number of publications including The Guardian, the BBC and New Internationalist.
Rory Smith is a senior investigator at First Draft, where he researches and writes about information disorder. Before joining First Draft, Rory worked for CNN, Vox, Vice and Truthout, covering various topics from immigration and food policy to politics and organized crime.