Using Twitter as a Data Source: An Overview of Current Social Media Research Tools

by Wasim Ahmed

I run a social media research blog where I find and write about tools that can be used to capture and analyse data from social media platforms. My PhD looks at Twitter data for health, such as the Ebola outbreak in West Africa. I am increasingly asked why I am looking at Twitter, and what tools and methods there are for capturing and analysing data from other platforms such as Facebook, or even less traditional platforms such as Amazon book reviews. After brainstorming responses to this question with members of the New Social Media New Social Science network, I found at least six reasons:

  1. Twitter is a popular platform in terms of the media attention it receives and it therefore attracts more research due to its cultural status
  2. Twitter makes it easier to find and follow conversations (i.e., through both its search feature and tweets appearing in Google search results)
  3. Twitter has hashtag norms which make it easier to gather, sort, and expand searches when collecting data
  4. Twitter data is easy to retrieve, as major incidents, news stories and events on Twitter tend to be centred around a hashtag
  5. The Twitter API is more open and accessible compared to other social media platforms, which makes Twitter more favourable to developers creating tools to access data. This consequently increases the availability of tools to researchers.
  6. Many researchers themselves are using Twitter and because of their favourable personal experiences, they feel more comfortable with researching a familiar platform.

It is probable that a combination of responses 1 to 6 has led to more research on Twitter. However, this raises another distinct but closely related question: when research is focused so heavily on Twitter, what (if any) are the implications of this for our methods?


As for the methods that are currently used in analysing Twitter data (e.g., sentiment analysis, time series analysis that examines peaks in tweets, and network analysis), can these be applied to other platforms, or are different tools, methods and techniques required? In addition to qualitative methods such as content analysis, I have used the following four methods in analysing Twitter data for the purposes of my PhD. Below I consider whether these would work for other social media platforms:

  1. Sentiment analysis works well with Twitter data, as tweets are consistent in length (i.e., 140 characters or fewer). Would sentiment analysis work as well with, for example, Facebook data, where posts may be longer?
  2. Time series analysis is normally used when examining tweets over time to see when a peak of tweets may occur. Would examining time stamps in Facebook posts or Instagram posts, for example, produce the same results? Or is this only a viable method because of the real-time nature of Twitter data?
  3. Network analysis is used to visualize the connections between people and to better understand the structure of the conversation. Would this work as well on other platforms where users may not be connected to each other, e.g., public Facebook pages?
  4. Machine learning methods may work well with Twitter data due to the length of tweets (i.e., 140 characters or fewer), but would these work for longer posts and for platforms that are not text based, e.g., Instagram?
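For readers who do want to try some code, two of the methods above (time series analysis and network analysis) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample posts, the field names `user`, `time`, and `text`, and the helper names are all hypothetical, and real data exported from any tool will be structured differently. Nothing here depends on Twitter itself, only on posts having a time stamp and, for the network part, @-style mentions.

```python
from collections import Counter
from datetime import datetime
import re

# Hypothetical sample posts; the field names are assumptions for this sketch.
posts = [
    {"user": "alice", "time": "2014-10-01T09:15:00", "text": "New #Ebola update from @bob"},
    {"user": "bob", "time": "2014-10-01T09:40:00", "text": "@alice thanks, sharing with @carol"},
    {"user": "carol", "time": "2014-10-01T11:05:00", "text": "Reading about #Ebola"},
]


def posts_per_hour(posts):
    """Time series analysis: count how many posts fall in each clock hour."""
    counts = Counter()
    for post in posts:
        stamp = datetime.fromisoformat(post["time"])
        # Round the time stamp down to the start of its hour.
        counts[stamp.replace(minute=0, second=0, microsecond=0)] += 1
    return counts


def peak_hour(posts):
    """Return the (hour, count) pair with the most posts."""
    counts = posts_per_hour(posts)
    return max(counts.items(), key=lambda item: item[1])


# Simple pattern for @-style mentions; an assumption, not any platform's official rule.
MENTION = re.compile(r"@(\w+)")


def mention_edges(posts):
    """Network analysis: count (author, mentioned user) pairs as directed edges."""
    edges = Counter()
    for post in posts:
        for name in MENTION.findall(post["text"]):
            edges[(post["user"], name.lower())] += 1
    return edges


if __name__ == "__main__":
    print(peak_hour(posts))
    print(mention_edges(posts))
```

The hourly counts need nothing more than a time stamp, so that step would carry over to any platform that exposes one. The mention edges rely on a convention for addressing other users, which is exactly the kind of assumption that may not hold elsewhere.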

It may well be that at least some of these methods can be applied to other platforms. However, they may not be the best methods for those platforms, and new methods, techniques, and tools may need to be developed.

So, what are some of the tools available to social scientists for social media data? In the table below I provide an overview of some of the tools I have been using (which require no programming knowledge and can be used by social scientists):

*It is advisable to check whether a tool can support other platforms as it may be possible to import data obtained from elsewhere.

**I won a historical data prize from DiscoverText with up to 3 months of gratis access, and I also received 3 days' worth of Firehose data via Sifter. This has allowed me to conduct research that would not otherwise have been possible, such as comparing Twitter’s Search API to the Firehose API. DiscoverText is used widely in academic research, with over 40 scholarly mentions, and contains features such as advanced data filtration and machine learning capabilities.

I would also like to mention:

While searching for relevant software (as documented in the table), I have noticed that there are very few tools that can be used to obtain data from other social media platforms such as Pinterest, Google+, Tumblr, Instagram, Flickr, Vine, LinkedIn, and Amazon, among others. I would like to see more software that lets those in the social sciences obtain data from a range of platforms and covering a range of data types, e.g., web links, images, and video. At the Masters and PhD level there should also be more emphasis on training social science students to use existing software effectively to capture and analyse data from social media platforms.

This post was first published on LSE’s The Impact Blog. Re-published here with permission.

About the Author

Wasim Ahmed is a PhD candidate at the Information School, at the University of Sheffield and the Twitter Manager for NatCen’s Social Research network New Social Media New Social Science. Wasim has a very successful research blog which includes posts about key trends and issues within social media, but also covers more practical posts on using tools to capture and analyse social media data. Wasim is a keen Twitter user (@was3210), and will be happy to answer any technical (or non-technical!) questions you may have. 
