Altmetric Blog

“Not sure if scientist or just Twitter bot” Or: Who tweets about scholarly papers

Guest Author, 12th July 2018

This is the final post in a series of blog posts on the role Twitter plays in scholarly communication. This post is by scientometrics researchers Stefanie Haustein, Rémi Toupin and Juan Pablo Alperin.

During the last weeks we have analyzed what types of scholarly documents get shared on Twitter, where those tweets come from, how they get shared and when Twitter activity occurs. Concluding our mini series on scholarly Twitter metrics, today’s post focuses on who tweets about scholarly papers.

Identifying who is tweeting is one of the main challenges of turning tweets into meaningful scholarly metrics. Unfortunately, the best (and pretty much the only) source for classifying users is the Twitter bio. These self-descriptions give users at most 160 characters to present themselves. As shown for @lariviev and @juancommander, academic users often provide a mix of professional (e.g., professeur, scholar, #scholcommlab) and personal (#IMFC, father, king of bocce) descriptions. Altmetric looks for these keywords to classify Twitter users as scientists, science communicators, practitioners and members of the public.

Societal impact and public engagement with science

Identifying non-academic users who engage with scientific documents on Twitter is the first step to trace the contexts in which research is used on social media outside of the scientific community, and therefore a potential way to uncover societal impact. A keyword-based approach provides a rough approximation of who is tweeting research. It works particularly well to sift out academic users, but is not nuanced enough to distinguish when research is tweeted by those who are part of the public sphere. Capturing such non-academic engagement has always been of great interest, as it could form the basis for analyzing and understanding societal impact. Measuring societal impact would expand the scientific reward system and provide incentives for researchers to seek out public engagement and interact with—instead of presenting to an audience of—interested members of the public.

Academia is overrepresented among those tweeting journal articles

Based on the assumption that Twitter is widely used by the public (a hypothesis supported by Twitter metrics’ low correlations with citations), altmetric studies have argued that tweets linking to scholarly articles reflect interest from the general public. Despite the use of Twitter by large swaths of society, a study classifying users tweeting journal articles found that almost half of all identified individuals had completed or were pursuing a doctorate degree (in contrast to 1% of the US population with a PhD). This academic overrepresentation strongly suggests that it is the scholarly community rather than the general public who tweets about scholarly papers.

These findings were supported by another study coding accounts tweeting Web of Science (WoS) articles published in 2012: 68% of Twitter profiles were maintained by an individual and 21% by an organization. Among individuals (n=542, see figure below), 47% used professional terms (e.g., doctor, MD, photographer) to describe themselves, while 22% identified as researchers, 13% as science communicators and 7% as students. A number of individuals used words from more than one of the four categories—for example, 2% were science communicators and researchers—which made an overlapping classification system essential.

Identifying members of the public is tough

Many Twitter bios contain terms that reflect personal characteristics (e.g. wife, runner, yoga lover), but due to the blurred boundaries between personal and work identities (as seen in @juancommander’s bio above) it is challenging to identify members of the general public. Scholars often describe themselves in both a personal and a professional manner, which means that the presence of personal terms in a Twitter bio does not necessarily help to identify non-academics. In fact, the ‘members of the public’ category by Altmetric absorbs all users without keywords that classify them as scientists, science communicators or practitioners, which is why it would more properly be labeled as “unknown”.
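To make the keyword approach concrete, here is a minimal Python sketch of an overlapping bio classifier. The keyword lists and category names are hypothetical and chosen only for illustration; they are not Altmetric's actual lists. Following the argument above, a bio without any matching keyword is labeled "unknown" rather than "member of the public".

```python
import re

# Hypothetical keyword lists for illustration only; NOT Altmetric's real lists.
KEYWORDS = {
    "researcher": {"professor", "professeur", "scholar", "phd", "postdoc", "researcher", "scientist"},
    "science communicator": {"journalist", "editor", "writer", "blogger"},
    "practitioner": {"doctor", "md", "nurse", "physician"},
    "student": {"student", "undergrad"},
}


def classify_bio(bio: str) -> set:
    """Return every category whose keywords appear in the bio.

    Categories may overlap, so the result is a set. A bio that matches
    no keyword at all is labeled "unknown".
    """
    # Lowercase the bio and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", bio.lower()))
    matched = {category for category, words in KEYWORDS.items() if tokens & words}
    return matched or {"unknown"}
```

With these lists, a bio such as "Professor and science writer, father, king of bocce" falls into both the researcher and science communicator categories, while "wife, runner, yoga lover" ends up as unknown. Personal terms are deliberately ignored, since they do not separate academics from non-academics.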

Why don’t we just ask Twitter users who they are?

Another way to find out who is tweeting about science is to approach users directly. With the help of an automated Twitter account, this study asked users who had tweeted a Scielo Brazil paper if they were affiliated with a university. Of the 286 who replied, 24% were employees, 23% were students, while 36% were not affiliated with a university. This micro-survey approach using automated programs can be useful, but it is difficult to employ on a large scale, especially as Twitter clamps down on bots.

Bot or not?

Automated accounts have become prevalent on Twitter, so much so that Twitter suspends millions of accounts every month. A social bot is defined as “a computer algorithm that automatically produces content and interacts with humans on social media, trying to emulate and possibly alter their behavior”. Bots do not only interfere with political events, such as the US presidential election and the EU referendum, but are also infiltrating academic Twitter.

Automated Twitter accounts seem particularly prominent among users tweeting arXiv submissions. In a random sample of 800 accounts captured by Altmetric, coded by tweeting activity and Twitter bio, 8% seemed completely and 5% partially automated, as indicated by regular, systematic tweeting patterns. Unfortunately, Botometer (formerly BotOrNot), which analyzes Twitter user statistics, linguistic features and sentiment to automatically detect bots, was not able to identify these automated scholarly accounts, because they behave differently than regular Twitter bots. So overall, we are not sure if users tweeting about scientific articles are scientists or just Twitter bots.
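The post does not give a formula for "regular systematic tweeting patterns". One conceivable way to flag them, sketched below in Python, is to check how much the gaps between an account's consecutive tweets vary. The function names and the 0.1 threshold are assumptions made for this illustration, not the method used in the study and not how Botometer works.

```python
from datetime import datetime
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive tweets.

    Values near 0 mean the gaps are almost identical (clockwork posting).
    Returns None when there are fewer than two gaps to compare.
    """
    times = sorted(timestamps)
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


def looks_scheduled(timestamps, threshold=0.1):
    """Hypothetical heuristic: flag accounts with near-constant gaps.

    The threshold is an arbitrary assumption, not an empirically tuned value.
    """
    cv = interval_cv(timestamps)
    return cv is not None and cv < threshold


# An account posting exactly once per hour is flagged as scheduled.
hourly = [datetime(2016, 1, 1, hour) for hour in range(6)]
print(looks_scheduled(hourly))
```

A heuristic like this would miss partially automated accounts that mix scheduled and manual tweets, which is one reason hand coding remains useful.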

Unlike social bots in the general Twittersphere, scholarly bots do not (yet) seem to try to emulate human behavior or game the system. Rather, they resemble RSS feeds tweeting a paper title and link, often specifying in the Twitter bio what type of information they diffuse. Some even provide instructions to create similar feeds. Regardless of whether these bots are considered useful or spam, it is safe to say that these automated tweets do not imply impact, but instead reflect diffusion.

Academic journals, publishers and bots tend to self-identify and are prominent on Twitter

One way we could get more precise demographic information for Twitter users is to change the keywords we look for in Twitter bios. In particular, we are able to identify journals, publishers, and bots fairly easily. The 24.3 million tweets captured by Altmetric until mid-2016, which are analyzed in the book chapter this blog series is based on, were sent by 2.6 million users. Searching keywords in the Twitter bio, username and Twitter handle of the 2,043 most active users (≥1,000 tweets), we identified 305 journal or publisher accounts and 248 automated accounts. 15 of the 19 most productive accounts (≥25,000 tweets, see table below) self-identified as bots. These automated paper feeds make up 30% of tweets by the 2,043 most active users and 7% of all 24.3 million tweets captured by Altmetric until June 2016. The journal and publisher accounts produce 11% and 3% of tweets, respectively.
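As a rough illustration of how such shares can be computed, the Python sketch below searches the handle, name and bio of each highly active account for self-identifying terms and then sums tweets per account type. The term lists, field names and account records are hypothetical; the chapter's actual keyword lists are not given in this post.

```python
# Hypothetical self-identification terms; the real lists are not given in the post.
BOT_TERMS = ("bot", "feed", "automated")
JOURNAL_TERMS = ("journal", "publisher", "press")


def account_type(account):
    """Classify one account by terms found in its handle, name or bio."""
    text = " ".join([account["handle"], account["name"], account["bio"]]).lower()
    # Crude substring matching: "bot" would also match "both" or "robotics".
    if any(term in text for term in BOT_TERMS):
        return "bot"
    if any(term in text for term in JOURNAL_TERMS):
        return "journal or publisher"
    return "other"


def tweet_shares(accounts, min_tweets=1000):
    """Share of tweets per account type among accounts with >= min_tweets."""
    active = [a for a in accounts if a["tweets"] >= min_tweets]
    total = sum(a["tweets"] for a in active)
    counts = {}
    for account in active:
        kind = account_type(account)
        counts[kind] = counts.get(kind, 0) + account["tweets"]
    return {kind: count / total for kind, count in counts.items()}
```

Substring matching is what makes self-identified bots easy to find and also what makes the result approximate, so any real application would need manual checks of the matches.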

Grouping users by tweeting activity

Instead of classifying Twitter users topically, accounts can also be grouped based on how much they tweet. Dividing accounts into three groups of top 1%, 9% and 90% of users according to number of tweets helps to distinguish highly and less active users. Analyzing the user type distribution for 2015 WoS articles by discipline, lead users (top 1%) were overrepresented in Chemistry, Physics, Mathematics and Engineering & Technology (see Figure below), fields which exhibit the lowest Twitter coverage, density and number of unique users. This might, at least partly, be due to cyborgs and bots, such as @blackphysicists and @MathPaper, which were the two most active accounts sending more than 50 WoS paper tweets daily (see table above).

Assuming that members of the general public only rarely tweet links to scholarly papers, non-academic users engage more with articles published in Professional Fields and Social Sciences, where users classified as least active (90%) were overrepresented. Based on the median number of tweets and followers, lead users sent 149 tweets linking to 2015 WoS articles and had 935 followers, while highly active users (9%) contributed 16 tweets and had less than half the number of followers (median=442.5). Least active users had 212 followers and tweeted only once on average.
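The activity-based grouping can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names and group labels are made up here, ties in tweet counts are broken by input order, and the exact cut-off rules used in the book chapter may differ.

```python
from statistics import median


def group_by_activity(tweet_counts):
    """Assign each user to the top 1%, next 9% or bottom 90% by tweet count.

    tweet_counts maps a user handle to that user's number of tweets.
    """
    ranked = sorted(tweet_counts, key=tweet_counts.get, reverse=True)
    n = len(ranked)
    groups = {}
    for rank, handle in enumerate(ranked):  # rank 0 is the most active user
        if rank * 100 < n:  # fewer than 1% of users rank above this one
            groups[handle] = "top 1%"
        elif rank * 10 < n:  # fewer than 10% of users rank above this one
            groups[handle] = "next 9%"
        else:
            groups[handle] = "bottom 90%"
    return groups


def median_by_group(groups, values):
    """Median of a per-user value (e.g. tweets or followers) for each group."""
    buckets = {}
    for handle, group in groups.items():
        buckets.setdefault(group, []).append(values[handle])
    return {group: median(vals) for group, vals in buckets.items()}
```

Calling median_by_group once with tweet counts and once with follower counts yields the kind of per-group medians reported above. Medians are preferable to means here because a handful of extremely active accounts would otherwise dominate each group.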

So, who is tweeting about science?

So, to conclude, who are these 2.6 million users tweeting scientific articles? Perhaps unsurprisingly, the majority are academic stakeholders rather than members of the public. This suggests that only a small fraction of tweets might hint at public engagement with research. But even if most tweets come from academics, tweets represent only a part of informal scholarly communication and tweeting does not (yet) play an important role in most disciplines.

Academic stakeholders include researchers and science communicators, journals and publishers, as well as paper bots created by academics as a type of RSS feed to diffuse scholarly publications. Such automated tweeting activity promoting certain documents and journals becomes problematic when tweet counts form the basis for measuring research impact. Tweeting activity related to scholarly documents is further influenced by their discipline, journal, document type, publication year, season and weekday and can be manipulated by individual users.

This is not to say that tweets should be altogether disregarded for scholarly metrics. Rather, Twitter needs to be approached critically and in a nuanced way with regard to its user population and types of use. As shown in the previous posts, the population of scholarly Twitter is geographically biased towards North America and Europe, excludes users from countries blocking Twitter, and mostly tweets in English. With half of tweets being retweets, Twitter is not used for in-depth discussions but primarily to diffuse research—academics’ main motivation to use Twitter. Short half-lives, delays and tweet spans also point to rapid diffusion; seeing users engage with articles weeks or months after publication is rare.

This is why raw tweet counts do not correlate with citations. Scholarly Twitter metrics need to differentiate between various levels of engagement and distinguish tweets from retweets. Hashtags can help to understand the context in which scholarly documents are shared on Twitter. They offer a crowdsourced view on tweet and publication content and enrich tweet counts by adding context and highlighting changes over time.

This post concludes our mini series on Twitter in scholarly communication on the Altmetric blog. By demonstrating who uses Twitter in academia, what types of scholarly outputs are diffused how, where and when, the blog series was meant to add context to the various dimensions of scholarly tweets. Analyzing Twitter activity linking to scholarly documents beyond crude counts and correlations, the blog posts intended to provide the basis for a better understanding and development of more sophisticated approaches to tweet-based altmetrics.

If you still want to know more about scholarly Twitter metrics, you can read up on detailed methods and explore more data in this book chapter.

Acknowledgements

We would like to thank Altmetric for proofreading and hosting this blog series. We especially thank Stacy Konkiel and Euan Adie for detailed feedback on each of the posts.

Stefanie Haustein is an assistant professor at the University of Ottawa’s School of Information Studies, where she teaches research methods and evaluation, social network analysis and knowledge organization. Her research focuses on scholarly communication, bibliometrics, altmetrics and open science. Stefanie co-directs, together with Juan Pablo Alperin, the #ScholCommLab, a research group that analyzes all aspects of scholarly communication in the digital age. Stefanie’s publications can be found on her website. She tweets as @stefhaustein.

 

Rémi Toupin is a PhD candidate in Science, technologie et société at Université du Québec à Montréal (UQAM). His work focuses on social media use for public engagement with science, especially surrounding environmental issues. For his thesis, he works specifically on the materialization of public engagement through science communication around climate change on Twitter and Reddit. He is also a member of the Laboratoire de communication médiatisée par ordinateur (LabCMO), Chaire de recherche UQAM sur les usages des technologies numériques and Canada Research Chair in the Transformations of Scholarly Communication. He tweets as @RemiToup.

 

Dr. Juan Pablo Alperin is a co-director of the #scholcommlab, as well as an Assistant Professor at the Canadian Institute for Studies in Publishing and an Associate Director of Research of the Public Knowledge Project at Simon Fraser University, Canada. A full list of publications and presentations can be found at alperin.ca/cv, and he can be found on Twitter at @juancommander.

 
