This is the fourth in a series of blog posts by scientometrics researcher Stefanie Haustein on the role Twitter plays in scholarly communication.
Tweets linking to scientific articles tend to appear shortly after publication, and Twitter activity often runs dry a few days later. This short-lived attention suggests that Twitter is used to spread the word about new publications rather than to discuss them in depth, as we discussed in last week’s blog post. In today’s post, we will analyze the temporal patterns of Twitter activity. As in the previous posts in this series, I will illustrate Twitter metrics based on a forthcoming book chapter using data from Altmetric’s Researcher Data Access Program.
Temporal indicators: Twitter half-life, delay and life span
Twitter half-life, delay and life span help to depict the temporal aspects of tweeting activity related to scholarly documents. These time indicators are derived from bibliometric measures of the delay and decay of citation impact: citing and cited half-lives have long been included in the Journal Citation Reports next to the impact factor to reflect the temporal reception and currency of the journal literature by measuring the median age of references and citations, respectively. In analogy to cited half-life, tweet half-life (or tweeted half-life) measures the median age of tweets linking to scholarly documents or, more specifically, the amount of time it takes to obtain 50% of the total number of tweets.
Mean response time is another, albeit less popular, bibliometric indicator, which measures the length of the period between the publication of a document and the first citation it receives. Publications with particularly long citation delays have been called ‘Sleeping Beauties’, as they are ‘kissed awake’ after years of going unnoticed. Mean response time is defined for a set of papers, such as a journal or a scientific field. At the level of an individual article, it is equivalent to the delay between the publication date and the appearance of the first citation or tweet (or, for that matter, any other social media event).
Finally, Twitter life span, or tweet span, measures the number of days between the first and the last tweet and thus indicates how long a document stays relevant on Twitter.
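The three document-level indicators can be sketched in Python as follows. This is a minimal illustration, not the method used in the book chapter: the function name `tweet_indicators` is hypothetical, the publication date and tweet timestamps are assumed to be `datetime` objects, and all intervals are rounded down to whole days. Half-life is taken as the time until the tweet that completes the first half of all tweets (rounded up).

```python
from datetime import datetime


def tweet_indicators(published, tweet_times):
    """Hypothetical helper: return (delay, half_life, span) in whole days.

    delay     -- publication date to first tweet (negative if a tweet
                 predates the recorded publication date)
    half_life -- publication date to the tweet that completes 50% of all tweets
    span      -- first tweet to last tweet (independent of publication date)
    """
    times = sorted(tweet_times)
    if not times:
        raise ValueError("document has no tweets")
    delay = (times[0] - published).days
    # Smallest number of tweets that makes up at least half of the total.
    half_count = (len(times) + 1) // 2
    half_life = (times[half_count - 1] - published).days
    span = (times[-1] - times[0]).days
    return delay, half_life, span


published = datetime(2015, 3, 2)
tweets = [
    datetime(2015, 3, 2, 9),
    datetime(2015, 3, 2, 15),
    datetime(2015, 3, 3, 10),
    datetime(2015, 3, 20, 8),
]
print(tweet_indicators(published, tweets))  # (0, 0, 17)
```

Only `span` avoids the publication date entirely, which is why it remains usable when article-level dates are unreliable, while `delay` and `half_life` inherit any error in `published`.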
Accurate publication dates are essential for accurate temporal indicators
Most bibliometric indicators are based on the publication year, which, in times of preprints and online-first publication and in the context of fast-paced social media environments, has become less relevant, if not entirely obsolete. Tweet activity can be measured in hours and days rather than years, so any temporal metrics based on tweets need to address these shorter time frames. Publication dates therefore need to move from the journal to the article level and become more fine-grained, providing the day rather than the year of publication.
However, even with precise article-level publication dates, determining the actual date of publication remains problematic. Lags between online and ‘print’ publication dates can cause issues with scientific priority (determining who made a discovery first) and open access embargoes, and can artificially inflate impact factors and other scholarly metrics. This study demonstrated that DOIs of articles were tweeted before their indicated online publication date, suggesting systematic errors in article-level publication dates. Because of these issues, I will refrain from computing Twitter delay and half-lives and limit the analysis to general tweeting patterns and tweet span, which can be calculated without the publication date. Let’s start by looking at the overall temporal trends of Twitter activity captured by Altmetric between January 2012 and December 2015.
Seasonal patterns reflect academic year
Longitudinal data from Altmetric show that tweet volume has been increasing annually and that it follows distinct seasonal patterns. While a general increase from January to December can be observed each year since 2012, the peaks and valleys throughout the year reflect the academic year: activity is higher in spring and fall and drops during the summer and particularly during the winter break, especially in the last two weeks and the first week of each year.
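To expose seasonal dips such as the winter break, tweet dates can be binned by calendar week. The sketch below is an illustration under assumptions, not Altmetric’s procedure: it uses ISO weeks from Python’s `date.isocalendar()`, so days around New Year may be assigned to week 1 of the following year, and the helper name `tweets_per_week` is hypothetical.

```python
from collections import Counter
from datetime import date


def tweets_per_week(tweet_dates):
    """Hypothetical helper: count tweets per (ISO year, ISO week)."""
    counts = Counter()
    for d in tweet_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    # Sort chronologically so the result can be plotted as a time series.
    return dict(sorted(counts.items()))


sample = [date(2014, 12, 22), date(2014, 12, 23), date(2015, 1, 5)]
print(tweets_per_week(sample))  # {(2014, 52): 2, (2015, 2): 1}
```

Weeks with no tweets are simply absent from the result; a plot would need to fill them with zeros.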
Workweek affects tweeting
Specific weekday patterns, which have also been observed for article downloads, can be seen in the bar chart below. Comparing the percentage of annual tweets per weekday between 2012 and 2015 shows that tweeting activity increases from Monday (14% of tweets), peaks on Wednesday (18%), and decreases towards the weekend (Saturday 10%, Sunday 8%). Considering that Twitter activity often peaks on the day of or the day after publication, the season and weekday of publication might affect a paper’s tweet and retweet counts.
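Weekday shares like those reported here are simple proportions. A minimal sketch, assuming a list of tweet dates and a hypothetical helper name, computes the percentage of tweets falling on each day of the week; applied to one year of tweets at a time, it would yield figures comparable to those in the chart.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def weekday_shares(tweet_dates):
    """Hypothetical helper: percentage of tweets per weekday."""
    counts = Counter(d.weekday() for d in tweet_dates)  # Monday == 0
    total = sum(counts.values())
    return {name: 100 * counts[i] / total for i, name in enumerate(WEEKDAYS)}


sample = [date(2015, 3, 2), date(2015, 3, 4), date(2015, 3, 4), date(2015, 3, 8)]
shares = weekday_shares(sample)
print(shares["Monday"], shares["Wednesday"], shares["Sunday"])  # 25.0 50.0 25.0
```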
Staying alive on Twitter
The tweet span reflects the amount of time between the first and last tweet of a document, a set of papers or a user and thus indicates the length of the respective activity. Although most articles receive their first tweets shortly after publication, the metric is independent of the publication date and can thus be computed without accurate dates. The average document in Altmetric has a tweet span of 81 days. However, as seen in the figure below, 58% of documents stay relevant on Twitter for less than one day, which is why the median tweet span is 0. Only 4% of documents stay ‘alive’ for one day after their first tweet and 2% for two days; beyond three days, the number of documents that stay ‘alive’ longer decreases gradually.
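The gap between a mean of 81 days and a median of 0 is typical of a highly skewed distribution: a few long-lived documents pull up the mean while most documents are tweeted on a single day. The sketch below uses Python’s `statistics` module on made-up spans (not Altmetric data) to illustrate the effect; the helper name `span_summary` is hypothetical.

```python
from statistics import mean, median


def span_summary(spans_days):
    """Hypothetical helper: summarise tweet spans given in whole days."""
    same_day = sum(1 for s in spans_days if s == 0)
    return {
        "mean": mean(spans_days),
        "median": median(spans_days),
        "pct_same_day": 100 * same_day / len(spans_days),
    }


# Illustrative spans: four documents tweeted on one day, three lasting longer.
print(span_summary([0, 0, 0, 0, 1, 2, 400]))
```

Here the median is 0 while the mean is roughly 57.6 days, driven almost entirely by the single 400-day document.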
The documents that stayed ‘alive’ the longest were a paper comparing gluten sensitivity and celiac disease and a psychology study that found homophobia to be associated with homosexual arousal. Although both articles had a tweet span of almost five years (1,822 days), they showed completely different levels of Twitter activity: while the homophobia study received 2,436 tweets from 2,048 users, the gluten paper was mentioned in only 40 tweets.
Hashtag life span
Combined with content analyses (how documents are tweeted) and user analyses (who tweets), temporal indicators help to understand how tweeting activity changes over time. For example, overlaying the top hashtags per month on the timeline of tweets for the homophobia paper shows the context in which the article was shared each month. The paper was shared using 190 unique hashtags; the top terms were #homophobia and #LGBT (35 tweets each), #AUSpol (27), #science (22) and #gay (15). The figure below displays the most frequent hashtag(s) for each month with at least 5 hashtag occurrences. It shows that certain terms are relevant only for a short time: #RHbill, #busted, #CDNpoli and #JimWells, for example, each top the list in a single month only, while #LGBT and #homophobic appear on top for several months.
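The monthly overlay can be reproduced, in outline, by counting hashtag occurrences per calendar month and keeping the most frequent tag(s) for months that reach the threshold of 5 occurrences. This is a sketch under assumptions, not the chapter’s code: tweets are assumed to be `(datetime, list_of_hashtags)` pairs, hashtags are compared case-insensitively, ties are all kept, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def top_hashtags_per_month(tweets, min_occurrences=5):
    """Hypothetical helper: {(year, month): [most frequent hashtag(s)]}."""
    by_month = defaultdict(Counter)
    for ts, tags in tweets:
        # Assumption: #LGBT and #lgbt are treated as the same term.
        by_month[(ts.year, ts.month)].update(t.lower() for t in tags)
    result = {}
    for month, counts in sorted(by_month.items()):
        if sum(counts.values()) < min_occurrences:
            continue  # skip months with too few hashtag occurrences
        top = max(counts.values())
        result[month] = sorted(t for t, c in counts.items() if c == top)
    return result


sample = [
    (datetime(2013, 1, 5), ["#LGBT", "#gay"]),
    (datetime(2013, 1, 9), ["#lgbt"]),
    (datetime(2013, 1, 20), ["#science", "#LGBT"]),
    (datetime(2013, 2, 1), ["#busted"]),
]
print(top_hashtags_per_month(sample))  # {(2013, 1): ['#lgbt']}
```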
Tweet span can also be applied directly to a hashtag, regardless of the tweeted paper. The hashtag life span thus indicates the topicality or timeless relevance of a specific term. For example, among hashtags that occurred at least 1,000 times, #diet, #water and #nutrition were used over the course of more than four years to describe 2015 WoS documents, while #XmasBMJ lasted only 73 days.
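Applying tweet span to hashtags only requires tracking the first and last use of each tag. The minimal sketch below assumes tweets given as `(datetime, list_of_hashtags)` pairs, compares hashtags case-insensitively, and applies a minimum-occurrence filter analogous to the 1,000-occurrence threshold mentioned here; the function name is hypothetical and the example data are invented.

```python
from collections import Counter
from datetime import datetime


def hashtag_life_spans(tweets, min_occurrences=1000):
    """Hypothetical helper: days between first and last use of each hashtag."""
    first, last = {}, {}
    counts = Counter()
    for ts, tags in tweets:
        for tag in tags:
            tag = tag.lower()  # assumption: case-insensitive matching
            counts[tag] += 1
            first[tag] = min(first.get(tag, ts), ts)
            last[tag] = max(last.get(tag, ts), ts)
    # Keep only hashtags that occur often enough to be meaningful.
    return {tag: (last[tag] - first[tag]).days
            for tag, n in counts.items() if n >= min_occurrences}


tweets = [
    (datetime(2012, 1, 1), ["#diet"]),
    (datetime(2015, 6, 1), ["#Diet"]),
    (datetime(2014, 12, 1), ["#XmasBMJ"]),
    (datetime(2015, 1, 15), ["#xmasbmj"]),
    (datetime(2015, 1, 1), ["#water"]),
]
print(hashtag_life_spans(tweets, min_occurrences=2))  # {'#diet': 1247, '#xmasbmj': 45}
```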
How do temporal patterns affect Twitter metrics?
When computing scholarly metrics, the temporal patterns of tweeting behavior need to be taken into account. As the majority of tweets appear on the day of or the day after publication, indicators need to be based on the exact publication date instead of the year. In analogy to their bibliometric predecessors, tweet half-life measures the decay of attention, while delay indicates the time it takes to receive the first tweet. Tweet span represents the length of the period between the first and last tweet and thus indicates how long a document or hashtag stays relevant on Twitter. As demonstrated by the use of hashtags over time, temporal indicators are particularly informative when combined with tweet content, helping to provide context as to how tweeting activity changes over time.
Stefanie Haustein is assistant professor at the University of Ottawa’s School of Information Studies, where she teaches research methods and evaluation, social network analysis and knowledge organization. Her research focuses on scholarly communication, bibliometrics, altmetrics and open science. Stefanie co-directs, together with Juan Pablo Alperin, the #ScholCommLab, a research group that analyzes all aspects of scholarly communication in the digital age. Stefanie’s publications can be found on her website. She tweets as @stefhaustein.