In May last year we announced that Dr Evan Goldstein, Postdoctoral Associate in the Department of Geological Sciences at The University of North Carolina at Chapel Hill, was the winner of that year’s Altmetric Research Grant. We recently asked Evan to write a summary of the progress he’s made with his project so far for our blog, and he was happy to oblige…
Time flies. Almost a year ago I set out to investigate ‘how’, ‘when’ and ‘why’ scholarly articles are mentioned in Wikipedia, using a corpus of more than 33,000 articles from the high impact, high volume (>1000 articles/year) Earth and Space science journal ‘Geophysical Research Letters’ (GRL). Remember, I’m an earth scientist. Here is my first Altmetric blog post describing the research questions in more detail. Today, I’m happy to report on some of my progress.
Time delay between journal article publication and Wikipedia mention
Using the list of digital object identifiers (DOIs) from GRL, I found that 921 scientific articles have been mentioned 1230 times across 762 Wikipedia pages (see this post for the data collection techniques and some initial data analysis). My project set out to determine when mentions in Wikipedia occur relative to article publication. There are two critical times to determine: the date the paper was published (or put online as an ‘accepted manuscript’), and the date the Wikipedia edit was made. These dates were obtained through a mix of data from Altmetric, Crossref, and web scraping (relying heavily on WikiBlame). I refer to the interval between these dates as the lag time. Here are the distributions of lag times, broken up by publication year:
On the y-axis of this boxplot is the lag time. The x-axis is the publication year (articles mentioned in Wikipedia grouped by publication year). Note that the maximum lag time grows with each year because Wikipedia edits are made constantly.
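For readers who want to try something similar, here is a minimal sketch of the lag time calculation and the grouping by publication year. It is not my actual analysis code: the function names are made up for this post, the dates are invented, and I am assuming lag time is measured in days.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def lag_time_days(published: date, first_mentioned: date) -> int:
    """Days between article publication and the Wikipedia edit (assumed unit: days)."""
    return (first_mentioned - published).days


def lag_times_by_year(records):
    """Group lag times by publication year.

    `records` is an iterable of (published, first_mentioned) date pairs.
    """
    grouped = defaultdict(list)
    for published, first_mentioned in records:
        grouped[published.year].append(lag_time_days(published, first_mentioned))
    return dict(grouped)


# Invented dates for illustration only; these are not project data.
example = [
    (date(2006, 3, 1), date(2006, 3, 1)),    # mentioned on the day of publication
    (date(2006, 5, 10), date(2008, 5, 10)),  # mentioned two years later
    (date(2010, 1, 15), date(2010, 2, 14)),  # mentioned a month later
]

grouped = lag_times_by_year(example)
for year in sorted(grouped):
    print(year, sorted(grouped[year]), "median:", median(grouped[year]))
```

The per-year lists are what a boxplot would summarise, with one box per publication year.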
What happened in 2006?
One aspect of this plot that I noticed immediately is that from 2006 onward, there are GRL articles mentioned in Wikipedia immediately upon publication (a lag time of 0). Why 2006? I’m not quite sure, but it seems there were some changes to the Wikipedia citation template around that time. Therefore the importance of 2006 in the plot above might be a result of the manner in which journal citation data is recorded in Wikipedia, and how I searched for GRL article mentions in Wikipedia (via the DOI).
Also, keep in mind that any future Wikipedia edits to add mentions to GRL articles will tend to extend the range of the data (and the box) upward — future edits are further from the publication date — so mean lag times will tend to increase with time.
Are journal article authors also Wikipedia editors?
In addition to lag time, I have wondered how many Wikipedia mentions can be attributed directly to journal article authors. By trying to match Wikipedia editor usernames to journal authors, I found 18 unambiguous matches, so 1.4% of edits can be assigned directly to a journal author. I want to stress that this is likely an underestimate for (at least) two reasons. First, many editors do not sign in, or use a cryptic pseudonym. Second, I search for citations in Wikipedia using Altmetric data, which tracks paper mentions using properly formatted citation tags. This format may not have been used the first time the article was cited, so another editor (or bot) who corrects the citation may get the attribution. I try to mitigate this by searching the edit history of each Wikipedia page for the first mention of the DOI (using web scraping and WikiBlame), but the results are not perfect.
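To make the two steps above concrete, here is a rough sketch of finding the first revision that mentions a DOI and then checking whether that editor's username looks like one of the paper's authors. Everything here is an assumption for illustration: the revision dictionaries (`timestamp`, `user`, `text`), the helper names, the DOI, and the name-matching rule are hypothetical. A simple rule like this one will miss pseudonyms and can produce false matches, so it is only a first pass before checking by hand.

```python
import re


def first_mention(revisions, doi):
    """Return the earliest revision whose text contains the DOI, or None.

    `revisions` is a list of dicts with 'timestamp' (ISO 8601 string),
    'user' and 'text' keys. This shape is hypothetical.
    """
    needle = doi.lower()
    for rev in sorted(revisions, key=lambda r: r["timestamp"]):
        if needle in rev["text"].lower():
            return rev
    return None


def normalize(name):
    """Lowercase and keep only the letters a-z."""
    return re.sub(r"[^a-z]", "", name.lower())


def matches_author(username, authors):
    """Naive rule: username equals given+family, family+given, or initial+family."""
    user = normalize(username)
    for given, family in authors:
        g, f = normalize(given), normalize(family)
        if user in {g + f, f + g, g[:1] + f}:
            return True
    return False


# Invented page history and author list, for illustration only.
doi = "10.1029/example-doi"
revisions = [
    {"timestamp": "2009-06-01T12:00:00Z", "user": "CitationBot", "text": "{{cite journal |doi=10.1029/example-doi}}"},
    {"timestamp": "2008-02-03T09:30:00Z", "user": "JDoe", "text": "See Doe (2008), doi:10.1029/EXAMPLE-DOI"},
    {"timestamp": "2007-11-20T08:00:00Z", "user": "192.0.2.1", "text": "No citation yet."},
]
authors = [("Jane", "Doe"), ("Sam", "Smith")]

rev = first_mention(revisions, doi)
print(rev["user"], matches_author(rev["user"], authors))
```

In this made-up history the bot that later reformatted the citation is skipped, and the earlier edit by the author-like username is the one that gets the attribution.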
In the next few months I hope to finish up this analysis and continue posting the results to my blog. There are a few interesting side trails and pieces of data that I am still investigating. For now, feel free to get in touch if you have any comments, questions, or collaboration ideas. (My code to scrape for and assemble the data is online, and I hope to keep adding to it.) Thanks to Altmetric for supporting my project and providing me a venue to discuss these ideas. I encourage anyone interested in working with altmetrics to apply for the annual research grant.
— Evan B. Goldstein; Research Assistant Professor; Department of Geological Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Get your applications in for the 2018 research grant by 11pm GMT on the 20th April. Details on how to apply can be found here.