Altmetric Blog

Analysing online mentions on a massive scale!

Josh Clark, 24th August 2017

The following blog post was written by Terry Bucknell, Sales Director at Digital Science. 

Last month, Euan introduced Altmetric’s new ‘export mentions’ feature, and showed some examples of how pivot tables in Excel can be used to analyse the data that you export.

Euan said that this new feature isn’t designed to help analyse massive datasets, but it does allow you to download up to 1 million mentions at a time. I don’t know your definition of massive, but that’s pretty big in my book. As the Altmetric support team will confirm, I’m always keen to try to break the Explorer, so I wanted to see a) whether the Explorer really could download that many mentions, and b) whether Excel (and my MacBook) could cope with that amount of data.

Understanding your influence on public policy

When I show the Explorer platforms to institutions, I like to demonstrate the scale of the data by clicking on “Analyse these Results” whilst viewing the full Altmetric database. You can instantly see when Altmetric started tracking each of its sources, and how many mentions it has found in each:

Altmetric tracks citations in Policy Documents (freely available grey literature from around 50 different bodies, including government departments, NGOs, and charities), and these are a key source of information for institutions looking for solid evidence of policy impact. Because the policy documents are freely available, you can judge for yourself whether each mention (citation) is notable enough to record as evidence of impact: was the cited article a key influence on the document, or just one of hundreds of articles found in a literature review? The ‘export mentions’ functionality makes it much easier to process and record each document.

The Altmetric database contains (as of August 2017) over 1.15 million mentions (citations) in Policy Documents. I was interested in analysing those mentions a little: for example, which journals are most often cited by a given Policy Document source? When I started writing this post, the Explorer interface didn’t allow you to choose amongst individual Policy Document sources, but if I could download all 1.15 million of those mentions, I should be able to do some simple analysis in Excel using filters and pivot tables. So, how to download 1.15 million mentions when the limit is 1 million? Obviously I needed a way to divide the data into two chunks of under 1 million each.

Using the Mentions tab, I was able to play around with the dates of mentions until I found how many recent years I could download whilst staying below the 1 million limit. I thought it would be logical to split the download into two roughly equal batches of a little under 600,000 mentions each. 1st January 2011 seemed to fit the bill quite nicely: just over 555,000 mentions in policy documents since that date:

and just over 596,000 mentions in policy documents before that date:
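If you’d rather work out the split from the numbers than by trial and error in the Mentions tab, the arithmetic is simple enough to sketch. The Python below is a minimal illustration, not an Altmetric tool: it assumes you have already noted down the number of mentions per year (the `counts_by_year` input is hypothetical), and it picks the 1st-January cutoff that keeps both batches under the 1 million export limit while making them as even as possible.

```python
from datetime import date

EXPORT_LIMIT = 1_000_000  # per-export cap on mentions, as described in the post


def choose_cutoff(counts_by_year, limit=EXPORT_LIMIT):
    """Pick a 1st-January cutoff splitting mentions into two batches.

    counts_by_year: hypothetical dict mapping year -> number of mentions
    (read off the Mentions tab by hand). Both batches must stay under
    `limit`; among valid cutoffs, the most even split wins.
    Returns (cutoff_date, count_before, count_since), or None if no single
    cutoff works (e.g. one year alone exceeds the limit).
    """
    total = sum(counts_by_year.values())
    best = None
    for year in sorted(counts_by_year):
        before = sum(n for y, n in counts_by_year.items() if y < year)
        since = total - before
        if before < limit and since < limit:
            gap = abs(before - since)  # smaller gap = more even batches
            if best is None or gap < best[0]:
                best = (gap, date(year, 1, 1), before, since)
    return None if best is None else best[1:]
```

With invented per-year figures chosen to total 1,151,000, `choose_cutoff({2009: 500_000, 2010: 96_000, 2011: 300_000, 2012: 255_000})` returns `(datetime.date(2011, 1, 1), 596000, 555000)`, i.e. the same kind of split as above.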

I could have tried copying 500,000+ rows from one file into the other to create a single spreadsheet of over 1.15 million rows, but that seemed a bit overambitious (a single Excel worksheet tops out at 1,048,576 rows anyway), so instead I sorted each file by ‘Outlet or Author’ (i.e. the body that issued the document) and added a Filter, so that I could easily extract the rows I wanted to work with.

So, if I want to analyse which journals are most often cited by Policy Documents from one body, say, the European Food Safety Authority, then I can extract those rows and build a spreadsheet of just those mentions:
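For anyone who prefers a script to Excel’s Filter, the same extraction might look like this in Python. It’s a sketch under a few assumptions: the exports are UTF-8 CSV files (the `utf-8-sig` encoding also copes with a byte-order mark), the column is headed ‘Outlet or Author’ as described above, and the file names and exact outlet string in the usage line are hypothetical. Because it reads the two files separately and simply joins the rows in memory, it never needs a single 1.15-million-row spreadsheet.

```python
import csv

OUTLET_COLUMN = "Outlet or Author"  # column name as described in the post


def read_mentions(path):
    """Load an exported mentions CSV into a list of dicts (one per row).
    Assumes UTF-8; 'utf-8-sig' strips a byte-order mark if present."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def rows_for_outlet(rows, outlet, column=OUTLET_COLUMN):
    """Keep only the mentions issued by one body (the Filter step)."""
    return [row for row in rows if (row.get(column) or "").strip() == outlet]


def write_mentions(rows, path):
    """Write the filtered rows out as a smaller CSV to open in Excel."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Usage (hypothetical file names): `efsa = rows_for_outlet(read_mentions("before_2011.csv") + read_mentions("since_2011.csv"), "European Food Safety Authority")`, then `write_mentions(efsa, "efsa_mentions.csv")`.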

Aside: I found that Excel sometimes garbled my CSV file when I opened it by double-clicking on it in Finder (Apple’s file manager), but if I first opened Excel and then used File > Open to open the file, it was fine. By viewing the file in a text editor, I could confirm that the CSV file itself was OK.

Working with a spreadsheet containing (in this case) around 22,500 rows is much easier than working with one of over 500,000 rows! I was then able to build a Pivot Table to start analysing which journals are cited most often by EFSA Policy Documents: it just needs to count the number of mentions (citations) of each journal and sort by that number:
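The pivot table boils down to a count-and-sort, which Python’s `collections.Counter` does directly. This sketch assumes each row is a dict (as produced by `csv.DictReader`) and that the journal title sits in a column called ‘Journal’; that header is a placeholder, so check what your own export calls it. Rows with no journal (books or reports, say) are skipped.

```python
from collections import Counter

JOURNAL_COLUMN = "Journal"  # placeholder: check the header in your own export


def journal_counts(rows, column=JOURNAL_COLUMN):
    """Equivalent of a pivot table with Journal as the row label and a
    count of mentions as the value, sorted by count (highest first)."""
    counts = Counter(row[column] for row in rows if row.get(column))
    return counts.most_common()  # list of (journal, count) pairs
```

The first entry of the returned list is the most-cited journal, so `journal_counts(efsa_rows)[:20]` would give a top-20 list.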

It was immediately apparent that the journal most often cited in EFSA publications is EFSA Journal, no less! EFSA Journal is an open access, online journal which publishes the scientific outputs of EFSA only. It doesn’t accept or commission third-party content.

Why is this sort of analysis useful?

Well, for EFSA, it helps show that their journal is fulfilling its mission, and this information would also help the organisation prioritise which other journals to provide access to for its researchers. More broadly, if you are a librarian managing subscriptions in this field, or a (non-EFSA) researcher looking to publish an article in this field, this is a rather nice prioritised list of journals to consider alongside other factors like prestige, quality of peer review, Impact Factor, and other Altmetric mentions (for example, you can use the Explorer to see how widely any or all of these journals are mentioned in News sites, blogs, and social media).

The mentions export from Altmetric contains both the publication date of the policy document (which may be subject to some errors and uncertainty, depending on how consistently the source includes this information in its metadata and/or its publications) and the publication dates of the cited articles. In each case I used the YEAR() function in Excel to extract just the year part of the date, then calculated the age of each cited article in years by subtracting the article’s publication year from the year of the mention, and plotted the distribution of ages:
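Here’s the same calculation as a Python sketch. `year_of` plays the role of Excel’s YEAR(), but works on text: it assumes the export stores dates as ISO-style strings (e.g. 2016-05-03), and the column names ‘Mention Date’ and ‘Publication Date’ are placeholders for whatever your export actually uses. Rows with a missing or unreadable date are skipped rather than guessed; negative ages are kept, since they usually point to the kind of date errors mentioned above.

```python
from collections import Counter


def year_of(value):
    """Rough stand-in for Excel's YEAR(): assumes ISO-style date strings
    such as '2016-05-03' or '2016-05-03T10:00:00'. Returns None if the
    value is missing or doesn't start with a four-digit year."""
    try:
        return int(value.strip()[:4])
    except (AttributeError, ValueError):
        return None


def age_distribution(rows, mention_col="Mention Date", pub_col="Publication Date"):
    """Count cited articles by age: mention year minus publication year.
    Column names are placeholders; returns {age_in_years: count}."""
    ages = Counter()
    for row in rows:
        mention_year = year_of(row.get(mention_col))
        pub_year = year_of(row.get(pub_col))
        if mention_year is None or pub_year is None:
            continue  # skip rows with missing or malformed dates
        ages[mention_year - pub_year] += 1  # negative ages flag bad metadata
    return dict(sorted(ages.items()))
```

The resulting dictionary can be pasted into Excel (or any plotting tool) as two columns, age and count, to draw the histogram.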

Altmetric’s policy documents cover a range of fields and bodies, so this sort of analysis can be repeated for any of them. For example, the World Health Organisation most commonly cites articles from the big four medical journals: The Lancet, BMJ, NEJM and JAMA, but other commonly cited journals reflect the remit of the WHO, spanning tropical diseases, AIDS, public health, contraception, vaccines and so on. Again, this list of journals forms a handy ‘must have’ list of titles for any library with a similar global health remit:

Interestingly, WHO tends to cite older articles than EFSA:

which is more obvious if we plot the two distributions on the same graph (and use percentages of articles rather than absolute numbers of articles):
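Switching from counts to percentages is just a matter of dividing each bucket by that source’s total, so that a large source like WHO and a smaller one like EFSA can share the same axis. A minimal sketch, taking a dictionary that maps age in years to number of cited articles (an assumed input format, not something the export provides directly):

```python
def as_percentages(distribution):
    """Convert {age: count} into {age: percent of all cited articles},
    so distributions from sources of different sizes are comparable."""
    total = sum(distribution.values())
    if total == 0:
        return {}  # nothing to normalise
    return {age: 100.0 * n / total for age, n in distribution.items()}
```

Run it once per source and plot both results on one chart; each set of percentages sums to 100, so differences in shape (such as WHO’s longer tail of older articles) are no longer masked by differences in volume.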

Attention from international news outlets

The ability to limit Mentions to specific outlets, which was released during the few days I was writing this post, opens up a whole new raft of possibilities. Want to study which of your institution’s papers in Nature and Science were featured in a number of key News sources? Easy!

First you use Advanced Search to find your institution’s outputs in those two specific journals:

And then in the Mentions tab of the Results Analysis overlay, specify the News sources that you want to limit to:

And then you can still export the details of those mentions as a CSV file to further analyse those mentions (by date maybe), or to easily grab the URLs of those stories to add to a record of impact.
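Once exported, both of those jobs take only a few lines of Python. This is a sketch only: ‘Mention URL’ and ‘Mention Date’ are placeholder column names, and the monthly grouping assumes ISO-style date strings, so adjust both to match your file. Rows are dicts, as produced by `csv.DictReader`.

```python
from collections import Counter


def mention_urls(rows, url_col="Mention URL"):
    """Unique story URLs, in the order they first appear, ready to paste
    into a record of impact. The column name is a placeholder."""
    seen = set()
    urls = []
    for row in rows:
        url = (row.get(url_col) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def mentions_per_month(rows, date_col="Mention Date"):
    """Count mentions by 'YYYY-MM', assuming ISO-style date strings."""
    return Counter(row[date_col][:7] for row in rows if row.get(date_col))
```

`mentions_per_month` gives a quick view of when coverage peaked; `mention_urls` de-duplicates stories that appear more than once in the export.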

I hope you have found this post useful. If you have any questions or comments about this post feel free to comment below or get in touch with us at If you’d like to find out more on the new export mentions feature check out our recent webinar or take a look at Euan’s introductory blog post.
