This is a guest post from Tuija Sonkkila, Leadership Support Services at Aalto University.
CRIS crash course
CRIS (Current Research Information System) is a new incarnation of semi-automated academic recordkeeping: a hotspot of data synchronized from University master data registries such as staff and project directories; enriched by metadata on activities, events, and research output; and augmented with relations to e.g. research infrastructure. Many a CRIS also acts as a full-text repository.
The public interface of a CRIS is a web portal, often nicknamed a Research Portal, typically with three different paths to dig deeper: persons, organizations, and projects.
Although part of the rationale behind a CRIS is to increase the automation level of administrative tasks, there is also a strong hope that researchers will find their personal CRIS profile attractive and useful, a live portfolio of their university life. One example of how a CRIS can nicely incorporate altmetrics comes from Aalborg University in Denmark, which also runs a Pure installation.
Trying to stay lean
Of the present-day commercial altmetrics products, PlumX Dashboards and Altmetric for Institutions are conceptually closest to a CRIS. They provide both a bird's-eye view of the organization and close-ups. Could I do something at least remotely similar with R and data originating from our test CRIS installation?
First I need a set of DOIs and their affiliation. At the moment, our Pure test database contains some 16 000 research output items published between 2010 and 2013.
Pure ships with a REST Web Services (WS) interface, but you cannot query the DOI field as such. For this exercise, I didn't venture into filtering DOIs from all the data returned by the WS. Instead, I ran a standard Pure report that collects metadata from those publications that have a DOI. Export to Excel, import to R.
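The Excel-to-R step can be sketched roughly as follows. This is only an illustration with made-up data: in practice you would read the actual Pure export with e.g. `readxl::read_excel()`, and the column names would depend on the report.

```r
# A stand-in for the Pure Excel export; in practice you would read the
# file with readxl::read_excel("pure_doi_report.xlsx") (file name is
# hypothetical)
pubs <- data.frame(
  DOI  = c("http://dx.doi.org/10.1093/scan/nss096",
           " 10.5281/zenodo.32108 ",
           ""),
  unit = c("U101", "U102", "U103"),
  stringsAsFactors = FALSE
)

# Normalize the DOI column: trim whitespace, strip any doi.org prefix,
# and mark empty values as missing
clean_doi <- function(x) {
  x <- sub("^https?://(dx\\.)?doi\\.org/", "", trimws(x))
  ifelse(nzchar(x), x, NA_character_)
}

pubs$DOI <- clean_doi(pubs$DOI)
pubs <- pubs[!is.na(pubs$DOI), ]   # keep only rows that have a DOI
```
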
For the record, about 20% of the publications have a DOI.
Unfortunately, the Pure data model has recently changed, and the same DOI report is no longer available. In addition, DOI is becoming part of Pure's linking strategy to full-text files, which means e.g. that the still separate DOI field will be deprecated. As of this writing, it is still unclear to me how much this will affect exercises like this one, where the DOI value is used as a core identifier. One possibility is to start saving the DOI to a general URL field, but then you end up cluttering your data. Anyway, I hope there will be novel ways in Pure to save and ferret out the DOI.
A bunch of DOIs and their affiliation IDs was now ready. Then: the university organization. Because the organization tree and the names of Schools, departments, etc. are relatively stable data, a standard Pure report on the organization will do just fine. Again, via Excel to R.
After joining the DOIs and organization units with the unit ID, I had the foundation ready. Then to the fun part: getting altmetrics data.
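The join itself is a one-liner in base R. A minimal sketch, with made-up column names and values standing in for the real Pure data:

```r
# Both data frames share a unit ID column; all names and values here
# are hypothetical
dois <- data.frame(DOI     = c("10.1000/aaa", "10.1000/bbb"),
                   unit_id = c("U101", "U102"),
                   stringsAsFactors = FALSE)

orgs <- data.frame(unit_id = c("U101", "U102"),
                   school  = c("SCI", "ENG"),
                   stringsAsFactors = FALSE)

# Join DOIs to organization units on the unit ID
foundation <- merge(dois, orgs, by = "unit_id")
```
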
To use the free Altmetric API, you first need a key. Like an analog key, it opens the door to the API. Then you just ask the API, one DOI at a time: "Do you have any metrics on this one? If you do, please return them all to me." Luckily, this kind of computational dialog is easy in R thanks to the rAltmetric library by rOpenSci. All you have to do is basically wait for the end result and then select those metrics you are interested in.
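In code, the dialog might look something like the sketch below. This is not the exact script behind the dashboard, just an illustration of the rAltmetric calls; `APIKEY` is a placeholder for your own key, and `fetch_metrics` is a helper name I made up.

```r
library(rAltmetric)

# Query the Altmetric API for one DOI; DOIs unknown to Altmetric.com
# raise an error, which we turn into NULL
fetch_metrics <- function(doi, key) {
  res <- tryCatch(altmetrics(doi = doi, apikey = key),
                  error = function(e) NULL)
  if (is.null(res)) return(NULL)
  altmetric_data(res)   # flatten the result into a one-row data frame
}

# Apply to the whole DOI set and keep only those with data, e.g.:
# metrics <- do.call(rbind,
#   Filter(Negate(is.null),
#          lapply(foundation$DOI, fetch_metrics, key = "APIKEY")))
```
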
Result: 10% of my set of DOIs had data collected by Altmetric.com.
Aalto University's profile at Impactstory has been alive for roughly a year. Its status is experimental, and items there are added manually by me. The focus is on software, but there are also a few videos, slide decks, and arXiv articles. Despite its limitations, the collection has already proved useful. For example, researchers are productive coders (after all, a substantial part of modern research consists of algorithms and other computational artifacts), and repositories delivered via GitHub are in active use. Or are they? Thanks to the Impactstory profile, we now have some evidence that this is indeed the case.
Since December 2014, I have logged weekly statistics, those values that are visible as small plus flags on the Impactstory site.
I even made a WeeklyMetrics Twitter bot out of them, broadcasting cryptic messages once a week. No wonder the bot has not been a huge success! Still, it has worked nicely as quality control: if the bot stays silent when it should deliver, I know there are problems upstream, so I'd better check what's the matter.
The layout of the 2amconf web application is based on the standard building blocks of the Shinydashboard R library. Navigation is done via the sidebar on the left, and a number of different visualizations are rendered onto the main body on the right.
- scatterplot shows all items by default, but you can filter them by School and choose different metrics for the axes. The names of the metrics roughly follow the conventions of the Altmetric API documentation
- barchart lets you compare two items from the selected School, either stacked or grouped
- sunburst shows the distribution of items between Schools, departments, and smaller units. Here I use the in-house acronyms of the units; otherwise, text would pour out of the window
- network graph of that part of the University organization that has items in the data
- pivot table adds some business intelligence to the application. Note that this one is a separate application at the moment due to compatibility issues
- timeline is for Impactstory items
For verification purposes there is also a data table, which doubles as a linking layer to the Altmetric.com source.
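The sidebar-plus-body layout described above can be sketched with the standard Shinydashboard building blocks. The menu items below are illustrative, not the exact ones in the 2amconf app:

```r
library(shiny)
library(shinydashboard)

# Skeleton of a Shinydashboard layout: sidebar for navigation on the
# left, visualizations rendered into the main body on the right
ui <- dashboardPage(
  dashboardHeader(title = "Altmetrics"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Scatterplot", tabName = "scatter"),
      menuItem("Barchart",    tabName = "bars")
    )
  ),
  dashboardBody(
    tabItems(
      tabItem(tabName = "scatter", plotOutput("scatterplot")),
      tabItem(tabName = "bars",    plotOutput("barchart"))
    )
  )
)

server <- function(input, output) {
  # renderPlot() calls for "scatterplot" and "barchart" would go here
}

# shinyApp(ui, server)   # uncomment to launch
```
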
When you filter data by School, you may notice that the values in the five smaller boxes on the right-hand side change too. In Shiny parlance, these are reactive values. They depend on the School filter: whenever the filter changes, the box values change too. Here, they show the total number of items plus a few top altmetric scores within that School.
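A minimal sketch of such a reactive value, with hypothetical input/output IDs and a made-up `all_items` data frame:

```r
library(shiny)
library(shinydashboard)

# server fragment: school_data() re-evaluates whenever input$school
# changes, and every output built on it updates automatically
server <- function(input, output) {
  school_data <- reactive({
    subset(all_items, school == input$school)
  })
  # one of the small boxes: total number of items in the selected School
  output$n_items <- renderValueBox({
    valueBox(nrow(school_data()), subtitle = "Items")
  })
}
```
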
By clicking the score value, you can check from the Altmetric landing page what the item is about.
For example, in School of Science (SCI), the top number of Facebook citations is 4.
Following the link, you will find that the item is DOI http://dx.doi.org/10.1093/scan/nss096, a joint Finnish publication from a project in which the brain activity of supernatural believers and skeptics was examined. The Aalto University authors come from the Brain Research Unit.
A dashboard-type application like the prototype presented here might be of interest to those who are concerned about the extent and quality of scientific outreach of the University. Are there differences in web presence between Schools? Is there a balanced coverage for all relevant groups of audience? Are some channels over- or underrepresented?
However, there are only so many dimensions you can visualize with scatterplots and barcharts. What is fairly easy, though, is to add more metrics.
In this older application with Altmetric data but a wider scope and a different source, there are also values showing
- number of authors from Web of Science by Thomson Reuters
- Journal Metrics from Scopus by Elsevier
- Finnish Publication Forum ranking (JuFo)
From CRIS, the number of authors can be calculated. Journal Metrics and JuFo are among those datasets that will be imported to CRIS on a yearly basis anyway.
The color palette of the scatterplot circles represents the School. With an outer layer in a different color, a.k.a. a stroke, you can convey something else.
Here, the golden stroke is reserved for items published in an Open Access journal. The data is kindly made available by Lib4RI in Switzerland.
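In ggplot2 terms, a fill-plus-stroke encoding like this can be sketched as follows. The data frame and column names are made up; only the aesthetic mapping is the point:

```r
library(ggplot2)

# Made-up items: fill encodes the School, the outline (stroke) flags
# Open Access
df <- data.frame(
  tweets = c(5, 12, 3, 20),
  fb     = c(1, 4, 0, 2),
  school = c("SCI", "ENG", "SCI", "ARTS"),
  oa     = c(TRUE, FALSE, TRUE, TRUE)
)

ggplot(df, aes(tweets, fb, fill = school, colour = oa)) +
  geom_point(shape = 21, size = 4, stroke = 1.5) +   # shape 21 has both fill and outline
  scale_colour_manual(values = c(`TRUE` = "gold", `FALSE` = "grey50"))
```
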
None of these prototypes contains dynamic data. A move to a more up-to-date application is not trivial. Let's say I'd like to add a query field: "This is our unit ID. Please plot a chart of all our present altmetrics – while I wait."
First, the application would need to be hosted by the University – not by RStudio – because access to Pure Web Services is restricted to the University network, and for good reason. A CRIS contains a wealth of data, especially about persons. The more information, the more responsibility. As Wouter Gerritsma has noted, a CRIS in fact has more in common with backend systems than with those in front.
Second, response time. I seriously doubt I could boost the application to do all the necessary steps within 10 seconds. In practice, I would need to be proactive and have all the data ready in the background, up-to-yesterday. From the overall CRIS performance point of view, this would also be the only sensible solution: if you query the WS all the time, it slows down the whole Pure system.
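The "up-to-yesterday" approach boils down to a simple cache: a nightly batch job writes the prepared data to disk, and the interactive application only ever reads the cached copy. A minimal sketch in base R (file location and function names are my own):

```r
# The batch job (e.g. a nightly cron task) would call refresh_cache();
# the Shiny app would only call load_cache()
cache_file <- file.path(tempdir(), "altmetrics_cache.rds")

refresh_cache <- function(data) {
  saveRDS(data, cache_file)
}

load_cache <- function() {
  if (file.exists(cache_file)) readRDS(cache_file) else NULL
}

# Demonstration with a one-row stand-in for the real metrics data
refresh_cache(data.frame(DOI = "10.5281/zenodo.32108", tweets = 3))
cached <- load_cache()
```

This keeps the WS load confined to one scheduled job instead of hammering Pure on every user query.
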
For those of you interested in seeing the R code of the 2amconf dashboard application, it is accessible via DOI http://dx.doi.org/10.5281/zenodo.32108. Note that the various data files the application imports are pre-processed. If you'd like to know what they look like and how they are made, drop me a line. Drop me a line or tweet anyway!