(TL;DR there’s a big new dataset of tweets about arXiv preprints up on figshare – check it out and let me know if you do something cool with it)
It’s the PLoS Article Level Metrics workshop & hackathon in San Francisco this weekend. The Altmetrics workshop that Jason, Dario & Paul Groth organized in Evanston earlier this year was awesome, so I was disappointed when some conflicting responsibilities meant I couldn’t attend this time around (it’s all good – what’s stopping me from flying out is a new baby daughter).
I’m trying to follow along online, though, and there’s already some good stuff emerging on Twitter. For one thing, the PLoS Altmetrics Collection – an ongoing collection of articles with an altmetrics theme – has officially launched. The seed articles are all interesting, and while not all of them are recent, this one by Shuai et al., which looks at the relationship between arXiv downloads and tweets, is. Johan Bollen, who is pretty well known in the altmetrics community for his previous metrics- and Twitter-related work, is the last author.
Anyway, you should go read it and, if you’re left feeling inspired, check out this arXiv tweets dataset on figshare to see if you can come up with any additional insights.
Note that this is not the dataset used in the paper above – it’s data collected by Altmetric, which has a different collection strategy. It’s a more exhaustive set of tweets, but it covers a different time period and lacks download stats.
I haven’t analyzed this dataset at all; I’m hoping to hack vicariously through others… so let me know what you find and I’ll update this post. 🙂
The dataset contains details of approximately 57k tweets linking to arXiv papers, collected between 1st January and 1st October this year. You’ll need to supplement it with data from the arXiv API if you need metadata about the linked preprints. Where available, the dataset also contains follower counts and lat/lng pairs for users, which could be interesting to plot.
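If you want a starting point for the arXiv API part, here’s a rough sketch in Python. It pulls an arXiv ID out of a tweeted link and builds a query URL using the API’s `id_list` parameter. I’m assuming the links look like `arxiv.org/abs/...` or `arxiv.org/pdf/...`; the function names and the regex are my own illustration and won’t cover every URL variant, so check them against the real data.

```python
import re
from urllib.parse import urlencode

# Assumption: tweeted links point at arxiv.org/abs/<id> or arxiv.org/pdf/<id>.
# Handles new-style IDs (1202.2461) and old-style ones (hep-th/9901001),
# and drops any trailing version suffix such as "v2".
ARXIV_ID = re.compile(
    r"arxiv\.org/(?:abs|pdf)/"
    r"((?:[a-z\-]+(?:\.[A-Z]{2})?/\d{7}|\d{4}\.\d{4,5}))"
    r"(?:v\d+)?",
    re.IGNORECASE,
)


def extract_arxiv_id(url):
    """Return the arXiv ID found in a URL, or None if there isn't one."""
    match = ARXIV_ID.search(url)
    return match.group(1) if match else None


def build_query_url(arxiv_ids):
    """Build an arXiv API query URL for a batch of IDs (no request is made)."""
    params = {
        "id_list": ",".join(arxiv_ids),
        # Ask for as many results as IDs so the default page size
        # does not cut the response short.
        "max_results": len(arxiv_ids),
    }
    # Keep commas and slashes readable instead of percent-encoding them.
    return "http://export.arxiv.org/api/query?" + urlencode(params, safe=",/")


if __name__ == "__main__":
    links = [
        "http://arxiv.org/abs/1202.2461v1",
        "http://arxiv.org/pdf/hep-th/9901001",
        "http://example.com/not-arxiv",
    ]
    ids = [i for i in map(extract_arxiv_id, links) if i]
    print(build_query_url(ids))
```

The API returns an Atom feed, so you’d still need to fetch the URL and parse the XML to get at titles, subjects and dates. Query in modest batches and pause between requests to be polite.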
Some basic stuff to check out / visualize / enrich the data with might be:
- what does the tweets per user distribution look like?
- can you add counts from other services using the Altmetric or ImpactStory APIs?
- what % of tweeters are bots vs real people?
- how interconnected (on Twitter) are people who tweet about arXiv preprints?
- which arXiv papers / subjects / key terms got the most attention from Twitter this year?
- where do arXiv tweets come from, and is this different to the geographical distribution of ‘average’ Twitter users?
- can you fetch publication dates for the arXiv preprints and replicate the relevant findings from Shuai et al.?
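
To get you going on the first question, here’s a minimal Python sketch of a tweets-per-user distribution. I haven’t looked at how the figshare file is laid out, so the `user` column name and the inline sample rows are made up for illustration; swap in the real file and field names.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the real dataset's column names may differ.
SAMPLE = """user,url
alice,http://arxiv.org/abs/1202.2461
arxiv_bot,http://arxiv.org/abs/1203.0001
arxiv_bot,http://arxiv.org/abs/1203.0002
bob,http://arxiv.org/abs/1202.2461
arxiv_bot,http://arxiv.org/abs/1203.0003
"""


def tweets_per_user(rows, user_field="user"):
    """Count how many tweets each user posted."""
    return Counter(row[user_field] for row in rows)


def distribution(per_user):
    """Map 'number of tweets' to 'number of users with that many tweets'."""
    return dict(sorted(Counter(per_user.values()).items()))


if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE))
    per_user = tweets_per_user(rows)
    for n_tweets, n_users in distribution(per_user).items():
        print(f"{n_users} user(s) posted {n_tweets} tweet(s)")
    # The heaviest tweeters are a reasonable first place to look for bots.
    print(per_user.most_common(3))
```

With the real data I’d expect a long tail: lots of people tweeting once and a handful of accounts tweeting a lot. Plotting it on log axes should make that easier to see.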