Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches

Overview of attention for article published in PLOS ONE, March 2011

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (96th percentile)
  • High Attention Score compared to outputs of the same age and source (93rd percentile)

Mentioned by

  • 1 news outlet
  • 4 blogs
  • 2 X users
  • 1 patent
  • 1 Facebook page

Citations

  • Dimensions: 235

Readers on

  • Mendeley: 239
  • CiteULike: 16
  • Connotea: 1
Title
Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches
Published in
PLOS ONE, March 2011
DOI 10.1371/journal.pone.0018029
Authors

Kevin W. Boyack, David Newman, Russell J. Duhon, Richard Klavans, Michael Patek, Joseph R. Biberstine, Bob Schijvenaars, André Skupin, Nianli Ma, Katy Börner

Abstract

We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.

X Demographics

The demographic data were collected from the profiles of the 2 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for the 239 Mendeley readers of this research output.

Geographical breakdown

Country          Count  % of readers
United Kingdom   6      3%
Netherlands      3      1%
Germany          3      1%
Canada           3      1%
United States    3      1%
Australia        2      <1%
Sweden           2      <1%
France           2      <1%
Denmark          2      <1%
Other            11     5%
Unknown          202    85%

Demographic breakdown

Readers by professional status  Count  % of readers
Researcher                      52     22%
Student > Ph. D. Student        38     16%
Other                           23     10%
Student > Master                20     8%
Student > Doctoral Student      15     6%
Other                           56     23%
Unknown                         35     15%

Readers by discipline                 Count  % of readers
Computer Science                      71     30%
Social Sciences                       28     12%
Agricultural and Biological Sciences  22     9%
Business, Management and Accounting   13     5%
Medicine and Dentistry                11     5%
Other                                 48     20%
Unknown                               46     19%

Attention Score in Context

This research output has an Altmetric Attention Score of 38. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 29 August 2023.

  • All research outputs: #1,045,444 of 24,884,310 outputs
  • Outputs from PLOS ONE: #13,594 of 215,531 outputs
  • Outputs of similar age: #4,106 of 126,545 outputs
  • Outputs of similar age from PLOS ONE: #91 of 1,492 outputs

Altmetric has tracked 24,884,310 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 95th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 215,531 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 15.7. This one has done particularly well, scoring higher than 93% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age, we can compare this Altmetric Attention Score to the 126,545 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 96% of its contemporaries.
We're also able to compare this research output to 1,492 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 93% of its contemporaries.
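
The percentile figures quoted above can be checked against the ranks with a little arithmetic. The sketch below is only an illustration: Altmetric's exact formula is not stated on this page, so it assumes the percentile is the share of tracked outputs ranked below this one, rounded down to a whole number. The function name `percentile_from_rank` is hypothetical.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Whole-number percentage of outputs ranked below the given rank.

    Assumption (not confirmed by the page): percentile is
    floor(100 * (total - rank) / total).
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (total - rank) * 100 // total


# (rank, total) pairs as quoted in the context figures on this page.
contexts = {
    "All research outputs": (1_045_444, 24_884_310),
    "Outputs from PLOS ONE": (13_594, 215_531),
    "Outputs of similar age": (4_106, 126_545),
    "Outputs of similar age from PLOS ONE": (91, 1_492),
}

for label, (rank, total) in contexts.items():
    pct = percentile_from_rank(rank, total)
    print(f"{label}: scores higher than {pct}% of outputs")
```

Under this assumption the four ranks give 95, 93, 96 and 93, which agrees with the percentages stated in the text.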