
A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF

Overview of attention for article published in Scientific Reports, July 2016

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • Good Attention Score compared to outputs of the same age (79th percentile)
  • Good Attention Score compared to outputs of the same age and source (75th percentile)

Mentioned by

1 blog
1 X user

Citations

Dimensions: 35

Readers on

Mendeley: 77
Title
A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF
Published in
Scientific Reports, July 2016
DOI 10.1038/srep30308
Pubmed ID
Authors

Yingnan Cong, Yao-ban Chan, Mark A. Ragan

Abstract

Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences. If a set of contiguous k-mers in a sequence appears sufficiently more frequently in another phyletic group than in the sequence's own group, we infer that these k-mers were transferred from that other group into the sequence's group. We performed rigorous tests of TF-IDF using simulated and empirical datasets. With the simulated data, we tested our method under different parameter settings for sequence length, substitution rate between and within groups and post-LGT, deletion rate, length of transferred region and k size, and found that we can detect LGT events with high precision and recall. Our method performs better than an established method, ALFY, which has high recall but low precision. Our method is efficient, with runtime increasing approximately linearly with sequence length.
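The abstract states the detection rule only informally. The Python sketch below illustrates that rule under simplifying assumptions and is not the authors' TF-IDF scoring scheme. Assumptions: a group's frequency for a k-mer is the fraction of its sequences that contain it (a document frequency); the query sequence is excluded from its own group; and the function names and thresholds (`ratio`, `min_donor_freq`, `min_run`) are hypothetical. A dataset-wide IDF factor would be identical for a given k-mer in every group and would cancel in the ratio test, so it is left out.

```python
from collections import Counter


def kmer_set(seq, k):
    """Distinct overlapping k-mers in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_frequency(seqs, k):
    """Fraction of a group's sequences that contain each k-mer (a document frequency)."""
    counts = Counter(km for s in seqs for km in kmer_set(s, k))
    n = max(len(seqs), 1)  # avoid division by zero for an empty group
    return {km: c / n for km, c in counts.items()}


def detect_transfers(groups, query_group, query_index, k=4,
                     ratio=2.0, min_donor_freq=0.5, min_run=2):
    """Return (start, end, donor_group) regions of the query flagged as transferred.

    Hypothetical rule: a k-mer is attributed to donor group H when its frequency
    in H is at least min_donor_freq and at least `ratio` times its frequency in
    the query's own group (with the query itself excluded from that group).
    """
    query = groups[query_group][query_index]
    own = [s for i, s in enumerate(groups[query_group]) if i != query_index]
    own_freq = group_frequency(own, k)
    other_freq = {g: group_frequency(seqs, k)
                  for g, seqs in groups.items() if g != query_group}

    # One donor call (or None) per k-mer start position in the query.
    calls = []
    for i in range(len(query) - k + 1):
        km = query[i:i + k]
        own_f = own_freq.get(km, 0.0)
        donor, donor_f = None, 0.0
        for g, freq in other_freq.items():
            f = freq.get(km, 0.0)
            if f > donor_f:
                donor, donor_f = g, f
        flagged = donor is not None and donor_f >= min_donor_freq and donor_f >= ratio * own_f
        calls.append(donor if flagged else None)

    # Merge runs of contiguous k-mers with the same donor into regions [start, end).
    regions, start = [], 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] is not None and i - start >= min_run:
                regions.append((start, i - 1 + k, calls[start]))
            start = i
    return regions


# Toy data: the third sequence of group A carries a segment taken from group B.
groups = {
    "A": ["AC" * 8, "AC" * 8, "ACACAC" + "GTTGGT" + "ACAC"],
    "B": ["GTTG" * 4, "GTTG" * 4],
}
print(detect_transfers(groups, "A", 2))  # [(6, 12, 'B')]
print(detect_transfers(groups, "B", 0))  # []
```

In this toy case the flagged region is `query[6:12] == "GTTGGT"` with donor `B`. Querying a group B sequence flags nothing, because that segment occurs in only one of the three group A sequences. This asymmetry is what lets a frequency comparison of this kind suggest a direction of transfer, not just a shared region.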

X Demographics

These data were collected from the profile of the 1 X user who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 77 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 1 1%
Germany 1 1%
Switzerland 1 1%
Unknown 74 96%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 19 25%
Student > Master 15 19%
Researcher 12 16%
Student > Bachelor 7 9%
Professor > Associate Professor 4 5%
Other 10 13%
Unknown 10 13%
Readers by discipline Count As %
Computer Science 27 35%
Agricultural and Biological Sciences 17 22%
Biochemistry, Genetics and Molecular Biology 9 12%
Business, Management and Accounting 3 4%
Engineering 2 3%
Other 7 9%
Unknown 12 16%
Attention Score in Context

This research output has an Altmetric Attention Score of 8. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 26 July 2016.
All research outputs
#3,998,547
of 22,881,154 outputs
Outputs from Scientific Reports
#31,519
of 123,609 outputs
Outputs of similar age
#72,874
of 365,443 outputs
Outputs of similar age from Scientific Reports
#879
of 3,619 outputs
Altmetric has tracked 22,881,154 research outputs across all sources so far. Compared to these, this one has done well and is in the 82nd percentile: it's in the top 25% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 123,609 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 18.2. This one has received more attention than most of its peers, scoring higher than 74% of them.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 365,443 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 79% of its contemporaries.
We're also able to compare this research output to 3,619 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 75% of its contemporaries.
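The ranks listed above can be related to the quoted percentiles with simple arithmetic. The sketch below assumes a percentile is the floored share of outputs ranked below this one; Altmetric's exact procedure, including how tied scores are counted, is not given on this page, and `percentile_from_rank` is a hypothetical helper.

```python
def percentile_from_rank(rank, total):
    """Floored share (in %) of outputs ranked below position `rank` out of `total`."""
    return (total - rank) * 100 // total


# (rank, number of outputs) pairs as listed in "Attention Score in Context".
comparisons = {
    "All research outputs": (3_998_547, 22_881_154),
    "Outputs from Scientific Reports": (31_519, 123_609),
    "Outputs of similar age": (72_874, 365_443),
    "Outputs of similar age from Scientific Reports": (879, 3_619),
}

for name, (rank, total) in comparisons.items():
    print(f"{name}: {percentile_from_rank(rank, total)}")
```

This reading reproduces three of the four published figures (82, 74 and 75). For outputs of similar age it gives 80 rather than the stated 79; one possible explanation, which is an assumption, is that outputs tied on the same score are counted differently in the published percentile.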