
Absent words and the (dis)similarity analysis of DNA sequences: an experimental study

Overview of attention for article published in BMC Research Notes, March 2016

About this Attention Score

  • Good Attention Score compared to outputs of the same age (72nd percentile)
  • Good Attention Score compared to outputs of the same age and source (75th percentile)

Mentioned by: 1 blog
Citations: 10 (Dimensions)
Readers on Mendeley: 16
You are seeing a free-to-access but limited selection of the activity Altmetric has collected about this research output.
Title: Absent words and the (dis)similarity analysis of DNA sequences: an experimental study
Published in: BMC Research Notes, March 2016
DOI: 10.1186/s13104-016-1972-z
Pubmed ID
Authors: Mohammad Saifur Rahman, Ali Alatabbi, Tanver Athar, Maxime Crochemore, M. Sohel Rahman

Abstract

An absent word with respect to a sequence is a word that does not occur in the sequence as a factor; an absent word is minimal if all of its proper factors do occur in that sequence. In this paper we explore the idea of using minimal absent words (MAW) to compute the distance between two biological sequences. The motivation and rationale of our work come from the potential advantage of being able to extract as little information as possible from large genomic sequences in order to compare them in an alignment-free manner. We report an experimental study on the use of absent words as a distance measure among biological sequences, and based on our analysis we recommend which index to use. In particular, our analysis reveals that the best performers are the length-weighted index of relative absent word sets, the length-weighted index of the symmetric difference of the MAW sets, and the Jaccard distance between the MAW sets. We also found that the reverse complements of the sequences should be taken into account when computing the absent words. Using MAW to compute the distance between two biological sequences has potential advantages over alignment-based methods, and we expect these advantages to encourage researchers and practitioners to use MAW as a (dis)similarity measure for sequence comparison and phylogeny reconstruction. We therefore present a comparison among different possible models and indexes to help biologists and researchers choose an appropriate model for such comparisons.
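To make the definitions concrete, here is a minimal sketch in Python. It is not the authors' implementation, and it is only practical for short sequences because it materialises every factor. It relies on one fact: a word w = a·u·b of length at least 2 is a minimal absent word exactly when a·u and u·b both occur and w does not. A single letter is minimal absent when it never occurs. The function names (`minimal_absent_words`, `jaccard_distance`, `length_weighted_sym_diff`) and the `use_reverse_complement` flag are hypothetical. Reverse complements are handled by pooling the factors of the sequence and its reverse complement. The weighting of 1/|w|² in the length-weighted index is an assumption; the abstract does not state the formula.

```python
from itertools import product

# DNA alphabet and complement table; other alphabets can be passed explicitly.
DNA = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def factor_set(seqs: list[str]) -> set[str]:
    """Every factor (substring) of the given sequences, plus the empty word.
    Quadratic in sequence length, so only suitable for short inputs."""
    factors = {""}
    for s in seqs:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                factors.add(s[i:j])
    return factors


def minimal_absent_words(seq: str, alphabet: str = DNA,
                         use_reverse_complement: bool = False) -> set[str]:
    """Naive MAW computation (hypothetical helper, not the paper's code).
    w = a + u + b is minimal absent iff a+u and u+b occur but w does not;
    a single letter is minimal absent iff it never occurs."""
    seqs = [seq, reverse_complement(seq)] if use_reverse_complement else [seq]
    factors = factor_set(seqs)
    maws = {c for c in alphabet if c not in factors}
    for u in factors:
        for a, b in product(alphabet, repeat=2):
            if a + u in factors and u + b in factors and a + u + b not in factors:
                maws.add(a + u + b)
    return maws


def jaccard_distance(x: set[str], y: set[str]) -> float:
    """1 - |X ∩ Y| / |X ∪ Y|; defined as 0 when both sets are empty."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


def length_weighted_sym_diff(x: set[str], y: set[str]) -> float:
    """ASSUMED length weighting: sum of 1/|w|^2 over the symmetric difference,
    so that short (more informative) absent words count more."""
    return sum(1.0 / len(w) ** 2 for w in x ^ y)


if __name__ == "__main__":
    # Textbook example over the binary alphabet {a, b}.
    print(sorted(minimal_absent_words("abaab", alphabet="ab")))  # ['aaa', 'aaba', 'bab', 'bb']
    # Compare two toy DNA sequences, including reverse complements.
    m1 = minimal_absent_words("ACGTACGGT", use_reverse_complement=True)
    m2 = minimal_absent_words("ACGTTCGGA", use_reverse_complement=True)
    print(jaccard_distance(m1, m2), length_weighted_sym_diff(m1, m2))
```

For genome-scale inputs, a suffix-automaton or suffix-array based MAW algorithm would replace the quadratic `factor_set`. The distance functions only need the resulting sets, so they would stay unchanged.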

Mendeley readers

The data shown below were compiled from readership statistics for 16 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
Unknown 16 100%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 5 31%
Researcher 4 25%
Professor 2 13%
Student > Master 1 6%
Student > Postgraduate 1 6%
Other 0 0%
Unknown 3 19%
Readers by discipline Count As %
Computer Science 4 25%
Agricultural and Biological Sciences 3 19%
Biochemistry, Genetics and Molecular Biology 2 13%
Psychology 1 6%
Neuroscience 1 6%
Other 2 13%
Unknown 3 19%
Attention Score in Context

This research output has an Altmetric Attention Score of 6. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 28 March 2016.
All research outputs: #5,756,057 of 22,858,915 outputs
Outputs from BMC Research Notes: #846 of 4,267 outputs
Outputs of similar age: #81,721 of 300,114 outputs
Outputs of similar age from BMC Research Notes: #28 of 117 outputs
Altmetric has tracked 22,858,915 research outputs across all sources so far. This one has received more attention than most of these and is in the 74th percentile.
So far Altmetric has tracked 4,267 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.5. This one has done well, scoring higher than 79% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 300,114 tracked outputs that were published within six weeks on either side of this one in any source. This one has gotten more attention than average, scoring higher than 72% of its contemporaries.
We're also able to compare this research output to 117 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 75% of its contemporaries.