Report for: VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data

Title	VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data
Published in	Microbiome, July 2017
DOI	10.1186/s40168-017-0283-5
Pubmed ID	28683828
Authors	Jie Ren, Nathan A. Ahlgren, Yang Young Lu, Jed A. Fuhrman, Fengzhu Sun
Abstract	Identifying viral sequences in mixed metagenomes containing both viral and host contigs is a critical first step in analyzing the viral component of samples. Current tools for distinguishing prokaryotic virus and host contigs primarily use gene-based similarity approaches. Such approaches can significantly limit results, especially for short contigs that have few predicted proteins or lack proteins with similarity to previously known viruses. We have developed VirFinder, the first k-mer frequency based, machine learning method for virus contig identification that entirely avoids gene-based similarity searches. VirFinder instead identifies viral sequences based on our empirical observation that viruses and hosts have discernibly different k-mer signatures. VirFinder's performance in correctly identifying viral sequences was tested by training its machine learning model on sequences from host and viral genomes sequenced before 1 January 2014 and evaluating on sequences obtained after 1 January 2014. VirFinder had significantly better rates of identifying true viral contigs (true positive rates (TPRs)) than VirSorter, the current state-of-the-art gene-based virus classification tool, when evaluated with either contigs subsampled from complete genomes or assembled from a simulated human gut metagenome. For example, for contigs subsampled from complete genomes, VirFinder had 78-, 2.4-, and 1.8-fold higher TPRs than VirSorter for 1, 3, and 5 kb contigs, respectively, at the same false positive rates as VirSorter (0, 0.003, and 0.006, respectively); thus, VirFinder works considerably better for small contigs than VirSorter. VirFinder furthermore identified several recently sequenced virus genomes (after 1 January 2014) that VirSorter did not and that have no nucleotide similarity to previously sequenced viruses, demonstrating VirFinder's potential advantage in identifying novel viral sequences.
Application of VirFinder to a set of human gut metagenomes from healthy and liver cirrhosis patients reveals higher viral diversity in healthy individuals than cirrhosis patients. We also identified contig bins containing crAssphage-like contigs with higher abundance in healthy patients and a putative Veillonella genus prophage associated with cirrhosis patients. This innovative k-mer based tool complements gene-based approaches and will significantly improve prokaryotic viral sequence identification, especially for metagenomic-based studies of viral ecology.
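
The abstract describes classification based on k-mer frequencies rather than gene similarity. As a rough illustration of what a k-mer frequency profile is, here is a minimal Python sketch. It is not VirFinder's implementation: the function name `kmer_frequencies`, the default `k=4`, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return the normalized frequency of every DNA k-mer in seq.

    Hypothetical helper for illustration only; not taken from VirFinder.
    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    # Count every overlapping window of length k that is pure ACGT.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    # Report all 4**k possible k-mers so profiles are comparable across contigs.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


if __name__ == "__main__":
    profile = kmer_frequencies("ACGTACGT", k=2)
    print(profile["AC"], profile["TA"])
```

A vector of such frequencies, computed per contig, is the kind of feature a machine learning classifier could be trained on to separate viral from host sequences; the model and training procedure are outside the scope of this sketch.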

X Demographics

The data shown below were collected from the profiles of 92 X users who shared this research output.

Geographical breakdown

Country	Count	As %
United States	26	28%
United Kingdom	7	8%
Australia	5	5%
Spain	4	4%
Canada	4	4%
France	3	3%
Germany	3	3%
South Africa	2	2%
New Zealand	1	1%
Other	9	10%
Unknown	28	30%

Demographic breakdown

Type	Count	As %
Scientists	54	59%
Members of the public	36	39%
Practitioners (doctors, other healthcare professionals)	2	2%

Mendeley readers

The data shown below were compiled from readership statistics for 688 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Unknown	688	100%

Demographic breakdown

Readers by professional status	Count	As %
Student > Ph. D. Student	140	20%
Researcher	124	18%
Student > Master	87	13%
Student > Bachelor	73	11%
Other	24	3%
Other	81	12%
Unknown	159	23%

Readers by discipline	Count	As %
Biochemistry, Genetics and Molecular Biology	159	23%
Agricultural and Biological Sciences	145	21%
Immunology and Microbiology	48	7%
Environmental Science	35	5%
Computer Science	29	4%
Other	75	11%
Unknown	197	29%

Attention Score in Context

This research output has an Altmetric Attention Score of 61. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 19 June 2020.

Comparison group	Rank	Out of
All research outputs	#708,368	26,017,215
Outputs from Microbiome	#191	1,790
Outputs of similar age	#14,535	329,721
Outputs of similar age from Microbiome		47

Altmetric has tracked 26,017,215 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 96th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.

So far Altmetric has tracked 1,790 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 37.9. This one has done well, scoring higher than 89% of its peers.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 329,721 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 95% of its contemporaries.

We're also able to compare this research output to 47 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 80% of its contemporaries.
