Report for: Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data

Title	Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data
Published in	BMC Bioinformatics, January 2017
DOI	10.1186/s12859-016-1426-6
Pubmed ID	28466793
Authors	Kuang-Lim Chan, Rozana Rosli, Tatiana V. Tatarinova, Michael Hogan, Mohd Firdaus-Raih, Eng-Ti Leslie Low
Abstract	Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines, developed using various computing techniques, are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. We present an automated gene prediction pipeline, Seqping, that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure). Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.

X Demographics

The data shown below were collected from the profile of 1 X user who shared this research output.

Geographical breakdown

Country	Count	As %
Unknown	1	100%

Demographic breakdown

Type	Count	As %
Members of the public	1	100%

Mendeley readers

The data shown below were compiled from readership statistics for 112 Mendeley readers of this research output.

Geographical breakdown

Country	Count	As %
Norway	1	<1%
Unknown	111	99%

Demographic breakdown

Readers by professional status	Count	As %
Researcher	21	19%
Student > Ph. D. Student	18	16%
Student > Bachelor	17	15%
Student > Master	14	13%
Other	8	7%
Other	15	13%
Unknown	19	17%
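
The "As %" column in the table above appears to be each count divided by the 112 Mendeley readers, rounded to a whole percent. The following is a minimal sketch of that arithmetic, not Altmetric's actual method: the helper `percent_share` and the round-half-up rule are assumptions chosen because they reproduce the figures shown for this table.

```python
def percent_share(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage, rounding halves up.

    Hypothetical helper; integer arithmetic avoids floating-point rounding.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (200 * count + total) // (2 * total)


# Counts copied from the "Readers by professional status" table.
# A list of pairs is used because the label "Other" appears twice.
readers_by_status = [
    ("Researcher", 21),
    ("Student > Ph. D. Student", 18),
    ("Student > Bachelor", 17),
    ("Student > Master", 14),
    ("Other", 8),
    ("Other", 15),
    ("Unknown", 19),
]

total_readers = sum(count for _, count in readers_by_status)

shares = [
    (label, count, percent_share(count, total_readers))
    for label, count in readers_by_status
]

for label, count, share in shares:
    print(f"{label}\t{count}\t{share}%")
```

Because each share is rounded independently, the percentages need not sum to exactly 100.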

Readers by discipline	Count	As %
Agricultural and Biological Sciences	38	34%
Biochemistry, Genetics and Molecular Biology	34	30%
Computer Science	6	5%
Mathematics	2	2%
Arts and Humanities	2	2%
Other	9	8%
Unknown	21	19%

Attention Score in Context

This research output has an Altmetric Attention Score of 1. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 04 May 2017.

Compared with	Rank	Out of
All research outputs	#20,418,183	22,968,808 outputs
Outputs from BMC Bioinformatics	#6,881	7,306 outputs
Outputs of similar age	#355,196	419,234 outputs
Outputs of similar age from BMC Bioinformatics	#118	143 outputs

Altmetric has tracked 22,968,808 research outputs across all sources so far. This one is in the 1st percentile – i.e., 1% of other outputs scored the same or lower than it.

So far Altmetric has tracked 7,306 research outputs from this source. They typically receive a little more attention than average, with a mean Attention Score of 5.4. This one is in the 1st percentile – i.e., 1% of its peers scored the same or lower than it.

Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 419,234 tracked outputs that were published within six weeks on either side of this one in any source. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.

We're also able to compare this research output to 143 others from the same source and published within six weeks on either side of this one. This one is in the 1st percentile – i.e., 1% of its contemporaries scored the same or lower than it.
