
Getting the most out of RNA-seq data analysis

Overview of attention for article published in PeerJ, October 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (90th percentile)
  • High Attention Score compared to outputs of the same age and source (80th percentile)

Mentioned by

1 blog
19 X users
1 Facebook page

Citations

22 Dimensions

Readers on

234 Mendeley
3 CiteULike
Title: Getting the most out of RNA-seq data analysis
Published in: PeerJ, October 2015
DOI: 10.7717/peerj.1360
Authors: Tsung Fei Khang, Ching Yee Lau

Abstract

Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists.

Results. Using two large public RNA-seq data sets (one representing strong, and another mild, biological effect size), we simulated different replicate size scenarios, and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30% and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their trade-off between PPV and sensitivity was superior to that of the other methods.

Conclusion. When biological effect size is weak, systems level investigation is not possible using RNA-seq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential, when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high confidence differentially expressed genes for targeted qPCR validation; for downstream systems level analysis, combined results from DESeq2 and edgeR are useful.
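
The abstract compares methods by positive predictive value (PPV) and sensitivity. As a minimal sketch of what those two quantities measure, the Python below computes them from a set of called genes and a reference set of truly differentially expressed genes. The function name, the gene names, and both call sets are hypothetical and chosen only for illustration; they are not taken from the article or from any of the tools it evaluates.

```python
def ppv_and_sensitivity(called, truth):
    """Return (PPV, sensitivity) for a set of differential expression calls.

    called: genes a method flagged as differentially expressed
    truth:  genes treated as truly differentially expressed (reference set)
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and truly DE
    fp = len(called - truth)   # false positives: called but not truly DE
    fn = len(truth - called)   # false negatives: truly DE but missed
    ppv = tp / (tp + fp) if called else float("nan")
    sensitivity = tp / (tp + fn) if truth else float("nan")
    return ppv, sensitivity


# Hypothetical reference set and two hypothetical call sets.
truth = {"geneA", "geneB", "geneC", "geneD"}
conservative_calls = {"geneA", "geneB"}
liberal_calls = {"geneA", "geneB", "geneC", "geneX", "geneY", "geneZ"}

print(ppv_and_sensitivity(conservative_calls, truth))
print(ppv_and_sensitivity(liberal_calls, truth))
```

In this toy case the conservative call set has the higher PPV and the liberal one the higher sensitivity, which is the kind of trade-off the abstract refers to when it distinguishes candidates for qPCR validation from gene lists for systems level analysis.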

X Demographics

Demographic data were collected from the profiles of the 19 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 234 Mendeley readers of this research output.

Geographical breakdown

Country           Count   As %
United States     5       2%
South Africa      2       <1%
United Kingdom    2       <1%
Portugal          1       <1%
Czechia           1       <1%
Slovakia          1       <1%
Brazil            1       <1%
Belgium           1       <1%
Taiwan            1       <1%
Other             2       <1%
Unknown           217     93%

Demographic breakdown

Readers by professional status                  Count   As %
Student > Ph. D. Student                        57      24%
Researcher                                      48      21%
Student > Master                                26      11%
Student > Bachelor                              22      9%
Student > Doctoral Student                      17      7%
Other                                           34      15%
Unknown                                         30      13%

Readers by discipline                           Count   As %
Agricultural and Biological Sciences            105     45%
Biochemistry, Genetics and Molecular Biology    54      23%
Medicine and Dentistry                          9       4%
Computer Science                                5       2%
Immunology and Microbiology                     4       2%
Other                                           19      8%
Unknown                                         38      16%

Attention Score in Context

This research output has an Altmetric Attention Score of 18. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 01 January 2016.
All research outputs: #2,011,661 of 24,792,414 outputs
Outputs from PeerJ: #2,128 of 14,772 outputs
Outputs of similar age: #28,913 of 290,667 outputs
Outputs of similar age from PeerJ: #50 of 249 outputs
Altmetric has tracked 24,792,414 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 91st percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 14,772 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 16.9. This one has done well, scoring higher than 85% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 290,667 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 90% of its contemporaries.
We're also able to compare this research output to 249 others from the same source and published within six weeks on either side of this one. This one has done well, scoring higher than 80% of its contemporaries.
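
As a rough check on how the ranks above relate to the quoted percentiles, the sketch below computes the share of tracked outputs ranked below a given position. The formula is an assumption made for illustration, not Altmetric's documented calculation, and the function name is hypothetical; only the rank and total figures come from this page.

```python
def percent_ranked_below(rank, total):
    """Percentage of tracked outputs that sit below the given rank.

    Assumed formula for illustration: (total - rank) / total.
    """
    return 100.0 * (total - rank) / total


# (rank, total) pairs as reported on this page.
contexts = {
    "All research outputs": (2_011_661, 24_792_414),
    "Outputs from PeerJ": (2_128, 14_772),
    "Outputs of similar age": (28_913, 290_667),
    "Outputs of similar age from PeerJ": (50, 249),
}

for label, (rank, total) in contexts.items():
    print(f"{label}: {percent_ranked_below(rank, total):.1f}%")
```

Under this assumption the four values come out close to the figures quoted in the text (91st percentile, 85%, 90%, and 80%); small differences are expected because the exact rounding and counting conventions used by Altmetric are not stated here.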