Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction

Overview of attention for article published in BMC Genomics, June 2015

About this Attention Score

  • In the top 5% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (94th percentile)
  • High Attention Score compared to outputs of the same age and source (97th percentile)

Mentioned by

  • 1 blog
  • 39 X users
  • 1 Google+ user

Citations

  • 87 Dimensions

Readers on

  • 298 Mendeley
  • 3 CiteULike
Title
Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction
Published in
BMC Genomics, June 2015
DOI 10.1186/1471-2164-16-s8-s2
Authors

Adam Frankish, Barbara Uszczynska, Graham RS Ritchie, Jose M Gonzalez, Dmitri Pervouchine, Robert Petryszak, Jonathan M Mudge, Nuno Fonseca, Alvis Brazma, Roderic Guigo, Jennifer Harrow

Abstract

A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation, and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.

We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically relevant filter for further reducing the complexity of the transcriptome.

The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.
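
The abstract refers to exons that are unique to one geneset versus common to both. As a rough illustration of that kind of comparison (not the authors' pipeline), the sketch below treats each exon as a coordinate tuple and uses set operations to split two annotations into shared and unique features. The function name `compare_exon_sets` and all coordinates are hypothetical and invented for this example.

```python
# Minimal sketch, assuming exons are represented as
# (chromosome, start, end, strand) tuples. All coordinates are made up.
from typing import Dict, Set, Tuple

Exon = Tuple[str, int, int, str]


def compare_exon_sets(first: Set[Exon], second: Set[Exon]) -> Dict[str, Set[Exon]]:
    """Split two exon sets into shared exons and exons unique to each set."""
    return {
        "shared": first & second,        # exact coordinate matches in both
        "only_first": first - second,    # present only in the first set
        "only_second": second - first,   # present only in the second set
    }


# Hypothetical toy annotations standing in for two genesets.
geneset_a = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 500, 650, "+")}
geneset_b = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr2", 50, 120, "-")}

result = compare_exon_sets(geneset_a, geneset_b)
for label, exons in result.items():
    print(label, len(exons))
```

Exact coordinate matching is the simplest possible criterion; a real comparison would also need to handle partial overlaps, transcript structure and differing reference assemblies.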

X Demographics

The data were collected from the profiles of 39 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 298 Mendeley readers of this research output.

Geographical breakdown

Country Count As %
United States 8 3%
United Kingdom 3 1%
Brazil 2 <1%
Netherlands 1 <1%
Norway 1 <1%
Germany 1 <1%
Switzerland 1 <1%
Italy 1 <1%
Czechia 1 <1%
Other 5 2%
Unknown 274 92%

Demographic breakdown

Readers by professional status Count As %
Student > Ph. D. Student 75 25%
Researcher 64 21%
Student > Bachelor 31 10%
Student > Master 29 10%
Other 18 6%
Other 40 13%
Unknown 41 14%

Readers by discipline Count As %
Biochemistry, Genetics and Molecular Biology 95 32%
Agricultural and Biological Sciences 94 32%
Computer Science 22 7%
Medicine and Dentistry 17 6%
Neuroscience 6 2%
Other 16 5%
Unknown 48 16%
Attention Score in Context

This research output has an Altmetric Attention Score of 31. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 12 May 2023.

  • All research outputs: #1,215,516 of 24,666,614 outputs
  • Outputs from BMC Genomics: #208 of 11,035 outputs
  • Outputs of similar age: #14,873 of 269,245 outputs
  • Outputs of similar age from BMC Genomics: #7 of 249 outputs

Altmetric has tracked 24,666,614 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 95th percentile: it's in the top 5% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 11,035 research outputs from this source. They receive a mean Attention Score of 4.8. This one has done particularly well, scoring higher than 98% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 269,245 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 94% of its contemporaries.
We're also able to compare this research output to 249 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 97% of its contemporaries.
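
The percentile figures quoted above can be reproduced from the rank and total in each context. The sketch below is an assumption about the arithmetic, not Altmetric's documented method: it takes the share of outputs ranked below this one and rounds down to a whole percent. The function name `percentile_from_rank` is hypothetical.

```python
# Minimal sketch, assuming percentile = floor(100 * (total - rank) / total).
# This formula is an assumption that happens to match the figures on this page.

def percentile_from_rank(rank: int, total: int) -> int:
    """Whole-number percentage of outputs ranked below the given rank."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100 * (total - rank) // total  # integer division rounds down


# Rank and total for each context, as listed on this page.
contexts = {
    "All research outputs": (1_215_516, 24_666_614),
    "Outputs from BMC Genomics": (208, 11_035),
    "Outputs of similar age": (14_873, 269_245),
    "Outputs of similar age from BMC Genomics": (7, 249),
}

for label, (rank, total) in contexts.items():
    print(f"{label}: higher than {percentile_from_rank(rank, total)}% of outputs")
```

Under this assumption the four contexts give 95, 98, 94 and 97, which agree with the percentages stated in the text.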