The Chemical Validation and Standardization Platform (CVSP): large-scale automated validation of chemical structure datasets

Overview of attention for article published in Journal of Cheminformatics, June 2015

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (93rd percentile)
  • High Attention Score compared to outputs of the same age and source (90th percentile)

Mentioned by

3 blogs
11 X users

Citations

23 Dimensions

Readers on

60 Mendeley
5 CiteULike
Title
The Chemical Validation and Standardization Platform (CVSP): large-scale automated validation of chemical structure datasets
Published in
Journal of Cheminformatics, June 2015
DOI 10.1186/s13321-015-0072-8
Pubmed ID
Authors

Karen Karapetyan, Colin Batchelor, David Sharpe, Valery Tkachenko, Antony J Williams

Abstract

Hundreds of online databases now host millions of chemical compounds and associated data. Because many different cheminformatics software tools are used to produce the data, because the platforms differ subtly from one another, and because software users are often inexperienced, chemical structure representations online can suffer from a myriad of issues. To facilitate validation and standardization of chemical structure datasets from various sources, we have delivered a freely available internet-based platform to the community for the processing of chemical compound datasets. The chemical validation and standardization platform (CVSP) both validates and standardizes chemical structure representations according to sets of systematic rules. The chemical validation algorithms detect issues with submitted molecular representations using pre-defined or user-defined dictionary-based molecular patterns that are chemically suspicious or may require manual review. Each identified issue is assigned one of three levels of severity (Information, Warning, and Error) so that users can conveniently browse and review the affected subsets of their data. The validation process covers atoms and bonds (e.g., flagging query atoms and bonds), valences, and stereo. For collections of data submitted in the standard form, the SDF file, the user can map the data fields to predefined CVSP fields in order to cross-validate associated SMILES and InChIs against the connection tables contained within the SDF file. This platform has been applied to the analysis of a large number of data sets prepared for deposition to our ChemSpider database and in preparation of data for the Open PHACTS project. In this work we review the results of the automated validation of the DrugBank dataset, a popular drug and drug target database utilized by the community, and of the ChEMBL 17 data set.
The CVSP web site is located at http://cvsp.chemspider.com/. A platform for the validation and standardization of chemical structure representations in various formats has been developed and made available to the community to assist and encourage the processing of chemical structure files, producing more homogeneous compound representations for exchange between online databases. While CVSP is designed to allow flexibility in the rules used for processing the data, we have produced a recommended rule set based on our own experience with large data sets such as DrugBank, ChEMBL, and data sets from ChemSpider.
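
The rule-driven validation described in the abstract, where each detected issue carries one of three severity levels, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Severity`, `Atom`, `Rule` and `Issue` types, the three sample rules, the severities assigned to them, and the toy atom representation are hypothetical and are not CVSP's actual rule set, data model, or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Severity(Enum):
    """The three severity levels named in the abstract."""
    INFORMATION = 1
    WARNING = 2
    ERROR = 3


@dataclass
class Atom:
    """Toy atom record (hypothetical; a real tool would parse a connection table)."""
    symbol: str          # element symbol, or a query symbol such as "A" or "*"
    bond_order_sum: int  # total order of all bonds, implicit hydrogens included
    charge: int = 0


@dataclass
class Rule:
    name: str
    severity: Severity
    matches: Callable[[Atom], bool]


@dataclass
class Issue:
    rule: str
    severity: Severity
    atom_index: int


# Hypothetical dictionaries; deliberately tiny and not chemically complete.
QUERY_SYMBOLS = {"A", "Q", "*"}
MAX_NEUTRAL_VALENCE = {"C": 4, "N": 3, "O": 2}

RULES: List[Rule] = [
    Rule("query atom", Severity.ERROR,
         lambda a: a.symbol in QUERY_SYMBOLS),
    Rule("valence exceeded", Severity.WARNING,
         lambda a: a.charge == 0
         and a.symbol in MAX_NEUTRAL_VALENCE
         and a.bond_order_sum > MAX_NEUTRAL_VALENCE[a.symbol]),
    Rule("charged atom", Severity.INFORMATION,
         lambda a: a.charge != 0),
]


def validate(atoms: List[Atom], rules: List[Rule] = RULES) -> List[Issue]:
    """Apply every rule to every atom and collect the issues that match."""
    issues = []
    for index, atom in enumerate(atoms):
        for rule in rules:
            if rule.matches(atom):
                issues.append(Issue(rule.name, rule.severity, index))
    return issues


def worst_severity(issues: List[Issue]) -> Optional[Severity]:
    """Return the highest severity found, or None when there are no issues."""
    return max((i.severity for i in issues), key=lambda s: s.value, default=None)


# Usage: a pentavalent carbon, a charged nitrogen, and a query atom.
molecule = [Atom("C", 5), Atom("N", 4, charge=1), Atom("A", 1)]
for issue in validate(molecule):
    print(f"atom {issue.atom_index}: {issue.rule} [{issue.severity.name}]")
```

Keeping the rules in a plain list separates the patterns from the engine that applies them, which is one way a user-defined rule set could be swapped in without touching `validate`.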

X Demographics

The data shown below were collected from the profiles of 11 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 60 Mendeley readers of this research output.

Geographical breakdown

Country          Count   As %
Brazil           2       3%
Bulgaria         1       2%
Germany          1       2%
United Kingdom   1       2%
United States    1       2%
Unknown          54      90%

Demographic breakdown

Readers by professional status   Count   As %
Researcher                       17      28%
Student > Ph. D. Student         12      20%
Other                            7       12%
Student > Master                 6       10%
Professor                        3       5%
Other                            8       13%
Unknown                          7       12%

Readers by discipline                                 Count   As %
Chemistry                                             21      35%
Computer Science                                      8       13%
Agricultural and Biological Sciences                  6       10%
Pharmacology, Toxicology and Pharmaceutical Science   6       10%
Biochemistry, Genetics and Molecular Biology          4       7%
Other                                                 7       12%
Unknown                                               8       13%

Attention Score in Context

This research output has an Altmetric Attention Score of 26. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 27 July 2020.
All research outputs: #1,439,076 of 24,903,209 outputs
Outputs from Journal of Cheminformatics: #91 of 934 outputs
Outputs of similar age: #17,583 of 270,055 outputs
Outputs of similar age from Journal of Cheminformatics: #3 of 20 outputs
Altmetric has tracked 24,903,209 research outputs across all sources so far. Compared to these, this one has done particularly well and is in the 94th percentile: it's in the top 10% of all research outputs ever tracked by Altmetric.
So far Altmetric has tracked 934 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.2. This one has done particularly well, scoring higher than 90% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 270,055 tracked outputs that were published within six weeks on either side of this one in any source. This one has done particularly well, scoring higher than 93% of its contemporaries.
We're also able to compare this research output to 20 others from the same source and published within six weeks on either side of this one. This one has done particularly well, scoring higher than 90% of its contemporaries.
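
The relationship between the rankings and the percentiles quoted above can be checked with a short Python sketch. The formula is an assumption: it treats the percentile as the share of the cohort ranked below the given position, and `percentile_from_rank` is a hypothetical helper, not Altmetric's documented method. For the three larger cohorts it lands on the quoted 94th, 90th and 93rd percentiles; for the 20-output cohort it gives 85 rather than the quoted 90%, so Altmetric's own calculation evidently differs there in some way not described on this page.

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Percentage of the cohort ranked below position `rank` (1 = best).

    Assumed formula for illustration only; not Altmetric's documented method.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must lie between 1 and total")
    return 100.0 * (total - rank) / total


# (rank, cohort size) pairs copied from the rankings above.
rankings = {
    "All research outputs": (1_439_076, 24_903_209),
    "Outputs from Journal of Cheminformatics": (91, 934),
    "Outputs of similar age": (17_583, 270_055),
    "Outputs of similar age from Journal of Cheminformatics": (3, 20),
}

for label, (rank, total) in rankings.items():
    print(f"{label}: {percentile_from_rank(rank, total):.1f}")
```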