Title | ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics |
---|---|
Published in | Journal of Cheminformatics, March 2017 |
DOI | 10.1186/s13321-017-0203-5 |
Pubmed ID | |
Authors | Jiangming Sun, Nina Jeliazkova, Vladimir Chupakhin, Jose-Felipe Golib-Dzib, Ola Engkvist, Lars Carlsson, Jörg Wegner, Hugo Ceulemans, Ivan Georgiev, Vedrin Jeliazkov, Nikolay Kochev, Thomas J. Ashby, Hongming Chen |
Abstract | Chemogenomics data generally refers to the activity data of chemical compounds on an array of protein targets and represents an important source of information for building in silico target prediction models. The increasing volume of chemogenomics data offers exciting opportunities to build models based on Big Data. Preparing a high quality data set is a vital step in realizing this goal and this work aims to compile such a comprehensive chemogenomics dataset. This dataset comprises over 70 million SAR data points from publicly available databases (PubChem and ChEMBL) including structure, target information and activity annotations. Our aspiration is to create a useful chemogenomics resource reflecting industry-scale data not only for building predictive models of in silico polypharmacology and off-target effects but also for the validation of cheminformatics approaches in general. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 18% |
Sweden | 1 | 9% |
Bulgaria | 1 | 9% |
Chile | 1 | 9% |
France | 1 | 9% |
Russia | 1 | 9% |
Unknown | 4 | 36% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 6 | 55% |
Members of the public | 3 | 27% |
Science communicators (journalists, bloggers, editors) | 2 | 18% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
Unknown | 166 | 100% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Researcher | 39 | 23% |
Student > Master | 29 | 17% |
Student > Ph. D. Student | 27 | 16% |
Student > Bachelor | 13 | 8% |
Other | 7 | 4% |
Other | 13 | 8% |
Unknown | 38 | 23% |
Readers by discipline | Count | As % |
---|---|---|
Chemistry | 33 | 20% |
Computer Science | 22 | 13% |
Biochemistry, Genetics and Molecular Biology | 15 | 9% |
Pharmacology, Toxicology and Pharmaceutical Science | 14 | 8% |
Agricultural and Biological Sciences | 12 | 7% |
Other | 22 | 13% |
Unknown | 48 | 29% |