Title | A big data approach to the ultra-fast prediction of DFT-calculated bond energies |
---|---|
Published in | Journal of Cheminformatics, July 2013 |
DOI | 10.1186/1758-2946-5-34 |
Pubmed ID | |
Authors | Xiaohui Qu, Diogo ARS Latino, Joao Aires-de-Sousa |
Abstract | Rapid access to intrinsic physicochemical properties of molecules is highly desired for large-scale chemical data mining explorations such as mass spectrum prediction in metabolomics, toxicity risk assessment and drug discovery. Large volumes of data are being produced by quantum chemistry calculations, e.g. by Density Functional Theory (DFT), which provide increasingly accurate estimates of several properties but are still too computationally expensive for those large-scale uses. This work explores the possibility of using large amounts of data generated by DFT methods for thousands of molecular structures, extracting relevant molecular properties and applying machine learning (ML) algorithms to learn from the data. Once trained, these ML models can be applied to new structures to produce ultra-fast predictions. An approach is presented for homolytic bond dissociation energy (BDE). |
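
The workflow summarised in the abstract (learn from precomputed data, then predict quickly for new structures) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the k-nearest-neighbour regressor, the function name `knn_predict`, the two-component descriptor vectors and all numeric values are hypothetical placeholders, not the authors' model, descriptors or DFT results.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a property as the mean target of the k nearest training points."""
    # Pair each training target with its Euclidean distance to the query
    # descriptor vector, then sort so the closest points come first.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical descriptor vectors (one per bond) and placeholder target
# values. These are NOT real DFT-calculated bond dissociation energies.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
           (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
train_y = [100.0, 102.0, 98.0, 80.0, 82.0, 78.0]

# "Training" here is just storing the data; prediction for a new
# structure needs only a distance computation, which is why it is fast.
pred_a = knn_predict(train_X, train_y, (0.2, 0.2))
pred_b = knn_predict(train_X, train_y, (5.2, 5.2))
print(pred_a, pred_b)
```

In this toy setting each query falls next to one cluster of training points, so the prediction is simply the average of that cluster's targets. A real application would replace the placeholder vectors with descriptors computed from molecular structures and the targets with DFT-calculated values.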
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 50% |
Canada | 1 | 25% |
France | 1 | 25% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 3 | 75% |
Scientists | 1 | 25% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 2 | 2% |
Czechia | 1 | <1% |
Romania | 1 | <1% |
Brazil | 1 | <1% |
Unknown | 120 | 96% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 31 | 25% |
Researcher | 26 | 21% |
Student > Master | 11 | 9% |
Student > Bachelor | 11 | 9% |
Other | 8 | 6% |
Other | 21 | 17% |
Unknown | 17 | 14% |
Readers by discipline | Count | As % |
---|---|---|
Chemistry | 43 | 34% |
Computer Science | 13 | 10% |
Engineering | 10 | 8% |
Chemical Engineering | 7 | 6% |
Agricultural and Biological Sciences | 5 | 4% |
Other | 22 | 18% |
Unknown | 25 | 20% |