Title | Mapping and classifying molecules from a high-throughput structural database |
---|---|
Published in | Journal of Cheminformatics, February 2017 |
DOI | 10.1186/s13321-017-0192-4 |
Pubmed ID | |
Authors | Sandip De, Felix Musil, Teresa Ingram, Carsten Baldauf, Michele Ceriotti |
Abstract | High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure-property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help address these issues. We will exploit a recently developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques, showing how these can help reveal structure-property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 4 | 36% |
Mexico | 1 | 9% |
Switzerland | 1 | 9% |
Netherlands | 1 | 9% |
Spain | 1 | 9% |
Unknown | 3 | 27% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Members of the public | 6 | 55% |
Scientists | 4 | 36% |
Science communicators (journalists, bloggers, editors) | 1 | 9% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United States | 1 | 1% |
Unknown | 91 | 99% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 26 | 28% |
Researcher | 24 | 26% |
Student > Master | 8 | 9% |
Professor > Associate Professor | 6 | 7% |
Student > Bachelor | 5 | 5% |
Other | 14 | 15% |
Unknown | 9 | 10% |
Readers by discipline | Count | As % |
---|---|---|
Chemistry | 30 | 33% |
Materials Science | 22 | 24% |
Physics and Astronomy | 10 | 11% |
Agricultural and Biological Sciences | 3 | 3% |
Chemical Engineering | 3 | 3% |
Other | 10 | 11% |
Unknown | 14 | 15% |