Title | Extracting and connecting chemical structures from text sources using chemicalize.org |
---|---|
Published in | Journal of Cheminformatics, April 2013 |
DOI | 10.1186/1758-2946-5-20 |
Pubmed ID | |
Authors | Christopher Southan, Andras Stracz |
Abstract | Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents) the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors. |
X Demographics
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 2 | 40% |
Sweden | 1 | 20% |
United States | 1 | 20% |
Germany | 1 | 20% |
Demographic breakdown
Type | Count | As % |
---|---|---|
Scientists | 3 | 60% |
Members of the public | 2 | 40% |
Mendeley readers
Geographical breakdown
Country | Count | As % |
---|---|---|
United Kingdom | 2 | 3% |
Germany | 1 | 2% |
Brazil | 1 | 2% |
Netherlands | 1 | 2% |
India | 1 | 2% |
United States | 1 | 2% |
Unknown | 53 | 88% |
Demographic breakdown
Readers by professional status | Count | As % |
---|---|---|
Student > Ph. D. Student | 15 | 25% |
Researcher | 15 | 25% |
Student > Bachelor | 8 | 13% |
Student > Doctoral Student | 4 | 7% |
Professor > Associate Professor | 4 | 7% |
Other | 9 | 15% |
Unknown | 5 | 8% |
Readers by discipline | Count | As % |
---|---|---|
Chemistry | 16 | 27% |
Agricultural and Biological Sciences | 9 | 15% |
Medicine and Dentistry | 6 | 10% |
Computer Science | 5 | 8% |
Engineering | 4 | 7% |
Other | 12 | 20% |
Unknown | 8 | 13% |