Efficient iterative virtual screening with Apache Spark and conformal prediction

Overview of attention for article published in Journal of Cheminformatics, March 2018

About this Attention Score

  • In the top 25% of all research outputs scored by Altmetric
  • High Attention Score compared to outputs of the same age (81st percentile)
  • Above-average Attention Score compared to outputs of the same age and source (61st percentile)

Mentioned by: 20 X users

Citations: 30 Dimensions

Readers on: 53 Mendeley
Title
Efficient iterative virtual screening with Apache Spark and conformal prediction
Published in
Journal of Cheminformatics, March 2018
DOI 10.1186/s13321-018-0265-z
Authors

Laeeq Ahmed, Valentin Georgiev, Marco Capuccini, Salman Toor, Wesley Schaal, Erwin Laure, Ola Spjuth

Abstract

Docking and scoring large libraries of ligands against target proteins form the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or large workstations in a brute-force manner, by docking and scoring all available ligands. In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and using the model to predict the remaining ligands so that those predicted as 'low-scoring' can be excluded. Then another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. We show on 4 different targets that conformal prediction-based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and achieving a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
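The iterative loop described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid scorer stands in for the SVM, nothing is parallelized with Spark, and `dock` is a hypothetical synthetic scoring function. All function names, the way the high/low threshold is chosen, the parameter values, and the rule of docking only ligands predicted as 'high' in the final step are assumptions made for the example.

```python
import random

def dock(ligand):
    # Hypothetical stand-in for an expensive docking run (higher score = better).
    return 2.0 * ligand[0] - ligand[1] + random.gauss(0, 0.1)

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nonconformity(x, label, cents):
    # Large when x is far from the hypothesised class and close to the other one.
    return dist(x, cents[label]) - dist(x, cents[1 - label])

def fit(docked, threshold):
    # Label docked ligands high (1) or low (0), then split them into a proper
    # training set (class centroids) and a calibration set (nonconformity scores).
    data = [(x, int(s >= threshold)) for x, s in docked]
    random.shuffle(data)
    cut = int(0.7 * len(data))
    train, cal = data[:cut], data[cut:]
    cents = {c: centroid([x for x, y in train if y == c]) for c in (0, 1)}
    calib = {c: [nonconformity(x, c, cents) for x, y in cal if y == c]
             for c in (0, 1)}
    return cents, calib

def predict_set(x, cents, calib, eps):
    # Class-conditional (Mondrian) conformal prediction: keep every label whose
    # p-value exceeds the significance level eps.
    labels = set()
    for c in (0, 1):
        a = nonconformity(x, c, cents)
        p = (sum(s >= a for s in calib[c]) + 1) / (len(calib[c]) + 1)
        if p > eps:
            labels.add(c)
    return labels

def screen(library, batch=100, eps=0.2, target_eff=0.8, seed=0):
    random.seed(seed)
    pool = list(library)
    random.shuffle(pool)
    docked, threshold = [], None
    while pool:
        # Dock the next batch and add it to the training data.
        new, pool = pool[:batch], pool[batch:]
        docked += [(x, dock(x)) for x in new]
        if threshold is None:
            # Assumption: the top 30% of the first batch defines 'high-scoring'.
            scores = sorted(s for _, s in docked)
            threshold = scores[int(0.7 * len(scores))]
        cents, calib = fit(docked, threshold)
        preds = [(x, predict_set(x, cents, calib, eps)) for x in pool]
        # Efficiency: fraction of remaining ligands with a single-label prediction.
        single = sum(len(labels) == 1 for _, labels in preds)
        efficiency = single / len(preds) if preds else 1.0
        if efficiency >= target_eff:
            # Final step (assumption): dock only ligands predicted as 'high'.
            docked += [(x, dock(x)) for x, labels in preds if labels == {1}]
            break
        # Otherwise exclude ligands predicted as 'low' only, and iterate.
        pool = [x for x, labels in preds if labels != {0}]
    return docked
```

Calling `screen` on a list of feature tuples returns the `(ligand, score)` pairs that were actually docked, so comparing its length with the library size gives the fraction of docking runs that were avoided in this toy setting.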

X Demographics

The data shown below were collected from the profiles of 20 X users who shared this research output.
Mendeley readers

The data shown below were compiled from readership statistics for 53 Mendeley readers of this research output.

Geographical breakdown

Country    Count    As %
Unknown    53       100%

Demographic breakdown

Readers by professional status     Count    As %
Student > Ph. D. Student           12       23%
Researcher                         8        15%
Student > Bachelor                 6        11%
Student > Master                   5        9%
Professor > Associate Professor    4        8%
Other                              8        15%
Unknown                            10       19%

Readers by discipline                           Count    As %
Chemistry                                       12       23%
Computer Science                                12       23%
Engineering                                     3        6%
Biochemistry, Genetics and Molecular Biology    2        4%
Unspecified                                     2        4%
Other                                           9        17%
Unknown                                         13       25%

Attention Score in Context

This research output has an Altmetric Attention Score of 11. This is our high-level measure of the quality and quantity of online attention that it has received. This Attention Score, as well as the ranking and number of research outputs shown below, was calculated when the research output was last mentioned on 20 September 2018.
All research outputs: #3,109,466 of 24,261,860 outputs
Outputs from Journal of Cheminformatics: #297 of 893 outputs
Outputs of similar age: #63,153 of 334,855 outputs
Outputs of similar age from Journal of Cheminformatics: #9 of 21 outputs
Altmetric has tracked 24,261,860 research outputs across all sources so far. Compared to these, this one has done well and is in the 87th percentile, which places it in the top 25% of all research outputs ever tracked by Altmetric.
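The 87th-percentile figure is consistent with the rank of #3,109,466 out of 24,261,860 outputs if the percentile is taken as the share of outputs ranked below this one. Altmetric's exact calculation (for example, how ties are handled) is not stated here, so `percentile_from_rank` is a hypothetical helper and the snippet is only an arithmetic check under that assumption.

```python
import math

def percentile_from_rank(rank, total):
    # Assumption: percentile = share of tracked outputs ranked below this one.
    return 100.0 * (total - rank) / total

p = percentile_from_rank(3_109_466, 24_261_860)
print(math.floor(p))  # 87
```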
So far Altmetric has tracked 893 research outputs from this source. They typically receive a lot more attention than average, with a mean Attention Score of 10.5. This one has gotten more attention than average, scoring higher than 66% of its peers.
Older research outputs will score higher simply because they've had more time to accumulate mentions. To account for age we can compare this Altmetric Attention Score to the 334,855 tracked outputs that were published within six weeks on either side of this one in any source. This one has done well, scoring higher than 81% of its contemporaries.
We're also able to compare this research output to 21 others from the same source and published within six weeks on either side of this one. This one has gotten more attention than average, scoring higher than 61% of its contemporaries.