As one of the largest publicly accessible databases of chemical structures and biological activities, PubChem has been processing bioassay submissions from the community since 2004. As the volume of deposited data grows, so do the diversity and richness of its information content. Recently, the Tox21 program deposited a series of pairwise datasets in PubChem covering different mechanisms of action (MOA), such as androgen receptor (AR) agonist and antagonist datasets, to study cell toxicity. To the best of our knowledge, few cheminformatics studies have been reported on these datasets, particularly the pairwise ones, although they may provide insight into the mechanisms of action of the compounds and the relationships between chemical structure and function, as well as guidance for lead compound selection and optimization. To fill this gap, we performed a comprehensive cheminformatics analysis, including scaffold analysis, matched molecular pair (MMP) analysis, and activity cliff analysis, to investigate the structural characteristics and discontinuous structure-activity relationships of the individual datasets (i.e., the AR agonist dataset or the AR antagonist dataset) and the combined dataset (i.e., the compounds common to the AR agonist and antagonist datasets).
Scaffolds associated exclusively with either potential agonists or potential antagonists were identified. MMP-based activity cliffs, as well as a small group of compounds reported to have dual MOA, were recognized and analyzed. Moreover, a novel concept, the MOA-cliff, was proposed to denote a pair of structurally similar molecules that exhibit opposite MOA.
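As an illustration of the MOA-cliff idea only, the following minimal sketch flags pairs of similar compounds annotated with opposite MOA. It rests on several assumptions that are not taken from this work: structural similarity is approximated by Tanimoto similarity over abstract feature sets (rather than by MMPs), the 0.6 threshold is arbitrary, and the compound names, features, and labels are hypothetical toy data.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def find_moa_cliffs(compounds, threshold=0.6):
    """Return (name1, name2, similarity) for similar pairs with opposite MOA.

    compounds: dict mapping name -> (feature_set, moa), where moa is
    "agonist" or "antagonist". The threshold is an assumed cutoff.
    """
    cliffs = []
    for n1, n2 in combinations(sorted(compounds), 2):
        f1, m1 = compounds[n1]
        f2, m2 = compounds[n2]
        if m1 == m2:
            continue  # same MOA, so not an MOA-cliff candidate
        sim = tanimoto(f1, f2)
        if sim >= threshold:
            cliffs.append((n1, n2, sim))
    return cliffs


# Hypothetical toy data: integers stand in for structural features.
toy = {
    "cpd_A": ({1, 2, 3, 4, 5}, "agonist"),
    "cpd_B": ({1, 2, 3, 4, 6}, "antagonist"),
    "cpd_C": ({7, 8, 9}, "antagonist"),
    "cpd_D": ({1, 2, 3, 4, 5, 6}, "agonist"),
}

for n1, n2, sim in find_moa_cliffs(toy):
    print(f"{n1} / {n2}: similarity {sim:.2f}, opposite MOA")
```

In practice the feature sets would come from a real fingerprint or from MMP fragmentation, and the pairing criterion would follow whatever similarity definition the analysis adopts.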
Cheminformatics methods were successfully applied to the pairwise AR datasets, and the identified molecular scaffold characteristics, MMPs, and activity cliffs may provide useful information for designing new lead compounds targeting the androgen receptor.