On the Input page of the Target Prediction tool, users can enter compounds together with comma-separated lists of their protein targets, then click the Make Predictions button to get predictions for each entry.
Protein targets can be referenced by either their gene name or their STRING ID. Any combination of inputs from both fields will be processed.
The Data page has a downloadable file listing all human proteins, including their names and STRING IDs. Protein targets can be obtained from several online resources, some of which are cited below in 'External Resources'. Make sure to use only human proteins, as our models are trained on human protein annotations.
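As an illustration of how the two target fields might be combined and checked, here is a minimal Python sketch. It is not the tool's actual code. The function names and the `rejected` list are hypothetical, and the pattern assumes that human STRING IDs consist of the human taxonomy ID (9606) followed by an Ensembl protein ID; the identifiers in the example are only format examples.

```python
import re

# Assumption: human STRING IDs look like "9606.ENSP" followed by digits.
HUMAN_STRING_ID = re.compile(r"^9606\.ENSP\d+$")


def split_field(raw):
    """Split a comma-separated field, dropping blanks and extra whitespace."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def collect_targets(gene_field, string_field):
    """Combine both target fields (hypothetical helper).

    Returns gene names, well-formed human STRING IDs, and any STRING-field
    entries that do not match the human ID pattern.
    """
    genes = split_field(gene_field)
    string_ids, rejected = [], []
    for token in split_field(string_field):
        if HUMAN_STRING_ID.match(token):
            string_ids.append(token)
        else:
            rejected.append(token)  # e.g. an ID from another species
    return genes, string_ids, rejected


genes, string_ids, rejected = collect_targets(
    "TP53, MTOR,",
    "9606.ENSP00000269305, 10090.ENSMUSP00000000001",
)
print(genes, string_ids, rejected)
```

Flagging non-human identifiers early, rather than silently dropping them, makes it easier to notice when a target list was copied from a resource for another species.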
The Autofill button looks up the compound names from the input table in a list of DrugBank compounds and their targets, and automatically fills in the target data for any compound it finds.
If a semicolon-separated list of compounds is provided in the Compound column, the Autofill function will add targets found for each compound in the list.
If there is no data in the Target columns, the Make Predictions functionality will try to run the Autofill prior to getting predictions for the compounds provided.
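The Autofill behaviour described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the service's implementation: `DRUGBANK_TARGETS` is a hypothetical stand-in for the DrugBank list with made-up entries, and the case-insensitive matching and de-duplication of targets are assumptions.

```python
# Hypothetical stand-in for the DrugBank compound-to-targets list.
DRUGBANK_TARGETS = {
    "compound a": ["GENE1", "GENE2"],
    "compound b": ["GENE2", "GENE3"],
}


def autofill_targets(compound_field, lookup=DRUGBANK_TARGETS):
    """Collect targets for every compound in a semicolon-separated field."""
    targets = []
    for name in compound_field.split(";"):
        key = name.strip().lower()  # assumption: names match case-insensitively
        for target in lookup.get(key, []):
            if target not in targets:  # assumption: duplicates are merged
                targets.append(target)
    return targets


def targets_for_prediction(compound_field, target_field, lookup=DRUGBANK_TARGETS):
    """Use the provided targets, falling back to Autofill when none are given."""
    provided = [t.strip() for t in target_field.split(",") if t.strip()]
    return provided if provided else autofill_targets(compound_field, lookup)


print(autofill_targets("Compound A; compound b"))
print(targets_for_prediction("Compound A", ""))
```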
Input example with gene names (results in positive classifications for both male and female mice):
Input example with STRING IDs (results in positive classifications for both male and female mice):
Input example (results in negative classifications for both male and female mice):
On the Input page of the Chemical Prediction tool, users can enter compounds by name and/or by their numeric PubChem cid, then click the Make Predictions button to get predictions for each entry.
Input example with only the compound name (cid is obtained automatically, results in positive classifications for both male and female mice):
Input example with both name and cid (name is disregarded as cid is provided, results in a positive classification for male mice and negative for female mice):
Input example with only the cid (same compound as in the previous example, results in a positive classification for male mice and negative for female mice):
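The precedence between name and cid in the three examples above can be summarised in a short Python sketch. It is only an illustration: `resolve_cid` and `lookup_cid_by_name` are hypothetical names, and the name lookup is passed in as a function so that no particular PubChem client is assumed. The small lookup table uses aspirin (PubChem CID 2244) purely as a familiar example.

```python
def resolve_cid(name, cid, lookup_cid_by_name):
    """Return the PubChem cid to use for one input row (hypothetical helper).

    A provided cid always wins and the name is then disregarded; the name
    is only looked up when no cid was given.
    """
    if cid is not None and str(cid).strip():
        return int(cid)
    if name and name.strip():
        return lookup_cid_by_name(name.strip())
    raise ValueError("Provide a compound name or a PubChem cid")


# Hypothetical offline lookup table standing in for a PubChem name search.
NAME_TO_CID = {"aspirin": 2244}


def lookup(name):
    return NAME_TO_CID[name.lower()]


print(resolve_cid("Aspirin", None, lookup))       # name only
print(resolve_cid("any label", "2244", lookup))   # cid given, name ignored
print(resolve_cid(None, 2244, lookup))            # cid only
```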
For the Target Prediction tool: Prediction outputs are obtained from ensembles of 5 Random Forest models (different ensembles for male and female mice, selected based on each model's performance in our experiments). The features used to describe each compound are annotations associated with its user-provided protein targets, which include Gene Ontology terms, InterPro protein domains, UniProt Keywords, and pathway data from KEGG, WikiPathways and Reactome.
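To make the ensemble idea concrete, here is a minimal Python sketch of combining five models into one prediction. The text above does not say how the five outputs are combined, so averaging the positive-class probabilities and applying a 0.5 threshold is an assumption. The models are represented by plain functions returning fixed probabilities; in practice each would be a fitted Random Forest queried through its `predict_proba` method.

```python
from statistics import mean


def ensemble_predict(models, features, threshold=0.5):
    """Average the positive-class probability over all models.

    Assumption: mean probability with a fixed threshold; the real tool
    may combine its five Random Forests differently.
    """
    probabilities = [model(features) for model in models]
    score = mean(probabilities)
    return score, score >= threshold


def constant_model(probability):
    """Hypothetical stand-in for a fitted model: always returns one value."""
    return lambda features: probability


# One ensemble per sex, as described above (probabilities are made up).
ensembles = {
    "male": [constant_model(p) for p in (0.9, 0.8, 0.4, 0.7, 0.6)],
    "female": [constant_model(p) for p in (0.2, 0.3, 0.6, 0.1, 0.3)],
}

features = [1, 0, 1, 1]  # binary annotation vector for one compound
for sex, models in ensembles.items():
    score, positive = ensemble_predict(models, features)
    print(sex, round(score, 2), "positive" if positive else "negative")
```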
For the Chemical Prediction tool: Prediction outputs are obtained from a single Random Forest model (different models for male and female mice). The features used to describe each compound are binary values from PubChem's molecular fingerprints describing chemical substructures that are present/absent in the compound's composition.
A compound is considered a member of the positive class (associated with mouse longevity) if mice treated with it showed a statistically significant average lifespan increase of at least 5% in the majority of experimental reports from peer-reviewed studies (data sourced from the DrugAge database). Compounds may have different class labels for male and female mice, as results for each sex are considered separately.
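The labelling rule can be written as a small Python function. This is one reading of the rule, not the authors' code: a report is counted as supporting the positive class only when it is significant and shows an increase of at least 5%, and "majority" is taken to mean strictly more than half of the reports for that sex. The report values below are invented for illustration.

```python
def is_positive(reports, min_increase=5.0):
    """Label one compound for one sex (hypothetical helper).

    reports: list of (average_lifespan_change_percent, is_significant)
    tuples, one per experimental report.
    """
    if not reports:
        return False  # assumption: no reports means no positive label
    supporting = sum(
        1
        for change, significant in reports
        if significant and change >= min_increase
    )
    # Assumption: "majority" means strictly more than half of the reports.
    return supporting > len(reports) / 2


male_reports = [(12.0, True), (6.5, True), (2.0, False)]
female_reports = [(12.0, True), (3.0, True)]
print(is_positive(male_reports), is_positive(female_reports))
```

Because the rule is applied to each sex separately, the same compound can receive different labels for male and female mice, as in the example.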
The male mouse predictions come from models trained exclusively on examples from male mice, which achieved the best predictive accuracy in our experiments. For female mice, we lacked enough data to obtain reliable female-only models, so the predictions come from models trained on mixed-sex datasets. Predictions for female mice are therefore less reliable (please see the source paper for a detailed discussion of the experimental results).
This web service was implemented in Python, using the Flask framework for the website portion. All models were trained with the scikit-learn library (sklearn, v1.1.0) on the datasets provided on the Data page. We selected the models for the ensembles in this tool based on their cross-validation results. The data was preprocessed to remove features with fewer than 3 occurrences ('1' values). Otherwise, Target models include all STRING annotations from the selected categories, and Chemical models include all chemical substructures in the fingerprint.
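The preprocessing step that removes rare features can be illustrated with a short, dependency-free Python sketch. The function and variable names are hypothetical and the tiny matrix is made up; the only rule taken from the text is that a binary feature is kept when it has at least 3 occurrences of the value 1.

```python
def drop_rare_features(rows, feature_names, min_count=3):
    """Remove binary features that occur in fewer than min_count rows.

    rows: list of equal-length lists of 0/1 values (one row per compound).
    """
    counts = [sum(row[j] for row in rows) for j in range(len(feature_names))]
    keep = [j for j, count in enumerate(counts) if count >= min_count]
    kept_names = [feature_names[j] for j in keep]
    filtered = [[row[j] for j in keep] for row in rows]
    return filtered, kept_names


rows = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 1],
]
filtered, kept = drop_rare_features(rows, ["feat_a", "feat_b", "feat_c"])
print(kept)
print(filtered)
```

Here `feat_b` occurs only once and is dropped, while `feat_a` and `feat_c` each occur three times and are kept.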