LODPRIME

Input Examples - Target-based Prediction

On the Input page of the Target Prediction tool, users can enter compounds and comma-separated lists of their protein targets, then click the Make Predictions button to get a prediction for each entry.

Protein targets can be referenced by either gene name or STRING ID. Any combination of inputs from both fields will be processed.

The Data page has a downloadable file listing all human proteins, including their names and STRING IDs. Protein targets can be obtained from several online resources, some of which are cited below in 'External Resources'. Make sure to use only human proteins, as our models are trained on human protein annotations.

The Autofill button looks up each compound name from the input table in a list of DrugBank compounds and targets, and automatically fills in the target data for any compound it finds.

If a semicolon-separated list of compounds is provided in the Compound column, the Autofill function will add targets found for each compound in the list.

If the Target columns are empty, Make Predictions will first try to run Autofill and then generate predictions for the compounds provided.
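
The Autofill behaviour described above can be sketched roughly as follows. This is an illustration only, not the service's own code: `DRUGBANK_TARGETS` and `autofill_targets` are hypothetical names, the toy table is not actual DrugBank content (it reuses gene names from the examples on this page), and case-insensitive matching is an assumption.

```python
# Toy stand-in for the DrugBank compound/target list (hypothetical contents).
DRUGBANK_TARGETS = {
    "rapamycin": ["MTOR", "FKBP1A"],
    "taxifolin": ["CA7", "CA12"],
}


def autofill_targets(compound_field, table=DRUGBANK_TARGETS):
    """Collect targets for every compound in a semicolon-separated field."""
    targets = []
    for name in compound_field.split(";"):
        key = name.strip().lower()  # assumption: matching ignores case and padding
        for target in table.get(key, []):
            if target not in targets:  # keep first occurrence, drop repeats
                targets.append(target)
    return targets


print(autofill_targets("Rapamycin; Taxifolin"))
```

A compound that is not found in the table simply contributes no targets, so an unknown name yields an empty list.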

Input example with gene names (results in positive classifications for both male and female mice):

Compound: Rapamycin
Gene names: MTOR, FKBP1A, FGF2, FKBP1B, EIF4E, PDCD4, FKBP5

Input example with STRING IDs (results in positive classifications for both male and female mice):

Compound: Alpelisib
String IDs: 9606.ENSP00000263967, 9606.ENSP00000501150, 9606.ENSP00000366563, 9606.ENSP00000419260

Input example with both gene names and STRING IDs (results in negative classifications for both male and female mice):

Compound: Taxifolin
String IDs: 9606.ENSP00000345659, 9606.ENSP00000178638
Gene names: CA7, CA12
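
As a rough sketch of how the two target fields in the examples above could be combined before prediction, the snippet below splits both comma-separated lists and checks that each STRING ID refers to a human protein. The function name `parse_targets` and the ID pattern (the '9606.' organism prefix followed by 'ENSP' and digits, inferred from the examples) are assumptions, not the tool's actual validation rules.

```python
import re

# Assumption: human STRING IDs look like "9606.ENSP" followed by digits,
# as in the examples above.
STRING_ID = re.compile(r"^9606\.ENSP\d+$")


def parse_targets(gene_names="", string_ids=""):
    """Split both comma-separated fields and validate the STRING IDs."""
    genes = [g.strip() for g in gene_names.split(",") if g.strip()]
    ids = [s.strip() for s in string_ids.split(",") if s.strip()]
    bad = [s for s in ids if not STRING_ID.match(s)]
    if bad:
        raise ValueError(f"Not human STRING IDs: {bad}")
    return {"gene_names": genes, "string_ids": ids}


print(parse_targets("CA7, CA12", "9606.ENSP00000345659, 9606.ENSP00000178638"))
```

Either field may be left empty, which mirrors the statement that any combination of inputs from both fields is processed.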

Input Examples - Chemical Prediction

On the Input page of the Chemical Prediction tool, users can enter compound names and their numeric PubChem CIDs, then click the Make Predictions button to get a prediction for each entry.

Input example with only the compound name (the CID is obtained automatically; results in positive classifications for both male and female mice):

Compound: Rapamycin
PubChem CID: ""

Input example with both name and CID (the name is disregarded because a CID is provided; results in a positive classification for male mice and a negative one for female mice):

Compound: Putrescine
PubChem CID: 1045

Input example with only the CID (same compound as in the previous example; results in a positive classification for male mice and a negative one for female mice):

Compound: ""
PubChem CID: 1045
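
The precedence shown in these three examples (a supplied CID wins, and the name is used only when the CID is empty) can be sketched as below. The names `resolve_cid` and `toy_lookup` are hypothetical, and the name lookup is a stand-in for whatever PubChem name search the service performs; only the Putrescine CID is taken from the example above.

```python
def resolve_cid(name, cid, lookup_by_name):
    """Return the CID to use: a supplied CID takes precedence over the name."""
    cid = str(cid).strip()
    if cid:
        if not cid.isdigit():
            raise ValueError(f"PubChem CID must be numeric, got {cid!r}")
        return int(cid)  # the name is disregarded when a CID is given
    if not name.strip():
        raise ValueError("Provide a compound name or a PubChem CID")
    return lookup_by_name(name.strip())


def toy_lookup(name):
    """Hypothetical name lookup standing in for a PubChem name search."""
    return {"putrescine": 1045}[name.lower()]


print(resolve_cid("Putrescine", "", toy_lookup))
print(resolve_cid("", "1045", toy_lookup))
```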

External resources

DrugAge: Database used as the source for all class labels in our training data. It compiles results from peer-reviewed longevity studies on various target organisms, including Mus musculus.
DrugBank: Main source for obtaining the protein targets of a compound.
STRING: Source for finding STRING IDs for proteins, which can be used as target identifiers. Note: when using STRING IDs, include the organism identifier '9606.' before the protein identifier 'ENSPXXX...', as our models are trained with Homo sapiens targets. Alternatively, download our list of human protein names and STRING IDs from the Data page.
PubChem: Used for the Chemical Prediction tool. Use the name search to find the numeric CID associated with a compound.
ID Mapping: Tool from UniProt for converting protein identifiers to different formats. This tool accepts either STRING IDs (e.g., 9606.ENSP00000354558) or protein names (e.g., MTOR).
Alternative target sources: BioGRID, ChEMBL, Pharos, Therapeutic Target Database.

Additional details

All details regarding the data preparation and experimental setup are included in the source paper. Here, we summarize that information:

Machine learning models:

For the Target Prediction tool: Prediction outputs are obtained from ensembles of 5 Random Forest models (different ensembles for male and female mice, selected based on each model's performance in our experiments). The features used to describe each compound are annotations associated with its user-provided protein targets, which include Gene Ontology terms, InterPro protein domains, UniProt Keywords, and pathway data from KEGG, WikiPathways and Reactome.

For the Chemical Prediction tool: Prediction outputs are obtained from a single Random Forest model (different models for male and female mice). The features used to describe each compound are binary values from PubChem's molecular fingerprints describing chemical substructures that are present/absent in the compound's composition.
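
To make the ensemble idea concrete, here is a minimal sketch using scikit-learn on synthetic binary features. It is not the service's training code: the data are random, the five forests differ only by random seed (the actual ensemble members were selected from experimental results), and averaging the positive-class probabilities is an assumption about how member outputs might be combined, since this page does not specify the rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 16))  # synthetic binary features
y = (X[:, 0] | X[:, 1]).astype(int)    # synthetic labels, for illustration only

# Five forests that differ only by random seed (an assumption).
ensemble = [
    RandomForestClassifier(n_estimators=25, random_state=seed).fit(X, y)
    for seed in range(5)
]


def ensemble_proba(models, features):
    """Average the positive-class probability over all models."""
    return np.mean([m.predict_proba(features)[:, 1] for m in models], axis=0)


proba = ensemble_proba(ensemble, X[:3])
print(proba.shape)
```

The single-model setup of the Chemical Prediction tool corresponds to calling `predict_proba` on one fitted forest instead of averaging over five.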

Class label definition:

A compound is considered a member of the positive class (associated with mouse longevity) if mice treated with it showed a significant average lifespan increase of at least 5% in the majority of experimental reports from peer-reviewed studies (data sourced from the DrugAge database). Compounds may have different class labels for male and female mice, as results for each sex are considered separately.
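
The labelling rule can be illustrated with a short sketch. The function name `longevity_label` and the report format are hypothetical, and two details are assumptions rather than statements from the source paper: "majority" is read as a strict majority (so ties are labelled negative), and significance is treated as a per-report flag.

```python
def longevity_label(reports, min_increase=5.0):
    """Label one compound for one sex.

    reports: list of (avg_lifespan_change_percent, is_significant) tuples.
    Returns 1 (positive), 0 (negative), or None when there are no reports.
    """
    if not reports:
        return None  # no data for this compound and sex
    positive = sum(
        1 for change, significant in reports
        if significant and change >= min_increase
    )
    return int(positive > len(reports) / 2)  # strict majority of reports


print(longevity_label([(9.0, True), (6.5, True), (2.0, False)]))
print(longevity_label([(9.0, True), (1.0, True)]))
```

Calling the function once with the male reports and once with the female reports reflects how a compound can end up with different labels for each sex.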

Predictions for male and female mice:

The male mouse predictions come from models trained exclusively on examples from male mice, which had the best predictive accuracy in our experiments. For female mice, we lacked enough data to obtain reliable female-only models, so the predictions come from models trained on mixed-sex datasets. Predictions for female mice are therefore less reliable (please see the source paper for a detailed discussion of the experimental results).

Implementation:

This web service was implemented in Python, using the Flask framework for the website portion. All models were trained with the scikit-learn (sklearn) library (v1.1.0) on the datasets provided in the Data page. We selected the models for the ensembles in this tool based on their cross-validation results. The data was preprocessed to remove features with fewer than 3 occurrences ('1' values). Otherwise, Target models include all STRING annotations from the selected categories, and Chemical models include all chemical substructures in the fingerprint.
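
The preprocessing step that removes rare features can be sketched in plain Python as follows. The function name `drop_rare_features`, the feature names, and the tiny matrix are made up for illustration; only the threshold of 3 occurrences comes from the description above.

```python
def drop_rare_features(rows, names, min_count=3):
    """Keep binary feature columns that contain at least `min_count` ones."""
    counts = [sum(col) for col in zip(*rows)]  # number of '1' values per column
    keep = [i for i, c in enumerate(counts) if c >= min_count]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_rows, [names[i] for i in keep]


# Rows are compounds, columns are binary features (toy data).
rows = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]
filtered, kept = drop_rare_features(rows, ["f_a", "f_b", "f_c"])
print(kept)
```

In this toy matrix only the first column has three '1' values, so it is the only feature retained.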
