Open access sources for protein interaction inhibitors

Upload: chris-southan

Post on 26-Jun-2015

DESCRIPTION

Goteborg Systems Biology Conference Aug 08

TRANSCRIPT

Page 1: Open Access Sources for Protein Interaction Inhibitors

EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

T +44 (0) 1223 494 444, F +44 (0) 1223 494 468, http://www.ebi.ac.uk

The Utility of New Open-access Resources for Tracking Down Chemical Structures of Protein-protein Interaction Inhibitors

Chris Southan

More challenging examples

Mapping bioactive compounds between the literature, patents and PubChem, ChEBI or other open chemical databases is still challenging, but the expansion in public sources will increasingly enable the systems biology community to make these links (background information at http://www.cdsouthan.info/Data/CDS_data.htm). The consequent ability to identify, search, compare, source and extend the assaying of tool compounds and/or drug candidates in different systems will move the field forward. However, this still needs engagement from the community, e.g. by including PubChem IDs in publications, submitting assay results to PubChem and sharing compounds.

Dr Christopher Southan, ELIXIR Database Survey Co-ordinator, [email protected]

European Bioinformatics Institute

Introduction

While the use of small molecules as experimental perturbagens is a fundamental approach in systems biology, comparing compound structures has hitherto required chemical drawing expertise and expensive commercial database subscriptions. However, the last few years have seen a revolution in public cheminformatic resources (“Complementarity between public and commercial databases: new opportunities in medicinal chemistry informatics”, Southan et al., PMID:17897036). This work evaluates a selection of open resources that can already be used by systems biologists, using as an example a recent Nature review of key small molecule inhibitors of protein-protein interactions (SMIPPIs) that have utility both as systems biology tools and therapeutic candidates (“Reaching for high-hanging fruit in drug discovery at protein–protein interfaces”, Wells et al., PMID:18075579).

Fig.1 Compound designations and protein partners from PMID 18075579

The problem

While sketches of the structures are included in the publication, the series of non-standard compound identifiers shown above in Fig. 1 illustrates a common problem. These include company code names such as ABT-737 (Abbott) and references to figures in other papers such as “compound 23b”. Unlike the gene names of their interaction partners, these cannot be easily disambiguated into standardised representations.

Conclusions

The simple solution – a database look-up

The first stop is the PubChem compound search box, where “nutlin-3” picks up an entry, CID 16755649. This links across to the ChEBI entry 46742 shown below.

This example confirms the explicit mapping of Nutlin-3 to structural representations in two chemical databases with accession numbers, without a sketching operation. In both entries (the ChEBI one is shown above) you can see two representation types, InChIs and SMILES strings (see Wikipedia for definitions), that define the structures independently of database IDs. These open up a wide range of cheminformatic searches and other operations that can now be done by the non-specialist with open resources (including Googling InChIs).
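To illustrate why an InChI identifies a structure independently of any database ID, the sketch below splits an InChI string into its version, formula and prefixed layers, and compares two records by their strings alone. The function names `parse_inchi` and `same_structure` are illustrative and not taken from the poster, and ethanol and water are used only because their InChIs are short. It assumes a formula layer is present, which holds for ordinary compounds but not for every valid InChI.

```python
def parse_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and prefixed layers.

    Simplification: the second field is assumed to be the formula layer.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    # Remaining layers start with a one-letter prefix (c = connectivity,
    # h = hydrogens, ...). A list keeps repeated prefixes in order.
    layers = [(part[0], part[1:]) for part in parts[2:] if part]
    return {"version": version, "formula": formula, "layers": layers}


def same_structure(inchi_a: str, inchi_b: str) -> bool:
    """Two database records describe the same structure if their
    InChI strings are identical, whatever their accession numbers."""
    return inchi_a.strip() == inchi_b.strip()


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    print(parse_inchi(ethanol))
```

In practice the string taken from a PubChem entry and the one from the corresponding ChEBI entry would be passed to `same_structure` to confirm that both records describe one compound.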

Some of the code names in Fig. 1 have no matches in PubChem compound searches, e.g. ABT-737. However, there was a match to a PubChem substance ID, 24771379, extracted from the MMDB entry. Thus, via the crystal structure, the ligand “ABT-737” could be mapped to the PubChem compound ID 11228183. This has 32 similar compounds in PubChem, one of which, CID 15991564, also has a PDB Bcl-xL complex.
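The look-ups described above were done through the PubChem web pages. A scripted equivalent could use PubChem's PUG REST interface, which is an assumption here because the poster does not mention it. The sketch below builds request URLs for the three steps: name to CID, substance ID to CID, and 2D similarity neighbours. The helper names and the 90% similarity threshold are illustrative choices. Results returned today will not necessarily match the counts reported on the poster.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str) -> str:
    """URL that resolves a compound name or code name to PubChem CIDs."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def sid_to_cids_url(sid: int) -> str:
    """URL that maps a PubChem substance ID (SID) to its compound ID(s)."""
    return f"{PUG_REST}/substance/sid/{int(sid)}/cids/JSON"


def similar_cids_url(cid: int, threshold: int = 90) -> str:
    """URL for a 2D similarity search around one CID (threshold in percent)."""
    return (f"{PUG_REST}/compound/fastsimilarity_2d/cid/{int(cid)}"
            f"/cids/JSON?Threshold={int(threshold)}")


def fetch_json(url: str, timeout: float = 30.0):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Identifiers taken from the text above.
    print(name_to_cids_url("nutlin-3"))
    print(sid_to_cids_url(24771379))
    print(similar_cids_url(11228183))
```

Keeping URL construction separate from `fetch_json` lets the requests be inspected or logged before anything is sent to the server.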

Attempting complete disambiguation

Either by direct look-up or indirectly via PDB ligands, it was possible to map additional compounds from Fig. 1 to PubChem CIDs, e.g. Ro26-4550 = 16760522, SP4206 = 5327044, Compound 1 = 656967, SP304 = 5327044. However, some of the others are close matches but have molecular mass and other discrepancies with the chemical structures depicted in the review. These include Compound 23 and CID 5287508; the benzodiazepinedione ligand in PDB code 1T4E, CID 656933; and the ligand in 1RV1, CID 448419, which is not identical to Nutlin-3, CID 16755649.
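Once code names have been mapped to CIDs, a simple consistency check is to look for different names that resolve to the same compound record. The sketch below applies this to the confirmed mappings quoted in the text. The function name `shared_cids` is illustrative. The check reports that SP4206 and SP304 share CID 5327044. The poster does not say whether that is intended, so the output is a prompt for manual review, not a verdict.

```python
from collections import defaultdict

# Code name -> PubChem CID pairs as quoted in the text.
MAPPINGS = [
    ("Nutlin-3", 16755649),
    ("ABT-737", 11228183),
    ("Ro26-4550", 16760522),
    ("SP4206", 5327044),
    ("Compound 1", 656967),
    ("SP304", 5327044),
]


def shared_cids(mappings):
    """Return {cid: [names]} for every CID reached by more than one name."""
    names_by_cid = defaultdict(list)
    for name, cid in mappings:
        names_by_cid[cid].append(name)
    return {cid: names for cid, names in names_by_cid.items() if len(names) > 1}


if __name__ == "__main__":
    for cid, names in shared_cids(MAPPINGS).items():
        print(f"CID {cid} is shared by: {', '.join(names)}")
```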

Checking chemical patents

Establishing which bioactive chemical structures have been published in patents has until recently required expensive commercial database subscriptions because, while many of the compounds are in PubChem, there is no open link to the patent documents. This has changed in the last few months with the SureChem free patent search facility. Taking Nutlin-3 as an example, the SMILES entry from PubChem was pasted into the SureChem search box. There are nine exact matches, including the granted patent application from Roche shown below.