SureChem and ChEMBL

ACS CINF webinar

John P. Overington & Nicko Goncharoff

8th April 2014

Bioactivity data

Compound

Assay/Target

>Thrombin

MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE

RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT

NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT

TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT

THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY

CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF

EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR

WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR

ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA

NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG

PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE

ChEMBL – Data for Drug Discovery

3. Insight, tools and resources for translational drug discovery

2. Organization, integration, curation and standardization of pharmacology data

1. Scientific facts

Ki = 4.5 nM

APTT = 11 min.

Overview of EMBL-EBI Chemistry Resources

UniChem – InChI-based resolver (full + relaxed ‘lenses’)

ChEMBL – Bioactivity data from literature and depositions

ChEBI – Structures, metadata for metabolites. Chemical Ontology

Atlas – Ligand induced transcript response

PDBe – Ligand structures from structurally defined protein complexes

SureChEMBL – Ligand structures from patent literature

~70M

ChEMBL

• The world’s largest primary public database of medicinal chemistry data

– ~1.4 million compounds, ~9,000 targets, ~12 million bioactivities

• Truly Open Data - CC-BY-SA license

• Many download/access formats

– Semantic Web

• RDF download, SPARQL endpoint at http://rdf.ebi.ac.uk/chembl

– ChEMBL Appliances

• myChEMBL – Linux VM

• ChEMpi – Raspberry Pi
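As a rough illustration of what access through a SPARQL endpoint looks like, the sketch below builds a GET request URL using the standard SPARQL protocol `query` parameter. The endpoint string is the one shown on the slide; the exact service path, the helper name `build_sparql_url`, and the generic triple-listing query are assumptions for illustration and say nothing about the actual ChEMBL RDF schema. The request is only constructed, not sent.

```python
from urllib.parse import urlencode

# Endpoint as given on the slide; the live service path may differ (assumption).
ENDPOINT = "http://rdf.ebi.ac.uk/chembl"

# Generic SPARQL that lists a few triples; it assumes nothing about the ChEMBL schema.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL could be passed to any HTTP client; the response format would depend on the `Accept` header the client sends.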

SureChEMBL

• EMBL-EBI acquired the SureChem product from Digital Science

– State-of-the-art chemistry patent product

– 15 million chemical structures

– Automatically extracted chemical structures from full-text patent

• Research community wants open access to patent data

– Patent literature 2-3 years ahead of published literature

– Better competitive position

• Plan to provide ongoing free, Open resource to entire community

SureChEMBL Overview

WO

EP – Applications & granted

US – Applications & granted

JP – Abstracts

Patent Offices

Processed patents

Name to Structure (five methods)

Image to Structure (one method)

Database

Chemistry Database

Patent PDFs

Application Server

Entity Recognition

Users

API

SureChem System – Amazon Web Services

Molfiles in patent

Immediate Priorities

• Migrate working pipeline across to EMBL-EBI servers

• Establish new account system

• Migrate current user accounts

• Offer GUI access at SureChem Pro equivalent level

• Turn off API access and refactor new API in OpenPHACTS framework

– Partners in OpenPHACTS will get early test access and input into development pipeline

– Build RDF version of SureChEMBL

Future Plans

• Dependent on funding and interest!

– Add sequence searching

– Add disease term, animal disease model, etc. indexing

– KNIME/Pipeline Pilot nodes

– Add links to/from Europe PMC

– Extend image extraction retrospectively from 2006

• spot pricing compute from AWS

– Provide weekly/monthly feed of patent structures to PubChem and ChemSpider

– Add chemical structure tagging & search to full text content of Europe PMC

– Develop UniChem VM for in-house private patent alerting using feed of SureChEMBL data

The search interface

Keyword search

Filter by authority

Structure sketch

Filter by document section

help

Paste SMILES, MOL, name

Types of chemistry search

Filter by date

http://www.surechembl.org/

help

Patent number search

Keyword-based search

Example Searches

roche OR novartis

C07D048704

sterili?e

kinase*

Pfizer C07D “kinase inhibitor”

pn: WO2011058149A1

pa:(bayer OR astra OR Genentech OR merck) AND desc:(chemotherap* AND(Phosphoinositide kinases~3 OR Pi3K))

http://support.surechem.com/knowledgebase/articles/92016-lucene-query-field-names-and-examples
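The wildcard examples (`sterili?e`, `kinase*`) follow the usual Lucene convention: `?` stands for exactly one character and `*` for any run of characters, including none. The sketch below approximates that behaviour with Python's `fnmatch`, which uses the same two symbols. It is only an approximation: real Lucene matching runs against indexed terms and depends on the analyzer, and the helper name `matches` is made up for this illustration.

```python
from fnmatch import fnmatchcase


def matches(term: str, pattern: str) -> bool:
    """Case-insensitive wildcard match: `?` = one character, `*` = any run of characters."""
    return fnmatchcase(term.lower(), pattern.lower())


# Candidate terms are illustrative, chosen to show one hit and one miss per pattern.
examples = {
    "sterili?e": ["sterilize", "sterilise", "sterile"],
    "kinase*": ["kinase", "kinases", "kinas"],
}

for pattern, terms in examples.items():
    for term in terms:
        print(f"{pattern!r} vs {term!r}: {matches(term, pattern)}")
```

So `sterili?e` covers both the British and American spellings, while `kinase*` also picks up plurals and longer words that start with "kinase".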

Fielded keyword search

Keyword search Filter by document section

Logical operators

Patent number search

Chemistry-based search

Structure sketch

Paste SMILES, MOL, name

Types of search

Filter by MW range

Filter by document section

Example searches

• Retrieve all antimalarial small molecule US patents

– ic:C07D AND ic:A61P003306 AND pnctry:US

• Retrieve a specific patent

– pn:WO2011058149A1

• Similarity search (sildenafil nearest neighbours)

– Paste CCCc1nn(C)c2C(=O)NC(=Nc12)c3cc(ccc3OCC)S(=O)(=O)N4CCN(C)CC4
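The fielded queries above are plain strings, so they are easy to assemble programmatically. The following is a minimal sketch that rebuilds the slide's examples from their parts; the field names (`ic`, `pnctry`, `pn`, `pa`) come from the slides, while the helper functions `field`, `all_of` and `any_of` are hypothetical names invented here and are not part of any SureChEMBL API.

```python
def field(name: str, value: str) -> str:
    """Format one fielded term, e.g. field("pn", "WO2011058149A1")."""
    return f"{name}:{value}"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (every clause must match)."""
    return " AND ".join(clauses)


def any_of(*terms: str) -> str:
    """Group alternatives with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Antimalarial small-molecule US patents, as on the slide.
antimalarial_us = all_of(
    field("ic", "C07D"),
    field("ic", "A61P003306"),
    field("pnctry", "US"),
)

# A single patent by publication number.
single_patent = field("pn", "WO2011058149A1")

# Several assignees at once, as in the earlier keyword example.
assignees = field("pa", any_of("bayer", "astra", "Genentech", "merck"))

print(antimalarial_us)
print(single_patent)
print(assignees)
```

The printed strings can be pasted into the keyword search box as they are.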

Example search

Review the hits

Select a subset of hits

Export hits (Pro user)

Property range filters

Count filters

Select a subset of hits

Review patent documents

Retrieve patent families

Review patent documents

Retrieve chemistry (Pro user)

Property range filters

Count filters

Summary

• Searching capabilities

– Free text keywords and Lucene fields

– Patent IDs & bibliographic information

– Patent authority & date

– Structure

• Retrieving capabilities

– Retrieve chemistry (with additional filters)

– Retrieve patent family information

– Retrieve annotated full patent text

Any questions?

• http://chembl.blogspot.co.uk/

• http://chembl.blogspot.co.uk/search/label/Webinar

• surechembl-help@ebi.ac.uk
