Mind the Gap: The novel benefits of
human-curated substance locations for
chemical patent analysis
Aalt van de Kuilen, Patent Information Services BV, NL
Paul Peters, CAS/ACS International, DE
ICIC 2016
October 18, 2016
Heidelberg, Germany
CAS is a division of the American Chemical Society.
Copyright 2016 American Chemical Society. All rights reserved.
Finding the relevant section(s) within the full-text of chemical
patents is often a time-consuming challenge
• They are not always as easy to track
down as we might expect
• They can be long and “artfully” written
• The chemistry is often obscured within complex
names, tables, text, graphics, etc.
Sometimes it seems like the
search may be complete, but
the hunt is just beginning!
Even with a precise chemical patent search, reviewing the results
can quickly become overwhelming
=> FILE CAPLUS
=> S L3
L4 1014 L3
=> S L4 AND (BET OR BROMODOMAIN) AND P/DT
L5 35 L4 AND (BET OR BROMODOMAIN) AND P/DT
A query combining structure and text
terms yields 35 patent publications.
That shouldn’t be too bad, right?
Only 5,498 pages to review.
Individual documents run to 479, 428, 321, 277, 263, 261, 240, and 229 pages.
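To put the review burden in numbers, here is a minimal Python sketch using only the figures shown on the slide (35 documents, 5,498 pages, and the eight page counts called out). The variable names are illustrative, and the slide does not say whether the eight documents are the longest in the set.

```python
# Page counts called out on the slide for individual documents.
listed_pages = [479, 428, 321, 277, 263, 261, 240, 229]
total_pages = 5498     # total stated on the slide
total_documents = 35   # size of answer set L5

listed_total = sum(listed_pages)
mean_pages = total_pages / total_documents
listed_share = listed_total / total_pages

print(f"Pages in the {len(listed_pages)} listed documents: {listed_total}")
print(f"Mean pages per document: {mean_pages:.0f}")
print(f"Share of all pages in the listed documents: {listed_share:.0%}")
```

On these figures, the eight listed documents account for a little under half of all pages to review.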
Technology can help, but algorithmic extraction of chemistry in
patents has significant limitations
Conclusion: In a limited sample, algorithmic extraction found
only 50-60% of the chemical structures in patents,
and those it found were often the least interesting ones.
Algorithms miss key substances for a myriad of reasons
• Ambiguous naming
• Markush representations
• No name: substances described in explanatory
text or images rather than by chemical
names or structures
• Stereochemistry issues
• Multi-component substances
Normally PSS is used for poly(styrenesulfonic acid), but here it represents the aqueous dispersion, which CAS previously identified as poly(1-vinyl-2-pyrrolidone)
PatentPak™ addresses this gap by combining human curation
with new technology to expedite chemical patent analysis
• Rapidly track down the specific location of hard-to-find chemical information
in patents with interactive links to key substances
– Benefit from the indexing efforts of hundreds of CAS scientists
• Instantly and securely access patent PDFs from major patent offices
– No more wasting time navigating multiple web sites
• Locate patents in languages you know with CAplus℠ global patent family
coverage
– Save time and translation costs
• Conveniently share these benefits with other IP stakeholders
– Even if they do not use STN® or SciFinder®
PatentPak is built on the indexing effort of the scientific analysts
that create CAS REGISTRY℠
• Scientists review each patent and
identify new substances for CAS
REGISTRY inclusion
• They mark the specific location of
substances in the text during analysis
• Algorithmic processing with human
intervention allows previously registered
substances to be located and annotated
in backfile documents
“I analyzed the chemistry in this
entire patent to save you time.”
Keiko Sugimoto
Sr. Scientific Information Analyst, CAS
PatentPak supplements CAplus records with direct pointers to the
chemistry of interest
Bibliographic information (partially shown)
Hit substance indexing including roles
Hit structure display from CAS REGISTRY
PatentPak links for each hit compound
PatentPak links for document
It is possible to access the original PDF…
… the annotated PDF (PDF +) …
… or review the patent using the interactive viewer
PatentPak links are available in transcripts, tables, and reports
and accessible without an STN login ID to support workflow
No STN login ID required
PatentPak is also available in SciFinder
New CAplus records from 31 countries are annotated as part of the
normal workflow, and the backfile is growing rapidly
The current backfile project will extend historical PatentPak
coverage of key offices by more than a decade by year end
PatentPak example US5739376: Backfile operation for one of the
first patents on Fullerene derivatives (Hoechst AG)
• Originally a
German basic
patent from 1994
but substance
locations have
been added to the
US equivalent
from 1998
• Fullerene structures
were symbolized by
simple rings
PatentPak example WO2016087417: substances identified in a
Markush table (Bayer CropScience AG)
• Only a few selected
substances in this
patent are fully
identified by name
or structure
• The vast majority
of substances are
indexed by
assembling
Markush tables
PatentPak example WO9851681: Substance identified as “oily
product” (Sanofi)
• This particular
substance is
only identified
as “oily product”
• The CAS analyst
indexed it from
the chemistry described
PatentPak example WO2016120821: Find substances that cannot be
identified by algorithm or structure extraction (Novartis AG)
• Substances in formula VII
are claimed by Markush:
LG = “leaving group”
• Analyst marked four specific
compounds which are
defined later in the claims -
only a human can process
claims like this!
PatentPak example DE2013016487: Multiple location markings
(University of Heidelberg)
• Analyst has
marked multiple
locations - claims
and synthetic
example
PatentPak example WO2016001362: Find substances inferred by
their starting material after enzymatic conversion (BASF)
• Starting
materials
(substrates)
identified by
structure on
page 51
• Products not
listed but
inferred in a
table on
page 27
PatentPak example WO2014184355: Find assembled Markush tables
(Dr. August Wolff GmbH & Co Arzneimittel)
• 9.5 pages of “table
Markush” structures - a
core structure shown at
the top, with fragments
• The complete structure is
assembled in a table at
the back of PDF+
document, including page
numbers, CAS RN,
chemical name, and
structures
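The idea of assembling complete structures from a core and a table of fragments can be sketched in Python. This is a toy illustration only, not the process CAS analysts or PatentPak use: the core, the substituents, the example numbers, and the page numbers are all hypothetical, and structures are written as simple SMILES strings.

```python
# Toy "table Markush": one core with two variable positions, plus a table in
# which each row names the fragments for one specific compound.
# Everything here is hypothetical and not taken from any patent.
CORE = "{R1}c1ccc({R2})cc1"  # para-disubstituted benzene core, as SMILES

MARKUSH_TABLE = [
    {"example": 1, "page": 12, "R1": "C", "R2": "O"},
    {"example": 2, "page": 12, "R1": "Cl", "R2": "N"},
    {"example": 3, "page": 13, "R1": "C", "R2": "N"},
]


def assemble(core, row):
    """Substitute the row's fragments into the core to get a full structure."""
    return core.format(R1=row["R1"], R2=row["R2"])


def build_compound_table(core, rows):
    """Return one assembled entry per table row, keeping its page number."""
    return [
        {"example": r["example"], "page": r["page"], "smiles": assemble(core, r)}
        for r in rows
    ]


for entry in build_compound_table(CORE, MARKUSH_TABLE):
    print(entry["example"], entry["page"], entry["smiles"])
```

Each table row yields exactly one assembled compound, which matches a row-by-row table rather than a full enumeration of all fragment combinations.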
Case study on new Vitamin D metabolites
How many patent families have been filed since 2013 on new
Vitamin D metabolites?
Find the answer with a stepwise approach
1. Structure search in Registry
2. Remove old compounds
3. Keep compounds with low reference count in CAplus
4. Transfer to Chemical Abstracts
5. Limit to new compounds and published in patents
6. Display records which have a PatentPak record
PatentPak PDF | PatentPak PDF+ | PatentPak Interactive
Structure
[Structure query drawing: vitamin D skeleton with node labels Q, CH2, CH3, and Ak]
Broad definition of Vitamin D skeleton
All rings are isolated and double bonds are mandatory
CAS REGISTRY search
FILE 'REGISTRY' ENTERED ON 22 SEP 2016
STRUCTURE UPLOADED
=> L3 has 6806 unique substances in Registry
Refine to compounds registered since 2001 (ED>2000)
=> L4 has 2394 unique substances
Refine to substances with fewer than 5 references (REF.CAPLUS<5)
=> L5 has 2159 unique substances
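The two refinements above amount to a simple filter on entry date and reference count. The Python sketch below applies the same two conditions to a made-up list of substances, then computes how much of each answer set survives using the counts from the slide. The `Substance` class, its field names, and the toy records are assumptions for illustration and do not reflect how REGISTRY stores data.

```python
from dataclasses import dataclass


@dataclass
class Substance:
    label: str        # hypothetical identifier, not a real CAS RN
    entry_year: int   # year the substance entered REGISTRY
    caplus_refs: int  # number of CAplus references


def refine(substances, after_year=2000, max_refs_exclusive=5):
    """Keep substances entered after `after_year` with fewer than
    `max_refs_exclusive` references (mirrors ED>2000 and REF.CAPLUS<5)."""
    return [
        s for s in substances
        if s.entry_year > after_year and s.caplus_refs < max_refs_exclusive
    ]


toy_set = [
    Substance("old-and-common", 1985, 1200),
    Substance("new-but-common", 2005, 40),
    Substance("new-and-rare", 2014, 2),
]
kept = refine(toy_set)
print([s.label for s in kept])

# Answer-set sizes reported on the slide.
counts = {"L3": 6806, "L4": 2394, "L5": 2159}
print(f"L4/L3 = {counts['L4'] / counts['L3']:.0%}")
print(f"L5/L4 = {counts['L5'] / counts['L4']:.0%}")
```

On the slide's counts, the date restriction removes roughly two thirds of the substances, while the reference-count restriction removes only about a tenth of what remains.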
CAplus search strategy
Cross-over of L4 with 2159 unique substances
=> L5 has 503 references from all years
Restrict the answer to patent records only (P/DT)
=> L6 has 234 patent references from all years
Restrict to patents with a stronger chemistry focus using C07C
as IPC or CPC codes
=> L7 has 136 patent references from all years
Restrict to patents with a priority year after 2012
=> L8 has 18 patent references
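The CAplus restrictions above are applied one after another, each narrowing the previous answer set. A minimal Python sketch of that staged filtering follows. The `Record` class, the document-type values, the classification codes, and the toy records are illustrative assumptions, not CAplus data or STN syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str                                       # hypothetical record label
    doc_type: str                                    # "P" patent, "J" journal (assumed values)
    class_codes: list = field(default_factory=list)  # IPC or CPC codes (illustrative)
    priority_year: int = 0


# Each stage is a name plus a predicate; stages run in the order listed.
STAGES = [
    ("patents only (P/DT)", lambda r: r.doc_type == "P"),
    ("C07C in IPC or CPC", lambda r: any(c.startswith("C07C") for c in r.class_codes)),
    ("priority year after 2012", lambda r: r.priority_year > 2012),
]


def run_stages(records, stages):
    """Apply each stage in turn and report the answer-set size after each."""
    report = [("start", len(records))]
    for name, keep in stages:
        records = [r for r in records if keep(r)]
        report.append((name, len(records)))
    return records, report


toy = [
    Record("journal article", "J", [], 2014),
    Record("formulation patent", "P", ["A61K31/59"], 2014),
    Record("older synthesis patent", "P", ["C07C401/00"], 2005),
    Record("recent synthesis patent", "P", ["C07C401/00", "A61K31/593"], 2013),
]
final, report = run_stages(toy, STAGES)
for name, size in report:
    print(f"{name}: {size}")
```

Reporting the size after every stage mirrors the way the slide lists 503, 234, 136, and 18 references, and makes it easy to see which restriction does most of the narrowing.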
Findings of the 18 patent family records retrieved
Answer  Country  Language  Pub. year  Pages  All subst.  Vitamin D  PatentPak
------  -------  --------  ---------  -----  ----------  ---------  ---------
     1  CN       Chinese        2016     27          37          4  Yes
     2  CN       Chinese        2016     25          47         13  Yes
     3  WO       English        2016    106         202         55  PDF+
     4  CN       Chinese        2016     13           4          1  Yes
     5  CN       Chinese        2016      5           3          1  Yes
     6  CN       Chinese        2015     21           9          2  Yes
     7  CN       Chinese        2015     14           9          4  Yes
     8  CN       Chinese        2015      9           4          1  Yes
     9  WO       German         2015     45          14          3  Yes
    10  DE       German         2015     22          14          3  Yes
    11  CN       Chinese        2014     14          16          1  Yes
    12  CN       Chinese        2015     12           7          1  Yes
    13  US       English        2015     18           5          3  Yes
    14  WO       Spanish        2015     75         141         29  Yes
    15  US       English        2015     21          18          3  Yes
    16  WO       English        2015     61          18          2  Yes
    17  ES       Spanish        2013     55         141         29  Yes
    18  WO       English        2013     50          30          3  Yes
The result set includes
three “double basic” pairs:
9+10, 14+17, 15+16
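The table and the three “double basic” pairs can be turned into a count of distinct patent families. The Python sketch below does this under one stated assumption: each pair represents a single patent family, so the second record of a pair is folded into the first. The slide itself does not state the resulting family count, so treat the figure as a derived estimate.

```python
from collections import Counter

# (answer, country, language, pub_year, pages, all_subst, vitamin_d_subst)
ROWS = [
    (1, "CN", "Chinese", 2016, 27, 37, 4),
    (2, "CN", "Chinese", 2016, 25, 47, 13),
    (3, "WO", "English", 2016, 106, 202, 55),
    (4, "CN", "Chinese", 2016, 13, 4, 1),
    (5, "CN", "Chinese", 2016, 5, 3, 1),
    (6, "CN", "Chinese", 2015, 21, 9, 2),
    (7, "CN", "Chinese", 2015, 14, 9, 4),
    (8, "CN", "Chinese", 2015, 9, 4, 1),
    (9, "WO", "German", 2015, 45, 14, 3),
    (10, "DE", "German", 2015, 22, 14, 3),
    (11, "CN", "Chinese", 2014, 14, 16, 1),
    (12, "CN", "Chinese", 2015, 12, 7, 1),
    (13, "US", "English", 2015, 18, 5, 3),
    (14, "WO", "Spanish", 2015, 75, 141, 29),
    (15, "US", "English", 2015, 21, 18, 3),
    (16, "WO", "English", 2015, 61, 18, 2),
    (17, "ES", "Spanish", 2013, 55, 141, 29),
    (18, "WO", "English", 2013, 50, 30, 3),
]
DOUBLE_BASIC_PAIRS = [(9, 10), (14, 17), (15, 16)]


def count_families(rows, pairs):
    """Count records after folding the second member of each pair into the first.
    Assumes each pair is one patent family."""
    alias = {second: first for first, second in pairs}
    return len({alias.get(row[0], row[0]) for row in rows})


families = count_families(ROWS, DOUBLE_BASIC_PAIRS)
by_country = Counter(row[1] for row in ROWS)
total_pages = sum(row[4] for row in ROWS)

print(f"Records: {len(ROWS)}, distinct families (assumed): {families}")
print(f"Records by country: {dict(by_country)}")
print(f"Total pages across all records: {total_pages}")
```

Under that assumption the 18 records collapse to 15 families, with half of the records coming from Chinese publications.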
L16 ANSWER 7 OF 18 CAPLUS COPYRIGHT 2016 ACS on STN
PatentPak PDF | PatentPak PDF+ | PatentPak Interactive
AN   2015:979679 CAPLUS Full-text
DN   163:118806
TI   24,28-Olefine-1-hydroxy-vitamin D derivatives and preparation method
IN   Fang, Zhijie; Guo, Wei; Liu, Yanan; Li, Hongliang
PA   Nanjing University of Science and Technology, Peop. Rep. China
SO   Faming Zhuanli Shenqing, 14pp.
     CODEN: CNXXEV
DT   Patent
LA   Chinese
FAN.CNT 1
PPPI
     PATENT NO.       KIND  DATE      LANGUAGE    PatentPak
     ---------------  ----  --------  ----------  ------------------------
     CN 104693087     A     20150610  Chinese     PDF | PDF+ | Interactive
PI
     PATENT NO.       KIND  DATE      APPLICATION NO.        DATE
     ---------------  ----  --------  ---------------------  --------
     CN 104693087     A     20150610  CN 2013-10664076       20131210 <--
PRAI CN 2013-10664076       20131210 <--
Display: original full-text PDF
PDF+: original full-text PDF plus compound table
Interactive Viewer for substance locations
Interactive link to location of compound
Answer #3 has >600 substance locations, which can only be seen
in the PDF+; still very useful
Case study conclusions
1. Fast identification of relevant patents, containing new compounds
2. Easy access to the patent document
3. Time savings when finding the compounds in a specific patent
(PatentPak PDF+ compound table)
4. Quickly and easily locate a specific compound in a patent with links in
the PatentPak Interactive Viewer
Overall conclusions
• Semantic technology has made great advances in classifying, mining and
extracting chemical content from text; however, it has significant limitations
• Human analysis is still necessary to find many of the key compound locations
• PatentPak in STN provides convenient links for patent attorneys and outside
counsel to facilitate their analysis work
• PatentPak in SciFinder is designed to provide a direct interactive session for
scientists to find relevant compounds and search them in SciFinder
• PatentPak provides significant time savings when analyzing novel vitamin D
metabolites disclosed in patents