Post on 07-Feb-2018

Substance and Chemical Structure Searching in CAS REGISTRY℠ and DCR on new STN®

Agenda

• Settings and Cross File Search operators
• Substance search fields in CAS REGISTRY℠
• Structure examples
• DCR/DWPI substance and structure search
• Multifile search example

Search settings for substance and structure searches

The default scope for structure searches is FULL.

Automatic Cross File Search can be toggled on or off under General Search.

Display settings for substance and structure searches

• Hit structure default displays in bibliographic records can be set on the Display tab in Settings.

• DWPI℠ hit structures from DCR only display in FULL format.

• CAplus℠ default format can be set with Show hit structures.

• Both can be toggled in the record display.

Cross File Search with REFX to find bibliographic references indexed to substances

Query: (REFX L1), where L1 is a substance search in REGISTRY or DCR

Cross File Search:
• CAplus: retrieves bibliographic references indexed to the Registry Numbers for the substances in L1
• DWPI/DCR: retrieves bibliographic references indexed to the DCR numbers for the substances in L1

Alternatively, you can enter any substance query instead of L1, e.g. (REFX CAFFEINE/CN).

Cross File Search with SUBX to find all substance records indexed to references

Query: (SUBX L1), where L1 is a search in CAplus or DWPI

Cross File Search:
• REGISTRY: retrieves all REGISTRY records indexed to the bibliographic records from L1 in CAplus
• DWPI/DCR: retrieves all DCR number records indexed to the bibliographic records from L1 in DWPI

Alternatively you can enter any bibliographic query instead of L1, e.g. (SUBX (L’OREAL/PA AND A61K/IPC,CPC)).
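The REFX and SUBX forms on these two slides follow one pattern: the operator and its argument sit inside a single pair of parentheses, and a compound query gets its own inner parentheses. A minimal Python sketch of that pattern follows. The helper names `is_wrapped` and `cross_file` are hypothetical and are not part of STN; the code only assembles query text of the shape shown on the slides and does not talk to any STN service.

```python
def is_wrapped(expr: str) -> bool:
    """Return True if one pair of parentheses encloses the whole expression."""
    if not (expr.startswith("(") and expr.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            # The opening parenthesis closed before the end: not a full wrap.
            if depth == 0 and i < len(expr) - 1:
                return False
    return depth == 0


def cross_file(operator: str, query: str) -> str:
    """Build a parenthesized Cross File Search expression (hypothetical helper)."""
    op = operator.upper()
    if op not in ("REFX", "SUBX"):
        raise ValueError(f"unsupported Cross File Search operator: {operator}")
    query = query.strip()
    # A multi-term query gets its own parentheses so the operator covers all of it.
    if " " in query and not is_wrapped(query):
        query = f"({query})"
    return f"({op} {query})"


print(cross_file("REFX", "L1"))
print(cross_file("REFX", "CAFFEINE/CN"))
print(cross_file("SUBX", "L'OREAL/PA AND A61K/IPC,CPC"))
```

The three calls reproduce the query shapes given on the slides: an L-number, a single substance term, and a compound bibliographic query.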

CAS REGISTRY℠ and CAplus℠ on new STN

• CAS REGISTRY℠: 94+ million substances
• CAplus℠: 41+ million references
• CAS Registry Numbers® link the two databases: REFX crosses from substances to references, SUBX from references to substances.

Substance search fields in CAS REGISTRY

• Chemical Names (CN)
• Chemical Name Segments (CNS)
• Molecular Formula (MF)
• Component Molecular Formula (CMF)
• Element Symbol (ELS)

Lookup Chemical Names in the Term Explorer

Chemical Name Segments are parsed at punctuation and spaces

Chemical Name Segments can have left and right truncation
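The two preceding slide titles describe segments as pieces of a name split at punctuation and spaces, searchable with left and right truncation. A rough Python sketch of that idea follows. The functions `name_segments` and `segment_matches` are hypothetical, the splitting rule is a simplification that only handles ASCII letters and digits, and the actual CNS parsing rules and truncation symbols in REGISTRY are not reproduced here.

```python
import re


def name_segments(name: str) -> list[str]:
    """Split a chemical name at punctuation and spaces (simplified assumption)."""
    return [s.upper() for s in re.findall(r"[A-Za-z0-9]+", name)]


def segment_matches(segment: str, stem: str,
                    left: bool = False, right: bool = False) -> bool:
    """Compare a segment to a search stem with optional truncation."""
    segment, stem = segment.upper(), stem.upper()
    if left and right:
        return stem in segment            # characters allowed on both sides
    if left:
        return segment.endswith(stem)     # characters allowed before the stem
    if right:
        return segment.startswith(stem)   # characters allowed after the stem
    return segment == stem                # no truncation: exact segment


segments = name_segments("2-(4-chlorophenyl)acetic acid")
print(segments)
print([s for s in segments if segment_matches(s, "phenyl", left=True)])
```

Here the stem "phenyl" matches the segment CHLOROPHENYL only when left truncation is allowed, which is the situation the truncation slide addresses.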

Molecular formula searches and Component Molecular Formula searches

Searching Element Symbol (ELS) and counts
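To make the difference between a whole formula, its components, and its element symbols concrete, the sketch below parses formula text in the space- and dot-separated style that REGISTRY records display, for example C6 H14 O6 . x H2 O4 S. It is a plain-Python illustration under stated assumptions: `parse_component`, `parse_formula` and `element_symbols` are hypothetical names, only integer or "x" component multipliers are handled, and nothing here reproduces how the MF, CMF or ELS fields are actually indexed.

```python
import re
from collections import Counter


def parse_component(text: str) -> tuple[str, Counter]:
    """Parse one component such as 'x H2 O4 S' into (multiplier, element counts)."""
    tokens = text.split()
    multiplier = "1"
    # A leading integer or 'x' is a component multiplier, not an element.
    if tokens and (tokens[0].isdigit() or tokens[0] == "x"):
        multiplier = tokens.pop(0)
    counts: Counter = Counter()
    for token in tokens:
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized formula token: {token!r}")
        symbol, number = match.groups()
        counts[symbol] += int(number) if number else 1
    return multiplier, counts


def parse_formula(formula: str) -> list[tuple[str, Counter]]:
    """Split a formula at ' . ' and parse each component separately."""
    return [parse_component(part) for part in formula.split(" . ")]


def element_symbols(formula: str) -> list[str]:
    """Return the distinct element symbols across all components, sorted."""
    return sorted({sym for _, counts in parse_formula(formula) for sym in counts})


for multiplier, counts in parse_formula("C6 H14 O6 . x H2 O4 S"):
    print(multiplier, dict(counts))
print(element_symbols("C6 H14 O6 . x H2 O4 S"))
```

Keeping each component as its own entry mirrors the idea behind a component-level formula search, while the combined symbol list corresponds to asking which elements, and how many distinct ones, a substance contains.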

New STN Structure Editor

Create a structure in several ways

• Draw in the structure editor (recommended)
• Convert SMILES or InChI strings to structures using Add to Editor by External Identifier
• Copy/Paste from another software program such as ChemDraw® or ISIS/Draw™
• Import a structure saved in .cxf or .mol file format
  ‒ Attributes of the original drawing program may be retained or changed

STN Help has details on all the drawing tools

Click Demo to watch a 10-second video of how to use the tool.

Develop a structure search query

Find compounds which meet the following criteria:

• The ring system shown is mono- or bicyclic
• R1 = alkyl, alkylene or alkenylene, 1-10 C
• R2 = O, N, S, or a bond
• R3 is a substituted heterocyclic ring containing exactly 1 N, and up to 1 O or S atom

Identify the structure pieces

[Figure: structure diagram with pieces labeled C = 0 and C = 1-6]

Isolate rings for REGISTRY searches using the lock rings tool

Isolated rings have thick, bold bonds.

Set Bond Attributes using right-click

Set Node Attributes using right-click

Define R-groups

Set generic ring attributes

Set Node attributes

Nodes which have attributes applied display with an asterisk.

Show/Hide Attribute Values Panel to verify query

Position your mouse cursor over the attribute value, and portions of the query that have the attribute will be highlighted.

Continue to verify the query

When multiple element counts are applied, the panel does not highlight, but the nodes do.

Click OK when ready to submit.

The system automatically places the query in the Query Builder panel

*Automatic Cross File Search was on for this search.

View Counts to see how the system interpreted the query

Modify structures easily by clicking on the structure under the structure tab

Simply modify the structure by clicking to open it, make your modifications, and click OK.

All structure queries are saved under new STR numbers, unless you click Cancel.

History panel to this point

*Automatic Cross File Search was on for these searches.

Click on tabs to view results from each database

Refine with CAS Roles with Cross File Search ON

See STN Help for CAS roles and definitions.

Use parentheses with REFX for best results

Query and Cross File Search outcome ([X] marks the forms the slide flags as not recommended):

1. ((REFX L1) OR (REFX L2)) (U) (THU OR PKT OR PAC)/RL
2. (REFX L1 OR L2) (U) (THU OR PKT OR PAC)/RL [X]
3. L1 OR L2 (L3 = REGISTRY answers), then (REFX L3) (U) (THU OR PKT OR PAC)/RL
4. REFX L3 (U) (THU OR PKT OR PAC)/RL [X]
5. (REFX (L1 OR L2)) (L6 = CAplus answers), then L6 (U) (THU OR PKT OR PAC)/RL
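The parenthesized forms on this slide can be generated mechanically, which avoids the unparenthesized variants. The sketch below is one way to do that in Python. The helpers `refx`, `refx_any` and `with_roles` are hypothetical names introduced for illustration; they only build query text matching the slide and are not an STN interface.

```python
def refx(query: str) -> str:
    """Wrap a single L-number or query in its own (REFX ...) parentheses."""
    return f"(REFX {query})"


def refx_any(l_numbers: list[str]) -> str:
    """OR together one REFX expression per L-number, parenthesizing the group."""
    joined = " OR ".join(refx(n) for n in l_numbers)
    return f"({joined})" if len(l_numbers) > 1 else joined


def with_roles(references: str, roles: list[str]) -> str:
    """Link a reference expression to CAS roles with the (U) operator."""
    role_expr = " OR ".join(roles)
    return f"{references} (U) ({role_expr})/RL"


roles = ["THU", "PKT", "PAC"]
print(with_roles(refx_any(["L1", "L2"]), roles))
print(with_roles(refx("L3"), roles))
```

The first call reproduces the fully parenthesized two-L-number query, and the second reproduces the form used after combining L1 and L2 into a single REGISTRY answer set.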

Refine a structure or substance search with modifying text in the IT field

The (U) relational operator is defined between a REGISTRY Cross File Search, CAS roles, and modifying text, which must be searched in the IT field.

Comparison search with automatic Cross File search OFF

Optionally choose a broad CAS role

The system automatically switches to CAplus and searches “refx l5.”

Refine the CAplus L-number with roles and text

Complex R-groups

[Figure: definitions of R-groups R1 and R2]

Two ways to search with disconnected fragments in REGISTRY

• Draw structure fragments in separate windows, combine structures with the AND operator
  ‒ Finds single and multi-component substances
  ‒ Fragments may be in the same or different components (SSS)
  ‒ There may be overlap between the fragments (SSS)

• Draw structure fragments in the same window
  ‒ Finds single and multi-component substances
  ‒ Fragments may be in the same or different components (SSS)
  ‒ There will be no overlap between the fragments (SSS)
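The difference between the two approaches comes down to whether the atoms matched by one fragment may also be matched by the other. The Python sketch below illustrates only that overlap rule. It assumes the substructure matching has already been done and that each match is given as a set of atom indices; the function names and the example atom sets are hypothetical, and this is not how REGISTRY evaluates queries internally.

```python
from itertools import product

Match = frozenset  # one embedding of a fragment: the atom indices it covers


def and_of_two_queries(matches_a: list, matches_b: list) -> bool:
    """Separate windows combined with AND: both fragments occur, overlap allowed."""
    return bool(matches_a) and bool(matches_b)


def same_window(matches_a: list, matches_b: list) -> bool:
    """Same window: at least one pair of embeddings shares no atoms."""
    return any(a.isdisjoint(b) for a, b in product(matches_a, matches_b))


# Substance 1 (invented): the only embeddings share atoms 2 and 3.
overlapping_a = [Match({0, 1, 2, 3})]
overlapping_b = [Match({2, 3, 4, 5})]

# Substance 2 (invented): the fragments sit on different atoms.
separate_a = [Match({0, 1, 2})]
separate_b = [Match({5, 6, 7})]

print(and_of_two_queries(overlapping_a, overlapping_b), same_window(overlapping_a, overlapping_b))
print(and_of_two_queries(separate_a, separate_b), same_window(separate_a, separate_b))
```

Under these assumptions the first substance is an answer only for the AND of two separate queries, while the second is an answer for both approaches.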

Search two separate fragments with AND

Results from combining two separate structure queries with the AND operator

Example answers shown: overlapping fragments, 2 separate components, and disconnected fragments.

Search two separate fragments drawn in the same window

Results from searching two separate fragments drawn in the same window

Example answers shown: 2 separate components* and disconnected fragments.

Answers may have overlap, but a disconnected fragment must be present.

* Different from classic STN

FAMILY searches with fragments

A FAMILY search finds the same answers with 2 separate structure queries, or when 2 fragments are drawn in the same window.

Example answers from FAMILY search

114205-82-2 C54 H104 O18 S3 . 3 H3 N Incompletely Defined Substance (IDS)

1644285-27-7 C6 H14 O6 . x H2 O4 S

FAMILY searches find two or more component answers. This can be useful in polymer searches when you have specific monomers you want to be present, but other monomers can also be present, or you may be interested in Incompletely Defined Substances.

EXACT searches with fragments drawn in a single window

Find substances from references with SUBX

Asterisks in the patent family indicate publication numbers which have been indexed as basics in CAplus.

Click Get Substances to retrieve substances from REGISTRY

SUBX finds substances in REGISTRY

The system enters REGISTRY and uses SUBX with the accession number of the CAplus record to Cross File Search the Registry Numbers associated with that record, resulting in 125 substances in L2.

Compare answers from each record

What is the DWPI Chemistry Resource (DCR)?

• DCR is a chemical structure database covering specific chemical structures indexed in Derwent World Patents Index® (DWPI℠) patent records

• Fully integrated with DWPI on new STN

• DWPI: 28,000,000+ patent records

• DCR: 2,400,000+ substance records

DWPI Chemistry Resource (DCR)

• For each specific chemical substance a DCR record is created with a unique DCR number
  ‒ Basic compound
  ‒ Salts, isotopes, mixtures, isomers

• Substance records include structure diagrams and substance data, e.g.
  ‒ IUPAC name, synonyms
  ‒ Molecular formula, molecular weight

• DCR numbers (/DCR) form the connection to DWPI patent records

DCR substance record detailed display

Chemical structures are searchable in the standard new STN format.

DCR can often be a useful source of synonym chemical names (/CN).

DCR numbers connect DCR substance records to DWPI patent records.

A note on DCR chemical name fields

• A Preferred Name (/CN.P) may be chosen by Thomson Reuters, e.g. a generic drug name

• Synonyms (/SY) may be selected for inclusion by Thomson Reuters, e.g. trivial names, trade names

• Chemical Name (/CN) field provides a one step search of all Preferred (/CN.P) and Synonym (/SY) names

• A Systematic Chemical Name (/CN.S) may also be available, generated using AutoNom software

• Chemical name segment (/CNS) field provides name fragment searching for all CN and CN.S names

Searching DCR numbers

• Use the /AN.S field to retrieve substance records, e.g. DCR-368/AN.S

• Use the /DCR field to retrieve patent records

DCR-90453 is the DCR number for cetirizine.

DWPI patent record detailed display full view

DCR hit structures are automatically displayed in detailed display full view.

DCR coverage

• Specific chemical compounds indexed by Thomson Reuters from basic patents in DWPI

• DWPI patents classified in the pharmaceutical (B), agrochemical (C) and/or general chemical (E) sections

• Comprehensive coverage began in 4/1999

• Selective coverage for approximately
  ‒ 20,000 substances from 1/1987 to date
  ‒ 2,100 substances from 7/1981 to date

See also: DWPI CPI Chemical Indexing Guidelines: http://ip-science.thomsonreuters.com/m/pdfs/mgr/chemical_index_guidelines.pdf

Multi-database chemical structure search example

Search Question: A group of compounds is described as having analgesic properties as kappa opioid agonists. Find similar substances.

Identify a common core structure

Include query details

• The 6-membered ring is monocyclic or polycyclic, saturated or normalized

• R1-N-R2 describe rings of various sizes, but R1 and R2 are always C

• The ring N is attached directly to the 6-membered ring, or via a 2 C-chain linker

• The amide chain is connected to either the linker or the ring

• Ar is defined as an aryl or heteroaryl ring

• n is defined as 1-3 C

Multi-database chemical structure search steps

1. Create a new project and select databases
2. Prepare the structure queries
3. Run the structure searches in REGISTRY and DCR
4. Cross over the results to CAplus and DWPI with REFX
5. Use Create Term List to identify unique hits in DWPI via the CAplus Patent Number/Kind (PNK)

Create a new project and select databases

Prepare the structure queries

• Right-click on a node or bond to change Attribute Values

• Mouse over the query or attribute panel to verify Attribute Values

• Click OK to add the query to the structures tab of the history panel

Prepare the structure queries (cont.)

Notes:
• Manually assigned ring bonds are represented with a circle symbol
• Unspecified bonds are represented by a dashed line

Run the structure searches in REGISTRY and DCR

• Structure queries are assigned STR numbers for searching

• Use REFX to retrieve references (L3)

Use Create Term List to identify unique hits

• Create Term List is used to extract data and transfer terms to other databases for searching

• Main focus is on patent information
  ‒ PN, PNK, PRN, AP available in all patent databases
  ‒ Basic versions (.B) available in patent family databases
  ‒ RN, CN and DOI are also available

• Term Lists are identified by Q#
  ‒ Permanent asset, project independent
  ‒ Can be searched in combination with other terms
  ‒ Can be re-qualified with one or more field codes

Use Create Term List to identify unique hits (cont.)

Search Term Lists via their assigned Q-numbers

Q18 = patent number/kind taken from CAplus (L3).

L3 = CAplus and DWPI combined search results.

Patent records only found in DWPI (L4).

Manage Term lists.
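Conceptually, the step on this slide is a set difference: patent publications already found through CAplus are removed from the combined results, leaving the records found only in DWPI. The Python sketch below shows that logic on plain strings. The patent numbers are invented placeholders, `normalize` and `only_in_second` are hypothetical helpers, and real Term List matching across databases (number formats, family members, kind codes) is more involved than this.

```python
def normalize(pnk: str) -> str:
    """Remove whitespace and uppercase so equivalent numbers compare equal."""
    return "".join(pnk.split()).upper()


def only_in_second(first_hits: list[str], second_hits: list[str]) -> list[str]:
    """Return normalized entries of second_hits that are absent from first_hits."""
    seen = {normalize(p) for p in first_hits}
    return sorted({normalize(p) for p in second_hits} - seen)


# Invented placeholder publication numbers, not real search results.
caplus_pnk = ["WO 2015000001 A1", "US 20150000002 A1"]
dwpi_pnk = ["WO2015000001 A1", "EP 0000003 A1"]

print(only_in_second(caplus_pnk, dwpi_pnk))
```

With these placeholder lists, only the EP entry remains, which corresponds to the slide's set of patent records found only in DWPI.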

Additional records found in DWPI

Including DCR/DWPI is an essential part of completing a comprehensive chemical structure prior art search.

Summary

• Set default hit structure display for CAplus and DWPI in settings
• Use STN Help to find details on structure drawing tools
• Strategies can be as broad or narrow as you want
• Create Term Lists to search patent publications in different databases

For more information …

CAS: help@cas.org
Support and Training: www.cas.org

FIZ Karlsruhe: helpdesk@fiz-karlsruhe.de
Support and Training: www.stn-international.de