Searching for uncommon sequences on STN Jim Brown FIZ Karlsruhe, Inc.


Agenda

• Peptide/protein sequence searching
  – Definition of 20/22 common amino acids
  – Uncommon amino acids
    • FEATURE TABLE (FEAT) – In DGENE, USGENE and PCTGEN
    • NOTES field (NTE) – In REGISTRY
  – Variables for amino acids
    • B, J and Z
  – Modification of peptides/proteins


Agenda (cont.)

• Nucleic acid sequence searching
  – Definition of 5/6 common nucleotides
  – Uncommon/modified nucleotides
    • FEATURE TABLE (FEAT) – In DGENE, USGENE and PCTGEN
    • NOTES field (NTE) – In REGISTRY
  – Variables for nucleotides
    • R, Y, M, K, S, W, B, D, H, V, N
  – Modification of nucleic acids


20 Common amino acids


Amino acid   3-letter designation   1-letter designation

Alanine Ala A

Arginine Arg R

Asparagine Asn N

Aspartic acid Asp D

Cysteine Cys C

Glutamic acid Glu E

Glutamine Gln Q

Glycine Gly G

Histidine His H

Isoleucine Ile I


20 Common amino acids (cont.)


Amino acid   3-letter designation   1-letter designation

Leucine Leu L

Lysine Lys K

Methionine Met M

Phenylalanine Phe F

Proline Pro P

Serine Ser S

Threonine Thr T

Tryptophan Trp W

Tyrosine Tyr Y

Valine Val V


22 Common amino acids


Amino acid   3-letter designation   1-letter designation

Pyrrolysine Pyl O

Selenocysteine Sec U

Note: These two amino acids are searchable only in REGISTRY. Pyrrolysine was added to REGISTRY in 2006; selenocysteine is covered for all time periods. Only amino acids listed in WIPO ST.25 are allowed in formal sequence listings in patents. Because these two amino acids are not listed in WIPO ST.25, they are not used in DGENE, USGENE and PCTGEN.


• DGENE only recognizes 20 common amino acids

• O and U are represented by X in the sequence; the residue name (Selenocysteine or Pyrrolysine) appears in the FEATURE TABLE

L1 ANSWER x OF 45 DGENE COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ 1 kfqicvsxgy rr
FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+=======================
Modified-site|8       |note     |"Selenocysteine"

Example in DGENE


• USGENE only recognizes 20 common amino acids

• O and U are represented by X in the sequence; the residue name (Selenocysteine or Pyrrolysine) appears in the FEATURE TABLE

L2 ANSWER x OF 13 USGENE COPYRIGHT 2013 SEQUENCEBASE CORP on STN
SEQ 1 gpssggXg
FEATURE TABLE:
Key       |Location|
==========+========+=======================
MOD_RES   |7..7    |Selenocysteine

Example in USGENE


• PCTGEN only recognizes 20 common amino acids

• O and U are represented by X in the sequence; the residue name (Selenocysteine or Pyrrolysine) appears in the FEATURE TABLE

L6 ANSWER x OF 43 PCTGEN COPYRIGHT 2013 WIPO on STN
SEQ 1 rrXlwdqgn
FEATURE TABLE:
Key       |Location|
==========+========+=======================
          |        |Synthetically generated
          |        |peptide
VARIANT   |3       |Xaa = Cysteine or
          |        |selenocysteine

Example in PCTGEN
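The rule on these slides (O and U shown as X in the sequence, with the residue name moved into the FEATURE TABLE) can be sketched as a small conversion. This is a hypothetical helper written for illustration, assuming a plain one-letter input sequence; it is not an STN function, and it does not model the lowercase display or the exact feature keys each database uses.

```python
# Hypothetical helper: illustrates how a sequence containing O or U ends up
# represented in DGENE, USGENE and PCTGEN (X in the sequence, name in a feature).
NON_ST25_RESIDUES = {"U": "Selenocysteine", "O": "Pyrrolysine"}


def to_patent_style(seq: str) -> tuple[str, list[tuple[int, str]]]:
    """Return (sequence with O/U replaced by X, [(1-based position, residue name)])."""
    residues: list[str] = []
    features: list[tuple[int, str]] = []
    for position, aa in enumerate(seq.upper(), start=1):
        if aa in NON_ST25_RESIDUES:
            residues.append("X")
            features.append((position, NON_ST25_RESIDUES[aa]))
        else:
            residues.append(aa)
    return "".join(residues), features


print(to_patent_style("MVSPUWTW"))  # ('MVSPXWTW', [(5, 'Selenocysteine')])
```

The position list plays the role of the FEATURE TABLE location column, which is why a sequence search alone cannot tell an X that stands for selenocysteine from any other X.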


• REGISTRY recognizes all 22 common amino acids, including O and U

L4 ANSWER x OF 137 REGISTRY COPYRIGHT 2013 ACS on STN
SEQ 1 MVSPUWTW

Example in REGISTRY


Sequences with uncommon amino acids

• Uncommon amino acids are defined as anything other than the 20/22 common amino acids

• In DGENE, USGENE and PCTGEN, BLAST and Sequence Code Match (SCM) searches recognize only the 20 common amino acid designations and the 3 variable designations (B, X and Z)


Sequences with uncommon amino acids

• For protein/peptide searches with uncommon amino acids, use wildcard symbols in the search query
  – ‘X’ in BLAST search queries
  – ‘.’ in SCM search queries

• Then search the FEATURE TABLE for the name of the uncommon amino acid
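As a rough illustration of the wildcard rule, the sketch below turns a list of residues, where any entry longer than one letter names an uncommon amino acid, into a BLAST query (using X) and an SCM query (using '.'). The function name and input format are assumptions made for this example; only the X and '.' conventions come from the slides.

```python
def wildcard_queries(residues: list[str]) -> tuple[str, str]:
    """Build (BLAST query, SCM query) from one-letter codes plus named uncommon residues.

    Any entry longer than one character is treated as an uncommon amino acid:
    it becomes 'X' in the BLAST query and '.' in the SCM query.
    """
    blast, scm = [], []
    for residue in residues:
        if len(residue) == 1:
            blast.append(residue.upper())
            scm.append(residue.upper())
        else:
            blast.append("X")
            scm.append(".")
    return "".join(blast), "".join(scm)


blast_q, scm_q = wildcard_queries(list("LTCLASY") + ["homocysteine"] + list("WL"))
print(blast_q)  # LTCLASYXWL
print(scm_q)    # LTCLASY.WL
```

The SCM string would then go into a RUN GETSEQ .../SQSP command and the BLAST string into RUN BLAST .../SQP, as in the examples in this deck, followed by a FEATURE TABLE search for the residue name.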


Example in DGENE


=> FILE DGENE

FILE 'DGENE' ENTERED AT 15:31:54 ON 14 FEB 2013 COPYRIGHT (C) 2013 THOMSON REUTERS

=> RUN GETSEQ LTCLASYXWL/SQSP

RUN GETSEQ AT 15:32:11 ON 14 FEB 2013 COPYRIGHT (C) 2013 FIZ KARLSRUHE GMBH

L1 RUN STATEMENT CREATED
L1 2 LTCLASYXWL/SQSP

=> S L1 AND (HOMOCYST? OR HCY)/FEAT
13888 HOMOCYST?/FEAT
56 HCY/FEAT
L2 1 L1 AND (HOMOCYST? OR HCY)/FEAT

=> D SEQ FEAT

L2 ANSWER 1 OF 1 DGENE COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ 1 kltclasyxw lf
       ========= =
HITS AT: 2-11

FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+=======================
Modified-site|9       |note     |"Homocysteine"
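The HITS AT range is the 1-based start and end of the matched subsequence. The sketch below recomputes it from a copied display line, assuming literal matching only (no wildcards) and that the leading position numbers and block spaces can be discarded. It is a check you might run on downloaded records, not how STN computes hits.

```python
import re


def hit_ranges(display_seq: str, query: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges where query occurs literally.

    Digits and spaces from the display (position numbers, 10-residue blocks)
    are stripped; matching ignores case because DGENE displays lowercase.
    """
    residues = re.sub(r"[^A-Za-z]", "", display_seq).lower()
    target = query.lower()
    ranges = []
    start = residues.find(target)
    while start != -1:
        ranges.append((start + 1, start + len(target)))
        start = residues.find(target, start + 1)
    return ranges


print(hit_ranges("1 kltclasyxw lf", "LTCLASYXWL"))  # [(2, 11)]
```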


Sequences with uncommon amino acids

• CAS maintains a list of ‘standardized uncommon amino acids’ with three-letter designations
  – Unique to the CAS databases

• In REGISTRY, SCM searches can recognize the standardized uncommon amino acid designations
  – Enter the three-letter designation enclosed in single quotes, e.g., 'HCY'


Note: To access the entire list of standardized uncommon amino acids while in REGISTRY, use HELP AAU.
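A small sketch of how a query mixing common residues and CAS standardized uncommon codes might be assembled for REGISTRY. The code set below holds only the handful of codes visible in the HELP AAU excerpt in this deck, and the fallback to '.' for any other named residue follows the wildcard advice this deck gives for non-standardized amino acids. The helper itself is hypothetical.

```python
# Subset of CAS standardized uncommon amino acid codes (full list: HELP AAU in REGISTRY).
CAS_STANDARD_UNCOMMON = {"Aad", "Aan", "Abu", "Aca", "Har", "Hcy", "Hhs", "Hiv", "Hse"}


def registry_scm_query(tokens: list[str]) -> str:
    """Join one-letter codes, quoted CAS three-letter codes and '.' wildcards."""
    parts = []
    for token in tokens:
        if len(token) == 1:
            parts.append(token.upper())
        elif token.capitalize() in CAS_STANDARD_UNCOMMON:
            # Standardized uncommon residue: three-letter code in single quotes.
            parts.append(f"'{token.upper()}'")
        else:
            # Not on the CAS list: fall back to the SCM wildcard.
            parts.append(".")
    return "".join(parts)


print(registry_scm_query(list("LTCLASY") + ["Hcy"] + list("WL")))  # LTCLASY'HCY'WL
```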


=> FILE REG

FILE 'REGISTRY' ENTERED AT 17:56:27 ON 22 FEB 2013
USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT.
PLEASE SEE "HELP USAGETERMS" FOR DETAILS.
COPYRIGHT (C) 2013 American Chemical Society (ACS)
. . .

=> HELP AAU . . .

3-Letter Code   Name
-------------   ----
Aaa             alpha-amino acid
Aad             2-aminoadipic acid (2-aminohexanedioic acid)
Aan             alpha-asparagine
Abu             2-aminobutanoic acid
Aca             2-aminocapric acid (2-aminodecanoic acid)
. . .
Har             homoarginine
Hcy             homocysteine
Hhs             homohistidine
Hiv             2-hydroxyisovaleric acid
Hse             homoserine
. . .

Codes for Standardized Uncommon Amino Acids


Example in REGISTRY


=> S LTCLASY'HCY'WL/SQSP
L1 1 LTCLASY'HCY'WL/SQSP

=> D SEQ NTE

L1 ANSWER 1 OF 1 REGISTRY COPYRIGHT 2013 ACS on STN

SEQ 1 KLTCLASYXW LF
       ========= =
HITS AT: 2-11
NTE modified
-------------------------------------------------------------------
type            location   description
-------------------------------------------------------------------
terminal mod.   Phe-12     -    C-terminal amide
uncommon        Hcy-9      -    -
-------------------------------------------------------------------

HCY = Homocysteine


Sequences with uncommon amino acids

• To search in DGENE, USGENE and PCTGEN for uncommon amino acids that are not on the list of ‘standardized uncommon amino acids’, use wildcard symbols in search queries
  – Use ‘X’ in BLAST search queries, then search for name variations in the FEATURE TABLE
  – Use ‘.’ in SCM search queries, then search for name variations in the FEATURE TABLE


Sequences with uncommon amino acids

• To search in REGISTRY for uncommon amino acids that are not on the list of ‘standardized uncommon amino acids’, use wildcard symbols in search queries
  – Use ‘X’ in BLAST search queries, then search for ‘uncommon’ in the NTE field
  – Use ‘.’ in SCM search queries, then search for ‘uncommon’ in the NTE field


Search example


Search Question: Find all peptides that contain the following sequence –

KKLKQKLAELLENLLERFLDLVX

where X=azaproline


=> FILE DGENE

FILE 'DGENE' ENTERED AT 17:34:29 ON 29 JAN 2013 COPYRIGHT (C) 2013 THOMSON REUTERS

. . .

=> RUN BLAST KKLKQKLAELLENLLERFLDLVX/SQP -F F -M PAM30 -W 2 -E 10000

BLAST Version 2.2 . . .

Search example in DGENE


Note: Changes in BLAST parameters - 1. Turned low complexity filter off. 2. Changed matrix to PAM30. 3. Changed word size to 2. 4. Changed expectation value to 10,000.
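The four parameter changes in the note map one-to-one onto the flags in the command above. The sketch below assembles that command string from named settings, which can help keep parameters consistent across several searches. It emits only the flags that appear on this slide (-F F, -M, -W, -E); the function and its defaults are assumptions, and no other STN BLAST options are implied.

```python
def run_blast_command(query: str, field: str = "SQP", filter_off: bool = True,
                      matrix: str = "PAM30", word_size: int = 2,
                      expect: int = 10000) -> str:
    """Assemble a RUN BLAST command line using only the flags shown on this slide."""
    flags = []
    if filter_off:
        flags += ["-F", "F"]  # low complexity filter off
    flags += ["-M", matrix, "-W", str(word_size), "-E", str(expect)]
    return f"RUN BLAST {query}/{field} " + " ".join(flags)


print(run_blast_command("KKLKQKLAELLENLLERFLDLVX"))
# RUN BLAST KKLKQKLAELLENLLERFLDLVX/SQP -F F -M PAM30 -W 2 -E 10000
```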


10000 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10000.0
QUERY SELF SCORE VALUE IS 72
BEST ANSWER SCORE VALUE IS 72
[Histogram: Similarity Score (0 to 72) versus Answer Count (0 to 10000)]

Search example in DGENE (cont.)


ENTER EITHER THE NUMBER OF ANSWERS YOU WISH TO KEEP OR ENTER MINIMUM PERCENT OF SELF SCORE FOLLOWED BY % (BEST ANSWER PERCENTAGE OF SELF SCORE IS 100%)
ENTER (ALL) OR ? :50%
L1 RUN STATEMENT CREATED
L1 508 KKLKQKLAELLENLLERFLDLVX/SQP.-F F -M PAM30 -W 2 -E 10000

Answer set arranged by accession number; to sort by descending similarity score, enter at an arrow prompt (=>) "sor score d".
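Entering 50% at the prompt keeps answers whose score is at least half of the query self score; with a self score of 72, that cutoff is 36. The sketch below shows the arithmetic together with the descending order that "sor score d" produces. Treating the cutoff as inclusive (>=) is an assumption, and the scores in the example are made up.

```python
def keep_by_self_score(scores: list[int], self_score: int,
                       min_percent: float) -> list[int]:
    """Keep scores >= min_percent of the self score, highest first."""
    cutoff = self_score * min_percent / 100  # 72 * 50 / 100 = 36.0
    return sorted((s for s in scores if s >= cutoff), reverse=True)


print(keep_by_self_score([40, 72, 35, 36], self_score=72, min_percent=50))
# [72, 40, 36]
```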

=> S L1 AND ((AZA (1W) PRO?) OR AZAPRO?)/FEAT
996 AZA/FEAT
367154 PRO?/FEAT
115 AZA (1W) PRO?
1 AZAPRO?/FEAT
L2 85 L1 AND ((AZA (1W) PRO?) OR AZAPRO?)/FEAT

=> SOR L2 SCORE D
PROCESSING COMPLETED FOR L2
L3 85 SOR L2 SCORE D

Search example in DGENE (cont.)


Note: This query searches for azapro? as one word, and for aza pro? as two words or hyphenated (aza-pro?).
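When checking downloaded FEATURE TABLE text offline, a regular expression can approximate the same name variations. The pattern below is an illustrative approximation of AZAPRO? and AZA (1W) PRO? (one word, two words or hyphenated, allowing at most one word in between); it does not reproduce STN's exact truncation and proximity rules.

```python
import re

# Approximates AZAPRO? and AZA (1W) PRO?: "aza", optionally one more word,
# then a word starting with "pro"; or the single word "azapro...".
AZAPROLINE = re.compile(r"\baza(?:[\s-]+\w+)?[\s-]+pro\w*|\bazapro\w*", re.IGNORECASE)

notes = ['"aza-proline"', "Azaproline", "aza proline", "proline"]
print([bool(AZAPROLINE.search(note)) for note in notes])  # [True, True, True, False]
```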


=> D BIB SCORE SEQ FEAT

L3 ANSWER 1 OF 85 DGENE COPYRIGHT 2013 THOMSON REUTERS on STN
AN    AYH82102 peptide DGENE
TI    New peptides with specific amino acid residue useful to treat or prevent e.g. dyslipidemia, cardiovascular disease e.g. atherosclerosis and restenosis, endothelial dysfunction, macrovascular disorder and microvascular disorder.
IN    Dasseux J; Schwendeman A S; Zhu L
PA    (CERE-N) CERENIS THERAPEUTICS SA.
PI    WO 2010093918 A1 20100819 210
AI    WO 2010-US24096 20100212
PRAI  US 2009-152960P 20090216
. . .
PSL   Disclosure; SEQ ID NO 229
DT    Patent
LA    English
OS    2010-K33647 [56]
DESC  Cardiovascular disease treatment related peptide, SEQ ID 229.
SCORE 72
      100% of query self score 72
SEQ   1 kklkqklael lenllerfld lvx

FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+=================
Modified-site|23      |note     |"aza-proline"

Search example in DGENE (cont.)


The FEATURE TABLE identifies the amino acid at position 23 as aza-proline.


Search example in REGISTRY


1. Click the BLAST button in the main STN Express window.

2. This opens the Results Set Manager, which allows you to run a new sequence search.


Search example in REGISTRY (cont.)


1. Type in the sequence to be searched (or read it from a file).

2. Optional: Give the search a result name.

3. Click OK.


Search example in REGISTRY (cont.)


For this example, choose the BLASTp button.


Search example in REGISTRY (cont.)


Choose which references to display. For this example, only sequences that appear in at least one patent document are chosen.


Search example in REGISTRY (cont.)


Because this is a short sequence, check the Show Additional Options box.


Search example in REGISTRY (cont.)


Parameters changed:
1. Low complexity filter is turned off.
2. Word size is changed to 2.
3. Expectation value is changed to 1000.
4. The Weight Matrix is changed to PAM-30.
5. Max No. of Answers is changed to 1000.


Search example in REGISTRY (cont.)


Sequence search is added to Results Set Manager. The current status is Running.


Search example in REGISTRY (cont.)


When the status is Complete, highlight the name and click on View Results.


Search example in REGISTRY (cont.)


Decide which sequences are of interest and check the box next to each of them.


Search example in REGISTRY (cont.)


Click the Get STN Data button.


Search example in REGISTRY (cont.)


Choose the appropriate option. In this example, Sequence Records was chosen.


Search example in REGISTRY (cont.)


Search example in REGISTRY (cont.)


If you wish to save your transcript, name it here. Tip: For consistency, use the same name as the saved sequence.


Search example in REGISTRY (cont.)


Search example in REGISTRY (cont.)


Search example in REGISTRY (cont.)


Aza-proline is not a standardized uncommon amino acid, so it is annotated as ‘uncommon’ in the Notes (NTE) field.


=> S L17 AND UNCOMMON/NTE
753988 UNCOMMON/NTE
L18 133 L17 AND UNCOMMON/NTE

=> D SEQ NTE 1-3

L18 ANSWER 1 OF 133 REGISTRY COPYRIGHT 2013 ACS on STN

SEQ 1 KLKQKLAELL ENLLERFLDL VX
**RELATED SEQUENCES AVAILABLE WITH SEQLINK**
NTE
-----------------------------------------------------------------
type            location   description
-----------------------------------------------------------------
uncommon        Aaa-22     -    -
-----------------------------------------------------------------

Search example in REG (cont.)


Note: The amino acid designation is Aaa, which stands for ‘other specific amino acid’, as opposed to Xaa (X), which would designate ‘any amino acid’. This allows another level of search refinement.
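Because each NTE row gives both the residue code and its position, uncommon positions can be pulled out of a downloaded display and filtered by code, for example keeping Aaa (other specific amino acid) rows separate from any Xaa rows. The parser below is a hypothetical sketch that assumes rows shaped like "uncommon Aaa-22 - -" as shown on these slides.

```python
import re

# Rows such as "uncommon   Aaa-22   -   -" (type, residue-position, ...).
UNCOMMON_ROW = re.compile(r"^uncommon\s+([A-Z][a-z]{2})-(\d+)\b")


def uncommon_positions(nte_lines: list[str], code: str = "Aaa") -> list[int]:
    """Return 1-based positions of 'uncommon' NTE rows with the given residue code."""
    positions = []
    for line in nte_lines:
        match = UNCOMMON_ROW.match(line.strip())
        if match and match.group(1) == code:
            positions.append(int(match.group(2)))
    return positions


rows = ["uncommon Aaa-6 - -", "uncommon Aaa-22 - -",
        "terminal mod. Phe-12 - C-terminal amide"]
print(uncommon_positions(rows))  # [6, 22]
```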


L18 ANSWER 2 OF 133 REGISTRY COPYRIGHT 2013 ACS on STN

SEQ 1 KLKQKXAELG ENLLERFLDL VX
**RELATED SEQUENCES AVAILABLE WITH SEQLINK**
NTE
-----------------------------------------------------------------
type            location   description
-----------------------------------------------------------------
uncommon        Aaa-6      -    -
uncommon        Aaa-22     -    -
-----------------------------------------------------------------

L18 ANSWER 3 OF 133 REGISTRY COPYRIGHT 2013 ACS on STN

SEQ 1 KLKQKLAELL EQLLDKFLEL AX
**RELATED SEQUENCES AVAILABLE WITH SEQLINK**
NTE
-----------------------------------------------------------------
type            location   description
-----------------------------------------------------------------
uncommon        Aaa-22     -    -
-----------------------------------------------------------------

Search example in REG (cont.)


Variables used in sequence searching


1-letter designation   3-letter designation   Represents

B Asx Aspartic acid or Asparagine

J Xle Isoleucine or Leucine (only works in REG; added in 2006)

X Xxx Uncommon or unspecified

Z Glx Glutamic acid or Glutamine


Variables in BLAST sequence searching


• For DGENE, USGENE and PCTGEN
  – When B or Z is used in a BLAST query, the specific amino acids it represents (D or N for B; E or Q for Z) count as a positive (+) match, not an identity match

=> FILE PCTGEN

FILE 'PCTGEN' ENTERED AT 16:00:49 ON 14 FEB 2013 COPYRIGHT (C) 2013 WIPO

=> RUN BLAST MGBNFQ/SQP -F F -M PAM30 -W 2 -E 20000

BLAST Version 2.2.20 . . .

Note: Changes in BLAST parameters - 1. Turned low complexity filter off. 2. Changed matrix to PAM30. 3. Changed word size to 2. 4. Changed expectation value to 20,000.


Variables in BLAST sequence searching


446 ANSWERS FOUND BELOW EXPECTATION VALUE OF 20000.0
QUERY SELF SCORE VALUE IS 24
BEST ANSWER SCORE VALUE IS 24
[Histogram: Similarity Score (0 to 24) versus Answer Count (0 to 450)]
ENTER EITHER THE NUMBER OF ANSWERS YOU WISH TO KEEP OR ENTER MINIMUM PERCENT OF SELF SCORE FOLLOWED BY % (BEST ANSWER PERCENTAGE OF SELF SCORE IS 100%)
ENTER (ALL) OR ? :60%
L1 RUN STATEMENT CREATED
L1 446 MGBNFQ/SQP.-F F -M PAM30 -W 2 -E 20000


Variables in BLAST sequence searching


=> SOR SCORE D
PROCESSING COMPLETED FOR L1
L2 446 SOR L1 SCORE D

=> D SCORE ALIGN 1 5

L2 ANSWER 1 OF 446 PCTGEN COPYRIGHT 2013 WIPO on STN
SCORE 24
      100% of query self score 24
BLASTALIGN
Query = 6 letters
Length = 29
Score = 23.5 bits (48), Expect = 1e-05
Identities = 6/6 (100%), Positives = 6/6 (100%)
Query: 1  MGBNFQ 6
          MGBNFQ
Sbjct: 24 MGBNFQ 29

L2 ANSWER 5 OF 446 PCTGEN COPYRIGHT 2013 WIPO on STN
SCORE 24
      100% of query self score 24
BLASTALIGN
Query = 6 letters
Length = 498
Score = 23.5 bits (48), Expect = 2e-04
Identities = 5/6 (83%), Positives = 6/6 (100%)
Query: 1  MGBNFQ 6
          MG+NFQ
Sbjct: 40 MGDNFQ 45

In answer 5, the D in the subject is one of the amino acids that B represents, so it counts as a positive (+) match, not an identity match.
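The difference between the two counts can be illustrated with a deliberately simplified scorer that knows only identities and the B/Z ambiguity rule. Real BLAST scoring uses the chosen substitution matrix (PAM30 here), so other residue pairs can also count as positives; this sketch only reproduces the two alignments above.

```python
# Simplified model: identity if letters are equal; positive (but not identity)
# if the query has B or Z and the subject has one of the residues it represents.
AMBIGUITY = {"B": set("DN"), "Z": set("EQ")}


def identities_and_positives(query: str, subject: str) -> tuple[int, int]:
    identities = positives = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == s:
            identities += 1
            positives += 1
        elif s in AMBIGUITY.get(q, set()):
            positives += 1
    return identities, positives


print(identities_and_positives("MGBNFQ", "MGBNFQ"))  # (6, 6)
print(identities_and_positives("MGBNFQ", "MGDNFQ"))  # (5, 6)
```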


Variables in SCM sequence searching


• For DGENE, USGENE and PCTGEN
  – J is not an acceptable letter for sequence searching
  – B and Z capture only records whose sequences contain those designations, not the specific amino acids they represent
  – To include the specific amino acids in an SCM search query, use [ ]
    • Example: For a TLGIVZPI subsequence search, use RUN GETSEQ TLGIV[ZEQ]PI/SQSP


=> FILE DGENE

FILE 'DGENE' ENTERED AT 22:39:25 ON 18 FEB 2013 COPYRIGHT (C) 2013 THOMSON REUTERS . . .

=> RUN GETSEQ FAEBGK/SQSP
RUN GETSEQ AT 22:39:35 ON 18 FEB 2013 COPYRIGHT (C) 2013 FIZ KARLSRUHE GMBH
L1 RUN STATEMENT CREATED
L1 1 FAEBGK/SQSP

=> D SEQ
L1 ANSWER 1 OF 1 DGENE COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ 1 tyvpkefnae tftfhadict lsekerqikk qtalvelvkh kpkatkeqlk
   51 avmddfaafv ekcckaddke tcfaebgkkl vaasqaalgl
                              ======
HITS AT: 73-78

Variable SCM search example in DGENE


This search strategy will capture only those sequences that have a B in them, not sequences that have N or D.


=> RUN GETSEQ FAE[BND]GK/SQSP

RUN GETSEQ AT 22:46:15 ON 18 FEB 2013 COPYRIGHT (C) 2013 FIZ KARLSRUHE GMBH

L2 RUN STATEMENT CREATED L2 100 FAE[BND]GK/SQSP

=> S L2 NOT L1
L3 99 L2 NOT L1

=> D SEQ

L3 ANSWER 1 OF 99 DGENE COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ 1 dtavtnkqnf stdviyqvft drfldgnpsn nptgaafdgt csnlklycgg
   51 dwqglinkin dnyfsdlgvt alwisqpven ifatinysgv tntayhgywa
  101 rdfkktnpyf gtmtdfqnlv tsahakgiki iidfapnhtf pametdtsfa
                                                          ==
  151 engklydngs lvggytndtn gyfhhnggsd fstlengiyk nlydladlnh
      ====
  201 nnstidtyfk daiklwldmg vdgirvdavk hmpqgwqknw mssiyahkpv
  251 ftfgewflgs aasdadntdf anesgmslld frfnsavrnv frdntsnmya
  . . .
  651 kkngatitwe ggsnhtfttp tsgtatvtvn wq
HITS AT: 149-154

Variable SCM search example in DGENE


This search strategy captures sequences that have a B, N or D at that position.
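The contrast between FAEBGK and FAE[BND]GK can be reproduced offline by translating an SCM-style pattern into a regular expression, with letters matched literally, '.' as any residue and [ ] as a set of alternatives. This is a hypothetical sketch for checking downloaded sequences, not STN's matcher, and it handles only those three constructs. The test fragments are taken from the two DGENE displays above.

```python
import re


def scm_to_regex(pattern: str) -> re.Pattern:
    """Translate an SCM-style pattern to a case-insensitive regex.

    Letters match literally (so B matches only B), '.' matches any residue,
    and [..] lists alternative residues for one position.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            out.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end + 1
            continue
        out.append("." if ch == "." else re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)


strict = scm_to_regex("FAEBGK")
broad = scm_to_regex("FAE[BND]GK")
for fragment in ["tcfaebgkkl", "tsfaengkly"]:
    print(fragment, bool(strict.search(fragment)), bool(broad.search(fragment)))
# tcfaebgkkl True True
# tsfaengkly False True
```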


Variables in SCM sequence searching


• For REGISTRY
  – J, B and Z are searchable
  – J, B and Z capture records whose sequences contain those designations, as well as sequences containing the specific amino acids they represent
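REGISTRY's behavior can be modeled by expanding each variable letter into itself plus the residues it stands for before matching. The mapping and helper below are assumptions made for illustration (B: D or N, J: I or L, Z: E or Q, as in the variables table); only letters and '.' are handled. The first two test fragments come from the REGISTRY FAEBGK answers in this deck; the others are made up.

```python
import re

# Each variable letter matches itself or the specific residues it represents.
REGISTRY_VARIABLES = {"B": "BDN", "J": "JIL", "Z": "ZEQ"}


def registry_style_regex(query: str) -> re.Pattern:
    parts = []
    for ch in query.upper():
        if ch in REGISTRY_VARIABLES:
            parts.append("[" + REGISTRY_VARIABLES[ch] + "]")
        elif ch == ".":
            parts.append(".")  # SCM wildcard: any residue
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


pattern = registry_style_regex("FAEBGK")
for fragment in ["RMPVFFAEDGKLVR", "AYFAENGKIS", "TCFAEBGKKL", "FAEQGK"]:
    print(fragment, bool(pattern.search(fragment)))
# RMPVFFAEDGKLVR True
# AYFAENGKIS True
# TCFAEBGKKL True
# FAEQGK False
```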


=> FILE REG
FILE 'REGISTRY' ENTERED AT 22:55:32 ON 18 FEB 2013
USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT.
PLEASE SEE "HELP USAGETERMS" FOR DETAILS.
COPYRIGHT (C) 2013 American Chemical Society (ACS)
. . .
=> S FAEBGK/SQSP
L1 122 FAEBGK/SQSP

Variable SCM search example in REG


=> D SEQ 1-122
. . .
L1 ANSWER x OF 122 REGISTRY COPYRIGHT 2013 ACS on STN
SEQ 1 MIRRTLPILL MILLAGCNQE SGASKEPGEH REVIQGMHTQ FIKVTEGQNQ
   51 WYEMAISVDD SNTFRMPVFF AEDGKLVRVD DKQARKLFDR WLKERAKGIA
                          = =====
  101 AFSSVDEQVG FKGPFLALDV KR
HITS AT: 70-75
. . .
L1 ANSWER x OF 122 REGISTRY COPYRIGHT 2013 ACS on STN
SEQ 1 TPDAERTMLT HLGISITLQK SDVDLEKLKS SSISYIEGYL WDGQGTKEAS
   51 LLTMEESKKN GVKVAYTYSD PFCVNRSRED FIRLTKEYFD IVFCNTEEAK
  101 ALSQREDKLE ALKFISGLSA LVFMTDSANG AYFAENGKIS HVDG
                                         ======
HITS AT: 133-138
. . .

Variable SCM search example in REG


Modification of peptides/proteins


• Modifications are discussed in
  – FEATURE TABLE
    • DGENE, USGENE and PCTGEN
  – Notes field
    • REGISTRY

• Modification info may include
  – Stereochemistry
  – Modification(s) made at specific amino acid site(s)
    • CAS standardized blocking groups
    • Non-standardized modifications


Modification of amino acid residues


• In DGENE, USGENE and PCTGEN
  – Amino acid residue representation
    • Original amino acid or X
  – FEATURE TABLE (/FEAT)
    • Look for keywords and variations/spellings/abbreviations


Example of stereochemistry info


L1 ANSWER 2 OF 54958 DGENE COPYRIGHT 2013 THOMSON REUTERS on STN
AN    BAJ42897 peptide DGENE
TI    New peptide compound having specific amino acid sequence, useful for treating e.g. thrombosis, thrombophlebitis, unstable angina, myocardial infarction, stroke, sepsis, tumor metastasis, inflammatory arthritis.
IN    Huang T; Chang C; Chung C
PA    (UNTU) UNIV TAIWAN NAT.
PI    WO 2012172427 A2 20121220 46
AI    WO 2012-IB1345 20120614
PRAI  US 2011-496742P 20110614
PSL   Claim 3; SEQ ID NO 20
DT    Patent
LA    English
OS    2012-R37989 [03]
DESC  Platelet aggregation inhibition related-peptide (d-form Tro-6), SEQ:20.
SEQ3  1 Cys-Lys-Trp-Met-Asn-Val
FEATURE TABLE:
Key            |Location|Qualifier|
===============+========+=========+=======================
Misc-difference|2       |note     |"D-form residue"
Misc-difference|5       |note     |"D-form residue"

This is a BIB SEQ3 FEAT custom display. Notice the three-letter abbreviations for the amino acids with a SEQ3 display.
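To compare a SEQ3 display with one-letter queries, the hyphenated three-letter string can be mapped back to one-letter codes. The sketch below covers only the 20 common amino acids and maps anything else to X, mirroring how these databases represent non-ST.25 residues. Note that the conversion drops the D-form information, which remains only in the FEATURE TABLE.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def seq3_to_one_letter(seq3: str) -> str:
    """Convert 'Cys-Lys-...' to one-letter codes; unknown codes become X."""
    return "".join(THREE_TO_ONE.get(code, "X") for code in seq3.split("-"))


print(seq3_to_one_letter("Cys-Lys-Trp-Met-Asn-Val"))  # CKWMNV
```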


L1   ANSWER 1 OF 597267  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
AN   BAJ43913  peptide  DGENE
TI   Extracting a peptide from a reaction mixture resulting from a peptide coupling reaction comprises adding organic solvents and water to the reaction mixture.
IN   Monnaie D; Forni L; Giraud M
PA   (LONZ) LONZA LTD.
     (LONZ) LONZA BRAINE SA.
PI   WO 2012171984 A1 20121220 81
AI   WO 2012-EP61257 20120614
PRAI EP 2011-170094 20110616
     US 2011-497642P 20110616
     US 2011-498100P 20110617
PSL  Example; SEQ ID NO 17
DT   Patent
LA   English
OS   2012-R38071 [02]
DESC Solid phase peptide synthesis related peptide, SEQ ID 17.
SEQ     1 lwvns

FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+============================
Modified-site|2       |note     |"Modified with
             |        |         |tert-butoxycarbonyl (Boc)"
Modified-site|4       |note     |"Modified with trityl (Trt)"
Modified-site|5       |note     |"C-terminal amide; Modified
             |        |         |with tBu"

Example of modification of amino acid residues


These are value-added modification annotations provided by the Thomson Reuters (TR) indexer, not by the patent applicant.
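When many records like this are downloaded, the FEATURE TABLE notes can be scanned offline for blocking-group keywords. A minimal Python sketch, assuming the display has been saved as plain text in the pipe-delimited layout shown above (wrapped values continue on rows with empty Key and Location cells, and `*tag=` rows are simply kept as qualifiers with an empty value); `parse_feature_table` is a hypothetical helper, not an STN feature:

```python
from typing import Dict, List, Optional


def parse_feature_table(text: str) -> List[Dict]:
    """Parse a pipe-delimited FEATURE TABLE display into feature dicts (sketch)."""
    features: List[Dict] = []
    last_qualifier: Optional[str] = None
    for line in text.splitlines():
        # Skip the caption, the ===+=== ruler and the Key|Location|Qualifier header
        if "|" not in line or line.lstrip().startswith("Key"):
            continue
        cells = [cell.strip() for cell in line.split("|", 3)]
        key, location, qualifier, value = (cells + ["", "", "", ""])[:4]
        if key:
            # New feature row: Key|Location|Qualifier|Value
            features.append({"key": key, "location": location,
                             "qualifiers": {qualifier: value}})
            last_qualifier = qualifier
        elif qualifier and features:
            # Additional qualifier for the current feature
            features[-1]["qualifiers"][qualifier] = value
            last_qualifier = qualifier
        elif features and last_qualifier is not None:
            # Wrapped value: append to the previous qualifier, joined with a space
            quals = features[-1]["qualifiers"]
            quals[last_qualifier] = (quals[last_qualifier] + " " + value).strip()
    return features


TABLE = '''FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+============================
Modified-site|2       |note     |"Modified with
             |        |         |tert-butoxycarbonyl (Boc)"
Modified-site|4       |note     |"Modified with trityl (Trt)"
Modified-site|5       |note     |"C-terminal amide; Modified
             |        |         |with tBu"'''

features = parse_feature_table(TABLE)
boc_sites = [f["location"] for f in features if "Boc" in f["qualifiers"].get("note", "")]
print(boc_sites)  # ['2']
```

The simple space join is enough for keyword checks, but it will insert a space into words that the display hyphenated across rows.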

Page 60

Modification of amino acid residues

• In REGISTRY
  – Amino acid residue representation
    • Original amino acid or X
  – Notes field (/NTE)
    • Look for keywords, standardized abbreviations and variations/spellings/abbreviations

Page 61

Modification of amino acid residues

Page 62

L1   ANSWER 1 OF 664  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 VEQKYGQFPQ G
NTE  modified (modifications unspecified)
     ========
----------------------------------------------------------------------
type          location   description
----------------------------------------------------------------------
modification  Val-1    - (9h-fluoren-9-ylmethoxy) carbonyl
modification  Glu-2    - 1,1-dimethylethyl<t-Bu>
modification  Gln-3    - triphenylmethyl<Trit>
modification  Lys-4    - (1,1-dimethylethoxy) carbonyl<Boc>
modification  Tyr-5    - 1,1-dimethylethyl<t-Bu>
modification  Gln-7    - triphenylmethyl<Trit>
modification  Gln-10   - triphenylmethyl<Trit>
----------------------------------------------------------------------
REFERENCE 1
AN   157:693426  CA
TI   Peptide-lipid conjugates that bind lipopolysaccharide and their therapeutic use
IN   Tice, Thomas; Woeher, Torsten
PA   Evonik Degussa Corporation, USA
SO   PCT Int. Appl., 50pp.
     CODEN: PIXXD2

Modification of amino acid residues


The Notes field can list many different modifications.
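In the display above, several descriptions end with an abbreviation in angle brackets (<t-Bu>, <Trit>, <Boc>). Assuming those bracketed forms are the standardized abbreviations worth trying as /NTE search terms, a short Python sketch can collect them from a copied NTE table; the helper name `nte_abbreviations` is hypothetical:

```python
import re
from typing import List

# Description column copied from the NTE display above
NTE_DESCRIPTIONS = """\
(9h-fluoren-9-ylmethoxy) carbonyl
1,1-dimethylethyl<t-Bu>
triphenylmethyl<Trit>
(1,1-dimethylethoxy) carbonyl<Boc>
1,1-dimethylethyl<t-Bu>
triphenylmethyl<Trit>
triphenylmethyl<Trit>
"""


def nte_abbreviations(nte_text: str) -> List[str]:
    """Return the distinct <...> abbreviations found in NTE description text."""
    return sorted(set(re.findall(r"<([^<>]+)>", nte_text)))


print(nte_abbreviations(NTE_DESCRIPTIONS))  # ['Boc', 'Trit', 't-Bu']
```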

Page 63

DT   Patent
LA   English
FAN.CNT 1
     PATENT NO.       KIND  DATE      APPLICATION NO.   DATE
     ---------------  ----  --------  ---------------   --------
PI   WO 2012148891    A1    20121101  WO 2012-US34757   20120424
         W: AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW
         RW: AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR, BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG, BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ, UG, ZM, ZW, AM, AZ, BY, KG, KZ, MD, RU, TJ, TM
     US 20120294924   A1    20121122  US 2012-454211    20120424
PRAI US 2011-480596P        20110429
RE.CNT 4   THERE ARE 4 CITED REFERENCES AVAILABLE FOR THIS RECORD
           ALL CITATIONS AVAILABLE IN THE RE FORMAT

Modification of amino acid residues (cont.)


Page 64

Agenda

• Nucleic acid sequence searching
  – Definition of common nucleotides
  – Uncommon nucleotides
    • FEATURE TABLE (FEAT) – In DGENE, USGENE and PCTGEN
    • NOTES field (NTE) – In REGISTRY
  – Variables for nucleotides
    • R, Y, M, K, S, W, B, D, H, V, N
  – Modification of nucleic acids


Page 65

5/6 Common nucleotides

1-letter designation   Represents
A                      Adenine
G                      Guanine
C                      Cytosine
T                      Thymine
U                      Uracil
I                      Inosine (REGISTRY only)

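Note: Inosine (I) is searchable as a sequence symbol only in REGISTRY. Only the nucleotide symbols defined in WIPO ST.25 are allowed in formal patent sequence listings, and inosine is not one of those base symbols, so it does not appear in the sequences in DGENE, USGENE and PCTGEN. There, such a position is typically shown as n (or as the unmodified base), and the modification is described in the FEATURE TABLE.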

Page 66

Agenda

• Nucleic acid sequence searching
  – Definition of common nucleotides
  – Uncommon nucleotides
    • FEATURE TABLE (FEAT) – In DGENE, USGENE and PCTGEN
    • NOTES field (NTE) – In REGISTRY
  – Variables for nucleotides
    • R, Y, M, K, S, W, B, D, H, V, N
  – Modification of nucleic acids


Page 67

Sequences with uncommon nucleotides

• Uncommon nucleotides are defined as
  – Anything other than the 5/6 common nucleotides

• In DGENE, USGENE and PCTGEN, BLAST and SCM searches do not recognize anything other than the 5 common nucleotide designations and the 11 variable nucleotide designations


Page 68

Sequences with uncommon nucleotides

• For nucleic acid searches with uncommon nucleotides, use wildcard symbols in the search query
  – ‘N’ in BLAST search queries
  – ‘.’ in SCM search queries

• Search the FEATURE TABLE
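Preparing such a query amounts to replacing every symbol outside the recognized set with the appropriate wildcard. A minimal Python sketch, assuming the sequence is typed with one-letter codes in either case and that any letter outside the 16 recognized designations (such as i for inosine) should become a wildcard; `wildcard_query` is a hypothetical helper:

```python
# The 5 common nucleotides plus the 11 variable designations that BLAST and SCM
# recognize in DGENE, USGENE and PCTGEN
RECOGNIZED = set("acgtu") | set("rymkswbdhvn")
WILDCARD = {"blast": "N", "scm": "."}


def wildcard_query(seq: str, mode: str) -> str:
    """Replace every unrecognized symbol with the BLAST ('N') or SCM ('.') wildcard."""
    wildcard = WILDCARD[mode]
    return "".join(ch.upper() if ch in RECOGNIZED else wildcard for ch in seq.lower())


print(wildcard_query("uucig", "blast"))  # UUCNG
print(wildcard_query("uucig", "scm"))    # UUC.G
```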


Page 69

=> FILE DGENE
FILE 'DGENE' ENTERED AT 23:45:59 ON 18 FEB 2013
COPYRIGHT (C) 2013 THOMSON REUTERS
. . .
=> RUN GETSEQ UUCNG/SQSN
RUN GETSEQ AT 23:54:36 ON 18 FEB 2013
COPYRIGHT (C) 2013 FIZ KARLSRUHE GMBH
L1 RUN STATEMENT CREATED
L1         18 UUCNG/SQSN
=> S L1 AND INOSINE/FEAT
         2153 INOSINE/FEAT
L2          2 L1 AND INOSINE/FEAT

Inosine example in DGENE


Page 70

=> D SEQ FEAT
L2   ANSWER 1 OF 2  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 uucuagccuu cnggagucag ggc
                  == ===
HITS AT: 9-13
FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+=======================
modified_base|12      |*tag= a
             |        |mod_base |i
             |        |note     |"Optionally inosine"

Inosine example in DGENE (cont.)


Page 71

=> FILE REG
FILE 'REGISTRY' ENTERED AT 00:03:52 ON 19 FEB 2013
USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT.
PLEASE SEE "HELP USAGETERMS" FOR DETAILS.
COPYRIGHT (C) 2013 American Chemical Society (ACS)
. . .
=> S UUCIG/SQSN
L3         32 UUCIG/SQSN
=> D SEQ 1-5
L3   ANSWER x OF 32  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 uucuagccuu ciggagucag ggc
                  == ===
HITS AT: 9-13

Inosine example in REG


Page 72

CAS shortcut descriptors for “modified base”

Page 73

=> FILE REG FILE 'REGISTRY' ENTERED AT 15:13:46 ON 19 FEB 2013 USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT. PLEASE SEE "HELP USAGETERMS" FOR DETAILS. COPYRIGHT (C) 2013 American Chemical Society (ACS)

. . .

=> S AC4C/NTE
L1         76 AC4C/NTE
=> D SEQ NTE
L1   ANSWER 9 OF 76  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 ggagagaugg ccgagcgguc uaaggcgcug guuuiaggca ccagucccuu
       51 cgggggcgug gguucgaauc ccacucucuu cacca
NTE  modified
----------------------------------------------------------------------
type            location   description
----------------------------------------------------------------------
modified base   c-12       ac4c
modified base   u-19       hu
modified base   u-63       m5u
----------------------------------------------------------------------

Ac4c in REG
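Each NTE row gives the unmodified base, its position and the CAS shortcut descriptor, so the rows can be checked against the sequence itself. A minimal Python sketch, assuming the rows have been copied as plain text in the `modified base c-12 ac4c` form shown above (an optional strand suffix such as `[2]` is skipped, so this only suits single-stranded records); `check_modified_bases` is a hypothetical helper:

```python
import re
from typing import List, Tuple

# Sequence and NTE rows copied from the REGISTRY display above (spaces removed)
SEQ = ("ggagagaugg ccgagcgguc uaaggcgcug guuuiaggca ccagucccuu"
       "cgggggcgug gguucgaauc ccacucucuu cacca").replace(" ", "")
NTE = """\
modified base c-12 ac4c
modified base u-19 hu
modified base u-63 m5u
"""

# base-position, optional [strand] suffix (ignored), then the CAS shortcut descriptor
ROW = re.compile(r"modified base\s+([a-z])-(\d+)\S*\s+(\S+)")


def check_modified_bases(nte: str, seq: str) -> List[Tuple[str, int, bool]]:
    """For each NTE row, report (descriptor, position, whether the base matches seq)."""
    results = []
    for base, pos, descriptor in ROW.findall(nte):
        position = int(pos)
        results.append((descriptor, position, seq[position - 1] == base))
    return results


print(check_modified_bases(NTE, SEQ))
# [('ac4c', 12, True), ('hu', 19, True), ('m5u', 63, True)]
```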


Page 74

=> FILE DGENE

FILE 'DGENE' ENTERED AT 14:58:58 ON 19 FEB 2013 COPYRIGHT (C) 2013 THOMSON REUTERS . . .

=> S (ACETYLCYTIDINE OR AC4C)/FEAT
            4 ACETYLCYTIDINE/FEAT
           67 AC4C/FEAT
L1         68 (ACETYLCYTIDINE OR AC4C)/FEAT

=> D SEQ FEAT 1 x

L1   ANSWER 1 OF 68  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 xvxeiqlxhq xarwiqxkx
FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+==============================
Modified-site|1       |note     |"1-aminocyclopentane-1-carboxy
             |        |         |lic acid (Ac5c)"
Modified-site|3       |label    |Aib
Modified-site|8       |label    |Nle
Modified-site|11      |note     |"Homoarginine (hR)"
Modified-site|17      |label    |Aib
Modified-site|18      |note     |"Modified with
             |        |         |N-epsilon-1'-alkyl beta-D-
             |        |         |glucuronyl"
Modified-site|19      |note     |"1-aminocyclo
             |        |         |butane-1-carboxylic acid
             |        |         |(Ac4c). C- terminal amide"

Ac4c in DGENE


This is a peptide sequence, so here Ac4c means 1-aminocyclobutane-1-carboxylic acid rather than the 4-acetylcytidine we were looking for.

Page 75

L1   ANSWER x OF 68  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 tcccaatccc aatcccaatc ccaatcccaa tcccaatccc aatcccaatc
       51 ccaa
FEATURE TABLE:
Key          |Location|Qualifier   |
=============+========+============+==============================
misc_binding |1..10   |*tag= b
             |        |bound_moiety|"HT54 nanocircle splint DNA 2"
             |        |note        |"Forms double-stranded region
             |        |            |with bases 10-1 of splint
             |        |            |DNA"
misc_feature |1       |*tag= a
             |        |label       |Ligation_site
             |        |note        |"HT54 precursor is
             |        |            |circularised by ligation of T1
             |        |            | to A54"
modified_base|2..4    |*tag= c
             |        |mod_base    |ac4c
             |        |note        |"Optionally 4-acetylcytidine"
. . .

Ac4c in DGENE (cont.)


This is a nucleotide sequence, so here ac4c has the meaning we were looking for: 4-acetylcytidine.

Page 76

Agenda

• Nucleic acid sequence searching
  – Definition of common nucleotides
  – Uncommon nucleotides
    • FEATURE TABLE (FEAT) – In DGENE, USGENE and PCTGEN
    • NOTES field (NTE) – In REGISTRY
  – Variables for nucleotides
    • R, Y, M, K, S, W, B, D, H, V, N
  – Modification of nucleic acids


Page 77

Variables used in sequence searching

1-letter designation   Represents
A                      Adenine
G                      Guanine
C                      Cytosine
T                      Thymine
U                      Uracil

1-letter designation   Represents
R                      Guanine or adenine
Y                      Thymine/uracil or cytosine
M                      Adenine or cytosine
K                      Guanine or thymine/uracil
S                      Guanine or cytosine
W                      Adenine or thymine/uracil
B                      Guanine, cytosine or thymine/uracil
D                      Adenine, guanine or thymine/uracil
H                      Adenine, cytosine or thymine/uracil
V                      Adenine, cytosine or guanine
N                      Adenine, guanine, cytosine, thymine/uracil, unknown or other

Page 78

Variables in BLAST sequence searching

• For DGENE, USGENE and PCTGEN
  – When a nucleotide variable is used in a BLAST search query, the exact nucleotide it stands for in the subject sequence is not counted as an identity match

L1   ANSWER 1 OF 10000  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SCORE 31   100% of query self score 31
BLASTALIGN
Query = 17 letters
Length = 62371
Score = 31.4 bits (15), Expect = 2e-04
Identities = 16/17 (94%)
Strand = Plus / Plus
Query: 1     taarttcttctgcagtt 17
             ||| |||||||||||||
Sbjct: 41353 taaattcttctgcagtt 41369
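The identity count above can be reproduced by a literal, position-by-position comparison: the r at query position 4 is not counted as identical to the a in the subject, which gives 16/17. A minimal Python sketch, assuming an ungapped alignment of equal-length strings (real BLAST scoring also handles gaps, substitution scores and E-values, which this ignores); `identity_line` is a hypothetical helper:

```python
from typing import Tuple


def identity_line(query: str, subject: str) -> Tuple[int, str]:
    """Compare an ungapped query/subject pair position by position.

    Returns the number of identical positions and a BLAST-style midline
    ('|' for an identity, ' ' otherwise). A variable such as r only counts
    as identical to the same letter, never to a nucleotide it represents.
    """
    midline = "".join("|" if q == s else " "
                      for q, s in zip(query.lower(), subject.lower()))
    return midline.count("|"), midline


query, subject = "taarttcttctgcagtt", "taaattcttctgcagtt"
matches, midline = identity_line(query, subject)
print(midline)  # ||| |||||||||||||
print(f"Identities = {matches}/{len(query)} ({100 * matches // len(query)}%)")
# Identities = 16/17 (94%)
```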

Page 79

Variables in SCM sequence searching

• For DGENE, USGENE and PCTGEN
  – In SCM searches, R, Y, M, K, S, W, B, D, H, V and N capture only records whose sequence contains those same designations, not the specific nucleotides they represent
  – To include the specific nucleotides in an SCM search query, use [ ]
    • Example: for the subsequence TCASCC, search
      RUN GETSEQ TCA[SGC]CC/SQSN
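Building the bracketed query by hand is error-prone for longer subsequences. A minimal Python sketch, assuming DNA sequences (T rather than U) and that each bracket should keep the variable itself plus the nucleotides it represents, as in the TCA[SGC]CC example; mapping N to the SCM wildcard '.' is also an assumption, and `scm_bracket_query` is a hypothetical helper:

```python
# Nucleotides each variable represents (DNA letters only), per the variables table
VARIABLES = {
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC",
    "W": "AT", "B": "GCT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def scm_bracket_query(subsequence: str) -> str:
    """Build a RUN GETSEQ command whose brackets keep each variable and its nucleotides."""
    parts = []
    for ch in subsequence.upper():
        if ch in VARIABLES:
            parts.append("[" + ch + VARIABLES[ch] + "]")
        elif ch == "N":
            parts.append(".")  # assumption: use the SCM wildcard for N
        else:
            parts.append(ch)
    return "RUN GETSEQ " + "".join(parts) + "/SQSN"


print(scm_bracket_query("TCASCC"))    # RUN GETSEQ TCA[SGC]CC/SQSN
print(scm_bracket_query("tcascCTA"))  # RUN GETSEQ TCA[SGC]CCTA/SQSN
```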

Page 80

=> FILE DGENE

FILE 'DGENE' ENTERED AT 16:10:33 ON 19 FEB 2013 COPYRIGHT (C) 2013 THOMSON REUTERS

FILE LAST UPDATED: 15 FEB 2013 <20130215/UP>
MOST RECENT PUBLICATION DATE: 17 JAN 2013 <20130117/PD>
. . .

=> RUN GETSEQ TCASCCTA/SQSN

RUN GETSEQ AT 16:10:46 ON 19 FEB 2013 COPYRIGHT (C) 2013 FIZ KARLSRUHE GMBH

L1 RUN STATEMENT CREATED
L1          7 TCASCCTA/SQSN
=> D SEQ

L1   ANSWER 1 OF 7  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 ccaggcagag tgacagttct gtgagttttc tactgtgcaa agcagagctg
       51 gtttttcatt ttttatagcg tcascctatt caaagtgaat ataagctttc
                                ========
      101 acatgtgttg tctgactcta tcctcaaatc agctccatga ggtaagaaat
. . .
HITS AT: 71-78

Variable SCM search example in DGENE


This search captures only sequences that contain S itself at that position.

Page 81

=> FILE DGENE

FILE 'DGENE' ENTERED AT 11:23:08 ON 06 FEB 2013 COPYRIGHT (C) 2013 THOMSON REUTERS . . .

=> RUN GETSEQ TCA[SGC]CCTA/SQSN
. . .
Number of answers 214621 will create 9 Answer Sets
L1 RUN STATEMENT CREATED
L1      25000 TCA[SGC]CCTA/SQSN
L2 RUN STATEMENT CREATED
L2      25000 TCA[SGC]CCTA/SQSN
. . .
L9 RUN STATEMENT CREATED
L9      14621 TCA[SGC]CCTA/SQSN

=> S L1-L9 AND SQL<200
     21860725 SQL<200
L10      7575 (L1 OR L2 OR L3 OR L4 OR L5 OR L6 . . . OR L9) AND SQL<200

Variable SCM search example in DGENE


This search captures sequences with S, G or C at that position.
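The answer sets in this session fit a simple split: 214,621 answers divided into sets of at most 25,000 gives eight full sets (L1 to L8) and a ninth set of 14,621 (L9). The 25,000 cap is inferred from this output rather than documented here; a quick Python check with a hypothetical helper:

```python
from typing import List


def answer_set_sizes(total: int, cap: int = 25000) -> List[int]:
    """Split a total answer count into answer sets of at most `cap` answers each."""
    if total <= 0:
        return []
    full, remainder = divmod(total, cap)
    return [cap] * full + ([remainder] if remainder else [])


sizes = answer_set_sizes(214621)
print(len(sizes), sizes[-1])  # 9 14621
```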

Page 82

=> D SEQ . . .

L10  ANSWER x OF 7575  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 aggaggcctc agcctatat
                  == ======
HITS AT: 9-17

L10  ANSWER x OF 7575  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 gaaagcagtc accctatccg ctgatcagcc tcatg
                  == ======
HITS AT: 9-17

L10  ANSWER x OF 7575  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 ttttcagatc tccattacta ggccaggata gcccgagggg gaagaggagc
       51 aagtttttca scctacggga gctccgggtc tgcctaattt ttccgcccct
                 === =====
      101 cccagccgaa aaacccatca g
HITS AT: 58-65

Variable SCM search example in DGENE


Page 83

Variables in SCM sequence searching

• For REGISTRY
  – R, Y, M, K, S, W, B, D, H, V and N are searchable
    • They capture records whose sequence has those designations, as well as records with the specific nucleotides they represent
    • Be careful with N, as it works like ‘.’

Page 84

=> FILE REG

FILE 'REGISTRY' ENTERED AT 17:54:42 ON 06 FEB 2013 USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT. PLEASE SEE "HELP USAGETERMS" FOR DETAILS. COPYRIGHT (C) 2013 American Chemical Society (ACS) . . .

=> S TCASCCTA/SQSN
L1     734794 TCASCCTA/SQSN
=> S L1 AND SQL<200
     17249391 SQL<200
L2      11074 L1 AND SQL<200

Variable SCM search example in REG


Page 85

=> D SEQ . . .

L2   ANSWER x OF 11074  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 actctttaga tctggcattc aaactgtctg tgttttgacc atcaccctag
                                                       ========
       51 atcactgcct sttaccattt taggagtata gtttgaaatt ctgactgatt
      101 ttaattggct ctgttcaact c
HITS AT: 42-49

L2   ANSWER x OF 11074  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 tgcttaattg attatatctt ccttgtcatt ttgttccttc tttctgttta
       51 attagcaaaa yggtgtctta taattctgga acagcaaaca aaatttttca
      101 agtcagccta cttctaacac t
            ========
HITS AT: 103-110

L2   ANSWER x OF 11074  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 ttttcagatc tccattacta ggccaggata gcccgagggg gaagaggagc
       51 aagtttttca scctacggga gctccgggtc tgcctaattt ttccgcccct
                 === =====
      101 cccagccgaa aaacccatca g
HITS AT: 58-65

Variable SCM search example in REG
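The difference between the two files can be made concrete with a toy matcher. This Python sketch is not how STN implements SCM; it simply assumes the behavior described on these slides: in DGENE a variable in the query matches only the same variable in the sequence, while in REGISTRY it also matches the nucleotides it represents, and N works like '.'. The sequences are residues 1-50 of the first and 1-70 of the third REGISTRY answer above, and the helper names are hypothetical:

```python
from typing import List

# Nucleotides each variable represents (t and u both listed), per the variables table
REPRESENTS = {
    "r": "ga", "y": "tuc", "m": "ac", "k": "gtu", "s": "gc",
    "w": "atu", "b": "gctu", "d": "agtu", "h": "actu", "v": "acg",
}


def position_matches(q: str, s: str, registry_style: bool) -> bool:
    """Decide whether one query symbol matches one sequence symbol."""
    if q == "." or (registry_style and q == "n"):
        return True  # '.' is the SCM wildcard; in REGISTRY, N works like '.'
    if q == s:
        return True  # literal match, the only kind a variable gets in DGENE
    return registry_style and s in REPRESENTS.get(q, "")


def hit_starts(query: str, seq: str, registry_style: bool) -> List[int]:
    """Return the 1-based start positions where the query matches the sequence."""
    q, s = query.lower(), seq.lower()
    return [i + 1 for i in range(len(s) - len(q) + 1)
            if all(position_matches(a, b, registry_style)
                   for a, b in zip(q, s[i:i + len(q)]))]


first = "".join(["actctttaga", "tctggcattc", "aaactgtctg", "tgttttgacc", "atcaccctag"])
third = "".join(["ttttcagatc", "tccattacta", "ggccaggata", "gcccgagggg",
                 "gaagaggagc", "aagtttttca", "scctacggga"])

print(hit_starts("TCASCCTA", first, registry_style=True))   # [42]
print(hit_starts("TCASCCTA", first, registry_style=False))  # []
print(hit_starts("TCASCCTA", third, registry_style=False))  # [58]
```

The first sequence only matches REGISTRY-style (c at position 45 is covered by S), while the third contains a literal s and would match in either file.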


Page 86

Agenda

• Nucleic acid sequence searching
  – Definition of common nucleotides
  – Uncommon nucleotides
    • FEATURE TABLE (FEAT) – In DGENE, USGENE and PCTGEN
    • NOTES field (NTE) – In REGISTRY
  – Variables for nucleotides
    • R, Y, M, K, S, W, B, D, H, V, N
  – Modification of nucleic acids


Page 87

Nucleic acid sequences with modifications

• To search for nucleic acid sequences with modifications, use the normally occurring nucleotide symbol or wildcard symbols in search queries
  – In DGENE, USGENE and PCTGEN
    • Use ‘N’ as the wildcard in BLAST search queries
      – Search name variations in the FEATURE TABLE
    • Use ‘.’ as the wildcard in SCM search queries
      – Search name variations in the FEATURE TABLE


Page 88

Nucleic acid sequences with modifications

• To search for nucleic acid sequences with modifications, use the normally occurring nucleotide symbol or wildcard symbols in search queries
  – In REGISTRY
    • Use ‘N’ as the wildcard in BLAST search queries
      – Search for standard abbreviations in the NTE field
      – Search keywords (consider variations)
    • Use ‘.’ as the wildcard in SCM search queries
      – Search for standard abbreviations in the NTE field
      – Search keywords (consider variations)
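Collecting name variations into a single OR'd search is easy to script when there are many synonyms. A minimal Python sketch that only assembles command text in the same form as the (ACETYLCYTIDINE OR AC4C)/FEAT search shown earlier; choosing the variations remains up to the searcher, and `keyword_query` is a hypothetical helper:

```python
from typing import List


def keyword_query(terms: List[str], field: str) -> str:
    """OR together name variations for one search field, in STN-style syntax."""
    joined = " OR ".join(term.upper() for term in terms)
    if len(terms) > 1:
        joined = "(" + joined + ")"
    return joined + "/" + field.upper()


print(keyword_query(["acetylcytidine", "ac4c"], "feat"))  # (ACETYLCYTIDINE OR AC4C)/FEAT
print(keyword_query(["ac4c"], "nte"))                     # AC4C/NTE
```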


Page 89

CAS list of modifications

Page 90

=> S DEAZA?/FEAT
L1       1060 DEAZA?/FEAT
=> D SEQ FEAT 3
L1   ANSWER 3 OF 1060  DGENE  COPYRIGHT 2013 THOMSON REUTERS on STN
SEQ     1 ctatctgucg ttctctgu
FEATURE TABLE:
Key          |Location|Qualifier|
=============+========+=========+=========================
modified_base|1..18   |*tag= a
             |        |mod_base |OTHER
             |        |note     |"OTHER = Phosphorothioate
             |        |         |backbone"
modified_base|7..8    |*tag= b
             |        |mod_base |OTHER
             |        |note     |"OTHER = Modified with
             |        |         |2'-O-Me"
modified_base|10      |*tag= c
             |        |mod_base |OTHER
             |        |note     |"OTHER= 7-deaza-dG"

Deaza in DGENE


Page 91

=> FILE REG

FILE 'REGISTRY' ENTERED AT 15:53:51 ON 19 FEB 2013 USE IS SUBJECT TO THE TERMS OF YOUR STN CUSTOMER AGREEMENT. PLEASE SEE "HELP USAGETERMS" FOR DETAILS. COPYRIGHT (C) 2013 American Chemical Society (ACS) . . .

=> S DEAZA/NTE L1 180 DEAZA/NTE

=> D SEQ NTE 1
L1   ANSWER 1 OF 180  REGISTRY  COPYRIGHT 2013 ACS on STN
SEQ     1 tcagtattag cagtccgcg
SEQ     1 gcggactgct aat
**RELATED SEQUENCES AVAILABLE WITH SEQLINK**
NTE  multistranded (2) modified
----------------------------------------------------------------------
type            location   description
----------------------------------------------------------------------
modified base   a-11[2]    3-deaza
----------------------------------------------------------------------

Deaza in REG


Page 92

Summary

• Peptide/protein sequence searching
  – Uncommon amino acids
    • FEATURE TABLE (FEAT) or NOTES field (NTE)
  – Variables for amino acids
  – Modifications

• Nucleic acid sequence searching
  – Uncommon nucleotides
    • FEATURE TABLE (FEAT) or NOTES field (NTE)
  – Variables for nucleotides
  – Modifications


Page 93

Resources

• Biosequence Searching on STN web site – http://www.stn-international.com/biosequence_searching.html

• DGENE workshop manual – http://www.stn-international.com/dgene_wm.html

• USGENE workshop manual – http://www.stn-international.com/usgene_wm.html

• STN quick reference cards – http://www.cas.org/training/stn/commands-qrc

• CAS coverage of sequences – http://www.cas.org/content/chemical-substances/sequences


Page 94

Acknowledgements

• Rob Austin
• Alice Humel Denton
• Lora Burgess


Page 95

FIZ Karlsruhe
Support and Training: www.stn-international.de

CAS
Support and Training: www.cas.org

For more information …