![Page 1: New generation of patent sequence databases · Patent families GM671154 ADA42650 CS017585 ACQ13114 DI603183 AAR79155 DD649656 100% identical sequences Invention A Invention B HB492658](https://reader034.vdocuments.net/reader034/viewer/2022042911/5f42c8fd977beb762318555c/html5/thumbnails/1.jpg)
EBI is an Outstation of the European Molecular Biology Laboratory.
New generation of patent sequence databases
Information Sources in Biotechnology
Japan
Patent-related resources
Patent
Resources
Patents
http://www.ebi.ac.uk/
http://www.ebi.ac.uk/patentdata/
Patent resources at EBI
Patent resources at EBI
Same sequences (EPO, USPTO, JPO, KIPO)
Non-redundant sequence data
Patent family classification
Enriched with patent information
Patent proteins: EPO, USPTO, JPO, KIPO
Patent nucleotides: ENA (EPO, USPTO, JPO, KIPO)
Sequence data from patent literature
[Diagram: patent offices (USPTO, EPO, KIPO, JPO, other patent offices); INSDC databases (NCBI GenBank, NIG DDBJ, EBI EMBL-Bank); NR patent sequence databases]
INSDC agreement:
• Free unrestricted access
• All data exchanged daily
Non-redundant patent databases
http://www.ebi.ac.uk/patentdata/
Sources: patent proteins and patent nucleotides
Level-1: groups together 100% identical patent sequences
• NRNL1 (non-redundant nucleotide level-1)
• NRPL1 (non-redundant protein level-1)
Level-2: groups together identical sequences by patent family
• NRNL2 (non-redundant nucleotide level-2)
• NRPL2 (non-redundant protein level-2)
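The two grouping levels can be illustrated with a minimal Python sketch. It is not the EBI pipeline: the accessions, family identifiers and sequences are invented, and the function names `group_level1` and `group_level2` are hypothetical. Level-1 keys entries on the sequence alone; level-2 keys them on the sequence plus the patent family.

```python
from collections import defaultdict

# Hypothetical records: (accession, patent family id, sequence).
# All values are made up for illustration.
records = [
    ("X0001", "FAM_A", "MKTAYIAK"),
    ("X0002", "FAM_A", "MKTAYIAK"),
    ("X0003", "FAM_B", "MKTAYIAK"),
    ("X0004", "FAM_B", "MKVLHHHH"),
]


def group_level1(recs):
    """One entry per distinct sequence (100% identity)."""
    groups = defaultdict(list)
    for accession, _family, sequence in recs:
        groups[sequence.upper()].append(accession)
    return dict(groups)


def group_level2(recs):
    """One entry per (sequence, patent family) pair."""
    groups = defaultdict(list)
    for accession, family, sequence in recs:
        groups[(sequence.upper(), family)].append(accession)
    return dict(groups)


level1 = group_level1(records)
level2 = group_level2(records)
print(len(records), len(level1), len(level2))  # 4 2 3
```

In this toy input, four redundant records collapse to two level-1 entries and three level-2 entries, because one sequence occurs in two different families.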
Patent sequence record in NRNL1
[Screenshot labels: patents containing 100% identical sequence; sequence]
Patent sequence record in NRNL2
[Screenshot labels: patent equivalents; sequence record in ENA; sequence; patent literature; priority number and date; translation]
Non-redundant patent databases
www.ebi.ac.uk
• EMBL patents (redundant)
• Level-1 NR: remove sequence redundancy
• Level-2 NR: group by patent families
• Additional annotation, including priority dates for patent families
Patent sequence records at EBI
Nucleotide:
• ENA: ~23.9 M PAT sequences (>230 M total)
• NRNL1: ~12.2 M sequences
• NRNL2: ~15.5 M sequences
Protein:
• Patent Proteins: ~6.5 M PRT sequences (>32 M total)
• NRPL1: ~2.5 M sequences
• NRPL2: ~3.8 M sequences
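As a rough worked example, the counts on this slide can be turned into reduction percentages. The assignment of each figure to a database is an assumption: the smaller figure is taken as level-1 and the larger as level-2, since level-2 splits identical sequences by family. The `reduction` helper is hypothetical.

```python
# Counts in millions, read off the slide. Which figure belongs to which
# database is an assumption (smaller = level-1, larger = level-2).
counts = {
    "nucleotide": {"redundant": 23.9, "level1": 12.2, "level2": 15.5},
    "protein": {"redundant": 6.5, "level1": 2.5, "level2": 3.8},
}


def reduction(redundant, non_redundant):
    """Fraction of entries removed relative to the redundant set."""
    return 1.0 - non_redundant / redundant


for kind, c in counts.items():
    for level in ("level1", "level2"):
        pct = 100 * reduction(c["redundant"], c[level])
        print(f"{kind} {level}: {pct:.1f}% fewer entries")
```

Under that assumption, level-1 roughly halves the nucleotide set and removes a little over 60% of the protein set.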
Sequence search
Sequence searching
http://www.ebi.ac.uk/
Sequence Similarity & Analysis
Tools
Sequence searching
www.ebi.ac.uk/Tools/sss/
Wide variety of search tools
Choosing the right search engine
• FASTA: better general search engine
• SSEARCH: sensitive but slow; good for short sequences
• GGSEARCH: force full-length matches (query and subject aligned end to end)
• GLSEARCH: match domains/patterns to protein; oligo-to-gene (whole query aligned within a longer subject)
• BLAST: general search engine
Search a variety of databases
Protein: patent databases
*Selecting all 6 gives results in triplicate!
Search a variety of databases
Nucleotide: patent data
*Selecting all 3 gives results in triplicate!
Let’s look at an example…
Searching a redundant database
http://www.ebi.ac.uk/Tools/sss/
Example: search a patent protein sequence
Database selected: Protein, Patent proteins
Results from a redundant database
More than 260 identical results: too much to analyze
LEVEL-1 NR patent sequence database
• Removes redundancy
• Fewer results to analyze, less chance of missing important results
Searching NR level-1 patent database
http://www.ebi.ac.uk/Tools/sss/
Example: search a patent protein sequence
Database selected: NR patent level-1
Results from NR level-1 database
Each hit unique
Results from NR level-1 database
• Link to patent documentation
• List of all patents containing the sequence
• Link to sequence entry
• Earliest publication date
Patent families
• A simple patent family is a group of patents that relate to the same invention and are based on the same originating application.
• Families arise when an invention is patented in multiple countries.
• Grouping patents into families reduces multi-national results down to a representative member.
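A minimal sketch of the idea, assuming each publication carries a single identifier for its originating application. The publication numbers and the `PRIO-…` identifiers are invented, and `build_families` and `representatives` are hypothetical helpers; real family construction by patent offices is more involved.

```python
from collections import defaultdict

# Hypothetical publications: (publication number, originating application).
# All identifiers are invented for illustration.
publications = [
    ("EP0000001", "PRIO-1"),
    ("WO0000002", "PRIO-1"),
    ("US0000003", "PRIO-1"),
    ("US0000004", "PRIO-2"),
    ("JP0000005", "PRIO-2"),
]


def build_families(pubs):
    """Group publications that share the same originating application."""
    families = defaultdict(list)
    for number, origin in pubs:
        families[origin].append(number)
    return dict(families)


def representatives(families):
    """Pick one member per family; here simply the first in sorted order."""
    return {origin: sorted(members)[0] for origin, members in families.items()}


families = build_families(publications)
print(representatives(families))
```

Five multi-national publications reduce to two representative members, one per family. The choice of representative here is arbitrary.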
Patent families
[Diagram labels: accessions GM671154, CS017585, ACQ13114, DI603183, AAR79155, DD649656, ADA42650 and HB492658; "100% identical sequences"; Invention A (patent family) and Invention B (second patent family); offices EP, WO, US, US, JP]
The same sequence can appear multiple times in a database because:
• The same invention is filed multiple times in different offices (same patent family)
• Different inventors use the same sequence in different contexts (different patent families)
LEVEL-2 NR patent sequence database
• Groups identical sequences by patent family
• Provides earliest priority date for family
Searching NR level-2 patent database
http://www.ebi.ac.uk/Tools/sss/
Example: search a patent protein sequence
Database selected: NR patent level-2
Results from NR level-2 database
Each hit = one family
Results from NR level-2 database
• Patent equivalents
• Earliest publication date in family
• Earliest active priority date in family
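The family-level dates can be thought of as minima over the family members. The sketch below assumes each member has one publication date and one priority date, and it ignores whether a priority is still "active". The patent numbers and dates are invented, and `summarize` is a hypothetical helper.

```python
from datetime import date

# Hypothetical family members with invented numbers and dates.
family = [
    {"patent": "WO0000002", "published": date(2001, 6, 28), "priority": date(1999, 12, 23)},
    {"patent": "EP0000001", "published": date(2002, 10, 2), "priority": date(1999, 12, 23)},
    {"patent": "US0000003", "published": date(2003, 3, 13), "priority": date(2000, 5, 4)},
]


def summarize(members):
    """Collect equivalents and the earliest dates across one family."""
    return {
        "equivalents": sorted(m["patent"] for m in members),
        "earliest_publication": min(m["published"] for m in members),
        "earliest_priority": min(m["priority"] for m in members),
    }


summary = summarize(family)
print(summary["earliest_publication"], summary["earliest_priority"])
```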
Results from NR level-2 database
• Link to patent documentation
• Link to sequence entry
• Patents in same family
Text search
SRS: advanced text search
1st: Select resources to search
2nd: Create query
http://www.ebi.ac.uk/srs/
Sequence Searching Tools
SRS: advanced text search
Select library tab
SRS: advanced text search
Select library tab
Search >100 databases
NR patent DNA (NRNL1 & NRNL2)
NR patent proteins (NRPL1 & NRPL2)
SRS: advanced text search
Select library tab
Search >100 databases
Example: selected to search the NR level-1 patent DNA database
SRS: advanced text search
Select resources to search
Select library tab
SRS: advanced text search
Select resources to search
Select library tab
1) Select field
2) Type in text
SRS: advanced text search
Select resources to search
Select library tab
Here, selected patent number
SRS: advanced text search
Select resources to search
Create query
Select library tab
SRS: advanced text search
Select resources to search
Create query
Select library tab
Lists non-redundant nucleotide sequences from WO0146262
SRS: advanced text search
Select resources to search
Select library tab
Create query
WO0146262 sequences
SRS: advanced text search
Select resources to search
Select library tab
Create query
WO0146262 sequences
WO0146262 nucleotide sequence record in NRNL1
Details which other patents also claim this sequence (with NRNL2, would see family grouping)
SRS: advanced text search
Select resources to search
Create query
Select library tab
WO0146262 sequences
NRNL1 sequence record
SRS: advanced text search
http://www.ebi.ac.uk/srs/
Select resources to search
Create query
Select library tab
WO0146262 sequences
NRNL1 sequence record
WO0146262 literature
SRS: advanced text search
• EMBL-Bank: find all sequences associated with a patent
• NRNL1: find all sequences associated with a patent + identify all patents associated with each sequence
• NRNL2: find all sequences associated with a patent + identify all patents in the same family associated with each sequence
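The level-1 style lookup (all sequences for a patent, plus the other patents sharing each sequence) can be sketched as an inverted index. This is not how SRS works internally. The entry identifiers and patent numbers are invented, and `index_by_patent` and `sequences_and_copatents` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical level-1 style entries: entry id -> patents containing that sequence.
entries = {
    "NR_0001": ["WO0000010", "EP0000011"],
    "NR_0002": ["WO0000010"],
    "NR_0003": ["US0000012"],
}


def index_by_patent(nr_entries):
    """Invert the mapping: patent number -> entry ids that list it."""
    index = defaultdict(list)
    for entry_id, patents in nr_entries.items():
        for patent in patents:
            index[patent].append(entry_id)
    return dict(index)


def sequences_and_copatents(patent, nr_entries):
    """For one patent, list its entries and the other patents on each entry."""
    index = index_by_patent(nr_entries)
    return {
        entry_id: [p for p in nr_entries[entry_id] if p != patent]
        for entry_id in index.get(patent, [])
    }


result = sequences_and_copatents("WO0000010", entries)
print(result)  # {'NR_0001': ['EP0000011'], 'NR_0002': []}
```

A level-2 style variant would additionally restrict the co-listed patents to members of the same family.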
http://www.ebi.ac.uk/patentdata/
Non-redundant
For more information
Publication
User Manual
For more information
Help
Contacts:
http://www.ebi.ac.uk/support/