Quality of Classification
Optimum:
All documents pertaining to specific technical area (concept) are found by classification search
What to achieve ?
Recall = (# retrieved relevant documents) / (# existing relevant documents) = 1
For concepts defined in IPC:
Priority 1: documents have all appropriate symbols
Priority 2: documents have no inappropriate symbols (efficiency)
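The two priorities can be illustrated with a small calculation. This is only a sketch: the document identifiers are hypothetical, and reading "efficiency" as precision (share of retrieved documents that are relevant) is an assumption, not something the slide states.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of existing relevant documents that the search retrieved."""
    if not relevant:
        raise ValueError("no relevant documents defined")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of retrieved documents that are relevant (assumed meaning of 'efficiency')."""
    if not retrieved:
        raise ValueError("search returned no documents")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical document identifiers, for illustration only.
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D1", "D2", "D3", "D9"}

print(f"recall = {recall(retrieved, relevant):.2f}")
print(f"precision = {precision(retrieved, relevant):.2f}")
```

A missing symbol lowers recall (D4 is never found), while an inappropriate symbol lowers precision (D9 is retrieved although irrelevant).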
Phenomenology of quality issues
document is unclassified
has wrong / inappropriate classification
has outdated / invalid classification
non-exhaustive / incomplete classification
> appropriate symbols are missing
> given symbols are not specific enough
varying classifications of family members
excessive classification
Different aspects
individual document / publication
- classification by publishing IPO
- and by other IPOs, e.g. EPO > ECLA, DPMA > "ICP", JPO, … ?
> examiners create their own search files
different publication levels:
- unexamined (unsearched) applications
- granted patents
families: in MCD reclassification at family level
data in different databases
Unclassified documents
Published before 1.1.2006:
many documents in MCD still unclassified / not reclassified:
92% of all documents in MCD*
87% of all documents of EPO members
Published after 1.1.2006:
97% of all documents in MCD
91% of all WO
each week 6 - 8% of WO publications are not classified at all
*cf IPC/CE/40/4
Unclassified WO documents
[Chart: unclassified WO documents per publication week, 06.07.06 to 17.01.08. Axes: % unclassified WO docs / week (0.0% to 12.0%) and number of unclassified WO / week (0 to 400). Series: percentage unclassified, number unclassified.]
Unclassified WO documents
Publication week 50 (13.12.2007): 260 of 3272 (7.9%)
By ISA: EP 218 (84%), KR 27 (10%), AU 5, US 5, RU 2, SE 2, CA 1
By Receiving Office: US 177, IB 31, EP 26, GB 9, KR 3, DE 2, FR 2, IL 2, …
Lesson : There are still many documents without any valid classification
> Top priority: All documents should have at least one valid classification
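The percentages for publication week 50 can be rechecked from the counts given on the slide. The sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Counts of unclassified WO documents per ISA, publication week 50 (13.12.2007),
# copied from the slide.
unclassified_by_isa = {"EP": 218, "KR": 27, "AU": 5, "US": 5, "RU": 2, "SE": 2, "CA": 1}
published_total = 3272

unclassified_total = sum(unclassified_by_isa.values())
share_unclassified = unclassified_total / published_total

print(f"unclassified: {unclassified_total} of {published_total} ({share_unclassified:.1%})")
for isa, count in sorted(unclassified_by_isa.items(), key=lambda kv: -kv[1]):
    print(f"{isa}: {count} ({count / unclassified_total:.0%})")
```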
Wrong classification
A61N 1/00 Electrotherapy; Circuits therefor
courtesy of M. Meier (Audi)
Wrong classification
B60K Arrangement or mounting of propulsion units or of transmissions in vehicles
Lesson : Completely wrong classifications do occur
courtesy of M. Meier (Audi)
Wrong classification
Lesson : Typos may occur; flaws of concordance tables
Example: WO2007126503
ISR: G01L 19/02
Espacenet: G10L 19/02
Wrong classifications: difficult to investigate because difficult to find > feedback by users needed
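One kind of typo, two neighbouring characters swapped as in G01L / G10L, can be searched for automatically when the same document carries symbols from two sources. The following is a minimal sketch of such a check; the function names are hypothetical and it covers only this single error pattern.

```python
def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of neighbouring characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diff) == 2
        and diff[1] == diff[0] + 1
        and a[diff[0]] == b[diff[1]]
        and a[diff[1]] == b[diff[0]]
    )


def normalize(symbol: str) -> str:
    """Drop whitespace so 'G01L 19/02' and 'G01L19/02' compare equal."""
    return "".join(symbol.split()).upper()


isr_symbol = "G01L 19/02"        # as given in the ISR of WO2007126503
espacenet_symbol = "G10L 19/02"  # as shown in Espacenet for the same document

suspect = is_adjacent_transposition(normalize(isr_symbol), normalize(espacenet_symbol))
print(suspect)
```

A hit only marks a candidate for review; it does not say which of the two symbols is the correct one.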
Outdated / invalid classification
Business methods: G06F 17/60 > G06Q [2006.01]
in Espacenet: 0 WO docs with a:G06F17/60
in Patentscope: 1506 WO docs with G06F17/60
- e.g. WO2007004271: reclassified in Espacenet only to ECLA
Lesson : Reclassification following revision is still incomplete
Lesson : Classification data may be different in different databases
in Espacenet: many non-PCT minimum documentation collections are not reclassified
- e.g. CZ, UY, NZ, AR
not all PCT minimum documentation is reclassified
- e.g. only 678 of 14543 KR docs reclassified in ECLA/IPC
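To put the KR figure into proportion, the reclassified share can be computed directly. A minimal sketch; `coverage` is a hypothetical helper name.

```python
def coverage(reclassified: int, total: int) -> float:
    """Share of documents in a collection that have been reclassified."""
    if total <= 0:
        raise ValueError("total must be positive")
    return reclassified / total


# Figures from the slide: 678 of 14543 KR docs reclassified in ECLA/IPC.
kr_share = coverage(678, 14543)
print(f"KR docs reclassified: {kr_share:.1%}")
```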
Outdated / invalid classification
Traditional medicine: A61K 35/78 > A61K 36/.. [2006.01]
in Espacenet: 10413 docs still have 35/78 as ECLA
only 7412 thereof have 36/..
Lesson : Reclassification to valid IPC incomplete
Further example: WO1998039019
in Espacenet: A61K 36/02 as IPC-AL, A61K 35/80 as ECLA
in Patentscope: A61K 35/80 as IPC
Lesson : Classification data may be different in different databases
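Both lessons lend themselves to simple automated checks. The sketch below flags symbols that were replaced in the 2006 revision and lists symbols on which the sources for one document disagree. The revision map contains only the two cases named on these slides, the function names are hypothetical, and comparing ECLA and IPC strings directly is a deliberate simplification.

```python
# Outdated symbol -> successor area. Only the two revisions named on the slides;
# a real concordance would be far larger.
REVISED_2006 = {
    "G06F17/60": "G06Q",
    "A61K35/78": "A61K36/",
}


def normalize(symbol: str) -> str:
    """Drop whitespace so 'G06F 17/60' and 'G06F17/60' compare equal."""
    return "".join(symbol.split()).upper()


def outdated(symbols: list[str]) -> list[str]:
    """Normalized symbols that appear in the revision map."""
    return sorted(n for n in map(normalize, symbols) if n in REVISED_2006)


def disagreements(by_source: dict[str, set[str]]) -> set[str]:
    """Symbols that are not shared by every source (needs at least one source)."""
    sets = [{normalize(s) for s in symbols} for symbols in by_source.values()]
    return set.union(*sets) - set.intersection(*sets)


# Data for WO1998039019 as given on the slide.
wo1998039019 = {
    "Espacenet IPC-AL": {"A61K 36/02"},
    "Espacenet ECLA": {"A61K 35/80"},
    "Patentscope IPC": {"A61K 35/80"},
}

print(sorted(disagreements(wo1998039019)))
print(outdated(["G06F 17/60", "G06Q 10/00"]))
```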
Varying classifications in family
Example: Aircraft cargo loading logistics system
US 2005246132 A1 (3.11.2005)
US 7100827 B2 (5.9.2006)
DE 102005019194 A1 (24.11.2005)
FR 2871269 A1 (9.12.2005)
Classification data on front page
US A1 US B2 DE A1 FR A1
B64C 1/22 G06F 19/00 G06F 17/60 G06F 19/00
G06K 15/00 G07C 11/00 G06F 17/60
Lesson : Classification of granted patents may be very different
Lesson : Assessment of main classification varies
Varying classifications in family
Columns: US A1, US B2, DE A1, FR A1, Espacenet IPC, Espacenet ECLA, Depatis, PatFT
B64C 1/20 X X X
B64C 1/22 X X X
B64D 9/00 X X X
B64D 9/00A X
G06K 15/00 X X
G06Q 10/00
G06Q 10/00D X
G06F 17/60 X X X
G06F 19/00 X X X X X
G07C 11/00 X X X
Lesson : classification data from subsequent publications may not be in MCD
Lesson : some reclassification data may not be in MCD; exist as ECLA only
Varying classifications of single document
Example: WO2007126503
ECLA: G01L 19/00B (roll up to IPC: G01L 19/00)
IPC: G01L 19/02
Lesson : different views of different classifiers
US7258017 B1 (granted family member)
IPC: G01L 19/04
Lesson : classification of granted patents may be different
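The roll-up from ECLA to IPC mentioned above can be sketched as stripping the ECLA suffix from the symbol. The pattern below assumes that the suffix always starts with an uppercase letter directly after the group digits, as in G01L 19/00B; this is a simplification that fits the examples on these slides and is not a full description of ECLA syntax. `roll_up` is a hypothetical helper name.

```python
import re

# IPC part: section letter, two-digit class, subclass letter, main group "/" subgroup.
# Optional ECLA suffix: an uppercase letter followed by further letters or digits.
_ECLA = re.compile(r"^([A-H]\d{2}[A-Z]\s*\d{1,4}/\d{2,6})([A-Z][A-Z0-9]*)?$")


def roll_up(ecla_symbol: str) -> str:
    """Return the IPC symbol underlying an ECLA symbol (assumed suffix syntax)."""
    match = _ECLA.match(ecla_symbol.strip())
    if match is None:
        raise ValueError(f"unrecognized symbol: {ecla_symbol!r}")
    return match.group(1)


print(roll_up("G01L 19/00B"))
```

Note that the rolled-up symbol for WO2007126503 (G01L 19/00) is still different from the IPC symbol given for the same document (G01L 19/02), which is the point of the lesson.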
Current problems in classification (I): IPC consistency
• KR20070005367 A (Prio.: KR20050060661): Multifocal lens and manufacture method thereof. IPC (AL): G02B3/10
• JP2007017937 A (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same. IPC (AL): G02F1/13; G02B3/14; G02F1/1334
• US2007008599 A (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same. IPC (AL): G02B5/32
• CN1892258 A (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same. IPC (AL): G02B3/10
• EP1742100 A1 (Prio.: KR20050060661): Multifocal lens and method for manufacturing the same. IPC (AL): G02F1/1334
Lesson : classifiers may have different views of subject matter to be classified or interpret IPC groups differently
by courtesy of H. Wongel
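How little the five family members agree becomes visible when their symbols are pooled and counted. The sketch below uses the IPC (AL) data of the multifocal lens family as listed above; pooling at family level is one possible treatment, not an established procedure.

```python
from collections import Counter

# IPC (AL) symbols per family member, priority KR20050060661.
family = {
    "KR20070005367 A": ["G02B3/10"],
    "JP2007017937 A": ["G02F1/13", "G02B3/14", "G02F1/1334"],
    "US2007008599 A": ["G02B5/32"],
    "CN1892258 A": ["G02B3/10"],
    "EP1742100 A1": ["G02F1/1334"],
}

# Count in how many family members each symbol occurs.
counts = Counter(symbol for symbols in family.values() for symbol in set(symbols))
pooled = sorted(counts)  # union of all symbols in the family
shared_by_all = [s for s, n in counts.items() if n == len(family)]

for symbol, n in counts.most_common():
    print(f"{symbol}: {n} of {len(family)} family members")
print("shared by all members:", shared_by_all)
```

No symbol is shared by all five members, so a search with any single symbol misses part of the family unless the classifications are pooled.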
Non-exhaustive classification
Example: Secondary scheme A01P [2006.01]
"Biocidal, pest repellant, … activity of chemical compounds"
Espacenet:
not in ECLA !
        A01P     EP           A01N     EP
total   43361    1054 (2%)    99994    23330 (24%)
2007    2104     114 (5%)     10328    1040 (10%)
Lesson : incompatibility of IPC and ECLA may cause non-exhaustive classification
Non-exhaustive classification
Example: EP1881839
ECLA: A61K 36/487
IPC: A61K 36/00
Lesson : classifications could be more specific
Lesson : relevant classifications may not be given / available as IPC
Example: A61K 36/..
ECLA: 22440 documents
IPC: only 17847 thereof have a:A61K 36/..
Example: C12Q 1/68
Espacenet: > 100.000 docs
ECLA: > 40 subgroups
IPC: 0 subgroups
Causes / sources for deficiencies
"wrong" or varying intellectual classification:
- rules too complicated
- drawbacks of classification scheme (too much overlap)
- interpretation of subject matter
- differing national practice
- lack of expertise, diligence, time pressure
granted claims may differ
incompatibility ECLA - IPC; USPC concordance tables
lack or delay of reclassification:
- insufficient resources for intellectual reclassification
data exchange / management problems
data input (typos)
Options for improvement
on IPO level:
- allocate resources
- adapt / harmonize classification practice / training
- develop classification assistance tools
on user level:
- knowing deficiencies > adapt search strategies
on IPC level:
- improve user-friendliness (e.g. definitions)
- simplify IPC scheme, rules
More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?
Options for improvement
On MCD / database level:
crosscheck content of databases
pooling / compiling of classification data (in one searchable field / on family level ?) of
- classification data of family members
- subsequent publications
- other sources (DE: ICP, …)
processing such compilations of classifications of different origin, e.g.:
compare classification of subsequent publications (A, B, ..)
> create "trusted" classifications (e.g. class (A) = class (B)) ?
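The "trusted" classification idea can be sketched as a comparison of the symbol sets of the A and B publication. Two readings are shown: the symbols on which both publications agree, and the stricter test that both sets are identical. The example symbols are hypothetical and the function names are illustrative only.

```python
def trusted_symbols(class_a: set[str], class_b: set[str]) -> set[str]:
    """Symbols on which the A and the B publication agree."""
    return class_a & class_b


def is_fully_trusted(class_a: set[str], class_b: set[str]) -> bool:
    """Stricter reading of 'class (A) = class (B)': identical, non-empty sets."""
    return bool(class_a) and class_a == class_b


# Hypothetical symbol sets for one application (A) and its granted patent (B).
a_publication = {"G01L 19/02", "G01L 19/04"}
b_publication = {"G01L 19/04"}

print(trusted_symbols(a_publication, b_publication))
print(is_fully_trusted(a_publication, b_publication))
```

Symbols outside the intersection are not necessarily wrong, since granted claims may differ from the application; they are merely unconfirmed.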
Learn from / go WEB 2.0 ?
"Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community ? e.g. any searcher ?
> implement feedback channels ?
Are you satisfied with classification in A61N 1/00 ? Yes / No
Would you like to suggest further classifications: .....................................................................
Submit
Click opens
Learn from / go WEB 2.0 ?
"Folksonomy", "social tagging", "cooperative, collaborative classification"
> include broader user community
> compile varying views, i.e. classifications
process such data; create "trusted" classifications
broader participation in scheme development, in particular definitions ?
Tagging of IPC entries ?
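Processing user feedback into "trusted" suggestions could, for instance, require that a symbol is proposed by several independent users. The sketch below assumes feedback arrives as (user, symbol) pairs; the threshold of two users, the user identifiers and the suggested symbols are all hypothetical.

```python
from collections import Counter


def suggested_symbols(suggestions: list[tuple[str, str]], min_users: int = 2) -> list[str]:
    """Symbols proposed by at least `min_users` distinct users.

    `suggestions` holds (user_id, symbol) pairs; repeated submissions of the
    same symbol by the same user count once.
    """
    votes = Counter(symbol for _user, symbol in set(suggestions))
    return sorted(symbol for symbol, n in votes.items() if n >= min_users)


# Hypothetical feedback collected for one document.
feedback = [
    ("user1", "B60K 1/00"),
    ("user2", "B60K 1/00"),
    ("user2", "B60K 1/00"),  # duplicate submission, counted once
    ("user3", "A61N 1/00"),
]

print(suggested_symbols(feedback))
```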
Thank you
More liberal approach when classifying ?
One more symbol better than one symbol missing ?
Do we need to be worried about varying classifications ?
Include broader user community ? e.g. any searcher ?
Implement feedback channels ?
Create "trusted" classifications (e.g. class (A) = class (B)) ?
Top priority: all documents should have at least one valid classification
Priority 1: documents have all appropriate symbols
Priority 2: documents have no inappropriate symbols