extracting patient data from tables in clinical literature
![Page 1: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/1.jpg)
Extracting patient data from tables in clinical literature
Case study on extraction of BMI, weight and number of patients
Nikola Milosevic, Cassie Gregson, Robert Hernandez, Goran Nenadic
![Page 2: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/2.jpg)
Clinical trial literature
• PubMed contains nearly 800 000 clinical trial publications
• Researchers are challenged by the volume of published literature
![Page 3: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/3.jpg)
Help from text mining?
• Text mining provides methods to process text on a large scale
• Current text mining efforts were mainly focused on text, rather than tables and figures
![Page 4: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/4.jpg)
Tables in clinical documents
• A clinical trial publication contains 2.1 tables on average
• Tables often contain information about settings and findings of experiments
![Page 5: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/5.jpg)
Challenges for table mining
• Dense content
• Variety of layouts
• Variety of value representation formats
• Misleading visualization markup
• Lack of resources (labelled datasets)
• How to automatically make sense of tables
![Page 6: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/6.jpg)
Aim – a case study
• Extract information about the number of patients, patients’ BMI and weight from tables in clinical trial literature
• A multi-layered approach to mining information from tables
– to facilitate large-scale semi-automated extraction
– curation of data stored in tables
![Page 7: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/7.jpg)
Methodology overview
• Rule-based methodology
– Rules created based on a manual analysis of a small subset of tables
• Five processing layers
– Detection
– Functional
– Structural
– Syntactic
– Semantic
![Page 8: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/8.jpg)
Methodology overview
![Page 9: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/9.jpg)
Table model
• We model 4 main types of tables
– List
– Matrix
– Super-row
– Multi-table
• Based on table dimensionality
![Page 10: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/10.jpg)
Table types (1)
• List table
![Page 11: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/11.jpg)
Table types (2)
• Matrix table
![Page 12: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/12.jpg)
Table types (3)
• Super-row table
![Page 13: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/13.jpg)
Table types (4)
• Multi-table
![Page 14: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/14.jpg)
1. Functional analysis
• Classifies cells into functional classes
– Header
– Super-row
– Stub
– Data
• Uses heuristics based on content and position
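The slide does not list the actual heuristics, so the following is only a minimal sketch of what content- and position-based rules could look like. The specific rules (first row is the header, a row with only its first cell filled is a super-row, the first column is the stub) and the function name `classify_cells` are assumptions for illustration, not the authors' rule set.

```python
def classify_cells(grid):
    """Label every cell of a table as 'header', 'super-row', 'stub' or 'data'.

    `grid` is a list of rows, each row a list of cell strings.
    All rules below are illustrative assumptions.
    """
    labels = []
    for r, row in enumerate(grid):
        # Content rule (assumed): only the first cell of the row has text.
        only_first_filled = (
            len(row) > 1
            and row[0].strip() != ""
            and all(cell.strip() == "" for cell in row[1:])
        )
        row_labels = []
        for c in range(len(row)):
            if r == 0:
                row_labels.append("header")      # position rule: top row
            elif only_first_filled:
                row_labels.append("super-row")   # content rule: grouping row
            elif c == 0:
                row_labels.append("stub")        # position rule: first column
            else:
                row_labels.append("data")
        labels.append(row_labels)
    return labels
```

Real tables often have multi-row headers or multi-column stubs, which a sketch this simple would mislabel.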
![Page 15: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/15.jpg)
2. Structural analysis
• Determines relationships between cells
• Using cell functions and table structure, classifies the table into one of the structural table types:
– List
– Matrix
– Super-row
– Multi-table
• Based on the type, a set of rules resolves the relationships
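As a rough illustration of what "resolving relationships" can mean for a matrix or super-row table, the sketch below links each data cell to its column header, row stub and enclosing super-row. The function name `resolve_relationships`, the record layout and the label strings are hypothetical; the per-cell functional labels are assumed to be supplied by an earlier step.

```python
def resolve_relationships(grid, labels):
    """Return one record per data cell with its header, stub and super-row.

    `grid` holds cell strings; `labels` holds the functional class of each
    cell ('header', 'super-row', 'stub' or 'data') in the same shape.
    """
    headers = {}       # column index -> most recent header text in that column
    super_row = None   # text of the super-row currently in scope, if any
    records = []
    for row, row_labels in zip(grid, labels):
        if "super-row" in row_labels:
            # A super-row groups all following rows until the next super-row.
            super_row = row[0]
            continue
        stub = None
        for c, (cell, label) in enumerate(zip(row, row_labels)):
            if label == "header":
                headers[c] = cell
            elif label == "stub":
                stub = cell
            elif label == "data":
                records.append({
                    "super_row": super_row,
                    "stub": stub,
                    "header": headers.get(c),
                    "value": cell,
                })
    return records
```

Flattening a table into such records means later extraction rules can inspect the stub and header of a value without knowing the original layout.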
![Page 16: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/16.jpg)
3.1 Extracting number of patients
• Heuristic-based approach
• Searches captions, headers, cells
• In captions, 2 rules:
– n=%d
– %d Adj*(patients|participants|subjects|individuals)
– Usually the total number of patients is found
• In headers
– usually n=%d
– can be partial, needs adding up
• In cells
– stub contains a defined word or phrase
– can be partial, needs adding up
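The two caption rules and the header rule can be approximated with regular expressions. This is a sketch under assumptions: `Adj*` is read as up to three arbitrary words, matching is case-insensitive, and the function names are hypothetical. The count rule is deliberately loose and will also match unrelated numbers that happen to precede one of the nouns.

```python
import re

# Rule "n=%d"; optional spaces and upper-case "N" are tolerated (assumption).
N_EQUALS = re.compile(r"\bn\s*=\s*(\d+)", re.IGNORECASE)

# Rule "%d Adj*(patients|participants|subjects|individuals)".
# "Adj*" is approximated as zero to three arbitrary words (assumption).
COUNT_NOUN = re.compile(
    r"\b(\d+)\s+(?:[A-Za-z-]+\s+){0,3}?"
    r"(?:patients|participants|subjects|individuals)\b",
    re.IGNORECASE,
)


def patients_from_caption(caption):
    """Return the first patient count found in a caption, or None."""
    for pattern in (N_EQUALS, COUNT_NOUN):
        match = pattern.search(caption)
        if match:
            return int(match.group(1))
    return None


def patients_from_headers(headers):
    """Add up partial n=%d counts spread over header cells, or return None."""
    counts = [int(m.group(1)) for h in headers for m in N_EQUALS.finditer(h)]
    return sum(counts) if counts else None
```

For example, headers such as `"Placebo (n=30)"` and `"Treatment (n=32)"` each carry a partial count, which `patients_from_headers` adds up.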
![Page 17: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/17.jpg)
3.2 Extracting BMI
• Based on a list of trigger phrases (BMI, body mass index) and a black list (change, increase)
• Trigger words in the stub or header signal that a BMI value may appear
• If a black-listed word is in the vicinity, the value is discarded
• Values must fall in the range 14-40
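A minimal sketch of this rule, using the trigger phrases, black-listed words and the 14-40 range from the slide. Treating "vicinity" as "in the same stub or header label" is an assumption, as are the function name `extract_bmi` and the simple number pattern.

```python
import re

TRIGGER = re.compile(r"\b(?:bmi|body mass index)\b", re.IGNORECASE)
BLACKLIST = re.compile(r"\b(?:change|increase)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d+(?:\.\d+)?")
BMI_MIN, BMI_MAX = 14.0, 40.0


def extract_bmi(label, cell):
    """Return plausible BMI values from a data cell.

    `label` is the stub or header text that describes the cell.
    """
    if not TRIGGER.search(label):
        return []   # no trigger phrase, so the cell is not a BMI candidate
    if BLACKLIST.search(label):
        return []   # e.g. "Change in BMI" reports a difference, not a BMI
    values = [float(v) for v in NUMBER.findall(cell)]
    # The range check drops numbers such as standard deviations.
    return [v for v in values if BMI_MIN <= v <= BMI_MAX]
```

With a cell such as `"27.4 ± 3.1"` under the label `"BMI (kg/m2)"`, the range check keeps 27.4 and drops the standard deviation 3.1.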
![Page 18: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/18.jpg)
3.3 Extracting weights
• Based on trigger words and black lists
• Looks in the stub and header for words from the lists, and in data cells for values
• Not useful to set a range
– A person can weigh 40–150 kg
– In pounds: 80–350 lbs
– A baby can weigh 1500–5000 g
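Since no single numeric range fits kilograms, pounds and grams at once, a sketch of the weight rule can skip range filtering and report the unit alongside the value. The slides do not give the actual weight word lists, so the trigger and black-list patterns below, the unit detection and the name `extract_weight` are all assumptions.

```python
import re

# Word lists are illustrative assumptions; the slides do not enumerate them.
WEIGHT_TRIGGER = re.compile(r"\b(?:body\s+)?weight\b", re.IGNORECASE)
WEIGHT_BLACKLIST = re.compile(r"\b(?:change|increase|gain|loss)\b", re.IGNORECASE)
UNIT = re.compile(r"\b(kg|lbs?|g)\b", re.IGNORECASE)
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_weight(stub, header, cell):
    """Return (value, unit) for a weight cell, or None if it does not qualify.

    The unit is None when neither the labels nor the cell name one.
    No range check is applied because plausible values depend on the unit.
    """
    label = f"{stub} {header}"
    if not WEIGHT_TRIGGER.search(label) or WEIGHT_BLACKLIST.search(label):
        return None
    number = NUMBER.search(cell)
    if number is None:
        return None
    unit = UNIT.search(label) or UNIT.search(cell)
    return float(number.group()), (unit.group(1).lower() if unit else None)
```

Once a unit is known, a unit-specific sanity check could be added downstream.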
![Page 19: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/19.jpg)
Results
• Corpus contained 3573 tables in 1273 documents
• Each table had 80 cells on average
• Evaluating functional and structural processing:
– Selected 100 random tables of each type and evaluated them
• Evaluating information extraction:
– Number of patients:
  • 758 contained data
  • 50 random documents
– BMI and weight:
  • 113 documents containing this information
![Page 20: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/20.jpg)
Functional analysis results
![Page 21: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/21.jpg)
Results for information extraction
• Extracting number of patients:
• Extracting weight and BMI:
![Page 22: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/22.jpg)
Discussion
• Better-scoped values, such as BMI, can be modelled more precisely, which gives better performance
• Define exhaustive white and black lists
• Variety of presentation formats and means
• Misleading markup
• However, promising results
![Page 23: Extracting patient data from tables in clinical literature](https://reader034.vdocuments.net/reader034/viewer/2022052606/58f0b4521a28ab2c798b4607/html5/thumbnails/23.jpg)
Summary
• Large-scale table mining to harvest population details from clinical trials
• Classified tables based on layout
• Case study on clinical trial patient number, BMI and weight
• Promising performance