first year present

Using Text Mining to Explore Concept Complexity in Obesity through Concept Maps. George Karystianis, School of Computer Science. Supervisors: Goran Nenadic, Iain Buchan. Advisor: Andrea Schalk.


Post on 05-Jun-2015


Page 1: First year present

1

Using Text Mining to Explore Concept Complexity in Obesity through Concept Maps

George Karystianis

School of Computer Science

Supervisors: Goran Nenadic, Iain Buchan

Advisor: Andrea Schalk

Page 2: First year present

2

Motivation

● Complex nature of obesity.

● Wide range of biomedical data sources available.

– implementation of biomedical text/data mining.

● Possible to reveal hidden links between obesity and other diseases.

● Partially completed knowledge representation models of obesity.

● A systematic approach required for:

– analysis and interpretation of clinical knowledge.

Page 3: First year present

3

Concept Maps

● Knowledge representation models.

● Consist of:

– nodes (concepts).

– links (relationships between the nodes).

● Aim: gather, understand, explore knowledge.

● Variety of users.

● No explicit detail.

● Implemented primarily in education.
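
The node-and-link structure described above can be sketched as a small data structure. This is only an illustrative sketch, not the representation used in the project: the class name `ConceptMap`, its methods, and the two example links are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptMap:
    """A concept map as a set of nodes plus labelled, directed links."""
    nodes: set = field(default_factory=set)
    links: set = field(default_factory=set)  # (source, label, target) triples

    def add_link(self, source: str, label: str, target: str) -> None:
        # Adding a link also registers both of its end points as nodes.
        self.nodes.update((source, target))
        self.links.add((source, label, target))

    def neighbours(self, concept: str) -> set:
        # Concepts directly linked to `concept`, in either direction.
        outgoing = {t for s, _, t in self.links if s == concept}
        incoming = {s for s, _, t in self.links if t == concept}
        return outgoing | incoming


# Illustrative links only; they are not taken from the project's concept map.
cmap = ConceptMap()
cmap.add_link("psychiatric disorders", "increases risk of", "obesity")
cmap.add_link("obesity", "is associated with", "diabetes")
print(sorted(cmap.neighbours("obesity")))
```

Storing links as labelled triples keeps the relationship name next to the two concepts it connects, which is what distinguishes a concept map from a plain graph.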

Page 4: First year present

4

Concept Map Example

Page 5: First year present

5

Aim

● To design a framework to build/enhance medical concept maps.

● To improve the understanding of health care concept complexity.

● To assist medical professionals in the representation, exploration and validation of their expert knowledge.

● To improve clinical health care.

Page 6: First year present

6

Objectives

● Design and implement methods for health care concept detection.

● Concept organisation in a concept map form.

● Method generation for concept map updates.

● Build a framework for the design/enhancement/validation of medical concept maps.

● Methodology evaluation through the health problem of obesity:

– validation of obesity related concepts with current structured obesity information available.

– identification of gaps in clinical knowledge.

Page 7: First year present

7

Research Hypothesis & Questions

-The analysis required to extract health care concepts.

-The approach to build and enhance a concept map.

-The concept map contribution in the representation/validation of knowledge.

-The text mining results help to understand/explore clinical problems.

[Diagram labels: Biomedical Text Mining, Scientific literature, Concept map, Improvement of health care, Framework]

Page 8: First year present

8

Obesity

● Worldwide problem.

● Epidemic proportions:

– WHO rates (2005): 1.6 billion overweight, 400 million obese.

● Associations with various diseases.

● Complex risk factors and complications.

● Various aspects.

● Lots of research.

Page 9: First year present

9

Page 10: First year present

10

Biomedical Text Mining

● Extraction of information from unstructured data of biomedical nature.

● Discovery of new, previously unknown knowledge.

● Performed on documents with complex/specific terminology and expressions.

● Challenges:

– language ambiguity.

– variation of language expression.

● Various tools and applications (Termine, Whatizit, GATE).

● Adaptation to users' tasks and requirements.

Page 11: First year present

11

What are we looking for?

● Risk Factors

● Causal Factors

● Confounding Factors

● Outcomes

● Complications

● Interventions

● ...

Page 12: First year present

12

Methodology Overview

1. Document retrieval.

2. Term/concept extraction.

3. Feature engineering and Information extraction:

- application of classification/clustering techniques.

4. Concept map design.

Page 13: First year present

13

Evaluation: Obesity Case Study

● Comparison:

– What?

● biomedical text mining results.

● concept map information.

– How?

● concepts and relationships.

● new ones.

● Examination/manipulation/validation of new knowledge by experts.

● Enhancement of the concept map.
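
The comparison step above can be sketched with plain set operations. This is a minimal sketch under stated assumptions: the function name `compare_concepts`, the lower-casing used to match concepts, and the example concept names are all hypothetical, and real concept matching would need more than exact string comparison.

```python
def normalise(concepts: set) -> set:
    """Lower-case and trim concept names so trivial variants match."""
    return {c.strip().lower() for c in concepts}


def compare_concepts(mined: set, mapped: set) -> dict:
    """Split concepts into shared, new (mined only) and map-only groups."""
    mined_n, mapped_n = normalise(mined), normalise(mapped)
    return {
        "shared": mined_n & mapped_n,    # found by text mining and already in the map
        "new": mined_n - mapped_n,       # candidates for expert validation
        "map_only": mapped_n - mined_n,  # in the map but not found in the corpus
    }


# Hypothetical concept names, for illustration only.
result = compare_concepts(
    {"Diabetes", "psychiatric disorders "},
    {"diabetes", "hypertension"},
)
print(result["new"])
```

The "new" group corresponds to the concepts that experts would examine before the concept map is enhanced.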

Page 14: First year present

14

Progress so far (1)

● Corpus collection.

● Application of Automated Term Recognition (ATR).

● C-value method.

● Single-word ATR:

– terminological head identification.

– terminological head: the word of a multi-word term that defines the term class.

– example:

● “Childhood diabetes type II”.

● Terminological head: “diabetes”.
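
As a reminder of how the C-value method scores candidate terms, here is a minimal sketch of its commonly cited form: a term's frequency is discounted by the average frequency of the longer candidate terms that contain it, then weighted by log2 of its length in words. The function names and the counts are assumptions made for this example, not the project's implementation or data.

```python
import math


def contains(longer: tuple, shorter: tuple) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq: dict) -> dict:
    """C-value for each candidate term; `freq` maps word tuples to counts."""
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidate terms in which `term` is nested.
        nesting = [f_b for b, f_b in freq.items() if contains(b, term)]
        if nesting:
            f = f - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores


# Hypothetical counts, invented for illustration only.
counts = {
    ("obesity",): 20,
    ("childhood", "obesity"): 6,
    ("severe", "childhood", "obesity"): 2,
}
scores = c_value(counts)
print(scores)
```

Because log2(1) is 0, this form gives every single-word candidate a score of 0, which is one reason single-word terms need separate handling.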

Page 15: First year present

15

Progress so far (2)

● Ranking head measures:

– total head frequency,

– single head frequency,

– maximum and average C-value,

– abstract frequency,

– ratio of single head frequency/total head frequency,

– tf*idf (term frequency*inverse document frequency).
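
Of the measures listed, tf*idf has a standard textbook definition, sketched below. The exact variant used in the project is not stated, so the relative term frequency and the natural-log idf here are assumptions, as are the function name `tf_idf` and the toy abstracts.

```python
import math
from collections import Counter


def tf_idf(docs: list) -> list:
    """Return one {word: tf*idf} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


# Toy "abstracts", invented for illustration only.
abstracts = [
    ["obesity", "risk", "diabetes"],
    ["obesity", "intervention"],
    ["obesity", "diabetes", "diabetes"],
]
weights = tf_idf(abstracts)
print(weights[1])
```

A word that appears in every abstract, such as "obesity" in this toy corpus, gets a weight of 0, so the measure favours words that are frequent in a few abstracts rather than across the whole corpus.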

Page 16: First year present

16

Results

[Bar chart: number of key words (y-axis) per statistical measure (x-axis): tf*idf, total freq, single freq, abstract freq, word freq, max_c, aver_c, ratio]

Page 17: First year present

17

Progress so far (3)

● Pattern extraction from abstracts for:

– risk, confounding and causal factors,

– interventions,

– complications,

– outcomes.

Obesity risk is increased among women with psychiatric disorders

Potential risk factor
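
A surface pattern of this kind can be sketched with a regular expression. This is a hypothetical pattern written for the example sentence only, not one of the project's patterns, and it assumes the "potential risk factor" label refers to the phrase that follows "with".

```python
import re

# Hypothetical pattern: "<condition> risk is increased among <group> with <factor>".
RISK_PATTERN = re.compile(
    r"(?P<condition>\w+) risk is increased among "
    r"(?P<group>[\w\s]+?) with (?P<factor>[\w\s]+)",
    re.IGNORECASE,
)


def find_risk_factors(sentence: str) -> list:
    """Return (condition, potential risk factor) pairs matched by the pattern."""
    return [
        (m.group("condition").lower(), m.group("factor").strip())
        for m in RISK_PATTERN.finditer(sentence)
    ]


sentence = "Obesity risk is increased among women with psychiatric disorders"
pairs = find_risk_factors(sentence)
print(pairs)
```

A single hand-written expression like this covers only one phrasing; the variation of language expression noted earlier is why many patterns, or a rule-learning tool, would be needed in practice.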

Page 18: First year present

18

Example

Potential risk factors Potential interventions Potential complications

Page 19: First year present

19

Future plan

Species identification in obesity corpus (LINNAEUS)

Exploration of single word terms ATR

Calculation of z-score

Integration of single and multi-word terms

Lexical/semantic analysis of the existing concept map

Paper preparation for the extraction of single terms in text

Pattern extraction from manual analysis

Pattern rule design with MinorThird

Feature engineering

Clustering

Classification

Paper preparation for the classification of disease descriptors

Paper preparation for the clustering of health care concepts

Integration of the results

Preparation of the second year interview/report

Design of concept map relationships (exploration)

Application of visual mapping tools

Update of the new concept map

Comparison and validation of knowledge

Exploration of concept complexity in obesity

Paper preparation for the automatic design of clinical concept maps

Production of a generic framework of the methodology

Writing the thesis

[Timeline: October 2010, April 2011, November 2011, May 2012; tasks grouped under Year 2 and Year 3]

Year 2 (1/2): Concept extraction

Page 20: First year present

20

Future plan


Year 2 (2/2): Concept structuring

Page 21: First year present

21

Future plan


Year 3: Design of the medical concept map

Page 22: First year present

22

Summary

● Framework creation for clinical concept map building and enhancement.

● Improved understanding of health care concept complexity.

● So far:

– comprehension of literature review.

– methodology design.

– single-word ATR.

– pattern design.

Page 23: First year present

23

End

Acknowledgements

1. Medical Research Council

2. School of Computer Science, University of Manchester