xiaohui liu ccccida brunel universityiti.srce.unizg.hr/docs/iti2011/ppt_liu.pdf · david gilbert,...

24
Intelligent Data Analysis for Complex Systems Xiaohui Liu Brunel University C C CIDA IDA IDA IDA

Upload: others

Post on 18-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Intelligent Data Analysis

for Complex Systems

Xiaohui Liu

Brunel UniversityCCCCIDAIDAIDAIDA

Page 2: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Managing complex systems:

Knowledge is power (1987)…

Page 3: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Data Important too (1988)…

Page 4: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Human Factor Difficult (early 90’s)…

Page 5: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

More and more data (90’s)…

Page 6: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Data Analysis

17th Century -

Page 7: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Data mining: alternative meanings

Page 8: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Statistics versus Computing (1)

“Data Mining:

Threat or

Opportunity”?Opportunity”?

(1998; RSS)

Page 9: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Statistics versus Computing (2)

� “Is statistics being left behind by computing?”

(2002 RSS International Conference; Plymouth)

Statistics ComputingInterdisciplinaryStatistics

� Methods and Principles for data analysis

Computing

� Computational infrastructure

� New analysis approaches

Interdisciplinary

� Bayesian networks

� Support Vector Machines

� …

Page 10: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Intelligent Data Analysis

�An interdisciplinary

study concerned with the

effective analysis of dataeffective analysis of data

Page 11: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Technology (Immature)

Page 12: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Automating large-scale image analysis(Fraser/Wang/Liu 2010 Chapman & Hall/CRC)

Copasetic Clustering

Applies a clustering algorithm to the entire

image surface

Copasetic Clustering

Applies a clustering algorithm to the entire

image surface

Structure Extrapolation Feature IdentificationData Services Data Analysis

Post ProcessingPost ProcessingPost Processing

Image Transform

ation Engine

Transform

images into different views as

required. This service is available to all

subsequent stages of the framework

Image Transform

ation Engine

Transform

images into different views as

required. This service is available to all

subsequent stages of the framework HISTORICAL RESULTS

FINAL RESULTS

Spatial BindingSpatial Binding Vader SearchVader Search

Image Layout

Acquire high level image information (gene blocks)

Image Layout

Acquire high level image information (gene blocks)

Image Structure

Acquire low level image information (individual genes)

Image Structure

Acquire low level image information (individual genes)

Final Analysis

Summarise data (calculate gene spot metrics)

Final Analysis

Summarise data (calculate gene spot metrics)

Final Analysis

Summarise data (calculate gene spot metrics)

Clean up image (compensate for background noise)

Clean up image (compensate for background noise)

Clean up image (compensate for background noise)

Image Transform

ation Engine

Transform

images into different views as

required. This service is available to all

subsequent stages of the framework

Image Transform

ation Engine

Transform

images into different views as

required. This service is available to all

subsequent stages of the framework

Data Stream

Service Request Helper Task

Core TaskData StreamData Stream

Service RequestService Request Helper Task

Core TaskOutput

Log2 ratios and related statistics

Output

Log2 ratios and related statistics

Input

Raw 16bit cDNA microarray images

Identify and group regions of interest (a gene’s pixels)

Identify and group regions of interest (a gene’s pixels)

Enhance ‘Spatial Binding’ using historical results

Enhance ‘Spatial Binding’ using historical results

Page 13: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Human behaviour/data quality

Liu, Cheng and Wu (2002)IEEE Trans on KDE

Page 14: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Understanding individual differences

Human Factors

User

Behaviour

User

PerceptionFrias-Martinez, Chen, Liu (2006)J. American Society forInformation Scienceand Technology

Chen & Liu (2008)ACM Transactions onComputer-HumanInteraction

Personalised Interfaces

Online

Learning

Digital

Libraries

Data Mining

Frias-Martinez, Chen, Liu (2006)IEEE Trans on SMC:Part C

Clewley Chen and Liu (2010)Journal of Documentation

Page 15: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Clustering via Consensus

0

0 1141312

LLLLLM

LLLLL n

fff

ffff

Agreement MatrixConsensus Clusters

Input Cluster Results

-10 0 10 20

-10

-50

5

cmdscale(disthhv8)[,1]

cmdscale(disthhv8)[,2]

0000

0

0

1

334

22423

LLLLLL

OMMM

MOOMMM

MOOMMM

MOMMM

MOOMMM

MOOMMM

LLLLLMM

LLLLLM

nn

ij

n

n

f

f

ff

fff

-10 0 10 20

-10

-50

5

cmdscale(disthhv8)[,1]

cmdscale(disthhv8)[,2]

-10 0 10 20

-10

-50

5

cmdscale(disthhv8)[,1]

cmdscale(disthhv8)[,2]

-10 0 10 20

-10

-50

5

cmdscale(disthhv8)[,1]

cmdscale(disthhv8)[,2]

Swift et al Genome Biology 2004Hirsch et al J Computational Biology 2008Li et al Intelligent Data Analysis 2010

Page 16: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Dynamic Modelling

Wang et al (2009)IEEE/ACM Trans onComputational Biology& Bioinformatics

Wei et al (2009)Math Biosciences

& Bioinformatics

Tucker et al (2005)Artificial IntelligenceIn Medicine

Khanin et al (2006)PNAS

Swift et al (2006)Natural Computing

Page 17: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Visual Analytics

Zhang, Kuljis and Liu (2008)IEEE Transactions on SMC Part CZhang et al. (2005)Visualisation and Data Analysis 2005

Page 18: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Real-Time Modelling

Ruta, Li and Liu (2010)Pattern Recognition

Ruta, Li, and Liu (2010)IEEE Transactions on Intelligent Transportation Systems

Page 19: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Understanding the nature

Page 20: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Cyber Defence

‘Cyber attacks and terrorism head threats facing UK’BBC News 18 Oct 2010

Page 21: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

EFACTS (the European Friedreich’s Ataxia Consortium for Translational Studies) assembles a body of expertise to adopt a translational research strategy for the rare autosomal recessive neurological disease, Friedreich’s ataxia (FRDA). FRDA is a severely debilitating inherited disease that leads to loss of the ability to walk and dependency for all activities.

EFACTS strongly believes that, 12 years after European researchers discovered the FRDA gene, frataxin, when new treatments for FRDA are being developed, the time is ripe to invest in FRDA research in a concerted Europe-wide fashion.

EFACTSThis project proposal has the following scientific and technological objectives:

1. Comprehensively populate a European FRDA database, linked to a bio bank2. Define a panel of clinical assessment tools3. Build on the knowledge base of frataxin structure and function4. Build on the knowledge base of the pathogenic cascade5.Build on the knowledge base of epigenetic mechanisms of frataxin silencing6. Develop new cellular and animal models for the study of FRDA7. Identify FRDA biomarkers8. Identify genetic modifiers of FRDA9. Develop therapeutics for FRDA

EFACTS Project Partners

UNIVERSITÉ LIBRE DE BRUXELLES

UNIVERSITAETSKLINIKUM AACHEN

KATHOLIEKE UNIVERSITEIT LEUVEN

FONDAZIONE IRCCS ISTITUTO NEUROLOGICO CARLO BESTA

MEDIZINISCHE UNIVERSITAET INNSBRUCK

UNIVERSITY COLLEGE LONDON

IMPERIAL COLLEGE OF SCIENCE, TECHNOLOGY AND MEDICINE

BRUNEL UNIVERSITY

MEDICAL RESEARCH COUNCIL

OXFORD UNIVERSITY

CONSEJO SUPERIOR DE INVESTIGACIONES CIENTIFICAS

CENTRE EUROPEEN DE RECHERCHE EN BIOLOGIE ET MEDECINE

INSTITUT NATIONAL DE LA SANTE ET DE LA RECHERCHE MEDICALE

Friedreich’s Ataxia (FRDA)

Friedreich ataxia is caused by a GAA repeat expansion mutation in the FXN gene leading to decreased expression of the essentialmitochondrial protein ‘frataxin’.

Decreased frataxin protein in cells leads to a defect in iron-sulphur cluster enzyme activities, iron accumulation and increased freeradical damage to DNA, protein and lipid. Therapeutic strategies are aimed at reversing these disease effects.

Page 22: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Summary

� Handling data, knowledge and human factors with care.

� Data quality is key. Technology and human are two major issues.human are two major issues.

� Statistics and computing are inseparable when it comes to data mining and complementary in their strength.

� Data from complex systems tend to offer interesting analysis challenges.

Page 23: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Strategic Issues

(Scenario;

Reuse)

Opportunities

Complex &

Dynamic

Systems

Interdisciplinary Science of Data Analysis

Page 24: Xiaohui Liu CCCCIDA Brunel Universityiti.srce.unizg.hr/docs/iti2011/ppt_Liu.pdf · David Gilbert, XiaoHui Liu , Crina Grosan, Chris Paterson, Annette Payne (SISCM) and Mark Pook (SHSSC)

Acknowledgements

� All CIDA members, past and present.

� All collaborators, past and present.

� All funding agencies.

�ALL OF YOU for your attention!