big data taxonomy 8/26/2013

Towards a Big Data Taxonomy Bill Mandrick, PhD Data Tactics Version 26_August_2013

Upload: datatactics

Post on 26-Jan-2015


TRANSCRIPT

Page 1: Big Data Taxonomy 8/26/2013

Towards a Big Data Taxonomy

Bill Mandrick, PhD

Data Tactics

Version 26_August_2013

Page 2: Big Data Taxonomy 8/26/2013

Scientific Taxonomies Represent

• Types of Processes

• Types of Objects

– Physical Objects

– Information Artifacts

• Types of Characteristics

– Qualities

– Roles

• Relationships

– Between Processes

– Between Objects

– Between Characteristics


Page 3: Big Data Taxonomy 8/26/2013

Big Data Taxonomy

• Big Data Related Processes

• Big Data Characteristics

• Big Data Information Artifacts

• Big Data Information Bearers

• Relationships between Big Data Elements

• Mapping Instances to the Taxonomy

• Creating Situational Awareness


Page 4: Big Data Taxonomy 8/26/2013

Relations Between Processes

• Process A <relation> Process B

– Complex Process <has part> Sub-Process

– Sub-Process <part of> Complex Process

– Process A <precedes> Process B

– Process A <follows> Process B

Examples:

– Data Curation Process <has part> Data Selection Process

– Data Curation Process <has part> Data Collection Process

– Data Curation Process <has part> Data Archiving Process
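The relations above come in inverse pairs (has part / part of, precedes / follows), so asserting one direction implies the other. A minimal Python sketch of that idea follows; the `RelationStore` class and its method names are illustrative assumptions, not part of the presentation.

```python
# Inverse pairs as listed on the slide.
INVERSE = {
    "has part": "part of",
    "part of": "has part",
    "precedes": "follows",
    "follows": "precedes",
}


class RelationStore:
    """Hypothetical helper: stores (subject, relation, object) triples
    and records the inverse triple automatically."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, relation, obj):
        if relation not in INVERSE:
            raise ValueError(f"unknown relation: {relation}")
        self._triples.add((subject, relation, obj))
        self._triples.add((obj, INVERSE[relation], subject))

    def objects(self, subject, relation):
        """Return, sorted, everything that `subject` is related to via `relation`."""
        return sorted(o for s, r, o in self._triples if s == subject and r == relation)


store = RelationStore()
for part in ("Data Selection Process", "Data Collection Process", "Data Archiving Process"):
    store.add("Data Curation Process", "has part", part)

print(store.objects("Data Selection Process", "part of"))
```

Storing both directions keeps lookups simple at the cost of duplicating each assertion; an alternative would be to derive the inverse at query time.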


Page 5: Big Data Taxonomy 8/26/2013

Information Artifact Lifecycle Processes

Common Labels → Taxonomy Labels

• Collecting → Data Collection Process
• Curating → Data Curation Process
• Representing → Data Representation Process
• Storing → Data Storing Process
  – Cluster Storing → Cluster Storing Process
• Managing → Data Management Process
  – Processing → Processing
    • Distributed Processing → Distributed Data Process
      – Map Reduce → Map Reduce Process
• Analyzing → Data Analytics Process
  – Data Mining → Data Mining Process
  – Causal Analysis → Causal Analysis Process
  – Probabilistic Analysis → Probabilistic Analysis Process
  – Correlation Analysis → Correlation Analysis Process
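The pairing of common labels with taxonomy labels on this slide can be treated as a simple lookup table. The sketch below transcribes the pairs as they appear on the slide; the function name `to_taxonomy_label` and the whitespace/case normalization are assumptions added for illustration.

```python
# Common label -> taxonomy label, transcribed from the slide.
COMMON_TO_TAXONOMY = {
    "Collecting": "Data Collection Process",
    "Curating": "Data Curation Process",
    "Representing": "Data Representation Process",
    "Storing": "Data Storing Process",
    "Cluster Storing": "Cluster Storing Process",
    "Managing": "Data Management Process",
    "Processing": "Processing",
    "Distributed Processing": "Distributed Data Process",
    "Map Reduce": "Map Reduce Process",
    "Analyzing": "Data Analytics Process",
    "Data Mining": "Data Mining Process",
    "Causal Analysis": "Causal Analysis Process",
    "Probabilistic Analysis": "Probabilistic Analysis Process",
    "Correlation Analysis": "Correlation Analysis Process",
}


def to_taxonomy_label(common_label):
    """Return the taxonomy label for a common label, or None if it is unknown.

    Normalization (collapse whitespace, title-case) is an assumption made
    here so that inputs like "map  reduce" still match.
    """
    key = " ".join(common_label.split()).title()
    return COMMON_TO_TAXONOMY.get(key)


print(to_taxonomy_label("map reduce"))
```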


Page 6: Big Data Taxonomy 8/26/2013

Big Data Processes

Big Data Processes can be decomposed and related to other (sub)processes, as well as to their outputs (Information Artifacts).
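One way to picture this decomposition is a small tree in which each process carries its sub-processes and the Information Artifacts it outputs. The sketch below is an assumption-laden illustration: the `Process` class is hypothetical, the sub-processes reuse the Data Curation example from the earlier slide, and the output names are invented placeholders, since the transcript does not name specific outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical node: a process, its sub-processes, and its outputs."""
    name: str
    parts: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def walk(process, depth=0):
    """Yield (depth, process) pairs, parent first, then each sub-process."""
    yield depth, process
    for part in process.parts:
        yield from walk(part, depth + 1)


def all_outputs(process):
    """Collect the outputs of a process and of all its sub-processes."""
    return [out for _, p in walk(process) for out in p.outputs]


# Output names below are placeholders, not terms from the presentation.
curation = Process(
    "Data Curation Process",
    parts=[
        Process("Data Selection Process", outputs=["<selection output>"]),
        Process("Data Collection Process", outputs=["<collection output>"]),
        Process("Data Archiving Process", outputs=["<archiving output>"]),
    ],
)

for depth, p in walk(curation):
    print("  " * depth + p.name)
```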

Page 7: Big Data Taxonomy 8/26/2013

Relating Processes to Products


Page 8: Big Data Taxonomy 8/26/2013

Big Data Information Artifacts


Page 9: Big Data Taxonomy 8/26/2013


Page 10: Big Data Taxonomy 8/26/2013


Page 11: Big Data Taxonomy 8/26/2013

Information Content Entities


Use Case

Page 12: Big Data Taxonomy 8/26/2013

Data Characteristics


Page 13: Big Data Taxonomy 8/26/2013

Information Bearers


Page 14: Big Data Taxonomy 8/26/2013

Partial Taxonomy


Page 15: Big Data Taxonomy 8/26/2013

Human Genome Data


Page 16: Big Data Taxonomy 8/26/2013

Terms from Human Genome Data Use Case

Use Case Term → Taxonomical Term

Genomic Measurements → Genomic Measurement Result (Measurement Result)
Reference Materials → Reference Material Role
Reference Data → Reference Data Role
Reference Methods → Reference Method
Assess Performance → Performance Assessment Process
Genome Sequencing → Genome Sequencing Process
Integrate Data → Data Integration Process
Sequencing Technologies → Data Sequencing Technology (Tool)
Sequencing Methods → Sequencing Method (Process)
Characterization → Characterization (Data Characterization, IA or ICE)
Whole Human Genomes → Whole Human Genome Characterization (IA or ICE?)
Assess Performance → Performance Assessment Process
Genome Sequencing Run → Genome Sequencing Run
Computer System → Computer System
Storage → Data Storage Process
Networking → Computer Networking Process
Processing → Data Processing Process
Software → Software (IAO placement?)
Open Source Sequencing Bioinformatics Software → Bioinformatics Sequencing Software
Data Source → Data Source Role
Sequencer → Sequencer
Volume → Data Volume (Characteristic)
Variety → Data Variety (Characteristic)
Variability → Data Variability (Characteristic)
Veracity → Data Veracity (Characteristic)
Visualization → Data Visualization Process
Data Quality → Data Quality (Characteristic)
Data Types → Data Type
Data Analytics → Data Analytics Process


Page 17: Big Data Taxonomy 8/26/2013

Terms from Human Genome Data Use Case

Information Artifacts:

– Human Genome Data Measurement Result
– Characterization (Data Characterization, IA or ICE)
– Whole Human Genome Characterization (IA or ICE?)
– Performance Assessment
– Genome Sequence
– Software (IAO placement?)
– Data Visualization

Processes:

– Human Genome Data Measurement Process
– Reference Method
– Performance Assessment Process
– Genome Sequencing Process
– Data Integration Process
– Sequencing Method (Process)
– Data Characterization Process
– Performance Assessment Process
– Genome Sequencing Run
– Data Storage Process
– Computer Networking Process
– Data Processing Process
– Data Visualization Process
– Data Analytics Process

Roles and Characteristics:

– Reference Material Role
– Reference Data Role
– Data Source Role
– Data Volume (Characteristic)
– Data Variety (Characteristic)
– Data Variability (Characteristic)
– Data Veracity (Characteristic)
– Data Visualization Process
– Data Quality (Characteristic)

Artifacts/Tools:

– Data Sequencing Technology (Tool)
– Computer System
– Computer Network
– Software (IAO placement?)
– Bioinformatics Sequencing Software
– Sequencer
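Many of the taxonomical terms carry a suffix or parenthetical tag (Process, Role, (Characteristic), (Tool)) that hints at which group they belong to. The sketch below is a rough first-pass sorter built on that observation. It is a heuristic assumed for illustration, not the method used in the presentation, and untagged terms such as "Sequencer" or "Computer System" still need a human decision.

```python
from collections import defaultdict


def classify(term):
    """Guess a group for a taxonomical term from its suffix or tag.

    Heuristic only: anything without a recognizable tag is flagged
    for manual review rather than guessed.
    """
    if term.endswith("Role") or term.endswith("(Characteristic)"):
        return "Roles and Characteristics"
    if term.endswith("Process") or term.endswith("(Process)"):
        return "Processes"
    if term.endswith("(Tool)"):
        return "Artifacts/Tools"
    return "Needs review"


def group_terms(terms):
    """Group terms by their guessed category, preserving input order."""
    groups = defaultdict(list)
    for term in terms:
        groups[classify(term)].append(term)
    return dict(groups)


sample = [
    "Reference Material Role",
    "Data Integration Process",
    "Sequencing Method (Process)",
    "Data Volume (Characteristic)",
    "Data Sequencing Technology (Tool)",
    "Sequencer",
]
for group, members in group_terms(sample).items():
    print(group, "->", members)
```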

Page 18: Big Data Taxonomy 8/26/2013

Genomic Research Organizations

Instances

Page 19: Big Data Taxonomy 8/26/2013

DNA Data Sets

Instances

Page 20: Big Data Taxonomy 8/26/2013

DNA Organizational Roles

Instances

Page 21: Big Data Taxonomy 8/26/2013

Agent Roles


Page 22: Big Data Taxonomy 8/26/2013

DNA Visualization

Instances

Page 23: Big Data Taxonomy 8/26/2013

Conclusion

• This method can be applied to any part of the Big Data Taxonomy

• Need subject-matter expert (SME) input for the various areas/domains

• Need to add definitions in OWL

• Need to expand the set of standardized relations

• Need to link instances to the taxonomy (e.g., actual data sets, organizations)
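As a rough illustration of the OWL and instance-linking items, the sketch below emits Turtle text for one taxonomy class and one instance using plain string formatting. Everything specific here is an assumption: the `ex:` namespace is a placeholder, `skos:definition` is just one common choice of definition property, the definition text is a stub awaiting SME input, and the instance name is invented.

```python
PREFIXES = """@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/big-data-taxonomy#> .
"""


def local_name(label):
    """Turn a label such as 'Genome Sequencing Run' into 'GenomeSequencingRun'."""
    return "".join(ch for ch in label.title() if ch.isalnum())


def literal(text):
    """Quote a single-line string as a Turtle literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def owl_class(label, definition):
    """Emit an owl:Class with a label and a definition."""
    return (
        f"ex:{local_name(label)} a owl:Class ;\n"
        f"    rdfs:label {literal(label)} ;\n"
        f"    skos:definition {literal(definition)} .\n"
    )


def owl_instance(label, class_label):
    """Emit a named individual typed with a taxonomy class."""
    return (
        f"ex:{local_name(label)} a owl:NamedIndividual , ex:{local_name(class_label)} ;\n"
        f"    rdfs:label {literal(label)} .\n"
    )


print(PREFIXES)
print(owl_class("Genome Sequencing Run", "TODO: definition to be supplied by an SME"))
# "Example Run 1" is a made-up instance name used only for illustration.
print(owl_instance("Example Run 1", "Genome Sequencing Run"))
```

For anything beyond a sketch, an RDF library or an ontology editor would be a safer way to produce and validate the OWL than hand-built strings.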
