
Uploaded by kevin-harrell on 28-Dec-2015


Page 1

The Scientific Importance of Big Data

Xia Li

Tennessee Technological University

Page 2

The Scientific Importance of Big Data

Financial benefits are the major motivation of big data research
The technical challenges brought by big data
The object of "data science"
The common question behind data -- relationship network
Causality and relationship
Big data in social science
Complexity in data processing
Changes in the way of thinking

Page 3

Financial benefits

According to IDC (International Data Corporation), more than 1.8 zettabytes (1 ZB = 10^21 bytes) of data were created and copied in 2011.

75% of this data came from individuals (mainly pictures, videos, and music); that share alone is more than the size of all printed data, 200 petabytes (1 PB = 10^15 bytes).
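As a quick check on the scale involved, the slide's figures can be converted to bytes with decimal (SI) prefixes. This sketch assumes 1 PB = 10^15 bytes and 1 ZB = 10^21 bytes, as the slide indicates, and only reuses the slide's own numbers.

```python
# Decimal (SI) units, as on the slide: 1 PB = 10**15 bytes, 1 ZB = 10**21 bytes.
PB = 10**15
ZB = 10**21

total_2011 = 1.8 * ZB       # data created and copied in 2011 (IDC figure from the slide)
individual_share = 0.75     # share of that data generated by individuals
printed_data = 200 * PB     # size of all printed data (figure from the slide)

individual_bytes = individual_share * total_2011
ratio = individual_bytes / printed_data

print(f"Individual data: {individual_bytes / ZB:.2f} ZB")
print(f"About {ratio:,.0f} times the size of all printed data")
```

With these figures the individual share alone, 1.35 ZB, is about 6,750 times the 200 PB of printed data.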

Page 4

Financial benefits

Google uses very large computing clusters and MapReduce software to process 400 PB of data in one month.

On Facebook, registered users upload more than 1 billion photos, and the log files generated each day exceed 300 TB.
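The MapReduce model mentioned above splits a job into a map step, a shuffle (group-by-key) step, and a reduce step. The single-process Python sketch below illustrates that structure on the classic word-count task; it is not Google's implementation, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map (hypothetical helper): emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    # On a real cluster each map call would run on a different machine;
    # in this sketch they simply run one after another.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

docs = ["big data needs new methods", "Big data is a network"]
counts = word_count(docs)
print(counts)
```

Because each map call depends only on its own input and each reduce call only on one key's values, both phases can be spread across many machines; the shuffle is the only step that moves data between them.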

Page 5

The technical challenges

Six US government departments started big data research projects to "form a unique branch of learning including mathematics, statistics, computer algorithm".

Most of these projects focus on data engineering rather than data science.

Their focus includes analysis algorithms and system efficiency.

Page 6

The technical challenges

Multiscale anomaly detection
Threat planning in networks
Machine reading
Real-time analysis of streaming data
Non-linear random data compression
Scalable statistical analysis techniques
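Real-time analysis of streaming data and anomaly detection are listed here only as challenges. As a minimal illustration of a one-pass streaming computation, the sketch below flags values that lie far from the running mean, using Welford's online algorithm for the mean and variance. It is a single-scale z-score detector chosen for brevity, not a multiscale method; the class name, the threshold of 3 standard deviations, and the warm-up length are arbitrary assumptions.

```python
import math

class StreamingZScoreDetector:
    """Hypothetical helper: flag values far from the running mean of a stream.

    Welford's online algorithm keeps only three numbers of state
    (count, mean, m2) instead of storing the whole stream.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # standard deviations that count as anomalous (assumed)
        self.warmup = warmup        # values observed before anything is flagged (assumed)
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the running statistics."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update of the mean and m2
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_anomaly

detector = StreamingZScoreDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 50, 10]
flags = [detector.update(x) for x in stream]
print([x for x, flagged in zip(stream, flags) if flagged])
```

Each value is examined once and then discarded, which is the property that makes this style of computation usable on data that cannot be stored in full.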

Page 7

The technical challenges

New data representation methods: if the data representation is not suitable, analysis results are more prone to bias.

Data integration: data from different locations must be combined before it can be processed.

De-redundancy (deduplication) and efficient, low-cost data storage
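One common way to approach de-redundancy is content-addressed storage: each block is identified by a cryptographic hash of its contents, so identical blocks are stored once and referenced many times. The Python sketch below uses SHA-256 from the standard `hashlib` module; the class name `DedupStore` and the use of whole objects as blocks are simplifying assumptions (real systems usually also split data into chunks).

```python
import hashlib

class DedupStore:
    """Hypothetical content-addressed store keeping one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes, stored once

    def put(self, data: bytes) -> str:
        """Store data if it is new and return its digest, which serves as a reference."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault leaves an existing block untouched, so duplicates take no extra space
        self.blocks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]

store = DedupStore()
refs = [store.put(b) for b in [b"photo-A", b"photo-B", b"photo-A", b"photo-A"]]
print(len(refs), "references,", len(store.blocks), "stored blocks")
```

Four references are kept but only two blocks are stored, because the three copies of `photo-A` hash to the same digest.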

Page 8

The object of "data science"

Big data research is about how to discover new knowledge; the data itself is not the object of research.

As a research methodology, it is closely related to artificial intelligence techniques such as data mining, statistical analysis, and information retrieval.

Page 9

The object of "data science"

The cost of traditional algorithms grows rapidly as the size and dimension of a problem grow, in some cases exponentially.

For PB-scale big data, new methods are needed. Traditional AI algorithms can accept O(N log N) or even O(N^3) complexity; for big data problems, even O(N log N) is hardly acceptable.
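To make the contrast concrete, the sketch below estimates how long three complexity classes would take at two input sizes. It assumes a hypothetical single machine performing 10^9 simple operations per second and ignores constant factors, so the numbers are orders of magnitude only.

```python
import math

# Assumed machine speed, for illustration only: 10**9 simple operations per second.
OPS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600

def operation_counts(n):
    """Rough operation counts for three complexity classes, ignoring constant factors."""
    return {
        "O(N)": n,
        "O(N log N)": n * math.log2(n),
        "O(N^3)": n ** 3,
    }

# N = 10**3 is a modest classical problem; N = 10**15 is roughly one PB of single-byte items.
for n in (10**3, 10**15):
    for name, ops in operation_counts(n).items():
        years = ops / OPS_PER_SECOND / SECONDS_PER_YEAR
        print(f"N = {n:.0e}  {name:<11} {ops:.2e} ops  ~{years:.2e} years")
```

At N = 10^3 even O(N^3) takes about one second, whereas at N = 10^15 the O(N log N) row comes to roughly 1.6 years of single-machine time.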

Page 10

The common question behind data -- relationship network

Big data is composed of individual data items and scattered connections among them; once the connections are combined, the result is a network.

Gene data becomes a gene network; World Wide Web data becomes a social network.

Big data exists in a complex, interconnected data network.
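The step from individual records with scattered connections to a network can be sketched as building an adjacency list from pairwise links. The Python example below assumes undirected links and uses made-up node names; `build_network` is a hypothetical helper that only shows the aggregation step, not how the links are discovered.

```python
from collections import defaultdict

def build_network(edges):
    """Hypothetical helper: combine scattered pairwise links into an undirected adjacency list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

# Made-up scattered records, each linking two items (e.g. related genes or linked pages).
edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g2"), ("g3", "g1")]
network = build_network(edges)
print({node: sorted(neighbors) for node, neighbors in sorted(network.items())})
```

No single record mentions that `g2` is the best-connected node; that only becomes visible once the links are combined into one structure.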

Page 11

The common question behind data -- relationship network

The link distribution of the World Wide Web yields a scale-free network.
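A scale-free network is one whose degree distribution approximately follows a power law: most nodes have few links while a few hubs have very many. One standard way to generate such a network is the Barabási-Albert preferential-attachment model, sketched below in plain Python. This is an illustrative model, not an analysis of real Web data; the node count, the number of links per new node (m = 2), and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a graph where each new node links to m existing nodes chosen with
    probability proportional to their degree; return each node's degree."""
    rng = random.Random(seed)
    degree = Counter()
    # Each node appears in this pool once per edge endpoint,
    # so a uniform choice from the pool is a degree-weighted choice of node.
    targets_pool = []
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            degree[i] += 1
            degree[j] += 1
            targets_pool += [i, j]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:  # m distinct targets, no duplicate links
            chosen.add(rng.choice(targets_pool))
        for t in chosen:
            degree[new] += 1
            degree[t] += 1
            targets_pool += [new, t]
    return degree

degrees = preferential_attachment(10_000)
histogram = Counter(degrees.values())
for k in sorted(histogram)[:6]:
    print(k, histogram[k])
print("max degree:", max(degrees.values()))
```

The printed counts fall off quickly as the degree grows, while the maximum degree is far above the typical value of 2 or 3, which is the hub structure characteristic of scale-free networks.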

Page 12

Causality and relationship

Correlation analysis aims to find the mutual relationships hidden in data.

Correlation measures: support, confidence, and interest (also called lift).
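In association-rule mining these measures are usually defined for a rule A → B over a set of transactions: support is the fraction of transactions containing both A and B, confidence is P(B | A), and interest (lift) is P(B | A) / P(B). The Python sketch below computes them on a toy basket dataset; the data and the function name `rule_measures` are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Hypothetical helper: support, confidence and interest (lift) of antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one transaction.
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    support = count_ac / n              # P(A and B)
    confidence = count_ac / count_a     # P(B | A)
    lift = confidence / (count_c / n)   # P(B | A) / P(B); above 1 means positive correlation
    return support, confidence, lift

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(rule_measures(transactions, {"bread"}, {"butter"}))
```

For bread → butter this gives support 0.5, confidence 2/3, and lift 4/3: buying bread is associated with buying butter, but, as the next slide stresses, that says nothing about cause.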

Page 13

Causality and relationship

A and B are related: the values of A and B vary together.

We cannot say that A causes B, nor that B causes A.

Strictly speaking, statistics cannot prove logical causality.
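A small simulation illustrates the point: if a hidden factor Z influences both A and B, the two are strongly correlated although neither causes the other. The Python sketch below is purely synthetic; the variable names, noise levels, and sample size are arbitrary assumptions.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(42)
# Hidden common cause Z drives both A and B; A and B never influence each other.
z = [rng.gauss(0, 1) for _ in range(10_000)]
a = [zi + rng.gauss(0, 0.5) for zi in z]
b = [zi + rng.gauss(0, 0.5) for zi in z]
print(f"corr(A, B) = {pearson(a, b):.2f}")
```

The correlation comes out close to 0.8 even though, by construction, changing A would have no effect on B.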

Page 14

Big data in social science

On Facebook, data is generated randomly, and researchers need to find valuable information in it.

Big data in social science has some unique characteristics: it is multi-source and heterogeneous, interactive, socialized, sudden (bursty), and highly noisy.

Page 15

Big data in social science

The future task is not to collect more and more data, but to mine useful knowledge from the data.

When a child learns to distinguish animals from cars, a few dozen sample pictures are enough.

How to eliminate unnecessary data sampling therefore becomes a problem.

Page 16

Complexity in data processing

Existing theory:
Time complexity: the time an algorithm uses
Space complexity: the memory an algorithm uses

Data size complexity:
Some problems can only be solved once the data size reaches a certain level
The relationship between prediction confidence probability and data size
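One well-known way to quantify the relationship between prediction confidence and data size is Hoeffding's inequality: to estimate a proportion (or any average of independent values in [0, 1]) to within ε with probability at least 1 - δ, n ≥ ln(2/δ) / (2ε²) samples suffice. The sketch below offers this standard bound as an illustration rather than as the slide's own result; `hoeffding_sample_size` is a hypothetical helper name.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Hypothetical helper: samples needed so that an empirical mean of independent
    values in [0, 1] is within epsilon of the true mean with probability >= 1 - delta
    (Hoeffding's inequality)."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

for eps in (0.1, 0.05, 0.01):
    print(f"epsilon={eps}: n >= {hoeffding_sample_size(eps, delta=0.05)}")
```

Because n grows with 1/ε², halving the tolerated error roughly quadruples the data required, while tightening δ costs only logarithmically more.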

Page 17

Changes in the way of thinking

The fourth paradigm: data-intensive research

"All models are wrong, and increasingly you can succeed without them."

PB-scale data can help us analyze without models and hypotheses.

When data is correlated, statistical algorithms can find new patterns unknown to previous methods.

Page 18


Thank you