big data analytics - cleveland state...

Post on 20-May-2020


Big Data Analytics

Sunnie Chung, Electrical Engineering and Computer Science


Big Data: How Much Data? In Petabytes!

• Google processes 40 PB a day (2016)
• eBay has 11 PB of user data + 50 TB/day (2015)
• Facebook has 36 PB of user data + 80-90 TB/day (2013)
• CERN’s LHC: 15 PB a year (~2015)
• LSST: 6-10 PB a year (~2015)
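A daily volume can be converted into a sustained ingest rate to make these figures easier to compare. The sketch below is a minimal one. It assumes decimal units (1 PB = 10^15 bytes, 1 TB = 10^12 bytes) and a perfectly even rate across the day, which real traffic does not have. The helper name `bytes_per_second` is hypothetical.

```python
# Decimal (SI) units are an assumption; binary units (PiB, TiB) would give
# slightly different figures.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB = 10**12  # bytes in a decimal terabyte
PB = 10**15  # bytes in a decimal petabyte


def bytes_per_second(bytes_per_day: float) -> float:
    """Average rate needed to absorb a daily volume spread evenly over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Daily volumes quoted in the list above; Facebook uses the 90 TB upper bound.
    volumes = {
        "Google (40 PB/day)": 40 * PB,
        "eBay (50 TB/day)": 50 * TB,
        "Facebook (90 TB/day)": 90 * TB,
    }
    for label, daily in volumes.items():
        print(f"{label}: {bytes_per_second(daily) / 10**9:,.2f} GB/s")
```

By this arithmetic, Google's 40 PB/day averages about 463 GB/s before any allowance for peaks.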

How many female WWF fans under the age of 30 visited the Toyota community over the last 4 days and saw a Class A ad?

How are these people similar to those who visited Nissan?
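At its core, the first question is a filter over joined profile and clickstream data followed by a distinct count. The sketch below is minimal and rests on an assumption: a hypothetical `Visit` record that already joins each user's demographics and interests with the community visited, the day, and the class of ad served. At petabyte scale the same predicate would run as a distributed query, not a Python loop. The second question calls for a similarity measure between audiences, such as cosine similarity, and is not covered here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Visit:
    """Hypothetical clickstream record joined with a user profile."""
    user_id: str
    gender: str
    age: int
    interests: frozenset  # e.g. frozenset({"WWF"})
    community: str        # site section visited, e.g. "Toyota"
    day: date
    ad_class: str         # class of the ad served during the visit


def count_target_audience(visits, today, days=4):
    """Distinct female WWF fans under 30 who visited the Toyota community
    in the last `days` days (today included) and saw a Class A ad."""
    start = today - timedelta(days=days)
    matching_users = {
        v.user_id
        for v in visits
        if v.gender == "F"
        and v.age < 30
        and "WWF" in v.interests
        and v.community == "Toyota"
        and start < v.day <= today
        and v.ad_class == "A"
    }
    return len(matching_users)


if __name__ == "__main__":
    today = date(2016, 11, 13)
    wwf = frozenset({"WWF"})
    visits = [
        Visit("u1", "F", 25, wwf, "Toyota", today, "A"),
        Visit("u1", "F", 25, wwf, "Toyota", today - timedelta(days=1), "A"),  # repeat visitor
        Visit("u2", "F", 35, wwf, "Toyota", today, "A"),                        # too old
        Visit("u3", "F", 22, wwf, "Nissan", today, "A"),                        # other community
        Visit("u4", "F", 22, wwf, "Toyota", today - timedelta(days=5), "A"),  # outside window
    ]
    print(count_target_audience(visits, today))  # 1
```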

Unstructured Text Streams at Petabytes per Day

What Does Your Big Data Stream Look Like?


1. Data Cleaning/Extraction/Transformation

2. Data Staging/Processing

3. Data Mining Strategies: Data Modeling/Validation

4. Data Visualization

Massively Parallel Processing Systems
• Hadoop-Based Multi-Node Cluster: NoSQL Stack
• Cloud-Based Hadoop Cluster (20–2000 Nodes)
Software: Automatic Parallel Execution in MapReduce
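MapReduce expresses a job as two functions. A map function runs independently on each input record, and a reduce function runs over all the values that share a key. The framework handles distribution, the shuffle between the two phases, and fault tolerance. The single-process sketch below reproduces only the data flow of the classic word-count job. It is not Hadoop's API, and on a real cluster each phase would run in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every token in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group every value emitted under the same key (the framework's job in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    # In a cluster, mappers and reducers run in parallel on different nodes;
    # here every phase runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Big data analytics", "big data in the cloud"]))
    # {'big': 2, 'data': 2, 'analytics': 1, 'in': 1, 'the': 1, 'cloud': 1}
```

Because the mapper keeps no state between records, the framework can split the input across any number of nodes without changing the result.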

Analytic Parallel Data Warehouse Systems

Information Retrieval

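$$\cos(\vec{q},\vec{d}) = \frac{\vec{q}\cdot\vec{d}}{|\vec{q}|\,|\vec{d}|} = \frac{\vec{q}}{|\vec{q}|}\cdot\frac{\vec{d}}{|\vec{d}|} = \frac{\sum_{i=1}^{|V|} q_i d_i}{\sqrt{\sum_{i=1}^{|V|} q_i^2}\,\sqrt{\sum_{i=1}^{|V|} d_i^2}}$$

where $q_i$ and $d_i$ are the weights of term $i$ in the query and the document, and $V$ is the vocabulary.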
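The cosine formula for ranking documents against a query translates directly into code. This minimal sketch assumes that the query and the document are already dense term-weight vectors over the same vocabulary. The weighting scheme (raw counts, tf-idf, and so on) is left open. Real search engines use sparse vectors and inverted indexes instead of full-length lists.

```python
import math


def cosine_similarity(q, d):
    """cos(q, d) = sum_i q_i * d_i / (|q| * |d|) for term-weight vectors over the same vocabulary V."""
    if len(q) != len(d):
        raise ValueError("q and d must have one weight per vocabulary term")
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # convention chosen here: an all-zero vector matches nothing
    return dot / (norm_q * norm_d)


if __name__ == "__main__":
    # Hypothetical vocabulary: ["big", "data", "cloud"]
    query = [1, 1, 0]     # "big data"
    document = [2, 1, 1]  # raw term counts for a short document
    print(round(cosine_similarity(query, document), 3))  # 0.866
```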

Machine Learning: Neural Networks, SVM, Classification

Database-Research-Based Methods: Multi-Level Association Rule Mining

Statistics-Based Methods: Clustering
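Association rules are usually scored by two measures. Support is how often the items occur together. Confidence is how often the consequent appears when the antecedent does. Multi-level mining adds a concept hierarchy: items are rolled up a level, so a pattern that is too sparse at the leaf level ("skim milk", "wheat bread") can become frequent at a higher level ("milk", "bread"). The brute-force sketch below considers only rules between single items, and its taxonomy and baskets are made up. Production miners prune the search with Apriori- or FP-growth-style algorithms.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force rules X -> Y between single items; returns (X, Y, support, confidence)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        both = support({a, b}, transactions)
        if both < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = both / support({x}, transactions)  # estimate of P(Y | X)
            if confidence >= min_confidence:
                found.append((x, y, both, confidence))
    return found


def generalize(transactions, taxonomy):
    """Replace each item by its parent in a concept hierarchy (one level up)."""
    return [frozenset(taxonomy.get(item, item) for item in t) for t in transactions]


if __name__ == "__main__":
    # Hypothetical baskets and taxonomy, for illustration only.
    taxonomy = {"skim milk": "milk", "2% milk": "milk",
                "wheat bread": "bread", "white bread": "bread"}
    baskets = [frozenset({"skim milk", "wheat bread"}), frozenset({"2% milk", "white bread"}),
               frozenset({"skim milk", "eggs"}), frozenset({"2% milk", "wheat bread"})]
    print(pair_rules(baskets, 0.5, 0.7))  # [] at the leaf level
    print(pair_rules(generalize(baskets, taxonomy), 0.5, 0.7))
    # [('bread', 'milk', 0.75, 1.0), ('milk', 'bread', 0.75, 0.75)]
```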


[Chart: bars labeled Pacific, Paris, London, Eastern…, Amsterdam, Athens, Central…, Jakarta, Greenland, Bangkok, Brasilia, Hawaii, Atlantic…, Arizona, Ljubljana, Beijing, Belgrade, New Delhi, Berlin; value axis 0 to 8,000]

Topics Most Talked About on Nov 22, 2015

Regions Most Tweeted on Nov 22, 2015

Data Extraction/Transformation

What Your Tweet Data Looks Like on Nov 22, 2015
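Extraction and transformation here mean turning raw tweets into fields that can be counted, such as topics and regions. The sketch below is minimal and makes two assumptions. First, hashtags stand in for topics. Second, each tweet is available as a dict; the `time_zone` key is an assumed field name, not a confirmed schema. Real topic detection often also clusters or models the free text.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # simplified: Twitter's own hashtag rules are more involved


def top_topics(tweet_texts, k=10):
    """Most frequent hashtags (case-folded) across raw tweet texts."""
    counts = Counter(tag.lower() for text in tweet_texts for tag in HASHTAG.findall(text))
    return counts.most_common(k)


def top_regions(tweet_records, field="time_zone", k=10):
    """Most frequent values of a location-like field; missing values count as 'unknown'.
    The default field name is an assumption about the record layout."""
    counts = Counter(record.get(field) or "unknown" for record in tweet_records)
    return counts.most_common(k)


if __name__ == "__main__":
    texts = ["Big #Data meetup tonight", "#data in the #cloud", "no hashtags here"]
    print(top_topics(texts, 2))  # [('data', 2), ('cloud', 1)]
    records = [{"time_zone": "London"}, {"time_zone": "Pacific"}, {"time_zone": "London"}, {}]
    print(top_regions(records, k=1))  # [('London', 2)]
```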


Top Job Titles Recently Listed; Locations of Jobs Listed 1 Day Ago

Profile Headlines with Highest Connections


Tweets Data Stream on Nov 5, 2016

Tweet Topics on Nov 5, 2016

Unusual Negative Tweets About the Company Lead to a Fall in the Company’s Stock

Unusual Cluster Around the Company Name
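Spotting unusual negative tweets means combining a sentiment signal with a baseline for what is normal. The sketch below is minimal and rests on several assumptions, none of which come from the slides: a hypothetical single-word company name ("Acme"), a toy negative-word lexicon, and a median baseline with an arbitrary factor of 3. The sketch only flags days. It does not by itself establish any link to stock movements.

```python
import re
from collections import Counter
from statistics import median

# Hypothetical, deliberately tiny lexicon; production systems use trained
# sentiment classifiers or curated lexicons instead.
NEGATIVE_WORDS = {"bad", "broken", "fail", "lawsuit", "recall", "scam", "worst"}
TOKEN = re.compile(r"[a-z']+")


def is_negative_mention(text, company):
    """True if the tweet names the (single-word) company and contains a negative word."""
    tokens = set(TOKEN.findall(text.lower()))
    return company.lower() in tokens and bool(tokens & NEGATIVE_WORDS)


def negative_counts_by_day(tweets, company):
    """tweets: iterable of (day, text) pairs. Every day seen gets an entry, even if 0."""
    counts = Counter()
    for day, text in tweets:
        counts[day] += int(is_negative_mention(text, company))
    return counts


def unusual_days(counts, factor=3.0):
    """Days whose negative count exceeds `factor` times the median day.
    Crude baseline: if the median is 0, any negative mention is flagged."""
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * baseline)


if __name__ == "__main__":
    tweets = [("11-03", "acme bad service"), ("11-04", "love my acme phone"),
              ("11-05", "acme recall, worst ever"), ("11-05", "acme is a scam"),
              ("11-05", "Acme lawsuit news"), ("11-06", "bad weather today")]
    counts = negative_counts_by_day(tweets, "Acme")
    print(unusual_days(counts))  # ['11-05']
```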


Tweets Data Stream on Nov 13, 2016

Tweets Per Topic on Nov 13, 2016


Database Security in the Cloud

Encrypting Databases in the Cloud to Retrieve Sensitive Data Without Decrypting It
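A blind index is one widely used way to search encrypted data without decrypting it on the server. The client stores a keyed hash (HMAC) of the sensitive field next to the encrypted row, and later searches by computing the same token. This is a swapped-in illustration, not necessarily the scheme used in this research. It supports only exact-match queries, and it reveals which rows share a value. `BlindIndexTable` and its methods are hypothetical.

```python
import hashlib
import hmac
import secrets


def blind_index(key: bytes, value: str) -> str:
    """Deterministic keyed token for exact-match search. Without `key`, a token
    should reveal only whether two values are equal (and hence value frequencies)."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


class BlindIndexTable:
    """Hypothetical server-side table mapping tokens to opaque ciphertexts.
    Producing the ciphertexts is out of scope: use a vetted authenticated-encryption library."""

    def __init__(self):
        self._rows = {}

    def insert(self, token: str, ciphertext: bytes) -> None:
        self._rows.setdefault(token, []).append(ciphertext)

    def find(self, token: str) -> list:
        return list(self._rows.get(token, []))


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # stays on the client, never sent to the cloud
    table = BlindIndexTable()
    table.insert(blind_index(key, "alice@example.com"), b"<ciphertext of row 1>")
    table.insert(blind_index(key, "bob@example.com"), b"<ciphertext of row 2>")
    # The server matches tokens without ever seeing the plaintext email address.
    print(table.find(blind_index(key, "Alice@Example.com ")))  # [b'<ciphertext of row 1>']
```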

Achieving Cyber Security with Big Data Analytics

Credit Card Fraud Detection

Intrusion Detection in Systems with Sensitive Data

Machine Fault Detection
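Fraud, intrusion, and fault detection can all be framed as anomaly detection: learn what normal looks like and flag observations that deviate from it. The sketch below shows one of the simplest such detectors, the z-score, applied to hypothetical transaction amounts. The threshold of 3 is a common rule of thumb, not a value from the slides. The same function could score logins per hour or sensor readings. It has two known limits. With n observations, no z-score can exceed sqrt(n-1). A single extreme value also inflates the standard deviation it is measured against, which is why robust variants use the median and MAD instead.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Indices of observations whose |z-score| exceeds `threshold`."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all observations identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical card transactions: twenty routine purchases and one large charge.
    amounts = [40.0] * 20 + [5000.0]
    print(zscore_outliers(amounts))  # [20]
```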


Annual Big Data Workshop at CSU

Big Data Analytics Curriculum at EECS

Big Data Analytics Research Group

• Math, Statistics and Databases
• Big-Data-Specific Processing Techniques
• Cloud Computing and Massively Parallel Big Data Processing Systems
• Data Source Modeling
• Data Mining Strategies

Data-Driven Solutions

President’s Advisory Committee for Center of Excellence:
• Data Analytics
• Cyber Security
• Cloud Computing
