on mining big data and social network analysis · 2018-05-03 · background data are being...

On Mining Big Data and Social Network Analysis Philip S. Yu ([email protected]) Distinguished Professor & Wexler Chair University of Illinois at Chicago


Page 1: On Mining Big Data and Social Network Analysis · 2018-05-03 · Background Data are being generated and collected in unprecedented ways and speed, e.g., • Sensors • Web access

On Mining Big Data and Social Network Analysis

Philip S. Yu ([email protected])

Distinguished Professor & Wexler Chair

University of Illinois at Chicago

Page 2

Background

Data are being generated and collected in unprecedented ways and at unprecedented speed, e.g.,

• Sensors

• Web access log

• Electronic medical record

• Scientific instrumentation

• Transaction processing

• Blogging & social networking

• etc

Page 3

Source: What Happens in an Internet Minute? (by Intel):

http://www.intel.com/content/dam/www/public/us/en/images/illustrations/embedded-infographic-600-logo.jpg

Page 4

Why Concerned about Data?

Data is a tremendously valuable asset

o Corporation:

• Customer data can provide competitive edge

• Product review data is critical to the success of a product

o Government

• Public opinion is crucial to develop policies

• Integrated intelligence data is essential to fight crime and terrorist attacks

o Healthcare

• Patient data is essential to move toward personalized medicine

o Science

• New discoveries are driven by the availability of massive amounts of data

Page 5

Big Data Characteristics

• Volume/size

• Velocity: Rate/speed

o Real-time processing requirement

• Data stream

• Variety

o Complex data

• High dimensionality

• Non-traditional data types

o Variability

o Heterogeneous sources and data types

o Cleanness: noisy, uncertain and incomplete

• Veracity

o Trustworthiness

o Privacy preservation

• Value

o Low value density

o Weak signal

Page 6

Hydraulic Fracturing to extract Shale Gas

Page 7

Big Data Challenges

• Storage

• Indexing

• Retrieval

• Search

• Backup and restore

• Mining and analysis

• Privacy protection

Page 8

Information Fusion

• Fusing information across multiple sources is the Holy Grail of big data research

• Many commercial companies have multiple sources for collecting customer information

• Google has Google search, G-mail, Google Maps, Google+, YouTube, etc.

• Other examples

• Detection of terrorist plots

• Whereabouts of Malaysia MH370

• Focus on fusing multiple social networks

Page 9

Social Network

• Huge size

o Facebook: more than a billion nodes

• High volume of new content generated

o Rapidly and dynamically changing focus

• Rich information with many different types of data

• Noisy

• High aggregate value, but challenging to mine

Page 10

Background

• Many social networks with different objectives

• Facebook

• Twitter

• Foursquare

• LinkedIn

• YouTube

• Instagram

• WhatsApp

• Google+

• Individuals often participate in multiple social networks

• Each social network captures only a partial or biased view of an individual

Page 11

Issues

• How to connect the multiple accounts of the same user across different social networks?

• How to transfer knowledge across different social networks?

Page 12

Anchor Links across Aligned Networks

Page 13

[Figure: aligned source and target networks. Labels: New, Old; User Accounts; Locations (locate); Tips; Tweets; Temporal Activities]

Page 14

Solve Challenge 1:

Heterogeneous relations

Page 15

Social Network: Who, Where, What, When

[Figure labels: Social Links; Contents: Tweets; Locations; Temporal Activities]

Page 16

[Figure labels: Social Links; Contents: Tweets; Locations; Temporal Activities]

Page 17

Solve Challenge 2:

Lack of Training Instances

Page 18

[Figure labels: training set; Prediction]

Page 19

Solve Challenge 3:

Information Distribution

Difference Problem

Page 20

Extract Heterogeneous Features

• Social

• Spatial

• Temporal

• Content

[Figure: the four feature types feed a link to be predicted, marked "?"]
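The four feature types named on this slide can be illustrated for a single pair of accounts. The sketch below is only an assumption about what such features might look like, not the speaker's actual feature definitions: the field names (`friends`, `locations`, `hourly_activity`, `words`) and the choice of Jaccard and cosine similarity are hypothetical, and friend identifiers are assumed to be already mapped into a shared ID space (for example through known anchor links).

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pair_features(user_a, user_b):
    """Return [social, spatial, temporal, content] similarities for two accounts.

    Each account is a dict with hypothetical keys:
      friends          - set of friend IDs (assumed comparable across networks)
      locations        - set of visited location IDs
      hourly_activity  - list of post counts per hour of day
      words            - set of words used in posts
    """
    return [
        jaccard(user_a["friends"], user_b["friends"]),                  # social
        jaccard(user_a["locations"], user_b["locations"]),              # spatial
        cosine(user_a["hourly_activity"], user_b["hourly_activity"]),   # temporal
        jaccard(user_a["words"], user_b["words"]),                      # content
    ]


# Toy accounts, invented purely for illustration.
alice_net1 = {
    "friends": {"u1", "u2", "u3"},
    "locations": {"cafe", "gym"},
    "hourly_activity": [0, 2, 5, 1],
    "words": {"coffee", "run", "data"},
}
alice_net2 = {
    "friends": {"u2", "u3", "u4"},
    "locations": {"cafe", "library"},
    "hourly_activity": [0, 1, 4, 1],
    "words": {"coffee", "data", "mining"},
}

features = pair_features(alice_net1, alice_net2)
```

The resulting vector could be passed to any binary classifier that scores whether a link exists; which classifier the talk used is not stated in the transcript.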

Page 21

[Figure labels: training set; Prediction]

Page 22

Solve Challenge 4:

Inter-dependency among relations

Page 23

[Figure: two panels, each with Social, Spatial, Temporal, and Content features and a "?" to be predicted; one panel is labeled location links, the other social links]

Page 24

Problem Description:

Anchor Link Prediction

Page 25
Page 26

Issues

Training/learning:

Feature Selection

Testing/Inference:

One-to-one Constraint
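The one-to-one constraint means each account in one network may be linked to at most one account in the other. The slide does not say how the constraint is enforced, so the sketch below swaps in a simple greedy strategy as an illustration only: it takes hypothetical classifier scores for candidate pairs and keeps the highest-scoring links that do not reuse an account. The function name, the `threshold` parameter, and the toy scores are all assumptions.

```python
def one_to_one_match(scores, threshold=0.5):
    """Greedily select anchor links so every account is used at most once.

    scores: dict mapping (source_account, target_account) -> predicted score.
    threshold: minimum score for a pair to be accepted (hypothetical default).
    Returns the accepted (source_account, target_account) pairs, best first.
    """
    matched_src, matched_tgt, links = set(), set(), []
    # Visit pairs from highest to lowest score; ties are broken by pair name
    # so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, tgt), score in ranked:
        if score < threshold:
            break  # every remaining pair scores lower still
        if src in matched_src or tgt in matched_tgt:
            continue  # would violate the one-to-one constraint
        matched_src.add(src)
        matched_tgt.add(tgt)
        links.append((src, tgt))
    return links


# Toy scores, invented for illustration.
toy_scores = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.8,
    ("a2", "b1"): 0.7,
    ("a2", "b2"): 0.6,
}
links = one_to_one_match(toy_scores)
```

Greedy selection is easy to follow but is not guaranteed to maximize the total score; matching formulations such as stable matching or maximum-weight bipartite matching are common alternatives.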

Page 27
Page 28
Page 29
Page 30
Page 31
Page 32

More Complete Matching is Better

Page 33

Summary

• The Big Data revolution will fundamentally change not only how we conduct business, but also how we live our lives

• Information fusion is the Holy Grail of big data research

• Social networks are a new way of interaction among people

• Fusing information across social networks is most challenging