building a science base for the information age john hopcroft cornell university ithaca, ny xiamen...

45
Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Upload: solomon-simon

Post on 12-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Building a Science Base for the Information Age

John HopcroftCornell University

Ithaca, NY

Xiamen University

Page 2: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Time of change

The information age is a revolution that is changing all aspects of our lives.

Those individuals, institutions, and nations who recognize this change and position themselves for the future will benefit enormously.

Page 3: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Computer Science is changing

Early years Programming languages Compilers Operating systems Algorithms Data bases

Emphasis on making computers useful

Page 4: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Computer Science is changing

The future years

Tracking the flow of ideas in scientific literature Tracking evolution of communities in social networks Extracting information from unstructured data

sources Processing massive data sets and streams Extracting signals from noise Dealing with high dimensional data and dimension

reduction

Page 5: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Computer Science is changing

Merging of computing and communication

The wealth of data available in digital form

Networked devices and sensors

Drivers of change

Page 6: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Implications for TCS

Need to develop theory to support the new directions

Update computer science education

Page 7: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

A short view of the future

Page 8: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Digitization of medical records

Doctor – needs my entire medical record Insurance company – needs my last doctor

visit, not my entire medical record Researcher – needs statistical information but

no identifiable individual information

Relevant research – zero knowledge proofs, differential privacy

Page 9: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 10: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Zero knowledge proof

• Graph 3-colorability

• Problem is NP-hard - No polynomial time algorithm unless P=NP

Page 11: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Zero knowledge proof

I send the sealed envelopes.

You select an edge and open the two

envelopes corresponding to the

end points.

Then we destroy all envelopes and

start over, but I permute the colors

and then resend the envelopes.

Page 12: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Digitization of medical records is not the only system

Car and road – gps – privacy

Supply chains

Transportation systems

Page 13: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 15: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 16: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

When will my bus arrive?

Page 17: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

IN 4 TO 5 MINUTES

Page 18: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Tracking the flow of ideas in scientific literatureYookyung Jo

Page rank

Web

Link

GraphRetrieval

Query

Search

Text

Web

Page

Search

Rank

Web

Chord

Usage

Index

Probabilistic

TextFile

Retrieve

Text

Index

Discourse

Word

Centering

Anaphora

Page 19: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Topic evolution thread

• Seed topic :– 648 : making c program

type-safe by separating pointer types by their usage to prevent memory errors

• 3 subthreads :– Type :

– Garbage collection :

– Pointer analysis :

Yookyung Jo, 2010

Page 20: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Topic Evolution Map of the ACM corpus

Yookyung J o, 2010

Page 21: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

In the past, sociologists could study group of a few thousand individuals.

Today with social networks we can study interaction among millions of individuals.

One important activity is how communities form and evolve.

Page 22: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

• Early work Min cut – two equal size communities Conductance – minimizes cross edges

• Future work Consider communities with more external edges

than internal edges Find small communities Track communities over time Develop appropriate definitions for communities Understand the structure of different types of

social networks

Page 23: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Our view of a community

TCS

Me

Colleagues at Cornell

Classmates

Family and friendsMore connections outside than inside

Page 24: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

On going research on finding communities

Page 25: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen UniversitySpectral clustering with K-means.

Page 26: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Spectral clustering with K-means.

Page 27: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen UniversitySpectral clustering with K-means.

Page 28: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 29: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Instead of two overlapping clusters, we find three clusters.

Page 30: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Instead of clustering the rows of the singular vectors, find the minimum 0-norm vector in the space spanned by the singular vectors.

The minimum 0-norm vector is, of course, the all zero vector, so we will require one component to be 1.

Page 31: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Finding the minimum 0-norm vector is NP-hard.

Use the minimum 1-norm vector as a proxy.

Page 32: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 33: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 34: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 35: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 36: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 37: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Page 38: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Minimum 1-norm vector is not an indicator vector.

By thresh-holding the components, convert it to an indicator vector for the community.

Page 39: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

0 50 100 150 200 250 300 350 4000.4

0.5

0.6

0.7

0.8

0.9

1

Page 40: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Random walk

How long?

What dimension?

Page 41: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Krylov subspace

Find orthonormal basis.

Update subspace in each step rather than just the probability vector

Page 42: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Find minimum 1-norm vector in Krylov subspace.

Actually allow vector to be close to subspace.

Page 43: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

Structure of communities

How many communities is a person in?Small, medium, large

How many seed points are needed to uniquely specify a community a person is in?Which seeds are good seeds?Etc.

Page 44: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

What types of communities are there?

How do communities evolve over time?

Page 45: Building a Science Base for the Information Age John Hopcroft Cornell University Ithaca, NY Xiamen University

Xiamen University

This is an exciting time for computer science.

There is a wealth of data in digital format, information from sensors, and social networks to explore.

It is important to develop the science base to support these activities.