
Post on 22-Dec-2015


Page 1: Lars Arge

Page 2: Massive Data

Pervasive use of computers and sensors
Increased ability to acquire/store/process data
  → Massive data collected everywhere
Society increasingly “data driven”
  → Access/process data anywhere, any time

Increased data availability: “nano-technology-like” opportunity

Nature issue 2/06: “2020 – Future of computing”
  Trend continues as e.g. sensors become increasingly pervasive
  → Exponential growth of scientific data
  Paradigm shift: Science will be about mining data
  → Computer science paramount in all sciences

Page 3: Algorithm Inadequacy

Importance of scalability/efficiency
  → Algorithmics core computer science area
Traditional algorithmics: Transform input to output using simple machine model
Inadequate with e.g.
  Massive data
  Small/diverse devices
  Continually arriving data
  → Software inadequacies!
Communities addressing inadequacies have emerged
  But much work still needs to be done

Page 4: Center for Massive Data Algorithmics

Established 2007
  Initially funded for 5 years by the National Research Foundation
High level objectives:
  Advance algorithmic knowledge in “massive data” processing area
  Train researchers in world-leading international environment
  Be catalyst for multidisciplinary/industry collaboration
Building on:
  Strong international team
  Establish vibrant international environment (focus on people)
Focus areas:
  I/O-efficient, streaming, cache-oblivious algorithms
  Algorithm engineering

Page 5: I/O-Efficient Algorithms

Problems involving massive data on disk
  Disk access is 10^6 times slower than main memory access
  Large access time amortized by transferring large blocks of data
  → Important to store/access data to take advantage of blocks
I/O-efficient algorithms:
  Move as few disk blocks as possible to solve given problem

[Figure: disk drive with labels: track, magnetic surface, read/write arm, read/write head]

“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)

Page 6: I/O-Efficient Algorithms Matter

Example: Traversing linked list (List ranking)
  Array size N = 10 elements
  Disk block size B = 2 elements
  Main memory size M = 4 elements (2 blocks)
Difference between N and N/B large since block size is large
  Example: N = 256 x 10^6, B = 8000, 1 ms disk access time
  N I/Os take 256 x 10^3 sec = 4266 min = 71 hr
  N/B I/Os take 256/8 sec = 32 sec

[Figure: two disk layouts of the 10-element list, labeled "Algorithm 1: N = 10 I/Os" and "Algorithm 2: N/B = 5 I/Os". Cell labels in the two layouts: 1 5 2 6 7 3 4 10 8 9 and 1 2 10 9 8 5 4 7 6 3]
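The N versus N/B gap above can be checked with a small simulation. The sketch below is illustrative only: the function names, the LRU eviction policy, and the two layouts (`sequential` and `scattered`) are assumptions made for this example and are not taken from the slide's figure. It counts block transfers for the toy instance (N = 10, B = 2, memory of 2 blocks) and then redoes the slide's timing arithmetic for N = 256 x 10^6, B = 8000 and 1 ms per disk access.

```python
from collections import OrderedDict


def count_ios(positions, block_size, memory_blocks):
    """Count block transfers when visiting array positions in the given order.

    Main memory holds `memory_blocks` blocks and evicts the least recently
    used one (LRU is an assumption; the slides do not name a policy).
    """
    cache = OrderedDict()  # block id -> None, ordered from oldest to newest
    ios = 0
    for pos in positions:
        block = pos // block_size
        if block in cache:
            cache.move_to_end(block)       # already in memory: no I/O
        else:
            ios += 1                       # one block transfer from disk
            cache[block] = None
            if len(cache) > memory_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return ios


def io_seconds(num_ios, access_ms=1.0):
    """Total time for `num_ios` disk accesses at `access_ms` each."""
    return num_ios * access_ms / 1000.0


# Toy instance from the slide: N = 10, B = 2, M = 2 blocks.
N, B, MEMORY_BLOCKS = 10, 2, 2

# Hypothetical layouts: entry i is the array position of the i-th list element.
sequential = list(range(N))  # stored in list order
scattered = [(i % (N // B)) * B + i // (N // B) for i in range(N)]  # strided

ios_scattered = count_ios(scattered, B, MEMORY_BLOCKS)    # every access misses
ios_sequential = count_ios(sequential, B, MEMORY_BLOCKS)  # one miss per block

# Large instance from the slide: N = 256 x 10^6, B = 8000, 1 ms per access.
big_n, big_b = 256 * 10**6, 8000
slow_seconds = io_seconds(big_n)           # N I/Os
fast_seconds = io_seconds(big_n // big_b)  # N/B I/Os
```

With these assumed layouts the scattered traversal costs N = 10 transfers and the sequential one N/B = 5. For the large instance, `slow_seconds / 3600` is roughly 71 hours and `fast_seconds` is 32 seconds, matching the figures on the slide.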

Page 7: Streaming Algorithms

Problems involving truly massive data
  Sequential read of disk blocks much faster than random read
  In many modern (sensor) applications data arrive continually
  → (Massive) problems often have to be solved in one sequential scan
Streaming algorithms:
  Use single scan, handle each element fast, using small space

[Figure: disk drive with labels: track, magnetic surface, read/write arm, read/write head]
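As one concrete instance of "single scan, small space", the sketch below uses the Misra-Gries frequent-items summary. The slides do not name a specific streaming algorithm, so this choice and the function name `misra_gries` are assumptions made for illustration. The summary keeps at most k - 1 counters, reads the stream once, and guarantees that any item occurring more than n/k times in a stream of length n is among the surviving keys (the stored counts are lower bounds, not exact frequencies).

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k - 1 counters (k >= 2).

    Any item that occurs more than len(stream) / k times is guaranteed to
    be a key of the returned dict; other keys may be false positives.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# The stream is consumed element by element and never stored or revisited.
readings = iter("abacadaeaf")  # 'a' occurs 5 times out of 10
candidates = misra_gries(readings, k=3)
```

Here `"a"` occurs more than 10/3 times, so it is guaranteed to appear in `candidates`. A second pass over the data, when one is possible, can confirm the exact counts of the candidates.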

Page 8: Cache-Oblivious Algorithms

Problems to be solved on unknown and/or changing devices
  Block access important on all levels of memory hierarchy
  But memory hierarchies are very diverse
Cache-oblivious algorithms:
  Use blocks efficiently on all levels of any hierarchy
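A standard textbook illustration of this idea is recursive matrix transposition, sketched below. It is not taken from the slides, and the function name `transpose` and the `base` cutoff are assumptions for this example. The recursion always halves the longer dimension, so at some depth the sub-problems fit into each level of the memory hierarchy without the code ever knowing a block or cache size. In Python the memory layout of nested lists hides most of the cache effect, so the sketch shows the structure of the technique rather than its performance.

```python
def transpose(src, dst, r0=0, r1=None, c0=0, c1=None, base=16):
    """Write the transpose of src[r0:r1][c0:c1] into dst by divide and conquer.

    `base` (assumed >= 1) only bounds recursion overhead; it is a fixed
    constant and is not tuned to any block or cache size.
    """
    if r1 is None:
        r1 = len(src)
    if c1 is None:
        c1 = len(src[0]) if src else 0
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small sub-matrix: copy directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        # Split the longer dimension (rows) in half.
        mid = r0 + rows // 2
        transpose(src, dst, r0, mid, c0, c1, base)
        transpose(src, dst, mid, r1, c0, c1, base)
    else:
        # Split the longer dimension (columns) in half.
        mid = c0 + cols // 2
        transpose(src, dst, r0, r1, c0, mid, base)
        transpose(src, dst, r0, r1, mid, c1, base)


# Usage: transpose a 40 x 23 matrix into a preallocated 23 x 40 matrix.
matrix = [[i * 23 + j for j in range(23)] for i in range(40)]
result = [[0] * 40 for _ in range(23)]
transpose(matrix, result)
```

The same code runs unchanged on any device, which is the point of the cache-oblivious model: block efficiency comes from the recursion pattern rather than from machine-specific parameters.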

Page 9: Algorithm Engineering

Algorithm engineering:
  Design/implementation of practical algorithms
  Experimentation
Center motivated by theory inadequacy
Center wants to promote interdisciplinary/industry work
  → Natural to consider algorithm engineering
Algorithm engineering work can lead to practical breakthroughs
  Example: Flow simulation on massive terrain models
    ~18 billion points at 1 meter resolution (>>1 TB)
    Implementation of I/O-efficient algorithm
    → two weeks to three hours!!

Page 10: Center Team

International core team of algorithms researchers
  Including top ranked US and European groups
Leading expertise in focus areas
  AU: I/O, cache and algorithm engineering
  MPI: I/O (graph) and algorithm engineering
  MIT: Cache and streaming
Core researchers
  AU: Arge, Brodal
  MPI: Mehlhorn, Meyer
  MIT: Demaine, Indyk

Page 11: Center Activities

Visits of core researchers
Exchange of AU, MPI and MIT postdocs and PhD students
Visiting faculty, postdocs and students from other institutions
Frequent summer schools: Streaming algorithms this week!
Major international events:
  25th Annual Symposium on Computational Geometry in 2009
  New conference: Symposium on Algorithms for Massive Datasets
Theme workshops to foster multidisciplinary/industry collaboration

Page 12

Initially funded for 5 years by the National Research Foundation
Collaboration between three leading international groups
Addresses inadequacies of traditional theory
  When processing massive data
  When using diverse devices
Initial research focus on I/O-efficient, streaming, cache-oblivious algorithms
Also focus on algorithm engineering and multidisciplinary work
Focus on people: Establish vibrant international environment

www.madalgo.au.dk