The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds

Martin L. Kersten, Stratos Idreos, Stefan Manegold, Erietta Liarou (and members of the CWI database group)



TRANSCRIPT

Page 1

The Researcher’s Guide to the Data Deluge: Querying a Scientific Database in just a Few Seconds

Martin L. Kersten, Stratos Idreos, Stefan Manegold, Erietta Liarou

(and members of the CWI database group)

Page 2
Page 3
Page 4

Science Feb’11 Data

http://www.sciencemag.org/site/special/data/

Page 5

Science Feb’11 Data

… We have recently passed the point where more data is being collected than we can physically store. This storage gap will widen rapidly in data-intensive fields. Thus, decisions will be needed on which data to archive and which to discard. A separate problem is how to access and use these data. Many data sets are becoming too large to download. Even fields with well-established data archives, such as genomics, are facing new and growing challenges in data volume and management. And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used …

Page 6

Science Feb’11 Data

Page 7

Science Feb’11 Data

Page 8

Database research vision

• Throwing away data before harvesting is the worst ROI one can imagine.

• LSST budget is 100 M$
– During its ten-year survey, LSST will acquire 5.6 million 15-second images, spread over 2.8 million pointings.
– 20 billion rows in the Object table, 3 trillion rows in the Source table

Page 9

Database technology is not designed for the challenges

All sizes don’t fit

Page 10

The Dawn of a new Database Era

Capture the query intent!

Page 11

FIVE STEPS INTO THE FUTURE

• One-minute DBMS for real-time performance.

• Multi-scale query processing for gradual exploration.

• Post processing for conveying meaningful data.

• Query morphing to adjust for proximity results.

• Query alternatives to cope with lack of providence.

Page 12

One-minute database kernels

Step 1: Do the BEST you can within a given time frame!

• Research how to …
– organize query evaluation around what is available at low cost
– redesign algorithms and operators such that they adaptively avoid expensive steps normally needed for correctness and completeness
– stop the process after an agreed-upon time
– ensure continuation upon request.
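Step 1 can be illustrated with a minimal Python sketch, assuming an in-memory sequence of rows stands in for the database. The class name `BudgetedScan`, its `run` method, and the injectable `clock` parameter are hypothetical names chosen for this illustration; they do not come from the slides or from any particular DBMS. The scan returns whatever matches it has found when the time budget expires and remembers its position, so a later call continues where the previous one stopped.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, float]  # hypothetical row format: attribute name -> value


class BudgetedScan:
    """Filter rows under a time budget; each call resumes the same scan."""

    def __init__(
        self,
        rows: Iterable[Row],
        predicate: Callable[[Row], bool],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rows: Iterator[Row] = iter(rows)  # keeps the scan position
        self._predicate = predicate
        self._clock = clock
        self.exhausted = False  # True once every row has been seen

    def run(self, budget_seconds: float) -> List[Row]:
        """Return the matches found before the budget expires."""
        deadline = self._clock() + budget_seconds
        matches: List[Row] = []
        while self._clock() < deadline:
            try:
                row = next(self._rows)
            except StopIteration:
                self.exhausted = True
                break
            if self._predicate(row):
                matches.append(row)
        return matches
```

Calling `run` a second time continues from the saved position, which corresponds to "ensure continuation upon request"; the `exhausted` flag tells the caller whether the answer so far is complete or only partial.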

Page 13

Multi-scale query processing

Step 2: Use a staging scheme for query evaluation!

• Research how to …
– partition the database for producing incremental valuable results
D => D1 union (D2.1 union (D2.2 union (D2.3 union ..
– avoid harmful SELECT * FROM table queries
– break a query into a converging query sequence
Q => Q1 union Q2 => Q1 union Q2.1 union Q2.2 => Q1 union Q2.1 union Q2.2.1 union Q2.2.2 …….
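The staging scheme of Step 2 can be sketched in Python under simple assumptions: the table is an in-memory sequence, the stages are contiguous slices of doubling size, and the query is a plain filter. The function names `staged_partitions` and `multi_scale_query` and the doubling rule are hypothetical choices for this illustration; a real system would more likely pick stages by sampling or by data statistics.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[float, float]  # hypothetical two-column table: (ra, dec)


def staged_partitions(rows: Sequence[Row], first_size: int) -> Iterator[Sequence[Row]]:
    """Split rows into stages D1, D2.1, D2.2, ... of doubling size."""
    if first_size < 1:
        raise ValueError("first_size must be at least 1")
    start, size = 0, first_size
    while start < len(rows):
        yield rows[start:start + size]
        start += size
        size *= 2


def multi_scale_query(
    rows: Sequence[Row],
    predicate: Callable[[Row], bool],
    first_size: int = 2,
) -> Iterator[List[Row]]:
    """Yield the growing answer after each stage.

    The sequence converges: the last yielded list equals the answer of
    the query evaluated over the whole table.
    """
    answer: List[Row] = []
    for stage in staged_partitions(rows, first_size):
        answer.extend(row for row in stage if predicate(row))
        yield list(answer)
```

A caller can stop consuming the generator as soon as the partial answer is good enough, so the cheap first stage is always delivered before the expensive later ones.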

Page 14

Result-set post processing

Step 3: Use meaningful compression to convey more!

• Research how to …
– post-process result sets statistically
– prepare for faceted query answers
– show sort for boundaries first
• Min/max domain enclosures for all attributes
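One reading of "min/max domain enclosures for all attributes" is a per-attribute bounding interval computed over the result set and shown instead of, or ahead of, the raw tuples. The sketch below assumes rows are dictionaries of numeric attributes; the function name `domain_enclosures` is a hypothetical name for this illustration.

```python
from typing import Dict, Iterable, Mapping, Tuple


def domain_enclosures(
    result_set: Iterable[Mapping[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Return {attribute: (min, max)} over all rows of a result set."""
    bounds: Dict[str, Tuple[float, float]] = {}
    for row in result_set:
        for attr, value in row.items():
            # The first value seen for an attribute starts its interval.
            low, high = bounds.get(attr, (value, value))
            bounds[attr] = (min(low, value), max(high, value))
    return bounds
```

The summary has one entry per attribute regardless of how many tuples the query returned, which is what makes it usable as a compressed first answer.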

Page 15

Query morphing

Step 4: Bend the search towards interesting areas!

• Research how to …
– explore the query expression space?
– transform a query with a small result set such that it produces relevant, nearby answers
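As a rough illustration of Step 4, the sketch below morphs a box query on `ra` and `dec` by widening both ranges until enough rows match. Widening by a fixed margin is only one possible morphing rule and is an assumption of this example, as are the names `box_query` and `morph_query` and the parameters `grow` and `max_steps`.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float]      # hypothetical row: (ra, dec)
Range = Tuple[float, float]    # inclusive (low, high) bounds


def box_query(rows: Sequence[Row], ra_range: Range, dec_range: Range) -> List[Row]:
    """Rows whose ra and dec both fall inside the given inclusive ranges."""
    return [
        (ra, dec)
        for ra, dec in rows
        if ra_range[0] <= ra <= ra_range[1] and dec_range[0] <= dec <= dec_range[1]
    ]


def morph_query(
    rows: Sequence[Row],
    ra_range: Range,
    dec_range: Range,
    min_results: int,
    grow: float = 0.5,
    max_steps: int = 10,
) -> Tuple[Range, Range, List[Row]]:
    """Widen both ranges step by step until min_results rows match.

    Returns the ranges of the last query tried together with its matches,
    so the caller can see how far the query was bent.
    """
    matches = box_query(rows, ra_range, dec_range)
    steps = 0
    while len(matches) < min_results and steps < max_steps:
        ra_range = (ra_range[0] - grow, ra_range[1] + grow)
        dec_range = (dec_range[0] - grow, dec_range[1] + grow)
        matches = box_query(rows, ra_range, dec_range)
        steps += 1
    return ra_range, dec_range, matches
```

Returning the morphed ranges alongside the rows matters for the user: the nearby answers are only interpretable if it is clear which query actually produced them.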

Page 16

Query alternatives

Step 5: Ignore stupid questions, give hints instead!

• Research how to …
– find alternative queries in terms of expressiveness + performance
– better exploit the query log for hints

-- Q1: Using the time budget. (36291322 tuples)
SELECT ra, dec, band1, intensity1, type
FROM PhotoObj;

-- Q2: Using data statistics. (879300 tuples)
SELECT *
FROM PhotoObj
WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82;

-- Q3: Using query statistics. (899 tuples)
SELECT *
FROM PhotoObj
WHERE ra BETWEEN 53 AND 54 AND dec BETWEEN 80 AND 82
  AND distance(ra,dec,radius) < 10;

SELECT * FROM PhotoObj
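The idea of answering an unbounded query such as the one above with hints from the query log can be sketched as follows. This is a deliberately naive illustration: it treats queries as strings, detects the table and the presence of a WHERE clause with regular expressions, and ranks logged queries by frequency only. The function names `table_of`, `is_unbounded`, and `hints_from_log` are hypothetical and are not taken from the slides.

```python
import re
from collections import Counter
from typing import List, Optional

_TABLE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)
_WHERE = re.compile(r"\bWHERE\b", re.IGNORECASE)


def table_of(query: str) -> Optional[str]:
    """Lower-cased name of the first table after FROM, or None."""
    match = _TABLE.search(query)
    return match.group(1).lower() if match else None


def is_unbounded(query: str) -> bool:
    """True for queries without a WHERE clause, e.g. SELECT * FROM PhotoObj."""
    return _WHERE.search(query) is None


def hints_from_log(query: str, log: List[str], limit: int = 3) -> List[str]:
    """Suggest the most frequent bounded log queries on the same table.

    Bounded queries get no hints; they are assumed to express intent already.
    """
    table = table_of(query)
    if table is None or not is_unbounded(query):
        return []
    candidates = Counter(
        logged.strip().rstrip(";")
        for logged in log
        if table_of(logged) == table and not is_unbounded(logged)
    )
    return [text for text, _ in candidates.most_common(limit)]
```

A fuller version would also weigh each alternative by its expected result size and cost, in the spirit of the tuple counts annotated on Q1 to Q3; frequency in the log is used here only because it needs no access to the data.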

Page 17
Page 18

The Dawn of a new Database Era

Brought to you by the CWI database research group

Page 19