Department of Computer Science, National Tsing Hua University
Database Research: The Past, The Present, and The Future
Yi-Shin Chen
Department of Computer Science
National Tsing Hua University
[email protected]
http://www.cs.nthu.edu.tw/~yishin/
Outline
Motivation
The Past
• Evolution of Data Management [Gray 1996]
• The Lowell Database Research Self Assessment Report
  – Where did it come from?
  – What does it say?
The Present
The Future
Motivation
Database research is driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself.
[Diagram: "new stuff" flowing into the database community]
Evolution of Data Management
• Manual Record Managers (–1900)
• Punched-Card Record Managers (1900–1955)
  – 1950: Univac had developed a magnetic tape
  – 1951: Univac I delivered to the US Census Bureau
• Programmed Record Managers (1955–1965)
  – Birth of high-level programming languages
  – Batch processing
  – Cons: transaction errors could not be detected in time; the business did not know its current state
• On-line Network Databases (1965–1980)
  – Indexed sequential records
  – Data independence
  – Concurrent access
  – Con: navigational programming interfaces are too low-level; they require very primitive, procedural database operations
Evolution of Data Management (Contd.)
• 1970: E.F. Codd outlined the relational model
  – Gives database users high-level, set-oriented data access operations
• Relational Databases & Client-Server Computing (1980–1995)
  – Uniform representation
  – 1985: SQL first standardized
  – Unexpected benefits:
    • Client-server computing (enabled by SQL and ODBC)
    • Parallel processing: relational operators naturally support pipeline and partition parallelism
    • Graphical user interfaces: easy to render a relation
  – Products: Oracle, Informix, Ingres
• Multimedia Databases (1995–)
  – Richer data types
  – OO databases
  – Unifying procedures and data ("Universal Server")
  – Projects that push the limits, e.g., the NASA EOS/DIS projects
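Codd's set-oriented operators can be illustrated with a minimal Python sketch (toy tables and column names are invented here; real systems express this in SQL). The point is that a query is a composition of whole-relation operators rather than record-at-a-time navigation:

```python
# Minimal relational-algebra sketch over Python dicts (toy data, not a real DBMS).

def select(rows, pred):
    """sigma: keep the rows satisfying a predicate (set-oriented, no pointer chasing)."""
    return [r for r in rows if pred(r)]

def project(rows, cols):
    """pi: keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]

def join(left, right, key):
    """Equi-join on a shared key column, via a hash index on the right input."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

emp = [{"name": "Ann", "dept": 1}, {"name": "Bob", "dept": 2}]
dept = [{"dept": 1, "dname": "Sales"}, {"dept": 2, "dname": "R&D"}]

# Declarative pipeline: "names of employees in Sales".
result = project(select(join(emp, dept, "dept"),
                        lambda r: r["dname"] == "Sales"),
                 ["name"])
print(result)  # [{'name': 'Ann'}]
```

Because each operator consumes and produces whole relations, stages like this pipeline naturally, which is exactly what made pipeline and partition parallelism fall out of the relational model.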
Research Self Assessment
A group of senior database researchers gathers every few years to assess the state of database research and point out potential research problems:
• Laguna Beach, Calif., 1989
• Palo Alto, Calif., 1990 and 1995
• Cambridge, Mass., 1996
• Asilomar, Calif., 1998
• Lowell, Mass., 2003
The sixth ad hoc meeting:
• Lasted two days
• 25 senior database researchers
• Output: the Lowell database research self-assessment report
• More information: http://research.microsoft.com/~gray/lowell/
Attendees
Serge Abiteboul, Martin Kersten, Rakesh Agrawal, Michael Pazzani, Phil Bernstein, Mike Lesk, Mike Carey, David Maier, Stefano Ceri, Jeff Naughton, Bruce Croft, Hans Schek, David DeWitt, Timos Sellis, Mike Franklin, Avi Silberschatz, Hector Garcia-Molina, Rick Snodgrass, Dieter Gawlick, Mike Stonebraker, Jim Gray, Jeff Ullman, Laura Haas, Gerhard Weikum, Alon Halevy, Jennifer Widom, Joe Hellerstein, Stan Zdonik, Yannis Ioannidis
Photos captured from http://www.research.microsoft.com/~gray/lowell/Photos.htm
The Main Driving Forces
The focus of database research: information storage, organization, management, and access
The main driving forces:
• Internet
  – Particularly by enabling "cross-enterprise" applications
  – Requires stronger facilities for security and information integration
• Sciences
  – Generate large and complex data sets
  – Need support for information integration, managing the pipeline of data products produced by data analysis, storing and querying "ordered" data, and integrating with the world-wide data grid
The Main Driving Forces (Contd.)
• Traditional DBMS topics
  – Technology keeps changing the rules → reassessment
  – E.g., the ratios of capacity to bandwidth change → reassess storage management and query-processing algorithms
• Maturation of related technologies, for example:
  – Data-mining technology → DB component
  – Information retrieval → integrate with DB search techniques
  – Reasoning with uncertainty → fuzzy data
  – NLP → querying
Next Generation Infrastructure
Discusses the various infrastructure components that require new solutions or are novel in some other way:
1. Integration of Text, Data, Code and Streams
2. Information Fusion
3. Sensor Data and Sensor Networks
4. Multimedia Queries
5. Reasoning about Uncertain Data
6. Personalization
7. Data Mining
8. Self Adaptation
9. Privacy
10. Trustworthy Systems
11. New User Interfaces
12. One-Hundred-Year Storage
13. Query Optimization
Integration of Text, Data, Code and Streams
Rethink basic DBMS architecture, supporting:
• Structured data → traditional DBMS
• Text → information retrieval
• Space and time → spatial and temporal DB
• Image and multimedia data → image retrieval / multimedia DB
• Procedural data → user-defined functions
• Triggers → make facilities scalable
• Data streams and queues → data stream management
Start with a clean sheet of paper:
• SQL, XML Schema, and XQuery are too complex
• Vendors will pursue extend-XML/SQL strategies
• The research community should explore a reconceptualization
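What stream management adds over one-shot queries can be shown with a small sketch: a continuous query that re-emits an aggregate as each new tuple arrives (the function name, window size, and readings are illustrative, not from any particular system):

```python
from collections import deque

def windowed_avg(stream, window=3):
    """Continuous query: emit the average of the last `window` values
    each time a new tuple arrives (count-based sliding window)."""
    buf = deque(maxlen=window)  # old values fall off automatically
    for value in stream:
        buf.append(value)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40]
averages = list(windowed_avg(readings, window=3))
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Unlike a stored relation, the stream is unbounded, so the "answer" is itself a stream of updates rather than a single result set.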
Information Fusion
• The typical approach: extract-transform-load (ETL) tools feeding a data warehouse
• Because of the Internet:
  – Millions of information sources
  – Some data can only be accessed at query time
  – Perform information integration on the fly
  – Need a semantic-heterogeneity solution
  – Work with the "Semantic Web" people
• Other challenges:
  – Security policy: information in each database is not free
  – Probabilistic world of evidence accumulation
  – Web scale
Sensor Data and Sensor Networks
Characteristics:
• Draw more power when communicating than when computing
• Rapidly changing configurations
• Might not be completely calibrated
Multimedia Queries
Challenges: create easy ways to:
• Analyze
• Summarize
• Search
• View
Require better facilities for managing multimedia information
Reasoning about Uncertain Data
• Traditional DBMSs have no facilities for either approximate data or imprecise queries
• (Almost) all data are uncertain or imprecise
• DBMSs need built-in support for data imprecision
• The "lineage" of the data must be tracked
• Query processing must become stochastic
  – Query answers will get better
  – The system should characterize the accuracy offered
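A hedged sketch of what "stochastic query processing" might look like: tuples carry an existence probability plus their lineage, and an answer comes back with a confidence. All names, data, and the independence assumption are illustrative, not from the report:

```python
# Toy probabilistic relation: each tuple carries an existence probability
# and its lineage (which sources produced it). Illustrative only.

sightings = [
    {"species": "wolf", "p": 0.9, "lineage": ["camera-1"]},
    {"species": "wolf", "p": 0.4, "lineage": ["camera-2"]},
    {"species": "lynx", "p": 0.7, "lineage": ["camera-3"]},
]

def prob_exists(rows, species):
    """P(at least one independent sighting is real) = 1 - prod(1 - p_i),
    returned together with the lineage of the contributing tuples."""
    miss = 1.0
    lineage = []
    for r in rows:
        if r["species"] == species:
            miss *= 1.0 - r["p"]
            lineage.extend(r["lineage"])
    return 1.0 - miss, lineage

p, lineage = prob_exists(sightings, "wolf")
print(round(p, 2), lineage)  # 0.94 ['camera-1', 'camera-2']
```

The answer is no longer a bare tuple: it carries an accuracy estimate (0.94) and the lineage needed to audit or revise it.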
Personalization
Query answers should depend on the user
Relevance feedback should also depend on the person and the context
A framework for including and exploiting appropriate metadata for personalization is needed
Need to verify that the information system is producing a "correct" answer
Data Mining
Focus on efficient ways to discover models of existing data sets
Developed algorithms include classification, clustering, association-rule discovery, summarization, etc.
Challenges:
• Develop algorithms for seeking unexpected "pearls of wisdom"
• Integrate data mining with querying, optimization, and other database facilities such as triggers
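One of the listed techniques, association-rule discovery, can be sketched in a few lines: a rule "X → Y" is scored by its support and confidence over basket data (the baskets and thresholds here are made up, and this brute-force scan is not an efficient miner):

```python
# Toy association-rule scoring over market-basket data (illustrative).

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets containing every item of the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

# Rule {bread} -> {milk}: how often does milk appear alongside bread?
print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # ~0.667
```

Real miners (e.g., Apriori-style algorithms) prune the itemset lattice instead of scanning it exhaustively; the integration challenge above is about pushing such computations inside the query engine.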
Self Adaptation
• Modern DBMSs are more complex: administrators must understand disk partitioning, parallel query execution, thread pools, and user-defined data types
• Shortage of competent database administrators
• Goals:
  – Perform tuning using a combination of a rule-based system, a database of knob settings, and configuration data
  – No knobs: all tuning decisions are made automatically
  – Need models of user behaviors and workloads
  – Recognize internal malfunctions, identify data corruption, detect application failures, and do something about them
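A minimal sketch of the rule-based side of "no knobs": observed workload statistics drive knob settings automatically. The rules, thresholds, and knob names below are entirely invented for illustration; production self-tuning is far more involved:

```python
# Hypothetical rule-based auto-tuner: workload stats in, knob settings out.

def tune(stats):
    settings = {}
    # Rule 1 (made up): read-heavy workloads get a larger buffer pool.
    if stats["read_ratio"] > 0.8:
        settings["buffer_pool_mb"] = 4096
    else:
        settings["buffer_pool_mb"] = 1024
    # Rule 2 (made up): size the thread pool from the client count, capped.
    settings["threads"] = min(64, 2 * stats["clients"])
    return settings

print(tune({"read_ratio": 0.95, "clients": 10}))
# {'buffer_pool_mb': 4096, 'threads': 20}
```

The point of the goal above is that the DBA disappears from this loop: the system gathers `stats` itself and re-runs the rules as the workload shifts.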
Privacy
• Security systems: revitalize data-oriented security research
• Specify the purpose of each data request
• Access decisions should be based on:
  – Who is requesting the data
  – To what use it will be put
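The two criteria above (who is asking, and for what purpose) can be sketched as a purpose-aware access check. The policy table, roles, and column names are invented for illustration:

```python
# Toy purpose-based access control: a request is granted only if both the
# requester's role AND the declared purpose are allowed for that column.

policy = {
    "address": {"roles": {"clerk", "manager"}, "purposes": {"shipping"}},
    "salary":  {"roles": {"manager"},          "purposes": {"payroll"}},
}

def allowed(column, role, purpose):
    rule = policy.get(column)
    return bool(rule) and role in rule["roles"] and purpose in rule["purposes"]

print(allowed("address", "clerk", "shipping"))   # True
print(allowed("address", "clerk", "marketing"))  # False: wrong purpose
print(allowed("salary", "clerk", "payroll"))     # False: wrong role
```

Note how this differs from classic role-based access control: the same requester is granted or denied depending on the declared use of the data.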
Trustworthy Systems
Trustworthy systems:
• Safely store data
• Protect data from unauthorized disclosure
• Protect data from loss
• Make data always available to authorized users
• Ensure the correctness of query results and data-intensive computations
Digital rights management:
• Protect intellectual property rights
• Allow private conversation
New User Interfaces
• How best to render data visually?
  – During the 1980s we got QBE and VisiCalc; since then, nothing
  – Need new, better ideas in this area
• Query languages
  – SQL and XQuery are not for end users
  – Possible choices:
    • Keyword-based query → information-retrieval community
    • Browsing → increasingly popular
    • Ontology + speech or natural language → Semantic Web + NLP
One-Hundred-Year Storage
• Archived information is disappearing:
  – Captured on a deteriorating medium
  – Captured on a medium requiring obsolete devices
  – The application that can interpret the information no longer works
• A DBMS can ensure that:
  – Content remains accessible in a useful form
  – The process of migrating content between formats is automated
  – The hardware and software that each document needs are maintained
  – The metadata is managed along with the stored document
Query Optimization
• Optimization of information integrators
• Optimization for semi-structured query languages, e.g., XQuery
• Optimization for stream processors
• Optimization for sensor networks
• Inter-query optimization involving large numbers of queries
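The last bullet, inter-query optimization, can be sketched simply: when many queries touch the same table, one shared scan can feed all of them instead of each query scanning separately. The table, predicates, and counting queries below are illustrative:

```python
# Sketch of inter-query optimization: several COUNT-style queries share a
# single scan of the same table instead of scanning it once per query.

table = [{"id": i, "val": i * 10} for i in range(1000)]

def run_shared(queries):
    """One pass over the table feeds every query's accumulator."""
    results = [0] * len(queries)
    for row in table:            # single shared scan
        for i, pred in enumerate(queries):
            if pred(row):
                results[i] += 1
    return results

counts = run_shared([lambda r: r["val"] > 5000,
                     lambda r: r["id"] % 2 == 0])
print(counts)  # [499, 500]
```

With thousands of concurrent queries the shared scan amortizes I/O that per-query plans would repeat, which is why optimizing across queries, not just within one, is listed as a challenge.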
Next Steps
• A test bed for information-integration research
• Revisit the solved problems in light of sea changes
• Avoid drawing too narrow a box around what we do
• Explore opportunities for combining database and related technologies
Thank You.
Any Questions?
Reference
Jim Gray. "Evolution of Data Management." IEEE Computer 29(10), October 1996, pp. 38–46.
http://www.research.microsoft.com/~gray/lowell/