1 introduction to stanford db group research li ruixuan public.wh.hb.cn


Page 1: 1 Introduction to Stanford DB Group Research Li Ruixuan public.wh.hb.cn

Introduction to Stanford DB Group Research

Li Ruixuan

http://cs.hust.edu.cn/rxli/

[email protected]@public.wh.hb.cn

Contents

• Introduction
• Past projects
• Current projects
• Events
• References
• Links

The Stanford Database Group

• “Mainstream” faculty
– Hector Garcia-Molina
– Jennifer Widom
– Jeff Ullman
– Gio Wiederhold
• “Adjunct” faculty
– Chris Manning (natural language processing)
– Rajeev Motwani (theory)
– Terry Winograd (human-computer interaction)
• A.k.a. Stanford InfoLab

Database Group (cont’d)

• Approximately 25 Ph.D. students
• Varying numbers of M.S. and undergraduate students
• Handful of visitors
• One senior research associate
• One systems administrator, one programmer
• Excellent administrative staff
• Resident photographer

Research Areas (very coarse)

• Digital libraries
• Peer-to-peer systems
• Data streams
• Replication, caching, archiving, broadcast, …
• The Web
• Ontologies, semantic Web
• Data mining
• Miscellaneous

Past Projects

• LIC: Large-Scale Interoperation and Composition (1999) - mediator (SKC, OntoWeb, CHAIMS, SimQL, image DB)
• SKC: Scalable Knowledge Composition (2000) - semantic heterogeneity
• TID: Trusted Image Distribution (2001) - Image Filtering for Secure Distribution of Medical Information
• Image Database: Content-based Image Retrieval (2003)
• SimQL: Simulation Access Language (2001) - Software modules in manufacturing, acquisition, and planning systems

Past Projects (cont’d)

• TSIMMIS: Wrapping and mediation for heterogeneous information sources (1998)
• Lore: A Database Management System for XML (2000)
• WHIPS: WareHouse Information Prototype at Stanford (1998) - Data warehouse creation and maintenance
• MIDAS: Mining Data at Stanford (1999)
• WSQ: Web-Supported Queries (2000) - Integrating database queries and Web searches

Current Projects

• WebBase: Crawling, storage, indexing, and querying of large collections of Web pages. (Garcia-Molina)
• STREAM: A Database Management System for Data Streams (Widom)
• Peers: Building primitives for peer-to-peer systems (Garcia-Molina)
• Digital Libraries: Interoperating on-line services for end-user support (TID, WebBase, OntoAgents) (Garcia-Molina)
• TRAPP: Approximate data caching: trading precision for performance (Widom)
• CHAIMS: Compiling High-level Access Interfaces for Multi-site Software (1999) (Wiederhold)
• OntoAgents: Ontology based Infrastructure for Agents (2002) (Wiederhold)

WebBase: Objectives

• Provide a storage infrastructure for Web-like content
• Store a sizeable portion of the Web
• Enable researchers to easily build indexes of page features across large sets of pages
• Distribute WebBase content via multicast channels
• Support structure and content-based querying over the stored collection

WebBase: Architecture

[Architecture diagram. Labeled components: WWW, Crawler, Page Repository, Indexing Module, Analysis Module, Retrieval Indexes, Feature Repository, Multicast Module, Query Engine, Index API, WebBase API, Indexing Client, and several Clients]

WebBase: Current Status

• Efficient “smart” crawler
– Parallelism
– Freshness & Relevance
• Efficient and scalable indexing
– Distributed Web-scale content indexes
– Indexes over graph structure
• Unicast dissemination
– Within Stanford
– External clients: Columbia, U. Wash, U.C. Berkeley

WebBase: In Progress

• WebBase Infrastructure
– Multicast dissemination
– Complex queries
• Other work
– PageRank extensions
– Clustering and similarity search
– Structured data extraction
– Hidden Web crawling

Data Streams: Motivation

• Traditional DBMS -- data stored in finite, persistent data sets
• New applications -- data as multiple, continuous, rapid, time-varying data streams
– Network monitoring and traffic engineering
– Security applications
– Telecom call records
– Financial applications
– Web logs and click-streams
– Sensor networks
– Manufacturing processes

STREAM: Architecture

[Architecture diagram. Labels: Input streams, DSMS, Scratch Store, Register Query, Streamed Result, Stored Result, Archive, Stored Relations]

STREAM: Challenges

• Multiple, continuous, rapid, time-varying streams of data
• Queries may be continuous (not just one-time)
– Evaluated continuously as stream data arrives
– Answer updated over time
• Queries may be complex
– Beyond element-at-a-time processing
– Beyond stream-at-a-time processing

DBMS versus DSMS

DBMS:
• Persistent relations
• One-time queries
• Random access
• Access plan determined by query processor and physical DB design
• “Unbounded” disk store

DSMS:
• Transient streams (and persistent relations)
• Continuous queries
• Sequential access
• Unpredictable data arrival and characteristics
• Bounded main memory
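The contrast above between a one-time query and a continuous query can be illustrated with a small sketch. This is not STREAM's query language or engine; it is a hypothetical Python generator (`continuous_avg`, an invented name) that re-evaluates an average over a time-based sliding window each time a tuple arrives, reads the stream sequentially, and keeps only the tuples still inside the window in memory.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def continuous_avg(stream: Iterable[Tuple[float, float]],
                   window: float) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average) after every arriving (timestamp, value) tuple.

    The average covers tuples whose timestamp lies in (now - window, now].
    Assumes timestamps arrive in non-decreasing order.
    """
    buf: Deque[Tuple[float, float]] = deque()  # tuples currently in the window
    total = 0.0
    for ts, value in stream:
        buf.append((ts, value))
        total += value
        # Evict tuples that have fallen out of the window.
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            total -= old_value
        yield ts, total / len(buf)


# Illustrative input: (timestamp, reading) pairs.
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
results = list(continuous_avg(readings, window=3))
# results == [(0, 10.0), (1, 15.0), (2, 20.0), (5, 40.0)]
```

The answer is updated over time rather than computed once, and memory use depends on how many tuples fall inside the window, not on the total length of the stream.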

STREAM: Current Status

• Data streams and stored relations
• Declarative language for registering continuous queries
• Flexible query plans
• Designed to cope with high data rates and query workloads
– Graceful approximation when needed
– Careful resource allocation and usage
• Relational, centralized (for now)

STREAM: Ongoing Work

• Algebra for streams
• Semantics for continuous queries
• Synopses and algorithmic issues
• Memory management issues
• Exploiting constraints on streams
• Approximation in query processing
• Distributed stream processing
• System development

STREAM: Related Work

• Amazon/Cougar (Cornell) - sensors
• Aurora (Brown/MIT) - sensor monitoring, dataflow
• Hancock (AT&T) - telecom streams
• Niagara (OGI/Wisconsin) - Internet XML databases
• OpenCQ (Georgia) - triggers, incr. view maintenance
• Stream (Stanford) - general-purpose DSMS
• Tapestry (Xerox) - pub/sub content-based filtering
• Telegraph (Berkeley) - adaptive engine for sensors
• Tribeca (Bellcore) - network monitoring

Peer-To-Peer Systems

• Multiple sites (at edge)
• Distributed resources
• Sites are autonomous (different owners)
• Sites are both clients and servers
• Sites have equal functionality

P2P Benefits

• Pooling available (inexpensive) resources
• High availability and fault-tolerance
• Self-organization

P2P Challenges

• Search
– Query Expressiveness
– Comprehensiveness
– Topology
– Data Placement
– Message Routing
• Resource Management
– Fairness
– Load balancing
• Security & Privacy
– Anonymity
– Reputation
– Accountability
– Information Preservation
– Information Quality
– Trust
– Denial of service attacks

Peers: Stanford Research

• New Architectures
• Performance Modeling and Optimization
• Security and Trust
• Distributed Resource Management
• Applications

Digital Library Project: Overview

[Overview diagram. Labels: Internet Libraries, Payment Institutions, Search Agents, User Interfaces and Annotations, Commercial Information Brokers & Providers, Copyright Services, Query/Data Conversion; protocols shown: HTTP, Z39.50, Telnet]

DigLib Projects: DLI1, DLI2

• Resource Discovery
• Retrieving Information
• Interpreting Information
• Managing Information
• Sharing Information

DigLib: Resource Discovery

• Geographic Views (Tools to assist you in more systematically locating different types of information from a large and diverse number of information sources)

DigLib: Retrieving Information

• Information Tiling
• PalmPilot Infrastructure (PDA)
• Power Browsing (PDA)
• Query Translator
• SDLIP (Simple Digital Library Interoperability Protocol)
• Value Filtering
• WebBase

DigLib: Interpreting Information

• Murals (Tools to help a user interpret and organize search results)
• Web Clustering

DigLib: Managing Information

• Archival Repositories
• Archiving Movie
• InterBib (a tool for maintaining bibliographic information)
• Medical Transport Info
• PhotoBrowser

DigLib: Sharing Information

• Diet ORB (PDA, based on MICO)
• Digital Wallets
• Mobile Info Delivery
• Mobile Security
• Multicasting

DLI1 Projects (95-99)

• AHA
• ComMentor
• DLITE
• Google
• GLOSS
• FAB
• Grassroots
• Metadata Architecture
• RManage/FIRM
• SenseMaker
• SCAM
• Shopping Models, U-PAI
• SONIA
• STARTS
• WebWriter

TRAPP: Overview

• TRAPP: Tradeoff in Replication Precision and Performance
• A.k.a.: Approximate Data Caching
• Project goal: investigating techniques to permit controlled and explicit relaxation of data precision in exchange for improved performance

TRAPP: Motivation

• Transactional consistency too expensive
• Even nontransactional propagation of every update still too expensive in many cases
• Solution: Approximate Caching
– Exploit the fact that many applications do not require exact consistency
– Avoid propagating insignificant updates
– Trade cache precision for network load

Example: TRAPP Over Numeric Data

• Caches store intervals that bound the exact source values
• Sources refresh when value leaves interval
• Query answers are intervals
• Precision constraints specify maximum width

[Figure: a cache holds the intervals [2, 5] and [-1, 0.8] for two sources whose exact values are 3.9 and 0.2; each source sends refreshes to the cache]
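The refresh rule on this slide can be sketched in a few lines of Python. The class names (`Source`, `CachedBound`) and the fixed half-width are assumptions made for illustration only; the slides do not say how TRAPP chooses or adapts interval widths, so this sketch simply re-centers a fixed-width interval whenever the exact value escapes the cached bound.

```python
from dataclasses import dataclass


@dataclass
class CachedBound:
    """Interval [low, high] held by the cache for one source value."""
    low: float
    high: float


class Source:
    """Hypothetical source that pushes a refresh only when needed."""

    def __init__(self, value: float, half_width: float) -> None:
        self.value = value
        self.half_width = half_width  # assumed fixed; real systems may adapt it
        self.refreshes = 0            # number of refresh messages sent
        self.cache = CachedBound(value - half_width, value + half_width)

    def update(self, new_value: float) -> None:
        self.value = new_value
        # Updates that stay inside the cached interval are not propagated.
        if not (self.cache.low <= new_value <= self.cache.high):
            self.cache = CachedBound(new_value - self.half_width,
                                     new_value + self.half_width)
            self.refreshes += 1


src = Source(value=4.0, half_width=1.0)  # cache starts at [3.0, 5.0]
src.update(4.5)  # still inside [3.0, 5.0]: no refresh
src.update(5.5)  # leaves the interval: refresh to [4.5, 6.5]
```

A wider interval means fewer refresh messages but less precise cached data, which is the precision-for-network-load trade the slides describe.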

Example (cont’d): Querying in TRAPP

For one-time aggregation queries:
– Answers computed by combining approximate cached data and exact source data
– At query-time: Find low-cost subset of sources to probe so final answer will have adequate precision
– Algorithm determined by aggregation function
» Some easy, some hard

[Figure: the query X + Y (within 2) runs against cached intervals [2, 5] for X and [-1, 0.8] for Y, whose exact source values are 3.9 and 0.2. Probing X for its exact value 3.9 gives the answer [2.9, 4.7], which has width 1.8]
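The X + Y example can be reproduced with a short sketch. The function name `answer_sum` and the greedy "probe the widest interval first" rule are assumptions for illustration: the rule treats every probe as equally costly and applies to SUM only, whereas the slides note that the probing algorithm depends on the aggregation function and that some cases are hard.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]


def answer_sum(cache: Dict[str, Interval],
               probe: Callable[[str], float],
               max_width: float) -> Tuple[Interval, List[str]]:
    """Answer SUM over all cached items as an interval of width <= max_width.

    Probes sources (widest cached interval first) until the answer is
    precise enough. Returns the answer interval and the probed names.
    """
    if max_width < 0:
        raise ValueError("max_width must be non-negative")
    bounds = dict(cache)  # do not modify the caller's cache
    probed: List[str] = []

    def width(iv: Interval) -> float:
        return iv[1] - iv[0]

    # The width of a sum of intervals is the sum of their widths.
    while sum(width(iv) for iv in bounds.values()) > max_width:
        name = max(bounds, key=lambda n: width(bounds[n]))
        exact = probe(name)            # fetch the exact value from the source
        bounds[name] = (exact, exact)  # the interval collapses to a point
        probed.append(name)

    low = sum(iv[0] for iv in bounds.values())
    high = sum(iv[1] for iv in bounds.values())
    return (low, high), probed


cached = {"X": (2.0, 5.0), "Y": (-1.0, 0.8)}
exact_values = {"X": 3.9, "Y": 0.2}
answer, probed = answer_sum(cached, exact_values.__getitem__, max_width=2.0)
# Only X is probed; the answer is approximately [2.9, 4.7], as on the slide.
```

Using the cached intervals alone would give [1, 5.8], which is too wide for the precision constraint of 2, so one probe is needed.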

TRAPP: Approximate Caching

Two common scenarios:
• Minimize bandwidth usage, precision fixed
» TRAPP: caches store bounds as approximations
» Queries select combination of cached & source data
» Adaptive bound adjustment for good precision level
• Bandwidth fixed, maximize precision
» Best-Effort Synchronization: caches store stale copies
» Refreshing based on priority scheduling
» Global priority order via threshold
» Adaptive threshold setting for flow control

TRAPP: Status

• Past work: focused on an approximate data caching architecture that permits fine-grained control of the precision-performance tradeoff for numerical data in data caching environments.
• Current work: applying the above techniques and others to more complex data such as Web pages.

CHAIMS: Overview

• CHAIMS: Compiling High-level Access Interfaces for Multi-site Software
• Objective: Investigate revolutionary approaches to large-scale software composition.
• Approach: Develop and validate a composition-only language, a protocol for large, distributed, heterogeneous and autonomous megamodules, and a supporting system.
• Planned contributions:
– Asynchrony by splitting up CALL-statement.
– Hardware and software platform independence.
– Potential for multi-site dataflow optimization.
– Performance optimization by invocation scheduling.
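The idea of gaining asynchrony by splitting up the CALL statement can be illustrated with a rough Python analogy. This is not the CHAIMS language or its runtime: the `Megaprogram` class, its `invoke`, `extract` and `terminate` methods, and the two stand-in "megamodules" are hypothetical names chosen for this sketch. The point is only that starting a remote computation and collecting its result become separate steps, so several slow modules can run at the same time.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def route_planner(origin: str, dest: str) -> str:
    """Stand-in for a slow remote megamodule (hypothetical)."""
    time.sleep(0.05)
    return f"{origin}->{dest}"


def cost_estimator(weight: float) -> float:
    """Stand-in for a second slow remote megamodule (hypothetical)."""
    time.sleep(0.05)
    return weight * 2.5


class Megaprogram:
    """Composes modules by separating invocation from result extraction."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=4)

    def invoke(self, module: Callable[..., Any], **params: Any) -> Future:
        # Start the module and return immediately with a handle.
        return self._pool.submit(module, **params)

    def extract(self, handle: Future) -> Any:
        # Block only at the point where the result is actually needed.
        return handle.result()

    def terminate(self) -> None:
        self._pool.shutdown()


mp = Megaprogram()
route_handle = mp.invoke(route_planner, origin="SFO", dest="PEK")
cost_handle = mp.invoke(cost_estimator, weight=4.0)  # runs alongside the first
route = mp.extract(route_handle)
cost = mp.extract(cost_handle)
mp.terminate()
```

With a single blocking call per module, the two delays would add up; with the split, the second module is already running while the first is still working.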

CHAIMS: Overview

[Overview diagram. Labels: Megaprogram for composition, written by domain programmer; CHAIMS system automates generation of client for distributed system; Megamodules, provided by various megamodule providers]

CHAIMS: Architecture

[Architecture diagram. Components: Megaprogrammer, Megaprogram (in CHAIMS language), CHAIMS Compiler, CSRT (compiled megaprogram), Distribution System (CORBA, RMI…), megamodules a to e, CHAIMS Repository, Megamodule Provider, Wrapper Templates. Arrow labels: writes, generates, adds information to, information, wraps non-CHAIMS compliant megamodules]

OntoAgents: Objective

• OntoAgents goal: establish an agent infrastructure on the WWW or WWW-like networks
• Such an agent infrastructure requires an information food chain: every part of the food chain provides information, which enables the existence of the next part.

OntoAgents: Architecture

• Ontology Construction Tool
• Ontology Articulation Toolkit
• Annotated Webpages
• Webpage Annotation Tool
• Ontologies
• Agents
• Metadata Repository
• Inference Engine
• Community Portal
• End User

Events: DB Seminars

Academic Year | Fall | Winter | Spring
2002/2003 | no seminar in the fall quarter | Database Seminar (CS545) | Genome Databases (CS545G)
2001/2002 | Past, Present, and Future of Database Technology | Genome Databases | Database Seminar to come
2000/2001 | Interoperation, Databases and the Semantic Web | Image Databases | Databases and the Semantic Web
1999/2000 | Ontologies, E-Commerce, XML & Metadata | n/a | Ontologies, E-Commerce, XML & Metadata
1998/1999 | Digital Libraries | Image Databases | Internet and Databases
1997/1998 | Data Warehousing | Image Databases | Internet and Databases
1996/1997 | Fall Quarter 96 | Image Databases | Spring Quarter 97

Events: Meetings

• Stanford Computer Science Forum - Annual Affiliates Meeting, Stanford, May 2003.
• SWiM (the Stream Winter Meeting): About 35 researchers in the data streams area came together at Stanford for SWiM, Jan. 2003.
– Stream Team: A few data streams research groups held some informal get-togethers, 2002.
• Conference Talk: ACM SIGMOD/PODS, VLDB, ICDT, ICDE, ICDCS, CIDR

References: WebBase

• Junghoo Cho, Hector Garcia-Molina. "Parallel Crawlers," In Proceedings of the Eleventh World Wide Web Conference, May 2002.
• Taher Haveliwala, Aristides Gionis, et al. "Evaluating Strategies for Similarity Search on the Web," Proceedings of the Eleventh International World Wide Web Conference, May 2002.
• Taher Haveliwala. "Topic-Sensitive PageRank," Proceedings of the Eleventh International World Wide Web Conference, May 2002.

References: STREAM

• R. Motwani, J. Widom, et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. In Proc. of the 2003 Conference on Innovative Data Systems Research (CIDR), January 2003.
• A. Arasu, B. Babcock, et al. STREAM: The Stanford Stream Data Manager. In Proc. of the ACM Intl Conf. on Management of Data (SIGMOD 2003), June 2003.
• B. Babcock, S. Babu, et al. Models and Issues in Data Stream Systems. Invited paper in Proc. of the 2002 ACM Symp. on Principles of Database Systems (PODS 2002), June 2002.

References: Peers

• Neil Daswani, Hector Garcia-Molina and Beverly Yang. Open Problems in Data-Sharing Peer-to-Peer Systems. In ICDT, 2003.
• Hector Garcia-Molina. Peer-To-Peer Data Management. Keynote, in ICDE, 2002.
• Hrishikesh Deshpande, Mayank Bawa, and Hector Garcia-Molina. Streaming Live Media over a Peer-to-Peer Network.

References: TRAPP

• C. Olston and J. Widom. Best-Effort Cache Synchronization with Source Cooperation. ACM SIGMOD 2002 International Conference on Management of Data, Madison, Wisconsin, June 2002, pp. 73-84.
• C. Olston, B. T. Loo and J. Widom. Adaptive Precision Setting for Cached Approximate Values. ACM SIGMOD 2001 International Conference on Management of Data, Santa Barbara, California, May 2001, pp. 355-366.

Useful Links

• Database Group: http://www-db.stanford.edu/
• STREAM: http://www-db.stanford.edu/stream/
• Peers: http://www-db.stanford.edu/peers/
• DigLib: http://www-diglib.stanford.edu/
• TRAPP: http://www-db.stanford.edu/trapp/
• WebBase: http://www-diglib.stanford.edu/~testbed/doc2/WebBase/