DEX presentation for GDM 2011

DEX: A High-Performance Graph Database Management System. Authors: Norbert Martínez-Bazan, Sergio Gómez-Villamor, Francesc Escalé-Claveras

Upload: sergio-gomez-villamor

Post on 12-Apr-2017

Outline

Introduction

DEX Logical graph model

Internal representation

Software architecture

Experimental results

Conclusions

Future work

Introduction

[2006] DEX started by DAMA-UPC

[2010] Sparsity Technologies is a spin-out from DAMA-UPC
• Sparsity handles commercialization and provides services
• DAMA-UPC does development and research

DEX versions
• V2.0 March 2009
• V3.0 October 2009
• V4.0 November 2010

DEX

DEX is a graph database:
• Data and schema are both represented as a graph
• Data operations are based on graph operations
• Graph-based integrity constraints

Renzo Angles and Claudio Gutierrez. 2008. Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008)

Focus:
• Management of very large graphs
• High performance on query operations

Logical graph model

Labeled: nodes and edges are “typed”

Directed: edges can have a fixed direction

Attributed: nodes and edges can have multiple single-valued attributes

Multigraph: two nodes can be connected by multiple edges
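
The four properties above can be illustrated with a minimal in-memory sketch. This is not the DEX API: the class names, the `edges_between` helper, and the sample type and attribute names are hypothetical and exist only to show what a labeled, directed, attributed multigraph allows.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    oid: int
    type: str                                   # labeled: every node has a type
    attrs: dict = field(default_factory=dict)   # attributed: name -> single value


@dataclass
class Edge:
    oid: int
    type: str                                   # labeled: every edge has a type
    tail: int                                   # OID of the source node
    head: int                                   # OID of the target node
    directed: bool = True                       # directed: direction may be fixed
    attrs: dict = field(default_factory=dict)


class Multigraph:
    """Hypothetical container; nodes and edges share one OID counter."""

    def __init__(self):
        self._next_oid = 1
        self.nodes = {}
        self.edges = {}

    def _new_oid(self):
        oid = self._next_oid
        self._next_oid += 1
        return oid

    def add_node(self, type, **attrs):
        node = Node(self._new_oid(), type, attrs)
        self.nodes[node.oid] = node
        return node.oid

    def add_edge(self, type, tail, head, directed=True, **attrs):
        # Multigraph: nothing prevents a second edge between the same pair.
        edge = Edge(self._new_oid(), type, tail, head, directed, attrs)
        self.edges[edge.oid] = edge
        return edge.oid

    def edges_between(self, a, b):
        """OIDs of edges going from a to b (undirected edges match both ways)."""
        found = []
        for e in self.edges.values():
            if (e.tail, e.head) == (a, b):
                found.append(e.oid)
            elif not e.directed and (e.tail, e.head) == (b, a):
                found.append(e.oid)
        return found


g = Multigraph()
actor = g.add_node("PEOPLE", name="Some Actor")
movie = g.add_node("MOVIE", title="Some Movie", year=2010)
g.add_edge("CAST", movie, actor, role="lead")
g.add_edge("CAST", movie, actor, role="narrator")   # parallel edge
```

Here the two CAST edges connect the same pair of nodes and differ only in their attribute values, which is exactly what a simple graph would not permit.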

Internal representation

Requirements

Split the graph into smaller structures
• Favour caching
• Move only the significant parts to main memory

Use OIDs instead of objects
• Reduce memory requirements

Specific structures to improve traversals
• Index the edges of a node

Attributes fully indexed
• Improve queries based on value filters

Internal representation

Our approach: Map + Bitmaps = Link

Link: bidirectional association between values and OIDs

Two functionalities:
• Given a value, return a set of OIDs (a bitmap)
• Given an OID, return its value

[Figure: a Link built from a Map (oid → value) and one Bitmap per value (value → oids), illustrated with values a, b, c and OIDs 1 to 5]
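
A Link as described on this slide can be sketched as follows. This is an illustrative assumption, not the DEXCORE implementation: the class and method names are hypothetical, a Python dict stands in for the Map, and arbitrary-precision integers stand in for the bitmaps. The value-to-OID assignment in the usage lines is made up for the example.

```python
class Link:
    """Bidirectional association between values and OIDs (sketch)."""

    def __init__(self):
        self._value_of = {}   # Map: oid -> value
        self._bitmap_of = {}  # Bitmaps: value -> int whose bit i is set for oid i

    def set(self, oid, value):
        # Clear the bit in the previous value's bitmap, if the OID had a value.
        if oid in self._value_of:
            old = self._value_of[oid]
            self._bitmap_of[old] &= ~(1 << oid)
            if self._bitmap_of[old] == 0:
                del self._bitmap_of[old]
        self._value_of[oid] = value
        self._bitmap_of[value] = self._bitmap_of.get(value, 0) | (1 << oid)

    def value(self, oid):
        """Given an OID, return its value (None if it has none)."""
        return self._value_of.get(oid)

    def oids(self, value):
        """Given a value, return the sorted OIDs whose bit is set."""
        bits = self._bitmap_of.get(value, 0)
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


link = Link()
for oid, value in [(1, "a"), (2, "b"), (3, "b"), (4, "c"), (5, "a")]:
    link.set(oid, value)

print(link.oids("a"))   # OIDs holding value "a"
print(link.value(4))    # value held by OID 4
```

Keeping both directions in one structure means a value filter (value to OIDs) and an attribute lookup (OID to value) are each a single access, at the cost of updating two structures on every write.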

Internal representation

A graph as a combination of bitmaps:
• 1 Bitmap for each node or edge type
• 1 Link for each attribute
• 2 Links for each edge type: outgoing and incoming edges

N. Martínez-Bazán, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M. A. Sánchez-Martínez, and J. Larriba-Pey. Dex: high-performance exploration on large graphs for information retrieval. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)
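
To make the decomposition concrete, here is a small sketch of a graph stored only as per-type OID sets and Links. It is an assumption-laden illustration, not DEX code: all names are hypothetical, Python sets stand in for bitmaps, and the two per-edge-type Links are modelled as "edge OID ↔ tail node" and "edge OID ↔ head node".

```python
from collections import defaultdict


class Link:
    """Minimal two-way association; values are assumed to be set only once."""

    def __init__(self):
        self.value_of = {}                 # oid -> value
        self.oids_of = defaultdict(set)    # value -> set of oids (bitmap stand-in)

    def set(self, oid, value):
        self.value_of[oid] = value
        self.oids_of[value].add(oid)


class BitmapGraph:
    def __init__(self):
        self._next_oid = 1
        self.objects_of_type = defaultdict(set)  # 1 "bitmap" per node or edge type
        self.attribute = defaultdict(Link)       # 1 Link per attribute
        self.tail = defaultdict(Link)            # per edge type: edge <-> tail node
        self.head = defaultdict(Link)            # per edge type: edge <-> head node

    def _new(self, type_name):
        oid = self._next_oid
        self._next_oid += 1
        self.objects_of_type[type_name].add(oid)
        return oid

    def add_node(self, type_name, **attrs):
        oid = self._new(type_name)
        for name, value in attrs.items():
            self.attribute[name].set(oid, value)
        return oid

    def add_edge(self, type_name, tail, head):
        oid = self._new(type_name)
        self.tail[type_name].set(oid, tail)
        self.head[type_name].set(oid, head)
        return oid

    def out_neighbors(self, node, edge_type):
        # Value -> OIDs on the tail Link gives the outgoing edges of the node;
        # OID -> value on the head Link gives the node each edge points to.
        edges = self.tail[edge_type].oids_of.get(node, set())
        return {self.head[edge_type].value_of[e] for e in edges}


g = BitmapGraph()
movie = g.add_node("MOVIE", title="Some Movie")
a1 = g.add_node("PEOPLE", name="Actor One")
a2 = g.add_node("PEOPLE", name="Actor Two")
g.add_edge("CAST", movie, a1)
g.add_edge("CAST", movie, a2)
print(g.out_neighbors(movie, "CAST"))
```

A one-hop traversal therefore reduces to one value lookup on one Link followed by OID lookups on the other, and a type scan is just the contents of a single bitmap.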

Software architecture

DEXCORE
• Complete C++ library: storage and query
• Linux / Windows / Mac OS X
• 32 / 64 bits

JDEX
• DEXCORE functionality provided as a Java library

Software architecture

DEXCORE:

IO
• Segment: logical space of pages
• Pool: groups of segments
• Storage: I/O device
• Cache: I/O management, replacement policy

Data: paged out-of-core structures
• Bitmaps, Maps, Links, …

Graph: a combination of structures
• DbGraph and RGraphs

DEX: database and session management

Software architecture

Implementation details:

37-bit unsigned integer OIDs
• More than 137 billion objects per graph

Bitmaps are compressed
• Clusters of 32 consecutive bits
• Only existing clusters are stored

Groups of OIDs for each type
• Higher density of consecutive bits in the bitmaps

Maps are B+ trees
• Compressed UTF-8 storage for Unicode strings
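
The bitmap compression described above can be sketched as a sparse collection of 32-bit clusters in which all-zero clusters are simply absent. This is a minimal illustration under assumptions: the class name and methods are hypothetical, and a Python dict stands in for whatever paged structure DEX actually uses to locate clusters.

```python
CLUSTER_BITS = 32
MAX_OIDS = 2 ** 37   # 137,438,953,472 distinct 37-bit OIDs


class ClusteredBitmap:
    """Sparse bitmap: only clusters containing at least one set bit are stored."""

    def __init__(self):
        self.clusters = {}   # cluster index -> 32-bit word

    def add(self, oid):
        if not 0 <= oid < MAX_OIDS:
            raise ValueError("OID does not fit in 37 bits")
        index, bit = divmod(oid, CLUSTER_BITS)
        self.clusters[index] = self.clusters.get(index, 0) | (1 << bit)

    def discard(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(index, 0) & ~(1 << bit)
        if word:
            self.clusters[index] = word
        else:
            self.clusters.pop(index, None)   # drop clusters that became empty

    def __contains__(self, oid):
        index, bit = divmod(oid, CLUSTER_BITS)
        return bool((self.clusters.get(index, 0) >> bit) & 1)

    def __iter__(self):
        # Yield the set OIDs in increasing order.
        for index in sorted(self.clusters):
            word = self.clusters[index]
            for bit in range(CLUSTER_BITS):
                if (word >> bit) & 1:
                    yield index * CLUSTER_BITS + bit


bm = ClusteredBitmap()
for oid in range(10):        # ten consecutive OIDs share a single cluster
    bm.add(oid)
bm.add(1_000_000_000)        # a distant OID costs exactly one more cluster
print(len(bm.clusters))
```

The example also shows why grouping the OIDs of each type matters: consecutive OIDs fall into the same cluster, so a dense run of objects of one type occupies few stored words.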

Experimental results

Load tests

                      IMDB      Wikipedia    RMAT (sf=28)
DB size               2.4 GB    7.6 GB       83 GB
Physical memory       9 GB      9 GB         60 GB
Load time             21 min    2 h 6 min    15 h
Nodes                 13 M      19 M         230 M
Edges                 22 M      180 M        2147 M
Values                48 M      283 M        230 M
Insertions per sec    65 K      62 K         48 K
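
The slide does not say how the insertion rate is computed. One plausible reading, which is an assumption here, is that every node, edge, and attribute value counts as one insertion, so the rate is their sum divided by the load time. A quick check with the rounded figures from the table:

```python
# (nodes, edges, values, load time in seconds, reported K insertions per sec)
load_tests = {
    "IMDB":      (13e6, 22e6,   48e6,  21 * 60,           65),
    "Wikipedia": (19e6, 180e6,  283e6, 2 * 3600 + 6 * 60, 62),
    "RMAT":      (230e6, 2147e6, 230e6, 15 * 3600,        48),
}


def insertions_per_second(nodes, edges, values, seconds):
    # Assumption: nodes, edges and values each count as one insertion.
    return (nodes + edges + values) / seconds


estimates = {}
for name, (nodes, edges, values, seconds, reported) in load_tests.items():
    estimates[name] = insertions_per_second(nodes, edges, values, seconds) / 1000
    print(f"{name}: estimated {estimates[name]:.1f} K/s, reported {reported} K/s")
```

Under this assumption the estimates land within a few thousand insertions per second of the reported values for all three datasets, which is about as close as the rounded inputs allow.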

Experimental results

Query tests

IMDB database (2.4 GB)

Queries:
• A: full extraction of a movie, multiple 1-hop traversals [4K edges]
• B: distance between two actors [8K edges]
• C: extract all movies that match a given pattern [315K edges]

Query    In-memory    128 MB
A        0.13 sec     0.13 sec
B        1.52 sec     1.79 sec
C        384 sec      385 sec
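
The slides do not show how query B is implemented. As a generic illustration of what "distance between two actors" means, the sketch below runs a plain breadth-first search over a tiny, made-up actor and movie graph. The function name, the adjacency-dict input, and the sample data are all hypothetical.

```python
from collections import deque


def distance(neighbors, source, target):
    """Number of hops on a shortest path, or None if target is unreachable.

    `neighbors` maps each node to an iterable of adjacent nodes.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Undirected actor-movie graph: actors are linked through the movies they share.
cast = {
    "movie1": ["actorA", "actorB"],
    "movie2": ["actorB", "actorC"],
    "movie3": ["actorD"],
}
neighbors = {}
for movie, actors in cast.items():
    for actor in actors:
        neighbors.setdefault(movie, []).append(actor)
        neighbors.setdefault(actor, []).append(movie)

print(distance(neighbors, "actorA", "actorC"))
```

In this bipartite layout each shared movie contributes two hops, so actors who appeared together are at distance 2, and the search only touches the part of the graph reachable before the target is found.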

Conclusions

We propose DEX, a high-performance graph database querying system for labeled, directed, attributed multigraphs

We propose a graph representation based on the intensive use of bitmaps

We perform an experimental performance analysis to show the ability of DEX to store and query very large graphs

Future work

Trillions of objects

Transactional system

Distributed system

Query language
• Query optimization

High-level graph operations
• Pattern matching

Questions?

[email protected]

Sparsity Technologies http://www.sparsity-technologies.com

DAMA-UPC http://www.dama.upc.edu