b 4 gravty

Upload: line-corporation

Post on 07-Jan-2017

Category:

Technology


TRANSCRIPT

Page 1: B 4 gravty
Page 2: B 4 gravty

1 What Is Gravty?
2 The Internals of Gravty
3 Fine-Tuning Gravty
4 Future Plans

Page 4: B 4 gravty

A Graph Database Is

“A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.” (Wikipedia)

Stores objects (vertices) and relationships (edges)

Provides graph search capabilities

Page 5: B 4 gravty

Vertices and Edges in a Graph Database

Edge labels shown: Friends, Likes

Page 6: B 4 gravty

Use Cases of a Graph Database

Facebook Social Graph

Social networks

Google PageRank

Ranking websites

Walmart and eBay

Product recommendation

Page 7: B 4 gravty

Need for a Large Graph Database System

Social Graph LINE Timeline

LINE Talk Ranking

Recommendation

LINE Friends Shop

LINE News

Gravty

Page 8: B 4 gravty

• 7 billion vertices
• 100 billion edges
• 200 billion indexes
• 5 billion writes a day (create / update / delete)

Page 9: B 4 gravty

Gravty Is

A scalable graph database that finds relational information efficiently by searching a large pool of data with graph search techniques.

Page 10: B 4 gravty

Requirements for Gravty

Easy to scale out

• To support ever-increasing data

Easy to develop

• Add, modify, and remove features as necessary

• Tailored to the LINE development environment

• Not dependent on LINE-specific components

Full control over everything!

Easy to use

• Graph query language

• REST API

Page 11: B 4 gravty

1 What Is Gravty?
2 The Internals of Gravty
  • Technology Stack and Architecture
  • Data Model
3 Fine-Tuning Gravty
4 Future Plans

Page 12: B 4 gravty

Technology Stack and Architecture

Application

TinkerPop3 Gremlin-Console

TinkerPop3 Graph API

Graph Processing Layer

Storage Layer

MySQL (config, meta)

HBase Kafka

Gravty

Page 13: B 4 gravty

MySQL (config, meta)

Kafka

Application

TinkerPop3 Gremlin-Console

TinkerPop 3.2.0 Graph API

Graph Processing Layer (OLTP only)

HBase

Storage Layer

Gravty

Page 14: B 4 gravty

HBase 1.1.x Local Memory Kafka 0.10.0.0 Phoenix 4.8.0

Application

TinkerPop3 Gremlin-Console

TinkerPop3 Graph API

Gravty Storage Layer (Abstract Interface)

Phoenix Repository (Default)

Memory Repository (Standalone)

Graph Processing Layer

Page 15: B 4 gravty

Data Model: Flat-Wide Table

• Row key: vertex-id
• Edges are stored in columns
• Disadvantages: column scan is slow, columns cannot be split

Row          Column
vertex-id1   property property edge edge edge edge edge edge
vertex-id2   …
vertex-id3   …

Page 16: B 4 gravty

Data Model: Tall-Narrow Table (Gravty)

• Row key: edge-id (SrcVertexId-Label-TgtVertexId)
• Edges are stored in rows
• Advantages: more effective edge scan, parallel execution

Row                     Column
svtxid1-label-tvtxid2   edge property, edge property
svtxid1-label-tvtxid3   …

Page 17: B 4 gravty

Flat-Wide vs Tall-Narrow

g.V("brown").out("friends").id().limit(3)

Brown -Friends-> Cony, Moon, Sally

[cony, moon, sally]

Page 18: B 4 gravty

Flat-Wide vs Tall-Narrow: Flat-Wide Model

Row     Column
Brown   edge edge edge edge edge edge ('likes', 'friends')

(1) Row scan
(2) Column scan

2 operations

[cony, moon, sally]

Page 19: B 4 gravty

Flat-Wide vs Tall-Narrow: Tall-Narrow Model (Gravty)

Row
brown-friends-cony
brown-friends-moon
brown-friends-sally

(1) Row scan

1 operation

[cony, moon, sally]

• Can split by rows (region)
• Can isolate hotspot rows
• Can scan in parallel
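The single row scan of the tall-narrow model can be sketched in Python. This is only an illustration, not Gravty code: a sorted list stands in for an HBase table (which keeps rows ordered by key), the helper names `edge_row_key` and `scan_out_edges` are hypothetical, and vertex ids are assumed not to contain the `-` separator.

```python
import bisect

def edge_row_key(src, label, tgt):
    """Compose a tall-narrow row key: SrcVertexId-Label-TgtVertexId."""
    return f"{src}-{label}-{tgt}"

# A sorted list of row keys stands in for a table ordered by row key.
table = sorted([
    edge_row_key("brown", "friends", "cony"),
    edge_row_key("brown", "friends", "moon"),
    edge_row_key("brown", "friends", "sally"),
    edge_row_key("brown", "likes", "cony"),
    edge_row_key("cony", "friends", "brown"),
])

def scan_out_edges(table, src, label, limit):
    """One row scan: seek to the key prefix, then read adjacent rows."""
    prefix = f"{src}-{label}-"
    start = bisect.bisect_left(table, prefix)
    targets = []
    for key in table[start:]:
        # Stop at the first row outside the prefix or once the limit is reached.
        if not key.startswith(prefix) or len(targets) == limit:
            break
        targets.append(key[len(prefix):])
    return targets

result = scan_out_edges(table, "brown", "friends", 3)
print(result)  # ['cony', 'moon', 'sally']
```

Because all edges with the same source and label are adjacent rows, the lookup needs no second pass over columns.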

Page 20: B 4 gravty

Flat-Wide vs Tall-Narrow

g.V("brown").out("friends").out("friends").id().limit(10)

4 searches in total
• Flat-Wide = 8 operations
• Tall-Narrow (Gravty) = 4 operations
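The operation counts follow from simple arithmetic, sketched below. The per-search costs (2 for flat-wide, 1 for tall-narrow) and Brown's three friends come from the slides; the remaining adjacency data and the function name `two_hop_searches` are made up for illustration.

```python
# Adjacency lists; only Brown's friends come from the slides, the rest are hypothetical.
friends = {
    "brown": ["cony", "moon", "sally"],
    "cony": ["brown", "james"],
    "moon": ["brown", "jessica"],
    "sally": ["brown", "edward"],
}

# Operations per vertex lookup: flat-wide needs a row scan plus a column scan,
# tall-narrow needs a single row scan.
OPS_PER_SEARCH = {"flat-wide": 2, "tall-narrow": 1}

def two_hop_searches(start):
    """One search for the start vertex plus one per first-hop neighbor."""
    return 1 + len(friends[start])

searches = two_hop_searches("brown")
ops = {model: searches * cost for model, cost in OPS_PER_SEARCH.items()}
print(searches, ops)  # 4 {'flat-wide': 8, 'tall-narrow': 4}
```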

Page 21: B 4 gravty

1 What Is Gravty?
2 The Internals of Gravty
3 Fine-Tuning Gravty
  • Faster, Compact Querying
  • Avoiding Hot-Spotting
  • Efficient Secondary Indexing
4 Future Plans

Page 22: B 4 gravty

Faster, Compact Querying

g.V(brown).hasLabel("user").out("friends").order().by("name", Order.incr).limit(5)

Reducing graph traversal steps

GraphStep VertexStep FilterStep RangeStep FilterStep

GGraphStep GVertexStep

Page 23: B 4 gravty

Faster, Compact Querying

g.V(brown).outE("friends").limit(5).inV().order().by("name", Order.incr).properties("name")

Querying in parallel and pre-loading vertex properties

inV(): Pipelined iterator from outE()
• TinkerPop: Sequential consuming
• Gravty: Parallel querying + pre-loading vertex property

[Figure: outE("friends") with limit 5, followed by inV(); "name" properties shown: Boss, Edward, Moon, James, Jessica, Cony, Sally]
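The difference between sequential consumption and parallel querying with pre-loaded properties can be sketched as follows. This is a minimal illustration, not Gravty's implementation: the dictionary `VERTICES` stands in for the storage layer, and `load_vertex`, `in_v_sequential`, and `in_v_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical vertex store: vertex id -> properties.
VERTICES = {
    "cony": {"name": "Cony"},
    "moon": {"name": "Moon"},
    "sally": {"name": "Sally"},
    "james": {"name": "James"},
    "edward": {"name": "Edward"},
}

def load_vertex(vertex_id):
    """Stand-in for one storage round trip that returns a vertex with its properties."""
    return dict(VERTICES[vertex_id], id=vertex_id)

def in_v_sequential(edge_targets):
    """Sequential consuming: one lookup after another."""
    return [load_vertex(v) for v in edge_targets]

def in_v_parallel(edge_targets, workers=4):
    """Parallel querying: lookups are issued concurrently and properties arrive pre-loaded."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_vertex, edge_targets))  # map() preserves input order

targets = ["moon", "cony", "sally", "james", "edward"]  # as if produced by outE().limit(5)
loaded = in_v_parallel(targets)
names = sorted(v["name"] for v in loaded)  # order().by("name") needs no further lookups
print(names)  # ['Cony', 'Edward', 'James', 'Moon', 'Sally']
```

Since the properties are already attached to each loaded vertex, the later ordering and `properties("name")` steps work on local data.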

Page 24: B 4 gravty

Avoiding Hot-Spotting

Hot-spotting problem with HBase RegionServer

Row keys that have sequential orders may cause RegionServers to suffer:
• Heavy loads of writes or reads
• Inefficient region splitting

EDGE TABLE

SrcVertexId   Label   TgtVertexId
u000001       1       u000002
u000001       1       u000003
u000002       1       u000001
u000003       1       u000001
u000004       2       u000009

Page 25: B 4 gravty

Avoiding Hot-Spotting

Solutions to the hot-spotting problem
- Pre-splitting regions
- Salting row keys with a hashed prefix (Salting tables by Apache Phoenix)

But there is a scan performance issue with the LIMIT clause:

SELECT * FROM index … LIMIT 100;

Page 26: B 4 gravty

Avoiding Hot-Spotting: Phoenix Salted Table

[Figure: Phoenix Client -> Scan 100 rows in each of 4 salted regions (scan maximum 400 rows) -> Client side merge sort -> Result]
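The cost of LIMIT on a salted table can be sketched as below. This is a simplified model, not Phoenix code: rows are spread round-robin over four buckets (Phoenix derives the salt from a hash of the row key), and the row names are made up.

```python
import heapq
from itertools import islice

SALT_BUCKETS = 4
LIMIT = 100

# Hypothetical index rows: 1,000 sortable keys spread over the salt buckets.
buckets = [[] for _ in range(SALT_BUCKETS)]
for i in range(1000):
    buckets[i % SALT_BUCKETS].append(f"row{i:04d}")

# The first LIMIT rows in global order could sit in any bucket,
# so every bucket is scanned for up to LIMIT rows.
partial_scans = [bucket[:LIMIT] for bucket in buckets]
rows_scanned = sum(len(scan) for scan in partial_scans)

# Client-side merge sort over the sorted partial results, then apply the limit.
result = list(islice(heapq.merge(*partial_scans), LIMIT))

print(rows_scanned, len(result))  # 400 100
```

Three quarters of the scanned rows are discarded after the merge, which is the overhead the slide points at.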

Page 27: B 4 gravty

Avoiding Hot-Spotting: Custom Salting + Pre-splitting

Row Key Prefix: hash(source-vertex-id)

[Figure: Phoenix Client -> Scan 100 rows sequentially -> Result]
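Salting by a hash of the source vertex id keeps all edges of one vertex under the same prefix, so a limited scan reads one contiguous range. The sketch below assumes CRC32 as the hash and four buckets; the slides name neither, and `salt`, `salted_row_key`, and `scan` are hypothetical helpers.

```python
import bisect
import zlib

NUM_BUCKETS = 4  # assumed; regions would be pre-split on these prefixes

def salt(src):
    """Bucket derived only from the source vertex id (CRC32 is an assumption)."""
    return zlib.crc32(src.encode("utf-8")) % NUM_BUCKETS

def salted_row_key(src, label, tgt):
    return f"{salt(src):02d}-{src}-{label}-{tgt}"

# Hypothetical edge table: 5 source vertices with 299 outgoing edges each.
keys = sorted(
    salted_row_key(f"u{s:06d}", 1, f"u{t:06d}")
    for s in range(1, 6)
    for t in range(1, 301)
    if s != t
)

def scan(src, label, limit):
    """One sequential scan: seek to the salted prefix and read up to `limit` rows."""
    prefix = f"{salt(src):02d}-{src}-{label}-"
    start = bisect.bisect_left(keys, prefix)
    return [k for k in keys[start:start + limit] if k.startswith(prefix)]

rows = scan("u000001", 1, 100)
print(len(rows), len({k[:2] for k in rows}))  # 100 1
```

Different source vertices still spread across buckets, so writes are distributed while each vertex's edges stay together.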

Page 28: B 4 gravty

Efficient Secondary Indexing

• Indexed graph view for faster graph search
• Asynchronous index processing using Kafka
• Tools for failure recovery

Page 29: B 4 gravty

Default Phoenix IndexCommitter

[Figure: Phoenix Driver -> Put / Delete -> HRegions; Indexer Coprocessor -> Index update -> HRegions]

• Synchronous processing of index update requests
• Too many connections on each RegionServer (network is heavily congested)

numConnections = regionServers * regionServers * needConnections

Page 30: B 4 gravty

Gravty IndexCommitter

[Figure: Phoenix Driver -> Put / Delete -> HRegions; Indexer Coprocessor -> Mutations -> Kafka -> Indexers -> Index update -> HRegions]

• Asynchronous processing using Kafka

numConnections = indexers * regionServers * needConnections
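The two connection formulas can be compared with a short calculation. The cluster sizes below are hypothetical and chosen only to make the arithmetic easy to follow; they are not LINE's actual numbers.

```python
def default_connections(region_servers, need_connections):
    """Default committer: every RegionServer connects to every RegionServer."""
    return region_servers * region_servers * need_connections

def gravty_connections(indexers, region_servers, need_connections):
    """Kafka-based committer: only the indexers connect to the RegionServers."""
    return indexers * region_servers * need_connections

# Hypothetical sizes, for illustration only.
region_servers, indexers, need_connections = 100, 10, 1

default_total = default_connections(region_servers, need_connections)
gravty_total = gravty_connections(indexers, region_servers, need_connections)
print(default_total, gravty_total)  # 10000 1000
```

The first formula grows quadratically with the number of RegionServers, while the second grows linearly as long as the indexer count stays fixed.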

Page 31: B 4 gravty

Default Phoenix IndexCommitter

[Figure: Primary Table, INDEX 1, and INDEX 2, each on a Region Server with a Phoenix Coprocessor]

1. Phoenix client UPSERT (PUT to the Primary Table)
2. Request HBase mutations for indexes in parallel (PUT / DELETE to INDEX 1 and INDEX 2)
3. Phoenix client returns (RETURN)

Page 32: B 4 gravty

Gravty IndexCommitter

[Figure: Primary Table, INDEX 1, and INDEX 2, each on a Region Server with a Phoenix Coprocessor; Kafka and an Index Consumer sit between the Primary Table and the indexes]

1. PUT
2. HBase mutations for INDEX 1, 2 (sent to Kafka)
3. RETURN
4. Consume (Kafka Index Consumer)
5. PUT / DELETE (to INDEX 1 and INDEX 2)
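The five steps can be sketched with an in-memory queue standing in for Kafka. This is an illustration under stated assumptions, not Gravty code: `primary`, `index_by_name`, `index_log`, `put`, and `consume` are hypothetical names, and a single index on a `name` value is assumed.

```python
from collections import deque

primary = {}         # primary table: row key -> name
index_by_name = {}   # secondary index: name -> set of row keys
index_log = deque()  # stand-in for the Kafka topic carrying index mutations

def put(row_key, name):
    """Steps 1-3: write the primary row, enqueue index mutations, return at once."""
    old = primary.get(row_key)
    primary[row_key] = name
    if old is not None:
        index_log.append(("DELETE", old, row_key))
    index_log.append(("PUT", name, row_key))

def consume():
    """Steps 4-5: the index consumer drains the log and applies PUT / DELETE."""
    while index_log:
        op, name, row_key = index_log.popleft()
        if op == "PUT":
            index_by_name.setdefault(name, set()).add(row_key)
        else:
            index_by_name.get(name, set()).discard(row_key)

put("u1", "Brown")
put("u2", "Cony")
put("u1", "Moon")             # an update queues a DELETE for the stale index entry
before = dict(index_by_name)  # the index is still empty: updates are asynchronous
consume()
print(before, index_by_name["Moon"], index_by_name["Brown"])
```

The writer never waits for index regions, at the price of an index that briefly lags behind the primary table.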

Page 33: B 4 gravty

Secondary Indexing Metrics

Server TPS                            3x
RegionServer number of connections    1/8

Page 34: B 4 gravty

Best-Effort Failover: Fail fast, fix later

• Reentrant event processing: every row is versioned in HBase (timestamp)
• Logging failures and replaying failed requests
• Time machine to resume at certain runtime: resetting runtime offset of Kafka consumers
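Why timestamped rows make replay safe can be sketched as follows. This is a simplified model: the table keeps only the newest version per row (HBase can retain several), a plain list stands in for a Kafka topic, and `apply` and `replay` are hypothetical names.

```python
def apply(table, event):
    """Apply an event idempotently: a cell changes only if the event is not older."""
    row_key, value, timestamp = event
    current = table.get(row_key)
    if current is None or timestamp >= current[1]:
        table[row_key] = (value, timestamp)

def replay(table, log, offset):
    """Reprocess the log from a chosen offset, like resetting a consumer offset."""
    for event in log[offset:]:
        apply(table, event)

# Hypothetical event log: (row key, value, timestamp).
log = [("u1", "Brown", 1), ("u2", "Cony", 2), ("u1", "Moon", 3)]

table = {}
replay(table, log, 0)
snapshot = dict(table)

# Resuming from an earlier offset reprocesses events without corrupting state.
replay(table, log, 0)
replay(table, log, 1)
print(table == snapshot, table["u1"])  # True ('Moon', 3)
```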

Page 35: B 4 gravty

Monitoring Tools for Failure Recovery

Setting alerts and displaying metrics

• Prometheus
• Dropwizard metrics
• jvm_exporter
• Grafana
• Ambari

Page 36: B 4 gravty

1 What Is Gravty?
2 The Internals of Gravty
3 Fine-Tuning Gravty
4 Future Plans

Page 37: B 4 gravty

Multiple Graph Clusters

Before: Client -> Graph API -> Gravty -> HBase Cluster

After: Client -> Graph API -> Gravty -> HBase Cluster, HBase Cluster, HBase Cluster

Page 38: B 4 gravty

HBase Repository Storage Layer

Memory Repository (Standalone)

Phoenix Repository (Default)

HBase Repository

Abstract Interface

HBase, Phoenix, Region Coprocessor, Local Memory

Page 39: B 4 gravty

OLAP Functionality

• Graph analytics system
• Graph computation
• TinkerPop Graph Computing API

Page 40: B 4 gravty

We will open source Gravty

Page 41: B 4 gravty