Page 1: accumulo summit 2015

Accumulo @ Bloomberg

Accumulo Summit 2015

Skand Gupta, Bloomberg LP

Page 2: accumulo summit 2015

Bloomberg

• Bloomberg technology helps drive the world’s financial markets

– We build our own software, digital platforms, mobile applications and state-of-the-art hardware

– We run one of the world’s largest private networks with over 20,000 routers across our network

– We have the largest server-side JavaScript deployment in the world – 22 million lines of JavaScript code

– We developed “cloud computing” and deployed “software as a service” well ahead of the general marketplace

– Our technology has brought transparency to the global financial markets

• Bloomberg technologists

– More than 3,000 software developers and designers located around the world (London, NYC, SF “tech hubs”)

– BloombergLabs.com (@BloombergLabs) is our platform for dialogue between our experts and the broader tech community

• Our clients

– Over 320,000 subscribers

– Primarily financial professionals, including investment bankers, CFOs, investor relations, hedge fund managers, foreign exchange, etc.

Page 5: accumulo summit 2015

Hiding in Plain Sight

Source: Commodity Futures Trading Commission

Page 6: accumulo summit 2015

Compliance Platform and Processing Pipeline

[Diagram: human- and machine-generated data sources (Chat, Email, Social Media, Voice, Trade Data, Reference Data, Customer Data, Product Data, Market Data, Counterparty); data categories: Communication Data, Transactional Data, User Data; components: Surveillance Pipeline, Compliance Storage, Compliance Platform, Case Management; Compliance Officers: Search, Review, Analyze]

Page 7: accumulo summit 2015

Elastic data-processing and analytics stack

[Diagram: Applications, Open REST API (Play), Spark, Kafka, Storm, HDFS, WORM, Mesos (Cluster Resource Manager), Pre-fabricated Hardware]

Page 8: accumulo summit 2015

Need for a robust, scalable, high-performance, geo-distributed data storage and retrieval system

❑ More than 3 petabytes of archived data

❑ 80+ billion indexed objects

❑ Real-time scanning of 35 million objects per day

[Chart: Communication Data Growth (100’s of gigabytes/year) and Cumulative Data Growth (over 3 petabytes today), 2000 to 2012]

[Chart: Storage Cost of Storing 1GB of Data, by List Price, Replication, DR, and Isolation; values shown: $2.31, $1.15, $0.58, $0.19]

Page 9: accumulo summit 2015

Need for Low Level Security Primitives

[Diagram: Document Level Security, Company Level Security, and User Level Security, shown across the Data Store, Data Pipe, and Application]

Page 10: accumulo summit 2015

Security Solutions

• Post-process the query results

– Too slow

– Nasty bugs

• Generate a unique document for each view

– Exponential growth in the number of documents

• Use application-specific features

– Solr dynamic fields, mangled fields

• Accumulo visibility

– Fast, clean, generic
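To make the visibility option concrete: Accumulo stores a boolean visibility expression such as `companyA&(compliance|legal)` with each key and returns an entry only when the scanning user's authorizations satisfy it. In Accumulo itself this is handled by classes such as `ColumnVisibility` and `Authorizations`; the sketch below is not that code. It is a simplified, standalone evaluator (hypothetical class `VisibilitySketch`) assuming bare labels, `&`, `|` and parentheses only (no quoted labels), and, like Accumulo, rejecting `&` and `|` mixed at one level without parentheses. The labels are made up.

```java
import java.util.Set;

/** Hypothetical, simplified evaluator for Accumulo-style visibility expressions. */
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    /** True if the authorizations satisfy the expression; an empty expression is visible to everyone. */
    static boolean evaluate(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true;
        VisibilitySketch p = new VisibilitySketch(expr, auths);
        boolean result = p.parseExpr();
        if (p.pos != expr.length()) throw new IllegalArgumentException("Unexpected character at " + p.pos);
        return result;
    }

    // expr := term ('&' term)* | term ('|' term)*   (one operator kind per nesting level)
    private boolean parseExpr() {
        boolean value = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) throw new IllegalArgumentException("Mixed & and | need parentheses");
            op = c;
            boolean rhs = parseTerm(); // always parse, so syntax errors are still detected
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := '(' expr ')' | label
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean v = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') throw new IllegalArgumentException("Missing ')'");
            pos++;
            return v;
        }
        int start = pos;
        while (pos < expr.length() && isLabelChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Empty label at " + start);
        return auths.contains(expr.substring(start, pos));
    }

    // Approximates Accumulo's unquoted label characters.
    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }

    public static void main(String[] args) {
        Set<String> officer = Set.of("companyA", "compliance");
        System.out.println(evaluate("companyA&(compliance|legal)", officer)); // true
        System.out.println(evaluate("companyB&compliance", officer));          // false
    }
}
```

The point of the design is that the check happens per entry inside the store, so one copy of each record can serve every view instead of post-processing results or duplicating documents.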

Page 11: accumulo summit 2015

Data Model

Row ID                     Value
CompanyA_userX_20150426    <bytes>
CompanyA_userX_20150426    <bytes>
CompanyA_userX_20150427    <bytes>
CompanyA_userX_20150428    <bytes>
CompanyA_userY_20150427    <bytes>
CompanyB_userX_20150428    <bytes>
CompanyB_userX_20150428    <bytes>
CompanyB_userX_20150428    <bytes>
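The row IDs in the Data Model appear to follow a `Company_user_yyyyMMdd` pattern. Because Accumulo keeps rows sorted lexicographically, this layout places each user's records next to each other in date order. A minimal sketch, assuming company and user IDs never contain `_`; `RowKeys.rowId` is a hypothetical helper and a `TreeSet` stands in for the sorted table:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper that builds row IDs in the CompanyA_userX_20150426 layout. */
final class RowKeys {
    // yyyyMMdd is fixed width, so lexicographic order equals chronological order.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    static String rowId(String company, String user, LocalDate day) {
        // Assumes company and user IDs never contain the '_' delimiter.
        return company + "_" + user + "_" + DAY.format(day);
    }

    public static void main(String[] args) {
        // For ASCII row IDs, String ordering matches Accumulo's byte ordering.
        TreeSet<String> table = new TreeSet<>(List.of(
                rowId("CompanyB", "userX", LocalDate.of(2015, 4, 28)),
                rowId("CompanyA", "userY", LocalDate.of(2015, 4, 27)),
                rowId("CompanyA", "userX", LocalDate.of(2015, 4, 28)),
                rowId("CompanyA", "userX", LocalDate.of(2015, 4, 26))));
        table.forEach(System.out::println);
        // CompanyA_userX_20150426
        // CompanyA_userX_20150428
        // CompanyA_userY_20150427
        // CompanyB_userX_20150428
    }
}
```

Putting the company first also keeps each company's data in one contiguous block of rows, which fits the company-level security requirement.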

Page 12: accumulo summit 2015

Find all Communications for a Set of Users for a Date Range

[Diagram: the Application reads the Row ID / Value table from the Data Model slide through a Batch Scanner]
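With that row layout, "all communications for a set of users over a date range" becomes one contiguous row range per user, and a batch scanner can fetch several ranges in parallel. A minimal local sketch, assuming the `Company_user_yyyyMMdd` format; `DateRangeScan`, `RowRange`, and the in-memory `TreeMap` are hypothetical stand-ins for an Accumulo `Range`, `BatchScanner`, and table. Unlike a real batch scanner, which returns entries in no guaranteed order, this returns results in range order. The sample values are made up.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical local stand-in for scanning per-user date ranges (not the Accumulo API). */
final class DateRangeScan {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    /** Inclusive [startRow, endRow] row range. */
    static final class RowRange {
        final String startRow;
        final String endRow;
        RowRange(String startRow, String endRow) { this.startRow = startRow; this.endRow = endRow; }
    }

    /** One contiguous range per user: Company_user_from .. Company_user_to, both ends inclusive. */
    static List<RowRange> rangesFor(String company, List<String> users, LocalDate from, LocalDate to) {
        List<RowRange> ranges = new ArrayList<>();
        for (String user : users) {
            String prefix = company + "_" + user + "_";
            ranges.add(new RowRange(prefix + DAY.format(from), prefix + DAY.format(to)));
        }
        return ranges;
    }

    /** Collects every value stored under a row that falls inside any of the ranges. */
    static List<String> scan(NavigableMap<String, List<String>> table, List<RowRange> ranges) {
        List<String> results = new ArrayList<>();
        for (RowRange r : ranges) {
            for (Map.Entry<String, List<String>> row : table.subMap(r.startRow, true, r.endRow, true).entrySet()) {
                results.addAll(row.getValue());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample table: row ID -> values stored under that row.
        NavigableMap<String, List<String>> table = new TreeMap<>();
        table.put("CompanyA_userX_20150426", List.of("msg1", "msg2"));
        table.put("CompanyA_userX_20150427", List.of("msg3"));
        table.put("CompanyA_userX_20150428", List.of("msg4"));
        table.put("CompanyA_userY_20150427", List.of("msg5"));
        table.put("CompanyB_userX_20150428", List.of("msg6"));
        List<RowRange> ranges = rangesFor("CompanyA", List.of("userX", "userY"),
                LocalDate.of(2015, 4, 26), LocalDate.of(2015, 4, 27));
        System.out.println(scan(table, ranges)); // [msg1, msg2, msg3, msg5]
    }
}
```

Because the date is fixed width and sits after the `_` delimiter, a range for `userX` cannot spill into another user's rows.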

Page 13: accumulo summit 2015

Find all Records with “Libor”

[Diagram: the Application reads the Row ID / Value table from the Data Model slide through a Filter and a Batch Scanner]
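In Accumulo, a filter runs as an iterator on the tablet servers (a custom one typically extends `Filter` and overrides `accept(Key, Value)`), so non-matching entries never reach the application. The sketch below is not that class: it is a hypothetical, standalone `ContainsFilter` that wraps a plain Java iterator and passes through only entries whose value mentions a term. Case-insensitive matching and the sample messages are assumptions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical iterator that keeps only entries whose value contains a term (case-insensitive). */
final class ContainsFilter implements Iterator<Map.Entry<String, String>> {
    private final Iterator<Map.Entry<String, String>> source;
    private final String needle;
    private Map.Entry<String, String> next; // next matching entry, or null when exhausted

    ContainsFilter(Iterator<Map.Entry<String, String>> source, String term) {
        this.source = source;
        this.needle = term.toLowerCase(Locale.ROOT);
        advance();
    }

    // Analogue of a filter's accept(key, value): decide per entry whether to keep it.
    private boolean accept(Map.Entry<String, String> e) {
        return e.getValue().toLowerCase(Locale.ROOT).contains(needle);
    }

    // Skip ahead to the next entry that passes accept().
    private void advance() {
        next = null;
        while (source.hasNext()) {
            Map.Entry<String, String> e = source.next();
            if (accept(e)) { next = e; return; }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public Map.Entry<String, String> next() {
        if (next == null) throw new NoSuchElementException();
        Map.Entry<String, String> current = next;
        advance();
        return current;
    }

    public static void main(String[] args) {
        // Made-up sample rows: row ID -> message text.
        List<Map.Entry<String, String>> rows = List.of(
                Map.entry("CompanyA_userX_20150426", "LIBOR fixing moved again"),
                Map.entry("CompanyA_userX_20150427", "lunch at noon?"),
                Map.entry("CompanyB_userX_20150428", "see the libor submission"));
        new ContainsFilter(rows.iterator(), "Libor")
                .forEachRemaining(e -> System.out.println(e.getKey()));
        // CompanyA_userX_20150426
        // CompanyB_userX_20150428
    }
}
```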

Page 14: accumulo summit 2015

Count Number of Objects that Match a Filter

[Diagram: the Application reads the Row ID / Value table from the Data Model slide through a Filter, a Counting Iterator, and a Batch Scanner]
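Stacking a counting stage on top of the filter means each server sends back a single number rather than every matching record. The slides do not show the Counting Iterator's code, so the following is a hypothetical local sketch (`CountMatches`): a list of lists stands in for tablets, each tablet computes a partial count, and the client sums them. Case-insensitive matching and the sample values are assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: filter + count per tablet, partial counts summed on the client. */
final class CountMatches {
    /** Filter stage: keep entries whose value contains the term (case-insensitive). */
    static boolean accept(Map.Entry<String, String> e, String term) {
        return e.getValue().toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    /** Counting stage: consumes the filtered entries of one tablet and emits a single number. */
    static long countOnTablet(List<Map.Entry<String, String>> tablet, String term) {
        long n = 0;
        for (Map.Entry<String, String> e : tablet) {
            if (accept(e, term)) n++;
        }
        return n;
    }

    /** Client side: one partial count per tablet comes back and is summed. */
    static long totalCount(List<List<Map.Entry<String, String>>> tablets, String term) {
        long total = 0;
        for (List<Map.Entry<String, String>> tablet : tablets) total += countOnTablet(tablet, term);
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample data split across two "tablets".
        List<List<Map.Entry<String, String>>> tablets = List.of(
                List.of(Map.entry("CompanyA_userX_20150426", "LIBOR fixing"),
                        Map.entry("CompanyA_userX_20150427", "lunch")),
                List.of(Map.entry("CompanyB_userX_20150428", "libor submission"),
                        Map.entry("CompanyB_userX_20150428", "Libor again")));
        System.out.println(totalCount(tablets, "Libor")); // 3
    }
}
```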

Page 15: accumulo summit 2015

Scaling Out

[Diagram: Spark Processing runs several Counting Iterator / Filter / Batch Scanner pipelines in parallel over the Row ID / Value table from the Data Model slide, serving the Application]
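The scale-out idea is to run many filter-and-count pipelines at once and combine their partial results. The slide runs these under Spark; this sketch instead uses a plain `java.util.concurrent` thread pool, with a hypothetical `ParallelCount` class and lists of strings standing in for partitions of the table. It illustrates only the split, count in parallel, and sum pattern, not Spark or Accumulo APIs. The sample values are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one filter+count task per partition, run in parallel, then summed. */
final class ParallelCount {
    /** Filter + count over one partition (case-insensitive match is an assumption). */
    static long countPartition(List<String> values, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return values.stream().filter(v -> v.toLowerCase(Locale.ROOT).contains(needle)).count();
    }

    static long parallelCount(List<List<String>> partitions, String term, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<String> partition : partitions) {
                Callable<Long> task = () -> countPartition(partition, term);
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get(); // wait for and sum each partial count
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("LIBOR fixing", "lunch"),
                List.of("libor submission"),
                List.of("quarterly report", "Libor again"));
        System.out.println(parallelCount(partitions, "Libor", 3)); // 3
    }
}
```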

Page 16: accumulo summit 2015

Low Latency Writes using Accumulo ‘File System’

RowID        Family     Qualifier     Value
attach.pdf   chunk      “00001”       <bytes>
attach.pdf   chunk      “00002”       <bytes>
…            …          …             …
attach.pdf   metadata   file_size     <file size>
attach.pdf   metadata   chunk_size    <chunk size>
attach.pdf   metadata   sha256        <checksum>

[Chart: Write Times (ms), HDFS vs. Accumulo File System, axis 0 to 20 ms]
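The table suggests storing a file such as `attach.pdf` as rows in Accumulo: the file name is the row ID, column family `chunk` holds the data under zero-padded sequence qualifiers, and column family `metadata` records `file_size`, `chunk_size`, and a `sha256` checksum. The sketch below only illustrates that layout locally. It uses a hypothetical `Cell` class in place of Accumulo's `Key`/`Value`, assumes 1-based five-digit qualifiers and UTF-8 text for the metadata values, and does not reproduce the write-latency measurement.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the chunk/metadata row layout from the slide. */
final class ChunkedFile {
    /** One cell: row, column family, column qualifier, value. */
    static final class Cell {
        final String row, family, qualifier;
        final byte[] value;
        Cell(String row, String family, String qualifier, byte[] value) {
            this.row = row; this.family = family; this.qualifier = qualifier; this.value = value;
        }
    }

    static String sha256Hex(byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java runtime must provide SHA-256
        }
    }

    /** Split a file into chunk cells plus metadata cells. */
    static List<Cell> toCells(String name, byte[] data, int chunkSize) {
        List<Cell> cells = new ArrayList<>();
        for (int off = 0, i = 1; off < data.length; off += chunkSize, i++) {
            byte[] chunk = Arrays.copyOfRange(data, off, Math.min(off + chunkSize, data.length));
            // Zero-padded qualifiers sort lexicographically in chunk order (up to 99,999 chunks).
            cells.add(new Cell(name, "chunk", String.format("%05d", i), chunk));
        }
        cells.add(new Cell(name, "metadata", "file_size", utf8(Long.toString(data.length))));
        cells.add(new Cell(name, "metadata", "chunk_size", utf8(Integer.toString(chunkSize))));
        cells.add(new Cell(name, "metadata", "sha256", utf8(sha256Hex(data))));
        return cells;
    }

    /** Reassemble chunks in qualifier order and verify them against the stored checksum. */
    static byte[] fromCells(List<Cell> cells) {
        String expected = null;
        List<Cell> chunks = new ArrayList<>();
        for (Cell c : cells) {
            if (c.family.equals("chunk")) chunks.add(c);
            else if (c.qualifier.equals("sha256")) expected = new String(c.value, StandardCharsets.UTF_8);
        }
        chunks.sort((a, b) -> a.qualifier.compareTo(b.qualifier));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Cell c : chunks) out.write(c.value, 0, c.value.length);
        byte[] data = out.toByteArray();
        if (expected == null || !expected.equals(sha256Hex(data))) throw new IllegalStateException("checksum mismatch");
        return data;
    }

    private static byte[] utf8(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        byte[] file = "pretend this is attach.pdf".getBytes(StandardCharsets.UTF_8);
        List<Cell> cells = toCells("attach.pdf", file, 8);
        for (Cell c : cells) System.out.println(c.row + " " + c.family + " " + c.qualifier);
        System.out.println(Arrays.equals(file, fromCells(cells))); // true
    }
}
```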

Page 17: accumulo summit 2015

Conclusion

• Understand the data

• Free your data… but enforce access control

• Need sensible systems that help achieve these goals

Thank You!

Page 18: accumulo summit 2015

http://careers.bloomberg.com

We Are Hiring!

