Accumulo Summit 2015

Post on 19-Jul-2015

  • Accumulo @ Bloomberg

    Accumulo Summit 2015

    Skand Gupta, Bloomberg LP

  • Bloomberg

    Bloomberg technology helps drive the world's financial markets

    We build our own software, digital platforms, mobile applications and state-of-the-art hardware

    We run one of the world's largest private networks, with over 20,000 routers

    We have the largest server-side JavaScript deployment in the world: 22 million lines of JavaScript code

    We developed cloud computing and deployed software as a service well ahead of the general marketplace

    Our technology has brought transparency to the global financial markets

    Bloomberg technologists

    More than 3,000 software developers and designers located around the world (London, NYC, SF tech hubs)

    BloombergLabs.com (@BloombergLabs) is our platform for dialogue between our experts and the broader tech community

    Our clients

    Over 320,000 subscribers

    Primarily financial professionals, including investment bankers, CFOs, investor relations, hedge fund managers, foreign exchange, etc.

  • Source: Wall Street Journal, CFTC, New York Times, Marketplace.org

  • Source: Wall Street Journal, CFTC, New York Times

    Importance of Compliance

  • Source: Commodity Futures Trading Commission

    Hiding in Plain Sight

  • Compliance Platform and Processing Pipeline

    Chat

    Reference Data

    Trade Data

    Customer Data

    Product Data

    Market Data

    Counterparty

    Email

    Social Media

    Voice

    Human- and Machine-generated Data

    Surveillance Pipeline

    Communication Data

    Transactional Data

    User Data

    Case Management

    Compliance Platform

    Compliance Storage

    Compliance Officers

    Search, Review, Analyze

  • HDFS

    Spark

    Kafka

    Storm

    Mesos (Cluster Resource Manager)

    Elastic data-processing and analytics stack

    Open REST API (Play)

    WORM

    Pre-fabricated Hardware

    Applications

  • Need for a robust, scalable, high-performance, geo-distributed data storage and retrieval system

    More than 3 petabytes of archived data

    80+ billion indexed objects

    Real-time scanning of 35 million objects per day

    [Figure: Communication Data Growth (100s of gigabytes/year) and Cumulative Data Growth (over 3 petabytes today), 2000 to 2012]

    [Figure: Storage Cost, Storing 1GB of Data; categories: List Price, Replication, DR, Isolation; values shown: $2.31, $1.15, $0.58, $0.19; axis from $0.00 to $3.00]

  • Need for Low Level Security Primitives

    Document Level Security

    Company Level Security

    Data Store

    Data Pipe

    Application

    User Level Security

    Data Store

  • Security Solutions

    Post-process the queries: too slow, nasty bugs

    Generate a unique document for each view: exponential growth in the number of documents

    Use application-specific features: Solr dynamic fields, mangled fields

    Accumulo visibility: fast, clean, generic
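
To make the last option concrete, here is a minimal sketch of how a cell-level visibility label can be checked against a user's authorizations. It is a simplified model written for illustration, not Accumulo's own implementation: the function names `tokenize` and `is_visible`, and the labels `CompanyA`, `compliance` and `admin`, are assumptions. It follows the general shape of Accumulo visibility expressions (labels combined with `&` and `|`, with parentheses required when the two operators are mixed).

```python
import re

def tokenize(expr):
    """Split a visibility expression into labels and the operators & | ( )."""
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)
    if "".join(tokens) != expr.replace(" ", ""):
        raise ValueError(f"unexpected character in {expr!r}")
    return tokens

def is_visible(expr, authorizations):
    """Return True if a user holding `authorizations` may read a cell labeled `expr`."""
    if expr.strip() == "":
        return True  # an empty label means the cell is visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in authorizations  # a bare label: does the user hold it?

    def parse_expr():
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError("unbalanced expression")
    return value

# Hypothetical label: the company tag AND one of two roles.
label = "CompanyA&(compliance|admin)"
print(is_visible(label, {"CompanyA", "compliance"}))
print(is_visible(label, {"CompanyB", "compliance"}))
```

Because the check depends only on the label and the user's authorizations, the same mechanism can express document-, company- and user-level rules without a separate copy of the document per view.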

  • Data Model

    Row ID Value

    CompanyA_userX_20150426

    CompanyA_userX_20150426

    CompanyA_userX_20150427

    CompanyA_userX_20150428

    CompanyA_userY_20150427

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    CompanyB_userX_20150428
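
The row IDs above appear to concatenate company, user and date. A minimal sketch of why that layout is convenient in a store that keeps rows sorted by key: lexicographic order groups entries by company, then user, then day. The helper name `row_id` is an assumption made for illustration.

```python
def row_id(company, user, day):
    """Build a row ID in the company_user_YYYYMMDD shape seen on the slide."""
    return f"{company}_{user}_{day}"

unsorted_rows = [
    row_id("CompanyB", "userX", "20150428"),
    row_id("CompanyA", "userY", "20150427"),
    row_id("CompanyA", "userX", "20150428"),
    row_id("CompanyA", "userX", "20150426"),
    row_id("CompanyA", "userX", "20150427"),
]

# Plain string sorting is enough: YYYYMMDD dates sort chronologically as text,
# so one user's communications end up contiguous and in date order.
for row in sorted(unsorted_rows):
    print(row)
```

One consequence of this design is that a date range for a single user maps to a single contiguous key range.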

  • Find all Communications for a Set of Users for a Date Range

    Row ID Value

    CompanyA_userX_20150426

    CompanyA_userX_20150426

    CompanyA_userX_20150427

    CompanyA_userX_20150428

    CompanyA_userY_20150427

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    Batch Scanner

    Application
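
As a rough illustration of this query, the sketch below builds one key range per user and looks each range up in a sorted list of row IDs. It only simulates the idea in plain Python; `ranges_for` and `batch_scan` are hypothetical helpers, not Accumulo calls. A real BatchScanner fetches many ranges in parallel and makes no ordering guarantee on the results.

```python
from bisect import bisect_left, bisect_right

# Row IDs as listed on the slide, already in sorted order.
ROWS = [
    "CompanyA_userX_20150426",
    "CompanyA_userX_20150426",
    "CompanyA_userX_20150427",
    "CompanyA_userX_20150428",
    "CompanyA_userY_20150427",
    "CompanyB_userX_20150428",
    "CompanyB_userX_20150428",
    "CompanyB_userX_20150428",
]

def ranges_for(company, users, first_day, last_day):
    """One inclusive (start, end) row range per user for the date window."""
    return [
        (f"{company}_{user}_{first_day}", f"{company}_{user}_{last_day}")
        for user in users
    ]

def batch_scan(sorted_rows, ranges):
    """Collect every row that falls inside any of the given ranges."""
    hits = []
    for start, end in ranges:
        lo = bisect_left(sorted_rows, start)
        hi = bisect_right(sorted_rows, end)
        hits.extend(sorted_rows[lo:hi])
    return hits

ranges = ranges_for("CompanyA", ["userX", "userY"], "20150427", "20150428")
for row in batch_scan(ROWS, ranges):
    print(row)
```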

  • Find all Records with Libor

    Filter

    Row ID Value

    CompanyA_userX_20150426

    CompanyA_userX_20150426

    CompanyA_userX_20150427

    CompanyA_userX_20150428

    CompanyA_userY_20150427

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    Batch Scanner

    Application
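
The filter in this slide can be thought of as a predicate applied to each entry before it leaves the storage layer. The sketch below models that with a Python generator. The message values are made up for illustration (the slide's Value column did not survive extraction), and `libor_filter` is a hypothetical name, not an Accumulo class.

```python
# Hypothetical (row ID, value) entries; the values are invented sample text.
ENTRIES = [
    ("CompanyA_userX_20150426", "Lunch at noon?"),
    ("CompanyA_userX_20150427", "Where is LIBOR fixing today?"),
    ("CompanyA_userY_20150427", "Quarterly report attached"),
    ("CompanyB_userX_20150428", "Can you nudge the libor submission"),
]

def libor_filter(entries, term="libor"):
    """Yield only the entries whose value mentions the term, ignoring case."""
    for row, value in entries:
        if term in value.lower():
            yield row, value

for row, value in libor_filter(ENTRIES):
    print(row, "->", value)
```

Running the predicate next to the data means only matching entries need to travel back to the application.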

  • Count Number of Objects that Match a Filter

    CountingIterator Filter

    Row ID Value

    CompanyA_userX_20150426

    CompanyA_userX_20150426

    CompanyA_userX_20150427

    CompanyA_userX_20150428

    CompanyA_userY_20150427

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    Batch Scanner

    Application
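
The slide stacks a counting step on top of the filter, so that a single number is returned instead of the matching entries. A minimal sketch of that composition, using invented sample values and hypothetical function names (`matches` and `count_matching` are not Accumulo APIs):

```python
# Hypothetical (row ID, value) entries; the values are invented sample text.
ENTRIES = [
    ("CompanyA_userX_20150426", "Lunch at noon?"),
    ("CompanyA_userX_20150427", "Where is LIBOR fixing today?"),
    ("CompanyA_userY_20150427", "Quarterly report attached"),
    ("CompanyB_userX_20150428", "Can you nudge the libor submission"),
    ("CompanyB_userX_20150428", "LIBOR again, call me"),
]

def matches(entries, term):
    """Filter stage: pass through entries whose value contains the term."""
    return (entry for entry in entries if term in entry[1].lower())

def count_matching(entries, term):
    """Counting stage: consume the filtered stream and keep only a tally."""
    return sum(1 for _ in matches(entries, term))

print(count_matching(ENTRIES, "libor"))
```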

  • Scaling Out

    Application

    Row ID Value

    CompanyA_userX_20150426

    CompanyA_userX_20150426

    CompanyA_userX_20150427

    CompanyA_userX_20150428

    CompanyA_userY_20150427

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    CompanyB_userX_20150428

    CountingIterator Filter

    Batch Scanner

    CountingIterator Filter

    Batch Scanner

    CountingIterator Filter

    Batch Scanner

    Spark Processing
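
The diagram shows the same filter and counting stack repeated three times, which suggests each partition of the table computes a partial result that is combined afterwards. The sketch below imitates that with three in-memory partitions and a thread pool. The partitioning, the sample values and the names `count_on_tablet` and `distributed_count` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Three hypothetical partitions of the table, each holding (row ID, value) pairs.
TABLETS = [
    [
        ("CompanyA_userX_20150426", "Lunch at noon?"),
        ("CompanyA_userX_20150427", "Where is LIBOR fixing today?"),
    ],
    [
        ("CompanyA_userX_20150428", "libor looks high"),
        ("CompanyA_userY_20150427", "Quarterly report attached"),
    ],
    [
        ("CompanyB_userX_20150428", "Can you nudge the libor submission"),
        ("CompanyB_userX_20150428", "See you tomorrow"),
    ],
]

def count_on_tablet(entries, term):
    """Filter and count locally, so only one integer leaves each partition."""
    return sum(1 for _row, value in entries if term in value.lower())

def distributed_count(tablets, term):
    """Run the local counts concurrently and add up the partial results."""
    with ThreadPoolExecutor(max_workers=len(tablets)) as pool:
        partial_counts = list(pool.map(lambda t: count_on_tablet(t, term), tablets))
    return sum(partial_counts)

print(distributed_count(TABLETS, "libor"))
```

Since counts are additive, the combining step stays trivial no matter how many partitions are added.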

  • Low Latency Writes using Accumulo File System

    RowID       Family    Qualifier   Value
    attach.pdf  chunk     00001
    attach.pdf  chunk     00002
    attach.pdf  metadata  file_size
    attach.pdf  metadata  chunk_size
    attach.pdf  metadata  sha256

    [Figure: Write Times (ms), axis 0 to 20, comparing HDFS and Accumulo File System]
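
The table on this slide suggests a file is stored as numbered chunk entries plus a few metadata entries. Here is a minimal sketch of that layout, assuming zero-padded chunk numbers as qualifiers and a SHA-256 digest over the whole file. The functions `file_to_entries` and `entries_to_file` are hypothetical and only model the data shape; they do not write to Accumulo.

```python
import hashlib

def file_to_entries(name, data, chunk_size):
    """Split a file into (row, family, qualifier, value) entries like the slide's table."""
    entries = []
    offsets = range(0, len(data), chunk_size)
    for index, offset in enumerate(offsets, start=1):
        chunk = data[offset:offset + chunk_size]
        entries.append((name, "chunk", f"{index:05d}", chunk))
    entries.append((name, "metadata", "file_size", str(len(data)).encode()))
    entries.append((name, "metadata", "chunk_size", str(chunk_size).encode()))
    entries.append((name, "metadata", "sha256", hashlib.sha256(data).hexdigest().encode()))
    return entries

def entries_to_file(entries):
    """Reassemble the chunks in qualifier order and verify the stored digest."""
    chunks = sorted((q, v) for _row, fam, q, v in entries if fam == "chunk")
    data = b"".join(value for _q, value in chunks)
    meta = {q: v.decode() for _row, fam, q, v in entries if fam == "metadata"}
    if hashlib.sha256(data).hexdigest() != meta["sha256"]:
        raise ValueError("checksum mismatch")
    return data

payload = b"0123456789" * 3  # 30 bytes of stand-in file content
entries = file_to_entries("attach.pdf", payload, chunk_size=16)
for row, family, qualifier, value in entries:
    print(row, family, qualifier, len(value))
```

Zero-padding the chunk number keeps the chunks in the right order under plain string sorting, which is how a sorted key-value store would return them.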

  • Conclusion

    Understand the data

    Free your data but enforce access control

    Need sensible systems that help achieve these goals

    Thank You!

  • http://careers.bloomberg.com [email protected]

    We Are Hiring!