Accumulo Summit 2015: Apache Accumulo at Bloomberg [Keynote]
TRANSCRIPT
-
Accumulo @ Bloomberg
Accumulo Summit 2015
Skand Gupta, Bloomberg LP
-
Bloomberg
Bloomberg technology helps drive the world's financial markets
We build our own software, digital platforms, mobile applications and state-of-the-art hardware
We run one of the world's largest private networks, with over 20,000 routers
We have the largest server-side JavaScript deployment in the world: 22 million lines of JavaScript code
We developed cloud computing and deployed software as a service well ahead of the general marketplace
Our technology has brought transparency to the global financial markets
Bloomberg technologists
More than 3,000 software developers and designers located around the world (London, NYC, SF tech hubs)
BloombergLabs.com (@BloombergLabs) is our platform for dialogue between our experts and the broader tech community
Our clients
Over 320,000 subscribers
Primarily financial professionals, including investment bankers, CFOs, investor relations, hedge fund managers, foreign exchange, etc.
-
Source: Wall Street Journal, CFTC, New York Times, Marketplace.org
-
Source: Wall Street Journal, CFTC, New York Times
Importance of Compliance
-
Source: Commodity Futures Trading Commission
Hiding in Plain Sight
-
Compliance Platform and Processing Pipeline
Chat
Reference Data
Trade Data
Customer Data
Product Data
Market Data
Counterparty
Email
Social Media
Voice
Human- and Machine-generated Data
Surveillance Pipeline
Communication Data
Transactional Data
User Data
Case Management
Compliance Platform
Compliance Storage
Compliance Officers
Search, Review, Analyze
-
HDFS
Spark
Kafka
Storm
Mesos (Cluster Resource Manager)
Elastic data-processing and analytics stack
Open REST API (Play)
WORM
Pre-fabricated Hardware
Applications
-
Need for a robust, scalable, high-performance, geo-distributed data storage and retrieval system
More than 3 petabytes of archived data
80+ billion indexed objects
Real-time scanning of 35 million objects per day
[Charts: Communication Data Growth (100s of gigabytes/year); Cumulative Data Growth (over 3 petabytes today)]
[Chart: Storing 1GB of Data; dollar axis $0.00 to $3.00; labels: List Price, Replication, DR, Isolation; values shown: $2.31, $1.15, $0.58, $0.19]
[Chart: Storage Cost; year axis 2000 to 2012]
-
Need for Low Level Security Primitives
Document Level Security
Company Level Security
Data Store
Data Pipe
Application
User Level Security
Data Store
-
Security Solutions
Post-process the queries: too slow, nasty bugs
Generate a unique document for each view: exponential growth in the number of documents
Use application-specific features: Solr dynamic fields, mangled fields
Accumulo visibility: fast, clean, generic
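Accumulo attaches a visibility expression to each key-value pair, and a scan returns only the entries whose expression is satisfied by the reader's authorizations. The following is a toy evaluator written for illustration, not Accumulo's own implementation. The label names (companyA, compliance, admin) are hypothetical, and the evaluator gives & precedence over | for simplicity, whereas Accumulo expects parentheses when the two operators are mixed.

```python
import re

def tokenize(expression):
    # Labels are runs of word characters; operators are &, | and parentheses.
    return re.findall(r"\w+|[&|()]", expression)

def is_visible(expression, authorizations):
    """Return True if a reader holding `authorizations` satisfies `expression`."""
    tokens = tokenize(expression)
    if not tokens:
        return True  # an empty expression places no restriction on the entry
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_term()
            result = result and right
        return result

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing parenthesis
            return result
        return token in authorizations

    return parse_or()

# Hypothetical labels: a compliance officer at company A versus an ordinary user.
officer = {"companyA", "compliance"}
user = {"companyA"}
officer_sees = is_visible("companyA&compliance", officer)
user_sees = is_visible("companyA&compliance", user)
```

Because the check is applied per entry inside the data store, the application does not have to post-process query results or keep one copy of a document per view.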
-
Data Model
Row ID Value
CompanyA_userX_20150426
CompanyA_userX_20150426
CompanyA_userX_20150427
CompanyA_userX_20150428
CompanyA_userY_20150427
CompanyB_userX_20150428
CompanyB_userX_20150428
CompanyB_userX_20150428
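The row IDs on this slide follow a company, user, date pattern. Accumulo keeps keys in lexicographic order, so a zero-padded YYYYMMDD suffix makes each user's records sort chronologically within a company. A minimal sketch of that key construction follows; the helper name make_row_id is an assumption made for illustration.

```python
from datetime import date

def make_row_id(company, user, day):
    """Build a row ID shaped like the slide's examples: Company_user_YYYYMMDD."""
    return f"{company}_{user}_{day.strftime('%Y%m%d')}"

# Insert in arbitrary order; sorting reproduces the layout shown on the slide.
rows = sorted([
    make_row_id("CompanyB", "userX", date(2015, 4, 28)),
    make_row_id("CompanyA", "userY", date(2015, 4, 27)),
    make_row_id("CompanyA", "userX", date(2015, 4, 28)),
    make_row_id("CompanyA", "userX", date(2015, 4, 26)),
    make_row_id("CompanyA", "userX", date(2015, 4, 27)),
])
```

With this layout, all records for one company are contiguous, and within a company all records for one user are contiguous and ordered by day.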
-
Find all Communications for a Set of Users for a Date Range
Row ID Value
CompanyA_userX_20150426
CompanyA_userX_20150426
CompanyA_userX_20150427
CompanyA_userX_20150428
CompanyA_userY_20150427
CompanyB_userX_20150428
CompanyB_userX_20150428
CompanyB_userX_20150428
Batch Scanner
Application
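With the company_user_date row layout, a query for several users over a date range becomes one key range per user. The sketch below models the table as a sorted Python list and uses binary search to stand in for a multi-range scan. It is not the Accumulo client API, the values (msg-1 and so on) are hypothetical, and a real batch scanner does not guarantee result order.

```python
from bisect import bisect_left, bisect_right

# Sorted (row_id, value) pairs; row IDs are from the slide, values are made up.
TABLE = sorted([
    ("CompanyA_userX_20150426", "msg-1"),
    ("CompanyA_userX_20150426", "msg-2"),
    ("CompanyA_userX_20150427", "msg-3"),
    ("CompanyA_userX_20150428", "msg-4"),
    ("CompanyA_userY_20150427", "msg-5"),
    ("CompanyB_userX_20150428", "msg-6"),
    ("CompanyB_userX_20150428", "msg-7"),
    ("CompanyB_userX_20150428", "msg-8"),
])

def ranges_for(company, users, start_day, end_day):
    """One inclusive (start_row, end_row) range per user."""
    return [(f"{company}_{u}_{start_day}", f"{company}_{u}_{end_day}") for u in users]

def batch_scan(table, ranges):
    """Return every entry whose row ID falls inside any of the ranges."""
    row_ids = [row for row, _ in table]
    results = []
    for start, end in ranges:
        lo = bisect_left(row_ids, start)
        hi = bisect_right(row_ids, end)
        results.extend(table[lo:hi])
    return results

hits = batch_scan(TABLE, ranges_for("CompanyA", ["userX", "userY"], "20150427", "20150428"))
```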
-
Find all Records with Libor
Filter
Row ID Value
CompanyA_userX_20150426
CompanyA_userX_20150426
CompanyA_userX_20150427
CompanyA_userX_20150428
CompanyA_userY_20150427
CompanyB_userX_20150428
CompanyB_userX_20150428
CompanyB_userX_20150428
Batch Scanner
Application
-
Count Number of Objects that Match a Filter
CountingIterator Filter
Row ID Value
CompanyA_userX_20150426
CompanyA_userX_20150426
CompanyA_userX_20150427
CompanyA_userX_20150428
CompanyA_userY_20150427
CompanyB_userX_20150428
CompanyB_userX_20150428
CompanyB_userX_20150428
Batch Scanner
Application
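The Filter and CountingIterator boxes on these two slides sit between the table and the batch scanner. In Accumulo, iterators run on the tablet servers, so only matching entries (or just a count) travel back to the application. The sketch below imitates that stacking with plain Python generators. It does not use Accumulo's iterator API, and the message texts are hypothetical.

```python
def keyword_filter(entries, keyword):
    """Yield only entries whose value mentions `keyword` (case-insensitive)."""
    needle = keyword.lower()
    for row_id, value in entries:
        if needle in value.lower():
            yield row_id, value

def counting_iterator(entries):
    """Consume the stream and return a single count instead of the entries."""
    return sum(1 for _ in entries)

# Row IDs are from the slide; the message texts are invented for illustration.
entries = [
    ("CompanyA_userX_20150426", "lunch at noon?"),
    ("CompanyA_userX_20150427", "where will LIBOR fix today"),
    ("CompanyA_userY_20150427", "quarterly report attached"),
    ("CompanyB_userX_20150428", "libor submission sent"),
]

matches = list(keyword_filter(entries, "Libor"))
match_count = counting_iterator(keyword_filter(entries, "Libor"))
```

Stacking the counter on top of the filter means the caller receives one number rather than every matching record.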
-
Scaling Out
Application
Row ID Value
CompanyA_userX_20150426
CompanyA_userX_20150426
CompanyA_userX_20150427
CompanyA_userX_20150428
CompanyA_userY_20150427
CompanyB_userX_20150428
CompanyB_userX_20150428
CompanyB_userX_20150428
CountingIterator Filter
Batch Scanner
CountingIterator Filter
Batch Scanner
CountingIterator Filter
Batch Scanner
Spark Processing
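The diagram shows several filter, count and scanner stacks running side by side and feeding Spark. A rough sketch of the same idea follows: split the key space into contiguous partitions, count matches in each partition independently, then add up the partial counts. A thread pool stands in for Spark workers here, and the partitioning helper and sample values are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(partition, keyword):
    """Filter and count within one partition, as one scanner stack would."""
    needle = keyword.lower()
    return sum(1 for _, value in partition if needle in value.lower())

def split_into_partitions(entries, n):
    """Split a sorted list into contiguous slices, like tablets covering key ranges."""
    size = max(1, -(-len(entries) // n))  # ceiling division, at least 1
    return [entries[i:i + size] for i in range(0, len(entries), size)]

def parallel_count(entries, keyword, workers=3):
    """Count matches per partition in parallel and sum the partial counts."""
    partitions = split_into_partitions(entries, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda p: count_matches(p, keyword), partitions)
        return sum(partial_counts)

# Row IDs are from the slide; the values are invented for illustration.
entries = [
    ("CompanyA_userX_20150426", "libor chat"),
    ("CompanyA_userX_20150426", "hello"),
    ("CompanyA_userX_20150427", "LIBOR fix"),
    ("CompanyA_userX_20150428", "trade ticket"),
    ("CompanyA_userY_20150427", "meeting notes"),
    ("CompanyB_userX_20150428", "Libor again"),
    ("CompanyB_userX_20150428", "status update"),
    ("CompanyB_userX_20150428", "goodbye"),
]

total = parallel_count(entries, "libor")
```

Only the small partial counts cross partition boundaries, which is what lets the approach scale with the number of partitions.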
-
Low Latency Writes using Accumulo File System
RowID       Family    Qualifier   Value
attach.pdf  chunk     00001
attach.pdf  chunk     00002
attach.pdf  metadata  file_size
attach.pdf  metadata  chunk_size
attach.pdf  metadata  sha256
[Chart: Write Times (ms), axis 0 to 20, comparing HDFS and Accumulo File System]
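The table on this slide stores a file as numbered chunk entries plus metadata entries under a single row ID. The slides do not show the implementation, so the sketch below is only one possible reading of that layout: the function names are hypothetical, and encoding the metadata values as text is an assumption.

```python
import hashlib

def file_to_entries(name, data, chunk_size):
    """Return (row_id, family, qualifier, value) tuples for one file."""
    entries = []
    for index, offset in enumerate(range(0, len(data), chunk_size), start=1):
        # Zero-padded qualifiers keep chunks in order under lexicographic sorting.
        entries.append((name, "chunk", f"{index:05d}", data[offset:offset + chunk_size]))
    entries.append((name, "metadata", "file_size", str(len(data)).encode()))
    entries.append((name, "metadata", "chunk_size", str(chunk_size).encode()))
    entries.append((name, "metadata", "sha256", hashlib.sha256(data).hexdigest().encode()))
    return entries

def entries_to_file(entries):
    """Reassemble the file from its chunk entries and verify the stored digest."""
    chunks = sorted((q, v) for _, fam, q, v in entries if fam == "chunk")
    data = b"".join(v for _, v in chunks)
    meta = {q: v for _, fam, q, v in entries if fam == "metadata"}
    if hashlib.sha256(data).hexdigest().encode() != meta["sha256"]:
        raise ValueError("checksum mismatch")
    return data

payload = b"x" * 2500
stored = file_to_entries("attach.pdf", payload, chunk_size=1000)
restored = entries_to_file(stored)
```

Keeping the chunk size and digest alongside the chunks lets a reader check that it has retrieved the complete file.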
-
Conclusion
Understand the data
Free your data but enforce access control
Need sensible systems that help achieve these goals
Thank You!
-
http://careers.bloomberg.com [email protected]
We Are Hiring!