Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

Post on 15-Jan-2015

Category:

Technology

DESCRIPTION

Apache Accumulo, originally developed by the National Security Agency and now an Apache Software Foundation project, builds upon Google's Bigtable design to provide a scalable, lightly-structured database capability complementing the ubiquitous Hadoop environment. The core capabilities of Accumulo include cell-level security, flexible schemas, real-time analytics, bulk I/O, and linear scalability beyond trillions of entries and petabytes of data. These new capabilities lead to techniques that unlock the power of Big Data, but don't fit into traditional database design patterns. Learn about the advantages of Apache Accumulo and how it fits into the Hadoop and NoSQL ecosystem. Presenter: Adam Fuchs, CTO, sqrrl

TRANSCRIPT

  • 1. sqrrl data, INC. Secure. Scale. Adapt. Adam Fuchs, Chief Technology Officer. info@sqrrl.com | @sqrrl_inc | 617.520.4375. sqrrl data, INC., All Rights Reserved

  • 2. Who We Are
    The commercial provider of:
      Mature Database Technology: Apache Accumulo
      Fine-Grained Access Controls: Data Integration and Sharing
      Proven Performance: Petabytes and Beyond
      Advanced Analytics: Search, Statistics, and Graphs

  • 3. Contents
      Core Philosophy
      Technology
      Techniques
      Application APIs

  • 4. Apache Accumulo Perspective
    [Diagram labels: Data (x3), Application (x3)]
    Integration across:
      Multiple business lines
      Multiple data sets
      Multiple applications
      Multiple security, privacy, legal, policy, regulatory, and compliance constraints
      New demands

  • 5. Accumulo Design Drivers
    1. Cell-Level Security
      Express common security requirements in the infrastructure, not just in the application
      Data-centric approach encourages secure sharing
    2. Scalability
      Near linear performance improvements at thousands of nodes
      Durable and reliable under increased failures that come with scale
    3. Diverse, Interactive Analytics
      Sorted key/value core performs well in a diverse set of domains
      Information retrieval, statistics, graph analysis, geo indexing, and more
    4. Flexible, Adaptive Schema
      Start with universal structures and indexing
      Refine the schema over time

  • 6. Contents (repeated section divider)

  • 7. Accumulo Key Structure
    An Accumulo key is a 5-tuple, consisting of:
      Row: Controls Atomicity
      Column Family: Controls Locality
      Column Qualifier: Controls Uniqueness
      Visibility Label: Controls Access
      Timestamp: Controls Versioning
    Accumulo Key/Value Example:
      Row       Col. Fam.     Col. Qual.     Visibility   Timestamp  Value
      John Doe  Notes         PCP            PCP_JD       20120912   Patient suffers from an acute
      John Doe  Test Results  Cholesterol    JD|PCP_JD    20120912   183
      John Doe  Test Results  Mental Health  JD|PSYCH_JD  20120801   Pass
      John Doe  Test Results  X-Ray          JD|PHYS_JD   20120513   1010110110100

  • 8. Visibility Syntax & Semantics

  • 9. Tablets
      Collections of KV pairs form Tables
      Tables are partitioned into Tablets
      Metadata tablets hold info about other tablets, forming a 3-level hierarchy
      A Tablet is a unit of work for a Tablet Server
    [Diagram: Well-Known Location (zookeeper) -> Root Tablet (-∞ to ∞) -> Metadata Tablet 1 (-∞ to Encyclopedia:Ocelot) and Metadata Tablet 2 (Encyclopedia:Ocelot to ∞) -> Data Tablets]
      Table: Adams Table, Data Tablets: (-∞ : thing), (thing : ∞)
      Table: Encyclopedia, Data Tablets: (-∞ : Ocelot), (Ocelot : Yak), (Yak : ∞)
      Table: Foo, Data Tablet: (-∞ to ∞)

  • 10. Accumulo Architecture
    [Diagram components: Zookeeper (x3), Master, Tablet Server (x3) hosting Tablets, Application (x3), Hadoop]
    [Diagram edge labels: Delegate Authority, Read/Write, Assign/Balance, Store/Replicate]

  • 11. Tablet Data Flow
    [Diagram labels: Tablet, Scan, Reads, Writes, In-Memory Map, Iterator Tree, Write Ahead Log (For Recovery), Minor Compaction, Sorted, Indexed File (x3), Merging / Major Compaction]

  • 12. Contents (repeated section divider)

  • 13. Hierarchical Decomposition
      Row:
      Column Family: attribute, purchases, returns
      Column Qualifier: age, discount, sneakers, hat
      Value:

  • 14. Materialized Table
    Key/Value Pairs:
      Row     Column Family  Column Qualifier  Value
      bill    attribute      age               49
      bill    attribute      discount          40%
      bill    purchases      sneakers          $100
      george  attribute      age               27
      george  purchases      sneakers          $83
      george  returns        hat               $42

  • 15. Forward and Inverted Index
    Table: Forward Index + Inverted Index
      Row:
      Column Family:
      Column Qualifier:
      Value:

  • 16. Forward and Inverted Index

  • 17. Graph Analysis
    Table: Graph Table
      Row:
      Column Family: Node Info, Out Edges, In Edges
      Column Qualifier (Tuples):
      Value:

  • 18. Geospatial Queries
    Table: Geo Index
      Latitude:  10110101001
      Longitude: 00111010010
      Depth:     11010110110
      Row: 101001110111010101011100001011100
      Column Family:
      Column Qualifier:
      Value:

  • 19. Document Partitioning
    Table: Shard Table
      Row:
      Column Family: Docs, Inv. Index, Field Index, Geo
      Column Qualifier (Tuples):
      Value:

  • 20. Document Partitioning

  • 21. Intersecting Iterator
      foo and (bar or baz)
    [Diagram labels: Docs, Inv. Index]

  • 22. Contents (repeated section divider)

  • 23. Key/Value pairs are great!
      "How do I construct a document partitioning key again?"
      Techniques should be built into an API
      Let the people have polyglot: Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range)

  • 24. Combined IR + Graph Search

  • 25. Schema-less Stats

  • 26. Get Involved
      http://accumulo.apache.org
      Help us make Accumulo even better!

  • 27. Contact
      Adam Fuchs, CTO
      sqrrl data, Inc.
      617-520-4375
      www.sqrrl.com
      @sqrrl_inc
      info@sqrrl.com
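
The 5-tuple key on slide 7 can be modelled in a few lines. The sketch below is an illustration in plain Python, not Accumulo client code: the `Key` class and `sort_order` helper are hypothetical names, and the older cholesterol reading is invented to show versioning. It assumes Accumulo's ordering, in which row, column family, column qualifier, and visibility sort ascending while timestamps sort descending, so the newest version of a cell comes first. Accumulo compares raw bytes; Python string comparison gives the same order for the ASCII values used here.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for the 5-tuple key described on slide 7."""
    row: str
    column_family: str
    column_qualifier: str
    visibility: str
    timestamp: int


def sort_order(key: Key):
    # Ascending on the first four fields, descending on timestamp,
    # so the newest version of a cell sorts first.
    return (key.row, key.column_family, key.column_qualifier,
            key.visibility, -key.timestamp)


entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    # This value is cut off in the slide transcript and is kept as-is.
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
    # Hypothetical older reading, added only to show version ordering.
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20110101), "201"),
]

sorted_entries = sorted(entries, key=lambda kv: sort_order(kv[0]))
for key, value in sorted_entries:
    print(key.column_family, key.column_qualifier, key.timestamp, "->", value)
```

Because all entries for one row sort next to each other, and entries within a column family sort together inside the row, a scan over a contiguous key range retrieves related cells without touching unrelated ones.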
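
The body of slide 8 (Visibility Syntax & Semantics) did not survive in the transcript, but labels such as `JD|PCP_JD` on slide 7 show the idea. The following is a minimal sketch of how such a label could be checked against a reader's authorizations. It assumes Accumulo's documented rules: `&` means all terms are required, `|` means any one suffices, parentheses group, mixing `&` and `|` at one level without parentheses is rejected, and an empty label is visible to everyone. The function names are hypothetical and this is not the `ColumnVisibility` implementation.

```python
import re
from typing import Iterable

_TOKEN = re.compile(r"[A-Za-z0-9_.:/-]+|[&|()]")


def tokenize(expression: str) -> list:
    tokens, pos = [], 0
    while pos < len(expression):
        match = _TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character {expression[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_visible(expression: str, authorizations: Iterable[str]) -> bool:
    """Return True if a reader holding `authorizations` may see the cell."""
    auths = set(authorizations)
    if expression == "":
        return True  # unlabeled cells are visible to every reader
    tokens = tokenize(expression)
    pos = 0

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ends unexpectedly")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in auths

    def parse_expression() -> bool:
        nonlocal pos
        result = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            result = (result and right) if operator == "&" else (result or right)
        return result

    visible = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return visible


# Labels from slide 7; a reader holding only PCP_JD sees two of the four cells.
labels = {"PCP": "PCP_JD", "Cholesterol": "JD|PCP_JD",
          "Mental Health": "JD|PSYCH_JD", "X-Ray": "JD|PHYS_JD"}
print([name for name, label in labels.items() if is_visible(label, {"PCP_JD"})])
```

Evaluating the label inside the database, rather than in each application, is what slide 5 means by expressing security requirements in the infrastructure.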
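
Slides 13 and 14 describe decomposing a nested record into row, column family, and column qualifier, then materializing the table from the resulting key/value pairs. A small sketch of that round trip, using the bill and george values from slide 14, might look like this. The nested-dictionary layout and the two function names are assumptions made for illustration; visibility and timestamp are omitted for brevity.

```python
# Nested records: row -> column family -> column qualifier -> value (slide 14 data).
records = {
    "bill": {
        "attribute": {"age": "49", "discount": "40%"},
        "purchases": {"sneakers": "$100"},
    },
    "george": {
        "attribute": {"age": "27"},
        "purchases": {"sneakers": "$83"},
        "returns": {"hat": "$42"},
    },
}


def to_key_values(nested):
    """Flatten nested records into sorted ((row, family, qualifier), value) pairs."""
    pairs = []
    for row, families in nested.items():
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                pairs.append(((row, family, qualifier), value))
    return sorted(pairs)


def materialize(pairs):
    """Rebuild the nested view from flat key/value pairs."""
    nested = {}
    for (row, family, qualifier), value in pairs:
        nested.setdefault(row, {}).setdefault(family, {})[qualifier] = value
    return nested


key_values = to_key_values(records)
for key, value in key_values:
    print(key, "->", value)
```

Nothing in the flat form fixes the set of columns in advance, so george can have a `returns` family that bill lacks. This is the flexible schema that slide 5 lists as a design driver.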
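
The row value on slide 18 appears to be the bitwise interleaving of the latitude, longitude, and depth bit strings: taking one bit from each dimension in turn reproduces the row shown on the slide exactly. The sketch below checks this with the slide's own numbers. The function names are hypothetical, and how the original coordinates are quantized into bit strings is not stated in the transcript, so that step is left out.

```python
def interleave(*dimensions: str) -> str:
    """Interleave equal-length bit strings, one bit from each dimension in turn."""
    if len({len(bits) for bits in dimensions}) != 1:
        raise ValueError("all dimensions must have the same number of bits")
    return "".join("".join(group) for group in zip(*dimensions))


def deinterleave(row: str, count: int) -> list:
    """Recover the original bit strings from an interleaved row."""
    return [row[offset::count] for offset in range(count)]


# Values copied from slide 18.
latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"

row = interleave(latitude, longitude, depth)
print(row)
```

Since the leading bits of every dimension end up at the front of the row, points that are close in all three dimensions tend to share a row prefix and therefore sort near each other, which lets a range scan over sorted keys serve as a coarse spatial query.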
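
Slide 21 shows the query `foo and (bar or baz)` evaluated against the inverted index of a document-partitioned shard. The sketch below illustrates the general technique of merging sorted posting lists; it is not Accumulo's `IntersectingIterator` class, and the index contents and document IDs are invented for the example. It assumes each term's document IDs are stored in sorted order, as they would be in a sorted key/value store.

```python
import heapq

# Hypothetical inverted index for one shard: term -> sorted document IDs.
inverted_index = {
    "foo": ["doc1", "doc3", "doc4", "doc7"],
    "bar": ["doc2", "doc3", "doc9"],
    "baz": ["doc4", "doc5"],
}


def union(*postings):
    """Yield each document ID found in any sorted posting list, once, in order."""
    previous = None
    for doc_id in heapq.merge(*postings):
        if doc_id != previous:
            yield doc_id
            previous = doc_id


def intersect(left, right):
    """Yield document IDs present in both sorted inputs, advancing the smaller side."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)
        else:
            b = next(right, None)


# foo and (bar or baz)
matches = list(intersect(inverted_index["foo"],
                         union(inverted_index["bar"], inverted_index["baz"])))
print(matches)
```

Both operations stream over sorted input and never hold a full posting list in memory. Because a document-partitioned shard keeps a document and its index entries in the same row, this kind of merge can run on the server that holds the data.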