Accumulo Summit 2014: SQL-on-Accumulo with Pivotal HAWQ and PXF


Author: accumulo-summit

Post on 15-Jan-2015






Pivotal Xtension Framework (PXF) support for Accumulo within HAWQ provides a full-featured, native SQL interface to data stored in Accumulo. The Accumulo/PXF module extracts data from Accumulo through iterators and the Accumulo APIs and delivers it to HAWQ's SQL execution engine. Data extraction is fully parallel and uses query predicate pushdown for an additional performance boost. The module also natively supports Accumulo's security labels.

PXF is an external table interface in HAWQ, a SQL-on-Hadoop system, that lets you read data stored within the Hadoop ecosystem. External tables can be used to load data from Hadoop into HAWQ, or to query Hadoop data without materializing it in HAWQ. PXF enables analysis of HAWQ data and Hadoop data in a single query. It supports a wide range of data formats and stores, including Text, Avro, Hive, Sequence files, RCFile, HBase, and now Accumulo.
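As a rough illustration of why predicate pushdown helps, the sketch below compares filtering in the query engine with filtering at the source. It is not the connector's code: the in-memory sorted list stands in for an Accumulo table, and `ROWS`, `scan`, and the `k01`..`k05` keys are hypothetical names chosen for the example. The key range plays the role of a scan range on the record key, and the value filter plays the role of a source-side iterator.

```python
from bisect import bisect_left, bisect_right

# Toy stand-in for a sorted key/value table: (recordkey, price) pairs.
# This is an assumption for illustration, not Accumulo's data model.
ROWS = sorted([
    ("k01", 10.0), ("k02", 25.0), ("k03", 7.5), ("k04", 40.0), ("k05", 12.0),
])
KEYS = [key for key, _ in ROWS]


def scan(start_key=None, end_key=None, value_filter=None):
    """Read rows from the toy source.

    start_key/end_key narrow the scanned key range (both inclusive);
    value_filter is applied at the source before rows are returned.
    Returns (rows_returned, rows_scanned).
    """
    lo = 0 if start_key is None else bisect_left(KEYS, start_key)
    hi = len(ROWS) if end_key is None else bisect_right(KEYS, end_key)
    candidates = ROWS[lo:hi]
    if value_filter is None:
        returned = list(candidates)
    else:
        returned = [row for row in candidates if value_filter(row[1])]
    return returned, len(candidates)


# Without pushdown: the source returns everything, the engine filters afterwards.
all_rows, scanned_full = scan()
engine_side = [r for r in all_rows if "k02" <= r[0] <= "k04" and r[1] > 20]

# With pushdown: the key range and the value filter are applied at the source.
source_side, scanned_narrow = scan("k02", "k04", lambda price: price > 20)

print(engine_side == source_side)             # same answer either way
print(len(all_rows), "rows shipped without pushdown,", len(source_side), "with")
print(scanned_full, "rows scanned without a key range,", scanned_narrow, "with")
```

Both paths return the same rows; the difference is how much data the source has to read and hand over to the SQL engine.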


1. SQL-on-Accumulo with Pivotal HAWQ and PXF
   Agenda:
   • HAWQ & PXF Overview
   • Accumulo Connector - Usage
   • Accumulo Connector - Advanced Features
   • PXF API
   • Demo

2. HAWQ is... A parallel SQL query engine on Hadoop

3. PHD

4. PHD

5. PHD

6. PHD

7. PXF is... A fast, extensible framework connecting HAWQ to a data store of choice that exposes a parallel API

8. PHD: direct analytics with PXF

9. PHD: indirect analytics with PXF

10. Usage
    CREATE EXTERNAL TABLE ()
    LOCATION (pxf://rest_host:port/?)
    FORMAT ()
    [SEGMENT REJECT LIMIT [ROWS|PERCENT] LOG ERRORS INTO ]

    -- direct analytics (external)
    SELECT FROM WHERE

    -- indirect analytics (internal)
    INSERT INTO SELECT FROM WHERE

    Any SQL operation (joins, aggregates, sorting, etc.) can be executed.

11. Accumulo Connector - Usage
    CREATE EXTERNAL TABLE ()
    LOCATION (pxf:///?profile=accumulo)
    FORMAT custom(formatter=pxfwritable_import)

    CREATE EXTERNAL TABLE t(
        recordkey text,
        cf1:date date,
        cf1:price double)
    LOCATION (pxf:///instance:sales?profile=accumulo)
    FORMAT custom(formatter=pxfwritable_import)

    -- Example of a simple query
    SELECT cf1:date, max(cf1:price) FROM t GROUP BY cf1:date

12. Accumulo Connector - Advanced Features
    • Smart filtering with predicate pushdown: irrelevant tablets are excluded and values are filtered at the source, according to the WHERE clause of HAWQ's query.
    • Error tables for logging badly formatted data without aborting the query: specify the desired error threshold, then query the error table after the operation to see the rejected data and the related errors.
    • Lookup table for easy access to non-textual qualifiers: define a qualifier lookup table that translates between Accumulo-style naming and SQL-style naming.
    • Automatic statistics for better join planning: run ANALYZE on a PXF-Accumulo table to update HAWQ's optimizer with table-level and attribute-level statistics from the Accumulo table.
    • Mechanism for storing remote credentials: the mapping between a HAWQ user's credentials and an Accumulo user's credentials is entered once in HAWQ and transferred automatically to the Accumulo connector at runtime.

13. Accumulo Connector - Advanced Features
    • Visibility labels for enhanced security: the Accumulo connector uses Accumulo's built-in cell-level security to ensure users can view only the information to which they have been granted access.
    • Custom iterators for increased performance: predicate pushdown is implemented with stackable custom iterators, which speed up comparison operations (=, ==, !=) in a query's WHERE clause.
    • Intelligent range filtering: a comparison on recordkey narrows the Accumulo connector's scan range, minimizing the amount of data scanned and resulting in faster scans.
    • Automatic type detection: data types are detected automatically within the iterator, ensuring that the correct comparison operations are used.

14. PXF API
    • Fragmenter: returns a list of data source fragments and their locations.
    • Accessor: accesses a given list of fragments, reads them, and returns records.
    • Resolver: deserializes each record according to a given schema or technique.
    Diagram labels: distributed execution threads; distributed database servers.

15. PXF API
    • AccumuloFragmenter: returns a list of Accumulo tablets and their locations for a given table.
    • AccumuloAccessor: accesses a given list of fragments, reads them, and returns Accumulo records. Uses filter pushdown when possible.
    • AccumuloResolver: converts each qualifier value into something that HAWQ can understand.

16. Live Demo

17. Accumulo Table Contents

18. User Authorizations

19. $PHD_ROOT/conf/pxf-profiles.xml

20. Define Table in HAWQ

21. Setting Authorizations

22. Executing a Simple Query

23. A Query With a Single Pushdown Filter

24. A Query With a Single Pushdown Filter

25. A Query With Multiple Pushdown Filters

26. A Query With Multiple Pushdown Filters

27. A Query With Multiple Pushdown Filters

28. Setting Authorizations

29. Executing a Query as foo

30. Define a Lookup Table in Accumulo

31. Define a Lookup Table in HAWQ

32. Performing a Simple Query
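
The Fragmenter / Accessor / Resolver split described on the PXF API slides can be sketched as a small pipeline. The Python below is only a conceptual model under stated assumptions, not the PXF API itself (which the slides do not show in code): `Fragment`, `fragmenter`, `accessor`, `resolver`, `read_table`, the host names, and the comma-separated `SOURCE` records are all hypothetical. Threads stand in for the distributed execution threads that each read one fragment.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A slice of the data source plus the (hypothetical) host that serves it."""
    host: str
    start: int
    end: int  # exclusive


# Toy data source: raw records as comma-separated strings (an assumption).
SOURCE = [
    "r1,2014-06-01,9.5",
    "r2,2014-06-01,12.0",
    "r3,2014-06-02,3.25",
    "r4,2014-06-02,8.0",
]


def fragmenter(source, hosts):
    """Return one contiguous fragment per host, with its location."""
    size = -(-len(source) // len(hosts))  # ceiling division
    return [
        Fragment(host, i * size, min((i + 1) * size, len(source)))
        for i, host in enumerate(hosts)
    ]


def accessor(source, fragment):
    """Read the raw records that belong to one fragment."""
    return source[fragment.start:fragment.end]


def resolver(raw):
    """Deserialize one raw record into typed fields the SQL engine can use."""
    key, date, price = raw.split(",")
    return {"recordkey": key, "date": date, "price": float(price)}


def read_table(source, hosts):
    """Read all fragments in parallel and return the resolved rows in order."""
    fragments = fragmenter(source, hosts)

    def read_fragment(fragment):
        return [resolver(raw) for raw in accessor(source, fragment)]

    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        parts = list(pool.map(read_fragment, fragments))
    return [row for part in parts for row in part]


rows = read_table(SOURCE, ["node1", "node2"])
print(len(rows), "rows read from", len(fragmenter(SOURCE, ["node1", "node2"])), "fragments")
```

Keeping the three roles separate is what lets a connector swap in its own notion of a fragment (tablets, in Accumulo's case) and its own record format while the parallel read loop stays the same.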