![Page 1: Apache Lens - DeveloperMarchdevelopermarch.com/.../report/...ApacheLens_AmareshwariSriramad… · Apache Lens •Queries get pushed to where data resides •Central Catalog management:](https://reader030.vdocuments.net/reader030/viewer/2022041022/5ed38f7e3c5d095ede6021c3/html5/thumbnails/1.jpg)
Apache Lens
Cut data analytics silos in your enterprise
Amareshwari Sriramadasu
Agenda
• Evolution of data analytics in enterprise
• Introduction to Apache Lens
• Architecture
• OLAP Data model
• Demo
• Roadmap
Reporting warehouse: Generation 1: RDBMS
• Reporting data in RDBMS
• Aggregations/materialized views in DB
• ~ 1 TB
Generation 1: Challenges
• Data scale: loading data takes ~24 hrs
• Analysis only up to 3 dimensions
• Heavy queries stalling other user queries
• Unable to move fast with new reporting requirements
Reporting warehouse: Generation 2: Columnar DB
• Small and summarized data in Columnar Database
• Rich Dashboards
Generation 2: Challenges
• Scalability challenges with data growth
• Expensive to grow the capacity on columnar DB
• Data modelling and ETL cycles are long
• Limited Analytical flexibility
Generation 3: Columnar DB + Hadoop
• Small and summarized data in Columnar Database (~10 TB)
• Rich Dashboards
• Granular data in Hadoop (100s of TB)
• Adhoc analysis
Generation 3: Challenges
• Maintaining two lines of independent data warehousing systems
• Data discrepancies
• Schema management
• Learning curve for Users
• Inefficient Utilization
Apache Lens (formerly Grill)
• Platform to enable multi-dimensional queries in a unified way over datasets stored in multiple warehouses
• OLAP Cube abstraction
• Data discovery by providing a single metadata layer
• Unified access to data by integrating Hive with other traditional warehouses
Apache Lens
• Queries get pushed to where data resides
• Central Catalog management: all applications speak the same language
• Query analytics for optimizing hot datasets
• Workload-based experimentation with newer systems: AWS Redshift, Apache Spark, Apache Tez
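To make "queries get pushed to where data resides" concrete, here is a minimal Python sketch. It is not Lens's API: the `Driver` class, the `choose_driver` function, the table names, and the cost numbers are all hypothetical. The idea it illustrates is simply to pick, among the engines that hold every table a query needs, the one with the lowest estimated cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Driver:
    """Hypothetical stand-in for an execution engine (e.g. Hive or a DB)."""
    name: str
    tables: frozenset      # tables physically available to this engine
    cost_per_table: float  # made-up relative cost, for illustration only


def choose_driver(drivers, needed_tables):
    """Return the cheapest driver that holds every needed table, else None."""
    needed = frozenset(needed_tables)
    candidates = [d for d in drivers if needed <= d.tables]
    if not candidates:
        return None
    return min(candidates, key=lambda d: d.cost_per_table * len(needed))


# Illustrative setup: granular data only on Hive, summaries on both engines.
drivers = [
    Driver("hive", frozenset({"raw_fact", "aggregate_fact", "city"}), 10.0),
    Driver("db", frozenset({"aggregate_fact", "city_subset"}), 1.0),
]

summary = choose_driver(drivers, ["aggregate_fact"])    # both qualify, DB is cheaper
granular = choose_driver(drivers, ["raw_fact", "city"])  # only Hive has the raw fact
print(summary.name, granular.name)
```

In this toy setup a query over summarized data goes to the database, while a query that needs granular data falls back to Hive, and a query no engine can answer yields `None`.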
Analytics Use cases
• Reporting queries
• Adhoc queries
• Interactive/Batch queries
• Scheduled queries
• Infer insights through ML algorithms
Why both Hadoop and traditional warehouse?
[Figure: Canned and Adhoc queries plotted against Response Times and IO (Input Records); additional labels: Query, Query Engine]
Canned queries are mostly Interactive
Adhoc queries can be Interactive or batch depending on the data volumes and query complexity
Lens Architecture
OLAP Data model
• Storage
• Cube
• Fact Table: physical fact tables
• Derived Cube
• Dimension
• Dimension Table: physical dimension tables
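One way to picture the entities listed above is as plain data classes. The following is a rough Python sketch, not Lens's metastore API: every class field, the `derive` method, and the measure names `revenue` and `units` are assumptions made for illustration. In particular, treating a derived cube as a subset of its parent's measures is an assumed reading of the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Storage:
    name: str  # where data physically lives, e.g. "HDFS" or "DB"


@dataclass
class FactTable:
    name: str
    storages: List[Storage] = field(default_factory=list)


@dataclass
class DimensionTable:
    name: str
    storages: List[Storage] = field(default_factory=list)


@dataclass
class Dimension:
    name: str
    tables: List[DimensionTable] = field(default_factory=list)


@dataclass
class Cube:
    name: str
    measures: List[str]
    dimensions: List[Dimension] = field(default_factory=list)
    fact_tables: List[FactTable] = field(default_factory=list)

    def derive(self, name: str, measures: List[str]) -> "Cube":
        """Assumed semantics: a derived cube exposes a subset of the measures."""
        unknown = set(measures) - set(self.measures)
        if unknown:
            raise ValueError(f"unknown measures: {sorted(unknown)}")
        return Cube(name, list(measures), self.dimensions, self.fact_tables)


hdfs, db = Storage("HDFS"), Storage("DB")
city = Dimension("City", [DimensionTable("city_table", [hdfs])])
sales = Cube(
    "Sales",
    measures=["revenue", "units"],  # hypothetical measure names
    dimensions=[city],
    fact_tables=[FactTable("raw_fact", [hdfs]), FactTable("aggregate_fact", [hdfs, db])],
)
revenue_only = sales.derive("SalesRevenue", ["revenue"])
```

The logical objects (cube, dimension) carry no storage information themselves; only the physical fact and dimension tables point at storages, which is what lets the same logical query be answered from different places.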
Data model - Relationships
[Diagram: relationships among Cube, Fact table, Dimension, Dimension table, and Storage]
Demo : Example data model
Sales
Product
Customer
• City
City
Demo : Example physical data model
Sales
• Raw Fact: HDFS
• Aggregate Fact1: HDFS and DB
• Aggregate Fact2: HDFS and DB
Customer
• Customer table: HDFS and DB
Product
• Product table: HDFS and DB
City
• City table: HDFS
• City subset: DB
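The physical layout on this slide can be written down as a small lookup table. The Python sketch below uses the table and storage names from the slide; the `PHYSICAL_MODEL` dictionary and the `common_storages` helper are hypothetical and only illustrate how one might ask which storage can serve a combination of tables without moving data.

```python
# Physical tables from the slide, keyed by logical entity.
# Values are the storages on which each table is available.
PHYSICAL_MODEL = {
    "Sales": {
        "Raw Fact": {"HDFS"},
        "Aggregate Fact1": {"HDFS", "DB"},
        "Aggregate Fact2": {"HDFS", "DB"},
    },
    "Customer": {"Customer table": {"HDFS", "DB"}},
    "Product": {"Product table": {"HDFS", "DB"}},
    "City": {"City table": {"HDFS"}, "City subset": {"DB"}},
}


def common_storages(tables):
    """Storages on which every (entity, table) pair is available."""
    sets = [PHYSICAL_MODEL[entity][table] for entity, table in tables]
    return set.intersection(*sets) if sets else set()


# Raw sales joined with the full City table can only run on HDFS.
on_hdfs = common_storages([("Sales", "Raw Fact"), ("City", "City table")])

# An aggregate joined with the City subset can only run in the DB.
in_db = common_storages([("Sales", "Aggregate Fact1"), ("City", "City subset")])

# Raw sales and the City subset share no storage.
nowhere = common_storages([("Sales", "Raw Fact"), ("City", "City subset")])
```

An empty result means no single storage holds all of the requested tables, so a different table choice (for example the full City table instead of the subset) would be needed.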
Demo
Roadmap
Immediate
• Add support for querying streaming data stores
• Query submission throttling at drivers
• Ability to load multiple instances of the same driver class
Medium Term
• Estimate query execution times
• Authorization across all services and storages
• Add scheduler service
• Make it suitable to integrate with BI tools
• Enable machine learning through Lens
Long Term
• Query caching
• Metastore UI
• Administrator console
• Automatic roll-up suggestions on hot datasets
Explorations
• Enable HiveDriver on Tez/Spark
• Explore Zeppelin for web front end
• New drivers: Elasticsearch driver, Druid driver
Stay Involved
Web site
• http://lens.incubator.apache.org/
Source repo
• https://git-wip-us.apache.org/repos/asf/incubator-lens.git
Source repo for Hive
• https://github.com/InMobi/hive
Mailing lists
Thank You!
• Questions?