big data analytics platform @ nokia - hadoop … data analytics platform @nokia − who we are −...
TRANSCRIPT
![Page 1: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/1.jpg)
Selecting the Right Tool for the Right Workload
Yekesa Kosuru Nokia
Location & Commerce
Strata + Hadoop World NY - Oct 25, 2012
Big Data Analytics Platform @ Nokia
![Page 2: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/2.jpg)
Agenda
• Big Data Analytics Platform @Nokia
  − Who we are
  − Use case data flows
  − Big data platform
  − Big data challenges
• Selecting the Right Tool for the Right Workload
  − Hadoop vs SQL
  − Which analytical database
  − Why InfiniDB
![Page 3: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/3.jpg)
Nokia Internal Use Only
Great Mobile Products That Sense the World
WIN IN SMART DEVICES
CONNECT THE NEXT BILLION
INVEST IN FUTURE DISRUPTIONS
CREATE A LEADING “WHERE” PLATFORM
![Page 4: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/4.jpg)
Apps
Smart Data Platform
Content
Positions Maps Traffic Places Directions Guidance
One Platform, Enabling Contextually Rich Mobile Experiences
![Page 5: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/5.jpg)
Big Data Analytics Platform @Nokia
![Page 6: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/6.jpg)
Business Challenges
• Data silos, missing semantics
• Multiple sources - overlapping, conflicting
• Timely processing of large volumes of data
• Partial, insufficient, inaccurate, inconsistent data
• Security, privacy and other policies unknown
Central Analytics Platform created!
![Page 7: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/7.jpg)
Statistics
• Tens of PB of data across Nokia
• Multi-tenant, multi-petabyte analytics cluster
• 10-20K+ jobs per day
• 600+ internal users
• 250M+ KV queries
• Over a terabyte flowing every day
• Multiple data centers around the world
![Page 8: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/8.jpg)
Places Data Store (POI) - Use Case
[Data flow diagram. Area labels: Search, Platform, Cloud Infrastructure.
Components: Account Management (Authentication, Access Control), Places Manager, Suppliers, Places API, Transactional Data, HDFS, Analytical DB, BI, K-V Store.
Data Intake: FTP. Data Processing: Oozie Sched, MR Blend, Hive, Pig, MR, SQL.
Flow labels (the figure numbers its steps 1 to 7): User Logs In; Place CRUD; Supplier Uploads Data; Updated Blend Record; ETL and Blend places; Places Data Analytics; Places Extract Portal; Delivered to Online Systems.
Other labels: Places Content, Analytics.]
![Page 9: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/9.jpg)
Big Data Analytics Platform Data Flows
[Diagram. Input labels: Web Applications, Device Applications, Probes, 3rd Party, Activity Logs, Reference Data, Device, User Profile, POI, Map, Activity Sensor, VShards (NoSQL).
Analytics Cluster: Data Intake; ETL, Algorithms; Aggregation; HDFS; Data Asset Catalog.
Analytical DB: Oracle, InfiniDB.
Output labels: Reports, Dashboards, Data Discovery, Interactive Queries, Batch Queries.]
![Page 10: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/10.jpg)
Big Data Analytics Platform Data Flows
[Diagram. Labels: Analytics Cluster; Analytical DB; Data Intake; ETL, Algorithms; Aggregation; HDFS; Data Asset Catalog; Oracle; InfiniDB; Data Discovery; Interactive Queries; Batch Queries.]
![Page 11: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/11.jpg)
Big Data Analytics Platform
• Logical Tiers
  − Technology Platform
  − Data Platform
  − End User Layer (not shown)
[Diagram. Tier labels: Data, Technology. Component labels: Data Intake; ETL, Algorithms; Aggregation; Data Asset Catalog; HDFS; Analytical DB.]
![Page 12: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/12.jpg)
Technology Platform
[Layer diagram, one layer per line:
Hadoop; R; VShards (KV); Scribe, FTP; Hive, Pig; InfiniDB, Oracle
Export/Import; Workflow Engine
Config./Deploy; Monitor; Alerts; Archiver; Scheduler
Security/Kerberos & ACL
Cloud Infrastructure]
![Page 13: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/13.jpg)
Data Platform
[Layer diagram, one layer per line:
Self Serve Tools
ETL, Agg; Algorithms; Data Quality; Data Asset Catalog
Data, Metadata, Operational Data
Workflow Orchestration
Technology Platform]
![Page 14: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/14.jpg)
Data Platform – Analytics Lifecycle
[Layer diagram, one layer per line:
Self Serve Tools
ETL, Agg; Algorithms; Data Quality; Data Asset Catalog
Data, Metadata, Operational Data
Collect, Ingest, Organize, Analyze, Deliver
Technology Platform]
![Page 15: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/15.jpg)
Data Platform: Managing the Data Asset
• Data Quality - garbage in, garbage out
  − Rules for validating, cleaning data, other heuristics
  − Trusting your insights
  − Process Quality
  − Lightweight governance (semantics, integrity, privacy and quality)
• Data Asset Catalog – describe your data
  − Capture essential metadata and logical domain models for assets
    − physical model, logical model, policies, classifications
    − dependencies with other assets
  − Serves as an entry point to data browsing and asset discovery
  − Insulates subject matter experts from physical details of the data asset
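The two bullets above (validation rules and a catalog entry that describes an asset) can be sketched roughly as follows. This is only an illustration: the `DataAsset` fields mirror the wording of the slide, and every name, path, and rule here is hypothetical rather than taken from the Nokia platform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataAsset:
    """Hypothetical catalog entry; field names follow the slide's wording."""
    name: str
    physical_location: str            # physical model, e.g. a storage path
    logical_fields: Dict[str, str]    # logical model: field name -> type label
    policies: List[str] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)


# A data quality rule is a named predicate over one record.
Rule = Tuple[str, Callable[[dict], bool]]


def validate(record: dict, rules: List[Rule]) -> List[str]:
    """Return the names of the rules that the record fails."""
    return [name for name, check in rules if not check(record)]


# Made-up asset and rules, for illustration only.
places = DataAsset(
    name="places_poi",
    physical_location="/data/places/poi",
    logical_fields={"place_id": "string", "lat": "float", "lon": "float"},
    policies=["internal-only"],
    classifications=["reference-data"],
    depends_on=["supplier_uploads"],
)

rules: List[Rule] = [
    ("has_place_id", lambda r: bool(r.get("place_id"))),
    ("lat_in_range", lambda r: -90.0 <= r.get("lat", 999.0) <= 90.0),
    ("lon_in_range", lambda r: -180.0 <= r.get("lon", 999.0) <= 180.0),
]

good = {"place_id": "p1", "lat": 60.17, "lon": 24.94}
bad = {"place_id": "", "lat": 123.0, "lon": 24.94}

print(validate(good, rules))  # no rule fails
print(validate(bad, rules))   # fails has_place_id and lat_in_range
```

Keeping rules as data (name plus predicate) means the list of failed rule names can be stored next to the catalog entry, which is one way to make "trusting your insights" checkable.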
![Page 16: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/16.jpg)
Big Data Challenges
• At every level - capture, curate, store, process, visualize
• Hadoop or SQL?
  − Performance of analytical database?
  − Batch or interactive analysis
  − Neither SQL nor MR fits all problems
• Data & Metadata Fragmentation
![Page 17: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/17.jpg)
Selecting the Right Tool for the Right Workload
![Page 18: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/18.jpg)
Hadoop vs SQL/Analytical DB

SQL/DW
• Discover the question
• Interactive/Fast
• No coding
• Standard industry tools
• Mutable (Type 1 SCD)
• Schema on Write
• Analyst
• Time to Wisdom

SQL/Analytical DB
• Standard industry tools
• Interactive/Fast (secs)
• No coding, e.g. built-in functions
• Reasonably complex
• Discover the question

Hadoop/Hive/MR
• ETL on steroids, Scale
• Batch/slow
• Bunch of coding, arbitrarily complex
• Harvest & load into DW
• Discover the answer
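One way to read the comparison above is as a routing rule: interactive questions that SQL and built-in functions can express go to the analytical database, while large-scale ETL and arbitrarily complex logic go to Hadoop. The sketch below is a hypothetical paraphrase of that reading; the `Workload` fields and the default for plain batch work are assumptions, not a rule stated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    interactive: bool        # is someone waiting seconds for the answer?
    needs_custom_code: bool  # logic beyond SQL and built-in functions?
    is_etl: bool             # large-scale harvesting or transformation?


def choose_engine(w: Workload) -> str:
    """Hypothetical routing rule paraphrasing the slide's columns."""
    if w.is_etl or w.needs_custom_code:
        return "Hadoop/Hive/MR"
    if w.interactive:
        return "SQL/Analytical DB"
    # Assumption: remaining batch work defaults to Hadoop.
    return "Hadoop/Hive/MR"


print(choose_engine(Workload(interactive=True, needs_custom_code=False, is_etl=False)))
print(choose_engine(Workload(interactive=False, needs_custom_code=True, is_etl=True)))
```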
![Page 19: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/19.jpg)
Why InfiniDB ?
• Works with BI tools (standard JDBC driver)
• Column oriented, MPP, clean architecture
• Horizontal and vertical partitioning, clever pruning
• Stream based MR like processing
• Efficient joins
• No indexes
• Impressive benchmarks
• Cloud deployment model
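The "horizontal and vertical partitioning, clever pruning" and "no indexes" bullets can be illustrated with a generic column-store technique: keep min/max metadata per horizontal slice of a column and skip slices that cannot match a filter. The sketch below is a toy model of that idea, not InfiniDB's implementation; the `Extent` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Extent:
    """One horizontal slice of a single column, with min/max metadata."""
    values: List[int]
    lo: int
    hi: int


def build_extents(column: List[int], size: int) -> List[Extent]:
    """Split a column into fixed-size slices and record each slice's range."""
    extents = []
    for start in range(0, len(column), size):
        chunk = column[start:start + size]
        extents.append(Extent(chunk, min(chunk), max(chunk)))
    return extents


def count_in_range(extents: List[Extent], lo: int, hi: int) -> Tuple[int, int]:
    """Count values in [lo, hi]; return (count, number of extents scanned)."""
    count = 0
    scanned = 0
    for e in extents:
        if e.hi < lo or e.lo > hi:
            continue  # pruned: the min/max range proves no value can match
        scanned += 1
        count += sum(1 for v in e.values if lo <= v <= hi)
    return count, scanned


day_column = list(range(1, 101))              # values 1..100, already ordered
extents = build_extents(day_column, size=10)  # 10 extents of 10 values each
print(count_in_range(extents, 35, 42))        # (8, 2): only 2 of 10 extents read
```

Pruning works best when the column is loaded in roughly sorted order (timestamps, for example), because then each slice covers a narrow range.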
![Page 20: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/20.jpg)
InfiniDB vs Hive Performance
[Bar chart: execution time (secs) for analytic queries A to G, InfiniDB vs Hive; y-axis from 0 to 2500.]

| Query | InfiniDB (sec) | Hive (sec) |
|---|---|---|
| A | | |
| B | 76.32 | 2155.92 |
| C | 25.59 | 1181.48 |
| D | 59.72 | 1497.22 |
| E | 1.8 | 446.5 |
| F | 12.38 | 1307.38 |
| G | 24.32 | 1886.81 |
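To make the gap in the table above easier to compare across queries, the ratio of Hive time to InfiniDB time can be computed per query. The timings below are copied from the table; query A has no values in the transcript, so it is left out rather than guessed.

```python
# Execution times in seconds: query -> (InfiniDB, Hive), copied from the table.
timings = {
    "B": (76.32, 2155.92),
    "C": (25.59, 1181.48),
    "D": (59.72, 1497.22),
    "E": (1.8, 446.5),
    "F": (12.38, 1307.38),
    "G": (24.32, 1886.81),
}

# Ratio of Hive time to InfiniDB time for each query.
speedups = {q: hive / infinidb for q, (infinidb, hive) in timings.items()}

for query, ratio in sorted(speedups.items()):
    print(f"{query}: Hive took {ratio:.1f}x as long as InfiniDB")

smallest = min(speedups, key=speedups.get)
largest = max(speedups, key=speedups.get)
print(f"smallest gap: query {smallest}, largest gap: query {largest}")
```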
![Page 21: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/21.jpg)
InfiniDB Under the Hood
![Page 22: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/22.jpg)
What is InfiniDB?
Scalable
Fast
Simple
![Page 23: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/23.jpg)
Analytics Data Platform Foundation
Analytics Data Platform
Columnar Performance Efficiency
MapReduce style Query Execution
Widely used MySQL Interface
![Page 24: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/24.jpg)
InfiniDB Building Blocks
Purpose built for big data analytics.
• User Module (UM)
• Performance Module (PM)
or … Single Server
![Page 25: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/25.jpg)
InfiniDB Building Blocks
Purpose built for big data analytics.
• User Module (UM)
  − Understands SQL
• Performance Module (PM)
  − Operates on data blocks
or … Single Server
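The split between a module that understands SQL and modules that operate on data blocks resembles a scatter-gather aggregation. The sketch below is a toy model of that division of labour for a query shaped like `SELECT key, SUM(value) ... GROUP BY key`; the function names, the thread pool, and the sample rows are assumptions made for illustration and do not describe InfiniDB's actual protocol.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, measure)


def performance_module(block: List[Row]) -> Counter:
    """'Map' side: compute a partial SUM per key over one data block."""
    partial: Counter = Counter()
    for key, value in block:
        partial[key] += value
    return partial


def user_module(blocks: List[List[Row]]) -> Dict[str, int]:
    """'Reduce' side: fan blocks out to workers and merge partial sums."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Partial results are merged as they arrive, without writing
        # intermediate files.
        for partial in pool.map(performance_module, blocks):
            total.update(partial)  # Counter.update adds the counts together
    return dict(total)


blocks = [
    [("fi", 3), ("us", 5)],
    [("fi", 2), ("de", 7)],
    [("us", 1)],
]
print(user_module(blocks))
```

Because each block is aggregated independently, adding workers spreads the block-level work out, while the merge step only ever sees the small partial results.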
![Page 26: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/26.jpg)
InfiniDB M/R Style Distribution of Work “Map-Reduce Inside”

| | InfiniDB DoW | Hadoop M/R |
|---|---|---|
| Scalability | Linear | Linear |
| N-squared Problem | Avoided | Avoided |
| Latency | Low | Medium-High |
| Intermediate Results Handling | Stream-based | File-based |
| Report Language | SQL | Erlang M/R, Hive, Pig |
| Tuning | Automatic | Manual |
| Real-Time Analytics | Real-time access to granular data | Access to pre-defined aggregates |
| Ad-Hoc | Full Ad-Hoc performance | None |
| Data Storage | Structured | Unstructured |
![Page 27: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/27.jpg)
Independent InfiniDB Benchmark
Q1 Series 2 table Joins
Q2 Series 3 table Joins
Q3 Series 4 table Joins
Q4 Series 5 table Joins
![Page 28: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/28.jpg)
Takeaways
• Hadoop is good but….
• Pay attention to data quality
• Hadoop or SQL
• Describe your data
![Page 29: Big Data Analytics Platform @ Nokia - Hadoop … Data Analytics Platform @Nokia − Who we are − Use case data flows − Big data platform − Big data challenges • Selecting the](https://reader031.vdocuments.net/reader031/viewer/2022013006/5aaf0bee7f8b9a190d8cdbf2/html5/thumbnails/29.jpg)
THANK YOU
Yekesa Kosuru, Distinguished Architect, Nokia
www.nokia.com, @Nokia
Jim Tommaney, CTO, Calpont
www.calpont.com, @Calpont, @InfiniDB