Professional Services from a MapR Data Scientist · 2019-04-08 · MapR Quick Start Solutions: Speeding...
TRANSCRIPT
© 2017 MapR Technologies, MapR Confidential
Introduction to Professional Services from MapR Data Scientists
정인철, Director, Ph.D.
Data Engineer, MapR
2017.06.15
Professional Services
Focus on Customer Success
Key Institutional Knowledge
• Data Science
• Data Engineering
• Solution Design
Measurable Results
• Fast execution
• ROI
• Risk management
Professional Services Areas

Hadoop Core Services (Data Engineering)
• Engagements: Installation, Migrations, SLA Plans, Best Practices, Performance Tuning
• Audience: IT/Infrastructure
• Skills: Linux, Networking, Data Center, Storage, Operations

Big Data Workflows (Data Engineering)
• Engagements: Hive/Pig, Oozie/Sqoop, Flume, M7/HBase, Data Flow
• Audience: BI/DBA
• Skills: BI/ETL/Reporting, Scripting/Java, Hadoop MR, Eco Projects (HBase, Hive, …)

Solution Design (Data Engineering)
• Engagements: HBase/M7, Map/Reduce, Application Development, Integration Development
• Audience: Java/Hadoop Developer
• Skills: Architectural Design

Advanced Analytics (Data Science)
• Engagements: Use Case Discovery, Use Case Modeling, POC, Workshops
• Audience: Modeler/Analyst
• Skills: PhD Statistics/Math, MatLab/R/SAS, Scripting/Java, BI/ETL/Reporting
MapR Global PS Team
Data Scientists and Data Engineers worldwide:
• North America: Data Scientists/Data Engineers (Korean)
• EMEA: Data Scientists/Data Engineers
• Asia Pacific: Data Scientists/Data Engineers (Korean)
Types of Services Offered
• Hadoop Core Service
– Hadoop Operations
• Big Data Workflows
– Custom Big Data ETL Workflows
• Solution Design
– Application design, implementation, and integration
• Advanced Analytics
– Data Science
Hadoop Core Service
General Hadoop core service offerings:
• Implementation / Deployment
• Cluster Migration
– From other Distributions to MapR
– Development to Stage to Production Cluster
• MapR Upgrades
• Cluster Tuning and Optimization
• Cluster Health Check / Best Practices
• SLA/DR Strategies
Big Data Workflows
• Architect and develop custom big data ETL workflows:
– Hive/Pig
– Flume/Oozie
– HBase/M7
– NFS
– Etc.
• Custom ETL Big Data workflow includes:
– Ingest
– Processing
– Access
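The three stages above (ingest, processing, access) can be sketched as a plain Python pipeline. This is a tool-agnostic illustration only: on a real cluster the stages map to tools like Flume/Sqoop, Hive/Pig, and ODBC access, and all function and field names here are hypothetical.

```python
# Minimal sketch of an ingest -> processing -> access ETL workflow,
# in plain Python. Stand-ins: ingest() for Flume/Sqoop, process()
# for a Hive/Pig job, access() for ODBC/BI consumption.

def ingest(raw_records):
    """Parse raw comma-delimited lines into records."""
    for line in raw_records:
        user, amount = line.strip().split(",")
        yield {"user": user, "amount": float(amount)}

def process(records):
    """Aggregate amounts per user."""
    totals = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + rec["amount"]
    return totals

def access(totals):
    """Expose results as sorted rows for downstream consumers."""
    return sorted(totals.items())

rows = access(process(ingest(["a,1.5", "b,2.0", "a,3.5"])))
print(rows)  # [('a', 5.0), ('b', 2.0)]
```

The separation of stages mirrors the workflow structure above: each stage can be swapped for a cluster-scale tool without changing the overall shape of the pipeline.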
Big Data Workflow On MapR
MapR Data Platform
Processing and Analytics
Ingest
Sqoop
Flume
HDFS
NFS
Access
Tez
Drill
Hive
Pig
Impala
Data Sources
Clickstream
Sensor Data
Billing Data
CRM / ERP
Product Catalog
Social Media
Server Logs
Merchant Listings
Online Chat
Call Detail Records
Visualization
M7, HBase, MapReduce (v1 & v2), Storm, Cascading, Pig, Solr, Mahout, YARN, Oozie, Hive, MLlib
Solution Design
• Customizing Map/Reduce jobs
• Customizing Spark jobs
• Application development on the MapR unified data platform
• HBase application development
• Application development with the big data ecosystem
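The Map/Reduce customization mentioned above follows a fixed pattern: a map function emits key/value pairs, and a reduce function aggregates each key's group. A minimal sketch of that pattern in plain Python (no Hadoop dependency; on a cluster the same two functions would be supplied to the Hadoop MapReduce or Spark APIs):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word.
    return [(word, 1) for line in lines for word in line.split()]

def reduce_phase(pairs):
    # Shuffle (sort by key), then sum each key's group, like a Reducer.
    pairs.sort(key=itemgetter(0))
    return {key: sum(v for _, v in group)
            for key, group in groupby(pairs, key=itemgetter(0))}

counts = reduce_phase(map_phase(["big data big", "data platform"]))
print(counts)  # {'big': 2, 'data': 2, 'platform': 1}
```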
Advanced Analytics
• Use case Discovery
• Use case Modeling
• Machine Learning
• POC of Modeling solutions
• Workshops
Service Formats
• Services are offered in various combinations:
– Weekly engagements (1 to 4 weeks)
– Daily engagements (1 to 3 days), e.g. training, workshops, use cases
– Hourly engagements
– Project-based engagements
– QSS (Quick Start Service)
Quick Start Solutions
What/Why Quick Start Solutions?
"We want to get something in place, but where do we start?"
"We don't know what we should be doing."
"We want to finish the build and run a usable use case within one or two months."
"Once we see how it's done, we can take it from there ourselves."
A good fit when personnel, scope, and budget are limited.
MapR Quick Start Solutions: Speeding Time-to-Value
Data Warehouse Offload, Optimization and Analytics
Real-Time Security Log Analytics, Production Log analysis
Customer 360, Social Media Analysis, Recommendation Engine
Time Series Analytics, NoSQL Webstore Applications
Deep Learning on GPUs for Image Analytics
Solution templates
Fast delivery
Knowledge transfer
Financial Services – Fraud Detection, Anti-Money Laundering
Complex Event Processing with Drools / Stream Processing
Self Service Data Exploration and BI Analytics on Hadoop
Enterprise Data Hub
Data Warehouse Optimization QSS
• Data transformation/ETL on Hadoop: ETL work includes incremental updates; restores CPU capacity and storage to the DW
• Offloading "cold" data to Hadoop: restores storage capacity; a one-time offload capitalizes on historic, underused data; minimal impact to existing data pipelines; presents new data for exploration
Offload Cold Data to Hadoop
(Diagram: incoming data is ETL'd into structured data in the Data Warehouse; cold data is offloaded to the MapR Data Platform, including a log archive.)
▪ Process:
  ▪ One-time migration of cold data
  ▪ Demonstration of data upload to the DW
▪ Data Access:
  ▪ ODBC
  ▪ Thrift
  ▪ Standard connectors
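The first step of a cold-data offload is deciding which data is actually "cold". A minimal sketch of that selection in plain Python, assuming a last-access-date criterion; the table names and the 90-day idle threshold are hypothetical examples, not part of the QSS:

```python
from datetime import date, timedelta

def select_cold(tables, today, max_idle_days=90):
    """Return names of tables idle longer than the threshold,
    i.e. candidates for a one-time offload to the cluster."""
    cutoff = today - timedelta(days=max_idle_days)
    return [t["name"] for t in tables if t["last_access"] < cutoff]

# Hypothetical warehouse catalog with last-access metadata.
catalog = [
    {"name": "orders_2014", "last_access": date(2016, 11, 1)},
    {"name": "orders_2017", "last_access": date(2017, 6, 1)},
]
print(select_cold(catalog, today=date(2017, 6, 15)))  # ['orders_2014']
```

In practice the equivalent metadata would come from the warehouse's own catalog or audit logs, which is why the deliverables below include a data-selection and justification phase.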
Cold Data Offload - Deliverables
Phases, task summaries/deliverables, and timeline:
• Cluster Preparation: properly installed, configured, and validated MapR cluster ready for QSS (5 days)
• Requirements Review: review and agree upon requirements and goals (1 day)
• Data Selection: data set identification and justification; data transfer plan (1 day)
• Historical Data Migration: configuration of new data source; data migration; data access (ODBC or other tool); metadata creation and service setup (10 days)
• Data and Services Validation: validation of transferred data parity, ODBC availability, and metadata service accuracy (3 days)
TOTAL SCOPE: 20 days
Offload ETL onto Hadoop
(Diagram: incoming data is ETL'd on the MapR platform; low-latency and bulk data are loaded into the Data Warehouse, restoring CPU and disk to the DW.)
▪ Process:
  ▪ One-time migration of historical data
  ▪ Redirection of new data
  ▪ Migration of ETL onto Hadoop
  ▪ Demonstration of data upload to the DW
▪ Data Access:
  ▪ ODBC
  ▪ Thrift
  ▪ Standard connectors
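After the one-time historical migration, the "redirection of new data" step typically means each ETL run picks up only records newer than a stored high-water mark, so history is never reprocessed. A minimal watermark sketch in plain Python; the record shape and field names are hypothetical:

```python
def incremental_ingest(records, watermark):
    """Return records above the watermark plus the new watermark.
    Stand-in for incremental pickup in a redirected ETL flow
    (e.g. Sqoop-style incremental import on a monotonic column)."""
    new = [r for r in records if r["id"] > watermark]
    new_watermark = max((r["id"] for r in new), default=watermark)
    return new, new_watermark

source = [{"id": 1}, {"id": 2}, {"id": 3}]
batch, wm = incremental_ingest(source, watermark=1)
print([r["id"] for r in batch], wm)  # [2, 3] 3
```

Persisting the returned watermark between runs is what makes the "transient data ingestion" phase in the deliverables below repeatable without duplicating already-migrated data.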
Offload ETL onto Hadoop - Deliverables
Phases, task summaries/deliverables, and timeline:
• Cluster Preparation: properly installed, configured, and validated MapR cluster ready for QSS (5 days)
• Requirements Review: review and agree upon requirements and goals (1 day)
• Architecture Design: development and implementation plan (2 days)
• Data Selection: data set identification and justification; data transfer plan (1 day)
• Historical Data Migration: configuration of cluster to support new data source; data migration of existing data (3 days)
• Transient Data Ingestion: configuration of cluster to support new data source; data ingest of current/new data (3 days)
• Workflow Implementation: ETL process development; workflow management setup; service implementation (5 days)
• Data and Workflow Validation: validation of transferred data parity, ODBC availability, metadata service accuracy, and workflow (5 days)
TOTAL SCOPE: 25 days