
Introduction to Sqoop

Upload: wilfrid-thornton

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Introduction to Sqoop. Table of Contents Sqoop - Introduction Integration of RDBMS and Sqoop Sqoop use case Sample sqoop commands Key features of Sqoop

Introduction to Sqoop

Page 2

Table of Contents

Sqoop - Introduction

Integration of RDBMS and Sqoop

Sqoop use case

Sample sqoop commands

Key features of Sqoop

Page 3

What is Sqoop?

Sqoop is a suite of tools that connect Hadoop and database systems.

Major functions of Sqoop
• Import tables from databases into HDFS for deep analysis
• Replicate database schemas in Hive's metastore
• Export MapReduce results back to a database for presentation to end users
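The first two functions can be sketched with Sqoop's `import` tool (with `--hive-import`) and its `create-hive-table` tool. This is a minimal sketch, not a command from the deck: the connect string and source table are taken from the sample commands slide, while the Hive table name `user_profiles` and the `run` dry-run helper are assumptions added for illustration.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless SQOOP_DRY_RUN=0 is set, so it can be read without a cluster.
run() {
  if [ "${SQOOP_DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Import a table's rows into HDFS and register them as a Hive table.
# $1 = JDBC connect string, $2 = source table, $3 = Hive table name
import_into_hive() {
  run sqoop import --connect "$1" --table "$2" --hive-import --hive-table "$3"
}

# Copy only the table definition into Hive's metastore; no rows are moved.
copy_schema_to_hive() {
  run sqoop create-hive-table --connect "$1" --table "$2" --hive-table "$3"
}

# The Hive table name user_profiles is hypothetical.
import_into_hive jdbc:mysql://db.foo.com/corp user-profiles user_profiles
copy_schema_to_hive jdbc:mysql://db.foo.com/corp user-profiles user_profiles
```

With the default dry-run setting the script only prints the two commands; setting `SQOOP_DRY_RUN=0` would execute them against a real installation.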

Page 4

RDBMS: important but vulnerable?

Importance of RDBMS
• Holds a lot of valuable data in the form of structured tables of several hundred GB
• Provides fast access for OLTP applications: updating or deleting records, adding individual records, complex transactions

Vulnerability
• Struggles to store very large datasets (1 TB+)
• Poor support for complex datatypes and large objects
• Schema evolution is hard
• Analytic queries are better suited to a batch-oriented system

Page 5

RDBMS and Hadoop

[Diagram: historical data (before processing) moves from the RDBMS into HDFS; results of data analysis (after processing) move from HDFS back to the RDBMS.]

Page 6

Sqoop use case: Demographics-aware site analytics

Page 7

Sample Sqoop commands

Import using Sqoop
• sqoop import \
    --connect jdbc:mysql://db.foo.com/corp \
    --table user-profiles

Export using Sqoop
• sqoop export \
    --connect jdbc:mysql://db.foo.com/corp \
    --table ads_results \
    --export-dir results

Annotations
• --connect jdbc:mysql://db.foo.com/corp : JDBC MySQL driver
• --table user-profiles : input, a MySQL table
• --export-dir results : HDFS location with analysis results
• --table ads_results : output, a MySQL table
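In practice the sample commands usually need credentials and a few tuning options. The sketch below extends them with flags that Sqoop's `import` and `export` tools provide (`--username`, `-P` to prompt for a password, `--target-dir`, `--split-by`, `--num-mappers`). The user name `analyst`, the HDFS path `/data/user_profiles`, the split column `id`, the mapper count, and the `run` dry-run helper are all assumptions for illustration, not values from the slides.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless SQOOP_DRY_RUN=0 is set.
run() {
  if [ "${SQOOP_DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Import a table into a chosen HDFS directory using parallel map tasks.
# $1 = connect string, $2 = DB user, $3 = table,
# $4 = HDFS target dir, $5 = split column, $6 = number of mappers
import_table() {
  run sqoop import --connect "$1" --username "$2" -P \
    --table "$3" --target-dir "$4" --split-by "$5" --num-mappers "$6"
}

# Export analysis results from an HDFS directory into an existing table.
# $1 = connect string, $2 = DB user, $3 = table, $4 = HDFS export dir
export_results() {
  run sqoop export --connect "$1" --username "$2" -P \
    --table "$3" --export-dir "$4"
}

# analyst, /data/user_profiles, id and 4 are hypothetical values.
import_table jdbc:mysql://db.foo.com/corp analyst user-profiles /data/user_profiles id 4
export_results jdbc:mysql://db.foo.com/corp analyst ads_results results
```

The split column should be one whose values are spread fairly evenly, since Sqoop uses it to divide the table among the map tasks.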

Page 8

Key features of Sqoop

JDBC-based implementation - Works with many popular database vendors

Auto-generation of tedious user-side code - Makes it faster to write MapReduce applications that work with the data

Integration with Hive - Lets you stay in a SQL-based environment
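The code-generation feature is also available on its own through Sqoop's `codegen` tool, which writes a Java class representing one row of a table. A minimal sketch follows, assuming the connect string and `ads_results` table from the sample commands; the class name `AdsResult`, the output directory `./generated-src`, and the `run` dry-run helper are hypothetical.

```shell
# Dry-run helper (an assumption of this sketch): prints the command
# unless SQOOP_DRY_RUN=0 is set.
run() {
  if [ "${SQOOP_DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Generate the Java record class for one table without importing data.
# $1 = connect string, $2 = table, $3 = class name, $4 = source output dir
generate_record_class() {
  run sqoop codegen --connect "$1" --table "$2" --class-name "$3" --outdir "$4"
}

# AdsResult and ./generated-src are hypothetical names.
generate_record_class jdbc:mysql://db.foo.com/corp ads_results AdsResult ./generated-src
```

The generated class can then be used in a hand-written MapReduce job to parse the imported records.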

Page 9

THANK YOU