Page 1: Apache Sqoop

陳威宇 (Wei-Yu Chen)

Page 2: Sqoop: the bridge between RDBs and Hadoop

• Apache Sqoop is a tool designed to transfer data between Hadoop and structured datastores.

• Reads data from:

– RDBMS

– Data warehouses

– NoSQL

• Writes data to:

– Hive

– HBase

• Uses the MapReduce framework to transfer data in parallel.

Figure source: http://bigdataanalyticsnews.com/data-transfer-mysql-cassandra-using-sqoop/
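To make the parallel-transfer point concrete, a minimal import sketch follows; the dbhost host, shop database, orders table, and order_id column are hypothetical placeholders, not part of the original slides:

# Hypothetical example: import one table with 4 parallel map tasks.
# Sqoop partitions rows on the --split-by column and runs one mapper per range.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username sqoop_user -P \
  --table orders \
  --split-by order_id \
  -m 4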

Page 3: How to use Sqoop

Figure source: http://hive.3du.me/slide.html
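The referenced figure is not reproduced in this transcript. As general background, every Sqoop invocation follows the same tool-plus-options shape:

sqoop help            # list the available tools (import, export, job, ...)
sqoop help import     # show the options a given tool accepts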

Page 4: Hadoop sqoop

Sqoop 與大象的連結 ( setup )

• 解壓縮 http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.5-cdh5.3.2.tar.gz

• 修改

~/.bashrc

• 修改 conf/sqoop-env.sh

• 啟動 sqoop

export JAVA_HOME=/usr/lib/jvm/java-7-oracle export HADOOP_HOME=/home/hadoop/hadoop export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop export HIVE_HOME=/home/hadoop/hive export SQOOP_HOME=/home/hadoop/sqoop export HCAT_HOME=${HIVE_HOME}/hcatalog/ export PATH=$PATH:$SQOOP_HOME/bin:

$ sqoop Try 'sqoop help' for usage.

export HADOOP_COMMON_HOME=/home/hadoop/hadoop export HBASE_HOME=/home/hadoop/hbase export HIVE_HOME=/home/hadoop/hive
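After reloading the shell profile, the setup can be sanity-checked; the list-databases call below is a hypothetical connectivity test and additionally requires the MySQL JDBC driver jar in $SQOOP_HOME/lib:

source ~/.bashrc
sqoop version     # print the installed Sqoop build
# Hypothetical check that Sqoop can reach a local MySQL server:
sqoop list-databases --connect jdbc:mysql://localhost --username root -P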

Page 5: Exercise 1: import to Hive

cd ~

git clone https://github.com/waue0920/hadoop_example.git

cd ~/hadoop_example/sqoop/ex1

mysql -u root -phadoop < ./exc1.sql

hadoop fs -rmr /user/hadoop/authors

sqoop import --connect jdbc:mysql://localhost/books --username root \
  --table authors --password hadoop --hive-import -m 1

Exercise: use a Hive query to check that the data was imported:
hive> select * from authors;
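If only part of the source table is wanted in Hive, the same import can be narrowed; the id and name columns below are assumed for illustration and may not match the exercise's actual authors schema:

# Hypothetical variant: import selected columns and rows only.
sqoop import --connect jdbc:mysql://localhost/books \
  --username root --password hadoop \
  --table authors \
  --columns "id,name" \
  --where "id < 100" \
  --hive-import -m 1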

Page 6: Exercise 1: creating a job

hadoop fs -rmr /user/hadoop/authors

sqoop job --create myjob -- import --connect jdbc:mysql://localhost/books \
  --username root --table authors -P --hive-import -m 1

sqoop job --list

sqoop job --show myjob

sqoop job --exec myjob

Exercise: use a Hive query to check that the data was imported:
hive> select * from authors;
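Two behaviors of saved jobs worth knowing: because the job was created with -P, Sqoop prompts for the password each time --exec runs, and a job that is no longer needed can be dropped:

sqoop job --delete myjob     # remove the saved job definition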

Page 7: Exercise 2: export to MySQL

cd ~/hadoop_example/sqoop/ex2

mysql -u root -phadoop < ./create.sql

./update_hdfs_data.sh

sqoop export --connect jdbc:mysql://localhost/db --username root \
  --password hadoop --table employee \
  --export-dir /user/hadoop/sqoop_input/emp_data
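To confirm the rows actually landed, the target table can be inspected from the MySQL side (a minimal check against the employee table created by create.sql):

mysql -u root -phadoop -e "SELECT * FROM db.employee LIMIT 10;"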