replacing telco db/dw to hadoop and hive

Post on 04-Jun-2015

DESCRIPTION

How to migrate an Oracle DW to Hive.

TRANSCRIPT

Replacing Telco DB/DW to Hadoop and Hive

JunHo Cho

Data Analysis Platform Team

Friday, July 1, 2011

• Cloud Computing Platform - Xen

• Cloud Storage Platform - hadoop

• Massive Email Archiving Solution - hadoop, lucene

• HIVE : social network analysis using email

• Log Archiving Solution - hadoop

• Data Analysis - data mining, machine learning, statistics

• Data Platform - hadoop, lucene, hive

• Cloud Architecture - KT Cloud

Telco Data

Telco DW & ETL

• Data Sources
• Collect Server
• Data Converting
• Batch ETL
• RDBMS Server: RawData, SummaryTable, DimensionTable
• Near-RT Search
• OLAP

Problems: Bottleneck (marked at four points of the pipeline), Availability, Scalability, Expensive

OpenSource

• Storage & Computing
• Collection
• Search
• Analysis
• Coordination

NexR Data Platform

• Data Sources
• Collection Platform
• HDFS: RawData, Index, SummaryTable, DimensionTable
• Search Platform: Real-Time & Batch Indexing, Near RT Search & Monitoring
• Analysis Platform: Batch ETL, OLAP, Advanced Analytics

Hive Internal

Hive Architecture

• UI
• Driver
• Compiler
• MetaStore
• Execution Engine
• Hadoop

Flow labels: HQL, Works, Result, ORM, DDL

Example query: select col1 from tab1 where ...

Example result rows: a 123344, b 121211, c 342434

Hive Internal

• Interfaces: Web UI, Hive CLI, JDBC (Browse, Query, DDL)
• Hive QL: Parser, Plan, Optimizer, Task
• MetaStore, Thrift API
• Map Reduce: ExecMapper/ExecReducer, User Script
• Operators: TSOperator, SELOperator, FSOperator, ...
• UDF/UDAF: substr, sum, average
• SerDe, Input/OutputFormat, RCFile
• StorageHandler: HDFS, HBase, DB

Parser

Select col1,col2 From tab1 Where col3 > 5

AST:

TOK_QUERY
    TOK_FROM
        TOK_TABNAME
    TOK_INSERT
        TOK_DESTINATION
            TOK_DIR
                TOK_TMP_FILE
        TOK_SELECT
            TOK_SELEXPR
                TOK_TABLE_OR_COL
            TOK_SELEXPR
                TOK_TABLE_OR_COL
        TOK_WHERE
            >
                TOK_TABLE_OR_COL
                5

QB: tab1, insclause-0 (col1, col2)

Plan

Select col1,col2 From tab1 Where col3 > 5

QB → operator tree:

• TOK_FROM → TableScanOperator
• TOK_WHERE → FilterOperator
• TOK_SELECT → SelectOperator
• TOK_DESTINATION → FileSinkOperator
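The operator chain for the example query can be pictured as four small steps applied to rows in sequence. The following is a minimal sketch in plain Java, not Hive's actual operator classes: the class `OperatorChainSketch`, its method names, and the in-memory `int[]` rows (col1, col2, col3) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of TableScan -> Filter -> Select -> FileSink.
// Illustrative only; none of these names come from Hive.
final class OperatorChainSketch {

    /** TableScan: emit every row of the in-memory table. */
    static List<int[]> tableScan(int[][] table) {
        List<int[]> rows = new ArrayList<>();
        for (int[] r : table) rows.add(r);
        return rows;
    }

    /** Filter: keep rows whose column value is greater than the threshold. */
    static List<int[]> filter(List<int[]> rows, int columnIndex, int threshold) {
        List<int[]> out = new ArrayList<>();
        for (int[] r : rows) {
            if (r[columnIndex] > threshold) out.add(r);
        }
        return out;
    }

    /** Select: project the requested columns, in the requested order. */
    static List<int[]> select(List<int[]> rows, int... columnIndexes) {
        List<int[]> out = new ArrayList<>();
        for (int[] r : rows) {
            int[] projected = new int[columnIndexes.length];
            for (int i = 0; i < columnIndexes.length; i++) {
                projected[i] = r[columnIndexes[i]];
            }
            out.add(projected);
        }
        return out;
    }

    /** FileSink: render each row as one tab-separated output line. */
    static List<String> fileSink(List<int[]> rows) {
        List<String> lines = new ArrayList<>();
        for (int[] r : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < r.length; i++) {
                if (i > 0) sb.append('\t');
                sb.append(r[i]);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    /** Select col1,col2 From tab1 Where col3 > 5, with columns at indexes 0, 1, 2. */
    static List<String> run(int[][] tab1) {
        return fileSink(select(filter(tableScan(tab1), 2, 5), 0, 1));
    }

    public static void main(String[] args) {
        int[][] tab1 = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        for (String line : run(tab1)) System.out.println(line);
    }
}
```

In Hive the same chain runs inside map and reduce tasks rather than over a list in memory; the sketch only shows the order in which the operators touch a row.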

Optimizer

Select col1,col2 From tab1 Where col3 > 5

• Operator tree: TableScanOperator → FilterOperator → SelectOperator → FileSinkOperator
• ColumnPruner: rules for FIL, SEL and TS, sharing a Context
• tab1 {col1, col2, col3, col4, col5, col6, col7}
• Columns needed: col1, col2, col3 (at FilterOperator)
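The idea behind column pruning for this query can be sketched in a few lines: collect the columns the select needs, add the columns the filter needs, and have the table scan read only those. This is a simplified illustration, not Hive's ColumnPruner; the class `ColumnPrunerSketch` and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of column pruning for a TS -> FIL -> SEL chain.
final class ColumnPrunerSketch {

    /**
     * Returns the table columns the scan still has to read:
     * everything the SELECT projects plus everything the WHERE references.
     */
    static List<String> neededColumns(List<String> tableColumns,
                                      List<String> selectColumns,
                                      List<String> filterColumns) {
        // SEL step: columns requested by the projection.
        Set<String> needed = new LinkedHashSet<>(selectColumns);
        // FIL step: columns referenced by the predicate.
        needed.addAll(filterColumns);
        // TS step: keep only needed columns, preserving table order.
        List<String> pruned = new ArrayList<>();
        for (String column : tableColumns) {
            if (needed.contains(column)) pruned.add(column);
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<String> tab1 = java.util.Arrays.asList(
                "col1", "col2", "col3", "col4", "col5", "col6", "col7");
        System.out.println(neededColumns(tab1,
                java.util.Arrays.asList("col1", "col2"),
                java.util.Arrays.asList("col3")));
    }
}
```

With a columnar format such as RCFile, reading three of seven columns means the remaining four do not have to be deserialized at all.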

Task

Select col1,col2 From tab1 Where col3 > 5

• TaskFactory builds tasks from the QB operator tree: TableScanOperator → FilterOperator → SelectOperator → FileSinkOperator
• Rules: TS - GenMRTableScan1, FS - GenMRFileSink1
• Resulting tasks: MapRedTask, FetchTask

Oracle Migration to Hive

Oracle to Hive

• DDL → DDL
• SQL → HQL (ANSI-SQL)
• Statistic Function → Built-In/UDF/UDAF
• Analytic Function → HQL + UDF, Pig, MapReduce

Hive limitations: No Update, No Insert, No Low Latency

Understand Oracle SQL

• More than 3,000 ETL SQL statements

• Understand the data flow

• Group similar SQL patterns

• Investigate which Oracle functions are used

Oracle SQL

Data Model Convert (Oracle → Hive)

• Table → Table
• Partition → Partition
• Sampling → Bucket

DataType Convert (Oracle → Hive)

• NUMBER(n) → TINYINT/INT/BIGINT
• NUMBER(n,m) → FLOAT/DOUBLE
• VARCHAR2 → STRING
• DATE → STRING in “yyyy-MM-dd HH:mm:ss” format
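The type mapping above could be automated when converting a large number of DDL statements. The sketch below is one possible shape for such a helper, not the tool used in the project. The precision cut-offs (2 digits for TINYINT, 9 for INT, 7 for FLOAT) are assumptions chosen so that every value of that precision fits the target type; the slides do not state them, and NUMBER precisions above 18 digits may not fit BIGINT.

```java
import java.util.Locale;

// Hypothetical helper mapping Oracle column types to Hive types.
// The precision thresholds are assumptions, not taken from the slides.
final class OracleTypeMapper {

    static String toHiveType(String oracleType) {
        String t = oracleType.trim().toUpperCase(Locale.ROOT).replace(" ", "");
        if (t.startsWith("VARCHAR2")) {
            return "STRING";
        }
        if (t.equals("DATE")) {
            // Stored as text in "yyyy-MM-dd HH:mm:ss" format.
            return "STRING";
        }
        if (t.startsWith("NUMBER(") && t.endsWith(")")) {
            String[] parts = t.substring("NUMBER(".length(), t.length() - 1).split(",");
            int precision = Integer.parseInt(parts[0]);
            boolean hasScale = parts.length == 2 && Integer.parseInt(parts[1]) > 0;
            if (hasScale) {
                // NUMBER(n,m): assumed cut-off at roughly 7 significant digits.
                return precision <= 7 ? "FLOAT" : "DOUBLE";
            }
            // NUMBER(n): pick the smallest integer type that holds n digits.
            if (precision <= 2) return "TINYINT";
            if (precision <= 9) return "INT";
            return "BIGINT";
        }
        throw new IllegalArgumentException("unmapped Oracle type: " + oracleType);
    }

    public static void main(String[] args) {
        String[] samples = { "NUMBER(2)", "NUMBER(9)", "NUMBER(12)",
                             "NUMBER(10,2)", "VARCHAR2(30)", "DATE" };
        for (String s : samples) {
            System.out.println(s + " -> " + toHiveType(s));
        }
    }
}
```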

HIVE DML

• Hive supports ANSI-SQL-style queries (HQL)

• Sub-queries are supported only in the FROM clause

• Join queries: equi-join/inner join, outer join, self-join

IN Clause

IN SubQuery (Oracle):

SELECT * FROM Employee e WHERE e.DeptNo IN (SELECT d.DeptNo FROM Dept d)

Hive equivalent:

SELECT * FROM Employee e LEFT SEMI JOIN Dept d ON (e.DeptNo = d.DeptNo)

NOT IN Clause

NOT IN SubQuery (Oracle):

SELECT * FROM Employee e WHERE e.DeptNo NOT IN (SELECT d.DeptNo FROM Dept d)

Hive equivalent:

SELECT e.* FROM Employee e LEFT OUTER JOIN Dept d ON (e.DeptNo = d.DeptNo) WHERE d.DeptNo IS NULL
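The row-level effect of the two rewrites (LEFT SEMI JOIN for IN, LEFT OUTER JOIN with an IS NULL test for NOT IN) can be checked on a small in-memory example. This is a sketch of the semantics only, assuming no NULL DeptNo values on either side; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative semantics of the IN / NOT IN rewrites; not Hive code.
// Each employee row is {name, deptNo}. NULL keys are not modelled.
final class SemiAntiJoinSketch {

    /** LEFT SEMI JOIN: employees whose deptNo exists in Dept, each at most once. */
    static List<String> leftSemiJoin(String[][] employees, String[] deptNos) {
        Set<String> depts = new HashSet<>(Arrays.asList(deptNos));
        List<String> names = new ArrayList<>();
        for (String[] e : employees) {
            if (depts.contains(e[1])) names.add(e[0]);
        }
        return names;
    }

    /** LEFT OUTER JOIN ... WHERE d.DeptNo IS NULL: employees with no matching Dept row. */
    static List<String> leftOuterJoinWhereNull(String[][] employees, String[] deptNos) {
        Set<String> depts = new HashSet<>(Arrays.asList(deptNos));
        List<String> names = new ArrayList<>();
        for (String[] e : employees) {
            if (!depts.contains(e[1])) names.add(e[0]);
        }
        return names;
    }

    public static void main(String[] args) {
        String[][] employees = { {"kim", "10"}, {"lee", "20"}, {"park", "30"} };
        String[] depts = { "10", "30", "30" }; // duplicate key on purpose
        System.out.println(leftSemiJoin(employees, depts));
        System.out.println(leftOuterJoinWhereNull(employees, depts));
    }
}
```

One caveat worth checking during migration: in standard SQL, NOT IN returns no rows when the subquery yields a NULL, which the outer-join rewrite does not reproduce, so columns that can be NULL may need an explicit filter.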

JOIN Operator

JOIN (Oracle):

SELECT * FROM Employee e1, Dept d1 WHERE e1.ID = d1.Id

Hive equivalent:

SELECT * FROM Employee e1 JOIN Dept d1 ON (e1.ID = d1.Id)

Oracle Function

Functions (Oracle → Hive)

• Math Function: round, ceil, mod, power, sqrt, sin/cos → round, ceil, pmod, power, sqrt, sin/cos
• Character Function: substr, trim, lpad/rpad, ltrim/rtrim, replace → substr, trim, lpad/rpad, ltrim/rtrim, regexp_replace
• NULL Function: coalesce, nvl, nvl2 → coalesce (no NVL, NVL2)

Custom UDF Function

• Condition Function: DECODE, GREATEST
• Null Comparison Function: NVL / NVL2
• Type Conversion: TO_NUMBER, TO_CHAR, TO_DATE
• INSTR4
• DATE_FORMAT
• LAST_DAY
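The logic behind a few of these custom functions is small enough to sketch. The code below shows NVL, NVL2 and DECODE as plain static methods so that it compiles without Hive on the classpath; in a real deployment each would be wrapped in a Hive UDF class (Hive's simple UDF API uses a class extending `org.apache.hadoop.hive.ql.exec.UDF` with `evaluate` methods). The class name `OracleCompatFunctions` is hypothetical, and this is not the deck author's implementation.

```java
import java.util.Objects;

// Sketch of Oracle-compatible function logic; names are illustrative.
final class OracleCompatFunctions {

    /** NVL(value, default): the value unless it is NULL. */
    static <T> T nvl(T value, T defaultValue) {
        return value != null ? value : defaultValue;
    }

    /** NVL2(value, ifNotNull, ifNull): choose a result by testing for NULL. */
    static <T> T nvl2(Object value, T ifNotNull, T ifNull) {
        return value != null ? ifNotNull : ifNull;
    }

    /**
     * DECODE(expr, search1, result1, ..., [default]).
     * As in Oracle, a NULL expr matches a NULL search value.
     */
    static Object decode(Object expr, Object... args) {
        int i = 0;
        for (; i + 1 < args.length; i += 2) {
            if (Objects.equals(expr, args[i])) {
                return args[i + 1];
            }
        }
        // A trailing unpaired argument is the default; otherwise NULL.
        return i < args.length ? args[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "n/a"));
        System.out.println(nvl2("x", "set", "unset"));
        System.out.println(decode("F", "M", "Male", "F", "Female", "Unknown"));
    }
}
```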

Oracle Analytic Function

Analytic Function

RANK (Oracle):

SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM emp

Hive equivalent:

SELECT e.name, e.dept, e.salary, RANK(e.dept, e.salary) FROM (SELECT name, dept, salary FROM emp DISTRIBUTE BY dept SORT BY dept, salary DESC) e

RANK(arg1,arg2) - Custom UDF
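A custom RANK(arg1, arg2) of this kind has to keep state between rows: it remembers the current partition and the last value, and relies on rows arriving grouped by dept and sorted by salary, which is what DISTRIBUTE BY plus SORT BY provide within one reducer. The following is a sketch of that state machine in plain Java under those assumptions; the class `RankSketch` is hypothetical and is not the UDF from the talk.

```java
// Stateful rank over rows that arrive grouped by partition and sorted by value.
// Reproduces RANK() semantics: ties share a rank and leave a gap after them.
final class RankSketch {
    private String lastPartition; // partition (dept) of the previous row
    private Long lastValue;       // value (salary) of the previous row
    private int rowsInPartition;  // rows seen so far in the current partition
    private int rank;             // rank assigned to the previous row

    int evaluate(String partition, long value) {
        if (!partition.equals(lastPartition)) {
            // New partition: reset all counters.
            lastPartition = partition;
            lastValue = null;
            rowsInPartition = 0;
            rank = 0;
        }
        rowsInPartition++;
        if (lastValue == null || lastValue.longValue() != value) {
            // Value changed: rank jumps to the row position within the partition.
            rank = rowsInPartition;
            lastValue = value;
        }
        return rank;
    }

    public static void main(String[] args) {
        RankSketch rank = new RankSketch();
        String[] depts = { "A", "A", "A", "A", "B" };
        long[] salaries = { 300, 200, 200, 100, 500 };
        for (int i = 0; i < depts.length; i++) {
            System.out.println(depts[i] + "\t" + salaries[i] + "\t"
                    + rank.evaluate(depts[i], salaries[i]));
        }
    }
}
```

Because the function depends on row order, it gives wrong answers if the inner query's DISTRIBUTE BY and SORT BY are dropped.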

Analytic Aggregation Function

MIN (Oracle):

SELECT dept, MIN(salary) OVER (PARTITION BY dept) FROM emp

Hive equivalent (Aggregation + JOIN):

SELECT emp.dept, tmp.m FROM emp JOIN (SELECT dept, MIN(salary) m FROM emp GROUP BY dept) tmp ON (emp.dept = tmp.dept)
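The aggregation-plus-join rewrite works in two passes: first compute one minimum per dept, then attach that minimum to every original row. A small in-memory sketch of those two passes follows; the class `MinOverPartitionSketch` and the `{dept, salary}` row layout are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two-pass emulation of MIN(salary) OVER (PARTITION BY dept).
// Each emp row is {dept, salary}; names are illustrative only.
final class MinOverPartitionSketch {

    /** Pass 1: SELECT dept, MIN(salary) m FROM emp GROUP BY dept. */
    static Map<String, Long> minByDept(String[][] emp) {
        Map<String, Long> min = new HashMap<>();
        for (String[] row : emp) {
            long salary = Long.parseLong(row[1]);
            Long current = min.get(row[0]);
            if (current == null || salary < current) {
                min.put(row[0], salary);
            }
        }
        return min;
    }

    /** Pass 2: join every emp row back to its dept's minimum. */
    static List<String> joinBack(String[][] emp, Map<String, Long> minByDept) {
        List<String> out = new ArrayList<>();
        for (String[] row : emp) {
            out.add(row[0] + "\t" + minByDept.get(row[0]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] emp = { {"A", "300"}, {"B", "150"}, {"A", "200"} };
        for (String line : joinBack(emp, minByDept(emp))) {
            System.out.println(line);
        }
    }
}
```

The output keeps one line per input row, which is the property that distinguishes the analytic form from a plain GROUP BY.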

Hive Internal

Merge Join Tree Bug

The two queries below are equivalent; only the join order differs.

• select * from a join b on a.v1 = b.v1 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join e on a.v2 = e.v2 → 3 MapReduce jobs

• select * from a join e on a.v2 = e.v2 join c on a.v1 = c.v1 join d on a.v1 = d.v1 join b on a.v1 = b.v1 → 2 MapReduce jobs

Merge Join Tree Bug Fix

• SemanticAnalyzer

Original code (the outer else branch ignores whether a merge happened):

    private void mergeJoinTree(QB qb) {
      QBJoinTree root = qb.getQbJoinTree();
      QBJoinTree parent = null;
      while (root != null) {
        boolean merged = mergeJoinNodes(qb, parent, root, root.getJoinSrc());
        if (parent == null) {
          if (merged) {
            root = qb.getQbJoinTree();
          } else {
            parent = root;
            root = root.getJoinSrc();
          }
        } else {
          parent = parent.getJoinSrc();
          root = parent.getJoinSrc();
        }
      }
    }

Fix (replacement for the outer else branch):

        } else {
          if (merged) {
            // after a successful merge, restart from the top of the join tree
            root = qb.getQbJoinTree();
          } else {
            parent = parent.getJoinSrc();
            root = parent.getJoinSrc();
          }
        }

New HQL Syntax

INSERT INTO

INSERT INTO table VALUES(col1 ... coln) SELECT ... FROM tmp ...

• INSERT [OVERWRITE] destination

• grammar

• modify FileSinkPlan

• New Feature - HIVE-306

• INSERT INTO destination

Tuning

• Hadoop Tuning
    • mapred.job.reuse.jvm.num.tasks
    • mapred.child.java.opts
    • mapred.min.split.size / mapred.max.split.size
    • dfs.block.size

• Hive Tuning
    • hive.input.format = CombineHiveInputFormat
    • query tuning - reduce the number of MapReduce jobs using the HQL plan

Wrap-Up: Oracle 2 Hive

• Gain insight into the data flow & model
• Modify Oracle SQL to Hive query syntax
• Use built-in functions
• Develop custom UDF/UDAF/UDTF
• Support analytic functions
    - distribute by + sort by + UDF
    - join + UDF (aggregation)
• Modify Hive internals
• Hadoop + Hive tuning

Questions?
