hive evolution: apachecon na 2010

Hive Evolution

A Progress Report

November 2010

John Sichi (Facebook)

Agenda

• Hive Overview

• Version 0.6 (just released!)

• Version 0.7 (under development)

• Hive is now a TLP!

• Roadmaps

What is Hive?

• A Hadoop-based system for querying and managing structured data– Uses Map/Reduce for execution– Uses Hadoop Distributed File System

(HDFS) for storage

Hive Origins

• Data explosion at Facebook

• Traditional DBMS technology could not keep up with the growth

• Hadoop to the rescue!

• Incubation with ASF, then became a Hadoop sub-project

• Now a top-level ASF project

Hive Evolution

• Originally:– a way for Hadoop users to express

queries in a high-level language without having to write map/reduce programs

• Now more and more:– A parallel SQL DBMS which happens to

use Hadoop for its storage and execution architecture

Intended Usage

• Web-scale Big Data– 100’s of terabytes

• Large Hadoop cluster– 100’s of nodes (heterogeneous OK)

• Data has a schema

• Batch jobs– for both loads and queries

So Don’t Use Hive If…

• Your data is measured in GB

• You don’t want to impose a schema

• You need responses in seconds

• A “conventional” analytic DBMS can already do the job– (and you can afford it)

• You don’t have a lot of time and smart people

Scaling Up

• Facebook warehouse, July 2010:– 2250 nodes– 36 petabytes disk space

• Data access per day:– 80 to 90 terabytes added (uncompressed)– 25000 map/reduce jobs

• 300-400 users/month

Facebook Deployment

Web Servers Scribe MidTier

Production Hive-Hadoop Cluster

Sharded MySQL

Scribe-Hadoop Clusters

Adhoc Hive-Hadoop Cluster

Hive replication

Hive Architecture

Metastore

Query Engine

CLIHive Thrift API

MetastoreThrift API

JDBC/ODBC clients

Hadoop Map/Reduce+ HDFS Clusters

Web Management Console

Physical Data Model

clicks

ds=‘2010-10-28’

ds=‘2010-10-29’

ds=‘2010-10-30’

Partitions(possiblymulti-level)

Table HDFS Files(possibly ashash buckets)

Map/Reduce Plans

Input Files Map Tasks

Reduce Tasks

Splits

Result Files

Query Translation Example

• SELECT url, count(*) FROM page_views GROUP BY url

• Map tasks compute partial counts for each URL in a hash table– “map side” preaggregation– map outputs are partitioned by URL and

shipped to corresponding reducers

• Reduce tasks tally up partial counts to produce final results

It Gets Quite Complicated!

Behavior Extensibility

• TRANSFORM scripts (any language)– Serialization+IPC overhead

• User defined functions (Java)– In-process, lazy object evaluation

• Pre/Post Hooks (Java)– Statement validation/execution– Example uses: auditing, replication,

authorization

UDF vs UDAF vs UDTF

• User Defined Function• One-to-one row mapping• Concat(‘foo’, ‘bar’)

• User Defined Aggregate Function• Many-to-one row mapping• Sum(num_ads)

• User Defined Table Function• One-to-many row mapping• Explode([1,2,3])

Storage Extensibility

• Input/OutputFormat: file formats– SequenceFile, RCFile, TextFile, …

• SerDe: row formats– Thrift, JSON, ProtocolBuffer, …

• Storage Handlers (new in 0.6)– Integrate foreign metadata, e.g. HBase

• Indexing– Under development in 0.7

Release 0.6

• October 2010– Views– Multiple Databases– Dynamic Partitioning– Automatic Merge– New Join Strategies– Storage Handlers

Views: Syntax

CREATE VIEW [IF NOT EXISTS] view_name

[ (column_name [COMMENT column_comment], … ) ]

[COMMENT ‘view_comment’]

AS SELECT …

[ ORDER BY … LIMIT … ]

Views: Usage

• Use Cases– Column/table renaming– Encapsulate complex query logic– Security (Future)

• Limitations– Read-only– Obscures partition metadata from underlying

tables– No dependency management

Multiple Databases

• Follows MySQL convention– CREATE DATABASE [IF NOT EXISTS]

db_name [COMMENT ‘db_comment’]– USE db_name

• Logical namespace for tables

• ‘default’ database is still there

• Does not yet support queries across multiple databases

Dynamic Partitions: Syntax

• ExampleINSERT OVERWRITE TABLE page_view

PARTITION(dt='2008-06-08', country)

SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.country

FROM page_view_stg pvs

Dynamic Partitions: Usage

• Automatically create partitions based on distinct values in columns

• Works as rudimentary indexing– Prune partitions via WHERE clause

• But be careful…– Don’t create too many partitions!– Configuration parameters can be used to

prevent accidents

Automatic merge

• Jobs can produce many files

• Why is this bad?– Namenode pressure– Downstream jobs have to deal with file

processing overhead

• So, clean up by merging results into a few large files (configurable)– Use conditional map-only task to do this

Join Strategies Before 0.6

• Map/reduce join– Map tasks partition inputs on join keys

and ship to corresponding reducers– Reduce tasks perform sort-merge-join

• Map-join– Each mapper builds lookup hashtable

from copy of small table– Then hash-join the splits of big table

New Join Strategies

• Bucketed map-join– Each mapper filters its lookup table by the

bucketing hash function – Allows “small” table to be much bigger

• Sorted merge in map-join– Requires presorted input tables

• Deal with skew in map/reduce join– Conditional plan step for skew keys p(after

main map/reduce join step)

Storage Handlers

Hive

HDFS

NativeTables

StorageHandlerInterface

HBaseHandler

CassandraHandler

HypertableHandler

HypertableAPI

CassandraAPI

HBaseAPI

HBaseTables

Low Latency Warehouse

HBaseHBase

OtherFiles/

Tables

OtherFiles/

TablesPeriodic LoadPeriodic Load

Continuous Update

Continuous Update

HiveQueries

HiveQueries

Storage Handler Syntax

• HBase ExampleCREATE TABLE users(

userid int, name string, email string, notes string)

STORED BY

'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

“hbase.columns.mapping” = “small:name,small:email,large:notes”)

TBLPROPERTIES (

“hbase.table.name” = “user_list”);

Release 0.7• In development

– Concurrency Control

– Stats Collection– Stats Functions– Indexes– Local Mode

– Faster map join– Multiple DISTINCT

aggregates– Archiving– JDBC/ODBC

improvements

Concurrency Control

• Pluggable distributed lock manager– Default is Zookeeper-based

• Simple read/write locking

• Table-level and partition-level

• Implicit locking (statement level)– Deadlock-free via lock ordering

• Explicit LOCK TABLE (global)

Statistics Collection

• Implicit metastore update during load– Or explicit via ANALYZE TABLE

• Table/partition-level– Number of rows– Number of files– Size in bytes

Stats-driven Optimization

• Automatic map-side join

• Automatic map-side aggregation

• Need column-level stats for better estimates– Filter/join selectivity– Distinct value counts– Column correlation

Statistical Functions

• Stats 101– Stddev, var, covar– Percentile_approx

• Data Mining– Ngrams, sentences (text analysis)– Histogram_numeric

• SELECT histogram_numeric(dob_year) FROM users GROUP BY relationshipstatus

Histogram query results

• “It’s complicated” peaks at 18-19, but lasts into late 40s!• “In a relationship” peaks at 20• “Engaged” peaks at 25• Married peaks in early 30s• More married than single at 28• Only teenagers use widowed?

Pluggable Indexing

• Reference implementation– Index is stored in a normal Hive table– Compact: distinct block addresses– Partition-level rebuild

• Currently in R&D– Automatic use for WHERE, GROUP BY– New index types (e.g. bitmap, HBase)

Local Mode Execution

• Avoids map/reduce cluster job latency

• Good for jobs which process small amounts of data

• Let Hive decide when to use it– set hive.exec.model.local.auto=true;

• Or force its usage– set mapred.job.tracker=local;

Faster map join

• Make sure small table can fit in memory– If it can’t, fall back to reduce join

• Optimize hash table data structures

• Use distributed cache to push out pre-filtered lookup table– Avoid swamping HDFS with reads from

thousands of mappers

Multiple DISTINCT Aggs

• ExampleSELECT

view_date,

COUNT(DISTINCT userid),

COUNT(DISTINCT page_url)

FROM page_views

GROUP BY view_date

Archiving

• Use HAR (Hadoop archive format) to combine many files into a few

• Relieves namenode memory• Archived partition becomes read-only• Syntax:

ALTER TABLE page_views

{ARCHIVE|UNARCHIVE}

PARTITION (ds=‘2010-10-30’)

JDBC/ODBC Improvements

• JDBC: Basic metadata calls– Good enough for use with UI’s such as

SQuirreL

• JDBC: some PreparedStatement support– Pentaho Data Integration

• ODBC: new driver under development (based on sqllite)

Hive is now a TLP• PMC

– Namit Jain (chair)– John Sichi– Zheng Shao– Edward Capriolo– Raghotham Murthy– Ning Zhang

– Paul Yang– He Yongqiang– Prasad Chakka– Joydeep Sen Sarma– Ashish Thusoo

• Welcome to new committer Carl Steinbach!

Developer Diversity

• Recent Contributors– Facebook, Yahoo, Cloudera– Netflix, Amazon, Media6Degrees, Intuit– Numerous research projects– Many many more…

• Monthly San Francisco bay area contributor meetups

• East coast meetups?

Roadmap: Security

• Authentication– Upgrading to SASL-enabled Thrift

• Authorization– HDFS-level

• Very limited (no ACL’s)• Can’t support all Hive features (e.g. views)

– Hive-level (GRANT/REVOKE)• Hive server deployment for full effectiveness

Roadmap: Hadoop API

• Dropping pre-0.20 support starting with Hive 0.7– But Hive is still using old mapred.*

• Moving to mapreduce.* will be required in order to support newer Hadoop versions– Need to resolve some complications with

0.7’s indexing feature

Roadmap: Howl

• Reuse metastore across Hadoop

Howl

Hive Pig Oozie Flume

HDFS

Roadmap: Heavy-Duty Tests

• Unit tests are insufficient

• What is needed:– Real-world schemas/queries– Non-toy data scales– Scripted setup; configuration matrix– Correctness/performance verification– Automatic reports: throughput, latency,

profiles, coverage, perf counters…

Roadmap: Shared Test Site

• Nightly runs, regression alerting

• Performance trending

• Synthetic workload (e.g. TPC-H)

• Real-world workload (anonymized?)

• This is critical for– Non-subjective commit criteria– Release quality

Resources

• http://hive.apache.org

• [email protected]

• [email protected]

• Questions?

hive evolution: apachecon na 2010

Technology