xml to hive


Upload: rajesh-kumar-mandal

Post on 11-Feb-2017


Multiple XML data load to Hive Environment

Daily region-wise sales XML data files from Unilever Group:

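File name and path: /home/hadoop/Unilever_RAW_data/market.xml

Records (fields in order: State | District | Market | Commodity | Variety | Arrival_Date | Min_x0020_Price | Max_x0020_Price | Modal_x0020_Price):

Telengana | kurnool | Fruit-Market | Bathing-Soap(Lux) | Bath-soap | 24/04/2016 | 25 | 28 | 29
Telengana | kadapa | sanjay market | Bathing-Soap(Margo) | Bath-soap | 25/04/2016 | 24 | 30 | 30
Andra Pradesh | chitoor | TTD Market | Laddu(Spl made with Ghee) | Ladoo | 25/04/2016 | 25 | 100 | 120
Andra Pradesh | Anantapur | Vishal Market | Tyres(MRF, CEAT, Zeal) | Tyre | 25/04/2016 | 12000 | 25000 | c1200

File name and path: /home/hadoop/Unilever_RAW_data/market1.xml

Records (same field order):

Andra Pradesh | Nellore | Big Theaters | cinemas | cienmas(Hindi,English,Telegu) | 24/07/2015 | 100 | 150 | BiG-C
Andra Pradesh | Guntur | chilli Market | chilli | clilli(Red) | 24/07/2015 | 50 | 100 | RED-C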

Objective :

The aim of the exercise is to analyse the data and generate sales trends such as 52-week high/low, month-by-month trends, state-wise trends and overall price fluctuations for the various products, and to store the final output as JSON documents in the NoSQL database MongoDB.
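As a rough illustration of the month-by-month trend, the query below is a sketch only. It assumes the data already sits in the xml_items table built in the steps that follow, and that Arrival_Date always uses the dd/MM/yyyy form seen in the sample rows. The output column names (sale_month, month_low, month_high, avg_modal_price) are made up for the example.

```sql
-- Month-by-month price trend per commodity (illustrative sketch).
-- Arrival_Date is a STRING, so it is parsed with the assumed
-- dd/MM/yyyy pattern and reduced to a yyyy-MM bucket.
SELECT commodity,
       from_unixtime(unix_timestamp(arrival_date, 'dd/MM/yyyy'), 'yyyy-MM') AS sale_month,
       MIN(min_x0020_price)   AS month_low,
       MAX(max_x0020_price)   AS month_high,
       AVG(modal_x0020_price) AS avg_modal_price
FROM xml_items
-- Hive does not accept the select alias in GROUP BY by default,
-- so the bucketing expression is repeated here.
GROUP BY commodity,
         from_unixtime(unix_timestamp(arrival_date, 'dd/MM/yyyy'), 'yyyy-MM')
ORDER BY commodity, sale_month;
```

A 52-week high/low could follow the same pattern by filtering on the parsed date instead of grouping by month.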

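Solution

Step 1: Create the HDFS directory and copy the XML files into it (run from the local directory that holds the files).

hadoop fs -mkdir -p /xml/data/commodity
hadoop fs -copyFromLocal *.xml /xml/data/commodity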

Step 2: Download hivexmlserde-1.0.5.3.jar. This jar is not shipped in the hive/lib folder by default, so download it from the link below and add it to the Hive session:

http://mvnrepository.com/artifact/com.ibm.spss.hive.serde2.xml/hivexmlserde/1.0.5.3

hive> add jar /home/hadoop/apache-hive-1.2.1-bin/lib/hivexmlserde-1.0.5.3.jar;

Step 3: Create the Hive table, mapping each column to an XPath expression through the XML SerDe:

CREATE TABLE xml_items(
  State STRING,
  District STRING,
  Market STRING,
  Commodity STRING,
  Variety STRING,
  Arrival_Date STRING,
  Min_x0020_Price FLOAT,
  Max_x0020_Price FLOAT,
  Modal_x0020_Price FLOAT
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.State"="/Table/State/text()",
  "column.xpath.District"="/Table/District/text()",
  "column.xpath.Market"="/Table/Market/text()",
  "column.xpath.Commodity"="/Table/Commodity/text()",
  "column.xpath.Variety"="/Table/Variety/text()",
  "column.xpath.Arrival_Date"="/Table/Arrival_Date/text()",
  "column.xpath.Min_x0020_Price"="/Table/Min_x0020_Price/text()",
  "column.xpath.Max_x0020_Price"="/Table/Max_x0020_Price/text()",
  "column.xpath.Modal_x0020_Price"="/Table/Modal_x0020_Price/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start"="<Table",
  "xmlinput.end"="</Table>"
);

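Step 5: Load the XML files from HDFS into the table (LOAD DATA INPATH moves the files out of /xml/data/commodity into the table's location) and check the rows: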

hive> load data inpath '/xml/data/commodity/*.xml' OVERWRITE INTO TABLE xml_items;

hive> select * from xml_items;

Step 6: We can now run any Hive query on the table (for example, SELECT with SUM, MAX, MIN ...).
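For instance, a state-wise summary might look like the sketch below. It uses only the xml_items table and columns defined in Step 3; the output aliases (lowest_min_price, highest_max_price, total_modal_price, records) are illustrative names, not part of the original exercise.

```sql
-- State-wise price summary over the loaded XML records (illustrative sketch).
SELECT state,
       MIN(min_x0020_price)   AS lowest_min_price,
       MAX(max_x0020_price)   AS highest_max_price,
       SUM(modal_x0020_price) AS total_modal_price,
       COUNT(*)               AS records
FROM xml_items
GROUP BY state;
```

Hive treats column names case-insensitively, so state here refers to the State column of the table definition.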