geolocation analysis using hiveql

Post on 21-Apr-2017

Geolocation Data Analysis for Safe Residence using HiveQL

TEAM: PRIYANKA KALE, PRIYAL MISTRY, HITESH JAGTAP

GUIDE: DR. JONGWOOK WOO

24th Annual Student Symposium, CSULA

26th February 2016

Table of Contents

1. Introduction

2. Big Data

3. Flowchart

4. Specifications

5. Implementation

6. Visualization

7. GitHub

8. Business Perspective

9. References

Introduction

Goal: To determine whether a location is safe by analyzing a large crime dataset (1.3 GB) for the city of Chicago, IL, covering 2001 to the present (November 2015).

This is a study of a real dataset provided by the government of the United States of America, carried out using Big Data analytics and related tools.

Query output is visualized using different graphs and maps for better interpretation.

Big Data

Volume

Complexity

Variety

Variability

Flowchart

Download Dataset

Upload data into HDFS

Trigger Hive Queries

Result Tables

Output visualization

Specifications

• Microsoft Azure Hortonworks sandbox:

1. Linux system

2. No. of nodes: 4

3. 8 cores

4. Size: 14 GB

Implementation

Hue is a web application for browsing HDFS and working with Hive queries, Cloudera Impala queries, and MapReduce jobs.

Creation of tables in HCatalog:
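
As a rough sketch of this step, a table over the uploaded CSV could be declared along the following lines. Only the columns address, iucr, primary_type and location_description appear in the queries in this deck; the remaining column names, the HDFS directory /user/hue/crime, and the choice of an external text table are assumptions made for illustration, not the project's actual schema.

```sql
-- Sketch only: the HDFS path and most column names are assumed.
-- Column order must match the field order in the uploaded CSV file.
-- skip.header.line.count tells Hive to ignore the CSV header row.
CREATE EXTERNAL TABLE IF NOT EXISTS crime (
  id                   STRING,
  case_number          STRING,
  crime_date           STRING,
  address              STRING,
  iucr                 STRING,
  primary_type         STRING,
  description          STRING,
  location_description STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hue/crime'
TBLPROPERTIES ('skip.header.line.count' = '1');
```

Plain comma splitting misreads quoted fields that themselves contain commas. If the file has such fields, Hive's OpenCSVSerde is one alternative worth considering, at the cost of every column being read as a string.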

Hive and Beeswax

Hive is a data warehouse infrastructure built on top of Hadoop for data summarization, querying, and analysis.

Beeswax is an application for running Hive queries.

Processing in Beeswax:

Total number and rank of each crime type:

select primary_type, count(iucr), rank() over (ORDER BY count(iucr) desc) from crime group by primary_type limit 100;

Queries and Visualization

Number of crimes per location type for a given area:

select location_description, count(iucr) from crime where address = '008XX N MICHIGAN AVE' group by location_description limit 100;

[Chart of query results: Total, axis from 0 to 1200]

Final Outcome of Analysis:

CREATE TABLE UnsafeArea row format delimited fields terminated by ',' STORED AS RCFile AS select address, count(iucr) AS total_crimes, rank() over (ORDER BY count(iucr) desc) AS rank from crime GROUP BY address;
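
Once UnsafeArea exists, a follow-up lookup along these lines could answer the original question for a single block. This is a sketch and not part of the project: the cutoff of 100 and the label text are hypothetical choices, and the address literal is simply reused from the location-type query.

```sql
-- Sketch: label one address using its crime rank in UnsafeArea.
-- The cutoff (100) and the label text are assumed for illustration.
-- The rank column is backtick-quoted so it is read as a column name,
-- not as a call to the rank() function.
SELECT address,
       total_crimes,
       `rank`,
       CASE WHEN `rank` <= 100 THEN 'high crime count'
            ELSE 'lower crime count'
       END AS label
FROM UnsafeArea
WHERE address = '008XX N MICHIGAN AVE';
```

A rank-based cutoff is only one possible definition of "unsafe"; a threshold on total_crimes would work the same way with a different WHERE or CASE condition.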

GitHub

URL: https://github.com/priya708/Project-520

Business Perspective

Get better advertisement

Predictive Policing for Police department: The future of Law enforcement?

• Reducing Random Gunfire

• Connecting Burglaries and Code Violations

References

https://catalog.data.gov

https://cwiki.apache.org/confluence/display/Hive/Tutorial

https://hortonworks.com/tutorials

THANK YOU
