Hadoop, Big Data Analytics and More
![Page 1: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/1.jpg)
Trendwise Analytics
Copyright Trendwise Analytics
Bangalore Hadoop Meetup 6, 3rd August 2013
![Page 2: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/2.jpg)
Agenda
Introduction to Big Data
![Page 3: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/3.jpg)
Market opportunity
IDC, a research firm, predicts that the market for Big Data technology and services will reach $16.9 billion by 2015, up from $3.2 billion in 2010. That is a 40 percent-a-year growth rate — about seven times the estimated growth rate for the overall information technology and communications business, according to IDC.
Billions and billions: big data becomes a big deal
Deloitte predicts that in 2012, “big data” will likely experience accelerating growth and market penetration.
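As a quick sanity check on the IDC figures quoted above, the compound annual growth rate implied by $3.2 billion in 2010 and $16.9 billion in 2015 can be computed directly. This is a minimal sketch; the function name `cagr` is illustrative and does not come from the slides.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.40 means 40% a year)."""
    return (end_value / start_value) ** (1 / years) - 1


# IDC figures quoted on the slide, in billions of US dollars.
rate = cagr(start_value=3.2, end_value=16.9, years=2015 - 2010)
print(f"Implied growth rate: {rate:.1%} per year")
```

The result lands just under 40 percent a year, which agrees with the rounded figure IDC cites.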
![Page 4: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/4.jpg)
Forecast growth of Hadoop Job Market
Source: Indeed -- http://www.indeed.com/jobtrends/Hadoop.html
![Page 5: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/5.jpg)
What is Big Data
![Page 6: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/6.jpg)
Internet Minute ....
![Page 7: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/7.jpg)
Other sources
• A commercial aircraft generates 3 GB of flight sensor data in 1 hour
• An ERP system for a mid-size company grows by 1-2 TB annually
• A video surveillance camera generates 1-3 TB of data in 3 months
• Airtel or Vodafone generates 3 TB of Call Detail Records (CDR) every day
• Every day, 2.5 quintillion (2.5×10^18) bytes of data are created, i.e., 2,500,000 TB
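The conversion in the last figure above can be checked with a few lines of arithmetic. This sketch assumes the slide means decimal (SI) terabytes of 10^12 bytes; the variable names are illustrative.

```python
# Assumption: the slide uses decimal terabytes (1 TB = 10**12 bytes).
BYTES_PER_TB = 10 ** 12

# 2.5 quintillion bytes, written as an exact integer to avoid float rounding.
daily_bytes = 25 * 10 ** 17

daily_tb = daily_bytes // BYTES_PER_TB
print(f"{daily_tb:,} TB per day")  # prints "2,500,000 TB per day"
```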
![Page 8: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/8.jpg)
How are Companies using Big Data Technology?
![Page 9: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/9.jpg)
17 Aug 2013
Watson wins Jeopardy!
Feb 14-16, 2011 – Watson wins Jeopardy!, beating its human opponents.
Watson is IBM’s supercomputer built using Big Data technology.
![Page 10: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/10.jpg)
Across All Industries
Web app optimization
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting
Natural resource exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure optimization
Legal discovery
![Page 11: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/11.jpg)
What are the types of business problems?
Source: Cloudera “Ten Common Hadoopable Problems”
![Page 12: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/12.jpg)
Companies using Big Data
New $1B corporate center for software and analytics
Hiring 400 data scientists
Includes financial and marketing applications, but with a special focus on industrial uses of big data
When will this gas turbine need maintenance?
Ford collects and aggregates data from the 4 million vehicles that use in-car sensing and remote app management software
The data allows Ford to glean information on a range of issues, from how drivers use their vehicles to the driving environment, which could help it improve vehicle quality
Partnered with Microsoft to develop SYNC
Amazon has been collecting customer information for years: not just addresses and payment information but the identity of everything that a customer has ever bought or even looked at.
It uses that data to build customer relationships.
AT&T has 300 million customers
A team of researchers is working to turn data collected through the company’s cellular network into a trove of information for policymakers, urban planners and traffic engineers.
The researchers want to see how the city changes hourly by looking at calls and text messages relayed through cell towers around the region, noting that certain towers see more activity at different times
![Page 13: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/13.jpg)
TECHNOLOGY
Hadoop Non-Hadoop
![Page 14: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/14.jpg)
What is Hadoop?
• Open source project started by Doug Cutting
• A platform to manage Big Data
• Helps in distributed computing
• Runs on commodity hardware
![Page 15: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/15.jpg)
Hadoop is a set of Apache Frameworks and more…
• Data storage (HDFS)
- Runs on commodity hardware (usually Linux)
- Horizontally scalable
• Processing (MapReduce)
- Parallelized (scalable) processing
- Fault tolerant
• Other tools / frameworks
- Data access: HBase, Hive, Pig, Mahout
- Tools: Hue, Sqoop
- Monitoring: Greenplum, Cloudera
Stack layers shown in the diagram: Hadoop Core (HDFS), MapReduce API, Data Access, Tools & Libraries, Monitoring & Alerting
![Page 16: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/16.jpg)
What are the core parts of a Hadoop distribution?
![Page 17: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/17.jpg)
A View of Hadoop (from Hortonworks)
Source: “Intro to Map Reduce” -- http://www.youtube.com/watch?v=ht3dNvdNDzI
![Page 18: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/18.jpg)
Hadoop Cluster HDFS (Physical) Storage
![Page 19: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/19.jpg)
MapReduce Job – Logical View
Image from - http://mm-tom.s3.amazonaws.com/blog/MapReduce.png
![Page 20: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/20.jpg)
Image from: http://blog.jteam.nl/wp-content/uploads/2009/08/MapReduceWordCountOverview1.png
MapReduce Example - WordCount
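The WordCount flow can be sketched in plain Python to make the three logical phases explicit. This is a single-process illustration only, not Hadoop code: the function names and the sample input are made up for the example, and a real job would run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, three "splits" of one line each.
sample = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
counts = word_count(sample)
print(counts)
```

Each mapper only sees its own line, and each reducer only sees the values for one word, which is what lets the framework spread the work over many machines.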
![Page 21: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/21.jpg)
Ways to MapReduce
Libraries Languages
Note: Java is most common, but other languages can be used
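One common route for non-Java languages is Hadoop Streaming, which exchanges records with the mapper and reducer as lines of text, by default with a tab between key and value. The sketch below follows that convention in Python; the function names and sample input are illustrative, and the `sorted()` call is a local stand-in for the sort that Hadoop performs between the two stages.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, in Streaming's text format."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum counts per word; the input must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical input lines. On a cluster, mapper and reducer would be
# separate scripts reading standard input and writing standard output.
sample_lines = ["big data big deal", "data everywhere"]
output = list(reducer(sorted(mapper(sample_lines))))
print("\n".join(output))
```

Because the reducer relies only on sorted input, the same logic works whether the records come from a local list or from the framework.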
![Page 22: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/22.jpg)
Hadoop Ecosystem
![Page 23: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/23.jpg)
Other components
• Hive: Data warehouse infrastructure that provides data summarization and ad hoc querying on top of Hadoop
• Pig: A high-level data-flow language and execution framework for parallel computation
• Sqoop: A tool designed to help users with large data sets import existing relational databases into their Hadoop clusters
• ZooKeeper: A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
![Page 24: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/24.jpg)
Benefits of Hadoop
• Hadoop is designed to run on cheap commodity hardware
• It automatically handles data replication and node failure
• Handles large volumes of unstructured data easily
• Last but not least – it's free! (Open source)
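To illustrate why replication lets a cluster survive a node failure, here is a toy simulation. It is a sketch under stated assumptions, not how HDFS is implemented: the round-robin placement and the function names are invented for the example, while the replication factor of 3 mirrors the usual HDFS default. Real HDFS placement is decided by the NameNode and takes rack layout into account.

```python
import itertools


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    ring = itertools.cycle(nodes)
    return {
        block: {next(ring) for _ in range(replication)}
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {block for block, holders in placement.items() if holders - failed}


# Hypothetical four-node cluster holding six blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(num_blocks=6, nodes=nodes)
survivors = readable_blocks(placement, failed_nodes=["node2"])
print(f"{len(survivors)} of {len(placement)} blocks readable after node2 fails")
```

With three copies of every block on distinct nodes, losing any single node leaves at least two copies of each block, so no data becomes unreadable.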
![Page 25: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/25.jpg)
Commercial Hadoop Distributions
• Cloudera
• Hortonworks
• Greenplum, a division of EMC
• IBM InfoSphere BigInsights
![Page 26: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/26.jpg)
How to get started?
1. http://hadoop.apache.org/
2. Prerequisites:
- Linux (preferred OS)
- Java JDK
3. Install and run a single-node cluster
4. Follow the instructions on Mukul's Blog.
![Page 28: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/28.jpg)
Web Interface – NameNode Tracker
![Page 29: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/29.jpg)
Job Tracker
![Page 30: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/30.jpg)
Task Tracker
![Page 31: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/31.jpg)
Technology – Non Hadoop
• HPCC: HPCC Systems from LexisNexis Risk Solutions offers a proven, open-source, data-intensive supercomputing platform designed for the enterprise to solve big data problems.
• SAP HANA is SAP AG’s implementation of in-memory database technology.
• NoSQL databases: Cassandra, CouchDB, Redis
![Page 32: Hadoop,Big Data Analytics and More](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6871c4a7959a2128b4620/html5/thumbnails/32.jpg)
Contact us
S.Mohan Kumar
Co-Founder and CEO
Trendwise Analytics
Website: www.TrendwiseAnalytics.com
Email: [email protected]
US Tollfree: +1 877 268 2872
India Number: +91 80 4094 9600