Big Data - Applications and Technologies Overview
Post on 15-Jul-2015
Introduction
Big Data – use cases, applications, technologies and vendors overview
Aimed at providing a high-level overview of tools and technologies related to big data
Topics covered
Introduction to Big Data
◦ Definition, need for Big Data, hype cycle
Applications of Big Data
◦ Industry-wise applications
Big Data Technologies Overview
◦ Hadoop, Pig, Hive, NoSQL, columnar DB
Big Data Vendors Overview
◦ Amazon, Cloudera, Hortonworks, MapR, etc.
Big Data - definition
Popular term used to describe the exponential growth and availability of data, both structured and unstructured.
Collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.
May refer both to the volume of data and to the tools and processes used to handle it.
Need for Big Data
2.7 ZB of data in the digital universe today
FB stores and analyzes 30+ PB of data
Walmart data exceeds 2.5 PB
Better decision making and increased operational efficiency
When to go for a Big Data Solution
Analyze all types of data
Most or all of the data to be analyzed
Iterative and exploratory
Business measures not predetermined
Traditional warehouse is schema-bound and not suitable for unstructured data
Retail – Pricing Optimization
Analyze millions of items sold or for sale
Valuable insights about customers and markets in quicker timeframes
Aggregate data from multiple channels in multiple formats
Day-long jobs complete in minutes
Retail – Smart shopping experience
Pricing data, POS, transactions, social media, call center records, promotions
Better understanding of customer preferences, shopping patterns
Geolocation apps – deliver a personalized marketing experience
Big Data in Finance
Customer segmentation
◦ Correlate purchase history, profile info, behaviour on social media
◦ Generate portfolio advice
Fraud Detection systems
Wealth Management
◦ Investment Research – try out new investment ideas, improve algorithmic trading
◦ Customer knowledge – unified view of customer
Regulatory Compliance
◦ Impact of Credit Crisis ‘08 – regulatory compliance
◦ Stringent monitoring and reporting of data
Risk Management
◦ Better analysis of investment positions and risk metrics
Big Data in Healthcare
EMR – Electronic Medical Records initiative in the US
Complete digitization of a patient’s medical info such as profile, disease treatment, pharmacy visits, etc.
Shared across networks
Slow adoption and challenges in aggregation
Predict health issues
◦ Build a model that predicts a patient’s risk
◦ Hospital follows up with high-risk patients to avoid hospitalization
Predicting outbreaks
◦ IBM Research project – STEM
◦ Model – correlates disease data with climate and temperature
◦ Can predict disease outbreaks for regions expecting climatic change
Big Data – Internet of Things
Data generated by machines – RFID chips implanted in devices
3 phases
◦ Data ingestion – cost
◦ Data storage – cost
◦ Analytics – real value
Outsource phases 1 and 2 to DBaaS (Redshift, Hortonworks, Cloudera)
UPS – Case study
Aim
◦ Find the fastest and most fuel-efficient way to deliver packages to customers
ORION research project
◦ Captures driver behaviour and safety habits through GPS
◦ Sensor data on fuel emissions and consumption
◦ Monitors deliveries and customer service
◦ Runs advanced algorithms to optimize routes
Early testing in 2011-2012 on 10k routes – 1.5 million gallons of fuel saved
Complete deployment on 55,000 routes throughout North America by 2017
Big Data – Technologies
MapReduce
◦ Programming paradigm that allows massive jobs to execute in parallel across thousands of servers
◦ Map task - input dataset is converted into a different set of key/value pairs
◦ Reduce task - several of the outputs of the "Map" task are combined to form a reduced set of tuples
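The map and reduce tasks described above can be sketched on a single machine with a word count. This is only an illustration of the paradigm, not Hadoop's API: the names map_task, shuffle, reduce_task and word_count are hypothetical, and a real framework would run the map and reduce calls on many servers in parallel.

```python
from collections import defaultdict


def map_task(line):
    """Map: convert one input record into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different server; here we loop locally.
    mapped = [pair for line in lines for pair in map_task(line)]
    return dict(reduce_task(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big insights", "data at rest"])
print(counts)
```

Because each map call sees only its own line and each reduce call sees only one key, both steps can be spread across servers without coordination beyond the shuffle.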
Hadoop
◦ Most popular open-source implementation of MapReduce
◦ Can work with multiple forms of data
◦ Can run processor-intensive machine learning jobs
Hive
◦ Developed by FB and later made open-source
◦ SQL-like interface on top of Hadoop
◦ Query data stored in a Hadoop cluster
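To give a feel for what a SQL-like query over stored data looks like, the sketch below runs a simple aggregate. It uses Python's built-in sqlite3 module purely as a local stand-in; it does not connect to Hive or Hadoop, and the sales table and its columns are hypothetical. In Hive, a comparable GROUP BY statement would be executed as jobs over data in the cluster.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "tv", 300.0), ("north", "radio", 50.0), ("south", "tv", 320.0)],
)

# Aggregate total sales per store, the kind of query an analyst would write.
rows = conn.execute(
    "SELECT store, SUM(amount) AS total FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(rows)
conn.close()
```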
Pig
◦ Scripting language
◦ Transforms data present in a Hadoop cluster
◦ Developed by Yahoo and made open-source
NoSQL
◦ Schema-less databases
◦ Storage and retrieval of huge amounts of unstructured data
◦ Scalable, flexible and cloud-friendly, but with weaker consistency guarantees
◦ Cassandra, MongoDB, CouchDB
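The schema-less idea can be shown with a toy in-memory document store. This is a sketch under stated assumptions, not the API of Cassandra, MongoDB or CouchDB: the DocumentStore class and its put, get and find methods are hypothetical names chosen for illustration.

```python
import json


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        # Serialize to JSON so later changes to the caller's dict are not stored.
        self._docs[doc_id] = json.dumps(doc)

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])

    def find(self, field, value):
        """Return sorted ids of documents that contain `field` equal to `value`."""
        matches = []
        for doc_id, raw in self._docs.items():
            doc = json.loads(raw)
            if field in doc and doc[field] == value:
                matches.append(doc_id)
        return sorted(matches)


store = DocumentStore()
# No schema is declared: each document carries whatever fields it has.
store.put("u1", {"name": "Asha", "city": "Pune"})
store.put("u2", {"name": "Ben", "city": "Pune", "tags": ["vip"]})
store.put("p1", {"sku": "tv-42", "price": 300})

print(store.find("city", "Pune"))
```

Documents with different shapes live side by side, and a query simply skips documents that lack the field being searched.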
Other Big Data Technologies
Search engines – Lucene, Solr, Elasticsearch, Amazon CloudSearch
Stream Processing
◦ Apache Storm, Apache Spark, Cloudera’s Impala, Yahoo’s S4 and Apache Tez
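Stream processing handles events as they arrive instead of waiting for a complete dataset. A minimal sketch of one common pattern, counting events per fixed time window, is shown below. It does not use the API of any tool listed above; the function name tumbling_window_counts is hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, Counter) each time a fixed-size window closes.

    events: iterable of (timestamp_seconds, key), assumed sorted by timestamp.
    Windows that contain no events are skipped.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The event belongs to a later window, so emit the finished one.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Emit the last, possibly partial, window when the stream ends.
        yield current_start, counts


events = [(0, "click"), (3, "view"), (9, "click"), (12, "click"), (25, "view")]
results = [(start, dict(c)) for start, c in tumbling_window_counts(events, 10)]
print(results)
```

Because the function is a generator, each window's result is available as soon as the first event of a later window arrives, without holding the whole stream in memory.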
Big Data – Vendors
Amazon
◦ Elastic MapReduce – Amazon’s Hadoop distribution, run on AWS infrastructure
◦ “largest adoption of hadoop platforms in the market” – Forrester report
Cloudera
◦ Uses many aspects of open-source Hadoop
◦ Many features built on top of its Hadoop distribution, namely Cloudera Manager and Impala
◦ Strategy – stick to core Hadoop but innovate
Hortonworks
◦ Builds open-source Hadoop ecosystem
◦ Also innovates – Ambari, cluster management software
IBM
◦ InfoSphere BigInsights – analytics at rest
◦ InfoSphere Streams – analytics in motion
◦ Hadoop-based analytics
◦ Stream computing
◦ Data warehousing
◦ Application development
Intel
◦ Develops a custom Hadoop version for Xeon chips
◦ Closest affinity between hardware and software
MapR
◦ Fastest-growing Hadoop distribution company
◦ Highest scores for distribution architecture and data processing capabilities
◦ Needs more branding
Microsoft
◦ Does not encourage open-source but promotes Hadoop
◦ HDInsight – Hadoop as a service, run on Windows Azure and based on Hortonworks’ Hadoop distribution
◦ PolyBase – lets SQL Server query data stored in Hadoop
◦ Big presence in other markets enables delivering an end-to-end Hadoop solution
Teradata
◦ SQL and RDBMS specialization
◦ Partnered with Hortonworks
◦ Integrated Hadoop with existing SQL offerings
◦ Existing Teradata users can use the Hadoop platform to process warehouse data