Girish Juneja - Intel Big Data & Cloud Summit 2013
Posted on 15-Jan-2015
APAC Big Data & Cloud Summit 2013
Girish Juneja, GM, Big Data Software
Software & Services Group
Data Fab Transistor System Enablement Optimization Intelligence
Data
30 million networked sensors, growing at 30% a year
Computing
1 trillion devices connected to the Internet by 2015
Experience
500 million smartphone users, increasing 20% a year
Social
Machine Generated
User Generated
Feedback loops driving exponential growth
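The growth rates above compound, which is what makes the curve exponential. The sketch below projects the two quoted figures forward, assuming (this is not stated on the slide) that the 30% and 20% annual rates stay constant; `project` and `years_to_multiply` are illustrative helper names.

```python
# Compound-growth projection of the slide's figures.
# Assumption: the quoted annual growth rates remain constant over time.

def project(start: float, annual_rate: float, years: int) -> float:
    """Value after `years` of compounding at `annual_rate` (0.30 means 30%)."""
    return start * (1 + annual_rate) ** years

def years_to_multiply(annual_rate: float, factor: float) -> int:
    """Smallest whole number of years for the quantity to reach `factor` times its start."""
    years, value = 0, 1.0
    while value < factor:
        value *= 1 + annual_rate
        years += 1
    return years

if __name__ == "__main__":
    for label, start, rate in [
        ("networked sensors", 30e6, 0.30),
        ("smartphone users", 500e6, 0.20),
    ]:
        in_five = project(start, rate, 5)
        print(f"{label}: {in_five / 1e6:,.0f} million after 5 years; "
              f"doubles in {years_to_multiply(rate, 2)} years")
```

At these rates the sensor count doubles in about 3 years and the smartphone user base in about 4.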
Evolving towards end-to-end real-time analytics
Decade | Paradigm | Architecture | Platform
90s | Reporting / data mining; high cost, isolated use | Batch ("sales reports"); sequential SQL queries | RDBMS; single-node scale
2000s | Model-based discovery; high cost, departmental use | Batch (e.g., correlated buying patterns); NoSQL, parallel analysis; shared disk/memory | Proprietary MPP/DW appliance; scale across nodes
Today | Unbounded MapReduce queries; low cost, enterprise use; arrival of vast amounts of unstructured data | Real-time (e.g., recommendation engines); processing at the storage node; built-in data replication/reliability; shared nothing, in memory | Open-source software loosely coupled to commodity hardware; NoSQL and RDBMS; unlimited linear scale through distributed node addition and multi-core nodes
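The "Today" row combines three ideas: MapReduce queries, processing at the storage node, and a shared-nothing design. The toy sketch below shows that pattern in a single process: each hypothetical node maps only its own partition, a shuffle groups intermediate pairs by key, and each key is reduced independently. Node names and data are made up; a real Hadoop cluster runs these phases across separate machines.

```python
# Toy, single-process illustration of the shared-nothing MapReduce pattern.
# Node names and partition contents are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_partition(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step, run where the data lives: emit (word, 1) for every word."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key across every node's map output."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: keys are independent, so they can be spread across reducers."""
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    # Hypothetical data already partitioned across three storage nodes.
    partitions = {
        "node-1": ["big data big value"],
        "node-2": ["real time analytics"],
        "node-3": ["big clusters of commodity nodes"],
    }
    mapped = [map_partition(lines) for lines in partitions.values()]
    print(reduce_groups(shuffle(mapped)))
```

Because no node reads another node's partition during the map phase, adding nodes adds capacity, which is the "unlimited linear scale" the slide refers to.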
Make big data work for you
Amount of data your enterprise will need to ingest: 50X
Proportion of data that is useful to you: 10%
Projected increase in your IT budget: 10%
=> Business as usual is not an option
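The three numbers combine into a single constraint. As a back-of-envelope sketch, assume today's data volume $V$ and IT budget $B$ are the baseline (symbols introduced here for illustration) and that the 10% budget increase must cover the full 50X ingest. Then the cost of handling each unit of data has to fall by roughly 45X, while the useful share alone is about five times today's total volume:

```latex
\frac{\text{cost per unit (future)}}{\text{cost per unit (today)}}
  = \frac{1.10\,B / (50\,V)}{B / V}
  = \frac{1.10}{50} = 0.022 \approx \frac{1}{45},
\qquad
V_{\text{useful}} = 0.10 \times 50\,V = 5\,V
```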
Software, Global Ecosystem, Security, Systems Architecture, Energy-Efficient Performance, Manufacturing Leadership
Benefit from Intel’s long-standing investments
Using volume economics to drive innovation
Intel
Fabricating silicon for big data
22 nm: a revolutionary leap in process technology
• 37% performance gain at low voltage¹
• >50% active power reduction at constant performance¹
Process generations: 45 nm (2007), 32 nm (2009), 22 nm (2011)
• High-k metal gate: Intel lead vs. industry, 3.5 years
• Tri-gate: Intel lead vs. industry, 4 years
Intel® Xeon® Processor E5-4600 Product Family
Highest reliability & scalability
Highest memory capacity
Highest enterprise & database performance
Density-optimized
Cost-optimized
Improved HPC performance
1 Source: Published results as of 8 May 2012. See http://www.intel.com/performance/server/xeonE7/summary.htm for full list of benchmarks and configuration details.
Pumping the heart of the open datacenter
Intel® Xeon® Processor E7-4800 Product Family
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Enabling open source solutions
Optimize software to take advantage of Intel® architecture
AES-NI, SSD, 10GbE, TXT, MCA, VT-*
3x performance in 3 years
Mission-critical deployments
Accelerates crypto in JBoss
30x throughput
Trusted Compute Pools
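A threefold gain over three years corresponds to a compound annual improvement $r$. Assuming the gain accrued evenly year over year (the slide gives only the three-year total):

```latex
(1 + r)^{3} = 3
\;\Longrightarrow\;
r = 3^{1/3} - 1 \approx 1.442 - 1 = 0.442
```

That is roughly 44% per year.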
Contributing to Apache Hadoop
• File-based encryption for Hadoop jobs
• ACLs for HDFS and HBase at cell level
• Flash storage for MapReduce shuffle data
• Caching and non-volatile memory for increased throughput
• HDFS adaptive replication of hot files
• HBase distributed tables across data centers
• HDFS data replication across data centers
• Archival storage support for cold data on HDFS
• SSE instructions
• JVM enhancements
• InfiniBand RDMA support
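One of the contributions listed above is adaptive replication of hot files: frequently read files get extra replicas so more nodes can serve them locally. The sketch below is a hypothetical policy, not Intel's or Apache Hadoop's actual algorithm; the read-count thresholds and the cap of 10 replicas are assumptions. Only the default of 3 reflects stock HDFS (`dfs.replication`). A computed factor could then be applied with the real `hdfs dfs -setrep <numReplicas> <path>` command.

```python
# Hypothetical hot-file replication policy. Thresholds and the replica cap are
# illustrative assumptions; only the default of 3 matches stock HDFS.
from typing import Dict

DEFAULT_REPLICATION = 3   # HDFS default (dfs.replication)
MAX_REPLICATION = 10      # assumed cap to bound storage overhead

def target_replication(reads_last_hour: int, reads_per_extra_replica: int = 1000) -> int:
    """One extra replica per `reads_per_extra_replica` reads, clamped to [default, max]."""
    extra = reads_last_hour // reads_per_extra_replica
    return max(DEFAULT_REPLICATION, min(MAX_REPLICATION, DEFAULT_REPLICATION + extra))

def plan(access_counts: Dict[str, int]) -> Dict[str, int]:
    """Map each file path to its target replication factor."""
    return {path: target_replication(reads) for path, reads in access_counts.items()}

if __name__ == "__main__":
    # Hypothetical access counts collected from audit logs.
    print(plan({"/logs/today.log": 4200, "/archive/2011.log": 3}))
```

Clamping at both ends keeps cold files at the normal durability level while capping the extra storage spent on hot ones.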
Supporting Intel Distribution for Apache Hadoop
Data Mining
Graph Analytics
Full Text Search
Full SQL
Batch Analytics
Security
Intel® Distribution for Apache Hadoop* software
Granular access control in HBase
Up to 20X faster crypto with AES-NI*
30X faster Terasort on Intel® Xeon processors, Intel 10GbE, and SSD
Up to 8.5X faster queries in Hive*
Job profiling and configuration, automated by Intel® Active Tuner
*Based on internal testing
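"Granular access control in HBase" means permissions can be narrower than a whole table. The model below is a hypothetical illustration of that idea, not the HBase API: grants can target a table, a column family, a column, or a single cell, and the most specific grant that mentions the user decides. The `AclStore` class and the "most specific wins" rule are assumptions made for the sketch.

```python
# Hypothetical model of cell-level access control (NOT the HBase API).
# Rule assumed here: the most specific grant that mentions the user decides.
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple

# (table, column family, qualifier, row key); None means "any" at that level.
Scope = Tuple[str, Optional[str], Optional[str], Optional[str]]

@dataclass
class AclStore:
    grants: Dict[Scope, Dict[str, Set[str]]] = field(default_factory=dict)

    def grant(self, user: str, actions: Iterable[str], table: str,
              family: Optional[str] = None, qualifier: Optional[str] = None,
              row: Optional[str] = None) -> None:
        """Record the actions `user` may perform at this scope (replaces an earlier grant there)."""
        self.grants.setdefault((table, family, qualifier, row), {})[user] = set(actions)

    def allowed(self, user: str, action: str, table: str,
                family: str, qualifier: str, row: str) -> bool:
        """Check from the narrowest scope outwards; no matching grant means deny."""
        for scope in ((table, family, qualifier, row),   # single cell
                      (table, family, qualifier, None),  # whole column
                      (table, family, None, None),       # column family
                      (table, None, None, None)):        # whole table
            users = self.grants.get(scope)
            if users is not None and user in users:
                return action in users[user]
        return False

if __name__ == "__main__":
    acl = AclStore()
    acl.grant("analyst", ["READ"], "cdr")                          # table-wide read
    acl.grant("analyst", [], "cdr", "billing", "card_no", "row42")  # block one cell
    print(acl.allowed("analyst", "READ", "cdr", "billing", "amount", "row42"))
    print(acl.allowed("analyst", "READ", "cdr", "billing", "card_no", "row42"))
```

Evaluating narrow scopes first lets one sensitive cell be hidden without rewriting the table-wide grant.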
Rhino
Cloud
HPC
Common authentication, access control, auditing
Bringing MapReduce to data on Lustre FS
Enabling real-time 100% SQL on Hadoop
Optimizing Hadoop for virtualization & cloud
Backed by a portfolio of datacenter products
Software, Network, Storage & Memory, Server
Cache Acceleration Software
With broad support from the ecosystem
* Other names and brands may be claimed as the property of others.
Proven in the enterprise
Using the Intel® Distribution to gain tremendous results
IT
Putting advanced capabilities to work…
• Expose new data
• Dashboard/historical reporting
• Real-time campaigns
• Vertical apps
• Predictive data services
• Graph visualization
• Log analysis
…to solve real use cases
• Fraud & threat detection
• Life sciences research
• Behavioral analysis
• Warranty analysis
• Customer segmentation
• Infrastructure optimization
From Hype to High Performance
Data-Driven Business: Customer Service
Value
• Enable subscriber access to billing data
• 30X gain in performance; lower TCO
Analytics
• Provides real-time retrieval of 6 months data
• Supports new BI with 15 types of queries
• Enables targeted ad serving and promotions
Data Management
• 30 TB/month of billing data
• 300K reads/second; 800K inserts/second
• 133-node cluster / Intel Xeon E5 processors
CDR
Subscriber Self Service
Intel Distribution
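The data-management figures can be turned into average rates. This back-of-envelope sketch assumes a 30-day month, decimal units (1 TB = 10^12 bytes), and load spread evenly across all 133 nodes. Real telecom traffic is burstier, so these are averages, not peak requirements.

```python
# Back-of-envelope rates for the billing/CDR deployment.
# Assumptions: 30-day month, decimal units, perfectly even load across nodes.
SECONDS_PER_MONTH = 30 * 24 * 3600

def average_ingest_mb_per_s(tb_per_month: float) -> float:
    """Average ingest in MB/s for a given monthly volume in TB."""
    return tb_per_month * 10**12 / SECONDS_PER_MONTH / 10**6

def per_node_rate(total_per_s: float, nodes: int) -> float:
    """Operations per second each node handles if load is spread evenly."""
    return total_per_s / nodes

if __name__ == "__main__":
    print(f"average ingest: {average_ingest_mb_per_s(30):.1f} MB/s")
    print(f"inserts per node: {per_node_rate(800_000, 133):,.0f}/s")
    print(f"reads per node:   {per_node_rate(300_000, 133):,.0f}/s")
```

The monthly volume averages only about 11.6 MB/s. The harder requirement is the operation rate: roughly 6,000 inserts per second on every node.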
Data-Intensive Discovery: Genomics
Value
• Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
• 90% gain in throughput; 6X data compression
Analytics
• Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
• Provide APIs for applications to combine and analyze public and private data sets
Data Management
• Use Hive and Hadoop for query and search
• Dynamically partition and scale HBase
• 10-node cluster / Intel Xeon E5 processors / 10GbE
Intel Distribution
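"Dynamically partition and scale HBase" refers to how HBase stores a table as sorted row-key ranges (regions) and splits a region when it grows too large, so the halves can be served by different nodes. The sketch below is a simplified, hypothetical model. The row-count threshold and the median-key split rule are assumptions; HBase itself splits on region size with configurable policies.

```python
# Simplified, hypothetical model of range partitioning with automatic splits.
# The row-count threshold and median split rule are assumptions for illustration.
import bisect
from typing import List, Optional

class Region:
    """A contiguous, sorted range of row keys beginning at `start_key`."""
    def __init__(self, start_key: str, keys: Optional[List[str]] = None):
        self.start_key = start_key
        self.keys: List[str] = sorted(keys or [])

class Table:
    def __init__(self, max_rows_per_region: int = 4):
        self.max_rows = max_rows_per_region
        self.regions: List[Region] = [Region("")]  # "" sorts before every row key

    def _region_index(self, row_key: str) -> int:
        """Find the region whose start key is the greatest one <= row_key."""
        starts = [r.start_key for r in self.regions]
        return bisect.bisect_right(starts, row_key) - 1

    def put(self, row_key: str) -> None:
        i = self._region_index(row_key)
        region = self.regions[i]
        pos = bisect.bisect_left(region.keys, row_key)
        if pos < len(region.keys) and region.keys[pos] == row_key:
            return  # existing row: an update, so the region does not grow
        region.keys.insert(pos, row_key)
        if len(region.keys) > self.max_rows:
            mid = len(region.keys) // 2
            # Split at the median key; the upper half becomes a new region
            # that a real cluster could move to another server.
            self.regions.insert(i + 1, Region(region.keys[mid], region.keys[mid:]))
            region.keys = region.keys[:mid]

if __name__ == "__main__":
    table = Table(max_rows_per_region=4)
    for gene in ["BRCA1", "TP53", "EGFR", "KRAS", "MYC", "APOE", "CFTR"]:
        table.put(gene)
    for r in table.regions:
        print(repr(r.start_key), r.keys)
```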
Data-Rich Communities: Smart City
Value
• Enforce traffic laws and detect license fraud
• Monitor and predict traffic patterns
• In a city of 31 million people
Analytics
• Detect traffic law violations automatically
• Detect driver license fraud by data mining
• Forecast traffic with predictive analytics
Data Management
• 30,000 cameras
• 6 Mb/s stream rate per camera
• 15 PB of images in use / 2B records in HBase
Detection Prevention
Regional
Local
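The camera figures imply a large aggregate stream. This sizing sketch assumes that every camera streams continuously, that "Mb/s" means megabits (10^6 bits) per second, and that decimal units are used throughout. The results are an upper bound on raw input, before any filtering, compression, or retention policy.

```python
# Back-of-envelope sizing of the camera network's raw input.
# Assumptions: continuous streaming, Mb = 10**6 bits, decimal units.
CAMERAS = 30_000
MBIT_PER_S_PER_CAMERA = 6

def aggregate_gbit_per_s(cameras: int, mbit_per_camera: float) -> float:
    """Total stream rate across all cameras, in Gb/s."""
    return cameras * mbit_per_camera / 1000

def raw_pb_per_day(gbit_per_s: float) -> float:
    """Raw bytes per day, in PB, for a sustained rate in Gb/s."""
    bytes_per_s = gbit_per_s * 10**9 / 8
    return bytes_per_s * 86_400 / 10**15

if __name__ == "__main__":
    total = aggregate_gbit_per_s(CAMERAS, MBIT_PER_S_PER_CAMERA)
    print(f"aggregate: {total:.0f} Gb/s, about {raw_pb_per_day(total):.2f} PB/day raw")
```

At about 180 Gb/s (roughly 1.9 PB/day of raw input), the cluster has to decide close to ingest which images to keep and index.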
Catalyzing the ecosystem
Foster the ecosystem and develop new markets for Intel and its partners
Resources
Content
Case Studies
Whitepapers
Demos
http://hadoop.intel.com
Contacts
Girish Juneja
RK Hiremane
Eddie Toh
hadoop@intel.com