what is big data - architectures and practical use cases
DESCRIPTION
Five ways to get started with big data using IBM's big data platform.TRANSCRIPT
© 2013 IBM Corporation
What is big data?Architectures and Practical use casesTony Pearson – IBM Master Inventor and Senior Managing Consultant
March 2013
© 2013 IBM Corporation2
“At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “big data, big impact,”declared data a new class of economic asset, like currency or gold.
“Companies are being inundated with data —from information on customer-buying habits to supply-chain efficiency. But many managers struggle to make sense of the numbers.”
“Increasingly, businesses are applying analytics to social media such as Facebook and Twitter, as well as to product review websites, to try to “understand where customers are, what makes them tick and what they want”, says Deepak Advani, who heads IBM’s predictive analytics group.”
“Big Data has arrived at Seton Health Care Family, fortunately accompanied by an analytics tool that will help deal with the complexity of more than two million patient contacts a year…”
“Data is the new oil.”Clive Humby
The Oscar Senti-meter — a tool developed by the L.A. Times, IBM and the USC Annenberg Innovation Lab — analyzes opinions about the Academy Awards race shared in millions of public messages on Twitter .”
“Data is the new Oil.”In its raw form, oil has little value. Once processed & refined, it helps power the world.
“…now Watson is being put to work digesting millions of pages of research,incorporating the best clinical practices and monitoring the outcomes to assist physicians in treating cancer patients.”
2
© 2013 IBM Corporation3
Big data is the analysis of information to identify trends, patterns and insights to make better business decisions
Collectively Analyzing the broadening Variety
Responding to the increasing Velocity
Cost efficiently processing the growing Volume
Establishing the Veracity of big data sources
30 Billion RFID sensors and counting
1 in 3 business leaders don’t trust the information they use to make decisions
50x 35 ZB
2020
80% of the worlds data is unstructured
2010
3
© 2013 IBM Corporation4
Where is all this data coming from?
© 2013 IBM Corporation5
Why is big data such a hot topic? Because technology makes it possible to analyze ALL available data
Cost effectively manage and analyze
all available data,
in its native form – unstructured, structured, streaming
ERPCRM RFID
Website
Network Switches
Social Media
Billing
5
© 2013 IBM Corporation6
Imagine the Possibilities of Harnessing your Data Resources
Retailer reduces time to run queries by 80% to
optimize inventory
Stock Exchange cuts queries from 26 hours to2 minutes on 2 PB
Government cuts acoustic analysis from hours to 70
Milliseconds
Utility avoids power failures by analyzing 10 PB of data in minutes
Telco analyses streaming network data to reduce
hardware costs by 90%
Hospital analyzes streaming vitals to intervene 24
hours earlier
Big data challenges exist in every business today
6
© 2013 IBM Corporation77
Impact every aspect of your business
7
Run Zero-latency OperationsAnalyze all available operational data and react in real-time to optimize processes.
Reduce the cost of IT with new cost-effective technologies.
Know Everything about your CustomerAnalyze all sources of data to know your customers as individuals, from channel interactions to social media.
Instant Awareness of Fraud and RiskDevelop better fraud/risk models by analyzing all available data, and detect fraud in real-time with streaming transaction analysis.
Innovate New Products at Speed and ScaleCapture all sources of feedback and analyze vast amounts of market and research data to drive innovation.
Exploit Instrumented AssetsMonitor assets from real-time data feeds to predict and prevent maintenance issues and develop new products & services.
© 2013 IBM Corporation8
In order to realize new opportunities, you need to think beyond traditional sources of data
Social Data
• Variety
• Highly unstructured
• Veracity
Machine Data
• Velocity
• Semi-structured
• Ingestion
Transactional & Application Data
• Volume
• Structured
• Throughput
Enterprise Content
• Variety
• Highly unstructured
• Volume
8
© 2013 IBM Corporation9
Leveraging big data requires multiple platform capabilities
Manage & store huge volume of any data
Hadoop File System
MapReduce
Manage Streaming Data Stream Computing
Analyze Unstructured Data Text Analytics Engine
Data WarehousingStructure and control data
Integrate and govern all data sources
Integration, Data Quality, Security, Lifecycle Management, MDM
Understand and navigate federated big data sources
Federated Discovery and Navigation
© 2013 IBM Corporation10
Business-centric big data enables you to start with most critical business pain and expand the foundation for future requirements
� Not just new technology—big data is a business strategy for capitalizing on information resources
� Getting started is crucial
� Success at each entry point is accelerated by products within the big data platform
� Build the foundation for future requirements by expanding further into the big data platform
IBMbig dataplatform
PureD
ata
System
s
Info
sphe
r e O
pti m
© 2013 IBM Corporation11
1 – Federated Discovery and Navigation
• Customer Need– Understand existing data sources– Expose the data within existing
content management and file systems for new uses, without copying the data to a central location
– Search and navigate big data from federated sources
• Value Statement– Get up and running quickly and
discover and retrieve relevant big data – Use big data sources in new
information-centric applications
• Customer examples– Proctor and Gamble – Connect
employees with a 360° view of big data sources
• Get started with: IBM Vivisimo Velocity
11
© 2013 IBM Corporation12
2 – Analyze raw data
• Customer Need– Ingest data as-is into Hadoop and derive insight from it– Process large volumes of diverse data within Hadoop– Combine insights with the data warehouse– Low-cost ad-hoc analysis with Hadoop to test new
hypothesis
• Value Statement– Gain new insights from a variety and combination of
data sources – Overcome the prohibitively high cost of converting
unstructured data sources to a structured format – Extend the value of the data warehouse by bringing in
new types of data and driving new types of analysis– Experiment with analysis of different data combinations
to modify the analytic models in the data warehouse
• Customer examples– Financial Services Regulatory Org – managed
additional data types and integrated with their existing data warehouse
• Get started with: InfoSphere BigInsights
12
© 2013 IBM Corporation13
3 – Analyze streaming data
• Customer Need– Harness and process streaming data sources – Select valuable data and insights to be stored
for further processing– Quickly process and analyze perishable data,
and take timely action
• Value Statement– Significantly reduced processing time and
cost – process and then store what’s valuable
– React in real-time to capture opportunities before they expire
• Customer examples– Ufone – Telco Call Detail Record (CDR)
analytics for customer churn prevention
• Get started with: InfoSphere Streams
Streams ComputingStreaming Data
Sources
ACTION
13
© 2013 IBM Corporation14
4 – Simplify your data warehouse
• Customer Need– Business users are hampered by the poor
performance of analytics of a general-purpose enterprise warehouse – queries take hours to run
– Enterprise data warehouse is encumbered by too much data for too many purposes
– Need to ingest huge volumes of structured data and run multiple concurrent deep analytic queries against it
– IT needs to reduce the cost of maintaining the data warehouse
• Value Statement– Speed and Simplicity for deep analytics– 100s to 1000s users/second for operation analytics
• Customer examples– Catalina Marketing – executing 10x the amount of
predictive workloads with the same staff
• Get started with: IBM PureData Systems
14
© 2013 IBM Corporation15
5 – Reduce costs with Archive and Storage Tiering
• Customer Need– Reduce the overall cost to maintain data in the
warehouse – often its seldom used and kept ‘just in case’
– Lower costs as data grows within the data warehouse – Reduce expensive infrastructure used for processing
and transformations
• Value Statement– Support existing and new workloads on the most cost
effective alternative, while preserving existing access and queries
– Lower storage costs– Reduce processing costs by pushing processing onto
commodity hardware and parallel processing
• Customer examples– Financial Services Firm – archive production data to
archive files on less expensive disk and tape storage, while maintaining access to data
• Get started with: IBM Infosphere Optim, Infosphere Information Server, Master Data Management (MDM)
15
© 2013 IBM Corporation16
Entry points are accelerated by products within the big data platform
BI / Reporting
BI / Reporting
Exploration / Visualization
FunctionalApp
IndustryApp
Predictive Analytics
Content Analytics
Analytic Applications
IBM big data platform
Systems Management
Application Development
Visualization & Discovery
Accelerators
Information Integration & Governance
HadoopSystem
Stream Computing
Data Warehouse
2 – Analyze raw data with Hadoop
InfoSphereBigInsights
3 – Analyze streaming data
InfoSphere Streams
1 – Federated Discovery and Navigation
IBM Vivisimo4 – Simplify your data warehouse
IBM Warehouse Solutions and PureData systems
5 – Reduce costs through archive and storage tiering
InfoSphere OptimInformation ServerMDM
16
© 2013 IBM Corporation17
There are many use cases for a big data platform
Innovate new Products Speed and Scale
Know Everything About your Customers
• Social Media - Product/brand Sentiment analysis
• Brand strategy• Market analysis• RFID tracking & analysis• Transaction analysis to create insight-
based product/service offerings
• Social media customer sentiment analysis• Promotion optimization• Segmentation• Customer profitability • Click-stream analysis• CDR processing• Multi-channel interaction analysis• Loyalty program analytics• Churn prediction
Run Zero Latency Operations• Smart Grid/meter management• Distribution load forecasting• Sales reporting• Inventory & merchandising optimization• Options trading• ICU patient monitoring• Disease surveillance• Transportation network optimization• Store performance• Environmental analysis• Experimental research
Instant Awareness of Risk and Fraud
• Multimodal surveillance• Cyber security• Fraud modeling & detection• Risk modeling & management• Regulatory reporting
Exploit Instrumented Assets• Network analytics• Asset management and predictive issue resolution• Website analytics• IT log analysis
17
© 2013 IBM Corporation18
Achieve Breakthrough Outcomes with big data capabilities
With Unique CapabilitiesWith Unique Capabilities
EnterpriseContent
Transactional / Application Data
Machine Data
Social Media Data
Visualization & Discovery
Hadoop
Data warehousing
Stream Computing
Text Analytics
Integration & governance
18
Analyze any big data TypeAnalyze any
big data TypeTo Achieve Breakthrough
OutcomesTo Achieve Breakthrough
Outcomes
Know Everything about your Customers
Run Zero-latency Operations
Innovate new products at Speed and Scale
Instant Awareness of Fraud and Risk
Exploit Instrumented Assets
© 2013 IBM Corporation19
Example Configuration for big data
IBM PureDataSystem for Analytics
© 2013 IBM Corporation20
The IBM big data platform advantage
BI / Reporting
BI / Reporting
Exploration / Visualization
FunctionalApp
IndustryApp
Predictive Analytics
Content Analytics
Analytic Applications
IBM big data platform
Systems Management
Application Development
Visualization & Discovery
Accelerators
Information Integration & Governance
HadoopSystem
Stream Computing
Data Warehouse
• The platform provides benefit as you move from an entry point to a second and third project
• Shared components and integration between systems lowers deployment costs
• Key points of leverage• Reuse text analytics across streams
and Hadoop
• HDFS connectors between Streams and Information Integration
• Common integration, metadata and governance across all engines
• Accelerators built across multiple engines – common analytics, models, and visualization
20
© 2013 IBM Corporation21
© 2013 IBM Corporation22
IBM Redbooks Available!
� IBM Redbooks available on deployment of big data using IBM System x and InfoSphere software
� Reference Architecture for different sizes
© 2013 IBM Corporation23
About the Speaker
Mr. Tony Pearson
Master Inventor,
Senior Managing Consultant
IBM System Storage
Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined
IBM Corporation in 1986 in Tucson, Arizona, USA, and has lived there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, Tivoli storage software products, and topics related to Cloud
Computing. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage management software, hardware, and virtualization products.
Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine, and #1
most read IBM blog on IBM’s developerWorks. The blog has been published in series of books, Inside System Storage: Volume I through V.
Over the past years, Tony has worked in development, marketing and customer care positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in
Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.
9000 S. Rita RoadBldg 9032 Room 1238Tucson, AZ 85744
+1 520-799-4309 (Office)
Tony Pearson
Master Inventor, Senior Managing Consultant
IBM System Storage™
© 2013 IBM Corporation24
Additional Resources
24
Email:[email protected]
Twitter:http://twitter.com/az99Øtony
Blog: http://ibm.co/brAeZØ
Books:http://www.lulu.com/spotlight/99Ø_tony
IBM Expert Network:http://www.slideshare.net/az99Øtony
24
© 2013 IBM Corporation25
Trademarks and disclaimersAdobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
© IBM Corporation 2013. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml. ZSP03490-USEN-00