Gail Zhou on "Big Data Technology, Strategy, and Applications"

Gail Z Associates, LLC DevNexus 2014, Data + Integration Big Data Technology, Strategy, and Applications Dr. Gail Zhou Gail Z Associates, LLC February 25, 2014 LinkedIn: http://www.linkedin.com/in/gailZhou Email: [email protected]

Upload: gail-zhou-mba-phd

Post on 26-Jan-2015


DESCRIPTION

Dr. Gail Zhou presented this topic at DevNexus on Feb 25, 2014. Big Data history, opportunities, and applications. Big Data key concepts, reference architecture with open source technology stacks. Hadoop architecture explained (HDFS, Map Reduce, and YARN). Big Data start-up challenges and strategies to overcome them. Technology update: Hadoop and Cassandra based technology offerings.

TRANSCRIPT

Page 1: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Gail Z Associates, LLC

DevNexus 2014, Data + Integration

Big Data Technology, Strategy, and Applications

Dr. Gail Zhou

Gail Z Associates, LLC

February 25, 2014

LinkedIn: http://www.linkedin.com/in/gailZhou

Email: [email protected]

Page 2: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Outline

• What is Big Data and why is it such a big deal? Where can we use Big Data?

• Big Data Key Concepts and Technologies, using Hadoop as an example

• Big Data Challenges and Start-up Strategy: What are the challenges? How do you get started on Big Data?

Appendix: Other Big Data Technologies, Integration of Big Data with Existing Applications (an example)

Page 3: Gail Zhou on "Big Data Technology, Strategy, and Applications"


What is Big Data and why is it such a big deal?


Page 4: Gail Zhou on "Big Data Technology, Strategy, and Applications"

A Brief History of Big Data

Sources: Wikipedia, Forbes.com, and other articles

• 1941: “Information Explosion” term coined.

• 1963: Physicist and science historian Derek Price concluded that the number of new journals had grown exponentially.

• 1990: Computer scientist Peter J. Denning, “Saving All the Bits”: what machines can we build to monitor, process, and understand the data, its meanings, and patterns? Can we get intelligence out of the data?

• 1998: Steve Bryson et al., “Visually exploring gigabyte data sets in real time”, ACM, Section “Big Data for Scientific Visualization”.

Page 5: Gail Zhou on "Big Data Technology, Strategy, and Applications"

A Brief History of Big Data (Cont’d)

Sources: Wikipedia, Forbes.com, and other articles

• 2001: Doug Laney, Meta Group, “3D Data Management: Controlling Data Volume, Velocity, and Variety” (more Vs have been added since: Veracity, Variability, and Value)

Page 6: Gail Zhou on "Big Data Technology, Strategy, and Applications"

A Brief History of Big Data (Cont’d)

Sources: Wikipedia, Forbes.com, and other articles

• 2001 – 2003: Google grew rapidly as a result of its new revenue model, 5 cents per click. Google is now a giant big data leader.

• 1994 – Present: Yahoo!, Hadoop Shop (10K Nodes), Genome, Big Data Analytics.

• 1994 – Present: Amazon, AWS Cloud.

• 2003 – Present: Facebook, Twitter, LinkedIn, etc.

• 2013 and beyond: Many others.

Page 7: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Page 8: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Source: Global Education Project

Population Growth Chart: Does it have something to do with Big Data? Machines, Satellites, Cameras, Internet, computers, and mobile phones are just “enablers” of big data.


Page 9: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Information Explosion. It is just the real beginning.

You got mail (too much).

You are embarrassed to admit you don’t know a lot of cool things happening in the world.

Don’t despair. You are not alone.

Source: Newbury College, UK

www.ucg.org

www.spchui.net


Page 10: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Big Data Opportunities


Page 11: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Big Data Opportunities

• Medical Research and Healthcare: Massive collected research and clinical information can be used to predict and prevent diseases, moving us from ‘sick care’ to ‘health care’.

• Telecom: Traffic data and patterns can be used in real time to re-route traffic.

• Defense: Satellite images and other information can be combined to identify threats.

• Utilities: Smart meter monitoring.

• Public Safety: Pattern recognition and social media can help to predict crimes.

• Financial Industry: Pattern recognition and business rules can flag fraudulent activities.

• Functional Areas: Investigational Search, Pricing Optimization, Risk Analysis, Churn Analysis, Behavior Analysis, Transactions Analysis, Revenue Assurance, Recommendation Engines, etc.

Page 12: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Where Big Data Can Shine

• Traditional (Examples)
– Financial Transactions
– Energy and Infrastructure
– Transportation
– Life Science and HealthCare

• Big Data (Examples)
– Advertisements
– Search and Indexing
– Social Networks
– Science Research
– Communications

• Notes
– Big Data Technology is not the replacement
– Big Data is complementary
– In some cases, Big Data is the only way to get things done
– Big Data has its own challenges

Page 13: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Key Concepts in Big Data – Technology and Architectures


Page 14: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Page 15: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Page 16: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Hadoop HDFS

Blocks (64 MB, 128 MB, etc.) are stored on different nodes with a replication factor (default 3).
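
The block-and-replica idea on this slide can be sketched in a few lines of Python. This is only an illustration, not HDFS code: the function names, the four node names, and the round-robin placement are assumptions made for the example (real HDFS placement is rack-aware and decided by the NameNode).

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the size of each block; only the last block may be smaller."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin).

    Hypothetical placement policy for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


# A 300 MB file with 128 MB blocks on a hypothetical four-node cluster.
sizes = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(sizes), ["node1", "node2", "node3", "node4"])

for block, holders in placement.items():
    print(f"block {block}: {sizes[block] // MB} MB on {holders}")
```

With these inputs the file becomes two full 128 MB blocks plus one 44 MB block, and each block lives on three different nodes, so losing any single node leaves at least two copies of every block.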

Page 17: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Hadoop Logical View

http://nosqlessentials.com

Professor: Fernando Rodriguez Olivera

Page 18: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Hadoop Logical View (HDFS + Map Reduce)

Page 19: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Hadoop V1 – Map Reduce Jobs Execution
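
The slide diagram is not part of this transcript, so as a rough stand-in, the three phases of a MapReduce job (map, shuffle, reduce) can be sketched with a word count in plain Python. This runs in a single process and does not use the Hadoop API; the function names and the sample input are assumptions for illustration. In Hadoop V1 the map and reduce tasks would be scheduled across the cluster rather than called directly.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


counts = run_job(["big data is big", "data is everywhere"])
print(counts)
```

Because each map call only sees one line and each reduce call only sees one key, both phases can be spread over many machines, which is the property the Hadoop execution model relies on.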

Page 20: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Hadoop 2.0 with YARN

Page 21: Gail Zhou on "Big Data Technology, Strategy, and Applications"


YARN Interaction & Sequence

Page 22: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Big Data Challenges, Suggested Startup Strategy


Page 23: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Big Data Start-up Challenges

• Business urgency, time-to-market pressures

• Big Data start-up needs careful planning

• Big Data needs infrastructure, software stacks, people, and a start-up plan

• Lack of Big Data resources, lack of sponsorships (except in some companies)

• Big Data is complex and requires multiple skill sets (mostly new to many companies): Infrastructure, Administration, Security, Programming, Testing, etc.

• Skepticism about Big Data

• Integration with Existing Technologies and Systems
– Cannot develop isolated big data solutions
– Integration with existing systems will be a top challenge (requires both sides to do additional work)

• Open Source: Stability, Maturity, and Security

Page 24: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Suggested Big Data Start-up Strategy

• Full business needs and information requirements analysis

• Business Drivers: Revenue generation? Cost reduction? Customer retention? Compliance? Process Improvement? Fraud detection? Analytics? Dashboard? Solving a tough problem? Retiring/replacing technologies and systems?

• Technology Evaluation and Selection
– Define requirements and objectives first
– Evaluate a variety of technology stacks; develop a framework first

• Executive Support for Start-up Resources

• Prototyping, Discovery, and Planning
– Rent infrastructure in the cloud: VMware, Amazon EC2, and others
– Use spare hardware and network bandwidth
– Assessment, Proposal, Project/Program Plan for next steps
– Start small and keep delivering

• Architecture Design, Estimation, Business Case
– Obtain funding and executive sponsorships, owners, etc.
– SDLC: don’t forget Hardware, Security, Testing, etc.

Page 25: Gail Zhou on "Big Data Technology, Strategy, and Applications"


Appendix

Page 26: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Hadoop & Cassandra Based Offerings

Name | Offerings | Notes
Apache Hadoop | Hadoop Core | Enhancement: YARN
Cloudera | Enhanced Hadoop | Leader
DataStax | Enhanced Apache Cassandra | Cassandra is a distributed NoSQL DB
Hortonworks | Hadoop development and support. Hortonworks Data Platform (HDP) | Yahoo funded $23M + others. Major alliances.
MapR | Develops and sells Hadoop-derived software. M3, M5, M7. | Alliance with EMC, Amazon, and Google.
Sqoop | HDFS and SQL Integration |
Hue | Hadoop GUI Tools |
Amazon | AWS, Cloud Hadoop Cluster |
Microsoft | Windows Azure HDInsight |
IBM, Dell, etc. | Hardware, Software, Services |

Page 27: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Hadoop Related Technologies (Examples)

Name | Functions | Notes
Apache Hue | Hadoop GUI | Hadoop itself has a command line.
Apache HBase | NoSQL distributed DB, key/value column-family store, runs on top of Hadoop | Bigtable-like storage for Hadoop, written in Java.
Apache Pig | High-level programming language for MapReduce | Pig Latin; interoperability with Python, JavaScript, Ruby, and Groovy
Apache Hive | Data warehouse on top of Hadoop. HiveQL | Summaries, queries, and analysis. Open sourced by Facebook.
Apache ZooKeeper | Hadoop configuration and coordination service | Distributed configuration, synchronization, etc.
Apache Sqoop | Move RDBMS data into Hadoop | Command lines

Page 28: Gail Zhou on "Big Data Technology, Strategy, and Applications"

Cassandra

http://nosqlessentials.com

Professor: Fernando Rodriguez Olivera

Page 29: Gail Zhou on "Big Data Technology, Strategy, and Applications"


HBase

http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/

Page 30: Gail Zhou on "Big Data Technology, Strategy, and Applications"


http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/