Why Data is Drowning the (IT) World?
Sanjeev Kumar, VP & MD, Informatica India
Infovision 2012 Summit, October 2012

Upload: information-excellence

Post on 23-Jan-2015


Page 1: Info vision sanjeev kumar _ why data is drowning it world

1

Why Data is Drowning the (IT) World?

Sanjeev Kumar, VP & MD, Informatica India

Infovision 2012 Summit, October 2012

Page 3: Info vision sanjeev kumar _ why data is drowning it world

3

Agenda

• Why the Data Deluge?

• Trends Affecting Data Growth

• New Use-cases Enabled by Big Data

• Trends Underlying Big Data

• Building-blocks for Managing Big Data

• Q&A

Page 4: Info vision sanjeev kumar _ why data is drowning it world

4

Data is the New Plastic

Page 10: Info vision sanjeev kumar _ why data is drowning it world

10

Where Are We? Computing Circa 2012!

• Six decades into the Computer Revolution

• Four decades since the invention of Microprocessor

• Two decades into the rise of modern Internet

• Two billion people using the broadband Internet

Major businesses and industries running on software and delivered as online services*

* “Why Software Is Eating the World”, Marc Andreessen, WSJ, Aug 2011

Page 11: Info vision sanjeev kumar _ why data is drowning it world

11

Trends: Exploding Data Volumes, “Big Data”

[Chart labels: Relational; Complex, Unstructured]

• 2,500 Exabytes of new information in 2012 with Internet as primary driver
• Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “Zettabytes” this year

Kilo – Mega – Giga – Tera – Peta – Exa – Zetta – Yotta

Source: An IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
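
To put the volumes on this slide on one scale, the prefix ladder can be turned into a small conversion helper. This is a minimal sketch that assumes decimal (power-of-ten) prefixes; the names `BYTES_PER` and `convert` are illustrative and not from the talk.

```python
# Decimal (SI) prefixes from the slide, each a factor of 1,000 above the last.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]
BYTES_PER = {name: 10 ** (3 * (i + 1)) for i, name in enumerate(PREFIXES)}


def convert(value, from_prefix, to_prefix):
    """Convert a byte quantity from one prefix to another."""
    return value * BYTES_PER[from_prefix] / BYTES_PER[to_prefix]


# 800K petabytes, expressed in zettabytes.
current_zb = convert(800_000, "peta", "zetta")

# Relative growth implied by going from 800K petabytes to 1.2 zettabytes.
implied_growth = 1.2 / current_zb - 1

# 2,500 exabytes of new information, expressed in zettabytes.
new_info_zb = convert(2_500, "exa", "zetta")

print(f"800K PB = {current_zb:.1f} ZB")
print(f"Growth to 1.2 ZB = {implied_growth:.0%}")
print(f"2,500 EB = {new_info_zb:.1f} ZB")
```

Under the decimal-prefix assumption, 800K petabytes is 0.8 zettabytes, so reaching 1.2 zettabytes corresponds to roughly 50% further growth.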

Page 18: Info vision sanjeev kumar _ why data is drowning it world

18

Big Data Buzz!

• 16 Big Data “V”s; Original 3: Volume, Variety & Velocity
• 120+ Twitter accounts relating to Big Data
• 9,000 job search results for “data scientists”
• 70,000 Wikipedia “big data” hits per month
• 2,000,000 PDFs from search on “big data white paper”
• 112,000,000 blog posts discussing big data
• 1,350,000,000 Google results for “What is big data?”

Source IBM 2012

Page 19: Info vision sanjeev kumar _ why data is drowning it world

19

Why Now? Exploding Data Volumes

Internet of things

Increased consumption of digital content

Explosion in user generated content

Proliferation of web connected devices

Page 20: Info vision sanjeev kumar _ why data is drowning it world

20

Trends: Changing Data Economics

Low ROB

Return on Byte = value to be extracted from that byte / cost of storing that byte.

High ROB
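
The Return on Byte ratio can be sketched as a one-line calculation used to rank datasets. The dataset names and the value and cost figures below are hypothetical, chosen only to show one low-ROB and one high-ROB case; the talk does not give numbers.

```python
def return_on_byte(value_extracted, storage_cost):
    """Return on Byte = value extracted from the data / cost of storing it."""
    if storage_cost <= 0:
        raise ValueError("storage_cost must be positive")
    return value_extracted / storage_cost


# Hypothetical figures in one currency unit, for illustration only.
datasets = {
    "raw clickstream logs": {"value": 500.0, "cost": 2_000.0},
    "customer master data": {"value": 50_000.0, "cost": 1_000.0},
}

rob = {name: return_on_byte(d["value"], d["cost"]) for name, d in datasets.items()}

# Rank datasets from highest to lowest Return on Byte.
ranked = sorted(rob, key=rob.get, reverse=True)
for name in ranked:
    print(f"{name}: ROB = {rob[name]:.2f}")
```

Cheaper storage shrinks the denominator, which is one way data that used to be low-ROB can become worth keeping.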

Page 21: Info vision sanjeev kumar _ why data is drowning it world

21

Trends: Data Seen as a Strategic Asset

• Companies leveraging data assets to
  • Create new and differentiated products
    • Product recommendation engines
  • Increase revenues
    • Optimize ad placement to improve click-through
  • Improve customer satisfaction / retention
    • Analyze CDRs for dropped calls

“The sexy job in the next ten years will be statisticians. The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill.”
Hal Varian, Chief Economist, Google

Page 22: Info vision sanjeev kumar _ why data is drowning it world

22

Big Data in the Enterprise

Page 26: Info vision sanjeev kumar _ why data is drowning it world

26

Why Now? Big Data Use-cases – User Behavior

• Location & Proximity Tracking
  • GPS in operational apps, security analysis, navigation & social media
  • New business opportunities for sales and services in proximity

• Ad Tracking
  • Dynamic changes in ad placement, color, size and wording
  • Improved click-through behavior

• Social CRM
  • Text analytics on huge array of unstructured social media
  • KPIs: share of voice, audience engagement, conversation reach, …

• Causal Factor Discovery in Retail
  • Deviations based on competition, weather, promos, holidays, events

Page 30: Info vision sanjeev kumar _ why data is drowning it world

30

Why Now? “Hadoop-able” Use-cases – Sensors

• Building Sensors
  • Temperature, humidity, vibration and noise
  • Energy usage, security violations, failures in a/c, heat, plumbing

• In-flight Aircraft Sensors
  • Variables on engines, hydraulics, fuel & electrical systems
  • Real-time adaptive control, fuel usage, part failure prediction

• Smart Utility Meters – Electric Grid
  • One read-out per second per meter across entire customer base
  • Dynamic load balancing on grid, failure response, adaptive pricing

• Mobile Cell Tower Networks
  • Analyze call data records (CDRs) to optimize cell tower placement
  • Improved user experience and network monetization
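
A back-of-the-envelope calculation shows why one read-out per second per meter becomes a big-data problem. The meter count and the bytes-per-reading figure below are assumptions for illustration, not numbers from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_readings(meters, readings_per_second=1):
    """Number of meter read-outs collected in one day."""
    return meters * readings_per_second * SECONDS_PER_DAY


def daily_bytes(meters, bytes_per_reading, readings_per_second=1):
    """Raw storage needed for one day of read-outs, in bytes."""
    return daily_readings(meters, readings_per_second) * bytes_per_reading


# Hypothetical utility: one million meters, 100 bytes stored per reading.
meters = 1_000_000
readings = daily_readings(meters)
volume_tb = daily_bytes(meters, bytes_per_reading=100) / 10**12

print(f"{readings:,} readings per day")
print(f"{volume_tb:.2f} TB per day before compression")
```

Under these assumptions a single day already produces tens of billions of rows, which is the scale at which the slide calls a workload “Hadoop-able”.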

Page 34: Info vision sanjeev kumar _ why data is drowning it world

34

“Hadoop-able” Use-cases – Computing Deltas

• Commercial Seed Gene Sequencing
  • Analyzing the sequence, identifying genes and gene families
  • Baseline reference for the larger cotton crop genome

• Satellite Image Comparison
  • Overlay of images to create “hot spot” maps to show differences
  • Construction, destruction, changes due to disasters, encroachment

• CAT Scan Comparison
  • Images taken as “slices” of human body
  • Automatic diagnosis of medical issues and their prevalence

• Document Similarity Testing
  • Latent semantic analysis: “documents that agree with my doc”
  • Threat discovery, sentiment analysis and opinion polls
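
The “documents that agree with my doc” idea can be sketched with a much simpler technique than latent semantic analysis: cosine similarity over raw term counts. This is a swapped-in stand-in (full LSA would add a dimensionality-reduction step on the term-document matrix), and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return (score, document) pairs, most similar first."""
    q = term_counts(query)
    scored = [(cosine_similarity(q, term_counts(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "the grid failed after the storm",
    "ad placement improves click rates",
    "storm damage caused grid failure",
]
ranked = most_similar("grid failure after storm", docs)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

Each document-to-document comparison is independent of the others, which is why this kind of workload parallelizes well.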

Page 35: Info vision sanjeev kumar _ why data is drowning it world

35

Agenda

• Why the Data Deluge?

• Trends Affecting Data Growth

• New Use-cases Enabled by Big Data

• Trends Underlying Big Data

• Building-blocks for Managing Big Data

• Q&A

Page 36: Info vision sanjeev kumar _ why data is drowning it world

36

Big Data: Confluence of Big Transaction, Big Interaction and Big Data Processing

BIG TRANSACTION DATA
• Online Transaction Processing (OLTP)
• Online Analytical Processing (OLAP) & DW Appliances

BIG INTERACTION DATA
• Social Media Data
• Device Sensor Data
• Scientific, genomic
• Machine/Device
• Call detail records, image, click stream data

BIG DATA PROCESSING

Page 37: Info vision sanjeev kumar _ why data is drowning it world

37

Big Transaction Data: OLTP and Analytic Databases

BIG TRANSACTION DATA

• Online Transaction Processing (OLTP)
  • Oracle, DB2, Britton-Lee, Ingres, Informix, Sybase, SQL Server

• Online Analytical Processing (OLAP) & DW Appliances
  • Teradata, Redbrick, EssBase, Sybase IQ, Netezza, Greenplum, DataAllegro, Asterdata, Vertica, Paraccel, Hana

Page 38: Info vision sanjeev kumar _ why data is drowning it world

38

Big Transaction Data: Changing Economics of Computing From Buy To Rent

[Diagram labels: HR Application; CRM Application; Mainframe Custom Application; Custom Application (×4)]

Page 39: Info vision sanjeev kumar _ why data is drowning it world

39

Big Interaction Data: Changing Role Of Computing From Transactions to Interactions

BIG INTERACTION DATA

• Social Media Data
• Device Sensor Data
• Clickstream
• Image/Text
• Scientific
  • Genomic/Pharma
  • Medical
• Machine/Device
  • Sensors/Meters/RFID Tags
  • CDR/Mobile

Page 40: Info vision sanjeev kumar _ why data is drowning it world

40

Big Interaction Data: From Operational Efficiency To Organizational Effectiveness

Relational Transactions (1970 - Current)
• Business Management
  • Business Analysis
  • Operational Automation

Social Interactions (2008 - Current)
• Brand Management
  • Sentiment Analysis
  • Proactive Customer Engagement

Page 41: Info vision sanjeev kumar _ why data is drowning it world

41

Big Interaction Data: How Do You Leverage Device Sensor Data?

• Geo Encoding

• Cell-phone Towers

• Medical Sensors

• RFID Tags

• Edge Networks

Page 42: Info vision sanjeev kumar _ why data is drowning it world

42

Big Data Processing: Highly Scalable Processing Of All Data

BIG TRANSACTION DATA
• Online Transaction Processing (OLTP)
• Online Analytical Processing (OLAP) & DW Appliances

BIG INTERACTION DATA
• Social Media Data
• Device Sensor Data
• Scientific, genomic
• Machine/Device
• Call detail records, image, click stream data

BIG DATA PROCESSING

Page 43: Info vision sanjeev kumar _ why data is drowning it world

43

Big Data Processing: What is Hadoop?

[Diagram labels: PARALLEL; PERSISTENCE; SCRIPTING; SQL QUERY]

Page 44: Info vision sanjeev kumar _ why data is drowning it world

44

Big Data Processing: What does Hadoop do?

• Cost effective scalability
  • Scale out on commodity hardware

• Support for processing all data types
  • Structured, Semi-structured and Unstructured data

• Extensibility
  • Open APIs to implement custom data processing logic

• Hadoop Challenges
  • Data movement into/out of Hadoop / HDFS
  • Requires specialized development skills
    • Java, Hive, Pig etc.
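
The kind of custom processing logic mentioned above is typically written as a map step and a reduce step, which Hadoop MapReduce then distributes across nodes. The sketch below is a single-process illustration of that map, shuffle, reduce pattern in plain Python; it does not use any Hadoop API, and the CDR record layout `(tower_id, call_status)` is a hypothetical simplification.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def dropped_call_mapper(record):
    """Emit (tower_id, 1) for every dropped call; ignore other records."""
    tower_id, status = record
    if status == "dropped":
        yield tower_id, 1


def sum_reducer(key, values):
    """Total the counts emitted for one key."""
    return sum(values)


# Hypothetical call detail records: (tower_id, call_status).
cdrs = [
    ("T1", "ok"),
    ("T1", "dropped"),
    ("T2", "dropped"),
    ("T1", "dropped"),
    ("T3", "ok"),
]

pairs = map_phase(cdrs, dropped_call_mapper)
dropped_per_tower = reduce_phase(shuffle(pairs), sum_reducer)
print(dropped_per_tower)
```

Because the mapper looks at one record at a time and the reducer at one key at a time, the same two functions can in principle run on many machines in parallel, which is what makes scale-out on commodity hardware possible.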

Page 45: Info vision sanjeev kumar _ why data is drowning it world

45

Ingest Data Into HDFS

• Support over 100 different data sources
• Native HDFS Source and Target Support
• Integrated development environment with metadata and preview support
• Perform any preprocessing needed before ingestion

Page 46: Info vision sanjeev kumar _ why data is drowning it world

46

Design and Execute Data Integration Logic on Hadoop

Design integration logic for Hadoop in a graphical and metadata driven environment

Configure where the integration logic should run – Hadoop or Native

Page 47: Info vision sanjeev kumar _ why data is drowning it world

47

Design and Execute Data Quality on Hadoop: Big Data Cleansing, Dedup, Unstructured Parsing

[Diagram labels: Address Validation; Standardize; Parsing; Matching]

• Address Validation and Geocoding enrichment across 260 countries
• Probabilistic or Deterministic Matching
• Standardization and Reference Data Management
• Parsing of Unstructured Data/Text Fields for all types of data (customer/product/social/logs)
• DQ logic pushed down/run natively on Hadoop
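
The difference between deterministic and score-based matching for de-duplication can be illustrated with the Python standard library. This is a rough sketch, not Informatica's matching logic: `difflib.SequenceMatcher` string similarity stands in for a true probabilistic model, and the field names, threshold and sample records are assumptions.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Standardize a field: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def deterministic_match(a, b, keys=("email",)):
    """Records match only if every key field is identical after normalizing."""
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)


def similarity_match(a, b, fields=("name", "city"), threshold=0.85):
    """Records match if the average field similarity reaches the threshold."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    score = sum(scores) / len(scores)
    return score >= threshold, score


# Hypothetical customer records that probably describe the same person.
r1 = {"name": "Jon Smith", "city": "Bangalore", "email": "jon.smith@example.com"}
r2 = {"name": "John Smith", "city": "Bangalore ", "email": "JSmith@example.com"}

exact = deterministic_match(r1, r2)
fuzzy, score = similarity_match(r1, r2)
print(f"deterministic: {exact}, similarity: {fuzzy} (score {score:.2f})")
```

The exact rule misses this pair because the email addresses differ, while the score-based rule accepts it; choosing the threshold trades missed duplicates against false merges.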

Page 48: Info vision sanjeev kumar _ why data is drowning it world

48

Extract data from HDFS and Hive

• Persist and write Hadoop data into DW, HDFS or any target systems
• Extract from HDFS as a native source
• Extract from Hive as a native source
• Perform any post processing needed after extraction

Page 49: Info vision sanjeev kumar _ why data is drowning it world

49

Processing Big Data: What is missing?

• Support for graph/networked data
  • How does one visualize complex relationships?

• Data with dynamic schemas
  • Do the current patterns scale for very large number of columns?
  • Are mappings the right paradigm?

• Ability to extract entities from unstructured data

Page 50: Info vision sanjeev kumar _ why data is drowning it world

50

References

• Why Software is Eating the World
  • Marc Andreessen, WSJ, Aug 2011

• Evolving Role of EDW in Era of Big Data Analytics
  • Ralph Kimball, Kimball Group, 2011

• Data Scientist: Sexiest Job of the 21st Century
  • Thomas H. Davenport & D.J. Patil, HBR, Sept 2012

• Newly Emerging Best Practices for Big Data
  • Ralph Kimball, Kimball Group, Oct 2012

Page 51: Info vision sanjeev kumar _ why data is drowning it world

51

Questions

Page 52: Info vision sanjeev kumar _ why data is drowning it world

52

Informatica & Data: Verbs on Data – We do things to data!

INFA = Data + [ Archival | As a Service | Cleansing | Clustering | Consolidation | Conversion | De-duping | Exchange | Extraction | Federation | Hub | Identity | Integration | Life-cycle Management | Loading | Masking | Mastering | Matching | Migration | On Demand | Privacy | Profiling | Provisioning | Quality | Quality Assessment | Registry | Replication | Retirement | Services | Stewardship | Sub-setting | Synchronization | Test Management | Transformation | Validation | Virtualization | Warehousing ]
