Big Data…Big Opportunities? …Big Hype? (or just a Big Mess?)

Posted on 05-Jan-2016



Data challenges and IBM views

Dr. Matthew Ganis, IBM Senior Technical Staff Member

CIO Social Media Analytics Chief Architect
Member, IBM Academy of Technology

ganis@us.ibm.com
@mattganis (Twitter)

The term “Big Data” is pervasive, but it still provokes a bit of confusion.

So what is it?

“Big Data” has been used to convey all sorts of concepts, including huge quantities of data, social media analytics, next-generation data management capabilities, real-time data and much, much more.

We create about 1.8 zettabytes of information every two years.

Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.

Information is at the Center of a New Wave of Opportunity… And Organizations Need Deeper Insights

• 44x as much data and content over the coming decade: 800,000 petabytes in 2009, 35 zettabytes in 2020
• 80% of the world’s data is unstructured
• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
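As a quick sanity check on the growth figure, the sketch below compares the 2009 and 2020 volumes. It assumes decimal (SI) units, 1 PB = 10^15 bytes and 1 ZB = 10^21 bytes; the slide does not state the convention, but with it the ratio works out to the quoted 44x.

```python
# Decimal (SI) storage units in bytes (assumed convention; the slide does not say).
PETABYTE = 10**15
ZETTABYTE = 10**21


def growth_factor(start_bytes: float, end_bytes: float) -> float:
    """Return how many times larger end_bytes is than start_bytes."""
    return end_bytes / start_bytes


data_2009 = 800_000 * PETABYTE  # 800,000 PB, i.e. 0.8 ZB
data_2020 = 35 * ZETTABYTE      # 35 ZB projected

factor = growth_factor(data_2009, data_2020)
print(f"{data_2009 / ZETTABYTE:.1f} ZB -> {data_2020 / ZETTABYTE:.0f} ZB: about {factor:.0f}x")
```

800,000 PB is 0.8 ZB, and 35 / 0.8 = 43.75, which rounds to the 44x on the slide.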

Structured data refers to information with a high degree of organization, such that inclusion in a relational database is seamless and the data is readily searchable by simple, straightforward search engine algorithms or other search operations, whereas unstructured data is essentially the opposite.

The lack of structure makes compilation a time- and energy-consuming task.

Structured vs Unstructured
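To make the contrast concrete, here is a minimal sketch using hypothetical sample data. The structured records fit a relational table and are answered by one SQL query (via Python's built-in `sqlite3`). The same kind of facts written as free text have no schema, so they must first be extracted with a guessed pattern, which silently misses anything phrased differently.

```python
import re
import sqlite3

# Structured: fixed schema, loaded into a relational table, answered by a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "laptop", 1200.0), ("bob", "phone", 650.0), ("alice", "mouse", 25.0)],
)
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]

# Unstructured: similar facts buried in hypothetical social posts.
# We must guess a pattern (a dollar amount) to pull values out.
posts = [
    "Just bought a laptop for $1200, loving it!",
    "The mouse was $25, total bargain",
    "My phone cost way too much",  # price implied, not written: the pattern misses it
]
amounts = [float(m) for post in posts for m in re.findall(r"\$(\d+(?:\.\d+)?)", post)]

print(total)    # 1225.0
print(amounts)  # [1200.0, 25.0]
```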

The Challenge: Bring Together a Large Volume and Variety of Data to Find New Insights

Identify criminals and threats from disparate video, audio, and data feeds

Make risk decisions based on real-time transactional data

Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement

Detect life-threatening conditions at hospitals in time to intervene

Multi-channel customer sentiment and experience analysis

Where we want to go

Merging the Traditional and Big Data Approaches: Structured vs. Exploratory

Traditional Approach: Structured & Repeatable Analysis
• Business Users determine what question to ask
• IT structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys

Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business Users explore what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization

Where is all this data coming from?

The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction.
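The two ingredients in that definition, a unique identifier and automatic data transfer, can be sketched in a few lines. This is an illustrative assumption, not any particular IoT protocol: the `Sensor` class is hypothetical, readings are serialized as JSON, and an in-memory `Queue` stands in for the network link.

```python
import json
import time
import uuid
from queue import Queue


class Sensor:
    """Hypothetical IoT device: carries a unique identifier and reports on its own."""

    def __init__(self, kind: str):
        self.device_id = str(uuid.uuid4())  # unique identifier for this object
        self.kind = kind

    def reading(self, value: float) -> str:
        # Serialize one measurement as JSON so any receiver can parse it.
        return json.dumps({
            "device_id": self.device_id,
            "kind": self.kind,
            "value": value,
            "timestamp": time.time(),
        })


# An in-memory queue stands in for the network (assumption for this sketch).
network = Queue()
turbine = Sensor("wind_speed")
for value in (7.2, 8.1, 6.9):
    network.put(turbine.reading(value))  # sent with no human in the loop

received = [json.loads(network.get()) for _ in range(network.qsize())]
print(len(received), received[0]["device_id"] == turbine.device_id)
```

Because every message carries the device's identifier, a receiver can tell which object produced each reading even when millions of devices share the same channel.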

Where is all this data coming from?

Approximately 2.7 billion users on the Internet today

Social Media as Big Data

What are we running?

Who is talking about us? Male / Female / Student / Professional / Retired / Customers?

What do they “feel”? Positive/Negative Sentiment / Angry / Annoyed?

Where are they talking?

Who are they influencing? Who’s listening to them?

When customers are talking about us or about our products we want to know where those conversations are happening so we can:

• Interact with interested customers
• Get in front of any issues
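The “What do they feel?” question is usually answered with sentiment analysis. Below is a minimal lexicon-based sketch: the word lists and posts are hypothetical, and production systems use much larger, domain-tuned lexicons or trained models rather than simple word counting.

```python
import re

# Tiny, hypothetical sentiment lexicon (illustration only).
POSITIVE = {"love", "great", "awesome", "fast", "recommend"}
NEGATIVE = {"hate", "broken", "slow", "annoyed", "angry", "refund"}


def sentiment(text: str) -> str:
    """Classify a post as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love the new release, setup was fast!",
    "Annoyed: the app is slow and the export is broken",
    "Installing the update now",
]
for post in posts:
    print(sentiment(post), "|", post)
```

Word counting cannot handle negation or sarcasm (“not great”, “great, broken again”), which is one reason sentiment results need human review.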

Numerous studies show that consumers see word-of-mouth and personal recommendations as far more credible than newspaper and television advertisements. While such mass advertisements are still necessary because of their powerful reach, these findings show that companies need to increase their focus on more personalized approaches. Clearly, it is incredibly difficult, maybe even impossible, for most companies to deal directly with the countless number of potential consumers. This is where influencers come in…

What makes someone influential?

The number of tweets they make? The number of times people mention them?

The number of followers they have? How often they are retweeted?
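One way to combine those candidate signals is a weighted score. The sketch below is an assumption for illustration, not a method from the talk: the weights and accounts are hypothetical, and each count is log-scaled so that a single huge number (for example, sheer tweet volume) cannot dominate.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    handle: str
    tweets: int
    mentions: int
    followers: int
    retweets: int


# Hypothetical weights: the slide lists candidate signals but no formula.
WEIGHTS = {"tweets": 0.1, "mentions": 0.3, "followers": 0.2, "retweets": 0.4}


def influence(a: Account) -> float:
    """Weighted sum of log-scaled signals, so one huge count cannot dominate."""
    return sum(w * math.log1p(getattr(a, name)) for name, w in WEIGHTS.items())


accounts = [
    Account("@prolific_poster", tweets=20_000, mentions=50, followers=300, retweets=10),
    Account("@quiet_expert", tweets=400, mentions=2_000, followers=15_000, retweets=5_000),
]
for a in sorted(accounts, key=influence, reverse=True):
    print(f"{a.handle}: {influence(a):.2f}")
```

With these weights the account that posts far less but is mentioned and retweeted often ranks first; changing the weights changes the answer, which is exactly why “How do we define influence?” remains an open question.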

We were asked to look at why a particular product launch wasn’t performing as expected. We pulled all the “chatter” about it and found:

But there were people talking about it…..

Some things to think about…..

Where is all this data coming from?

While it is true that vast amounts of data are being, and will be, generated by everything from financial transactions, medical records, mobile phones and social media to the Internet of Things, there are questions that need to be asked to understand the data’s meaningful use:

• How will data be managed?
• How will data be shared?

Some thoughts about “data as a service”

• Establishment of standards, governance and guidelines (e.g., open architectures)
• Creation of industry-specific data exchanges (e.g., healthcare data exchanges, environmental data exchanges, etc.)
• Creation of cross-industry data exchanges (e.g., healthcare data exchanges seamlessly interacting with environmental data exchanges)

Enterprise Integration

• Trusted Information & Governance: companies need to govern what comes in, and the insights that come out
• Data Management: insights from Big Data must be incorporated into the warehouse

[Diagram labels: Traditional Sources, New Sources, Big Data Platform, Data Warehouse, Enterprise Integration]

Poor data quality:
• Dirty data
• Missing values
• Inadequate data size
• Poor representation in data sampling
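A first step against poor data quality is simply measuring it. This minimal sketch runs a few illustrative checks (missing values, impossible values, duplicate IDs, inconsistent codes) over hypothetical records; the rules, such as the 0 to 120 age range, are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical customer records pulled from several systems.
records = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "UK"},  # missing value
    {"id": 3, "age": -5, "country": "us"},    # dirty: impossible age, inconsistent code
    {"id": 1, "age": 34, "country": "US"},    # duplicate record
]


def quality_report(rows):
    """Count a few common data-quality problems; the rules here are illustrative."""
    missing = sum(1 for r in rows for v in r.values() if v is None)
    bad_age = sum(1 for r in rows if r["age"] is not None and not 0 <= r["age"] <= 120)
    dupes = sum(n - 1 for n in Counter(r["id"] for r in rows).values() if n > 1)
    bad_case = sum(1 for r in rows if r["country"] != r["country"].upper())
    return {"missing": missing, "bad_age": bad_age, "duplicate_ids": dupes,
            "non_normalized_country": bad_case, "rows": len(rows)}


print(quality_report(records))
```

Checks like these catch dirty data and missing values; inadequate size and poor sampling representation need a comparison against what the population should look like, which no row-level rule can provide.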

Data variety: trying to accommodate data that comes from different sources and in a variety of different forms (images, geo data, text, social, numeric, etc.).

How do we link them together?
Is there a common taxonomy or way to organize it?
Is there a “signal” in one source of data that points to another?
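Linking sources usually needs two things: a normalized key to join on and a shared taxonomy for the things being discussed. The sketch below shows both with hypothetical data; the CRM fields, handles, product spellings and `P-WP` identifier are all invented for illustration.

```python
# Two hypothetical sources that describe the same customers differently.
crm = [
    {"customer_id": "C001", "name": "Alice Smith", "twitter": "@AliceS"},
    {"customer_id": "C002", "name": "Bob Jones", "twitter": "@bobj"},
]
posts = [
    {"author": "alices", "product": "Widget Pro", "text": "battery died again"},
    {"author": "@BOBJ", "product": "widget-pro", "text": "love the keyboard"},
    {"author": "@carol", "product": "WP", "text": "considering one"},
]

# A shared taxonomy: map the many spellings of a product to one canonical id.
PRODUCT_TAXONOMY = {"widget pro": "P-WP", "widget-pro": "P-WP", "wp": "P-WP"}


def handle_key(handle: str) -> str:
    """Normalize social handles so '@AliceS' and 'alices' link to the same key."""
    return handle.lower().lstrip("@")


by_handle = {handle_key(c["twitter"]): c for c in crm}

linked = []
for p in posts:
    customer = by_handle.get(handle_key(p["author"]))
    linked.append({
        "customer_id": customer["customer_id"] if customer else None,  # None: not a known customer
        "product_id": PRODUCT_TAXONOMY.get(p["product"].lower()),
        "text": p["text"],
    })

for row in linked:
    print(row)
```

The unmatched post from `@carol` is itself a signal: someone outside the customer base is talking about the product.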

Dealing with huge datasets, or 'Big Data,' that require distributed approaches.

Who is influential?

How do we define influence?

Thank you for your attention

Where is all this data coming from?

The Big Data Opportunity

• Variety: manage the complexity of multiple relational and non-relational data types and schemas
• Velocity: streaming data and large volume data movement
• Volume: scale from terabytes to zettabytes (1B TBs)
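The “1B TBs” figure follows from the decimal unit ladder, where each prefix step is a factor of 1,000. The helper below is a small sketch assuming SI units; under the binary convention (ZiB to TiB) the factor would instead be 1024^3 = 1,073,741,824.

```python
# Decimal (SI) prefixes: each step up is a factor of 1,000 (assumed convention).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB"]
BYTES = {unit: 1000 ** (i + 1) for i, unit in enumerate(UNITS)}


def convert(value: float, src: str, dst: str) -> float:
    """Convert a quantity between decimal storage units."""
    return value * BYTES[src] / BYTES[dst]


print(convert(1, "ZB", "TB"))        # 1000000000.0, i.e. one billion terabytes
print(convert(800_000, "PB", "ZB"))  # 0.8
```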

Big Data: why is it possible now?

Traditional approach: Data to Function

[Diagram: User request → Application server → Query Data on the Database server → return Data → process Data on the Application server → Send result]

Big Data approach: Function to Data

[Diagram: User request → Master node → Send Function to process on Data → Data nodes each Query & process their Data → Send consolidated result]

Traditional approach
• Application server and Database server are separate
• Data can be on multiple servers
• The analysis program can run on multiple application servers
• The network is still in the middle: data have to go through the network

Big Data approach
• The analysis program runs where the data are: on the Data Nodes
• Only the analysis program has to go through the network
• The analysis program needs to be MapReduce aware
• Highly scalable: 1000s of nodes, petabytes and more
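The “function to data” idea can be illustrated with a word count in the MapReduce style. This is a single-process sketch under stated assumptions, not Hadoop’s API: each hypothetical “data node” is just a list of lines, a thread pool stands in for the nodes running the map step locally (with a local combine), and the master merges the small partial results in a reduce step.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "data node" holds its own partition of posts.
DATA_NODES = {
    "node1": ["big data big hype", "big opportunities"],
    "node2": ["data challenges", "big mess or big data"],
    "node3": ["social media data"],
}


def map_words(lines):
    """Map step, run where the data lives: count words in this node's partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts  # local combine: only small partial counts leave the node


def reduce_counts(partials):
    """Reduce step: merge per-node partial counts into one consolidated result."""
    totals = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            totals[word] += n
    return dict(totals)


# "Master node": ship the function to every partition (threads stand in for nodes),
# then consolidate. The raw lines never move; only the function and partial counts do.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_words, DATA_NODES.values()))

result = reduce_counts(partials)
print(sorted(result.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

On a real cluster the same split applies: the map function is small and travels over the network once per node, while each node’s data, possibly terabytes, stays put.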

What Big Data Is Not

It is not a replacement for your Database strategy

It is not a replacement for your Warehouse strategy

It is not a solution by itself; it needs jobs/applications to drive value