Understanding Big Data Services ~ Sharada Rao
“Every two days now we create as much information as we did from the dawn of civilization up until 2003.” – Eric Schmidt, CEO, Google
1. The speed at which the world is building services around data demands a robust approach to how data is modeled, understood and served up in a stable, regulated and predictable manner.
2. The unprecedented money being spent in this market allows SMBs, large players, disruptors and products to spawn and proliferate in hordes, making it a vast and expensive ecosystem over the long term.
3. Differentiators will be those who are intelligent about the craft of data services, tailoring it to the enterprise data architecture and custom re-engineering it for each industry.
[Infographic: Big Data use cases by domain, set against headlines of the day such as "Report to the President: Every Federal Agency Needs a 'Big Data' Strategy", "Got Big Data? You're Gonna Need a Faster Network", "Big Data Needs to Think Bigger", "More Customers Exposed as Big Data Breach Grows", "Why We Still Can't Prevent Flash Crashes", "Linking 'Big Weather' to Global Warming", "Big Data: You'll Have It, But Can You Handle It?", "Managing Healthcare's 'Big Data Tsunami'", "World Record in Data Transmission: 26 Terabits per Second on a Single Laser Beam", and "P&G Bind with Social Media Moms – Imagine If This Were Pharma"]
• Transportation: speed, traffic flow, detection
• Defence: counter-intelligence, situational awareness
• Financial services: credit and risk scoring, fraud detection, trade analysis
• Environment and energy: "whole earth" modeling, climate change, alternative energy
• Healthcare: disease surveillance, drug discovery, personalized healthcare
• Consumer: sentiment, consumption, promotions
[Diagram: Advanced Analytics and Core I/T (compute, storage, database) at the center, ringed by Social Networks, Enterprise SOA, Smart Devices, Cloud, Mobile and Big Data]
There are many different measures of this phenomenon. IDC predicts that the digital universe will be 44 times bigger in 2020 than it was in 2009, totaling a staggering 35 zettabytes. EMC reports that the number of customers storing a petabyte or more of data will grow from 1,000 (reached in 2010) to 100,000 before the end of the decade; by 2012 it expects that some customers will be storing exabytes (1,000 petabytes) of information. In 2010 Gartner reported that enterprise data growth will be 650 percent over the next five years, and that 80 percent of that will be unstructured.
Global data volume is predicted to grow 44-fold by 2020. The ask is for mammoth scale-out infrastructures to manage large, distributed sources of data, and for parallel processing capability in data modeling, cleansing, analytics and services.
IaaS: 1. Compute as a Service 2. Storage as a Service 3. Cloud Services – data migration to and fro
DWaaS: 1. Data Management as a Service 2. Data Modeling as a Service 3. DW as a Service
DaaS: 1. Data APIs as a Service 2. Data as a Product per industry
BIaaS: 1. Analytics as a Service 2. Insights as a Service
BPaaS: 1. Data process re-engineering as a Service
To accord rapid and parallel processing speed, scale and flexibility, Google and Amazon have begun to use a 'shared nothing' architecture, wherein each node is self-contained and can crawl and link as the desired outcome requires. The emphasis is hence on the link and on an open-noded, stateless, tri-axial plane of data, application APIs and infrastructure. The link is in effect an arrow, a form of data retrieval, and this blows away legacy relational databases; hence "shared nothing". The largest application of this concept is the W3C's Linked Open Data project. This in turn means a marriage with NoSQL: data that will not fit easily into row/column relational databases. It also means an IDOL framework (Intelligent Data Operating Layer), which can interpret human language semantically, process it into a SQL-like query, cross-reference different kinds of data and integrate several search algorithms to serve up appropriate, logical results, all with parallel processing and speed, crawling several stateless data universes.
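The shared-nothing idea can be sketched in a few lines: each node below owns a disjoint shard of the data and holds no shared state, and a hash of the key routes every request to exactly one self-contained node. The `Node` and `Cluster` classes are illustrative assumptions for this sketch, not any vendor's API.

```python
# Minimal sketch of a shared-nothing lookup: each node is self-contained,
# owns its own shard, and shares no memory or disk with its peers.
import hashlib

class Node:
    """A self-contained node: local store only, no shared state."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

class Cluster:
    """Routes each key to exactly one node by hashing; no coordination."""
    def __init__(self, n_nodes):
        self.nodes = [Node() for _ in range(n_nodes)]

    def _owner(self, key):
        # Hash the key so ownership is deterministic and evenly spread
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).put(key, value)

    def get(self, key):
        return self._owner(key).get(key)

cluster = Cluster(n_nodes=4)
cluster.put("user:42", {"name": "Ada"})
print(cluster.get("user:42"))  # each lookup touches exactly one node
```

Because no node depends on another's state, adding nodes scales capacity linearly, which is what makes the model attractive for large distributed data sources.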
Trading
Insurance Claims
Automotive Design
Genomics and Proteomics
Patient profiling
Disease Surveillance
Retail consumer trends
RFID and SCM
Social Networking
Mobile
Global Climate
Sensors and Energy
Space Research
Defence Intelligence
Security Information
Banking – Risk/Investment
Federal classified information
CERN data project
1. Templatized metadata would be a data 'productization' for the above areas
2. Predictable, industry-specific, regulated real-time data
MapReduce is a patented software framework introduced by Google and used to simplify data processing across massive data sets. As people rapidly increase their online activity and digital footprint, a huge amount of data is generated continuously. This data can be of multiple types (text, rich text, RDBMS, graph, etc.), and organizations are finding it vital to quickly analyze the huge amounts of data generated by their customers and audiences in order to better understand and serve them. MapReduce is the tool that helps those organizations analyze this data quickly and efficiently, bringing business value to the organization.
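The model can be illustrated with a toy, single-process word count; real frameworks such as Hadoop run the same map, shuffle and reduce phases in parallel across many machines. The function names here are illustrative, not part of any framework's API.

```python
# Toy single-process sketch of the MapReduce phases: map, shuffle, reduce.
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "data services"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'services': 1}
```

In a distributed run, the map and reduce phases execute on different workers and the shuffle moves data between them over the network; the programming model the developer sees is the same.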
Read More @ Google
Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure cloud, built on Azure cloud infrastructure services. Twister4Azure extends the MapReduce paradigm with extensions and optimizations for iterative MapReduce applications: it supports caching of loop-invariant data, adds a new merge step (map -> reduce -> merge) to the programming model and introduces a novel cache-aware task scheduling mechanism. Twister4Azure running in the Azure cloud outperforms Hadoop on a local cluster by 2 to 4 times.
• Decentralized architecture for clouds
• Avoids single points of failure
• Utilizes highly available and scalable cloud services
• Efficient execution of iterative MapReduce applications
• Extends the MR programming model with iterative extensions
• Multi-level data caching to overcome data access latencies
• Cache-aware hybrid scheduling
• Collective communication primitives for iterative MapReduce
• Support for traditional MapReduce and pleasingly parallel applications
• Ability to execute multiple MR applications inside a single iteration
• Dynamic scheduling achieving better load balancing
• Typical MapReduce fault tolerance, ensuring eventual completion of your computation
• Web-based monitoring console
• Local testing/debugging with the Azure local emulator
We are happy to provide support for scientific application development using Twister4Azure.
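The iterative map -> reduce -> merge loop described above can be sketched as a plain driver function. Everything here is an illustrative assumption, not Twister4Azure's actual API: `static_data` stands in for cached loop-invariant data (loaded once, reused each iteration), and the trivial averaging job exists only to make the loop runnable.

```python
# Sketch of an iterative MapReduce driver with the extra merge step.
def iterative_mapreduce(data, static_data, init, max_iters=50, tol=1e-6):
    """Run map -> reduce -> merge until the merged state converges."""
    state = init
    for _ in range(max_iters):
        # Map: each item is processed with the current state and the
        # cached loop-invariant data (read once, reused every pass).
        mapped = [mapper(item, state, static_data) for item in data]
        reduced = reducer(mapped)
        new_state = merge(state, reduced)   # the extra merge step
        if abs(new_state - state) < tol:    # converged: stop iterating
            return new_state
        state = new_state
    return state

# A trivial instantiation so the loop runs end to end:
def mapper(item, state, static_data):
    return item * static_data           # static_data: loop-invariant weight

def reducer(mapped):
    return sum(mapped) / len(mapped)    # combine mapped values

def merge(old_state, reduced):
    return (old_state + reduced) / 2    # blend previous and new results

print(iterative_mapreduce([1.0, 2.0, 3.0], static_data=1.0, init=0.0))
```

The merge step is what distinguishes the iterative model from classic MapReduce: it folds each round's reduced output back into the state that drives the next round's map phase.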
Big Data needs to be given context or meaning based on 3 parameters
1. Timing
2. Location
3. Situational Intelligence
Such actionable, context-driven data can help in better-informed decision making, be that clinical analytics, geospatial data, flood/tornado reports or sentiment analytics on social media. Examples: FreeBase, Google's Image Labeler, Verbosity, Tag a Tune.
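One way to picture the three parameters is a small record type that carries them alongside the raw value. The field names and the sample flood-gauge event below are illustrative assumptions, not from any real schema.

```python
# Attaching the three context parameters (timing, location, situational
# intelligence) to a raw data point so it becomes actionable.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ContextualEvent:
    value: float        # the raw measurement or signal
    timestamp: datetime # 1. timing
    location: tuple     # 2. location, as (lat, lon)
    situation: str      # 3. situational intelligence

def contextualize(value, lat, lon, situation):
    """Wrap a bare value with the context that gives it meaning."""
    return ContextualEvent(
        value=value,
        timestamp=datetime.now(timezone.utc),
        location=(lat, lon),
        situation=situation,
    )

# e.g. a river-gauge reading becomes a flood-warning signal with context
event = contextualize(4.7, 29.76, -95.37, "river level rising after storm")
print(event.situation)
```

A bare `4.7` means nothing; the same number stamped with when, where and under what circumstances it was observed is what downstream analytics can act on.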
In an ideal world, actionable data and statistical algorithms have to be integrated with human knowledge, so that human intelligence can then supersede the BI of a predictive analytics world.
Then again, sometimes it's best to let the data tell the story.