High Performance Analytics and the Challenges of Big Data

Copyright © 2012, SAS Institute Inc. All rights reserved. High Performance Analytics and the Challenges of Big Data. Toronto Area SAS Users Group, 12 Dec 2014. Charu Shankar, SAS Technical Training Specialist.

Post on 22-Apr-2022

TRANSCRIPT

Page 1: High Performance Analytics and the Challenges of Big Data

Copyright © 2012, SAS Institute Inc. All rights reserved.

High Performance Analytics and the Challenges of Big Data

Toronto Area SAS Users Group, 12 Dec 2014

Charu Shankar, SAS Technical Training Specialist

Page 2: High Performance Analytics and the Challenges of Big Data

Agenda

What is Big Data
Thriving in the Big Data era
Our Perspective – the Analytics Gap
1.1 Volume
1.2 Variety
1.3 Velocity

2.1 Problem #1 Data Prep time part of problem
2.2 Problem #2 Shortage of talent
2.3 Problem #3 Our working ways don’t help

3.1 Some definitions
3.2 What can data mining models tell us?
3.3 How can HPA help?
Questions

Page 3: High Performance Analytics and the Challenges of Big Data

Big data is when the volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision-making.

Big Data is RELATIVE not ABSOLUTE

What is BIG DATA

Page 4: High Performance Analytics and the Challenges of Big Data

[Figure: DATA SIZE, TODAY vs. THE FUTURE; labels: VOLUME, VARIETY, VELOCITY, VALUE]

Thriving in the BIG DATA era

Page 5: High Performance Analytics and the Challenges of Big Data

Most organizations:

• Can’t generate the information they need.

• Can’t generate information fast enough to act on it.

• Continue to incur huge costs due to uninformed decisions and misguided strategies.

The opportunities afforded by analytics have never been greater.

OUR PERSPECTIVE: THE ANALYTICS GAP

Page 6: High Performance Analytics and the Challenges of Big Data

Data is a corporate asset, yet organizations are not leveraging it the way they leverage their labour and capital assets.

Does this look familiar?

Page 7: High Performance Analytics and the Challenges of Big Data

Data is no longer in megabytes or gigabytes

We’re talking petabytes, and that is 10^15 bytes

1.1 VOLUME

Page 8: High Performance Analytics and the Challenges of Big Data

• If the average MP3 encoding for mobile is around 1MB per minute, and the average song lasts about 4 minutes, then a petabyte of songs would last over 2,000 years playing continuously.

• If the average smartphone camera photo is 3MB in size and the average printed photo is 8.5 inches wide, then the assembled petabyte of photos placed side by side would be over 48,000 miles long - almost long enough to wrap around the equator twice.

• 1 petabyte is enough to store the DNA of the entire population of the US – and then clone them, twice.

Putting a Petabyte in perspective

Wes Biggs, chief technology officer at Adfonic
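The first two comparisons can be checked with a little arithmetic. The sketch below is illustrative only: the function names, the 365.25-day year and the approximate equator length are assumptions, not part of the talk. It runs the numbers under both the decimal definition (10^15 bytes, 10^6 bytes per MB) and the binary one (2^50 bytes, 2^20 bytes per MB). The quoted "over 2,000 years" and "over 48,000 miles" line up with the binary units; the decimal units give roughly 1,900 years and 44,700 miles.

```python
# Back-of-the-envelope check of the petabyte comparisons quoted above.
# All constants below are assumptions made for this sketch.
MINUTES_PER_YEAR = 60 * 24 * 365.25
INCHES_PER_MILE = 63_360
EQUATOR_MILES = 24_901  # approximate equatorial circumference of the Earth


def years_of_music(petabyte_bytes, megabyte_bytes, mb_per_minute=1):
    """Years of continuous playback if audio takes mb_per_minute of storage."""
    minutes = petabyte_bytes / (mb_per_minute * megabyte_bytes)
    return minutes / MINUTES_PER_YEAR


def miles_of_photos(petabyte_bytes, megabyte_bytes, mb_per_photo=3, inches_per_photo=8.5):
    """Length of a petabyte of photos laid side by side, in miles."""
    photos = petabyte_bytes / (mb_per_photo * megabyte_bytes)
    return photos * inches_per_photo / INCHES_PER_MILE


UNITS = {
    "decimal (10^15 B per PB, 10^6 B per MB)": (10**15, 10**6),
    "binary (2^50 B per PB, 2^20 B per MB)": (2**50, 2**20),
}

for label, (pb, mb) in UNITS.items():
    years = years_of_music(pb, mb)
    miles = miles_of_photos(pb, mb)
    print(f"{label}: {years:,.0f} years of music, {miles:,.0f} miles of photos")
```

Either way the conclusion of the slide holds: a petabyte is far beyond what a single desktop workflow can scan casually.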

Page 9: High Performance Analytics and the Challenges of Big Data

Big data on social media

73% of online adults use a social networking site of some kind

684 million daily active users on Facebook

500 million tweets per day in 2013

Page 10: High Performance Analytics and the Challenges of Big Data

The New

LinkedIn, Twitter, Instagram, Tumblr, Google+, Vine, ooVoo, Ask.fm, Yik Yak, WhatsApp, Whisper ... YIKES!

The Old

Print Media, Television, Radio

And it was only a 1-way monologue

1.2 Variety – And this is a real life experience

Page 11: High Performance Analytics and the Challenges of Big Data

VELOCITY

1.3 Velocity. Big data is coming at high velocity. Are you ready?

Page 12: High Performance Analytics and the Challenges of Big Data

THE ANALYTICS LIFECYCLE

• IDENTIFY / FORMULATE PROBLEM
• DATA EXPLORATION
• TRANSFORM & SELECT
• BUILD MODEL
• VALIDATE MODEL
• DEPLOY MODEL
• EVALUATE / MONITOR RESULTS

Data is the number one challenge in the adoption or use of business analytics.

Companies continue to struggle with data accuracy, consistency, and even access.

Bloomberg BusinessWeek Survey 2011

DATA PREPARATION

• Consumes up to 80% of the project

• Specific to the data and the analysis

2.1 Problem #1 Data Prep time part of problem

Page 13: High Performance Analytics and the Challenges of Big Data

A single electronic medical record (EMR) system from one cancer center showed lab results for Albumin, a protein measured in cancer patients, in over 30 ways.
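To make the data-prep cost concrete, here is a minimal sketch of the kind of cleanup this implies: mapping many spellings of one lab test onto a single canonical label. The variant spellings, the `ALBUMIN` label and the function names are all hypothetical, invented for illustration; the slide does not list the actual 30+ forms.

```python
import re

# Hypothetical variants, invented for illustration only.
CANONICAL_LAB_NAMES = {
    "albumin": "ALBUMIN",
    "alb": "ALBUMIN",
    "albumin serum": "ALBUMIN",
    "serum albumin": "ALBUMIN",
    "albumin lvl": "ALBUMIN",
}


def normalize_key(raw_name):
    """Lower-case, turn punctuation into spaces, and trim the result."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw_name.lower())
    return cleaned.strip()


def canonical_lab_name(raw_name):
    """Return the canonical label, or None so unmapped names can be reviewed by hand."""
    return CANONICAL_LAB_NAMES.get(normalize_key(raw_name))


records = ["Albumin", "ALB", "Albumin, Serum", "serum-albumin", "Albumin Lvl.", "Bilirubin"]
mapped = {name: canonical_lab_name(name) for name in records}
unmapped = [name for name, label in mapped.items() if label is None]

print(mapped)
print("needs review:", unmapped)
```

Returning `None` for unknown names, rather than guessing, keeps a person in the loop for the spellings the lookup table has not seen yet, which is where much of the preparation time goes.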

Page 14: High Performance Analytics and the Challenges of Big Data

2.2 Problem #2 – Shortage of Talent

Page 15: High Performance Analytics and the Challenges of Big Data

2.2 Problem #2 – Shortage of Talent

Who is a data scientist?

Page 16: High Performance Analytics and the Challenges of Big Data

2.3 Problem #3 – Our working ways don’t help

Page 17: High Performance Analytics and the Challenges of Big Data

2.3 Problem #3 – Our working ways

Page 18: High Performance Analytics and the Challenges of Big Data

1. HPA is the ability to rapidly perform complex analysis on big data, enabling you to solve problems that you thought were unsolvable. In code, look for HP on the front of a proc name.

2. HPA Server lifts data into memory. When it sees an HP PROC, it splits the work across worker nodes, so that sorting and summarizing the data are done in parallel.

3. SAS VA (Visual Analytics) provides a drag-and-drop web interface that lets you quickly explore huge amounts of data.

4. Hadoop: think of it as an infinitely expandable filing cabinet that can also help you summarize what is stored in it.

5. SAS LASR Server is part of HPAS (High Performance Analytics Server). Its role is to push data into memory.

3.1 Some definitions
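The split-the-work idea in definition 2 can be sketched in a few lines. This is only an illustration of the general pattern (partition the data, let each worker sort and summarize its share, then combine the partial results); it is not how SAS implements it. Threads stand in for worker nodes, and every function name here is hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, n_workers):
    """Deal rows out round-robin so each worker gets a similar share."""
    return [rows[i::n_workers] for i in range(n_workers)]


def summarize_partition(partition):
    """Work done by one worker: sort locally and compute partial statistics."""
    ordered = sorted(partition)
    return {"sorted": ordered, "count": len(ordered), "total": sum(ordered)}


def parallel_summary(rows, n_workers=4):
    """Fan the work out to the workers, then combine their partial results."""
    partitions = split_into_partitions(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(summarize_partition, partitions))

    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    # Each partition is already sorted, so a k-way merge gives the global order.
    merged = list(heapq.merge(*(p["sorted"] for p in partials)))
    return {"count": count, "mean": total / count if count else None, "sorted": merged}


result = parallel_summary([5, 3, 8, 1, 9, 2], n_workers=3)
print(result)
```

The point of the pattern is that counts and totals combine by simple addition and sorted partitions combine by merging, so no single worker ever has to hold or scan the whole data set.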

Page 19: High Performance Analytics and the Challenges of Big Data

• Data Mining Models

  • Which products are customers likely to buy?

  • Which workers are likely to quit/resign/be fired?

• Text Models

  • What are people saying about my products and services? Can I detect emerging issues from customer feedback or service claims?

• Forecasting Models

  • How many products will be sold this year, next year?

  • How does this break down into each product over the next 3 months, 6 months?

• Operations Research

  • What is the optimal inventory and stock to be held of each of the products to minimize out of stock and overall holding costs?

  • What is the least cost route for transporting goods from warehouses to final destinations? (PRESCRIPTIVE)

3.2 What questions should we be asking?

Page 20: High Performance Analytics and the Challenges of Big Data

Range penetration: salary level compared to peers

3.2 What can data mining models tell us?
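The slide gives only a short gloss for range penetration. A minimal sketch follows, assuming the usual compensation definition (how far a salary sits between the minimum and maximum of its pay range); that formula, the function name and the sample figures are assumptions, not taken from the talk.

```python
def range_penetration(salary, range_min, range_max):
    """Fraction of the pay range covered: 0.0 at the minimum, 1.0 at the maximum."""
    if range_max <= range_min:
        raise ValueError("range_max must be greater than range_min")
    return (salary - range_min) / (range_max - range_min)


# Hypothetical employee: paid 60,000 in a 50,000 to 90,000 range.
penetration = range_penetration(60_000, 50_000, 90_000)
print(f"range penetration: {penetration:.0%}")
```

A value like this could then serve as one input among many to a model of which workers are likely to leave.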

Page 21: High Performance Analytics and the Challenges of Big Data

TELCO - Take customer satisfaction at a telco. If wait time is important, then I might take action to put my best customers at the head of the line. I can influence customer satisfaction by understanding the underlying factors and then taking action to influence purchasing behaviour.

HEALTH - The next cure for cancer lies in big data. If we had a way to track, monitor, store and retrieve data on cancer patients’ way of life, we would be able to draw inferences that could lead us to a cure.

The value of harvesting big data in different industries

Page 22: High Performance Analytics and the Challenges of Big Data

Example: HPA in unemployment statistics

Saskatchewan - 5%

Alberta - 4.5%

Ontario - 7.9%

Looks like labour doesn't move easily.

Page 23: High Performance Analytics and the Challenges of Big Data

HPA value: another example

More labour economics, this time about your work: the data scientist.

EMC Survey: 65% of respondents expect demand for data scientists to outstrip availability over the next five years.

Page 24: High Performance Analytics and the Challenges of Big Data

3.3 How can HPA Help?

Page 25: High Performance Analytics and the Challenges of Big Data

3.3 How can HPA Help?

Page 26: High Performance Analytics and the Challenges of Big Data

3.3 How can HPA Help?

Page 27: High Performance Analytics and the Challenges of Big Data

3.3 How can HPA Help?

Page 28: High Performance Analytics and the Challenges of Big Data

3.3 How can HPA Help?

Page 29: High Performance Analytics and the Challenges of Big Data

3.3 How can HPA Help?

Page 30: High Performance Analytics and the Challenges of Big Data

Key Takeaways of working with big data using HPA

• Work with the entire data set, no longer just a sample

• Leverage real-time data access

Page 31: High Performance Analytics and the Challenges of Big Data

Copyright © 2013, SAS Institute Inc. All rights reserved. sas.com

Thanks for attending

QUESTIONS???

Charu Shankar, SAS Institute Inc.

BLOG http://blogs.sas.com/content/sastraining/author/charushankar/

LINKEDIN http://ca.linkedin.com/in/charushankar

TWITTER https://twitter.com/CharuSAS

EMAIL [email protected]