Data lake benefits


Uploaded by: ricky-barron

Posted on 28-Jul-2015


Page 1: Data lake benefits

Strategic Advisory | Big Data – Cloud – Analytics

InfoStrategy

Fishing in the big data lake

DATA EXPLORATION AND DISCOVERY ANALYTICS FOR DEEPER BUSINESS INSIGHTS

Page 2

What is a “data lake”?

data lake (plural data lakes): A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing "big data". Unlike data marts, which are optimized for data analysis by storing only some attributes and dropping data below the level of aggregation, a data lake is designed to retain all attributes, especially so when you do not yet know what the scope of data or its use will be.

http://en.wiktionary.org/wiki/data_lake

… "Enterprise Data Hub" sounds too boring!
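To make the contrast in this definition concrete, here is a minimal Python sketch. The sales records, field names and figures are hypothetical. The data-mart path keeps only the attributes one report needs and aggregates away row-level detail; the data-lake path stores every record with all of its attributes, so a question nobody anticipated can still be answered later.

```python
from collections import defaultdict

# Hypothetical raw sales records; field names and values are illustrative only.
raw_records = [
    {"region": "QLD", "product": "A", "amount": 120.0, "channel": "web", "customer_id": 1},
    {"region": "QLD", "product": "B", "amount": 80.0, "channel": "store", "customer_id": 2},
    {"region": "NSW", "product": "A", "amount": 200.0, "channel": "web", "customer_id": 3},
]


def load_data_mart(records):
    """Keep only 'region' and 'amount', aggregated to one total per region.

    Product, channel and customer_id are dropped, and detail below the
    region level can no longer be recovered from the result.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return dict(totals)


def load_data_lake(records):
    """Retain every record with all of its attributes (copied, unchanged)."""
    return [dict(rec) for rec in records]


mart = load_data_mart(raw_records)
lake = load_data_lake(raw_records)
print(mart)  # {'QLD': 200.0, 'NSW': 200.0}

# An unanticipated question (sales by channel) can be answered from the lake,
# but not from the mart, because the mart never kept the channel attribute.
by_channel = defaultdict(float)
for rec in lake:
    by_channel[rec["channel"]] += rec["amount"]
print(dict(by_channel))  # {'web': 320.0, 'store': 80.0}
```

The trade-off is storage and processing cost in exchange for not having to decide the scope of analysis up front.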

Page 3

Optimise business through insights

[Diagram axes: Information Value vs. Data Volumes]

◦ Hindsight: historical reporting and proof of operation; regulatory, statutory and financial reporting
◦ Real-time: operational reporting and SCADA control; alerts and events
◦ Foresight: forecasting, planning and trending; statistical analysis
◦ Insight: explore datasets, discover correlations and patterns; undiscovered facts
◦ Action: trusted information; act on insights gained; execute theories
◦ Optimise: move a metric; change a product; change behaviour/process
◦ Measure outcomes: sentiment; feedback

Uncover previously unknown facts from enriched data in the data lake.

Page 4

Future state of analytics

Strategic Intent

To improve BI and analytical capabilities to a level where organisations are able to access and analyse information in a secure, timely and cost-effective manner.

Gain key insights to optimise the operations of your business, and predict the best possible outcomes for growth, new opportunities and competitive advantage across all business lines.

Mission Statement

“Providing advanced analytics capability across all business units, empowering our people with the processes and supporting technologies to exploit our information assets for business benefit.”

The Target Operating Model will deliver:

◦ Rapid access to data to uncover new facts via advanced data exploration and discovery analytics.
◦ Clarity on who is responsible and accountable for maintaining critical information assets via a well-structured governance and engagement model.
◦ A trusted and highly secure source of data for all analytical information requirements via a data quality assurance program.

Trawling for value in the big data lake

Page 5

‘Fish stocks’ are replenished from existing and future operational systems plus external sources.

Data sources:
◦ Core transactional data (“operational”)
◦ Unstructured and external data (“contextual”)
◦ Supplier and industry data (“comparative”)

Production Data Repository (“Data Lake”), supported by Information Governance and Data Management

Discovery Analytics Platform: Data Extraction, Data Collection, Data Preparation, Analysis, Visualisation

Users: Data Scientists, Business Analysts, Business Users, Customers

Outputs:
◦ Management Reporting: Enterprise Dashboards, Reporting, Consolidation
◦ Operational Reporting: Operational Dashboards, Real-time Reports, Alerts & Exceptions, Embedded BI
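The slide describes the lake being replenished from three kinds of source (operational, contextual and comparative). A minimal Python sketch of that landing step might tag every incoming record with its source system, category and ingestion time while storing the payload as received, so lineage survives into later preparation and analysis. The class, field and source-system names are hypothetical, and an in-memory list stands in for real lake storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Source categories taken from the slide; everything else here is hypothetical.
CATEGORIES = {"operational", "contextual", "comparative"}


@dataclass(frozen=True)
class LandedRecord:
    source_system: str       # e.g. "ERP" (hypothetical)
    category: str            # one of CATEGORIES
    payload: dict[str, Any]  # raw record, stored as received
    ingested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class DataLake:
    """Minimal in-memory landing area replenished from many sources."""

    def __init__(self) -> None:
        self.records: list[LandedRecord] = []

    def collect(self, source_system: str, category: str, rows: list[dict[str, Any]]) -> int:
        """Land rows unchanged, tagged with lineage; return how many were landed."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        for row in rows:
            self.records.append(LandedRecord(source_system, category, dict(row)))
        return len(rows)

    def by_category(self, category: str) -> list[LandedRecord]:
        """Select landed records of one category, e.g. for data preparation."""
        return [r for r in self.records if r.category == category]


lake = DataLake()
lake.collect("ERP", "operational", [{"order_id": 1, "amount": 50.0}])
lake.collect("Web feed", "contextual", [{"text": "great service"}])
lake.collect("Industry benchmark", "comparative", [{"metric": "churn", "value": 0.12}])
print(len(lake.records))  # 3
print([r.source_system for r in lake.by_category("operational")])  # ['ERP']
```

Keeping the raw payload separate from the lineage tags means downstream steps can reshape the data without losing track of where it came from.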

Page 6

[Diagram tiers: Consolidated Management Reporting, Operational, Supporting Capability, Discovery Analytics]

To meet the demand for rapid access to information, users must adopt a flexible, multi-platform architecture.

What reporting does for established operations … discovery analytics does for new business development.

The trend within industry is to move away from single-platform, monolithic data warehouses towards a physically distributed environment for information delivery. Many businesses are extending their data warehouse environments to include new standalone data platforms that are conducive to discovery analytics. A holistic view is maintained via a common, single replicated dataset (the data lake) and an enterprise information management program governing the delivery of, and access to, key information.

[Diagram: multi-platform analytics architecture]

Source Applications: ERP, CRM, HR, Finance, Telemetry, Geospatial (GIS), Documents, Email, Files; External Data

Data capture and loading: Real-time Data Capture, Data Replication (Historical), Cleansing, Loading

Data Warehouse: Modelling, Dimensional Modelling, Relational DW, Data Marts, Analysis Cubes

Reporting: Production Reporting, Operational Reporting, OLAP Analytics, Ad Hoc Query

Exploration & Discovery: Metadata Integration, Collection and Blending, Data Preparation, Detailed Datasets, Event Processing, Results, Insights, Storytelling, Productionise Insights

Analytics Delivery: Cloud-based Service Model, Actuarial Applications, Event-Based Applications

Delivery channels: Portal, PDF, Desktop, Guided Visualisation, Mobile BI, Active Dashboards

Information Governance

Page 7

Principles: easier access to information to discover new facts about the business.

◦ Described as a ‘sandpit’ environment, providing the ability to explore and discover new facts about the business, its members and customers, partners and competitive pressures.

◦ Also used for testing a hypothesis or running scenarios across the data.

◦ Getting answers to ‘one-off’ questions which are not addressed through the normal published, scheduled operational reporting channels.

◦ Data is replicated from all operational systems into a single landing area, ensuring traceability and reconciliation to all consuming applications, such as the data warehouse, analytical applications and other business applications.

◦ Clearly defined critical business entities/records are synchronised (or mastered) across all applications, eliminating duplication and confusion. Data quality attributes are defined and managed for each critical business entity.

◦ A fully integrated member/customer view is established across both analytical and transactional applications.

◦ The replicated data is used to build more dynamic analytical data structures for scheduled production reporting and ad hoc analysis.

◦ Users are provided with the tools to access and analyse data, freely explore current and new datasets, and visualise patterns and discoveries to gain deep insights.
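Two of the principles above, reconciling replicated data back to its sources and mastering critical entities such as customers, can be illustrated with a small Python sketch. Everything here is hypothetical: the rows, the per-source counts, matching customers on a normalised email address, and the survivorship rule (the most recently updated non-empty phone number wins). Real master data management tools use richer matching and survivorship logic.

```python
from collections import Counter

# Hypothetical customer rows replicated into the landing area from two systems.
landing_area = [
    {"source": "CRM", "customer_id": "C1", "email": "Ann@Example.com", "phone": None, "updated": "2015-06-01"},
    {"source": "Billing", "customer_id": "B7", "email": "ann@example.com", "phone": "0700000000", "updated": "2015-07-01"},
    {"source": "CRM", "customer_id": "C2", "email": "bob@example.com", "phone": "0400000000", "updated": "2015-05-20"},
]

# Row counts each source system reported at extract time (hypothetical).
source_counts = {"CRM": 2, "Billing": 1}


def reconcile(rows, expected):
    """Return {source: (expected, landed)} for every source whose counts differ."""
    landed = Counter(r["source"] for r in rows)
    return {src: (n, landed.get(src, 0)) for src, n in expected.items() if landed.get(src, 0) != n}


def master_customers(rows):
    """Build one 'golden' record per customer, matched on normalised email.

    Rows are processed oldest first, so the most recently updated non-empty
    phone number overwrites earlier ones (an assumed survivorship rule).
    """
    golden = {}
    for row in sorted(rows, key=lambda r: r["updated"]):
        key = row["email"].strip().lower()
        rec = golden.setdefault(key, {"email": key, "phone": None, "source_ids": []})
        rec["source_ids"].append(f'{row["source"]}:{row["customer_id"]}')
        if row["phone"]:
            rec["phone"] = row["phone"]
    return golden


print(reconcile(landing_area, source_counts))  # {}
masters = master_customers(landing_area)
print(len(masters))  # 2
print(masters["ann@example.com"]["source_ids"])  # ['CRM:C1', 'Billing:B7']
```

An empty reconciliation result means every source's rows arrived; the `source_ids` list keeps each golden record traceable to the operational records it was built from.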

Providing business users with direct access to data to meet immediate information needs, where the accuracy of the data is not the primary objective.

Having a single source of truth across all business applications at a detailed level, from which all information requests are satisfied.

An improved environment for more cost-effective and faster business intelligence delivery.

Provide business users with the ability to access production information directly, collect it as needed and prepare the data for analysis; to explore the data to uncover previously unknown facts about the business; and to share those facts visually with others. Enrich production data with external “context” to extend insights.

Key Principles | Description

Page 8

Benefits of Discovery Analytics versus traditional data warehousing

Classic Data Warehouse Issues | Discovery Analytics Benefits

◦ Issue: Lengthy IT backlog and lack of resources to extend the EDW to support new business requirements.
  Benefit: Data can be explored and analysed outside the EDW environment before it is put into production use.

◦ Issue: High costs of supporting increasing data volumes and new types of data.
  Benefit: Data can be filtered and transformed before it is loaded into the EDW.

◦ Issue: Lack of flexibility in the EDW data model to support constantly changing business requirements.
  Benefit: Data discovery supports a dynamic schema-on-read approach, which reduces the need for detailed up-front modelling.

◦ Issue: Need to have data quality and governance processes in place before users can access the EDW data.
  Benefit: The investigative nature of data discovery has lower data quality and governance requirements.

◦ Issue: Growing use of personal data marts to overcome IT barriers and the performance overheads of ad hoc processing.
  Benefit: The flexibility and performance of data discovery encourage shared use of data and analytics.
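Two of the benefits above, schema on read and filtering/transforming data before it reaches the EDW, fit together in one small Python sketch. The meter-reading events (loosely modelled on the telemetry sources mentioned earlier), the schemas and the kWh-to-MWh rule are all hypothetical. Raw JSON lines are stored with no schema enforced on write; each consumer applies its own schema at read time, and only the explored, filtered result would be loaded into the warehouse.

```python
import json

# Raw events land in the lake as JSON lines; no schema is enforced on write.
# The event shapes are hypothetical.
RAW_LINES = [
    '{"meter": "M1", "kwh": "12.5", "ts": "2015-07-01T10:00:00"}',
    '{"meter": "M2", "kwh": "7", "ts": "2015-07-01T10:00:00", "fault": true}',
    '{"meter": "M3", "ts": "2015-07-01T10:00:00"}',
]

# Schema applied only when the data is read: field name -> conversion function.
READ_SCHEMA = {"meter": str, "kwh": float, "ts": str}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the requested schema.

    Fields missing from a record come back as None; extra fields are ignored.
    """
    for line in lines:
        raw = json.loads(line)
        yield {name: (conv(raw[name]) if name in raw else None) for name, conv in schema.items()}


def prepare_for_edw(rows):
    """Filter and transform explored rows before loading them into the EDW:
    drop rows without a reading and convert kWh to MWh (illustrative rules)."""
    return [
        {"meter": r["meter"], "mwh": r["kwh"] / 1000, "ts": r["ts"]}
        for r in rows
        if r["kwh"] is not None
    ]


explored = list(read_with_schema(RAW_LINES, READ_SCHEMA))
edw_rows = prepare_for_edw(explored)
print(len(explored), len(edw_rows))  # 3 2

# A different consumer reads the same raw data with a different schema,
# with no remodelling of what was stored.
faults = list(read_with_schema(RAW_LINES, {"meter": str, "fault": bool}))
print(faults[1])  # {'meter': 'M2', 'fault': True}
```

The design choice is that the stored data never changes shape; only the read-time projection does, which is what reduces the need for up-front modelling.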

A recent proof of concept for Discovery Analytics in the cloud (AWS) has delivered considerable cost and time savings in infrastructure and hosting, viz.:

◦ $55 per day to host a 960 GB data warehouse.
◦ $32 per day to host a Data Integration server AND a BI server.
◦ 2.5 weeks to set up the POC environment and start analysing and visualising results.
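The daily figures quoted above can be rolled up into monthly and annual run rates with simple arithmetic. This Python sketch assumes the environment runs every day, a 30-day month and a 365-day year, and it ignores anything the slide does not mention (data transfer, backups, support effort, price changes since 2015).

```python
# Daily hosting costs quoted for the AWS proof of concept (USD).
DW_PER_DAY = 55.0          # 960 GB data warehouse
DI_AND_BI_PER_DAY = 32.0   # Data Integration server + BI server
DW_SIZE_GB = 960

daily_total = DW_PER_DAY + DI_AND_BI_PER_DAY
# Assumption: the environment runs every day; 30-day month, 365-day year.
monthly_total = daily_total * 30
annual_total = daily_total * 365
dw_cost_per_gb_day = DW_PER_DAY / DW_SIZE_GB

print(f"Daily:   ${daily_total:,.2f}")    # Daily:   $87.00
print(f"Monthly: ${monthly_total:,.2f}")  # Monthly: $2,610.00
print(f"Annual:  ${annual_total:,.2f}")   # Annual:  $31,755.00
print(f"DW cost per GB per day: ${dw_cost_per_gb_day:.4f}")  # DW cost per GB per day: $0.0573
```

If the POC environment were only run on working days, the same calculation with fewer days per period would lower these totals proportionally.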

Page 9

Discovery Analytics Target POC Architecture

Inputs: structured and unstructured data from ERP, Telemetry and Web/External sources

Corporate data is replicated and enriched with external data and content in a central, scalable repository, ready for exploration, discovery and predictive analysis to gain deep insights and actionable results.
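Enriching replicated corporate data with external context amounts, in its simplest form, to a left join on shared keys. In this minimal Python sketch the telemetry rows, the weather feed, the site names and the choice of (site, date) as the join key are all hypothetical. Rows without matching context are kept as-is, so enrichment never discards corporate data.

```python
# Hypothetical corporate telemetry, keyed by site and date.
corporate = [
    {"site": "Site A", "date": "2015-07-01", "pump_hours": 18},
    {"site": "Site A", "date": "2015-07-02", "pump_hours": 22},
    {"site": "Site B", "date": "2015-07-01", "pump_hours": 15},
]

# Hypothetical external context, e.g. a public weather feed, keyed the same way.
weather = {
    ("Site A", "2015-07-01"): {"max_temp_c": 21.0},
    ("Site A", "2015-07-02"): {"max_temp_c": 26.5},
}


def enrich(rows, context, key_fields=("site", "date")):
    """Left-join each corporate row with external context on the key fields.

    Rows with no matching context keep all their own fields and simply lack
    the context attributes. If a context attribute shares a name with a row
    field, the context value wins.
    """
    enriched = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        enriched.append({**row, **context.get(key, {})})
    return enriched


for r in enrich(corporate, weather):
    print(r)
```

A left join (rather than an inner join) is the safer default here: the corporate record remains the system of record, and external context is additive.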

Page 10

Fishing safely with the appropriate life vests is important too.

Security and data management standards are available:

◦ International Standard on Assurance Engagements (ISAE)
◦ Service Organisation Control (SOC) framework
◦ Federal Information Security Management Act (FISMA)
◦ Payment Card Industry Data Security Standard (PCI DSS)
◦ Federal Information Processing Standard (FIPS)
◦ International Organization for Standardization (ISO) information security standard

Source: Amazon Web Services

Page 11

To learn more about how InfoStrategy can help you develop your big data strategy to solve your big business problems, or to arrange a Proof of Concept, please contact us today using the details below.

InfoStrategy Pty Ltd
246 Oxford St, Balmoral
Queensland 4171
Australia

Tel: +61 7 3151 2021
Email: [email protected]