data lake benefits
Strategic Advisory: Big Data – Cloud – Analytics
InfoStrategy
Fishing in the big data lake
DATA EXPLORATION AND DISCOVERY ANALYTICS FOR DEEPER BUSINESS INSIGHTS
What is a “data lake”?
data lake (plural data lakes): A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing "big data". Unlike data marts, which are optimized for data analysis by storing only some attributes and dropping data below the level of aggregation, a data lake is designed to retain all attributes, especially so when you do not yet know what the scope of data or its use will be.
http://en.wiktionary.org/wiki/data_lake
… “Enterprise Data Hub” sounds too boring!
Optimise business through insights
[Diagram: information value rises with data volume, moving from hindsight through real time to foresight]
◦ Hindsight: historical reporting, proof of operation; regulatory, statutory, financial
◦ Real time: operational reporting, SCADA control; alerts & events
◦ Foresight: forecasting, planning & trending; statistical analysis
◦ Insight: explore datasets, discover correlations and patterns; undiscovered facts
◦ Action: trusted information; act on insights gained; execute theories
◦ Optimise: move a metric, change a product, change behaviour/process; measure outcomes (sentiment, feedback)
Uncover previously unknown facts from enriched data in the data lake.
Future state of analytics
Strategic Intent
To improve BI and analytical capabilities to a level where organisations are able to access and analyse information in a secure, timely and cost-effective manner.
Gain key insights to optimise the operations of your business, predict the best possible outcomes for growth, new opportunities, and competitive advantage across all business lines.
Mission Statement
“Providing advanced analytics capability across all business units, empowering our people with the processes and supporting technologies to exploit our information assets for business benefit.”
Target Operating Model will deliver:
Rapid access to data to uncover new facts via advanced data exploration and discovery analytics.
Clarity about who is responsible and accountable for maintaining critical information assets, via a well-structured governance and engagement model.
A trusted and highly secure source of data for all analytical information requirements via a data quality assurance program.
Trawling for value in the big data lake
‘Fish stocks’ are replenished from existing and future operational systems plus external sources
[Diagram: data sources feeding the data lake, which in turn serves discovery analytics and reporting]
◦ Data sources: Core Transactional Data (“operational”); Unstructured & External Data (“contextual”); Supplier & Industry Data (“comparative”)
◦ Production Data Repository (“Data Lake”), underpinned by Information Governance and Data Management
◦ Discovery Analytics Platform: Data Collection, Data Preparation, Analysis, Visualisation
◦ Management Reporting: Consolidation, Data Extraction, Reporting, Enterprise Dashboards
◦ Operational Reporting: Operational Dashboards, Real-time Reports, Alerts & Exceptions, Embedded BI
◦ Users: Data Scientists, Business Analysts, Business Users, Customers
[Diagram: Consolidated Management Reporting, Operational, Supporting Capability, Discovery Analytics]
To meet the demand for rapid access to information, users must adopt a flexible multi-platform architecture.
What reporting does for established operations … discovery analytics does for new business development.
The trend within industry is to move away from single-platform, monolithic data warehouses towards a physically distributed environment for information delivery. Many businesses are extending their data warehouse environments to include new standalone data platforms that are conducive to discovery analytics. A holistic view is maintained via a common, single replicated dataset (the data lake) and an enterprise information management program governing delivery of, and access to, key information.
[Diagram: target multi-platform architecture]
◦ Source applications: ERP, CRM, HR, Finance, Telemetry, Geospatial GIS, Documents, Files; plus External Data
◦ Data flow into the warehouse: Real-time Data Capture, Data Replication, Cleansing, Loading
◦ Data Warehouse: Modelling, Relational DW, Data Marts, Analysis Cubes
◦ Exploration & Discovery: Metadata Integration, Event Processing Results, Detailed Datasets Results, Collection and Blending, Insights
◦ Analytics Delivery: Cloud-based Service Model, Actuarial Applications, Event-Based Applications
◦ Reporting: Production Reporting, OLAP Analytics, Ad Hoc Query
◦ Access channels: Portal, Desktop, Guided Visualisation, Mobile BI, Active Dashboards
◦ Other elements shown: Historical, Data Preparation, Storytelling, Operational Reporting, Dimensional Modelling, Productionise Insights, Information Governance
Principles: Easier access to information to discover new facts about the business.
◦ Described as a ‘sandpit’ environment, providing the ability to explore and discover new facts about the business, its members and customers, partners and competitive pressures.
◦ Also used for testing a hypothesis or running scenarios across the data.
◦ Getting answers to ‘one-off’ questions which are not addressed through the normal published, scheduled operational reporting channels.
◦ Data is replicated from all operational systems into a single landing area, ensuring traceability and reconciliation to all consuming applications, such as the data warehouse, analytical applications and other business applications.
◦ Clearly defined critical business entities/records are synchronised (or mastered) across all applications, eliminating duplication and confusion. Data quality attributes are defined and managed for each critical business entity.
◦ A fully integrated Member/Customer view is established across both analytical and transactional applications.
◦ Using the replicated data to build more dynamic analytical data structures for scheduled production reporting and ad hoc analysis.
◦ Provide users with the tools to access and analyse data, freely explore current and new datasets, and visualise patterns and discoveries to gain deep insights.
Key principles:
◦ Providing business users with direct access to data to meet immediate information needs where the accuracy of the data is not the primary objective.
◦ Having a single source of truth across all business applications, at a detailed level, from which all information requests are satisfied.
◦ An improved environment for more cost-effective and faster business intelligence delivery.
Provide business users with the ability to access production information directly, collect it as needed, and prepare the data for analysis; explore the data to uncover previously unknown facts about the business, and share those facts visually with others. Enrich production data with external “context” to extend insights.
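The replication principle (a single landing area that stays traceable and reconcilable to every consuming application) can be checked mechanically. A minimal sketch, assuming each table can be read as a list of row tuples: compare row counts plus an order-independent content digest between the source extract and its landing-area copy. The functions `table_fingerprint` and `reconcile` and the sample rows are hypothetical illustrations, not part of any product named in this deck.

```python
import hashlib
from typing import Iterable, Sequence

# Modulus for folding row hashes together; SHA-256 digests are 256-bit integers.
MODULUS = 1 << 256


def table_fingerprint(rows: Iterable[Sequence[object]]) -> tuple[int, str]:
    """Return (row_count, order-independent digest) for a collection of rows."""
    count = 0
    acc = 0
    for row in rows:
        # Serialise each row deterministically with a unit-separator delimiter.
        encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
        digest = int.from_bytes(hashlib.sha256(encoded).digest(), "big")
        # Summing modulo 2**256 makes the result independent of row order
        # while still counting duplicate rows (an XOR fold would cancel them).
        acc = (acc + digest) % MODULUS
        count += 1
    return count, format(acc, "064x")


def reconcile(source_rows, landing_rows) -> bool:
    """True when the landing-area copy matches the source on count and content."""
    return table_fingerprint(source_rows) == table_fingerprint(landing_rows)


# Hypothetical extract from a source system and its replicated copy.
source = [(1, "Alice", 100.0), (2, "Bob", 250.5)]
landing = [(2, "Bob", 250.5), (1, "Alice", 100.0)]  # same rows, different order
print(reconcile(source, landing))     # True
print(reconcile(source, source[:1]))  # False: a row was lost in replication
```

Hashing per row keeps the check cheap enough to run after every replication cycle; a mismatch only says that the copies differ, so locating the offending rows would need a finer-grained comparison.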
Benefits of Discovery Analytics versus traditional data warehousing
Classic data warehouse issues vs discovery analytics benefits
◦ Issue: Lengthy IT backlog and lack of resources to extend the EDW to support new business requirements.
  Benefit: Data can be explored and analysed outside the EDW environment before it is put into production use.
◦ Issue: High costs of supporting increasing data volumes and new types of data.
  Benefit: Data can be filtered and transformed before it is loaded into the EDW.
◦ Issue: Lack of flexibility in the EDW data model to support constantly changing business requirements.
  Benefit: Data discovery supports a dynamic schema-on-read approach, which reduces the need for detailed up-front modelling.
◦ Issue: Data quality and governance processes must be in place before users can access the EDW data.
  Benefit: The investigative nature of data discovery has lower data quality and governance requirements.
◦ Issue: Growing use of personal data marts to overcome IT barriers and the performance overheads of ad hoc processing.
  Benefit: The flexibility and performance of data discovery encourages shared use of data and analytics.
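Schema on read means raw records land in the lake exactly as received, and structure is imposed only when someone queries them. A minimal sketch, assuming JSON-lines telemetry: two analysts apply two different schemas to the same raw data without any up-front modelling. The records, field names and the `read_with_schema` helper are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical raw telemetry records, landed exactly as received (no schema on write).
RAW_EVENTS = [
    '{"meter_id": "M-01", "reading": "12.5"}',
    '{"meter_id": "M-02", "reading": 7, "site": "North"}',
    '{"meter_id": "M-03"}',
]


def read_with_schema(
    raw: list[str], schema: dict[str, Callable[[Any], Any]]
) -> list[dict[str, Any]]:
    """Apply a schema at read time: project the requested fields and cast them.

    Fields missing from a record become None; fields not named in the schema are ignored.
    """
    rows = []
    for line in raw:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = None if value is None else cast(value)
        rows.append(row)
    return rows


# Two different "schemas" over the same raw data, defined only when needed.
readings = read_with_schema(RAW_EVENTS, {"meter_id": str, "reading": float})
sites = read_with_schema(RAW_EVENTS, {"meter_id": str, "site": str})
print(readings)
print(sites)
```

The trade-off is that type problems surface at query time (for example, a `reading` that cannot be cast to `float` raises `ValueError` when read), which matches the lower up-front data quality requirement described in the comparison above.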
A recent proof of concept for discovery analytics in the cloud (AWS) has delivered considerable cost and time savings in infrastructure and hosting, viz.:
◦ $55 per day to host a 960 GB data warehouse
◦ $32 per day to host a data integration server and a BI server
◦ 2.5 weeks to set up the POC environment and start analysing and visualising results
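The daily POC rates above can be scaled to longer periods. A small sketch, assuming both components run every day at a constant rate; it excludes storage growth, data transfer and staff effort, and the currency is as quoted on the slide. `DAILY_RATES` and `hosting_cost` are illustrative names only.

```python
# Daily hosting rates quoted for the POC; currency as quoted on the slide.
DAILY_RATES = {
    "960 GB data warehouse": 55,
    "data integration server + BI server": 32,
}


def hosting_cost(days: int) -> int:
    """Total hosting cost for a number of days, assuming the daily rates stay constant."""
    return days * sum(DAILY_RATES.values())


for label, days in [("Per day", 1), ("30-day month", 30), ("365-day year", 365)]:
    print(f"{label}: ${hosting_cost(days):,}")
```

Running it prints $87 per day, $2,610 for a 30-day month and $31,755 for a 365-day year.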
Discovery Analytics Target POC Architecture
[Diagram: structured and unstructured data from ERP, Telemetry and Web/External sources]
Corporate data is replicated and enriched with external data and content in a central, scalable repository, ready for exploration, discovery and predictive analysis to gain deep insights and actionable results.
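Enriching corporate data with external content usually amounts to a keyed join. A minimal sketch, assuming both sides share a key (here a hypothetical region and week): a left join keeps every internal record and attaches external context where it exists. The sales and weather records and the `enrich` helper are hypothetical.

```python
from typing import Any, Iterable

# Hypothetical internal (ERP) sales records and external weather context.
sales = [
    {"region": "QLD", "week": 1, "units": 120},
    {"region": "NSW", "week": 1, "units": 95},
    {"region": "QLD", "week": 2, "units": 80},
]
weather = [
    {"region": "QLD", "week": 1, "max_temp_c": 31.0},
    {"region": "QLD", "week": 2, "max_temp_c": 24.5},
]


def enrich(
    internal: Iterable[dict[str, Any]],
    external: Iterable[dict[str, Any]],
    keys: tuple[str, ...],
    fields: tuple[str, ...],
) -> list[dict[str, Any]]:
    """Left-join selected external fields onto internal records by the given keys.

    Every internal record is kept; context fields are None where no external match exists.
    If the external data repeats a key, the last matching record wins.
    """
    lookup = {tuple(r[k] for k in keys): r for r in external}
    enriched = []
    for row in internal:
        match = lookup.get(tuple(row[k] for k in keys), {})
        enriched.append({**row, **{f: match.get(f) for f in fields}})
    return enriched


for record in enrich(sales, weather, keys=("region", "week"), fields=("max_temp_c",)):
    print(record)
```

A left join is used so that gaps in external sources never remove production records from analysis.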
Fishing safely with the appropriate life vests is important too. Security and data management standards are available:
◦ International Standard on Assurance Engagements (ISAE)
◦ Service Organisation Control (SOC) framework
◦ Federal Information Security Management Act (FISMA)
◦ Payment Card Industry Data Security Standard (PCI DSS)
◦ Federal Information Processing Standard (FIPS)
◦ International Organization for Standardization (ISO) information security standard
Source: Amazon Web Services
To learn more about how InfoStrategy can help you develop your big data strategy to solve your big business problems, or to arrange a proof of concept, please contact us today using the details below.
InfoStrategy Pty Ltd, 246 Oxford St, Balmoral, Queensland 4171, Australia
Tel: +61 7 3151 2021
Email: [email protected]