DRAFT
Positioning Data Discovery for Greater Impact
October 2014
Agenda
Department of Public Welfare Data Analytics Landscape
Positioning Endeca
Enablement Highlights and Outcomes
Future Roadmap
Questions & Discussion
Landscape
Department of Public Welfare
EDW - Landscape
Technology: Cognos 10.2, Informatica 9, Oracle 11G
• Office Income Maintenance (OIM)
• Pennsylvania Insurance Department (PID)
EDW
DW
• Pennsylvania Department of Education (PDE)
DW
• Office of Medical Assistance Program (OMAP)
• Office of Children, Youth and Families (OCYF)
• Office of Children, Youth and Families (OCYF)
• Office of Child Development & Early Learning (OCDEL)
• Office of Developmental Programs (ODP)
• Office of Long Term Living (OLTL)
• Office of Medical Assistance Program (OMAP)
• Office of Mental Health and Substance Abuse Services (OMHSAS)
Technology: Cognos 7, Decision Stream, Oracle 10G
Technology: Cognos, Informatica, Oracle 10G
PIMS Bridge
(OCDEL-PDE)
Enterprise Incident Management
• ODP
• OLTL
Investment in Information Management
Stage 2:
Stage 3:
Stage 4:
Stage 1:
What might happen?
Static Reporting
Business Intelligence
Analytics
Advanced Analytics
Data Gathering
Strategic impact
What is available?
Pre-defined Reporting:
• Prompt reports
• Scheduled reporting
Ad hoc capabilities:
• Self service reports
• OLAP cubes
Monitoring KPIs:
• Dashboards
• Scorecards
Predictive Analytics:
• Incident prediction
• Financial forecasting
• Service effectiveness
• Fraud detection and prevention
Mobile Analytics:
• Alerts
• On-the-go Metrics
Why is it happening?
What is happening?
Business Analytics Capabilities
Positioning Data Discovery
Key Drivers for Data Discovery
Challenges and Details
Data Tsunami & Unpredictability
• Critical data is being collected at an unprecedented scale from varied sources, driving up analytics complexity
• Data volumes and integration efforts are roadblocks to insights
Value Proposition
• The value of data explodes when it is linked with other data for correlations
• Collecting detailed recipient service measures makes it possible to quantify program impact
Greater Community Impact
• The goal is to provide a better quality of life to each person in a shortened timeframe
• Help propagate the design of high-impact programs for current and future recipients
• Drive actions for prevention of child abuse
• 360-degree view of clients, families and providers to supplement the department's mission
Positioning Endeca
Oracle Endeca
Consulting
• In-memory architecture and innovative caching deliver extreme performance
• Powerful text analytics extracts key themes and sentiments
• Support for sentiment analysis in 10 languages, localization in 13, and search and self-service term extraction in 33+ enable truly global analytics
• Sophisticated data integration and ETL streamline access to enterprise sources, including Oracle Business Intelligence
• Agile, data-driven approach requires no up-front modeling, for fast time to value
Deep Text Analysis
Enterprise Data Discovery
In Memory Analytics
Robust Data Integration
Oracle Endeca
Self Service Discovery
• Easily create, configure, and securely share discovery applications within the context of enterprise governance and security
• Upload information from a wide array of self service sources including Excel, JSON, and any data source accessible via JDBC
• State-of-the-art search and guided navigation surface insights with a click
• Live data enrichment allows users to enhance analytics in the moment
Endeca is a complete solution for agile data discovery across the enterprise, empowering business user independence in balance with IT governance. Endeca offers fast, intuitive access to both traditional analytic data and non-traditional data, including external and unstructured information.
Fragmented Source Compilation
Endeca allows for rapid assimilation of data from multiple sources to provide an executive view of the data across multiple data stores
Web Sentiment Analysis
Endeca can set up web crawls to gather data and provide online sentiment analysis, which could potentially lead to drawing correlations with enterprise data
Self-Service Enablement
Endeca allows Program Office Business Analysts to upload diverse data for snapshot analysis with minimal dependence on IT for basic data setup and support
Transactional/Stage Data Discovery
Endeca allows applications to be created directly on source and stage data, which helps Program Office Business Analysts slice and dice information to uncover previously unrealized questions that complement enterprise reporting requirements
Unstructured/Semi-Structured Data Analysis
Endeca allows for the ingestion of unstructured and semi-structured data and provides analytics capabilities to uncover hidden trends and details
Value Proposition & Applications
Perceived Benefits
1. Fragmented Source Compilation
• Combining EDW, OCYF, and CY48 data allowed program offices to drill into the causes of a heightened number of days for investigation and expose potential reasons for bottlenecks
• OIM compilation of demographics, census, and CQCCOM service information helped draw a holistic view of the recipients
2. Advanced Analytics
• Sentiment analysis of structured and unstructured data, including whitelist tagging and text extraction, alongside spreadsheet consumption and visualization
• Built-in mapping and advanced visualization engines like tag clouds and capabilities for negative refinements
3. Data Validations
• Provisioning access to view data captured by OCYF enabled a window into potential future enterprise reporting needs
• Access to previously unavailable SAMS, eCIS, and HCSIS transaction data
4. Delivery Cycles
• Typical delivery cycle for an Endeca project is 8-12 weeks with a 16-20 week update cycle based on end-user feedback for required enhancements
• 2-4 week cycles for applications built using self-service for a quick window into the data
Enablement
DPW Enablement
Objectives
The objectives targeted with the initial 25-user enablement:
Enterprise-wide Adoption
• Uncover the potential landscape for the application of Endeca within the department
• Determine use and adoption of Endeca and the concept of data discovery across program offices
Concept Positioning
• Build the utilization of the complete set of Endeca’s standard capabilities
• Blend its use within the existing Business Intelligence/Data Analytics Landscape
Solution Scalability
• Determine factors to be considered during deployment within the Enterprise for a significant user-base
• Document governance for People, Process, and Technology considerations encompassing rollouts
FY 2013 – Dec
FY 2014 – Jan
FY 2014 – Feb
FY 2014 – Mar
FY 2014 – Apr
FY 2014 – May
FY 2014 – Jun
FY 2014 – Jul
FY 2014 – Aug
FY 2014 – Sept
FY 2014 – Oct
Wave 1 (Initiation): Basic install, capability demonstration and self-service enablement
Wave 2: Configurations, assessments and initial attempts to build end-user content for program offices
Wave 3: Gain targeted adoption and consensus for an enterprise rollout
Implementation Timeline
Wave 1 Lessons Learned
Phase 1 Phase 2
Executive Touch Points
Timelines & Targets
The development of the self service applications for the program areas resulted in common themes across the program offices.
Findings – Data/Application Rendition
• Allows for rendering previously unavailable data for mining and analysis
• Provides access to unstructured and fragmented data
• Allows for the ability to include traditional and non-traditional sources
• Gaps and limitations that warrant governance through maintenance cycles
Benefits
• Exposed fraudulent activity to drive cost savings
• Exposed issues with data quality and corresponding business analysis implications
• Showed previously unknown information and sentiments captured within comments
• Shortened build cycles of 2-4 weeks for demos/POCs
• Accelerated end user delivery of feedback and enhancements
• Ability to decide if POC should be developed into ongoing report
• 8-12 week production application delivery alongside total 16-20 week window for incorporating end-user driven enhancements
Key Findings and Benefits
Advanced visualizations like geo-spatial maps allowed for a simplified user-experience in uncovering insights
Advanced Visualization & Data Mashup
Data Mashups allowed for merging and drawing comparisons across internal & external data sources
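As a rough illustration of the mashup idea, the sketch below joins an internal record set with an external one on a shared key and derives a comparison metric. This is plain Python rather than anything Endeca-specific, and the function name `mashup`, the field names, and all figures are hypothetical placeholders, not DPW data.

```python
def mashup(internal_rows, external_rows, key):
    """Left-join internal rows with external rows on a shared key (hypothetical helper)."""
    # Index the external source by the join key for constant-time lookup.
    external_by_key = {row[key]: row for row in external_rows}
    merged = []
    for row in internal_rows:
        # Internal values win if both sources carry the same field name.
        external = external_by_key.get(row[key], {})
        merged.append({**external, **row})
    return merged


# Hypothetical internal data: service recipients per county.
internal = [
    {"county": "Adams", "recipients": 120},
    {"county": "Berks", "recipients": 450},
]
# Hypothetical external data: population per county.
external = [
    {"county": "Adams", "population": 1000},
    {"county": "Berks", "population": 3000},
]

combined = mashup(internal, external, key="county")
for row in combined:
    # Comparison across sources: recipients per 1,000 residents.
    row["recipients_per_1000"] = 1000 * row["recipients"] / row["population"]
    print(row["county"], row["recipients_per_1000"])
```

The sketch assumes every internal county also appears in the external source; a real mashup would need to decide how to treat unmatched keys.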
Negative Refinement
Review of SNAP transactions for the month. It appears most transactions occur within our state. What happens if we remove PA and border states?
Negative Refinement (cont.)
Information appears that we may not have known: we see transactions occurring outside of PA and bordering states. This is an opportunity for further evaluation and discovery on that information.
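The negative refinement described here can be sketched outside of Endeca as a simple exclusion filter. The example below is a minimal illustration in plain Python, not Endeca's own interface; the function name `negative_refinement` and the sample transactions are hypothetical and do not represent actual SNAP data.

```python
from collections import Counter

# Pennsylvania plus its six bordering states.
PA_AND_BORDER_STATES = {"PA", "NY", "NJ", "DE", "MD", "WV", "OH"}


def negative_refinement(transactions, excluded_states):
    """Keep only the transactions whose state is NOT in excluded_states."""
    return [t for t in transactions if t["state"] not in excluded_states]


# Hypothetical sample data for illustration only.
transactions = [
    {"id": 1, "state": "PA"},
    {"id": 2, "state": "PA"},
    {"id": 3, "state": "NJ"},
    {"id": 4, "state": "FL"},
    {"id": 5, "state": "OH"},
    {"id": 6, "state": "CA"},
    {"id": 7, "state": "FL"},
]

remaining = negative_refinement(transactions, PA_AND_BORDER_STATES)
# Summarize what is left once the dominant, expected states are removed.
remaining_by_state = Counter(t["state"] for t in remaining)
print(remaining_by_state)
```

Removing the expected majority first leaves a much smaller set, which is where unexpected out-of-state activity becomes visible.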
Capabilities for Tag Cloud highlights and Summarizations drive Advanced Analytics
Advanced Analytics
Ability to house vast amounts of data within domains enabled “big data” mining and exploration
Ability to quickly perform a ‘negative’ refinement: remove the dominant values to see what remains and potentially discover new unknowns.
Big Data Mining
The ability to house roughly 100 million records within a single domain made it possible to mine otherwise unusable data, resulting in fraud prevention and summarized reporting
While the current applications were created by IT, there is an ability to transition development to program office users based on the vision of the rollout.
Future Vision
End-User/Program Office Driven Self-Service: 50%
BIS/IT Driven Self-Service: 50%
Program Office/ End-User Driven Self-Service (10% Utilization)
• Technical/Super users within program office currently driven to utilize capability
• Limited time/effort availability and tool or conceptual knowledge gaps
• Challenges with utilizing self-service capabilities
Considerations to increase adoption: Endeca training (train the trainer), identifying program office FTE developers, re-using content across applications
IT Supported Self-Service (90% Utilization)
• Conducted initial conversations with program offices for insights into challenges with data availability and analysis
• Built out drafts to highlight possibilities leveraging Endeca
• Follow-up sessions with program office stakeholders to finalize application layouts and drive long-term value
• Governance for environment stability and functionality delivery
Considerations for decreasing involvement: limiting involvement to alleviating roadblocks, augmenting re-usable content (e.g. blacklists)
Current State & Future Vision
Long Term Concept Positioning
IT/BIS Supported Self-Service
Program Office/End-User Driven Self-Service
Future
Governed use of Self-Service for snapshot analysis
Automated Endeca Production Applications for Regular Use
Uncover use-cases/KPIs for Enterprise Reporting through Cognos
Collaborative Project Delivery
Future Roadmap
Future High Level roadmap
Timeline: Today, Tomorrow, Future
1. Production Pilot; 25 NUP
2. Expand Deployment to Program Areas for Self Service Apps
3. Deployed to All Program Offices; 100+ Users
4. Scale Users & Data Volumes; Expand Self Service Apps
4a. IT Provisioned Applications to Program Offices
5. Enterprise Wide Adoption
Current Configuration
Current (Test & Development) Configuration
Test/Dev Configuration
Studio Server Endeca Server
Integration Suite Server + Text Enrichment & Sentiment Analysis
User count: up to 25 users
Server Configuration: up to 4 cores; 8 GB RAM minimum, 16 GB+ recommended
Server Configuration: up to 8 cores; 64 GB RAM minimum, 128 GB+ recommended
User count: up to 25 users
Server List
Enterprise Configuration
Server List
Studio Server Endeca Server
Integrator Server
OVM
User count: up to 100 users
Exalytics Hardware Platform:
- 40 total cores
- Hard partitioning allows you to only license what you need
- 2 TB of RAM
- 2.4 TB of Flash Disk
Server Configuration: up to 8 cores; 64 GB RAM
User count: up to 100 users
Server Configuration: up to 8 cores; 128 GB RAM
Server Configuration: up to 24 cores; up to 1.856 TB RAM
Option 1
Perceived Future State
Estimated Sizing
Server List
Pros
• Improved end user experience and productivity
• Efficiently leverage the power of Exalytics by licensing 100% of the server
Cons
• No room on Exalytics for future growth
• Single points of failure at the Studio / Endeca Server tiers
Potential Outcomes
Integration Suite Server + Text Enrichment & Sentiment Analysis
Endeca Server Cluster Node 1
Studio Server
User count: up to 150 users
User count: up to 150 users
Server Configuration: 4-8 cores; 8 GB RAM minimum, 64 GB+ recommended
Configuration: up to 24 cores; up to 2 TB of RAM; 2.4 TB of Flash Disk
OVM
Endeca Server Cluster Node 2
Perceived Future State
Option 2
Estimated Sizing
Pros
• Clustered design removes single points of failure
• Enable Consistent, Stable, & Scalable Application
• Room to Grow on each server, supporting Future Growth
• Greater user adoption and experience
• High Availability for Business Continuity
Cons
• Clustered design makes CPU studio pricing for unlimited users less attractive
Enterprise Configuration
Potential Outcomes
Questions?