Big Data in The Cloud: Architecting a Better Platform

Posted on 05-Aug-2015

TRANSCRIPT

<p>AWS Government, Education, and Nonprofit Symposium | Washington, DC | June 25-26, 2015

Welcome! Big Data in The Cloud: Architecting a Better Platform. Brian Kinlaw, Principal Solution Architect, CSC.

Today's Presenters: Brian Kinlaw, Principal Solution Architect, CSC Emerging Business Group. He leads the initiation, development, and execution of Big Data, Analytics, Social Media, Mobile, Cloud, Cyber Security, and Internet of Things (IoT) solutions for the Office of the CTO.

Agenda: I. CSC BDPaaS Overview; II. CSC Approach; III. BDPaaS Architecture; IV. BDPaaS Security; V. Questions & Answers.

Rapidly Evolving Analytics Landscape.
Big Data 1.0 (EDW/BI). Key characteristics: relatively small, structured data sets; proprietary RDBMS; internally sourced, small teams; reactive reporting mechanisms. Representative technologies: IBM DB2, Oracle DB, IBM Cognos, SAP Business Objects, Oracle BI, Informatica. Potential business ROI: low to medium. Customer skills/talent: the bulk of talent today.
Big Data 2.0 (Analytic Appliances). Key characteristics: introduction of unstructured data sources; new in-memory analytic capabilities; data scientists emerge; ad-hoc reporting becoming pervasive. Representative technologies: IBM Netezza, HP Vertica, Oracle Exadata & Exalytics, Teradata, Pivotal Greenplum. Potential business ROI: medium. Customer skills/talent: talent investments required.
Big Data 3.0 (Open Source / Next Gen). Key characteristics: seamless blend of traditional analytics and big data; heavily open sourced; reporting becomes predictive and influences business process change. Representative technologies: Cloudera Hadoop, Hortonworks Hadoop, Spark, Storm, Kafka, Tableau, Pentaho. Potential business ROI: very high. Customer skills/talent: high-demand talent.
The market is here today, yet challenges remain: determining value (32%); security & compliance (30%); skills & capabilities (65%).

CSC Big Data & Analytics: We Are Uniquely Positioned to Add Value.
Differentiation (our unique strengths): technology expertise, working with Hadoop since its creation; faster time to value, delivering a big data platform in 30 days; enterprise security, covering data, application, and platform security and compliance.
Engagement phases: Shape, Transform, Management as a Service. Five core offerings: Analytics aaS, Big Data Analytic Insights, Big Data Strategy, Big Data Platform Innovation, and Big Data Platform aaS. Offering areas: Strategy, Analytics, Platforms, Industry Accelerators.
Industry accelerators (revenue enhancers and profit enhancers): Product Innovation: optimize product mix and feature set to improve revenue by 25-30%. Customer Intelligence: identify innovative new revenue channels, with up to a 2x revenue increase. Smart Operations: improve operating margins by ~60% through efficiency and quality improvements. Risk Insights: reduce fraudulent activity by up to 75%, avoiding millions in cost and exposure.
Achieving Real Business Value With Our Clients.
Client value achieved:
Prioritized roadmap of initiatives to achieve the growth vision within 2-3 years: business-unit growth from $200M to $1B through analytic insights.
331% ROI; payback period of 2.1 months; a 2% yield improvement equal to $300M.
Time to onboard customers reduced by 80%; improved visibility on service levels; increased customer satisfaction.
BSL met its strategic objective (ITaaS); costs reduced by 20%; analytic cycle time improved by 50%.
Access to information in minutes versus weeks; solution deployed within days; access to key next-gen talent.
Speed to market: 30 days to platform, 60 days to a fully working mobile telematics application; flexible deployment options.
Results by industry:
Healthcare: integrated data for ~100M people from 40 member companies.
Wholesale: maximized a diamond company's profitability through BI and analytics.
Transportation: railway punctuality improved from 92% to a world-leading 96%.
Government: reduced tax evasion and litigation through data warehousing and predictive modeling.
Insurance: 16% increase in claims fraud investigations, for significant ROI in 6 months.
Retail/CPG: performance optimization and analytical insights into POS and sales trends.
Printing: $10M reduction in annual operating expenses.
Travel & Leisure: customer intelligence lifetime value model driving marketing and customer service.
Natural Resources: use of sensor data for real-time management of mining and manufacturing operations and maintenance.
Banking: comprehensive global view of exposure in near real time.
Global Insurance Company.
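
The transportation result can also be read in terms of the services that were not punctual. Assuming punctuality is the share of services that ran on time (the slide does not define the measure), the 4-percentage-point gain halves the late share:

$$
\begin{aligned}
\text{late share before} &= 1 - 0.92 = 0.08,\\
\text{late share after} &= 1 - 0.96 = 0.04,\\
\text{relative reduction} &= \frac{0.08 - 0.04}{0.08} = 0.5 .
\end{aligned}
$$
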
Risk to the Traditional Data Model: The Status Quo (risk: result).
Structuring all data at the point of ingestion (schema on write vs. schema on read): significant upfront expense (… and $$) for planning; significant expense (… and $$) to adapt to the changing needs of the business.
Data silos: disparate information streams; reduced ability to obtain requirements from the entire business; holistic decisions cannot be made; no golden source of truth.
Proprietary/custom data warehousing and infrastructure: expensive; nonstandard to the environment.
Scale: not economically feasible; not technically possible.

Risk of Transforming to a Big Data Business (risk: result).
Numerous different technologies: hard to select the best tool without specific experience with these technologies.
Lack of big data-specific expertise: unreasonable expectations without having done it before; R&D in big data is lost or done only as time permits; scope creep is common; learning as you go.
Immature big data technologies: compliance risk; security risk; complex deployments; complex integrations between technologies; high operational costs.
Large CapEx expenditure: buying growth upfront; more complex to scale.
Big data and analytic systems should be a tool that gives companies better information and insights, not a roadblock.

Operational Big Data Risks.
1. Implementation: complexity; integration; speed.
2. Operation: robust and scalable; monitoring and automated alerts.
3. Data Science: business relevance; feedback loop.
4. Talent: the right talent at the right time.
5. Infrastructure: upfront CapEx investment; iterative flexibility; matching hardware to software.
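
The schema-on-write versus schema-on-read contrast from the traditional-data-model slide is easiest to see side by side. The sketch below is a minimal illustration in plain Python under stated assumptions, not CSC's implementation: the schema, the sample events, and the function names `ingest_schema_on_write` and `read_with_schema` are hypothetical. With schema on write, every record is validated and cast at load time, non-conforming records are rejected, and undeclared fields are lost; with schema on read, the raw data is stored as-is and a schema is applied only when a query runs.

```python
import json
from typing import Any

# Hypothetical schema for illustration: column name -> type used to cast the raw value.
SCHEMA: dict[str, type] = {"sensor_id": str, "reading": float}

# Illustrative raw events as they might land in storage (made-up data).
RAW_EVENTS = [
    '{"sensor_id": "A1", "reading": "3.5"}',
    '{"sensor_id": "B2", "reading": "4.0", "site": "north"}',
    '{"sensor_id": "C3"}',
]


def ingest_schema_on_write(
    raw_lines: list[str], schema: dict[str, type]
) -> tuple[list[dict[str, Any]], list[str]]:
    """Schema on write: parse, validate, and cast every record at load time."""
    table: list[dict[str, Any]] = []
    rejected: list[str] = []
    for line in raw_lines:
        record = json.loads(line)
        try:
            # Only declared columns survive; extra fields such as "site" are discarded.
            table.append({name: cast(record[name]) for name, cast in schema.items()})
        except (KeyError, ValueError):
            # A missing or uncastable field rejects the whole record.
            rejected.append(line)
    return table, rejected


def read_with_schema(
    raw_lines: list[str], schema: dict[str, type]
) -> list[dict[str, Any]]:
    """Schema on read: the raw lines stay as stored; a schema is applied per query."""
    rows: list[dict[str, Any]] = []
    for line in raw_lines:
        record = json.loads(line)
        # Fields absent from a record come back as None instead of failing the load.
        rows.append(
            {name: cast(record[name]) if name in record else None for name, cast in schema.items()}
        )
    return rows


if __name__ == "__main__":
    loaded, rejected = ingest_schema_on_write(RAW_EVENTS, SCHEMA)
    print(loaded)
    print(rejected)
    # A later question ("which site?") needs no reload: just read with a new schema.
    print(read_with_schema(RAW_EVENTS, {"sensor_id": str, "site": str}))
```

This mirrors the slide's point: adapting a schema-on-write store to a new business question means changing the schema and reloading, whereas in this sketch the `site` field discarded at ingestion is still available to a read-time schema because the raw events were kept.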
A New Mitigation Strategy: Big Data Platform-as-a-Service.
Operation: managed to your SLA needs; global delivery teams and support; integrated testing.
Implementation: DevOps infrastructure-as-code deployment; pre-defined orchestration scripts; flexible deployment locations.
Talent: data engineers; solution architects; ETL expertise; support team; R&D team; BI/visualization/reporting expertise.
Data Science: subject matter expertise as needed; global data science team; applying analysis at the right point.
Infrastructure: as-a-service model; pay-as-you-go structure; pre-configured hardware designs.

The Analytics Journey: Process Improvement via Applied Intelligence. The stages are arranged by complexity and business value, from low impact to high impact:
Descriptive Analytics I (What happened?): reporting with query, reporting, and search tools.
Diagnostic Analytics (Why did it happen?): analysis with OLAP and visualization tools.
Descriptive Analytics II (What's happening now?): monitoring with dashboards and scorecards.
Predictive Analytics (What might happen?): predictive analysis on big data.
Prescriptive Analytics (How can we make it happen?): recommendations, risk avoidance, and operations triggers.
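
To make the step from the descriptive stage ("What happened?") to the predictive stage ("What might happen?") concrete, here is a hedged sketch on a made-up monthly series; the numbers are illustrative, not from the presentation, and the function names are hypothetical. The descriptive step summarizes history; the predictive step fits an ordinary least-squares linear trend and extrapolates it one period ahead. A real predictive model would be chosen per use case; a straight-line trend is only the simplest instance.

```python
from statistics import mean

# Illustrative monthly values (hypothetical data, not from the presentation).
MONTHLY_SALES = [100.0, 104.0, 108.0, 112.0]


def describe(series: list[float]) -> dict[str, float]:
    """Descriptive analytics: summarize what happened."""
    return {"total": sum(series), "mean": mean(series), "last": series[-1]}


def linear_trend(series: list[float]) -> tuple[float, float]:
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return y_mean - slope * t_mean, slope


def predict_next(series: list[float]) -> float:
    """Predictive analytics: extrapolate the fitted trend one period ahead."""
    intercept, slope = linear_trend(series)
    return intercept + slope * len(series)


if __name__ == "__main__":
    print(describe(MONTHLY_SALES))
    print(predict_next(MONTHLY_SALES))
```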
Customer Engagement Framework (Shape, Transform, Manage).
Major activities by solution stage:
Discovery: interview key business stakeholders; interview key technical stakeholders; define objectives and challenges; define the target use case; identify data sources; define business benefits.
Technical Design: define the architecture; develop a high-level approach and costs; agree to the project plan/rollout.
Platform Rollout: stand up/connect the environment; design data flows; validate the architecture.
Iterative App Development: build data flows for historical data and real-time data; iterate.
Engagement activities: identify data sources for the target use case; develop a high-level technical approach and costs; define high-level benefits; develop an initial case for action; develop a go-forward plan; develop the data model; technical architecture and integration design; stand up the environment; dashboard design workshops; data mapping; build dashboards; configure the application; data load; run solution iterations; analytical modeling.
2-4 hour Design Thinking Workshop: review current-state metrics; review business pain points and opportunities; review the application and infrastructure environment; define the target use case.

InsightLab: Rapid Analytics Development. An agile, scientific approach to measurable business improvement, run as an 8-12 week sprint.
Inputs: business discovery; use case prioritization and roadmap; data inventory identification and coordination.
InsightLab: data exploration and transformation; data modeling and algorithm development; data visualization and reporting.
Outputs: insight operationalization; change management.
How to Build a Business Outcome for Anything.
1. Define the desired insights by stage. Business insights: descriptive (1.0), diagnostic (2.0), predictive (3.0), prescriptive (4.0).
2. Define the data needed and its format for analysis. Ingestion/munging: discovery, integration, normalization, dimensionality reduction, feature extraction, transformation and enrichment, data fusion.
3. Define the types of analysis. Data science: decision trees, regression analysis, classification, clustering, anomaly detection, natural language processing (NLP), correlation analysis.
4. Define consumption and interaction. Visualization: time series, spatial charts, mapping, histograms, graphs, line charts, scatter plots, decision trees, data exploration.
5. Define the right tools for the task at hand. Tools and technologies: R / Python / Java / JavaScript; Tableau / Pentaho / Qlik; Cognos / BOBJ / OBIEE; SAS / SPSS / MATLAB / RapidMiner; relational, columnar, and graph databases; Hadoop; in-memory/streaming.

Case Study: HGST.
Challenge: decrease warranty inquiry response times; increase operational efficiency; enable the business to extract new insights.
Solution: conducted a 5-week big data strategy assessment; established a cloud-based big data platform; built the apps and analytics to capitalize on the data.
Results: over 10,000 queries/day; 30+ data connections; 1,000+ TB of data; responses that took 2-3 months are now produced with a single query; improved customer satisfaction; reduced churn; reduced support costs; new product management capabilities and fixes; better supply chain coordination; increased security; new data and analytics products; increased cross-sales and up-sales; increased renewals; better license compliance.
HGST, a Western Digital company, develops innovative, advanced hard disk drives, enterprise-class solid state drives, and external storage solutions and services.
CSC improved customer support and product quality.

Case Study: Network Rail. Network Rail manages most of the rail infrastructure across Great Britain and is responsible for the control and maintenance of over 2,500 railway stations, 20,000 miles of track, and 40,000 bridges and tunnels. CSC provides a data and analytics hub for massive amounts of imagery and analog track monitoring data.
Challenge: Network Rail needed a platform that could not only store but also analyze petabytes of data over the long term: track imagery and video data captured via drones and cameras; vibration data captured via maintenance trains; and other forms of large-file analog data crossed with operational, structured data sets. Network Rail wanted to implement the solution quickly and ramp up data volumes at a fast pace, with the goal of leveraging combined services to assist with loading data, managing the underlying infrastructure, and working with and analyzing the data.
Solution: CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to import massive amounts of bulk data on an ongoing basis. The core platform (BDPaaS) leverages the Hortonworks Data Platform, including Hive with Tez. CSC's platform integrated with ESRI ArcGIS for big data geolocation analysis features, including geotagging and geo tiles. CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support and consultation services to the client.
Results: Network Rail is generating insights on how to prioritize, in near real time, the improvement and maintenance of its massive railway track and infrastructure footprint; advanced analytics of analog data, including geolocation capabilities; the ability to handle the scale required by the massive amount of data under management and its growth; and a complete transformation of a business unit's analytics capability, on track for success in less than 12 months.
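
The Network Rail solution mentions geotagging and geo tiles. The presentation does not say which tiling scheme the ArcGIS integration uses; as an assumption, this sketch uses the widely used Web Mercator "XYZ" (slippy-map) scheme to show how a geotagged observation, such as a track image or vibration reading, could be bucketed into a tile for aggregation. `latlon_to_tile`, `tile_key`, and the sample coordinates are hypothetical and are not part of any ESRI API.

```python
import math
from collections import Counter

# Latitude limit of the square Web Mercator projection used by XYZ tiles.
MAX_LAT = 85.05112878


def latlon_to_tile(lat: float, lon: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) XYZ tile indices that contain a WGS84 point at a zoom level."""
    n = 2 ** zoom  # the world is an n x n grid of tiles at this zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Points on the east edge (lon = 180) or at the clamped latitude limit can land
    # one tile outside the grid, so clamp both indices into [0, n - 1].
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_key(lat: float, lon: float, zoom: int) -> str:
    """Hypothetical 'zoom/x/y' key used to group geotagged records by tile."""
    x, y = latlon_to_tile(lat, lon, zoom)
    return f"{zoom}/{x}/{y}"


if __name__ == "__main__":
    # Illustrative (lat, lon) points for geotagged readings; not Network Rail data.
    readings = [(51.50, -0.12), (51.51, -0.13), (53.48, -2.24)]
    print(Counter(tile_key(lat, lon, 8) for lat, lon in readings))
```

Keying records by tile turns a spatial question ("where is vibration concentrated?") into a plain group-by, which is the kind of aggregation a Hive-style batch platform handles well; an ArcGIS deployment may use its own tiling scheme instead.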
Architecture diagram components: image files; analog data; AWS S3 object storage; YARN; HDFS; Hive; Hue; Hadoop-ArcGIS connector; ESRI ArcGIS; geo info; PostgreSQL/PostGIS; ArcGIS Geocortex.

Case Study: Food & Hospitality Retailer. This food and hospitality retailer has a footprint of over 650 regional hotels, 2,800 coffee shops, and a number of restaurant chains. CSC provides the infrastructure, data platform, and analytics that uncover revenue opportunities in customer web interactions. The client wanted to quickly evaluate the use of big data and the value it brings in identifying new business opportunities. Ease of use was a key need, to make insights and reporting more accessible to analysts and to increase the speed of their analysis. Time to market was a key factor in the decision to implement a comprehensive big data platform. The client realized that a bare platform would not be easy to manage, that their staff did not possess the skills to operate a bare platform, and that they needed to focus on the big data applications, rather than...</p>