oil and gas big data edition
TRANSCRIPT
Big Data and The Informatica Platform
9/8/2015
David Ramirez
Senior Solution Architect, Oil and Gas Accounts
About Informatica
• Founded: 1993
• Nasdaq: INFA
• 2014 Revenue: $1.2b
• Partners: 450+
• Major SI, ISV, OEM and On-Demand Leaders
• Customers: 5,000+
• > 70% of the Global 500
• Customers in 82 Countries
• Direct Presence in 26 Countries
• #1 in Customer Loyalty Rankings (7 Years in a Row)
B2B Data Exchange
Informatica supports the requirements of cross-organizational data exchange, so users apply familiar & trusted data integration tools and techniques to the growing practice of B2B data integration.
Cloud Data Integration
Enterprise Data Integration
Complex Event Processing
Informatica received high praise for its services from customers. For deployments involving systems monitoring use cases, Informatica offers a five-day stand-up of RulePoint.
Ultra Messaging
In spite of the new entrants, Informatica remains the market leader in this highly demanding part of the messaging market.
Data Quality
Master Data Management
Application ILM
Proven Technology Leadership
Problem:
• Analytics teams spend most of their time looking for and preparing data, not analyzing it
• Impact: project delays, cost overruns, missed opportunities
Data Lake Solution
• A single place to manage the supply and demand of data
• Converts raw big data into fit-for-purpose, trusted, and secure information
Intelligent Data Lake
Manage Supply & Demand of Data
80% of the work in big data projects is data intelligence
“I spend more than half my time integrating, cleansing, and transforming data without doing any actual analysis.”
“80% of the work in any data project is in cleaning the data”
“70% of my value is an ability to pull the data, 20% of my value is using data-science…”
Sources: (1) DJ Patil, Data Jujitsu; (2-3) Kandel, et al. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Visual Analytics Science and Technology (VAST), 2012
First Pilot(s)
Data Warehouse Optimization
Data Discovery
Real-Time Operational Intelligence
Escalating return on data
Lower operational IT costs
Big Data Analytics
Operationalize Big Data Insights
Predictive Maintenance
Lower Total Cost of Care
Customer Cross-Sell/Up-Sell
Public Safety
Fraud Detection
Machine Device, Cloud
Documents and Emails
Relational, Mainframe
Social Media, Web Logs
Driven by IT
Driven by Business
Lower Infrastructure Cost
Added Business Value
What’s Hadoop?
Intelligent Data Lake
Intelligent Data Lake
Platform for Big Data Projects
Informatica Knows the Data Lifecycle
Related Challenges
Source: Gartner
Informatica Platform
Data Ingestion
Refinement
Mastery/Delivery
Data Security
Data Retirement
• Data Quality
• Exception Management
• Any Platform, Application
• Structured, Unstructured
• Any latency
• Master Data Management
• Data Integration Hub
• Data Archive
• Records Retention/Discovery
• Data Masking
Informatica Platform Overview
RelationalDB
.pdf, email,
Dev
Test
Prod Archive
ILM-Archive
1. Profile
2. Define Targets
3. Analyze
4. Build Rules
5. Monitor
DATA QUALITY
SECURITY
ETL
PowerCenter
MDM
Multi-Domain
Informatica Data Quality
Materials, Wellhead, Customer
Databases
Unstructured Data
Big Data
Cloud
Visualizations
Application Database Partner Data
SWIFT NACHA HIPAA …
Cloud Computing Unstructured
Data Warehouse
Data Migration
Test Data Management & Archiving
Master Data Management
Data Synchronization
B2B Data Exchange
Data Consolidation
The Informatica DI Platform
Comprehensive, Unified, Open and Economical platform
Data Sources Applications
Data Warehouse
MDM / PIM
Data Ingestion
Visualization
Data Governance
Data Security
Archiving
Replication
Data Streaming
Change Data Capture
Batch Load
Data Virtualization
Event-Based Processing
Data Integration Hub
Data Integration & Data Quality
Agile Analytics
Advanced Analytics
Machine Learning
Virtual Data Machine
Data Management Data Delivery
Machine Device, Cloud
Documents and Emails
Relational, Mainframe
Social Media, Web Logs
Mobile Apps
Visualization & Analytics
Real-Time Alerts
Batch Load
Pub / Sub
Data Service
Integrate & Prepare
Loose Coupling & Abstraction
Logical Data Objects
PRODUCT …CUSTOMER ORDER
Jumpstart/Accelerate Projects
1. Instant Business-IT Collaboration with Analyst Tool
2. Profile to Discover Data Patterns and Issues
3. Prototype and Validate Results
4. Fine-tune and Deploy Desired Solution in Days
Data Sources
Business, IT
Common Repository
Entire Life Cycle Supported by PowerCenter Standard Edition 9.6
Scale-up As Your Needs Grow
High Availability
Pushdown Optimization
Enterprise Grid
Concurrent Users
Partitioned Data
Included in PowerCenter Advanced Edition 9.6
Manage Metadata for Better Data Insights
Data Lineage
Consolidated Metadata Catalog
Federated Business Glossary
Mainframe, Flat Files, Database, Data Modeling, BI Tools, ERP
Metadata Repository
Custom
Metadata Reports
3rd party BI
Metadata Bookmarks
Common Biz Language Via Business Glossary
Provide a common vocabulary of business terms
Easily search for glossary assets with workflow
Manage relationships with other assets
Manage business policies governing the assets
Analyst
Improve Operational Confidence With Automated Testing and Monitoring
End-to-End Agility
Requirements Gathering
Prototype & Validate
Deploy
Business Satisfied
Business-IT Collaboration
Develop
Self Service
Monitor
Test
Business, IT
Automate Data Validation Testing
Data Validation Testing Capability
Enterprise Data
PowerCenter
Execute Tests
DVO Repository & Warehouse
Reports
Database Views: V_Summary, V_Tests, V_Results
Example view columns: Id: name; name: string; Price: integer; Date in: date; Date out: date; Salary: float
Define Tests
DVO Clients
Write Results
Data Accessed
• Relational databases
• Flat files
• Mainframe data
• DW Appliances
• Cloud-based data
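The define-tests, execute-tests, write-results flow on this slide can be illustrated with a small sketch. This is not the DVO client or its API; it is plain Python using the standard sqlite3 module, and the table names (`wells_src`, `wells_tgt`), column names, and the `run_validation_tests` helper are hypothetical, chosen only to show what a source-versus-target validation test typically checks.

```python
import sqlite3


def run_validation_tests(conn, source, target, key, measure):
    """Run three simple source-vs-target checks; return {test_name: passed}.

    Table and column names are interpolated directly into SQL, so this
    sketch assumes they come from trusted configuration, not user input.
    """
    cur = conn.cursor()

    def scalar(sql):
        # Execute a query that returns a single value.
        return cur.execute(sql).fetchone()[0]

    results = {}
    # Test 1: both tables hold the same number of rows.
    results["row_count"] = (
        scalar(f"SELECT COUNT(*) FROM {source}")
        == scalar(f"SELECT COUNT(*) FROM {target}")
    )
    # Test 2: the measure column sums to the same total on both sides.
    results["sum_" + measure] = (
        scalar(f"SELECT COALESCE(SUM({measure}), 0) FROM {source}")
        == scalar(f"SELECT COALESCE(SUM({measure}), 0) FROM {target}")
    )
    # Test 3: every source key is present in the target.
    missing = scalar(
        f"SELECT COUNT(*) FROM {source} s "
        f"LEFT JOIN {target} t ON s.{key} = t.{key} "
        f"WHERE t.{key} IS NULL"
    )
    results["no_missing_keys"] = missing == 0
    return results


# Hypothetical demo data: the target is missing one well after a load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wells_src (well_id INTEGER PRIMARY KEY, barrels INTEGER);
    CREATE TABLE wells_tgt (well_id INTEGER PRIMARY KEY, barrels INTEGER);
    INSERT INTO wells_src VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO wells_tgt VALUES (1, 100), (2, 250);
""")
failed_load = run_validation_tests(conn, "wells_src", "wells_tgt", "well_id", "barrels")

# Load the missing row and rerun the same tests.
conn.execute("INSERT INTO wells_tgt VALUES (3, 75)")
fixed_load = run_validation_tests(conn, "wells_src", "wells_tgt", "well_id", "barrels")

print(failed_load)
print(fixed_load)
```

The point of the sketch is that tests are defined once and rerun after every load, so a dropped row shows up as a failed count, sum, and key check rather than being found later by a report consumer.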
Proactively Monitor with PowerCenter 9.6
PowerCenter WS Hub
PowerCenter Repository
Environment Information
Automated Monitoring and Detection (Source Feeds, Rules/Templates, Watchlists, Alerts)
Configure / Build Rules
Get PowerCenter Statistics
Get Operating System, Database Statistics
Monitor PowerCenter Operations
Send Alerts to Stakeholders
Analyst, IT, IT Operations
1. The entire Informatica mapping is translated to run on the optimal open source project.
2. Currently, the mapping is submitted to the Hadoop cluster as MapReduce.
3. Advanced mapping transformations are executed on Hadoop through User Defined Functions using Vibe.
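To make the idea of translating a mapping into MapReduce concrete, here is a minimal sketch of the MapReduce model itself in plain Python. It is not code generated by Informatica and does not use Hadoop; the record fields (`well`, `barrels`) and the function names are hypothetical. It shows how a simple mapping (filter rows, then aggregate by key) splits into a map phase, a shuffle, and a reduce phase.

```python
from collections import defaultdict


def map_phase(records):
    """Mapper: apply the filter and emit (key, value) pairs."""
    for rec in records:
        if rec["barrels"] > 0:  # filter transformation: drop empty readings
            yield rec["well"], rec["barrels"]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: aggregate the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input records.
readings = [
    {"well": "A", "barrels": 100},
    {"well": "B", "barrels": 0},
    {"well": "A", "barrels": 50},
    {"well": "B", "barrels": 70},
]

totals = reduce_phase(shuffle(map_phase(readings)))
print(totals)
```

On a real cluster the mapper and reducer run in parallel across nodes and the shuffle moves data over the network; the logical split between the phases is the same.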
MapReduce
UDF
Informatica on Hadoop
Informatica Execution on Hadoop Architecture
Flink
INFA’s Unified Platform = Strong Time-to-Value
“Informatica and Microsoft are so much more consistent than their competitors [because] the platforms provided by these companies support transferable skills across projects more flexibly than do their rivals.”
TCO – Informatica vs. Hand Coding
Average Costs (3-year TCO) per project per end point
Informatica: $8,500
Hand Coding: $11,500
Informatica is Far More Productive than Hand Coding
Average Time to Develop by Project Type (Weeks)
Project types charted: Master Data Management, Data Warehousing, Data Migration, Application Integration
Hand coding vs. Informatica, as charted: 2.4 vs. 1; 2.4 vs. 0.7; 5.3 vs. 1.2; 2.7 vs. 0.8
Depending on the project, hand coding can take more than 4 weeks longer to develop!
Source: “Comparative Costs and Uses for Data Integration Platforms”, Bloor Research, March 2014
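The "more than 4 weeks longer" claim can be checked against the charted figures with a few lines of arithmetic. The sketch below assumes the values pair up as (hand coding, Informatica) in the order they appear in the chart, and that $8,500 is the Informatica cost and $11,500 the hand-coding cost; the transcript does not make clear which project type each pair belongs to, so the pairs are left unlabeled.

```python
# (hand coding weeks, Informatica weeks), in chart order.
# Assumption: the pairing follows the order of values in the transcript.
weeks = [(2.4, 1.0), (2.4, 0.7), (5.3, 1.2), (2.7, 0.8)]

# Extra development time for hand coding, per project type.
extra_weeks = [round(hand - infa, 1) for hand, infa in weeks]
largest_gap = max(extra_weeks)

# 3-year TCO per project per end point.
# Assumption: $8,500 is Informatica and $11,500 is hand coding.
tco_informatica = 8_500
tco_hand_coding = 11_500
tco_saving = tco_hand_coding - tco_informatica
tco_saving_pct = round(100 * tco_saving / tco_hand_coding, 1)

print("Extra weeks for hand coding:", extra_weeks)
print("Largest gap (weeks):", largest_gap)
print("TCO difference ($):", tco_saving, "or", tco_saving_pct, "% of hand-coding cost")
```

Under these assumptions the largest gap comes from the 5.3 vs. 1.2 pair, which is the only one that exceeds four weeks.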