ten tools for ten big data areas 01 informatica
Informatica Overview
Ten Tools for Ten Big Data Areas
Series 01 Big Data Integration
www.sparkera.ca
Ten Tools for Ten Big Data Areas – Overview
© Sparkera. Confidential. All Rights Reserved
10 Tools, 10 Areas
Areas: Data Warehouse, Data Platform, Data Bus, Programming, Data Analytics, Search and Index, Visualization, Data Integration, Streaming, Database
Tool taglines:
• First ETL fully on Yarn
• Data storing platform, data computing platform
• SQL & Metadata
• Visualize with just a few clicks
• Powerful as Java, simple as Python
• Real-time streaming made easier
• Yours Google
• Lightning-fast cluster computing
• Real-time distributed data store
• High throughput distributed messaging
Agenda
1 About data integration
2 About Informatica company and its approach
3 Informatica architecture, client, server components, developer tool overview
4 Informatica why and why not
5 Informatica job trend
A Little About DI – Data Integration
• DI involves combining data that resides in different sources and providing users with a unified view of that data.
• The DI process is also called Enterprise Information Integration (EII).
• DI usually means ETL: extract, transform, load.
• 80% of the effort in enterprise data projects is spent on DI work.
• Data cleansing, auditing, and master data management are usually considered together with DI.
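To make the extract, transform, load steps above concrete, here is a minimal sketch in plain Python. It is not Informatica code: the CSV content, the table names (orders, customer_view), and every function name are assumptions made up for illustration. It combines two sources (a CSV export and a database table) into one unified view and cleanses the email field along the way.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM application.
CRM_CSV = """customer_id,name,email
1,Alice Wong, ALICE@EXAMPLE.COM
2,Bob Lee,bob@example.com
"""


def make_billing_db():
    """Hypothetical source 2: an in-memory billing database with an orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (1, 5.5), (2, 12.0)]
    )
    return conn


def extract_crm(text):
    """Extract: read customer rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_orders(conn):
    """Extract: read the order total per customer from the billing source."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
    return {customer_id: total for customer_id, total in rows}


def transform(customers, totals):
    """Transform: cleanse names and emails, then join both sources on customer_id."""
    unified = []
    for row in customers:
        customer_id = int(row["customer_id"])
        unified.append({
            "customer_id": customer_id,
            "name": row["name"].strip(),
            "email": row["email"].strip().lower(),
            "total_spent": totals.get(customer_id, 0.0),
        })
    return unified


def load(conn, rows):
    """Load: write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE customer_view ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT, total_spent REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_view "
        "VALUES (:customer_id, :name, :email, :total_spent)",
        rows,
    )
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load and return the unified view."""
    billing = make_billing_db()
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract_crm(CRM_CSV), extract_orders(billing)))
    return warehouse.execute(
        "SELECT customer_id, name, email, total_spent "
        "FROM customer_view ORDER BY customer_id"
    ).fetchall()
```

Calling run_pipeline() returns one row per customer with fields drawn from both sources. A DI tool replaces this kind of hand-written glue with graphical mappings, scheduling, and monitoring.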
About Informatica Company
• Founded in 1993
• 2014 revenue – US$1.05 billion
• Average growth rate – 17% per year
• Employees – 5500+
• Customers – 5000
• Key customers cover up to 70% of the global top 500 companies
• Partners – 500+
• Covers various businesses, industries, and government organizations, including telecommunications, health care, financial, and insurance services
• A company dedicated to data integration and management
• Bought out and taken private in August 2015
The Traditional Approach
Data sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured
87% of enterprises use hand-coding for data integration
75% of enterprises reported increased maintenance costs
Integration projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
The Informatica Approach
Data sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured
Integration projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
Informatica Latest Products v9.6
• Data Integration: PowerCenter, PowerExchange
• Master Data Management
• Cloud Integration
• Big Data: BDE – Informatica Developer, Big data parser
Informatica PowerCenter Overview
• An ETL (Extract, Transform, and Load) tool
• Its main advantages over other ETL tools are robustness, cross-OS support, and high performance.
• It can read from a variety of sources and write to as many targets, transforming data in between.
• The architecture uses SOA concepts for better extensibility and high availability.
• Single sign-on access, built-in version control, GUI development, built-in scheduling and monitoring
Informatica PowerCenter Architecture
Informatica PowerCenter Client Component
• Repository Manager – metadata management
• Designer – tool to build mappings for ETL logic
• Workflow Manager – tool to build and run sessions and workflows
• Workflow Monitor – tool to monitor running jobs
• Administration Console (browser based) – administration
Repository Manager
Navigate through multiple folders and repositories, export and import objects, and manage users and folders.
Designer
Create and debug mappings and mapplets, including sources, targets, and transformations, for the core ETL logic.
Workflow Manager
Create, schedule, and run sessions, workflows, and worklets that wrap mappings.
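The relationship between these objects can be pictured with a small Python sketch. This is only a conceptual model, not the PowerCenter API: the classes Mapping, Session, and Workflow and all object names below are hypothetical stand-ins. The idea shown is that a mapping holds the ETL logic (source, transformations, target), a session is a runnable instance of one mapping, and a workflow runs sessions in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Mapping:
    """Hypothetical stand-in for a mapping: source -> transformations -> target."""
    name: str
    source: Callable[[], List[Row]]
    transformations: List[Callable[[List[Row]], List[Row]]]
    target: Callable[[List[Row]], None]

    def run(self) -> int:
        rows = self.source()
        for transformation in self.transformations:
            rows = transformation(rows)
        self.target(rows)
        return len(rows)  # number of rows written to the target


@dataclass
class Session:
    """Hypothetical session: one runnable instance of a mapping."""
    name: str
    mapping: Mapping

    def run(self) -> int:
        return self.mapping.run()


@dataclass
class Workflow:
    """Hypothetical workflow: runs its sessions one after another."""
    name: str
    sessions: List[Session] = field(default_factory=list)

    def run(self) -> Dict[str, int]:
        return {session.name: session.run() for session in self.sessions}


def filter_active(rows: List[Row]) -> List[Row]:
    """Transformation: keep only active customers."""
    return [row for row in rows if row["active"]]


def add_full_name(rows: List[Row]) -> List[Row]:
    """Transformation: derive a full_name column."""
    return [{**row, "full_name": f'{row["first"]} {row["last"]}'} for row in rows]


def build_demo_workflow(target_rows: List[Row]) -> Workflow:
    """Wire an in-memory source and target into a one-session workflow."""
    def source() -> List[Row]:
        return [
            {"first": "Ada", "last": "Lovelace", "active": True},
            {"first": "Old", "last": "Account", "active": False},
        ]

    mapping = Mapping(
        "m_load_customers", source, [filter_active, add_full_name], target_rows.extend
    )
    return Workflow("wf_daily_load", [Session("s_load_customers", mapping)])
```

Running build_demo_workflow(loaded).run() with an empty list loaded fills it with the transformed rows and returns the row count per session. In the real tool, the same wiring is done graphically in Designer and Workflow Manager rather than in code.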
Workflow Monitor
Monitor running statistics and control execution of workflows.
Administration Console
Monitor and manage the various Informatica services, licenses, etc.
Informatica PowerCenter Server Components
• Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata in the repository database tables.
• Integration service: The Integration service runs sessions and workflows.
• Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter workflows as services.
• Informatica service: Overall service management and coordination
Informatica Big Data Edition Overview
Extract, load, and transform with the big data ecosystem.
Informatica BDE Component - Developer
BDE is an all-in-one tool and can push job execution fully onto Hadoop.
Developer component
• Mapping – tool to build mappings for ETL logic
• Mapplet – reusable mapping
• Workflow – tool to build workflows
• Application – tool to deploy mappings and workflows
Others
• Monitoring Console (browser based) – job monitoring
• Administration Console (browser based) – administration
Why Informatica Products
• Proven technology leadership
• A track record of continuous innovation
• The most neutral trusted partner – very focused
• Long history of customer success
• More than 5000 industry leaders rely on Informatica
• Major banks and telecom, insurance, energy, health, and research companies in Toronto are using Informatica
• Easy to use and popular
• Pushes jobs to Hadoop
• Connectors for many kinds of sources
• Performance and reliability
Side Effects - When It May Not Fit
• High price: 150K+ to start
• Challenged by ELT, which leverages the database for transformation. Informatica needs investment in an ETL server, and its pushdown-to-database optimization has limitations.
• Scheduling, monitoring, and version control functions are limited
• BDE is relatively new, although the concept is great
• Alternatives - MS SSIS, Talend Studio, Pentaho Data Integration
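The ETL versus ELT point can be illustrated with a short Python sketch using the standard sqlite3 module. It is an illustration under assumptions, not Informatica behavior: the tables sales and sales_by_region and all function names are made up. The ETL-style function pulls rows out of the database and aggregates them in the client process (standing in for an ETL server), while the ELT-style function lets the database do the same transformation in a single SQL statement.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical sales table and an empty target."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 7.5), ("east", 2.5)],
    )
    conn.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    return conn


def etl_style(conn):
    """ETL: move rows to the client, aggregate there, then load the result back."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    conn.executemany(
        "INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items())
    )


def elt_style(conn):
    """ELT: the database performs the aggregation; no rows leave the database."""
    conn.execute(
        "INSERT INTO sales_by_region "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )


def read_totals(conn):
    """Read the target table back for comparison."""
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
```

Both functions produce the same target table. The difference is where the work happens, which is why an ELT approach can avoid the cost of a separate transformation server when the database is powerful enough.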
Informatica Job Trends
Positions by level
• Junior Level (20%): ETL developer, Informatica dev., DW developer
• Middle Level (40%): Sr. ETL developer, Data Specialist, ETL specialist, ETL designer, ETL Admin
• Expert Level (40%): Big data ETL dev., BDE developer, Informatica architect, Informatica consultant
Tool usage percentage
• PowerCenter: 80%
• Informatica Developer: 10%
• Other: 10%
www.sparkera.ca
BIG DATA is not only about data, but the understanding of the data and how people use data actively to improve their life.