Open Source DWBI-A Primer

OPEN SOURCE DATA WAREHOUSE/BI: A PRIMER. Webinar session for TechGig.com. Presenter: Parthasarathi Doraisamy, Enterprise BIDI Solutions

Upload: partha69

Post on 14-Jul-2015


TRANSCRIPT

Page 1: Open Source DWBI-A Primer

OPEN SOURCE DATA WAREHOUSE/BI: A PRIMER

Webinar session for TechGig.com

Presenter: Parthasarathi Doraisamy

Enterprise BIDI Solutions

Page 2: Open Source DWBI-A Primer

CLOUD: WHAT DOES THIS MEAN?

UC Berkeley RAD Lab definition:

1. The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning;

2. The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs; and

3. The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
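The pay-per-use point in item 3 comes down to simple arithmetic. The sketch below is illustrative only: the function names and the idea of comparing against a single up-front purchase price are assumptions, and no rate shown in a call is taken from any real provider.

```python
def on_demand_cost(cpu_hours, storage_gb_days, cpu_rate_per_hour, storage_rate_per_gb_day):
    """Cost of renting processors by the hour and storage by the day."""
    return cpu_hours * cpu_rate_per_hour + storage_gb_days * storage_rate_per_gb_day


def break_even_cpu_hours(upfront_cost, cpu_rate_per_hour):
    """CPU hours after which renting costs as much as a (hypothetical) up-front purchase."""
    if cpu_rate_per_hour <= 0:
        raise ValueError("cpu_rate_per_hour must be positive")
    return upfront_cost / cpu_rate_per_hour
```

Below the break-even usage, releasing resources when they are idle keeps the on-demand bill under the up-front commitment. Above it, the comparison has to be redone with real prices.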


Page 3: Open Source DWBI-A Primer

REFERENCES/ACKNOWLEDGEMENT

Talend

Pentaho

BIRT (Eclipse)

Birst

Jaspersoft

Greenplum

ASA – ODW model

Gartner research analysis

TDWI


Page 4: Open Source DWBI-A Primer

WHAT IS OPEN DW/BI?

Beware: "open" does not mean the product(s) are free!

An open DW consists of a predesigned, prebuilt data warehouse architecture that comes free of charge.

It thereby reduces overall cost and risk by cutting design, development and implementation time.

-> Reduces the consumer's initial development cost (DQ, ETL, BI & analytics, etc.)

However, vendors charge for the related services: maintaining the DW solution, customizing it further to the exact business need, and supporting the system.

Mitigates risk through rapid development.

There are technical, social, and economic reasons that will move data warehousing, and perhaps all data models, toward "open" solutions.

Page 5: Open Source DWBI-A Primer

NEED FOR OPEN DW/BI

Open data warehouse and BI development has progressed rapidly over the past few years, driven by the economic downturn.

Dynamic business changes call for faster deployment of the proposed solution.

Nowadays an "open source" product is available for almost every aspect of the BI/data warehouse stack, including architectures, and these are picking up pace. (A few noticeable players: Talend, Pentaho, Jaspersoft, Birst, QlikView, etc.)

Page 6: Open Source DWBI-A Primer

INDUSTRY STATS ON TRADITIONAL DWBI

The average cost of these projects was $2.2 million ($3.1 million today, adjusted for inflation).

The average payback period was 2.3 years, with over 30% experiencing a 5+ year payback period.

The majority of respondents reported that their data warehouses consumed enormous resources and remained "works in progress" for extended periods of time.
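The figures above can be related with two short calculations. This is a minimal sketch, assuming the simple (undiscounted) payback definition; the slide does not say how the survey computed payback, and the function names are hypothetical.

```python
def simple_payback_years(initial_cost, annual_net_benefit):
    """Undiscounted payback period: years of benefit needed to recover the cost."""
    if annual_net_benefit <= 0:
        raise ValueError("annual_net_benefit must be positive")
    return initial_cost / annual_net_benefit


def implied_inflation_factor(original_cost, adjusted_cost):
    """Cumulative inflation multiplier implied by an original and an adjusted cost."""
    return adjusted_cost / original_cost


# The slide's own numbers: $2.2M originally, $3.1M after adjustment.
factor = implied_inflation_factor(2.2, 3.1)  # roughly 1.41
```

Under this simple definition, a $2.2 million project with a 2.3-year payback implies a net benefit of a little under $1 million per year. That figure is only indicative, because the slide reports averages rather than per-project data.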


Page 7: Open Source DWBI-A Primer

NEED FOR OPEN DW/BI ….

Popular open source databases that support these open data warehouses are MySQL (and its ecosystem of add-ons), Ingres, and EnterpriseDB.

Hardware and software costs are reduced further by extending the open solution into a hosted SaaS environment.

Page 8: Open Source DWBI-A Primer

ODW MODEL: A FRAMEWORK

The Open Data Warehouse Model (ODWM) provides a generic framework for delivering an open data warehouse.

This generic data warehouse model can be fine-tuned further for a specific industry.

Domain experts work on these industry-specific solutions much as they did on typical proprietary DW/BI solutions, but the approach differs in certain critical aspects, such as the pre-design of the open DWBI architecture (data model, ETL design, BI design) for the industry domains concerned.

Page 9: Open Source DWBI-A Primer

ODW MODEL PRINCIPLE

The open data model consists of hundreds of potential dimension tables with thousands of fields, which form the "Foundation".

These open data warehouses are carefully designed to ensure the stability of the DW system and to facilitate the use of commercial ETL bridges/connectors (yet allow for interpretation through aggregation and by other means).

OLAP cubes and data marts can be constructed from the foundation, as required by the business, through similar bridges/connectors.

These are potential opportunities for developers in their respective technology areas (i.e., ETL, BI & analytics) to come up with appropriate bridge solutions that seamlessly develop the entire ODW & BI model into a functional data mart or enterprise data warehouse.
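As a rough illustration of building a data mart from foundation tables, the sketch below uses Python's built-in sqlite3 module. The table names, columns and sample rows are all hypothetical; a real ODW foundation would have far more dimensions, and this is not the ODW model's actual schema.

```python
import sqlite3


def build_sales_mart(conn):
    """Create a tiny foundation (one dimension, one fact) and derive a mart from it."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (product_id INTEGER, sale_date TEXT, amount REAL);
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO fact_sales VALUES
            (1, '2015-01-05', 10.0),
            (1, '2015-01-06', 15.0),
            (2, '2015-01-05', 40.0);
        -- The "data mart": facts aggregated along one dimension attribute.
        CREATE TABLE mart_sales_by_category AS
            SELECT d.category AS category, SUM(f.amount) AS total_amount
            FROM fact_sales AS f
            JOIN dim_product AS d ON d.product_id = f.product_id
            GROUP BY d.category;
    """)


def sales_by_category(conn):
    """Read the mart back as a {category: total_amount} dictionary."""
    rows = conn.execute(
        "SELECT category, total_amount FROM mart_sales_by_category ORDER BY category"
    )
    return dict(rows.fetchall())
```

Calling `build_sales_mart(sqlite3.connect(":memory:"))` and then `sales_by_category` on the same connection returns the per-category totals. An ETL bridge of the kind described above would automate this derivation step for many dimensions.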


Page 10: Open Source DWBI-A Primer

ODW MODEL & ITS EXTENSIONS…..

They must allow for integration of multiple data sources of different granularity and should, in some manner, accommodate slowly changing dimensions.

Each baseline ODW database instance model can in turn yield a range of domain-specific packaged solutions (we can call each an industry "slice"). These packages may comprise DQ, ETL and BI solutions, as outlined earlier.

These package solutions comprises of

Host the domain-specific ODW solution(s) in the cloud.

These hosted open DWBI solutions lead us to packaged data warehouse/BI appliances.
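The slowly changing dimension requirement mentioned on this slide is commonly met with a "Type 2" approach, which keeps every historical version of a dimension row. The slide does not say which technique the ODW model uses, so the following is only a sketch of that common pattern, with a hypothetical customer dimension and field names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: List[CustomerVersion], customer_id: int,
               new_city: str, change_date: date) -> None:
    """Close the current version (if the value changed) and append a new one."""
    current = next(
        (v for v in history if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return  # nothing changed, keep the current version open
        current.valid_to = change_date  # expire the old version
    history.append(CustomerVersion(customer_id, new_city, change_date))
```

Facts recorded before the change still join to the old version through the validity dates, so historical reports stay consistent.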

Page 11: Open Source DWBI-A Primer

OPEN DATA WAREHOUSE/BI APPLIANCE

Page 12: Open Source DWBI-A Primer

OPEN DWBI APPLIANCES ……

The Open DWBI Appliance combines and supports thousands of data warehouses, many of them with hundreds of millions of records, in a scalable multi-tenant environment.

These appliances have the capability to generate complex data models and have complex algorithms built into their query engines.

Appliance vendors tie up with hardware suppliers to build the appliance so that it performs at maximum efficiency.

Page 13: Open Source DWBI-A Primer

OPEN DWBI APPLIANCES ……

These appliances are designed to power an on-demand software solution that needs to support a large number of users simultaneously and has the ability to quickly increase capacity.

Built on a shared-nothing architecture: no data is shared across nodes (servers).

Popular appliances are Netezza and Greenplum.
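The shared-nothing idea can be sketched as hash partitioning: each row is assigned to exactly one node by its key, every node aggregates only its own rows, and a coordinator combines the partial results. This is a conceptual illustration with hypothetical function names, not a description of how any particular appliance implements it.

```python
import zlib
from collections import defaultdict


def node_for_key(key: str, node_count: int) -> int:
    """Deterministically map a distribution key to one node."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_field, node_count):
    """Place every row on exactly one node; no row is shared between nodes."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for_key(str(row[key_field]), node_count)].append(row)
    return nodes


def parallel_sum(nodes, value_field):
    """Each node sums its own rows; the coordinator adds the partial sums."""
    partials = [sum(row[value_field] for row in rows) for rows in nodes.values()]
    return sum(partials)
```

Because the nodes never read each other's rows, adding nodes adds capacity without adding contention, which is the property the slide relies on for quickly increasing capacity.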


Page 14: Open Source DWBI-A Primer

MULTIPLE APPLIANCES FOR ENTERPRISE NEED


Page 15: Open Source DWBI-A Primer

DWBI APPLIANCES: SALIENT FEATURES

High Availability and Failover Support

Designed for operation in a high-availability clustered Open DWBI environment

Global Cache

Provides superior query performance via its massive-scale caching capabilities

Simplified software Deployment and Upgrades in Place

Dramatically simplifies its deployment by freeing IT from having to worry about resolving potentially complex OS compatibility issues, library dependencies or undesirable interactions with other applications.


Page 16: Open Source DWBI-A Primer

DWBI APPLIANCES: SALIENT FEATURES….

Advanced ETL services and a complete analytical data warehouse with automated warehouse generation

Cloud connectors, for connecting to operational cloud applications, e.g. Salesforce.com and Google Analytics

These connectors allow for automatic uploading of data into the appliance from various sources

Live Access, which allows you to analyze data from on-premise data warehouses without uploading

Page 17: Open Source DWBI-A Primer

SAAS BASED OPEN BI SOLUTION


Page 18: Open Source DWBI-A Primer

SAAS: OPEN BI SOLUTION…..

Low-cost, open source solution.

End-to-end, integrated BI and ETL capabilities.

Full enterprise-level support.

Flexibility of on-demand and on-premise deployment.

Support for mobile devices as a BI platform.

Support for an iterative IT and business-user report generation process.

Page 19: Open Source DWBI-A Primer

CLOUD: WHAT DOES THIS MEAN?

Depends upon how you slice it vertically

• IaaS: AWS, GoGrid, Mosso

• PaaS: Google App Engine, Microsoft Azure

• SaaS (BaaS): Salesforce, Talend, Jaspersoft, Pentaho, BIRT, etc.

Page 20: Open Source DWBI-A Primer

AGILE BI: FASTER, CHEAPER, BETTER….

Page 21: Open Source DWBI-A Primer

CLOUD: WHAT DOES THIS MEAN?

Page 22: Open Source DWBI-A Primer

ODW: WHEN TO USE THE CLOUD?

Transient application lifespan or use

Quick start required

Budget pressure

Variable use/scale of application unknown

IT unavailable/unresponsive

Page 23: Open Source DWBI-A Primer

SAAS: OPEN DWBI

Page 24: Open Source DWBI-A Primer

KEY FINDINGS FOR BUSINESS TRANSITION TO CLOUD TECHNOLOGY (IN 2009)

By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.

Through 2011, less than 50% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy as part of an enterprise software asset management strategy.

Through 2013, 50% of mainstream IT projects using open-source software (OSS) will not achieve cost savings over closed-source alternatives.

Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.


Page 25: Open Source DWBI-A Primer

MOVING TO CLOUD: RECOMMENDATIONS

Expect vendors to play an increasing role in the governance of many market-leading, open-source solutions during the next several years.

Move aggressively to establish an effective enterprise adoption policy, and bring OSS and hardware under asset management controls.

Do not expect to automatically save money with OSS or any technology without effective financial management. Do expect to carefully manage open-source solutions in the appropriate scenarios to realize total cost of ownership (TCO) advantages.

Manage cloud-based software strategies and open-source strategies together for maximum effect. Look for synergies between both, and the ability of OSS to move your workloads to the cloud.


Page 26: Open Source DWBI-A Primer

STRATEGIC PLANNING ASSUMPTION(S)

By 2012, at least 50% of direct commercial revenue attributed to open-source products or services will come from projects under a single vendor's patronage.

Through 2011, less than 35% of Global 2000 IT organizations will have implemented a formal open-source adoption and management policy.

Through 2013, 50% of mainstream IT projects using OSS will not achieve cost savings over closed-source alternatives.

Through 2013, 90% of market-leading, cloud-computing providers will depend on OSS to deliver products and services.


Page 27: Open Source DWBI-A Primer

CLOUD USAGE BY VARIOUS ORGANIZATIONS..


Page 28: Open Source DWBI-A Primer

OPEN SOURCE BI TOOLS

Page 29: Open Source DWBI-A Primer

TDWI RESEARCH STUDY…


Page 30: Open Source DWBI-A Primer

SAAS BI PROCESS FLOW


Page 31: Open Source DWBI-A Primer

HARDWARE ACCESS IN CLOUD OPEN DW/BI…

Secure access via web, RDC, VPN or a combination

Customized server (choose your own CPU, RAM and disk space)

Scale up your capacity anytime

Level 2 and 3 server support, including 24x7 monitoring service

Application support on demand

Integrate with your local and global IT groups

Page 32: Open Source DWBI-A Primer

SECURITY ASPECTS IN CLOUD OPEN DW/BI…

Web, RDC, VPN or a combination

Firewalls

Certified data center: SAS 70 Type II

NDA

Virus protection

Page 33: Open Source DWBI-A Primer

MDM

MDM success for enterprise open source DWBI implementation: high-quality master data is extremely valuable to enterprise business processes and analytics

Page 34: Open Source DWBI-A Primer

MDM-KEY CONSIDERATIONS

Some key considerations for creating a master reference data source are outlined below:

Central master reference data model

Mapping

Populating the master

Publish data

Access and provisioning

Ownership and process


Page 35: Open Source DWBI-A Primer

MDM CHECKLIST

MDM provides the system for obtaining the "single version of truth" across the various applications within the enterprise (despite the disparity of source systems).

The following checklist provides functional requirements for implementing and deploying MDM in an enterprise environment:

Page 36: Open Source DWBI-A Primer

MDM CHECKLIST: FUNCTIONALITY COVERED

Profiling

Modeling

Data quality

Data stewardship & governance: hierarchy management & security

Workflow administration

Page 37: Open Source DWBI-A Primer

MDM-ACTIVE DATA MODEL ….

Multi-Domain capability

Object-Oriented Data Modeling

Domain Templates

Basic Data Validations and Business Rules

Graphical Modeling Tool

Multiple Language Support


Page 38: Open Source DWBI-A Primer

MDM-DOMAIN INTEGRATION

Complete Data Integration Functionality

Automated Services-Based Integration

Real-Time and Batch Integration

SOA Manager/Console


Page 39: Open Source DWBI-A Primer

MDM-DQ INTEGRATION WITH ETL,BI

Data Profiling

Accurate Data Match and Merge

Data Bucketing and Blocking

Data Augmentation

Advanced Data Validations and Business Rules

Data Standardization

Data Cleansing
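Of the items above, "bucketing and blocking" and "match and merge" work together: blocking groups records by a cheap key so that only records in the same bucket are compared in detail. The sketch below assumes a hypothetical blocking key (first letter of the name plus postcode) and uses the standard library's difflib for fuzzy comparison. The field names, key and threshold are illustrative and not taken from any MDM product.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    """Cheap bucket key (an assumption): first letter of the name plus postcode."""
    return (record["name"].strip().lower()[:1], record["postcode"])


def find_matches(records, threshold=0.85):
    """Return id pairs whose names are similar, comparing only within a block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    matches = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= threshold:
                matches.append((a["id"], b["id"]))
    return matches
```

The trade-off is deliberate: two identical names with different postcodes land in different blocks and are never compared, which keeps the number of comparisons manageable on large master data sets at the cost of missing some duplicates.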


Page 40: Open Source DWBI-A Primer

MDM-DATA STEWARDSHIP & GOVERNANCE

Hierarchy Management – Multiple and Recursive Hierarchies

Hierarchy Import and Overlays

Business Process Management (BPM) and Workflow

Automated Data Survivorship

Manual resolution through an intuitive GUI
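Automated data survivorship decides which source value ends up in the merged "golden" record. A minimal sketch follows, assuming a "most recently updated non-empty value wins" rule. Real MDM tools offer several configurable rules, and the record layout and the `updated` field are hypothetical.

```python
from datetime import date


def survive(records, fields):
    """Build a golden record: for each field, take the newest non-empty value."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in fields:
        golden[field] = next((r[field] for r in newest_first if r.get(field)), None)
    return golden
```

Fields for which no source holds a value come back as `None`, which is where manual resolution by a data steward would take over.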


Page 41: Open Source DWBI-A Primer

MDM-ADMINISTRATION

Historical Views of Hub Data

Hub Versioning

Master Data Audit Trail Information

Roles-Based Security and Active Directory Integration

Versioning


Page 42: Open Source DWBI-A Primer

TALEND MDM SOLUTION: OS PRODUCTS

IBM Eclipse; JBoss Application Server and Portal; eXist open database;

XSD / XML Schema for the XML data models;

XSLT for data transformation;

Object programming following the EJB 2.1 standard ("Enterprise JavaBeans") on the JBoss server;

XQuery for queries on the XML database; Document/literal WS-I norm ("Web Services Interoperability") for web services;

Bonita for business process management.

Page 43: Open Source DWBI-A Primer

COST COMPARISON

E.g., total cost for a small project, comparing three approaches to data integration: open source, proprietary and manual coding

Page 44: Open Source DWBI-A Primer

SUMMARISED COST-SMALL ETL PROJECT


Page 45: Open Source DWBI-A Primer

SUMMARY COST FOR MEDIUM ETL PROJECT


Page 46: Open Source DWBI-A Primer

ODW/BI: WHY IT WILL SUCCEED IN MARKET

ODW/BI has a lot of (financial) winner groups:

Owners get low-cost, rapid entry into a data warehouse they can extend.

Developers get to create and sell new ETL/BI products in a new market (tool providers).

"Source" vendors can solve reporting problems and advance new ways to compete (source providers).

Consultants get a bigger market for their services (service providers).

Domain experts can participate by creating new open data warehouses using their deep industry knowledge (service providers).

Page 47: Open Source DWBI-A Primer

ODW/BI: WHY IT WILL SUCCEED IN MARKET

Development licenses

Training curve

Development time

Run-time licenses

Deployment of hardware and operating system licenses

IT operations

Page 48: Open Source DWBI-A Primer

ODW/BI: WHY IT WILL SUCCEED IN MARKET

Maintenance/subscription

Maintenance time

Reliability and predictability of the data integration processes
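The cost items listed on this and the previous slide can be rolled up into a total per approach. The sketch below only shows that structure. The item keys are hypothetical names for the slide's bullets, "reliability and predictability" is left out because it is not a direct cost, and no figures from the presentation's cost charts are reproduced.

```python
COST_ITEMS = (
    "development_licenses",
    "training",
    "development_time",
    "runtime_licenses",
    "hardware_and_os",
    "it_operations",
    "maintenance_subscription",
    "maintenance_time",
)


def total_cost(costs):
    """Sum every cost item; refuse incomplete inputs so totals stay comparable."""
    missing = [item for item in COST_ITEMS if item not in costs]
    if missing:
        raise KeyError(f"missing cost items: {missing}")
    return sum(costs[item] for item in COST_ITEMS)


def rank_approaches(approaches):
    """Return (name, total) pairs sorted from cheapest to most expensive."""
    totals = ((name, total_cost(costs)) for name, costs in approaches.items())
    return sorted(totals, key=lambda pair: pair[1])
```

Passing one dictionary per approach (open source, proprietary, manual coding), each with all eight items filled in from real project estimates, gives a like-for-like ranking.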


Page 49: Open Source DWBI-A Primer

QUESTIONS?

For any questions, please get in touch with me at

[email protected]

Skype: ebidisolutions

Page 50: Open Source DWBI-A Primer

Thank You!
