Data Quality meets SOA


Upload: uniserv

Post on 22-Apr-2015



Data quality functions were already being provided as services for Unix, Windows and Linux before Gartner analysts coined the term SOA. For the most part, technical reasons were the decisive factor for this architecture. In addition, the introduction of service-oriented architectures creates new and changed requirements for data quality services, and it also increases the opportunities and benefits that these services can deliver.

__________________________

All company and product names and logos used in this document are trade names and/or registered trademarks of the respective companies.

WHITE PAPER / Data Quality meets SOA – Making Data Quality available for all Business Processes

WHITE PAPER: SOA


© UNISERV GmbH / +49 7231 936-0 / All rights reserved.

APPLICATION A: A classic application is the validation of a postal address against reference data, which contains street and place names and records which postcodes belong to which places, streets and house numbers. Unlike a simple database lookup, the correct address should be found here even if the input is incomplete or contains typing errors or errors caused by mishearing. The goal is high matching accuracy, so that as many incorrect addresses as possible can be corrected automatically.
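A minimal sketch of this kind of error-tolerant lookup, assuming a tiny hypothetical in-memory reference table (the postcodes, street names and the 0.75 similarity cutoff below are illustrative only) and using Python's standard `difflib` similarity matching as a stand-in for the far more elaborate matching a production address validation service would use:

```python
import difflib

# Hypothetical reference data: postcode -> street names valid for that postcode.
REFERENCE_STREETS = {
    "75179": ["Rastatter Straße", "Hauptstraße", "Bahnhofstraße"],
    "10115": ["Schillerstraße", "Gartenstraße"],
}


def validate_street(postcode: str, street: str, cutoff: float = 0.75) -> list[str]:
    """Check a street against the reference data for its postcode.

    Returns [street] if the input matches exactly, up to three correction
    suggestions if it is a near match, or an empty list if no sufficiently
    similar street exists (the address then needs manual review).
    """
    candidates = REFERENCE_STREETS.get(postcode, [])
    if street in candidates:
        return [street]
    # get_close_matches ranks candidates by SequenceMatcher.ratio() and
    # drops those scoring below the cutoff.
    return difflib.get_close_matches(street, candidates, n=3, cutoff=cutoff)
```

A misspelled input such as "Hauptstrase" with postcode 75179 yields the suggestion "Hauptstraße", while an unknown postcode yields no candidates at all.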

APPLICATION B: Another application is searching for duplicates in the in-house database. Here too, the goal is to identify a business object, such as a business partner, a product or a sales opportunity, quickly and reliably despite incomplete, divergent or incorrect input, in order (a) to simplify the search and thereby increase user productivity, and (b) to prevent the creation of duplicates, i.e. multiple entries that refer to the same real-world object. The result is a consistent, complete and unambiguous mapping of real-world objects in the database.
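The duplicate check can be sketched as a similarity comparison between a new record and the existing records. This is a deliberately simplified illustration, not Uniserv's method: the field names, the normalization rules and the 0.85 threshold are assumptions, and real duplicate detection would add phonetic matching, field weights and an index instead of comparing against every record.

```python
import unicodedata
from difflib import SequenceMatcher

FIELDS = ("name", "street", "postcode")  # hypothetical record layout


def normalize(text: str) -> str:
    """Case-fold, strip diacritics and collapse whitespace.

    casefold() also maps 'ß' to 'ss', so 'Straße' and 'Strasse' normalize alike.
    """
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def similarity(a: dict, b: dict) -> float:
    """Average per-field similarity in the range [0, 1]."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio() for f in FIELDS
    ]
    return sum(scores) / len(scores)


def find_duplicates(new_record: dict, existing: list[dict], threshold: float = 0.85) -> list[dict]:
    """Return existing records that probably describe the same real-world object."""
    return [rec for rec in existing if similarity(new_record, rec) >= threshold]
```

Running `find_duplicates` before inserting a record lets the application warn the user at data entry, which addresses goal (b) above.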


The starting point for data quality services

In order to consider the importance of service-oriented architectures for the provision and use of data quality functions, it is first of all useful to look at the typical data quality functions themselves.

An important goal in improving data quality is to prevent incomplete or incorrect data from being stored in the database. Potential problems should be detected at data entry and then corrected either automatically or manually by the user after appropriate feedback. Specialized search indices ensure that a search in a database with 1,000,000 to 100,000,000 records normally takes only a fraction of a second, even when spellings differ. However, these response times depend on intelligent caching of the indices, which can be provided very efficiently by implementing the software as a central service running on an in-house server.
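The white paper does not say how its specialized search indices work. One common technique for spelling-tolerant search is an n-gram index, and the sketch below is a minimal trigram index built on that assumption. In a central service, the index would be built once and kept in memory across requests, which is the kind of caching the paragraph above refers to.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Character trigrams of the case-folded text, padded to mark word start and end."""
    padded = f"  {text.casefold()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """In-memory inverted index from trigram to the ids of records containing it."""

    def __init__(self, records: list[str]):
        self.records = records
        self.postings: dict[str, set[int]] = defaultdict(set)
        for rid, text in enumerate(records):
            for gram in trigrams(text):
                self.postings[gram].add(rid)

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Return up to `limit` records ranked by trigram (Jaccard) similarity."""
        query_grams = trigrams(query)
        shared: dict[int, int] = defaultdict(int)
        # Only records sharing at least one trigram with the query are touched,
        # so the cost depends on the number of candidates, not the table size.
        for gram in query_grams:
            for rid in self.postings.get(gram, ()):
                shared[rid] += 1

        def jaccard(rid: int) -> float:
            union = query_grams | trigrams(self.records[rid])
            return shared[rid] / len(union)

        ranked = sorted(shared, key=jaccard, reverse=True)
        return [self.records[rid] for rid in ranked[:limit]]
```

For example, a search for "Meyer" in an index of "Meier", "Mayer", "Schmidt" and "Maier" returns the three spelling variants and ignores "Schmidt".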

Apart from response time, integration into a wide range of environments had always played an important role for data quality services. Decoupling the data quality services from the service consumers, and calling them via a client/server protocol, is important for reaching this goal. For this reason, data quality functions, at least for more sophisticated applications, were already being provided and used as services before service-oriented architectures were «invented». This made it possible both to meet the demanding response-time requirements and to make the functions available in a wide range of environments.


From the «fat client» to the 3-layer architecture

In the early days of the client/server world, the typical architecture consisted of a database which, in addition to storing transaction and master data, also enabled asynchronous communication between system components and thereby allowed these components to be decoupled. The sender wrote messages to the database, and the receiver read them from there. For this purpose, however, the receiver had to «poll» the relevant table at regular intervals to find out whether new, unprocessed messages were available. The business logic was mainly implemented in so-called «fat» clients. Batch tasks were implemented as background processes that accessed the database.
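The database-as-message-queue pattern described above can be sketched with an in-memory SQLite database. The table and column names are hypothetical. A real receiver would call `poll` in a loop with a pause between cycles, and most of those cycles would find nothing new, which is exactly the overhead of the polling approach.

```python
import sqlite3


def create_message_table(conn: sqlite3.Connection) -> None:
    # Hypothetical message table shared by sender and receiver.
    conn.execute(
        """CREATE TABLE messages (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               body      TEXT    NOT NULL,
               processed INTEGER NOT NULL DEFAULT 0
           )"""
    )


def send(conn: sqlite3.Connection, body: str) -> None:
    """Sender side: write a message into the shared table."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))


def poll(conn: sqlite3.Connection) -> list[str]:
    """Receiver side: one polling cycle.

    Reads all unprocessed messages in arrival order and marks them as
    processed in the same transaction. Returns an empty list if nothing
    new has arrived.
    """
    with conn:
        rows = conn.execute(
            "SELECT id, body FROM messages WHERE processed = 0 ORDER BY id"
        ).fetchall()
        if rows:
            conn.executemany(
                "UPDATE messages SET processed = 1 WHERE id = ?",
                [(row_id,) for row_id, _ in rows],
            )
    return [body for _, body in rows]
```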

Interactive functions for validating addresses or detecting duplicates directly at data entry were normally integrated into the graphical user interface and were usually called via proprietary interfaces. Specifications such as DCE¹ or CORBA², whose goal was to standardize the interfaces for communication between distributed components, remained niche technologies.

[Figure: Layered architecture, from the fat client to a client with an application server; sample fields Name, Street, Postcode; roles Customer, Supplier, Reseller]


1 http://en.wikipedia.org/wiki/Distributed_Computing_Environment

2 http://en.wikipedia.org/wiki/CORBA


This situation has changed fundamentally in the past few years. The starting point was mainly the establishment of standards within the framework of JEE³ (Java Enterprise Edition), which led to high-performance implementations of these standards, both as commercial products and as open source solutions.

For the Windows world, Microsoft followed with .NET⁴ as a language-independent platform and the .NET Enterprise Services. As a result, high-performance infrastructure software became available regardless of the chosen platform (JEE or .NET), making it possible to largely detach the business logic from the presentation layer and implement it in a separate layer on the server. This also changed the requirements for data quality services, which were now called mainly from the server-side business logic.

3 http://en.wikipedia.org/wiki/Java_Platform,_Enterprise_Edition

4 http://en.wikipedia.org/wiki/.NET_Framework


Simple integration into the application server, with either a JEE or a .NET architecture, now came to the fore.


SOAP as an Enabler for SOA

SOA does not become truly effective until a standard protocol for providing and using services has been established. This gap was closed by the Web Service protocol SOAP⁵. It is supported by practically all middleware and infrastructure components, which enables interoperability between service providers, middleware and service consumers. As a result, connectors for proprietary protocols in proprietary middleware are no longer necessary. This lays the foundation for powerful middleware components, including the Enterprise Service Bus⁶ (ESB), which loosely couples different components and plays a significant role in routing messages, as well as engines that directly execute workflows defined in a business process language (BPEL)⁷.

SOAP-based Web Services have therefore become a central instrument for providing interactive data quality services in modern enterprise architectures. These are mainly services that follow the request/response pattern: the service consumer sends a request, e.g. «validate the specified address», and the service responds directly with a confirmation, a correction suggestion or a selection of possible correct addresses.
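The request/response exchange can be illustrated by building and reading a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree`. The operation name `validateAddress`, its element names and the namespace `urn:example:dataquality` are hypothetical. A real service defines these in its WSDL, and in practice a SOAP toolkit would generate this plumbing from it.

```python
import xml.etree.ElementTree as ET
from typing import Union

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
DQ_NS = "urn:example:dataquality"  # hypothetical service namespace

ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("dq", DQ_NS)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_validate_request(street: str, postcode: str, city: str) -> bytes:
    """Request: «validate the specified address» (hypothetical operation)."""
    envelope = ET.Element(q(SOAP_ENV, "Envelope"))
    body = ET.SubElement(envelope, q(SOAP_ENV, "Body"))
    request = ET.SubElement(body, q(DQ_NS, "validateAddress"))
    for name, value in (("street", street), ("postcode", postcode), ("city", city)):
        ET.SubElement(request, q(DQ_NS, name)).text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_validate_response(xml_data: Union[bytes, str]) -> tuple:
    """Response: a status plus zero or more suggested addresses."""
    root = ET.fromstring(xml_data)
    result = root.find(f"{q(SOAP_ENV, 'Body')}/{q(DQ_NS, 'validateAddressResponse')}")
    if result is None:
        raise ValueError("not a validateAddressResponse")
    status = result.findtext(q(DQ_NS, "status"), default="")
    suggestions = [
        {child.tag.split("}", 1)[1]: child.text or "" for child in suggestion}
        for suggestion in result.findall(q(DQ_NS, "suggestion"))
    ]
    return status, suggestions
```

The response carries either a confirmation status or a list of suggestions, mirroring the three possible answers described above.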

5 http://en.wikipedia.org/wiki/SOAP

6 http://en.wikipedia.org/wiki/Enterprise_service_bus

7 http://en.wikipedia.org/wiki/BPEL

[Figure: Proprietary protocol]


This procedure matches the interactive character of the validation, which should support the user directly when a business object is entered and offer options for intervention if problems arise. However, this function is no longer implemented in isolation in the presentation layer; it normally takes place in the context of a higher-level business process, e.g. the ordering process in an e-business application, a lead conversion and qualification process in a CRM application, or a comparable process. The connection between the implementation of business processes and data quality functions thus becomes immediately obvious, and with it the contribution that these functions make to the success of the respective business process.

[Figure: SOAP protocol]


Customer Cases

Orange

The customers of the telecommunications company Orange in France can contact the company through various channels: they can visit the Orange web portal on the Internet, call the Orange call center, or visit the mobile phone shop of an Orange partner. However, regardless of which contact channel the customer chooses, the respective processes must be executed with the same quality. An important element of this process quality is ensuring that customer addresses are correct.

The technical basis is the open source Java application server JONAS. This JEE server is the central point for providing the services required to implement Orange's business processes. The Uniserv data quality service for the validation, restructuring and standardization of customer addresses is also provided in this environment.

This service-oriented approach makes it possible to use the same services in different processes and different channels. Whether a new Orange customer is being created or an existing customer's address is being changed, and regardless of the contact channel through which a process is initiated, the underlying service-oriented architecture ensures that the executed processes are configured consistently and access the same services. This makes it possible to guarantee a consistently high quality standard for address data across the company.


WinGroup AG

The German company WinGroup AG, a service network for sales and marketing, offers its customers an extensive range of services in the areas of call center, letter shop, dialog marketing and IT services. In order to guarantee a consistently high quality level for the underlying processes in all service areas and customer-specific applications, its subsidiary WinLogic has developed a service-oriented architecture based on an Enterprise Service Bus. This bus is the central middleware through which all of the company's applications are connected to the central services. Uniserv's address validation and duplicate check are linked to it to secure data quality.


Lightweight REST – when SOAP is too unwieldy


Even if the SOAP-based protocol seems to be the ideal solution for integration into typical enterprise middleware, there are various applications in which alternatives such as RESTful⁸ services are advantageous. This is particularly the case when data quality services are to be called directly from an HTML/AJAX-based presentation layer. A typical scenario is input aids that automatically complete a partial input by the user or offer suggestions for completing it. By their nature, input aids are located in the presentation layer. However, they place considerably higher demands on response time, since they are called far more frequently during input, usually after every character typed.
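An input aid of this kind has to answer a prefix query on every keystroke. A minimal sketch, assuming the candidate values (for example a hypothetical list of place names) fit in memory: keep them sorted and use binary search, so each lookup costs O(log n) plus the number of suggestions returned.

```python
import bisect


class PrefixSuggester:
    """Type-ahead suggestions from a sorted in-memory list (illustrative only)."""

    def __init__(self, values: list[str]):
        # (case-folded key, original value), sorted by key for binary search.
        self._entries = sorted((v.casefold(), v) for v in set(values))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` values starting with `prefix` (case-insensitive)."""
        p = prefix.casefold()
        start = bisect.bisect_left(self._keys, p)
        results = []
        # All keys with this prefix are contiguous, starting at `start`.
        for key, value in self._entries[start:start + limit]:
            if not key.startswith(p):
                break
            results.append(value)
        return results
```

Typing "P", "Pf", "Pfo" would trigger three calls, each narrowing the list of suggestions.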

Although the services used for address validation are the obvious choice for address input as well, SOAP-based Web Services are not always suitable for this scenario because of the overhead that the SOAP protocol entails. RESTful Web Services are extremely lean by comparison.

The call is made via HTTP, with the call arguments encoded in the URL. When the service is called from JavaScript, the result is ideally returned in JSON format⁹. Because JSON uses JavaScript's object literal syntax, the browser can turn the response directly into JavaScript objects. Services configured in this way are ideally suited to typical Web 2.0 applications.
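A sketch of such a lean call, assuming a hypothetical endpoint `https://dq.example.com/api/suggest/city` with hypothetical query parameters `q` and `limit`: the client encodes the arguments in the URL, and the server side decodes them and answers with a JSON body. The network transport itself is omitted, and `suggest` stands for any lookup function, such as a prefix search.

```python
import json
from typing import Callable
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://dq.example.com/api/suggest/city"  # hypothetical endpoint


def build_request_url(query: str, limit: int = 5) -> str:
    """Client side: encode the call arguments in the URL."""
    return f"{BASE_URL}?{urlencode({'q': query, 'limit': limit})}"


def handle_request(url: str, suggest: Callable[[str], list]) -> str:
    """Server side: decode the arguments and return a JSON response body."""
    params = parse_qs(urlsplit(url).query)
    query = params.get("q", [""])[0]
    limit = int(params.get("limit", ["5"])[0])
    return json.dumps(
        {"query": query, "suggestions": suggest(query)[:limit]}, ensure_ascii=False
    )
```

In the browser, the same call would be a `fetch` of this URL followed by `response.json()`, with no envelope to build or parse.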

8 http://en.wikipedia.org/wiki/REST

9 http://en.wikipedia.org/wiki/JSON


SOAP - the fine differences

10 http://en.wikipedia.org/wiki/Web_Services_Description_Language

Adapting proprietary interfaces to SOAP-based communication also involves a learning curve. The packets exchanged in SOAP-based communication are described in a meta-format, the so-called WSDL (Web Services Description Language)¹⁰.

When developing the WSDL, it is essential first to ensure that the data types used are actually supported by all target languages and systems in which the service is to be consumed. This aspect is relatively uncritical in a purely in-house development with a homogeneous software infrastructure, e.g. JEE or .NET.

However, this criterion is essential for a data quality service that must be usable in a great variety of environments, some of which may not even be known beforehand.

In addition, SOAP offers two basic options for defining in the WSDL how the XML packet structure is linked to the constructs of the programming language that provides or interprets the packet. As the name suggests, the so-called «rpc style» corresponds more closely to the conventional Remote Procedure Call (RPC) and models operations as method calls that look no different from local calls. It is ideal when the Web Service is to be called from an object-oriented programming language such as Java or C#. The so-called «document style» is better suited to modelling complex content, since the result is represented as an XML document with its own XML schema. On the one hand, this allows the result to be validated against the XML schema using standard XML tools; on the other, the result document can easily be processed further, transformed or prepared for the presentation layer using standard XML tools or suitable frameworks. Both variants have their advantages and disadvantages.
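The practical difference for a consumer can be sketched as follows. This does not generate WSDL; it only contrasts, using assumed element names in a hypothetical namespace, how the same address-validation result might be handled in the two styles: unmarshalled into a typed object (the rpc-style view) or kept as an XML document and transformed with standard XML tools (the document-style view).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = "urn:example:dataquality"  # hypothetical namespace


@dataclass
class AddressResult:
    """rpc-style view: the result arrives as a typed return value."""
    status: str
    street: str
    postcode: str


def as_rpc_result(result: ET.Element) -> AddressResult:
    # Unmarshal the XML into a plain object, roughly what generated client
    # stubs for rpc-style services do behind the scenes.
    return AddressResult(
        *(result.findtext(f"{{{NS}}}{field}", default="") for field in ("status", "street", "postcode"))
    )


def as_html_list(result: ET.Element) -> ET.Element:
    # document-style view: keep the result as an XML document and transform it
    # with standard XML tools, here into an HTML fragment for the presentation layer.
    ul = ET.Element("ul")
    for child in result:
        local_name = child.tag.split("}", 1)[-1]
        item = ET.SubElement(ul, "li")
        item.text = f"{local_name}: {child.text or ''}"
    return ul
```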


Services that need to be callable from any context may therefore have to offer both variants.

Web Services are stateless by nature. Ideally, this means that a call returns its complete result rather than partial results, and that two successive calls are completely independent of each other.
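Where a result is too large to return in one call, statelessness can still be preserved by handing the state back to the caller, who sends it along with the follow-up call. A minimal sketch, assuming a hypothetical paginated search: the offset travels in an opaque page token instead of being stored on the server. A production service would also sign or encrypt the token so that clients cannot tamper with it.

```python
import base64
import json
from typing import Optional


def _encode_token(state: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(state).encode("utf-8")).decode("ascii")


def _decode_token(token: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def search_page(candidates: list, query: str,
                page_token: Optional[str] = None, page_size: int = 2) -> dict:
    """Stateless paginated search: everything needed for the next call is in the token.

    `candidates` stands for read-only reference data, not per-client state.
    """
    offset = 0
    if page_token is not None:
        state = _decode_token(page_token)
        if state["q"] != query:
            raise ValueError("page token belongs to a different query")
        offset = state["offset"]
    matches = [c for c in candidates if query.casefold() in c.casefold()]
    page = matches[offset:offset + page_size]
    next_offset = offset + page_size
    next_token = (
        _encode_token({"q": query, "offset": next_offset})
        if next_offset < len(matches) else None
    )
    return {"results": page, "next_page_token": next_token}
```

Because the server keeps nothing between calls, any instance behind a load balancer can answer the follow-up request.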


Web Services must be scalable. One prerequisite for this is the statelessness described above. In addition, the Web Service should access global resources as little as possible, or not at all. Where this is necessary, however, managing and synchronizing access to these resources should never be implemented ad hoc for the individual Web Service. Instead, resource pools, as offered by most application servers or available as open source extensions, should be used; they provide effective and configurable pooling.

The last two points in particular, statelessness and scalability, normally require at least a partial redesign if existing functionality is to be made available as a Web Service.
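In practice, the application server's own pooling (for example a container-managed JDBC data source) is the right tool. Purely to show the principle, here is a minimal fixed-size pool built on Python's thread-safe `queue.Queue`; the class name and the `factory` parameter are hypothetical.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class ResourcePool:
    """Fixed-size pool: resources are created once and shared by all calls."""

    def __init__(self, factory: Callable[[], object], size: int):
        self._idle: "queue.Queue[object]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def acquire(self, timeout: Optional[float] = None) -> Iterator[object]:
        """Borrow a resource, blocking until one is free.

        Raises queue.Empty if none becomes free within `timeout` seconds.
        """
        resource = self._idle.get(timeout=timeout)
        try:
            yield resource
        finally:
            self._idle.put(resource)  # always returned, even if the call failed
```

The pool size caps concurrent use of the global resource, and the timeout turns an exhausted pool into an explicit error instead of an unbounded wait.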


SOA blurs the distinction between «on premise» and «on demand»

11 http://en.wikipedia.org/wiki/Software_as_a_Service

The provision of software services and applications via the Internet, referred to by buzzwords such as «Software on Demand», «Software as a Service»¹¹ or «Cloud Computing», is growing in importance. In many cases, locally installed software («on premise») and software used via the Internet («on demand») are portrayed as fundamentally and profoundly opposed. From the SOA perspective, this is not the case, because within a service-oriented architecture it does not matter how a given service is provided. Providing a service via the Internet as an alternative to a locally installed server opens up interesting new possibilities, particularly for data quality services that perform validations and corrections by matching and merging against reference data.

The reference data required for the service must be updated regularly. This entails both recurring manual work and regular subscription fees payable to the data provider. For country-specific postcode directories, this work and expense are incurred for every country whose addresses are checked.

As a result, such solutions are only worthwhile for larger volumes of data. This restriction does not apply if the service is provided as a SaaS offering that is invoiced exclusively on the basis of the transactions executed. Within a service-oriented architecture, locally installed services and services used via the Internet can also be combined or exchanged as required. This makes it possible to find the optimum solution for each user, one that can also be adapted flexibly later if the underlying conditions change.
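Because consumers depend only on the service interface, an on-premise and an on-demand implementation can be swapped or combined. A minimal sketch, with all class names hypothetical: a router sends addresses for countries with locally licensed reference data to the local service and everything else to a remote SaaS service, whose transport (SOAP, REST or otherwise) is abstracted as a callable.

```python
from typing import Callable, Dict, Protocol


class AddressValidator(Protocol):
    def validate(self, address: dict) -> dict: ...


class LocalValidator:
    """On-premise service backed by locally installed reference data."""

    def __init__(self, reference: set):
        self.reference = reference  # set of (postcode, street) pairs

    def validate(self, address: dict) -> dict:
        known = (address["postcode"], address["street"]) in self.reference
        return {"valid": known, "source": "on-premise"}


class RemoteValidator:
    """On-demand service; `transport` sends the request and returns the decoded reply."""

    def __init__(self, transport: Callable[[dict], dict]):
        self.transport = transport

    def validate(self, address: dict) -> dict:
        response = self.transport(address)
        return {"valid": bool(response["valid"]), "source": "on-demand"}


class CountryRouter:
    """Combines implementations: per-country routing with a default."""

    def __init__(self, by_country: Dict[str, AddressValidator], default: AddressValidator):
        self.by_country = by_country
        self.default = default

    def validate(self, address: dict) -> dict:
        return self.by_country.get(address["country"], self.default).validate(address)
```

Calling code sees only `validate`, so switching a country from the local installation to the SaaS offering is a configuration change rather than a code change.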


Conclusions

The following conclusions about two aspects can be drawn from the experience described above:

ASPECT ONE: What are the requirements for data quality functions that are to be used in an SOA environment?

– The functions must be accessible as services via SOAP. This guarantees integration into practically any infrastructure for the service-oriented implementation of business processes. In detail, however, care must be taken to ensure that the XML elements of the service description can be mapped to the corresponding constructs of each server environment.

– If use in the presentation layer is foreseeable, it must be checked whether a RESTful Service implementation is necessary.

– It should be checked whether an alternative use scenario, in which the service is provided via the Internet, provides commercial or technical advantages, and whether the service provider supports such a scenario.

ASPECT TWO: Which aspects have to be considered if functions for the validation, enhancement and processing of data are to be made SOA-capable?

– The use scenarios must be clearly defined. It is particularly important whether the services are applied within the business logic or the presentation logic. In the first case, integration takes place in the application server, the Enterprise Service Bus or a BPEL engine; in the second case, it takes place in a graphical user interface, which may in turn impose its own demands. Whether SOAP-based and/or RESTful services are more appropriate depends on this decision.

– The target systems and languages in which the service is to be used must be specified. This determines how complex the modelling can be with respect to the XML structures and XML data types used in the call results.

– The service must be designed to be stateless, i.e. it must work without storing an internal state between two calls. If state is required across calls, it must be passed along with the follow-up call or persisted in a suitable manner.

– The scalability of the service must be ensured: if the Web Service requires global resources, e.g. a database connection, these should be managed by means of a suitable resource pool in the server container. Otherwise, these resources quickly become a bottleneck that prevents genuine scalability of the service.

– The service should be Internet-capable, i.e. its functionality should not depend on whether it is provided in the local network or via the Internet. This greatly extends the range of possible applications.

For further information, please visit our website www.uniserv.com or contact us directly.

We look forward to advising and supporting you in your project.


Uniserv

Uniserv is the largest specialised supplier of data quality solutions in Europe, with an internationally usable software portfolio and services for the quality assurance of data in business intelligence, CRM applications, data warehousing, eBusiness, and direct and database marketing.

With several thousand installations worldwide, Uniserv supports hundreds of customers in their endeavours to map the Single View of Customer in their customer database. Uniserv employs more than 110 people at its headquarters in Pforzheim and its subsidiary in Paris, France, and serves a large number of prestigious customers in all sectors of industry and commerce, such as ADAC, Allianz, BMW, Commerzbank, DBV Winterthur, Deutsche Bank, Deutsche Börse Group, France Telecom, Greenpeace, GEZ, Heineken, Johnson & Johnson, Nestlé, Payback, PSA Peugeot Citroën as well as Time Life and Union Investment.

Further information is available at www.uniserv.com

Experience: OVER 40 YEARS

Market position: LARGEST EUROPEAN SUPPLIER

Employees: MORE THAN 110 PEOPLE

DIRECT MARKETING

BI/BDW

CPM

CRM

ERP

E-COMMERCE

DATA MIGRATION

PROJECTS

SOA

ON-PREMISE/ON-DEMAND

MDM/CDI

COMPLIANCE


Contact: +49 7231 936-0

UNISERV GmbH • Rastatter Straße 13 • 75179 Pforzheim • Germany • T +49 7231 936-0 • F +49 7231 936-3002 • E [email protected] • www.uniserv.com

© Copyright Uniserv • Pforzheim/Germany • All rights reserved.