Upload: garry-wilcox

Post on 29-Dec-2015

1

Designing a Data Exchange - Best Practices

• Data Exchange Scenarios
  – Sender vs. Receiver-initiated exchanges
  – Node Design

• Best Practices:
  – Handling Large Transactions
  – State Management
  – Data Services
  – Data Validation
  – Schema Design

2

Data Exchange Scenarios

[Diagram: two exchange scenarios on the Exchange Network.
– Data Synchronization Exchange: Submit to Data Consumer
– Data Publishing Exchange: Get from Data Provider
Labels shown: DATA PROVIDER; Menu of Services (1. GetFacilities, 2. GetPermits, 3. GetProjects, 4. Get...); Exchange Network Node; Database; EPA; EPA CDX Node; Database; DATA CONSUMER A; DATA CONSUMER B; DATA CONSUMER C; Exchange Network Node; Web Site; Desktop Software.]

3

Requesting Data (1 of 3)

Simple Query

– Synchronous process
– Ideal for small data sets
– Ideal for both ad hoc and planned exchanges
– Onus is on requestor to initiate exchange

Partner A → Partner B: Query
Partner B → Partner A: Query Response

4

Requesting Data (2 of 3)

Solicit with Download

– Asynchronous process
– Good for larger datasets
– Data Provider can schedule processing of request
– Requester can use “GetStatus” to see if data is ready yet

Partner A → Partner B: Solicit
Partner B → Partner A: Solicit Response
...time passes...
Partner A → Partner B: Download
Partner B → Partner A: Download Response
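The Solicit, GetStatus, Download sequence above can be sketched in a few lines. This is a minimal in-memory illustration, not a real node client: the class `InMemoryProvider`, the function `fetch_when_ready`, and the status strings "Pending" and "Completed" are assumptions made for the example.

```python
import uuid


class InMemoryProvider:
    """Hypothetical stand-in for a data provider's node (no network involved)."""

    def __init__(self):
        self._requests = {}  # transaction id -> (service, params)
        self._results = {}   # transaction id -> result document

    def solicit(self, service, params):
        # Accept the request and return immediately with a transaction id.
        txn_id = str(uuid.uuid4())
        self._requests[txn_id] = (service, params)
        return txn_id

    def get_status(self, txn_id):
        # Status values are assumed for this sketch.
        return "Completed" if txn_id in self._results else "Pending"

    def process_pending(self):
        # Run by the provider at a time of its own choosing (e.g. a nightly job).
        for txn_id, (service, params) in list(self._requests.items()):
            joined = "|".join(params)
            self._results[txn_id] = "<Result service='%s' params='%s'/>" % (service, joined)
            del self._requests[txn_id]

    def download(self, txn_id):
        return self._results[txn_id]


def fetch_when_ready(provider, txn_id, max_polls, wait):
    """Poll GetStatus up to max_polls times; download once the data is ready."""
    for _ in range(max_polls):
        if provider.get_status(txn_id) == "Completed":
            return provider.download(txn_id)
        wait()  # in real use this would sleep between polls
    return None
```

In real use `wait` would be something like `lambda: time.sleep(60)`; passing it in keeps the polling loop easy to exercise without delays.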

5

Requesting Data (3 of 3)

Solicit with Submit

– Asynchronous process
– Good for larger datasets
– Does not require the requestor to continuously poll the data provider to see if data is ready

Partner A → Partner B: Solicit
Partner B → Partner A: Solicit Response
...time passes...
Partner B → Partner A: Submit
Partner A → Partner B: Submit Response

6

Sending Data (1 of 2)

Simple Submit

– Very simple and very common process
– Typical for traditional regulatory flows
– “Hides” data, since it is not exposed as a service

Partner A → Partner B: Submit
Partner B → Partner A: Submit Response

7

Sending Data (2 of 2)

Notify with Download

– Asynchronous approach to Simple Submit
– Receiver can perform download at the time of their own choosing

Partner A → Partner B: Notify
...time passes...
Partner B → Partner A: Download
Partner A → Partner B: Download Response

8

Data Exchange Scenarios

• Nodes wait for requests

• Nodes may initiate actions (e.g. Submit)

• How can a node do both?

9

Node Components

[Diagram: Example Node Architecture. Components shown: Internet, Web Services Interface, Request Processor, Node Administration Utility, Node Database, Flow Database, Flow A, Flow B.]

10

Node Components

A node can be divided into components, each playing a different role:

1. The Web Services Interface

• Acts as a listener for inbound requests and submissions
• Hosted on a Web Server (e.g. IIS, WebSphere)
• Should not do any heavy lifting (e.g. data processing)

11

Node Components (continued)

2. Request Processor

• Performs all data processing
  – Composes XML files for outbound delivery
  – Decomposes and processes inbound XML files
• Coupled with a scheduler component
  – Enables node to process Solicit requests at a time of the node administrator’s choosing
  – Automatically kicks off outbound processes (e.g. daily Submit)
• Flow agnostic
  – Decoupled from specific flow implementations
• Ideally installed on an Application Server

12

Node Components (continued)

3. Node Administration Utility

– Create and manage local accounts
– Install new data exchange components
– Set processing schedules
– Audit Node activity
– Extract documents (inbound and outbound should be stored)

13

Node Components (continued)

4. Flow-specific components

– Discrete components tailored for a specific data exchange
– Hot-swappable
– Service interface is generic
  • Node configuration determines which services are internal or public
  • Node configuration determines whether a given service is for Query or Solicit

14

Node Components (continued)

Flow-to-Node Interface

[Diagram: Flow-to-Node Interface. Flow A exposes GetFacilities(params[]), GetInspections(params[]) and ProcessInboundData(XML). Other components shown: Request Processor, Web Services Interface, Node Admin Utility, Flow B... Connection labels: Pass Thru (solicit), Pass Thru (query), Internal (submit (in)), Pass Thru (submit (out)).]
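One way to picture a flow-agnostic Request Processor with pluggable flow components is the sketch below. The service names come from the slide; everything else is an assumption for illustration, including the `register` and `handle` methods, the shape of the configuration dictionary, and which method each service is assigned to.

```python
class FlowA:
    """Hypothetical flow component; service names follow the slide."""

    def GetFacilities(self, params):
        return "<Facilities state='%s'/>" % params[0]

    def GetInspections(self, params):
        return "<Inspections facility='%s'/>" % params[0]

    def ProcessInboundData(self, xml):
        return len(xml)  # placeholder for real inbound processing


class RequestProcessor:
    """Flow-agnostic dispatcher: it knows service names and config, not flow logic."""

    def __init__(self):
        self._services = {}  # service name -> (callable, settings)

    def register(self, flow, config):
        # Node configuration decides visibility and allowed methods per service.
        for name, settings in config.items():
            self._services[name] = (getattr(flow, name), settings)

    def handle(self, method, name, argument, external=True):
        if name not in self._services:
            raise LookupError("unknown service: " + name)
        func, settings = self._services[name]
        if external and not settings["public"]:
            raise PermissionError(name + " is internal")
        if method not in settings["methods"]:
            raise ValueError(name + " does not support " + method)
        return func(argument)


# Assumed configuration: the Query/Solicit/Submit assignments are illustrative.
FLOW_A_CONFIG = {
    "GetFacilities": {"public": True, "methods": {"Solicit"}},
    "GetInspections": {"public": True, "methods": {"Query"}},
    "ProcessInboundData": {"public": False, "methods": {"Submit"}},
}
```

Because the processor only looks services up by name, a flow could be swapped by registering a different object with the same configuration shape.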

15

Large Transactions

• Can cause problems in several areas:
  – Data retrieval (SQL)
  – XML serialization (sender side)
  – Transmission over Internet
  – XML deserialization (receiver side)
  – Schema validation (both sender and receiver)

16

Large Transactions

• Stage data in a model similar to that which is used by the schema
  – XML is hierarchical whereas RDBMS is relational
  – More secure: source system unaffected by node operations
  – Index query parameter fields

[Diagram labels: Source Database (Intranet), Firewall, Flow Database (DMZ), (SQL), NODE]

17

Large Transactions (continued)

• Use an asynchronous exchange
  – Use Solicit, not Query
• Schema design considerations
  – Schema KEY/KEYREF discouraged
  – Element naming may significantly affect file size
    <MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>
• Query “costing”
  – Calculate the size of a given result set (e.g. COUNT(*)) before running full query.
  – Not very much experience in this area
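Query "costing" could look roughly like the following: count the matching rows first, and only run the full query when the result set is small enough for a synchronous Query. The `facility` table, the threshold `MAX_SYNC_ROWS`, and the function names are all assumptions for the sketch; SQLite is used only so the example is self-contained.

```python
import sqlite3

MAX_SYNC_ROWS = 1000  # assumed threshold; a real node would tune this per flow


def make_demo_db():
    """Build a tiny staging table with an index on the query parameter field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facility (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
    conn.execute("CREATE INDEX idx_facility_state ON facility (state)")
    conn.executemany(
        "INSERT INTO facility VALUES (?, ?, ?)",
        [(1, "Alpha Works", "WA"), (2, "Beta Mill", "WA"),
         (3, "Gamma Foundry", "WA"), (4, "Delta Plating", "OR")],
    )
    return conn


def choose_exchange(conn, state, max_rows=MAX_SYNC_ROWS):
    """Cost the query with COUNT(*) before deciding how to answer it."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM facility WHERE state = ?", (state,)
    ).fetchone()
    if count > max_rows:
        # Too large for a synchronous response; steer the requester to Solicit.
        return "Solicit", count, None
    rows = conn.execute(
        "SELECT id, name FROM facility WHERE state = ? ORDER BY id", (state,)
    ).fetchall()
    return "Query", count, rows
```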

18

Large Transactions (continued)

• A well-designed flow can help avoid large transactions
  – “List” services can return only high-level data
    Scenario 1:
    • RCRA.GetFacilities(“WA”)
    Scenario 2:
    • RCRA.GetFacilityList(“WA”)
    • RCRA.GetFacilityDetail(“WA”, “FACID1234”)
  – Data service parameters can be used to limit transaction size
    Scenario 3:
    • RCRA.GetFacilitiesByType(“WA”, “LQG”)
  – All options affect schema design
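The difference between the scenarios can be shown with a small sketch. The sample records and Python function names below are invented for illustration; only the service names they mirror (GetFacilityList, GetFacilityDetail, GetFacilitiesByType) come from the slide.

```python
# Invented sample data standing in for a staged flow database.
FACILITIES = {
    "FACID1234": {"state": "WA", "name": "Example Facility", "type": "LQG",
                  "permits": ["P-1", "P-2"]},
    "FACID5678": {"state": "WA", "name": "Second Facility", "type": "SQG",
                  "permits": []},
}


def get_facility_list(state):
    """Scenario 2, step 1: return only high-level data (id and name)."""
    return sorted(
        (fid, rec["name"]) for fid, rec in FACILITIES.items() if rec["state"] == state
    )


def get_facility_detail(state, facility_id):
    """Scenario 2, step 2: return the full record for one facility."""
    rec = FACILITIES.get(facility_id)
    if rec is None or rec["state"] != state:
        return None
    return dict(rec, id=facility_id)


def get_facilities_by_type(state, facility_type):
    """Scenario 3: an extra parameter narrows the result set."""
    return sorted(
        fid for fid, rec in FACILITIES.items()
        if rec["state"] == state and rec["type"] == facility_type
    )
```

Each variant implies a different return schema (a short list entry versus a full facility record), which is why these choices feed back into schema design.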

19

Large Transactions (continued)

• File compression
  – Zipping files can reduce file size by over 90%
    • Compact storage (archiving)
    • Significant reduction in time to transmit
• Disk I/O versus memory I/O
  – If possible, avoid using techniques which require system to read entire document into memory in order to process. Toughie…
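Both points can be illustrated with the Python standard library: `zipfile` for compression and `xml.etree.ElementTree.iterparse` for reading a document record by record instead of loading the whole tree. The document layout, element names other than MailingAddressStateUSPSCode, and function names are assumptions for the sketch; actual compression ratios depend on the data.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def build_zipped_document(facility_count):
    """Build a repetitive XML document and zip it in memory."""
    parts = ["<Facilities>"]
    for i in range(facility_count):
        parts.append(
            "<Facility><FacilityIdentifier>FAC%05d</FacilityIdentifier>"
            "<MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>"
            "</Facility>" % i
        )
    parts.append("</Facilities>")
    raw = "".join(parts).encode("utf-8")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("facilities.xml", raw)
    return raw, buffer.getvalue()


def count_facilities(zipped_bytes):
    """Stream the XML out of the archive, handling one record at a time."""
    count = 0
    with zipfile.ZipFile(io.BytesIO(zipped_bytes)) as archive:
        with archive.open("facilities.xml") as stream:
            for _event, element in ET.iterparse(stream, events=("end",)):
                if element.tag == "Facility":
                    count += 1
                    element.clear()  # drop the children of the processed record
    return count
```

A real node would read from and write to files on disk rather than `BytesIO`; the in-memory buffers here only keep the example self-contained.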

20

State Management

• State Management is required any time two systems must be synchronized
• Contrast to Data Publishing exchange
• Typically the sender’s burden, but does not have to be
• Partial rejects compound the difficulty

21

State Management (continued)

• Flagging source data
  – Set “submission status” indicator on source data
  – Complexity is directly related to transaction granularity
  – Compounded if record-level rejects are performed

[Diagram: the hierarchy Permit, Discharge Point, Parameter, Measurement shown twice, once under “Fine-Grain Transactions” and once under “Coarse-Grain Transactions”. Labels: INSERT, UPDATE, DELETE (repeated six times); GetPermits(), GetPipes(), GetMeasurements(), GetParameters(), GetPermitDetails(), GetMeasurements().]

22

State Management (continued)

• Exchange Network Header
  – Same schema can be used to perform different transactions
  – Can remove the need for TransactionCode (e.g. INSERT, UPDATE, DELETE) in schema
• “Delta” to derive data changes since last submit
  – Many systems do not store deleted data
  – Compare last submission snapshot with current snapshot, derive what has changed
• Incremental and full refresh services
  – e.g. Facility Flow
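The "delta" approach amounts to a keyed comparison of two snapshots. The sketch below assumes each snapshot is a dictionary keyed by record identifier; the function name and sample data are made up for illustration.

```python
def derive_delta(previous, current):
    """Compare the last submitted snapshot with the current one.

    Both arguments map a record id to that record's data.
    Returns (inserts, updates, deletes).
    """
    inserts = {key: value for key, value in current.items() if key not in previous}
    updates = {
        key: value
        for key, value in current.items()
        if key in previous and previous[key] != value
    }
    # Deleted rows are absent from the source, so they are only visible by comparison.
    deletes = sorted(key for key in previous if key not in current)
    return inserts, updates, deletes


last_submission = {"FAC1": {"name": "Alpha"}, "FAC2": {"name": "Beta"}}
current_state = {"FAC2": {"name": "Beta Mill"}, "FAC3": {"name": "Gamma"}}
delta = derive_delta(last_submission, current_state)
```

This is why the sender has to keep the last submission snapshot: without it, a record deleted from the source system leaves nothing to flag.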

23

Data Service Best Practices

• Data service naming conventions

{Prefix}.{Action}{Object}[By{Parameter(s)}]

e.g. FacID.GetFacilityByName

• Work in Progress
• What about versioning?
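A service name that follows the {Prefix}.{Action}{Object}[By{Parameter(s)}] pattern can be split mechanically. The regular expression below is one possible reading of the convention, not an official grammar: it assumes the action is a single capitalized word (such as Get) and that the first "By" after the object starts the parameter part.

```python
import re

# Assumed interpretation of {Prefix}.{Action}{Object}[By{Parameter(s)}].
SERVICE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."
    r"(?P<action>[A-Z][a-z]+)"
    r"(?P<object>[A-Za-z0-9]+?)"
    r"(?:By(?P<parameters>[A-Za-z0-9]+))?$"
)


def parse_service_name(name):
    """Return the name's parts as a dict, or None if it does not fit the pattern."""
    match = SERVICE_NAME.match(name)
    return match.groupdict() if match else None
```

For example, `parse_service_name("FacID.GetFacilityByName")` yields the prefix FacID, action Get, object Facility, and parameter part Name. The pattern has no slot for a version, which is the open question the slide raises.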

24

Data Services Best Practices

Documenting data services:

– Data Service name
– Whether the service is supported by Query, Solicit, or both
– Parameters
  • Parameter Name
  • Index (order)
  • Required/Optional
  • Minimum/Maximum allowed values
  • Data type (string, integer, Boolean, Date…)
  • Whether multiple values can be supplied to the parameter
  • Whether wildcard searches are supported and default wildcard behavior
  • Special formatting considerations
– Access/Security settings
– Return schema
– Special fault conditions

• Wildcards: %
• Parameter delimiter: | (pipe character)
• Parameter operation: AND
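The three conventions (% wildcard, pipe delimiter, AND between parameters) could be applied as in the sketch below. It assumes the pipe separates multiple values supplied to a single parameter and that matching is case-sensitive; neither detail is stated on the slide, and the function names are hypothetical.

```python
import re


def value_matches(pattern, value):
    """Match a value against a pattern in which % stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


def record_matches(record, criteria):
    """criteria maps a field name to a raw parameter string such as 'WA|OR' or 'Acme%'."""
    for field, raw in criteria.items():
        alternatives = raw.split("|")  # assumed: pipe separates multiple values
        if not any(value_matches(pattern, record[field]) for pattern in alternatives):
            return False  # every parameter must match (AND)
    return True
```

Whatever interpretation a flow chooses, the service documentation should spell it out, since requesters cannot infer default wildcard behavior from the service name alone.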

25

Data Validation Best Practices

• XML instance files should be validated against the schema by the sender before submittal

• CDX is offering pre-submittal validation services for some flows

• Schematron (Doug Timms)

26

Schema Design Best Practices

• DRC 1.0 and DRC 1.1
  – Schema Namespace
  – Schema Versioning
  – Exchange Network Schema Types
  – Use the Shared Schema Components