Best Practices: IMS CDC to Modern Streaming Platforms
Scott Quillicy, SQData – a Syncsort Company
November 2019 | Session HA
Topics
Introductions
Concepts
Architectural Overview
Modern Streaming Platforms
IMS Changed Data Capture (CDC) Flows
Best Practices
Overall Approach
Performance & Throughput
Conclusion
Q&A
About Me – Scott Quillicy
38 Years of Database Experience: Database Performance, Software Development
Mainframe Changed Data Capture / Replication Since the Early 90's
Started With: IMS Version 1.2, DB2 V1.3
Founded SQData in 2000 to Provide Customers with:
A Better Way of Replicating Mainframe Data → Particularly IMS
Solutions that Combine Consulting Expertise with Technology
Technology Built Around Best Practices
SQData Acquired by Syncsort in August 2019
OEM Supplier of Mainframe CDC Connectors
Now Part of the Connect Family of Products
Objectives
1) Practical Overview of Streaming IMS CDC to Modern Platforms
2) Focus → IMS System / DBA Perspective
3) How to Minimize Impact on the Mainframe
4) Early Detection of Red Flags
5) How to Avoid a Data Replication Time Sink
First Hurdle → The Great Divide
Today’s Popular Streaming Platforms
Apache Kafka
Azure Event Hubs
Amazon Kinesis
Google Pub/Sub
Streaming Platforms → Interest Over Time
Streaming Platform Architecture
Recommended Streaming Deployment Model
[Diagram: Capture & Ingestion → Initial Data Landing (Highly Secure) → Best-of-Class Transform & Sanitization → Sanitized Data Ready for Consumption]
General Best Practices
Avoid Data Collection Overkill
Help Bridge the Great Divide
Approach with a Comprehensive Strategy
Involve the Business Unit(s) from the Beginning
Be Aware of Application Release Cycle
Data Collection Overkill → Major Red Flag
Everything Does Not Have to Be In "The Lake"
You Need to Start Small
Minimize Data Volume
Deploy in Small Increments → Realize Success Early, Adjust Infrastructure, Predictable Costs & Duration
Adapt Easier to New Targets
Bridging The Great Divide
Realize that They are Going to Take Your Data – Make the Best of It
Be Able to Translate Mainframe-Speak to Distributed / Cloud
Mentor Them on Working with Copybooks and Data Types (see the packed-decimal sketch below)
Help Them Understand IMS Data Structures and Keys
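A minimal Java sketch of the copybook-mentoring point above: decoding a COMP-3 (packed decimal) amount from a raw segment image. The field layout (PIC S9(5)V99 at a given offset) is purely illustrative; real offsets and scales come from the copybook.

import java.math.BigDecimal;

public class PackedDecimal {
    // COMP-3: two digits per byte, low nibble of the last byte is the sign (0xD = negative)
    public static BigDecimal decode(byte[] seg, int offset, int length, int scale) {
        long value = 0;
        for (int i = 0; i < length; i++) {
            int b = seg[offset + i] & 0xFF;
            value = value * 10 + (b >> 4);           // high nibble = digit
            if (i < length - 1) {
                value = value * 10 + (b & 0x0F);     // low nibble = digit
            } else if ((b & 0x0F) == 0x0D) {
                value = -value;                      // last low nibble = sign
            }
        }
        return BigDecimal.valueOf(value, scale);
    }

    public static void main(String[] args) {
        // Hypothetical PIC S9(5)V99 COMP-3 field: 4087.66 packed as 04 08 76 6C
        byte[] field = {0x04, 0x08, 0x76, 0x6C};
        System.out.println(decode(field, 0, field.length, 2)); // prints 4087.66
    }
}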
Technical Best Practices Summary
Keep it Simple & Efficient → Minimize the Number of Moving Parts
Avoid Overscaling → Start with Basic Configuration & Work Up
Keep Full Extracts / Initial Loads on Standby
Needed if the 'Point of No Return' is Reached
Must be Able to Run Against Live Databases
Discovery / Planning:
Data Sources → CDC and Bulk Loads
Latency Requirements
Data Volume → CDC and Bulk Loads
Peak Transaction Arrival Rate
Ensure Proper Monitoring / Alerts are Set
Goal → ‘Set & Forget’ Deployment
Streaming Performance Aspects
Throughput / Latency Primarily Depend on the Speed of the Target
Fortunately, Streaming Platforms are Very Fast Targets
Top Items Affecting Performance / Throughput:
Message Size
IMS CDC Streaming → Transaction Size and Arrival Rate
Target Replication Scaling Factor → Affects Acknowledgement Response
IMS Initial Loads → Overall Data Volume
Streaming Platform Configuration → Must be Tuned for Source Workload
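Assuming a Kafka target, a minimal sketch of the producer settings most often tuned to match the source workload; the broker address and values are illustrative, not recommendations.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class CdcProducerConfig {
    public static KafkaProducer<String, byte[]> build() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // illustrative
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringSerializer");
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put(ProducerConfig.ACKS_CONFIG, "all");              // durability; drives ack response time
        p.put(ProducerConfig.LINGER_MS_CONFIG, 10);            // brief wait so small CDC messages batch
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);    // batch size per partition
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // cheap CPU, large wire savings
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // no duplicates on retry
        return new KafkaProducer<>(p);
    }
}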
Key Monitoring Metrics
Operational:
Component Down Unexpectedly
Latency → How Far Behind You Are in Publishing (see the sketch below)
Latency → When Did You Last Hear from the Engine(s)
Informational:
Overall Throughput → Records / Bytes per Second
Workload Patterns / Peak Transaction Arrival Rate
Number of CDC Records by Database / Segment
Number of Transactions
zIIP Offload Statistics
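One cheap way to watch the publishing-latency metric above, assuming a Kafka target: compare each record's broker timestamp with the clock. The topic name, group id, and alert threshold are illustrative.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PublishLatencyMonitor {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // illustrative
        p.put(ConsumerConfig.GROUP_ID_CONFIG, "latency-monitor");
        p.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringDeserializer");
        p.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<String, byte[]> c = new KafkaConsumer<>(p)) {
            c.subscribe(Collections.singletonList("imsdb01.cdc"));        // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, byte[]> r : c.poll(Duration.ofSeconds(5))) {
                    long lagMs = System.currentTimeMillis() - r.timestamp();
                    if (lagMs > 60_000) {                                 // illustrative threshold
                        System.err.println("ALERT: publishing " + lagMs + " ms behind");
                    }
                }
            }
        }
    }
}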
Message Size
Small Messages Perform Better than Larger Ones (a bit obvious)
It Boils Down to the Number of Bytes per Message
IMS Segments can be Large (redefines, arrays, etc.)
Updates can Contain Before and After Images
So...What to Do? Suggest Avro as the Target Data Format
Avro is Essentially a Condensed Version of JSON
JSON Typically Used for Data Validation Only → Can be Easily Read
Avro Messages are Roughly the Size of the Source Segment (x 2 for Updates)
Reduce the Number of Fields in the CDC Message
Evaluate the Requirement for Publishing Before Images of Updates
Sample IMS CDC Record in JSON Format:
{
  "object_name": "IMSDB01.SEG02",
  "stck": "d4c4b51993db0000",
  "timestamp": "2018-08-12T11:11:18Z",
  "change_op": "U",
  "seq": "2",
  "parent_key": { "seg1.key1": 12345 },
  "after_image": { "fname": "MARY", "lname": "JOHNSON", "city": "CHICAGO", "amount": "4087.66" },
  "before_image": { "fname": "MARY", "lname": "JOHNSON", "city": "CHICAGO", "amount": "2964.32" }
}
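To make the Avro suggestion concrete, a small hedged Java sketch (the schema is hypothetical, covering only the after-image fields above) that binary-encodes the same record and prints its size. Avro keeps field names in the schema rather than in every message, which is where the savings over JSON come from.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class AvroSizeDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema for just the after-image fields above
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Seg02\",\"fields\":["
          + "{\"name\":\"fname\",\"type\":\"string\"},"
          + "{\"name\":\"lname\",\"type\":\"string\"},"
          + "{\"name\":\"city\",\"type\":\"string\"},"
          + "{\"name\":\"amount\",\"type\":\"string\"}]}");
        GenericData.Record rec = new GenericData.Record(schema);
        rec.put("fname", "MARY");
        rec.put("lname", "JOHNSON");
        rec.put("city", "CHICAGO");
        rec.put("amount", "4087.66");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericData.Record>(schema).write(rec, enc);
        enc.flush();
        // Field names live in the schema, not the message → far fewer bytes than the JSON form
        System.out.println("Avro-encoded size: " + out.size() + " bytes");
    }
}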
Deployment Architecture
IMS CDC Streams → Methods of Capture
Log Based (x'99' records)
Data Capture Exit
Vendor Proprietary (over the top)
IMS CDC – Architecture to Avoid: the IMS Logger Exit
Significant Performance Impact on IMS
Definite No-Go for High-Volume IMS Shops
Severe Choke Point → Intrusive
Can Cause IMS to Crash
[Diagram: IMS Log Capture Exit (DFSFLGX0) and IMS Unloader reading the IMS Log Datasets (OLDS), feeding a Publisher]
IMS CDC Streaming Illustration
Start with a Basic Configuration and Scale Up as Required
Need to be Able to Scale on Both the Source and the Target Sides
Suggestion → Use a Separate Set of Topics for the CDC Streams vs Initial Loads (topic-creation sketch after the diagram below)
[Diagram: z/OS – IMS Unloader / Capture → Transient Storage → Publisher; Linux – Ingest Engine → CDC Topics, with Schema Registry]
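Assuming a Kafka target, a hedged sketch of creating the separate CDC and initial-load topic sets with the Kafka AdminClient; topic names, partition counts, and replication factor are illustrative.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // illustrative
        try (AdminClient admin = AdminClient.create(p)) {
            admin.createTopics(Arrays.asList(
                new NewTopic("imsdb01.cdc", 6, (short) 3),   // steady-state CDC stream
                new NewTopic("imsdb01.load", 12, (short) 3)  // wider for bulk initial loads
            )).all().get();                                  // block until both topics exist
        }
    }
}

Keeping loads and CDC on separate topics means a re-materialization can be purged or re-run without disturbing the live change stream.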
IMS CDC Streams → Target Side Scaling
Start Here First
Parallelize the Apply / Ingest Engines – One (1) Publisher Subscription
Order May Matter → Engines Need to be Able to Maintain Transaction Order (see the keyed-publish sketch after the diagram below)
Desired Behavior:
Minimal Back Pressure on the Source Side
Publisher Latency Should be within a Tolerable Limit
[Diagram: z/OS – IMS Unloader / Capture → Transient Storage → Publisher; Linux – parallel Ingest Engines → CDC Topics, with Schema Registry]
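A hedged sketch of the ordering point above, assuming a Kafka target: Kafka only guarantees order within a partition, so keying each CDC message by a stable record key (the parent/root key here is an assumption) keeps all changes for one database record in sequence for whichever engine owns that partition.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedPublish {
    // All changes for one root key hash to one partition, so per-record
    // transaction order survives even with parallel ingest engines.
    static void publish(KafkaProducer<String, byte[]> producer,
                        String parentKey, byte[] avroMessage) {
        producer.send(new ProducerRecord<>("imsdb01.cdc", parentKey, avroMessage));
    }
}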
IMS CDC Streams → Source Side Scaling
Multiple Publisher Subscriptions → Split by Database / Partition / FP Area
Parallel Ingest Engines on Target Side – One (1) per Subscription
Desired Behavior:
Minimal Back Pressure on the Source Side
Minimal CPU Consumption
[Diagram: z/OS – IMS Unloader / Capture → Transient Storage → Publisher; Linux – multiple Ingest Engines → CDC Topics, with Schema Registry]
IMS CDC Streams → Source Side Scaling...
Consider for Extreme CDC Data Volume → 1TB+ per Day
Multiple Captures / Publishers → Split by IMS Subsystem / Data Sharing Partner
Suggestion → Combine Online SSIDs and Split Out Batch SSIDs
[Diagram: z/OS – multiple Captures and IMS Unloaders → Transient Storage → Publishers; Linux – multiple Ingest Engines → CDC Topics, with Schema Registry]
The Initial Load
Required to Prime the Target(s) and to Re-Materialize if the Point of No Return is Reached
Load via Mass Inserts → There are No Traditional Utilities on the Target Side (a bulk-load producer sketch follows the diagram below)
Alternative: Online Snapshots → FTP → Transform → Ingest into Kafka
Important: Need to be Able to Run Unloads Against Live Databases
Recommend → Use a Separate Set of Topics for the Initial Load vs CDC
[Diagram: z/OS – IMS Unloader → Transient Storage → Publisher; Linux – Ingest Engine → Initial Load Topics, with Schema Registry]
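Assuming a Kafka target, a hedged variation on the CDC producer shown earlier for the mass-insert path: the same client, pointed at the separate initial-load topics, with batching opened up because latency is irrelevant during a bulk load (broker address and values are illustrative).

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BulkLoadProducer {
    public static KafkaProducer<String, byte[]> build() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // illustrative
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.StringSerializer");
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Trade latency for throughput: wait longer and fill much larger batches
        p.put(ProducerConfig.LINGER_MS_CONFIG, 100);
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, 512 * 1024);
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        return new KafkaProducer<>(p);
    }
}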
The Initial Load → Target Side Scaling
Same Model as for the CDC Streams
Parallelize the Ingest Engine – One (1) Publisher Subscription
Desired Behavior:
Minimal Back Pressure on the Source Side
Minimize the Time and CPU Required to Complete the Initial Loads
[Diagram: z/OS – IMS Unloader → Transient Storage → Publisher; Linux – parallelized Ingest Engine → Initial Load Topics, with Schema Registry]
The Initial Load → Source Side Scaling
Multiple Publisher Subscriptions → Split by Database / Partition / FP Area
Parallel Ingest Engines on Target Side – One (1) per Subscription
Desired Behavior:
Minimal Back Pressure on the Source Side
Minimize the Time and CPU Required to Complete the Initial Loads
[Diagram: z/OS – IMS Unloader → Transient Storage → Publisher; Linux – multiple Ingest Engines → Initial Load Topics, with Schema Registry]
The Initial Load → Source Side Scaling...
Consider for Extreme Data Volume → 5TB+ of IMS Data
Multiple Unloads / Publishers → Split by IMS Subsystem / Data Sharing Partner
Goal → Minimize the Time & CPU Required to Complete the Initial Loads
[Diagram: z/OS – multiple IMS Unloaders → Transient Storage → Publishers; Linux – multiple Ingest Engines → Initial Load Topics, with Schema Registry]
Common Operational Situations
Target Streaming Platform Outage
Target Unavailable to Receive Data → Ingest Engine(s) Should Stop and Disconnect
Data will Start Backing Up on the Source Side
Option 1: Wait Until the Cluster is Available and Restart
OK for a Short Period of Time – Depends on the Source Transaction Rate
Eventually, You May Reach the Point of No Return
Option 2: Spin Off CDC Data to File(s) – Process when the Target Comes Back Online (sketch after the diagram below)
[Diagram: back pressure building up at the z/OS Capture; Publisher spins off CDC Data File(s) while the Ingest Engine / CDC Topics are unavailable]
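A hedged sketch of Option 2, assuming a Kafka target: use the producer's send callback to spill any message that could not be delivered to a local file for later replay. The spill file name is hypothetical, and a real implementation would also preserve keys and ordering metadata.

import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SpillOnOutage {
    // If the send fails (cluster down), append the raw CDC message to a
    // local spill file so it can be replayed when Kafka comes back.
    static void sendOrSpill(KafkaProducer<String, byte[]> producer,
                            ProducerRecord<String, byte[]> record) {
        producer.send(record, (metadata, exception) -> {
            if (exception != null) {
                try (FileOutputStream spill =
                         new FileOutputStream("cdc-spill.dat", true)) { // hypothetical file
                    spill.write(record.value());
                } catch (IOException io) {
                    throw new RuntimeException("spill failed – stop the capture", io);
                }
            }
        });
    }
}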
Target Streaming Platform Slow
Throughput Measured with a Calendar → Back Pressure on the Source Increases
Common Cause → Target Configuration
One (1) or More Brokers Down – Correct and Restart
Out of Memory: Set queued.max.requests to a Reasonable Value (more can be trouble)
Memory Buffers: Should be Entirely in RAM
Log Files: Review Strategy and Settings – There are Many
Important: Leverage Target Monitoring Tools such as the Confluent Control Center (a minimal broker-count check follows the diagram below)
May Need to Back Off CDC Until Tuning has been Optimized
[Diagram: CDC data backing up between the z/OS Capture / Transient Storage / Publisher and the Linux Ingest Engine / CDC Topics]
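A minimal sketch of the brokers-down check above, assuming a Kafka target and using the Kafka AdminClient; the expected broker count and address are illustrative.

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class BrokerCheck {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // illustrative
        int expected = 3;                                                  // illustrative cluster size
        try (AdminClient admin = AdminClient.create(p)) {
            int live = admin.describeCluster().nodes().get().size();
            if (live < expected) {
                System.err.println("ALERT: only " + live + " of " + expected + " brokers up");
            }
        }
    }
}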
Compressed PRILOG in RECON
Physical SLDS Exist, but Their Entries Disappear from the RECON
IMS Considers the SLDS to be Inactive → CDC Thinks Otherwise
Can be Avoided by Increasing the DBRC Log Retention Period
The Capture Agent Should Provide a Method of Recovery
If the Capture Cannot Handle It → Unwelcome Manual Intervention & Perhaps a Reload
[Diagram: IMS Unloader; SLDS; IMS RECON containing the compressed PRILOG]
IMS Capture / Publisher Slow
Target Not Doing Much → Issue Appears to be on the Source Side (but probably not)
Common Causes:
Busy System – Processes are Running at Low Priority
Abnormally Large Units-of-Work – Exclude 'Purgers' (mass deletes)
Monitor: Capture Latency vs Publisher Latency
Capture Current → Look at the Publisher
Publisher → Check the Data Flow Rate and Last Ack Time from the Engine
[Diagram: z/OS – IMS Unloader / Capture → Publisher; Linux – Ingest Engine → CDC Topics, with Schema Registry]
In Conclusion...
Be Patient with The Great Divide... It will be Challenging
Avoid Data Collection Overkill...At All Costs
Keep Things Simple → Minimize the Number of Moving Parts
Scale Only as Required
Meet Throughput / Latency Requirements
Minimize Back Pressure on the Source
Involve the Business from the Beginning
Encourage / Insist that the Distributed Team Have a Target Outage Mitigation Process:
Spinning Off CDC Records to Disk on the Target Side
Reloading Source Data
Don’t Skimp on Planning / Discovery
Please submit your session feedback!
• Do it online at http://conferences.gse.org.uk/2019/feedback/nn
• This session is HA