WHT/082311
1 | ©2017, Cognizant
Hammer and beyond – An ensembling journey
Who am I?
Sunil Babu Peethambaram
Architect, Cognizant Technology Solutions (NASDAQ: CTSH)
Total IT experience – 13+ years
Consulting with LexisNexis since 2013 (Chennai, Dayton, Buford, Alpharetta)
Experience in HPCC Systems – more than 3 years
Domains worked on:
• Supply Chain Management
• Logistics
• Retail
  • Merchandise and Store operations
  • Order Management
  • Warehouse Management Systems
• Insurance
• Healthcare
• Aviation
Problem statement - How did it all start
Build valid flight connections (VFC) based on direct flight schedules (DFS)
DFS come in a proprietary encoded format
DFS span 1,000 carriers and over 4 million records
DFS cover a year or more into the future
DFS change every day, so VFC may need to be versioned daily
Building VFC requires evaluating the feasibility of over 16 trillion potential connections
Valid connections are identified by applying:
• Circuitry
• Cabotage
• BIETA and LCC
• Schedule conflicts
• MCT rules (over 100,000, applied in sequence)
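The 16 trillion figure is consistent with pairing each of roughly 4 million schedule records with every record (4 million × 4 million). The sketch below shows that arithmetic and one possible way to apply connection rules in sequence. It is a minimal illustration only: the `Flight` fields, the three rule functions, and the 45-minute threshold are hypothetical stand-ins, not the actual circuitry, cabotage, or MCT rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Flight:
    # Hypothetical record layout; the real DFS format is proprietary.
    carrier: str
    origin: str
    destination: str
    departs: datetime
    arrives: datetime


def joins_at_airport(first, second):
    # The second leg must leave from where the first leg lands.
    return first.destination == second.origin


def no_round_trip(first, second):
    # Simplified stand-in for a circuitry check: do not fly straight back.
    return second.destination != first.origin


def meets_min_connect_time(first, second, minimum=timedelta(minutes=45)):
    # Simplified stand-in for an MCT rule; the 45 minutes is an assumption.
    return second.departs - first.arrives >= minimum


# Rules run in this order; all() stops at the first rule that fails.
RULES = [joins_at_airport, no_round_trip, meets_min_connect_time]


def valid_connections(flights, rules=RULES):
    for first in flights:
        for second in flights:
            if first is not second and all(rule(first, second) for rule in rules):
                yield first, second


def candidate_pairs(n_records):
    # Upper bound on ordered pairs of schedule records.
    return n_records * n_records
```

Putting the cheapest and most selective rule first keeps the later, costlier rules from running on pairs that were never going to connect.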
The Legacy Setup
• Complex Business Logic
• Data intensive
• .NET/SQL Server
• Local datacenter
• Scaled-up architecture
• Ageing hardware
• Sequential processing
• Low fault tolerance
• Stale data delivery
• 24x7 life support
The ask
SOS!
Not really!
Relevant data delivery – faster processing, parallelize independent tasks
Don’t marry the hardware (just friends with benefits)
Performance as a configuration (take your time, hurry up, choice is yours, don't be late)
Fail fast, recover faster
Onboard new customers quickly
Automated data delivery pipeline
Better maintainability – support and enhance the complex business logic
So What?
Every project has complex business logic
But we have to generate hundreds of millions of records…
…which means we have a “big data” problem
And we are going to do whatever it takes…
OK Google... What is big data?
This is what we got!
Our problem was different
We have a big “data problem”
and the answers are a whole lot bigger!!!
So, why HPCC Systems?
Why not?
Our use case was data intensive and batch oriented
Embarrassingly parallel
ECL was built specifically for distributed data processing and gave us the fine control we needed
Been there, done that: lots of real-world experience to tap into
Access to the HPCC Systems development team
It's performant and maintainable
We did a proof of concept and validated the fit anyway:
• A 45-minute job ran in 1 second
• A 4-hour job ran in 90 seconds
• The proof of concept, planned for 4 weeks, was completed in 4 days
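"Embarrassingly parallel" here can be read as: connections through one airport can be built without looking at any other airport. The sketch below illustrates that idea in Python under stated assumptions. The tuple layout `(flight_id, origin, destination)` and the function names are hypothetical, and a thread pool stands in for a cluster purely to keep the example self-contained; on a real cluster each partition would be handled by a different node.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_hub(flights):
    # Group flights by the airport where a connection could happen.
    hubs = defaultdict(lambda: {"inbound": [], "outbound": []})
    for flight in flights:
        _, origin, destination = flight
        hubs[destination]["inbound"].append(flight)
        hubs[origin]["outbound"].append(flight)
    return hubs


def pair_within_hub(hub):
    # Each hub is independent: no data from other hubs is needed.
    return [(a[0], b[0]) for a in hub["inbound"] for b in hub["outbound"]]


def build_pairs(flights, workers=4):
    hubs = partition_by_hub(flights)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(pair_within_hub, hubs.values())
    # Merge the per-hub results into one sorted list of candidate pairs.
    return sorted(pair for part in results for pair in part)
```

Because no partition depends on another, adding workers changes only how long the job takes, not what it produces.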
What did Bill have to say about it?
Why AWS?
Bring a multi-node HPCC Systems cluster up or down at a click of a button
Scale up or down with zero upfront cost
Validate multiple configurations for performance and choose the best
And…
No need for Data Centers
Pay as you use
Go Global
Speed of computing
High level flow
Inside HPCC Systems
Data warehouse as Source of Truth
The data warehouse is the base on which our solution was built.
It follows a push-pull architecture
Raw data from different data sources is cleansed and transformed into data cubes (push).
The cubes act as views that are used by downstream applications (pull), e.g. the connection builder
The data warehouse is the only way data enters the distributed data processing system
All views follow a common interface through which data can be accessed
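The push-pull idea with a common view interface can be sketched as follows. This is an illustration in Python, not the ECL used in the project, and every name in it (`View`, `ScheduleView`, `Warehouse`, `build`, `read`) is a hypothetical placeholder chosen for the example.

```python
from abc import ABC, abstractmethod


class View(ABC):
    """Common interface assumed for every warehouse view."""

    name = ""

    @abstractmethod
    def build(self, raw_rows):
        """Push side: cleanse and transform raw rows, then store them."""

    @abstractmethod
    def read(self):
        """Pull side: return the stored rows to a downstream consumer."""


class ScheduleView(View):
    name = "schedules"

    def __init__(self):
        self._rows = []

    def build(self, raw_rows):
        # Cleansing here is just trimming, upper-casing, and dropping blanks.
        cleaned = (row.strip().upper() for row in raw_rows)
        self._rows = [tuple(row.split(",")) for row in cleaned if row]

    def read(self):
        return list(self._rows)


class Warehouse:
    """Single entry point: data gets in only through push()."""

    def __init__(self):
        self._views = {}

    def register(self, view):
        self._views[view.name] = view

    def push(self, name, raw_rows):
        self._views[name].build(raw_rows)

    def pull(self, name):
        return self._views[name].read()
```

A downstream job such as a connection builder would then call only `pull("schedules")` and never touch the raw data, which is what makes new sources and new consumers pluggable.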
Lifecycle of a view in DW
How did we fare?
Metric | Legacy (UTG) | HPCC Systems
Building connections (singles) | 40 hours | 1 hour
Lines of code | 26,535 (not including SQL) | 3,973
Delivery frequency | Weekly | Daily (possible)
Hardware | Batch server: 24 GB, 12 cores; SQL Server: 384 GB, 24 cores | Thor master + middleware: 16 GB; Thor slaves: 64 GB, 16 cores across 4 nodes
(Figure labels: AWS; 4.4 million; 100 million; 13.5 million)
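For a quick sense of scale, the build-time and code-size figures reported above work out as follows. The helper names are illustrative; the inputs are the numbers from the comparison.

```python
def speedup(before, after):
    # How many times faster the new run is, in the same time unit.
    return before / after


def reduction_pct(before, after):
    # Percentage by which a quantity shrank.
    return 100 * (before - after) / before


build_speedup = speedup(40, 1)                      # hours: legacy vs. HPCC Systems
code_reduction = reduction_pct(26535, 3973)         # lines of code, legacy excludes SQL
print(f"{build_speedup:.0f}x faster, {code_reduction:.0f}% less code")
```

Since the legacy line count excludes SQL, the real reduction in code is likely larger than this figure suggests.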
Happy Side Effects
Data Warehouse as a framework for new data sources
Data Warehouse as an interface for downstream applications
Plug and play by design
File builder template – blueprint for all data delivery jobs
Unit testing framework for HPCC Systems
Regression testing suite – runs all tests in the code base and produces a report
We integrated a comparison testing tool from LNR into Hammer
HPCC Systems cluster can now be built in AWS at the click of a button (Puppet)
Seamless sync between external FTP location and landing zone through S3
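Comparison testing, in the sense of checking a new system's output against the legacy output, can be sketched as a multiset difference. This is a generic illustration, not the LNR tool mentioned above; the function name and the report keys are hypothetical.

```python
from collections import Counter


def compare_outputs(legacy_rows, new_rows):
    # Counters keep duplicates, so a row that appears twice must match twice.
    legacy, new = Counter(legacy_rows), Counter(new_rows)
    return {
        "only_in_legacy": sorted((legacy - new).elements()),
        "only_in_new": sorted((new - legacy).elements()),
        "matched": sum((legacy & new).values()),
    }
```

An empty `only_in_legacy` and `only_in_new` means the two systems produced the same records, regardless of row order.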
What next?
Questions?
Thank you
Reach out to me: [email protected]
Useful links
Cognizant: http://www.cognizant.com
FlightGlobal: http://www.flightglobal.com
HPCC Systems Portal: http://hpccsystems.com
Machine Learning: http://hpccsystems.com/ml
Online Training: http://learn.lexisnexis.com/hpcc
HPCC Systems Wiki & Red Book: https://wiki.hpccsystems.com
Our GitHub portal: https://github.com/hpcc-systems
Community Forums: http://hpccsystems.com/bb
Documentation: https://hpccsystems.com/download/documentation