“IF YOU GRAPH IT, THEY WILL SEE IT”
SPLUNK@MUREX
FOR TEST AND DEVELOPMENT
OVERVIEW
WHO AM I
WHO IS MUREX
WHAT IS SPLUNK
MUREX HAD A PROBLEM!
HOW WE USED SPLUNK FOR TEST AND DEVELOPMENT
USE CASES 1 & 2
© 2017 Murex S.A.S. All rights reserved 3
WHO AM I SNAPSHOT
FACTS & FIGURES
Name:
Robert Lynch
Current Position:
Global SPLUNK Manager
Non-Functional Test Manager
History:
13 years working at Murex
From Production Development to Production Crisis
WHO WE ARE
MUREX IS THE #1 PROVIDER OF INTEGRATED SOLUTIONS FRONT TO BACK TO RISK FOR THE CAPITAL MARKETS INDUSTRY
SNAPSHOT
FACTS & FIGURES
30 years dedicated to our domain and clients
Around 45,000 users and 250+ clients in 65 countries
3 regional hubs, 17 offices covering all time zones
Staff of 2,000, over 60 nationalities
Revenue of €460 million in 2016
Over €1 billion invested in R&D over the last 10 years
A WORLDWIDE PRESENCE SNAPSHOT
NEW YORK
TORONTO
SANTIAGO
SAO PAULO
DUBLIN
PARIS
LONDON
BEIRUT
DUBAI
MOSCOW
SYDNEY
TOKYO
BEIJING
SEOUL
SINGAPORE
HONG KONG
45,000+ USERS
65 COUNTRIES
3 REGIONAL HUBS, 17 OFFICES COVERING ALL TIME ZONES
LUXEMBOURG
WHAT IS SPLUNK?
SPLUNK IS USED TO ANALYSE VARIOUS DATA INPUTS TO PROVIDE VALUABLE INSIGHTS IN REAL-TIME.
Murex produces a large volume of logs, which Splunk turns into more meaningful and usable information. Splunk
is configured to generate reports on GC, TPS, LTS tasks, DB row counts, LAS, MX timing, server CPU and network.
Within Murex, Splunk is used by Test, Development, Consultants and DevOps.
SPLUNK DEPLOYMENT AT MUREX
[Diagram: data reaches SPLUNK by two routes.]
LIVE: Murex environments*
Log Drop: internal testing, from clients, any Murex employee
*20 seconds to add a new environment
Class.Method          Count   Average     Total
Class(X).Method(Y)    1       5 minutes   5 minutes
Class(A).Method(B)    6000    50 ms       5 minutes
WHAT IS TPS TRACING?
TPS IS LIGHTWEIGHT PERFORMANCE TRACING ACROSS ANY JAVA SERVICE
CODE LEVEL TIMING ON THE CLASS.METHOD
• Class(X).Method(Y) performs an action
• It calls Class(A).Method(B) 6000 times
[Figure: timeline of TPS events against time. Class(X).Method(Y) is measured from 22:05 to 22:10 (5 minutes); within it, each Class(A).Method(B) call takes ~50 ms.]
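The Count / Average / Total figures above can be reproduced with a short sketch. It assumes each TPS event is reduced to a (class.method, duration in ms) pair; that shape and the function name `summarise` are illustrative assumptions, not the actual TPS log format.

```python
from collections import defaultdict


def summarise(events):
    """Aggregate (class_method, duration_ms) pairs into count, average and total."""
    totals = defaultdict(lambda: [0, 0.0])  # name -> [count, total_ms]
    for name, duration_ms in events:
        totals[name][0] += 1
        totals[name][1] += duration_ms
    return {
        name: {"count": count, "average_ms": total / count, "total_ms": total}
        for name, (count, total) in totals.items()
    }


# Illustrative events matching the slide: one 5-minute outer call,
# and 6000 inner calls of 50 ms each.
events = [("Class(X).Method(Y)", 5 * 60 * 1000)]
events += [("Class(A).Method(B)", 50)] * 6000

report = summarise(events)
for name, row in report.items():
    minutes = row["total_ms"] / 60000
    print(f"{name:<20} {row['count']:>5} {row['average_ms']:>9.0f} ms {minutes:>4.1f} min")
```

The point of the example is that 6000 calls of 50 ms add up to the same 5 minutes as the single outer call, which is why the inner method is where the time goes.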
BUT WE HAD A PROBLEM!
STATIC REPORT
• TPS Tracing produced a nice report. However:
• The existing PDF report was slow to graph millions of lines
• We could not zoom into areas of investigation
• Experts were always needed for analysis
[Figure: the static PDF report, with callouts: distribution (includes percentages and cumulative counts); overall statistics; graph; stats per (configurable) interval; several CPU pipelines configured in this reporting engine.]
“IF YOU GRAPH IT, THEY WILL SEE IT”
WHEN YOU GRAPH AN ISSUE IT CAN BECOME MORE OBVIOUS
SOMETIMES “AVERAGES”, “MEDIANS” AND “MAXIMUMS” ARE NOT ENOUGH TO SEE ISSUES…
IN THE GRAPH BELOW AN OBVIOUS INCREASE IN THE “CLASS.METHOD” TIME CAN BE SEEN
THIS MIGHT NOT HAVE BEEN OBVIOUS WITHOUT A VISUALIZATION
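A small sketch of why summary statistics can hide what a graph shows. The two series below are invented for illustration: they contain exactly the same timings, so the average, median and maximum are identical, but only one of them climbs over time. The least-squares slope stands in for "what the eye sees on the graph".

```python
from statistics import mean, median


def summary(values):
    """The three summary statistics named on the slide."""
    return mean(values), median(values), max(values)


def slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    return numerator / denominator


# Invented timings in ms: same values, different order.
steady = [50, 90, 70, 60, 80, 80, 60, 70, 90, 50]
rising = sorted(steady)

print(summary(steady), slope(steady))
print(summary(rising), slope(rising))
```

Both series report the same average, median and maximum, yet the second has a clearly positive slope. That increase is only visible once the points are plotted in time order.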
WHAT IS A SPLUNK_ID?
SPLUNK_ID IS A REFERENCE TO YOUR DATA IN SPLUNK
A NUMERIC CODE THAT WILL REFERENCE YOUR “TEST DATA” @ “TIME” = SPLUNK_ID
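[Figure: two graphs, e.g. of CPU usage. One is a standard time/data graph for a Client X log drop: results (50 ms to 100 ms) against time (22:00, 23:00, 00:00). The other is the SPLUNK_ID view: results (50 ms to 100 ms) against time (1st Nov, 2nd Nov, 3rd Nov) for test data from Client X, Client Y and Client Z. SPLUNK_ID = “Test Data” @ “Time”.]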
This means we can create URLs to a specific “Time” in a set of “Test Data”
Developers and testers use this to share data with each other quickly
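The slides do not say how a SPLUNK_ID is stored or how the URLs are built, so the following is only a sketch of the idea: a numeric ID that resolves to a set of test data plus a time range, and a link that carries the ID. The registry class, the `splunk_id` query parameter and the host name are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlencode


@dataclass(frozen=True)
class Investigation:
    test_data: str   # e.g. which client's log drop
    start: datetime  # start of the time range of interest
    end: datetime    # end of the time range of interest


class SplunkIdRegistry:
    """Hypothetical store mapping a numeric SPLUNK_ID to test data @ time."""

    def __init__(self):
        self._by_id = {}
        self._next_id = 1

    def register(self, test_data, start, end):
        splunk_id = self._next_id
        self._by_id[splunk_id] = Investigation(test_data, start, end)
        self._next_id += 1
        return splunk_id

    def resolve(self, splunk_id):
        return self._by_id[splunk_id]


def share_url(base_url, splunk_id):
    """Build a link a colleague can open; the parameter name is an assumption."""
    return f"{base_url}?{urlencode({'splunk_id': splunk_id})}"


registry = SplunkIdRegistry()
sid = registry.register("Client X", datetime(2017, 11, 1, 22, 0), datetime(2017, 11, 2, 0, 0))
url = share_url("https://splunk.example.invalid/tps_view", sid)
print(url)
```

Whatever the real mechanism is, the design idea is the same: the person receiving the link does not need to know which data set or time range to pick, because the ID already pins both.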
USE CASE 1: OVERLAPPING DIFFERENT DATA STREAMS
MUREX HAD A LARGE CLIENT PROSPECT WHERE IT NEEDED TO DO 60 TRADES PER SECOND
That is about 17 milliseconds per trade
BY USING SPLUNK WE WERE ABLE TO OVERLAP DIFFERENT DATA SETS
TPS Tracing with Garbage Collection (GC) [GC is Java's built-in mechanism for reclaiming memory; if not handled correctly it can slow down performance]
BELOW WE CAN SEE THE GC OVERLAPPED WITH TPS
The green line is Trade Insertion (COMMIT). Often, in parallel, we can see purple, yellow or red lines: this is “Java GC” slowing down the COMMIT!
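The overlap of the two data streams can be sketched as an interval problem: for each COMMIT, how much GC pause time fell inside it? The intervals below are invented, and representing both streams as (start_ms, end_ms) pairs is an assumption made for illustration.

```python
def overlap_ms(a, b):
    """Length of the overlap between two (start_ms, end_ms) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def gc_inside_commits(commits, gc_pauses):
    """For each commit interval, total GC pause time that overlaps it."""
    return [sum(overlap_ms(commit, pause) for pause in gc_pauses) for commit in commits]


# Invented data: three commits and two GC pauses on a shared millisecond clock.
commits = [(0, 20), (20, 110), (110, 130)]
gc_pauses = [(40, 90), (125, 140)]

gc_per_commit = gc_inside_commits(commits, gc_pauses)
for commit, gc_ms in zip(commits, gc_per_commit):
    duration = commit[1] - commit[0]
    print(f"commit {commit}: {duration} ms, of which {gc_ms} ms overlapped GC")
```

In this made-up run the slow 90 ms commit is the one with a 50 ms GC pause inside it, which is the pattern the overlapped graph makes visible.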
USE CASE 1: BEFORE OPTIMIZATIONS
TO KNOW WE WERE MAKING PROGRESS WE NEEDED TO QUANTIFY THE PROBLEM.
IN SPLUNK WE COULD ADD UP ALL THE GC TIME AND DIVIDE IT BY THE TPS TIME TO SEE WHAT % OF THE TIME WE WERE SPENDING ON GC RATHER THAN PROCESSING THE COMMIT.
Par_new_%_of_TPS = ∑ParNew(GC) / ∑TPS
Par_new_%_of_total_time = ∑ParNew(GC) / Total Time
NOW WE CAN ADD OPTIMIZATIONS AND JUDGE IF WE ARE IMPROVING
TEST 1: BEFORE ANY OPTIMIZATIONS
The average was 96 milliseconds per trade [that is about 10 trades per second; we needed 60!]
Par_new_%_of_TPS = 37% [Major bottleneck identified]
Par_new_%_of_total_time = 20% [Major bottleneck identified]
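The two ratios and the throughput arithmetic can be written out as a short sketch. The function names are illustrative, and the sample durations are invented and chosen only so that the ratios come out at the 37% and 20% quoted on the slide; they are not the measured data.

```python
def gc_ratios(par_new_ms, tps_ms, total_time_ms):
    """Return (Par_new_%_of_TPS, Par_new_%_of_total_time) as fractions."""
    gc_total = sum(par_new_ms)
    return gc_total / sum(tps_ms), gc_total / total_time_ms


def trades_per_second(avg_ms_per_trade):
    """Throughput implied by an average time per trade."""
    return 1000 / avg_ms_per_trade


def budget_ms(target_trades_per_second):
    """Time available per trade to reach a target throughput."""
    return 1000 / target_trades_per_second


# Invented durations (ms), NOT the real measurements.
of_tps, of_total = gc_ratios(par_new_ms=[10, 20, 7], tps_ms=[40, 60], total_time_ms=185)
print(f"Par_new_%_of_TPS = {of_tps:.0%}, Par_new_%_of_total_time = {of_total:.0%}")

print(f"96 ms per trade -> {trades_per_second(96):.1f} trades per second")
print(f"60 trades per second -> {budget_ms(60):.1f} ms per trade")
```

The last two lines show where the slide's round numbers come from: 96 ms per trade is a little over 10 trades per second, and 60 trades per second leaves just under 17 ms per trade.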
USE CASE 1: FINAL TEST, AFTER OPTIMIZATIONS
AFTER MULTIPLE BINARY FIXES AND GARBAGE COLLECTION TUNING WE HIT THE 17 MS MARK
SOME GC PEAKS WERE STILL PRESENT, HOWEVER MASSIVE IMPROVEMENTS WERE SEEN
We can see from the graph below that the GC peaks in yellow have reduced, and we have also reduced the COMMIT peaks.
Par_new_%_of_TPS = 37% Before, 7.8% After
Par_new_%_of_total_time = 20% Before, 0.89% After
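To put the before and after figures on one scale, the relative improvement can be computed directly from the numbers on the slides. This is plain arithmetic on the quoted values; the helper name is illustrative.

```python
def relative_reduction(before, after):
    """Fraction by which a value dropped, relative to where it started."""
    return (before - after) / before


par_new_of_tps = relative_reduction(37.0, 7.8)     # Par_new_%_of_TPS
par_new_of_total = relative_reduction(20.0, 0.89)  # Par_new_%_of_total_time
speed_up = 96 / 17                                 # average ms per trade, before / after

print(f"Par_new_%_of_TPS fell by {par_new_of_tps:.0%}")
print(f"Par_new_%_of_total_time fell by {par_new_of_total:.0%}")
print(f"Average trade time improved {speed_up:.1f}x")
```

So the GC share of TPS time fell by roughly 79%, the GC share of total time by roughly 96%, and the average trade became about 5.6 times faster.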
USE CASE 2 - GANTT CHART ZOOM
VIEWING THE CODE LINES IN CHRONOLOGICAL ORDER HELPED TO IMPROVE VISUALIZATIONS
WE CAN ZOOM IN TO THE MILLISECOND TO SEE THE SEQUENCE OF METHOD CALLS
ONE_TRADE_INSERTION_CLIENT_ISSUE
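A minimal text-mode sketch of the Gantt zoom idea: keep only the method calls that overlap a chosen time window, order them chronologically, and draw each as a bar. The method names and timings are invented, and this is not how the Splunk view itself is implemented.

```python
def zoom(events, window_start_ms, window_end_ms):
    """Keep (name, start_ms, end_ms) events overlapping the window, in time order."""
    visible = [e for e in events if e[1] < window_end_ms and e[2] > window_start_ms]
    return sorted(visible, key=lambda e: (e[1], e[2]))


def render(events, window_start_ms, window_end_ms, width=40):
    """Draw one bar per visible event, clipped and scaled to the window."""
    scale = width / (window_end_ms - window_start_ms)
    lines = []
    for name, start, end in zoom(events, window_start_ms, window_end_ms):
        start = max(start, window_start_ms)
        end = min(end, window_end_ms)
        left = int((start - window_start_ms) * scale)
        length = max(1, int((end - start) * scale))
        lines.append(f"{name:<18}|{' ' * left}{'#' * length}")
    return lines


# Hypothetical method calls for one trade insertion (times in ms).
calls = [
    ("Trade.insert", 0, 17),
    ("Validator.check", 1, 4),
    ("Pricing.compute", 4, 11),
    ("Db.commit", 11, 17),
    ("Audit.log", 30, 32),
]

for line in render(calls, 3, 12):
    print(line)
```

Narrowing the window from the whole run to a few milliseconds is what turns a wall of events into a readable sequence of calls.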
IDENTIFYING DIFFERENT TESTS ON A TIMELINE
OVER A LARGE TIMELINE WE NEED TO BE ABLE TO IDENTIFY DIFFERENT TESTS QUICKLY
OVERNIGHT A USER MIGHT RUN 10 DIFFERENT TESTS
BY DEVELOPING AN OVERLAP IT BECOMES OBVIOUS WHERE EACH TEST IS (TEST 1 OR TEST 2)
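One way to sketch the test overlap: given the time window of each test, tag every event on the timeline with the test it belongs to. The windows and timestamps below are invented, and the sketch assumes the test windows do not overlap each other.

```python
from bisect import bisect_right


def label_events(test_windows, event_times):
    """Tag each event time with the test whose [start, end) window contains it.

    test_windows: (name, start, end) tuples, assumed not to overlap.
    Returns one label per event; None means the event fell between tests.
    """
    windows = sorted(test_windows, key=lambda w: w[1])
    starts = [w[1] for w in windows]
    labels = []
    for t in event_times:
        i = bisect_right(starts, t) - 1  # last window starting at or before t
        if i >= 0 and t < windows[i][2]:
            labels.append(windows[i][0])
        else:
            labels.append(None)
    return labels


# Invented overnight run: two tests with a gap between them (times in seconds).
tests = [("TEST 1", 0, 100), ("TEST 2", 150, 300)]
labels = label_events(tests, [10, 120, 150, 299, 300])
print(labels)
```

With the label attached to each event, a graph can colour or band the timeline by test, so a user scanning a whole night of data sees at once where each test starts and ends.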
CONCLUSION: THE SPLUNK “TPS VIEW”
WHAT HAS THE NEW “TPS VIEW” ACHIEVED?
Problem:
We needed to view millions and millions of lines of TPS Tracing quickly
Solution:
How about 7 million lines of data (sliced and diced) in 7 seconds!
New TPS VIEW is hyper accelerated to handle heavy volumes
Problem:
We needed to be able to get data into SPLUNK quickly
Solution:
LIVE Monitoring: 20 seconds to attach any Environment to SPLUNK (One line command)
Log DROP: Upload logs from anywhere (One line command)
Problem:
We needed to overlap different data sets to find bottlenecks
Solution:
By overlapping TPS Tracing with GC, CPU etc., new bottlenecks became obvious
Problem:
How can I quickly show my teammates what I am looking at?
Solution:
SPLUNK ID: Unique ID that lets you save a point of investigation and pass it on to your colleagues
THANK YOU
PARIS 8 rue Bellini
75782 Paris cedex 16 FRANCE
Tel + 33 1 4405 3200
NEW YORK 810 Seventh Avenue - 14th floor
New York, NY 10019 USA
Tel +1 212 381 4300
SINGAPORE 10 Marina Boulevard #19-01
Marina Bay Financial Centre Tower 2
SINGAPORE 018983
Tel + 65 6216 02 88
linkedin.com/company/murex
twitter.com/Murex_Group
www.murex.com