SOA Performance Patterns


Upload: iasa

Post on 07-Dec-2014


DESCRIPTION

Performance Management for SOA Applications - Randy Stafford

Performance of SOA applications is a serious concern for their owners and architects, especially as usage and complexity increase over the life of an application. Attending to quality attributes like performance, scalability, and availability will always be part of architecting systems, even as technology platforms evolve and new development models emerge over the years. This presentation reviews the fundamental aspects of a pragmatic performance management approach for operational SOA applications, born of first-hand experience in responsible positions. It drills into specialties such as performance analysis, including understanding resource utilization and response time breakdown, and performance optimization using offline request stream replay and patterns specific to applications of different types and architectural styles.

Attendees will gain a working knowledge of concepts, vocabulary, and measures from the world of performance management, in categories such as load, performance, and resource utilization. They will see examples of the kinds of charts and other artifacts that are useful in performance analysis activities, and learn of simple techniques and tools they can use to initiate a performance management program for their own systems. Attendees need not meet any special prerequisites, but an appreciation of evidence-based decision-making would be helpful. Ultimately, the more we are able to describe and manage the performance of our applications using common measures, the more mature we will be as a profession.

TRANSCRIPT

Page 1: Soa Performance Patterns

<Insert Picture Here>

Performance Analysis

• Overview
• Examples
• Tooling

Page 2: Soa Performance Patterns

Performance Analysis Overview

• Goals are to understand detailed “whats” and “whys”
• For whom is performance poor, and when, and exactly how poor?
• Where does the major response time contribution come from?
• Where is the throughput bottleneck?
• Why is this happening?
• Only then can you know how best to optimize

• Usually initiated by a report of some trouble
• Better initiated proactively as ongoing APM activity

• Benefits from the application of the scientific method
• First, lay down some monitoring
• Then, begin generating and evaluating hypotheses…

• Requires another level of tooling

Page 3: Soa Performance Patterns

General Performance Analysis Approach
Describe > Hypothesize > Evaluate > Recommend > Confirm

1. Get a precise description of the situation
• Exactly what is observed, and when it is and is not observed

• Example: “The server crashes.” The host reboots? A process terminates unexpectedly? A process becomes unresponsive? At startup? Under certain load? After a certain duration of operation?

2. Formulate hypotheses about cause
• This uses your experience, engineering judgment, intuition

3. Evaluate hypotheses by collecting and examining evidence
• Configuration files, log files, metrics, etc.

• You may have to instrument & reproduce

• You’re trying to confirm or reject hypotheses

4. With confirmed hypothesis, recommend resolution
• This uses product knowledge, architectural knowledge

5. After resolution implementation, confirm it worked via evidence

Beware of skipping steps: Describe > Hypothesize > Evaluate > Recommend > Confirm

Page 4: Soa Performance Patterns

Typical Path for Heap Problems
Seen in Traditional J2EE Apps and Integration Apps

Java VM unresponsive > Hypothesis: GC-bound > Monitor heap, GC activity > Recommend tuning, redesign

Note this is not the only possible cause!

Symptom          | Possible Cause        | Likely Fix
Unresponsive     | Deadlock              | Redesign
Unresponsive     | Blocked on IFs        | Redesign
Unresponsive     | CPU-bound             | Upgrade?
Unresponsive     | GC-bound              | Redesign
OutOfMemoryError | Too small PermGen     | Param chg
OutOfMemoryError | Heap over-consumption | Redesign

Problem               | Manifestation
Heap over-consumption | OutOfMemoryError
Heap over-consumption | Frequent major GC
Infinite recursion    | StackOverflowError
Infinite recursion    | Unexpected termination

Page 5: Soa Performance Patterns

Analyzing Heap Issues
Look from VM Perspective, not OS Perspective

Decision flow:
• Is the VM GC-bound? If no > next hypothesis
• If yes: Is it load-caused? If yes > change config, resources, or design; if no > diagnose memory leak

13900.575: [Full GC 1854335K->1687971K(1975744K), 10.5195922 secs]

13911.456: [Full GC 1854335K->1696808K(1975744K), 8.1486279 secs]

13920.094: [Full GC 1854335K->1690473K(1975744K), 8.2035046 secs]

13928.622: [Full GC 1854335K->1697287K(1975744K), 8.1587776 secs]

13937.049: [Full GC 1854335K->1688773K(1975744K), 10.4990567 secs]

13948.293: [Full GC 1854335K->1699080K(1975744K), 8.2193787 secs]

13956.974: [Full GC 1854335K->1692996K(1975744K), 8.2299517 secs]

13965.561: [Full GC 1854335K->1700780K(1975744K), 8.2348413 secs]

13973.968: [Full GC

[Charts: Heap Utilization vs. Time]
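The Full GC log excerpt above can be summarized mechanically. A minimal sketch (the class and the approximation of overhead as total pause time over the start-to-start window are mine, not from the presentation):

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses HotSpot verbose GC lines like
// "13911.456: [Full GC 1854335K->1696808K(1975744K), 8.1486279 secs]"
// and estimates what fraction of wall-clock time is spent in Full GC pauses.
public class GcLogSummary {
    private static final Pattern LINE = Pattern.compile(
        "([\\d.]+): \\[Full GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    public static double gcOverheadPercent(List<String> lines) {
        double firstTs = -1, lastTs = 0, pauseTotal = 0;
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) continue;                       // skip non-GC lines
            double ts = Double.parseDouble(m.group(1));    // seconds since VM start
            if (firstTs < 0) firstTs = ts;
            lastTs = ts;
            pauseTotal += Double.parseDouble(m.group(5));  // pause duration
        }
        double elapsed = lastTs - firstTs;                 // start-to-start window
        return elapsed > 0 ? 100.0 * pauseTotal / elapsed : 100.0;
    }
}
```

On the excerpt above (8-10 second pauses roughly every 10 seconds) this comes out well above 80%, which is what "GC-bound" means in practice.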

Page 6: Soa Performance Patterns

Analyzing Heap Issues
First Level of Tooling: jstat -t -gc <pid> 60s

Timestamp S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT

71126.4 1280.0 1408.0 584.1 0.0 171648.0 113746.6 932096.0 422943.4 109568.0 109239.4 2538 54.318 1165 1165.298 1219.616

71186.4 1408.0 1600.0 1336.1 0.0 171520.0 165054.3 932096.0 423536.4 109568.0 109245.6 2560 54.988 1166 1167.631 1222.619

71246.4 1728.0 1856.0 544.0 0.0 170816.0 117204.8 932096.0 423600.7 109568.0 109261.4 2584 55.720 1167 1169.948 1225.668

71306.4 1600.0 1600.0 0.0 1048.6 171136.0 145917.6 932096.0 418206.3 109568.0 109160.3 2599 56.182 1168 1172.607 1228.789

71366.5 1600.0 1536.0 0.0 1000.1 171456.0 63392.6 932096.0 422258.1 109568.0 109175.3 2617 56.744 1169 1174.944 1231.688

71426.4 1600.0 1536.0 0.0 1529.7 171584.0 45124.1 932096.0 422541.6 109568.0 109354.1 2619 56.811 1170 1177.296 1234.108

71486.5 1600.0 2112.0 0.0 0.0 170496.0 32226.8 932096.0 378758.2 109824.0 109359.3 2620 56.846 1171 1179.798 1236.643

[Chart: SOA VM Old Space Utilization vs. Time - y-axis: Bytes, 0 to 450,000; x-axis: Hours into JVM Run, 0 to 19]

Page 7: Soa Performance Patterns

Diagnosing Memory Leaks
Second Level of Tooling: What’s on the Heap and Why?

• Want heap snapshots / instance histograms after major GC
• -XX:+PrintClassHistogram
  • On SIGQUIT (kill -3), forces GC, then prints histogram to stdout
  • SIGQUIT VM periodically, compare histograms to ID instance growth
• Pros:
  • Lightweight, low-impact way to gain initial insight into leak
• Cons:
  • Broken with default collector in JDK 5.0
  • Won’t tell how instances are reachable
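The compare-successive-histograms technique above is easy to automate. A hypothetical helper (class and method names are illustrative), assuming the `num: #instances #bytes classname` line format that -XX:+PrintClassHistogram emits:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Diffs two -XX:+PrintClassHistogram snapshots to spot growing classes.
// Each data line looks like: "  2: 2264478 181158240 com.example.SomeClass"
public class HistogramDiff {
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // data rows have exactly 4 fields and a "rank:" first column
            if (f.length == 4 && f[0].endsWith(":")) {
                counts.put(f[3], Long.parseLong(f[1]));
            }
        }
        return counts;
    }

    /** Instance-count delta per class between an earlier and a later snapshot. */
    public static Map<String, Long> instanceDelta(List<String> before, List<String> after) {
        Map<String, Long> b = parse(before), delta = new HashMap<>();
        parse(after).forEach((cls, n) -> delta.put(cls, n - b.getOrDefault(cls, 0L)));
        return delta;
    }
}
```

Sorting the resulting map by delta descending reproduces the kind of comparison table shown two slides below.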

• Enterprise Manager Java Application Diagnostic Expert (JADE)
  • Acquired Auptyma, a JVMTI-based Java application profiler
  • The ultimate recommended solution
  • Simple agent install; analysis uses larger install and heap snapshots

Page 8: Soa Performance Patterns

-XX:+PrintClassHistogram Output
And Comparison Between Successive Outputs

num   #instances     #bytes  class name
---------------------------------------
  1:      26753  206538760  [B
  2:    2264478  181158240  com.collaxa.cube.xml.dom.CubeDOMElement
  3:    2551791  115425128  [Ljava.lang.Object;
  4:    1030051   81715400  [C
  5:    1835933   44062392  com.collaxa.cube.xml.dom.CubeDOMAttribute
  6:    1741524   41796576  java.util.ArrayList
  7:    1736747   41681928  java.lang.String
  8:     815454   39141792  org.apache.xerces.dom.ElementImpl
  9:     727890   29115600  org.apache.xerces.dom.AttrNSImpl
 10:     130525   20910656  <methodKlass>
 11:      51697   19844016  [Ljava.util.HashMap$Entry;
 12:     784107   18818568  java.util.Vector
 13:     783155   18795720  org.apache.xerces.dom.AttributeMap
 14:      35912   12851944  [I
 15:     258748    8279936  org.collaxa.thirdparty.dom4j.QName
 16:     159477    7333608  <symbolKlass>
 17:      11481    6857008  <constantPoolKlass>
 18:      87374    5808152  [J
 19:     178734    5719488  oracle.xml.parser.v2.XMLAttr
 20:     176691    5654112  oracle.xml.parser.v2.XMLElement
 21:      11481    5157504  <instanceKlassKlass>
 22:      10037    4062384  <constantPoolCacheKlass>
 23:     161627    3879048  java.util.HashMap$Entry
 24:     231331    3701296  com.collaxa.cube.xml.dom.CubeDOMText
 25:      50974    3262336  com.collaxa.cube.xml.dom.persistence.DomMoniker
 26:      18916    2796096  [Ljava.lang.String;
 27:      76813    2458016  org.apache.xerces.dom.TextImpl
 28:      16101    1932120  oracle.xml.parser.schema.XSDElement
 29:      48088    1923520  java.util.HashMap
 30:      12178    1461360  oracle.dms.instrument.State
 31:      12768    1456000  [Ljava.util.Hashtable$Entry;
 32:      35861    1434440  oracle.dms.spy.Metric
 33:      55646    1335504  java.util.Hashtable$Entry
 34:      40418    1293376  oracle.xml.parser.v2.XMLText

Comparison between successive outputs — Class Name: delta #instances, delta #bytes

[B: 718 303625912

java.util.ArrayList: 637419 15298056

com.collaxa.cube.xml.dom.CubeDOMElement: 153334 12266720

com.collaxa.cube.xml.dom.CubeDOMAttribute: 161453 3874872

com.collaxa.cube.xml.dom.CubeDOMText: 7755 124080

[C: -18261 84776

<methodKlass>: 196 56904

[Ljava.util.HashMap$Entry;: -1 54704

[I: 26 51992

org.apache.xerces.impl.xs.XSElementDecl: 812 45472

<symbolKlass>: 587 43704

oracle.xml.parser.v2.XMLElement: 1237 39584

org.apache.xerces.impl.xs.XSParticleDecl: 848 27136

oracle.jsp.parse.LineInfoMapObj: 674 21568

<constantPoolKlass>: 23 20640

[Lorg.apache.xerces.impl.xs.identity.IdentityConstraint;: 812 19488

org.apache.xerces.util.SymbolTable$Entry: 770 18480

[Lorg.apache.xerces.util.SymbolHash$Entry;: 42 17472

java.lang.ref.Finalizer: 412 13184

oracle.xml.parser.v2.XSLExprItem: 178 12816

[Loracle.toplink.internal.helper.IdentityHashtableEntry;: 126 12096

Page 9: Soa Performance Patterns

JADE Heap Analysis

Page 10: Soa Performance Patterns

JADE Heap Comparison

Page 11: Soa Performance Patterns

Analyzing Response Time Breakdown

• Key to optimization: where is response time coming from?
• 10.1.3 out-of-the-box capabilities
  • BPEL Console – instance audit trails, statistics page
  • ESB Control – instances view processing time statistics
  • OWSM Monitor
  • EM transaction tracing
• Aggregating measurements
• A single interaction’s invocation tree
• Creative capabilities

Page 12: Soa Performance Patterns

BPEL Console Statistics Page

Page 13: Soa Performance Patterns

ESB Control Instances View Statistics

Page 14: Soa Performance Patterns

OWSM Monitor
See Chapter 6 of OWSM Administrator’s Guide

Page 15: Soa Performance Patterns

EM Transaction Tracing

Page 16: Soa Performance Patterns

Analyzing Response Time Breakdown

• Key to optimization: where is response time coming from?
• 10.1.3 out-of-the-box capabilities
  • BPEL Console – instance audit trails, statistics page
  • ESB Control – instances view processing time statistics
  • OWSM Monitor
  • EM transaction tracing
• Aggregating measurements
• A single interaction’s invocation tree
• Creative capabilities

Page 17: Soa Performance Patterns

Example Response Time Analysis
Where is the Time Coming From?

select domain, process, count(*), round(avg(time_taken), 2) "Avg Time"
from bpel_access
group by domain, process
order by 3 desc

Domain   Process   #Txns  Avg Time
Domain_T Process_A 22161  23.53
Domain_R Process_A  6865   7.37
Domain_V Process_A   760   5.74
Domain_N Process_A   439   4.43
Domain_B Process_A   227   7.78
Domain_T Process_R    51   2.73
Domain_V Process_R    36   9.69
Domain_S Process_A    35   6.51
Domain_T Process_C    13  14.77
Domain_B Process_D    12   4.17
Domain_N Process_R     9   0.67
Domain_B Process_R     5   7.4
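The same aggregation can be done in application code when a database is not handy. A hypothetical sketch (the Access record and its fields are illustrative, mirroring the bpel_access columns):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Groups access records by (domain, process) and averages time_taken,
// mirroring the GROUP BY query above.
public class AccessStats {
    public record Access(String domain, String process, double timeTaken) {}

    /** Average time_taken keyed by "domain/process". */
    public static Map<String, Double> avgByDomainProcess(List<Access> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            a -> a.domain() + "/" + a.process(),
            Collectors.averagingDouble(Access::timeTaken)));
    }
}
```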

Page 18: Soa Performance Patterns

Example Response Time Analysis
From Monitoring to Analysis

• We now knew what was slow, and how slow
• Began evaluating hypotheses on why it was slow
• Applied BPEL performance tuning playbook
  • Thread counts
  • Durable -> transient
  • Minimized logging
  • Deferred dehydration
  • Removed BAM sensors
• Used monitoring to observe effects
• None of that solved the problem
• Suspected endpoint as major contributor

Page 19: Soa Performance Patterns

Example Response Time Analysis
Correlated with Load – Endpoint Overloaded?

[Chart: Domain_T Process_A Response Time vs. Load, 03-Apr-2007 - y-axis: Response Time (seconds), 0 to 45; x-axis: Transactions per Hour, 0 to 5,000]

Page 20: Soa Performance Patterns

Creative Capabilities
Embed Timing Instrumentation in BPEL Process

<bpelx:exec name="assign_invoke_start_time" language="Java" version="1.4">
  <![CDATA[
    setVariableData("invoke_start_time", new Long(System.currentTimeMillis()));
  ]]>
</bpelx:exec>

<invoke name="InvokeEndpoint" partnerLink="EndpointLink" portType="ns3:Endpoint" operation="A" inputVariable="EndpointRequest" outputVariable="EndpointResponse"/>

<bpelx:exec name="output_invoke_response_time" language="Java" version="1.4">
  <![CDATA[
    long endTime = System.currentTimeMillis();
    long startTime = ((Long) getVariableData("invoke_start_time")).longValue();
    long responseTime = endTime - startTime;
    StringBuffer stringBuffer = new StringBuffer();
    stringBuffer.append("TIMING,");
    stringBuffer.append(getVariableData("domain_id"));
    stringBuffer.append(",");
    stringBuffer.append(getVariableData("process_id"));
    stringBuffer.append(",");
    stringBuffer.append(getVariableData("instance_id"));
    stringBuffer.append(",InvokeEndpoint,");
    stringBuffer.append(responseTime);
    System.out.println(stringBuffer.toString());
  ]]>
</bpelx:exec>

Page 21: Soa Performance Patterns

Creative Capabilities
Embed Timing Instrumentation in BPEL Process

• Yields the following output in BPEL JVM output stream:

07/04/12 15:11:55 TIMING,Domain_T,Process_A,7192844,InvokeEndpoint,171933

• Added another table to the monitoring schema, and a Java loader

create table ENDPOINT_INVOCATION (
  NODE_ID     VARCHAR2(16) NOT NULL,
  TIMESTAMP   TIMESTAMP WITH TIME ZONE NOT NULL,
  DOMAIN      VARCHAR2(64) NOT NULL,
  PROCESS     VARCHAR2(64) NOT NULL,
  INSTANCE_ID NUMBER(38) NOT NULL,
  ACTIVITY    VARCHAR2(16) NOT NULL,
  TIME_TAKEN  NUMBER(38) NOT NULL
);
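The slide mentions a Java loader for these TIMING lines; its parsing half might look like this (the TimingRecord class and its field names are mine, not the presenter's, but the field order follows the instrumentation above: domain, process, instance id, activity, milliseconds):

```java
// Parses TIMING lines emitted by the embedded bpelx:exec instrumentation,
// e.g. "07/04/12 15:11:55 TIMING,Domain_T,Process_A,7192844,InvokeEndpoint,171933"
public class TimingRecord {
    public final String domain, process, activity;
    public final long instanceId, timeTakenMillis;

    public TimingRecord(String domain, String process, long instanceId,
                        String activity, long timeTakenMillis) {
        this.domain = domain;
        this.process = process;
        this.instanceId = instanceId;
        this.activity = activity;
        this.timeTakenMillis = timeTakenMillis;
    }

    public static TimingRecord parse(String line) {
        // Log lines carry a timestamp prefix; the record starts at "TIMING,"
        int start = line.indexOf("TIMING,");
        if (start < 0) return null;  // not an instrumentation line
        String[] f = line.substring(start).split(",");
        return new TimingRecord(f[1], f[2], Long.parseLong(f[3]), f[4], Long.parseLong(f[5]));
    }
}
```

Each parsed record maps one-to-one onto an ENDPOINT_INVOCATION row (DOMAIN, PROCESS, INSTANCE_ID, ACTIVITY, TIME_TAKEN), with NODE_ID and TIMESTAMP supplied by the loader.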

Page 22: Soa Performance Patterns

Example Response Time Analysis
Endpoint Response Time Distribution

[Chart: Endpoint Response Time Distribution, 12-Apr, for Domain_T Process_A - y-axis: Number of Transactions, 0 to 12,000; x-axis: Rounded Response Time in Seconds, 0 to 170]

Page 23: Soa Performance Patterns

Example Response Time Analysis
Root Cause Identification

• Endpoint vendor’s measurements differed from ours
• Led us to focus on “the network”
• tcpdumps revealed endpoint requests sent one-at-a-time
• Close inspection showed
  • Synchronization in third-party library used by HTTP adapter
  • Classic queuing behavior ensued under sufficient load
  • We could work around with adapter config parameters – lucky!
• Lessons learned
  • Could have identified cause quickly with JADE thread analysis
  • Embedded instrumentation is crude; query dehydration store instead
  • Too many threads exhausts native stack space, gives OOME

Page 24: Soa Performance Patterns

<Insert Picture Here>

Performance Optimization

• Overview
• Examples
• Principles, Patterns, Playbooks

Page 25: Soa Performance Patterns

Performance Optimization

• There is no silver-bullet, one-size-fits-all prescription
• Every situation is highly context-dependent
• But there are some principles and patterns
• And product tuning playbooks
• With occasional exceptions, application architecture is a MUCH more significant determinant of performance than product configuration/tuning

Page 26: Soa Performance Patterns

BPEL Application Architecture

[Diagram: a single BPEL Host running one BPEL Server]

“Must be product defect!” – NAÏVE!

Page 27: Soa Performance Patterns

BPEL Application Architecture
Clustering & Load Balancing for Scalability

[Diagram: three clustered BPEL Hosts, each running a BPEL Server]

• Clustering has implications for design & administration

• See http://www.oracle.com/technology/products/ias/hi_av/BPEL_HA_Paper.pdf

Page 28: Soa Performance Patterns

BPEL Application Architecture
Pattern: Tier per Pipeline Stage

[Diagram: five BPEL Hosts, each running a BPEL Server, arranged as one tier per pipeline stage]

Page 29: Soa Performance Patterns

Principles – Response Time

• Minimize inter-process communication
  • Co-location, not distribution
  • Granularity, not chattiness
• Avoid marshalling
  • Co-locate components
  • Configure optimizations
• Leverage caching
  • Incorporate Coherence
• Move processing to data
  • Logic in PL/SQL
  • XTP with Coherence

Page 30: Soa Performance Patterns

Principles – Throughput

• Network-of-queues model, queuing theory
  • Queues build upstream of the slow service / greatest constraint
  • Theory of constraints – remove constraints
• Open throttles as much as is beneficial and reasonable
• Eventually a resource will be exhausted
  • Heap space (manifesting as GC frequency)
  • Connections to resources
  • CPU cycles
  • Network bandwidth
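The network-of-queues view gives a quick sanity check via Little's law, L = λ × W: the average number of requests in the system equals arrival rate times average time in system. A minimal sketch (the class is mine; the illustrative numbers echo the load chart earlier in the deck):

```java
// Little's law: average number in system L = arrival rate λ × average time W.
// E.g. at 4,500 transactions/hour with a ~40 s average response time,
// roughly 50 requests are in flight at once - a hint about needed
// thread counts and connection pool sizes.
public class LittlesLaw {
    public static double avgInSystem(double arrivalsPerHour, double avgResponseSeconds) {
        double lambdaPerSecond = arrivalsPerHour / 3600.0;  // convert rate to per-second
        return lambdaPerSecond * avgResponseSeconds;        // L = λ × W
    }
}
```

When measured concurrency is far above this number, queues are building upstream of the constraint.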

Page 31: Soa Performance Patterns

Principles – Heap Utilization

• Reduce thread count – fewer threads to concurrently instantiate objects on heap
• Reduce caching
• Reduce the size of the object graphs the application works with at a time, if possible
  • Stream the input (batching)
  • In ORM-based services: more querying, less caching
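The "stream the input (batching)" idea can be sketched generically. A hypothetical helper (names are illustrative): only one batch of items is materialized on the heap at a time, instead of the whole object graph:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Processes a large input in fixed-size batches so that at most
// batchSize items are live on the heap at once.
public class BatchProcessor {
    /** Returns the number of batches handled. */
    public static <T> int processInBatches(Iterator<T> input, int batchSize,
                                           Consumer<List<T>> handler) {
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (input.hasNext()) {
            batch.add(input.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize);  // drop the processed batch
                batches++;
            }
        }
        if (!batch.isEmpty()) { handler.accept(batch); batches++; }
        return batches;
    }
}
```

The same shape applies to ORM-based services: page through query results rather than caching the full result set.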

Page 32: Soa Performance Patterns

Patterns

• SOA patterns is a nascent field – choose wisely
  • Start with Gregor Hohpe’s survey: http://eaipatterns.com/ramblings/52_soapatterns.html
• Grid patterns is even more nascent
  • Choose even more wisely
• ORM performance optimization patterns
  • Replace Caching Variable with Query (for heap, response time relief)
  • Repository plus Query Object (for ripple load mitigation)
• Performance engineering patterns
  • Invocation Tree
  • Production Replay

Page 33: Soa Performance Patterns

Product Tuning Playbooks

• BPEL
  • OOW 2006 BPEL performance presentation (S282687)
  • BPEL performance webinar on OTN
• ESB
  • ESB performance document on OTN

Page 34: Soa Performance Patterns

<Insert Picture Here>

Summary, Q&A

Page 35: Soa Performance Patterns

Summary

• APM is a professional responsibility of app ops/dev teams
  • Adopt tooling for monitoring, publish reports regularly
  • Analyses often need highly purpose-specific, custom charts
• Use scientific method for analysis
• Optimization is an art form
  • Architectural principles and patterns
  • Product tuning playbooks
• Engineering means specifying, testing before production!
  • Best practices for performance requirements specification, and load testing, are whole other topics unto themselves!

Page 36: Soa Performance Patterns

Summary
Prospects for Standardization

Activity     | Standardizable? | Remarks
Monitoring   | Yes             | Collect, store, and chart standard measures
Analysis     | Maybe           | Use the scientific method. It’s often a matter of understanding the response time breakdown or the throughput bottleneck.
Optimization | Probably not    | Patterns and playbooks apply here, but there is context-sensitivity & art
Management   | Yes             | Can have a standard APM program of monitoring, analysis, optimization
Engineering  | Maybe           | At a high level, it’s a development lifecycle activity, with best practices

Page 37: Soa Performance Patterns

Q U E S T I O N S  &  A N S W E R S

Page 38: Soa Performance Patterns

For More Information

search.oracle.com

+"application performance management" +"service-oriented architecture"
