STP 2014 - Let's Learn from the Top Performance Mistakes in 2013

Post on 10-May-2015


DESCRIPTION

Presentation given at STPCon 2014. It highlights the top performance problems seen in 2013 and how we can identify these problems in dev and test instead of waiting until the app crashes in production.

TRANSCRIPT

LET’S LEARN FROM THE TOP PERF MISTAKES

@grabnerandi

http://apmblog.compuware.com

What to do with the fastest car …

… if it fails to reach the finish line

What to do with millions of $$ for building a web site …

Performance, Scalability & Architecture

#1: Architectural Decisions

#1: “We want more Web 2.0”

#1: Load Test Prior to Change

#1: Load Test After Change

Metrics: # Visitors, # Requests / User

Business: Do we need all these bells and whistles?

#2: Disconnected Teams

#2: “Teamwork” between Dev and Ops

SEV1 Problem in Production

Need access to log files

Where are they? Can’t get them

Need to increase log level

Can’t do! Can’t change config files in prod!

#2: Solution: Implement a Custom “On Demand” Remote Logger

#2: Implementation and Rollout

Implemented Custom Logger

Worked well in Load Testing

#2: What happened?

~1 million Lock Exceptions in 30 mins

#2: Root Cause: A special WebSphere Setting!

Log Service provides a synchronized log file across ALL JVMs


Metrics: # Log Messages, # Exceptions

Share: Same Server Settings
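The two metrics named above (# Log Messages, # Exceptions) can be tracked during a load test with very little code. The following is a minimal sketch using Python's standard `logging` module, not the custom logger from the talk; the class name `CountingHandler` and the logger name `loadtest.demo` are hypothetical.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts log records and how many of them carry exception info."""

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.messages = 0
        self.exceptions = 0

    def emit(self, record):
        self.messages += 1
        # exc_info is set when the record was logged with an exception attached
        if record.exc_info:
            self.exceptions += 1


# Hypothetical logger name, used only for this illustration
logger = logging.getLogger("loadtest.demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo output away from the root logger
counter = CountingHandler()
logger.addHandler(counter)

logger.info("request handled")
try:
    raise RuntimeError("could not acquire log file lock")
except RuntimeError:
    logger.exception("logging failed")
```

Comparing these counters between the test environment and production is one way to notice that a difference in server settings changes the logger's behavior.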

#3: Implementation Flaws

#3: Solution: Cache to the RESCUE!!

#3: Implementation and Rollout

Implemented InMemory Cache

Worked well in Load Testing

#3: Result: Out of Memory Crashes!!

Still crashes

Problem fixed! Fixed Version Deployed

Metrics: Heap Size, # Objects Allocated, # Objects in Cache, Cache Hit Ratio

Test: With realistic Data
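The slides do not spell out the root cause of the crashes. One common way an in-memory cache exhausts the heap is by growing without a bound once realistic data volumes arrive. The sketch below assumes that scenario and shows a size-limited cache with least-recently-used eviction that also exposes the metrics listed above (# Objects in Cache, Cache Hit Ratio). The class name `BoundedCache` and its parameters are hypothetical.

```python
from collections import OrderedDict


class BoundedCache:
    """In-memory cache with a hard entry limit and LRU eviction."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self):
        return len(self._data)

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = BoundedCache(max_entries=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key, str.upper)
```

With only two slots and five lookups, the cache never holds more than two objects, and a low hit ratio shows up immediately. That is the kind of signal a load test with realistic data is meant to surface.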

#4: Push without a Plan

#4: Mobile Landing Page of Super Bowl Ad

434 Resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …

Total size of ~ 20MB

#4: m.store.com redirects to www.store.com

ALL CSS and JS files are redirected to the www domain

This is a lot of time “wasted” especially on high latency mobile connections

#4: Critical Pages not Optimized!

Browse, Search and Product Info perform well

Critical Pages such as Shopping Cart are very slow … because they don’t follow best practices: 87 Requests, 28 Redirects, …

Metrics: Load Time, # Resources (Images, …), # HTTP 3xx, 4xx, 5xx

Dev: Build for Mobile

Test: Test on Mobile
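The page-level metrics above can be checked automatically once the list of requested resources is available, for example from a browser trace. The following is a rough sketch; the function name `summarize_page`, the tuple layout and the sample sizes are assumptions made for illustration, with only the two host names taken from the slides.

```python
from collections import Counter


def summarize_page(resources):
    """Summarize (url, http_status, size_bytes) tuples for one page load."""
    resources = list(resources)
    classes = Counter(f"{status // 100}xx" for _url, status, _size in resources)
    return {
        "resources": len(resources),
        "total_bytes": sum(size for _url, _status, size in resources),
        "3xx": classes["3xx"],
        "4xx": classes["4xx"],
        "5xx": classes["5xx"],
    }


# Hypothetical sample data: one redirect from the mobile to the www domain
sample = [
    ("http://m.store.com/style.css", 301, 0),
    ("http://www.store.com/style.css", 200, 12000),
    ("http://www.store.com/hero.jpg", 200, 250000),
    ("http://www.store.com/missing.png", 404, 0),
]
summary = summarize_page(sample)
```

A test run could fail the build when the resource count, the total size or the number of redirects exceeds a budget agreed for mobile pages.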

#5: “Blindly” (Re)use Existing Components

#5: Requirement: We need a report

#5: Using Hibernate results in 4k+ SQL Statements to display 3 items!

Hibernate Executes 4k+ Statements

Individual Execution VERY FAST

But Total SUM takes 6s

#5: Requirement: We need a fancy UI

#5: Using Telerik Controls Results in 9s for Data-Binding of UI Controls

#1: Slow Stored Procedure

Depending on the request, execution time of this SP varies between 1 and 7.5s

#2: 240! Similar SQL Statements

Most of these 240! statements are not prepared and just differ in things like column names

Metrics: # Total SQLs, # SQLs / Web Request, # Same SQLs / Request, Transferred Rows

Test: With realistic Data

Dev: “Learn” Frameworks
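Many individually fast statements adding up to a slow request is typical of the "N+1 query" pattern that lazy-loading ORMs can produce. The slides do not show the Hibernate mapping involved, so the following is only a sketch of the pattern using Python's built-in `sqlite3` module instead of Hibernate. The table names, the `CountingConnection` wrapper and both report functions are hypothetical.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts every query sent through it."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = 0

    def query(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_orders=3, items_per_order=4):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT)"
    )
    for o in range(1, n_orders + 1):
        conn.execute("INSERT INTO orders (id) VALUES (?)", (o,))
        for i in range(items_per_order):
            conn.execute(
                "INSERT INTO items (order_id, name) VALUES (?, ?)",
                (o, f"item-{o}-{i}"),
            )
    conn.commit()
    return conn


def report_lazy(db):
    # One query for the parents, then one more per parent: the N+1 pattern
    report = {}
    for (order_id,) in db.query("SELECT id FROM orders ORDER BY id"):
        rows = db.query(
            "SELECT name FROM items WHERE order_id = ? ORDER BY id", (order_id,)
        )
        report[order_id] = [name for (name,) in rows]
    return report


def report_joined(db):
    # Same data fetched with a single joined statement
    report = {}
    rows = db.query(
        "SELECT o.id, i.name FROM orders o "
        "JOIN items i ON i.order_id = o.id ORDER BY o.id, i.id"
    )
    for order_id, name in rows:
        report.setdefault(order_id, []).append(name)
    return report


conn = build_db()
lazy_db = CountingConnection(conn)
joined_db = CountingConnection(conn)
lazy_report = report_lazy(lazy_db)
joined_report = report_joined(joined_db)
```

Both functions return the same report, but the lazy variant issues one statement per order on top of the initial query. Tracking # SQLs per request in tests makes that growth visible before the data set reaches production size.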

12 000 000 $

#6: No “Agile” Deployment

Ad on air

Availability dropped to 0%

#6: Load Spike resulted in Unavailability

#6: Alternative: “GoDaddy goes DevOps”

Response time improved 4x

1h before SuperBowl KickOff

1h after Game ended

#6: Behind the Scenes

Metrics: Availability, Page Size, # Objects, # Hosts, # Connections

DevOps: “Feature” Switches
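A feature switch lets operations turn off expensive page elements at runtime instead of waiting for a new deployment. The slides do not describe how the switches in this example were built, so the following is just a minimal sketch of the idea. The class `FeatureSwitches`, the flag names and the page parts are all hypothetical.

```python
class FeatureSwitches:
    """Runtime toggles that can be flipped without redeploying."""

    def __init__(self, **defaults):
        self._flags = dict(defaults)

    def is_on(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        if name not in self._flags:
            raise KeyError(name)  # refuse typos instead of creating new flags
        self._flags[name] = bool(enabled)


def render_landing_page(switches):
    # The essential parts are always rendered; heavy extras are optional
    parts = ["headline", "buy-button"]
    if switches.is_on("hero_video"):
        parts.append("hero-video")
    if switches.is_on("recommendations"):
        parts.append("recommendations")
    return parts


switches = FeatureSwitches(hero_video=True, recommendations=True)
full_page = render_landing_page(switches)

# Before an expected load spike, switch the heavy features off
switches.set("hero_video", False)
switches.set("recommendations", False)
light_page = render_landing_page(switches)
```

In a real system the flag values would come from a shared store that every server reads, so one change takes effect everywhere.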

What have we learned today?

UNDERSTAND THE TECHNOLOGY WE ARE WORKING WITH

# of Requests / User

# of Log Messages

# of Exceptions

# Objects Allocated

# Objects In Cache

Cache Hit Ratio

# of Images

# of SQLs

# SQLs per Request

Availability

# HTTP 3xx, 4xx

Page Size

A final thought …

How about this idea?

Test Framework Results                   Architectural Data

Build #   Test Case      Status          # SQL   # Excep   CPU
17        testPurchase   OK              12      0         120ms
17        testSearch     OK              3       1         68ms
18        testPurchase   FAILED          12      5         60ms
18        testSearch     OK              3       1         68ms
19        testPurchase   OK              75      0         230ms
19        testSearch     OK              3       1         68ms
20        testPurchase   OK              12      0         120ms
20        testSearch     OK              3       1         68ms

We identified a regression

Problem solved

Let’s look behind the scenes

Exceptions probably reason for failed tests

Problem fixed but now we have an architectural regression


Now we have the functional and architectural confidence

How? Performance Focus in Test Automation

Cross Impact of KPIs

Analyzing All Unit / Performance Tests

Analyze Perf Metrics

Identify Regressions
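The idea of checking architectural metrics next to test status can be sketched as a comparison of each build against a known-good baseline. The numbers below are the testPurchase values from the build table in this deck, with rows assigned to builds 17 to 20 in order. The names `BuildResult` and `find_regressions` and the 1.5x growth threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    build: int
    test: str
    passed: bool
    sql_count: int
    exceptions: int


def find_regressions(results, baseline_build, max_sql_growth=1.5):
    """Flag functional and architectural regressions against a baseline build."""
    baseline = {r.test: r for r in results if r.build == baseline_build}
    findings = []
    for r in results:
        if r.build == baseline_build or r.test not in baseline:
            continue
        base = baseline[r.test]
        if not r.passed:
            findings.append((r.build, r.test, "functional"))
        if r.exceptions > base.exceptions:
            findings.append((r.build, r.test, "exceptions"))
        if r.sql_count > base.sql_count * max_sql_growth:
            findings.append((r.build, r.test, "sql"))
    return findings


results = [
    BuildResult(17, "testPurchase", True, 12, 0),
    BuildResult(18, "testPurchase", False, 12, 5),
    BuildResult(19, "testPurchase", True, 75, 0),
    BuildResult(20, "testPurchase", True, 12, 0),
]
findings = find_regressions(results, baseline_build=17)
```

Build 19 passes functionally but is still flagged because of its SQL count. A status-only report would miss exactly that case.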

More Info

• My Blog: http://apmblog.compuware.com

• Tweet about it: @grabnerandi

• dynaTrace Enterprise
  – Full End-to-End Visibility in your Java, .NET, PHP Apps
  – Sign up for a 15 Days Free Trial on http://compuwareapm.com

• dynaTrace AJAX Edition
  – Browser Diagnostics for IE + FF
  – Download @ http://ajax.dynatrace.com

THANK YOU

@grabnerandi
