Escaping Automated Test Hell - One Year Later

Uploaded by wseliga on 06-May-2015

DESCRIPTION

Slides from my talk at the 33rd Degree 2013 conference in Warsaw. More than a year ago we faced the fact that we were hitting the wall with our large-scale automated testing in Atlassian JIRA. We analysed the problems and possible solutions and shared them with the community at 33rd Degree in 2012. Since then we have implemented many of our ideas, come up with new ones, learnt some quite unexpected things and got rid of Selenium 1 completely. This session presents the lessons from our journey: escaping from Test Hell back to normality. If you are interested in hearing what problems you can (and probably will) face when you have thousands of automated tests on all levels of abstraction (functional, integration, unit, UI, performance), and what solutions can be applied to remedy them, this presentation is for you.

TRANSCRIPT

Page 1: Escaping Automated Test Hell - One Year Later

Escaping Automated Test Hell

Wojciech Seliga

One year later...

About me

• Coding for 30 years

• Agile Practices (inc. TDD) since 2003

• Dev Nerd, Tech Leader, Agile Coach, Speaker

• 5+ years with Atlassian (JIRA Development Team Lead)

• Spartez Co-founder

Year ago - recap

18 000 tests on all levels

Very slow and fragile feedback loop

Serious performance and reliability issues

Feedback Speed

Test Quality

Test Code is Not Trash

Design, Maintain, Refactor, Share, Review, Prune, Respect, Discuss, Restructure

Optimum Balance

Isolation, Speed, Coverage, Level, Access, Effort

Dangerous to tamper with

Maintainability, Quality / Determinism

Splitting the codebase is a key aspect of a short test feedback loop

Now

People - Motivation

Shades of Red

Pragmatic CI Health

Build Tiers and Policy

• Tier A1 - green soon after all commits: unit tests and functional* tests

• Tier A2 - green at the end of the day: WebDriver and bundled plugins tests

• Tier A3 - green at the end of the iteration: supported platforms tests, compatibility tests

Wallboards: Constant Awareness

Training

• assertThat over assertTrue/False and assertEquals

• avoiding races - Atlassian Selenium with its TimedElement

• Unit tests over functional tests

• Brownbags, blogs, code reviews
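The TimedElement point is about waiting for the page to reach a state instead of asserting on it immediately. A minimal sketch of that general polling pattern in plain Java follows; `TimedCondition` and `waitUntil` are hypothetical names, not the Atlassian Selenium API, and the timeout and poll interval are left to the caller as assumptions.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating a "timed" wait; not the Atlassian Selenium API.
final class TimedCondition {
    private TimedCondition() {
    }

    /**
     * Re-evaluates the condition every pollInterval until it holds or the
     * timeout elapses. Returns true if the condition became true in time.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the page never reached the expected state
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
        return true;
    }
}
```

A test would then wait on something like a hypothetical `results::isVisible` with a timeout, rather than calling `assertTrue(results.isVisible())`, which fails whenever the asynchronous page logic has not finished yet.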

Quality

Automatic Flakiness Detection and Quarantine

Re-run failed tests and see if they pass
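A minimal sketch of the re-run idea, assuming each test can be invoked repeatedly as a `BooleanSupplier` that returns `true` on success. `FlakinessDetector`, `Outcome` and the re-run count are hypothetical; the slide does not show how detection was implemented in JIRA's CI. A test that fails and then passes on a re-run is reported as flaky (a quarantine candidate), while one that keeps failing is treated as a genuine failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical outcome classes for a single test.
enum Outcome { PASSED, FLAKY, FAILED }

// Hypothetical detector: re-runs failed tests to tell flaky ones from broken ones.
final class FlakinessDetector {
    private final int maxReruns;

    FlakinessDetector(int maxReruns) {
        this.maxReruns = maxReruns;
    }

    /** Runs a test; if it fails, re-runs it up to maxReruns times. */
    Outcome classify(BooleanSupplier test) {
        if (test.getAsBoolean()) {
            return Outcome.PASSED;
        }
        for (int i = 0; i < maxReruns; i++) {
            if (test.getAsBoolean()) {
                return Outcome.FLAKY; // failed, then passed: candidate for quarantine
            }
        }
        return Outcome.FAILED; // failed every time: most likely a real regression
    }

    /** Returns the names of tests that should be moved to quarantine. */
    List<String> quarantineCandidates(Map<String, BooleanSupplier> tests) {
        List<String> quarantined = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : tests.entrySet()) {
            if (classify(entry.getValue()) == Outcome.FLAKY) {
                quarantined.add(entry.getKey());
            }
        }
        return quarantined;
    }
}
```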

Quarantine - Healing

SlowMo - expose races

Selenium 1

Selenium ditching: the sky did not fall in

Ditching - benefits

• Freed build agents - better system throughput

• Boosted morale

• A gazillion developer hours saved

• Money saved on infrastructure

Ditching - due diligence

• conducting the audit - analysis of the coverage we lost

• determining which tests need to be rewritten (e.g. security-related ones)

• rewriting the tests

Flaky Browser-based Tests

Races between test code and asynchronous page logic

Playing with "loading" CSS class does not really help

Races Removal with Tracing

// in the browser:
function mySearchClickHandler() {
    doSomeXhr().always(function() {
        // This executes when the XHR has completed (either success or failure)
        JIRA.trace("search.completed");
    });
}
// In production code JIRA.trace is a no-op

// in my page object:
@Inject
TraceContext traceContext;

public SearchResults doASearch() {
    Tracer snapshot = traceContext.checkpoint();
    getSearchButton().click(); // causes mySearchClickHandler to be invoked
    // This waits until the "search.completed" event has been emitted, *after* the previous snapshot
    traceContext.waitFor(snapshot, "search.completed");
    return pageBinder.bind(SearchResults.class);
}
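To make the checkpoint semantics concrete, here is a simplified, self-contained Java stand-in for the trace mechanism: an in-memory event log with `checkpoint()` and `waitFor(...)`. `TraceLog` is a hypothetical class, not JIRA's `TraceContext`; in the real setup the events come from `JIRA.trace` calls in the browser rather than from a Java method. The property that matters is that an event emitted before the checkpoint does not satisfy the wait, so a stale "search.completed" from an earlier search cannot release the test too early.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of checkpoint-based tracing (not JIRA's TraceContext).
final class TraceLog {
    private final List<String> events = new ArrayList<>();

    /** Records an event emitted by the page, e.g. "search.completed". */
    synchronized void trace(String event) {
        events.add(event);
        notifyAll(); // wake up any test thread blocked in waitFor
    }

    /** Marks the current position: only events recorded after it count. */
    synchronized int checkpoint() {
        return events.size();
    }

    /** Waits until `event` is recorded after `checkpoint`, or the timeout elapses. */
    synchronized boolean waitFor(int checkpoint, String event, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!events.subList(checkpoint, events.size()).contains(event)) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false; // timed out without seeing a fresh event
            }
            try {
                wait(remainingMillis); // releases the lock so trace() can append
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```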

Speed

Can we halve our build times?

Speed

Parallel Execution - Theory

[Diagram: A1 build split into batches, from start of build to end of build]

Parallel Execution - Reality Bites

[Diagram: the same A1 build batches from start of build to end of build, delayed by agent availability]

Dynamic Test Execution Dispatch - Hallelujah
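The difference from fixed batches can be sketched with a shared work queue: each worker (standing in for a build agent) pulls the next test as soon as it is free, so a slow or late agent no longer holds up a whole pre-assigned batch. This is a minimal in-process illustration using threads; `DynamicDispatcher` is a hypothetical name, and the slides do not describe how dispatch across real build agents was implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical in-process model of dynamic dispatch: workers pull tests from a
// shared queue instead of receiving fixed batches up front.
final class DynamicDispatcher {
    /** Runs every test exactly once; returns which worker executed each test. */
    static Map<String, Integer> run(List<String> tests, int workers, Consumer<String> runTest) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(tests);
        Map<String, Integer> executedBy = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int workerId = w;
            Thread thread = new Thread(() -> {
                String next;
                // poll() is atomic, so two workers never take the same test;
                // it returns null once the queue is empty and the worker stops
                while ((next = pending.poll()) != null) {
                    runTest.accept(next);
                    executedBy.put(next, workerId);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return executedBy;
    }
}
```

With static batches, total time is set by the slowest batch on the latest agent; with pulling, a worker that starts late simply takes fewer tests.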

"You can't manage what you can't measure."

W. Edwards Deming

If you believe only in it, you are doomed.

You can't improve something if you can't measure it

Profiler, Build statistics, Logs, statsd → Graphite
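As an example of the kind of measurement that can feed Graphite, statsd accepts plain-text metrics over UDP in the form `<name>:<value>|<type>`, with `ms` marking a timer. The sketch below sends one timing sample, for instance per build phase; `StatsdTimer` and the metric name `build.jira.compile` are hypothetical, and 8125 (statsd's default UDP port) is an assumption rather than something stated on the slides.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal statsd client: fire-and-forget timer metrics over UDP.
final class StatsdTimer {
    private final InetAddress host;
    private final int port;

    StatsdTimer(InetAddress host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Formats a timer sample in the statsd line format "<name>:<value>|ms". */
    static String format(String metric, long millis) {
        return metric + ":" + millis + "|ms";
    }

    /** Sends one timing sample; UDP means no delivery guarantee and no reply. */
    void time(String metric, long millis) {
        byte[] payload = format(metric, millis).getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, host, port));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `new StatsdTimer(InetAddress.getLoopbackAddress(), 8125).time("build.jira.compile", 60_000)` would report a one-minute compilation phase; statsd then aggregates the samples and flushes them to Graphite.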

Anatomy of Build*

Compilation

Packaging

Executing Tests

Fetching Dependencies

*Any resemblance to maven build is entirely accidental

SCM Update

Agent Availability/Setup

Publishing Results

JIRA Unit Tests Build

Compilation (7min)

Packaging (0min)

Executing Tests (7min)

Fetching Dependencies (1.5min)

SCM Update (2min)

Agent Availability/Setup (mean 10min)

Publishing Results (1min)

Decreasing test execution time to ZERO alone would not let us achieve our goal!
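Summing the phase timings of the JIRA Unit Tests Build makes the point concrete: before the improvements the phases add up to 10 + 2 + 1.5 + 7 + 0 + 7 + 1 = 28.5 minutes. Halving that means reaching about 14.25 minutes, yet even with test execution at zero the remaining phases still take 21.5 minutes. `BuildBudget` is just an illustrative helper over the numbers from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative arithmetic over the slide's phase timings (minutes, JIRA Unit Tests Build).
final class BuildBudget {
    /** Phase timings before the improvements, as listed on the slide. */
    static Map<String, Double> beforeImprovements() {
        Map<String, Double> phases = new LinkedHashMap<>();
        phases.put("Agent Availability/Setup (mean)", 10.0);
        phases.put("SCM Update", 2.0);
        phases.put("Fetching Dependencies", 1.5);
        phases.put("Compilation", 7.0);
        phases.put("Packaging", 0.0);
        phases.put("Executing Tests", 7.0);
        phases.put("Publishing Results", 1.0);
        return phases;
    }

    /** Total build time: the sum of all phases. */
    static double total(Map<String, Double> phases) {
        double sum = 0;
        for (double minutes : phases.values()) {
            sum += minutes;
        }
        return sum;
    }
}
```

Setting `"Executing Tests"` to `0.0` and recomputing the total gives 21.5 minutes, which is why the work had to target every phase, not just the tests.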

Agent Availability/Setup

• starved builds due to busy agents building very long builds

• time synchronization issue - NTPD problem

SCM Update - Checkout time

• Proximity of SCM repo

• shallow git clones are not that fast or lightweight, and they generate extra CPU load on the git server

• git clone per agent/plan + git pull + git clone per build (hard links!)

• Stash was thankful (queue)

2 min → 5 seconds

Fetching Dependencies

• Fix Predator

• Sandboxing/isolation agent trade-off: changed rm -rf $HOME/.m2/repository/com/atlassian/* into find $HOME/.m2/repository/com/atlassian/ -name '*SNAPSHOT*' | xargs rm -rf

• Network hardware failure found (dropping packets)

1.5 min → 10 seconds
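The same trade-off can be expressed in Java with `java.nio.file`: keep released artifacts cached in the local Maven repository and remove only entries whose names contain `SNAPSHOT`. `SnapshotCleaner` is a hypothetical helper, a sketch of the idea behind the `find` command rather than what the build agents actually ran.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes only SNAPSHOT entries below a local Maven
// repository subtree, keeping released artifacts cached between builds.
final class SnapshotCleaner {
    static void deleteSnapshots(Path root) {
        try {
            List<Path> snapshots;
            try (Stream<Path> walk = Files.walk(root)) {
                snapshots = walk
                        .filter(p -> p.getFileName() != null
                                && p.getFileName().toString().contains("SNAPSHOT"))
                        .collect(Collectors.toList());
            }
            for (Path p : snapshots) {
                deleteRecursively(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void deleteRecursively(Path path) throws IOException {
        if (!Files.exists(path)) {
            return; // already removed together with a SNAPSHOT parent directory
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(path)) {
            // reverse order visits children before their parent directories
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }
}
```

Wiping everything under `com/atlassian` gives perfect isolation but forces every build to download all Atlassian artifacts again; removing only SNAPSHOTs keeps the stable part of the cache warm.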

Compilation

• Restructuring the multi-pom Maven project and its dependencies

• Maven 3 parallel compilation FTW: -T 1.5C*

*optimal factor found through scientific trial-and-error research

7 min → 1 min
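In Maven 3, a `-T` value ending in `C` is a per-core multiplier, so `-T 1.5C` asks for one and a half threads per available CPU core. The helper below sketches that arithmetic; `MavenThreads` is hypothetical, and the exact rounding (truncation here, with a floor of one thread) is an assumption rather than a copy of Maven's code.

```java
// Hypothetical helper showing how a Maven "-T" value maps to a thread count.
// Assumption: "<factor>C" means factor * cores, truncated; a plain number is used as-is.
final class MavenThreads {
    static int threadsFor(String t, int cores) {
        if (t.endsWith("C")) {
            double factor = Double.parseDouble(t.substring(0, t.length() - 1));
            return Math.max(1, (int) (factor * cores)); // never fewer than one thread
        }
        return Integer.parseInt(t);
    }
}
```

On an 8-core agent `-T 1.5C` gives 12 threads, so the same build flag scales with the hardware instead of hard-coding a thread count per agent.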

Unit Test Execution

• Splitting unit tests into 2 buckets: good and legacy (much longer)

• Maven 3 parallel test execution (-T 1.5C)

7 min → 5 min

3000 poor tests (5min)

11000 good tests (1.5min)
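The bucket numbers show why the split matters: 5 minutes for 3000 legacy tests is about 100 ms per test, while 1.5 minutes for 11000 good tests is roughly 8 ms per test, so a legacy test costs about 12 times as much on average. The sketch below reproduces that arithmetic and also compares running the buckets one after the other (6.5 min) with running them concurrently, where the slower bucket bounds the wall-clock time (5 min). Whether the buckets actually ran concurrently is an assumption; the slide does not say. `TestBuckets` is a hypothetical helper.

```java
// Illustrative arithmetic for the two unit-test buckets (times in minutes).
final class TestBuckets {
    /** Average seconds per test for a bucket. */
    static double secondsPerTest(int tests, double minutes) {
        return minutes * 60.0 / tests;
    }

    /** Wall-clock time if the buckets run one after the other. */
    static double sequential(double... bucketMinutes) {
        double sum = 0;
        for (double m : bucketMinutes) {
            sum += m;
        }
        return sum;
    }

    /** Wall-clock time if the buckets run at the same time: the slowest one dominates. */
    static double concurrent(double... bucketMinutes) {
        double max = 0;
        for (double m : bucketMinutes) {
            max = Math.max(max, m);
        }
        return max;
    }
}
```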

Functional Tests

• Selenium 1 removal did help

• Faster reset/restore (avoiding unnecessary work such as intercepting SQL operations for debug purposes, since building stack traces is costly)

• Restoring via Backdoor REST API

• Using REST API for common setup/teardown operations

Functional Tests

Publishing Results

• Server log allocation per test → now done via the Backdoor REST API (previously Selenium)

• Bamboo DB performance degradation for rich build history - to be addressed

1 min → 40 s

Unexpected Problem

• Stability Issues with our CI server

• The bottleneck changed from I/O to CPU

• Too many agents per physical machine

JIRA Unit Tests Build Improved

Compilation (1min)

Packaging (0min)

Executing Tests (5min)

Fetching Dependencies (10sec)

SCM Update (5sec)

Agent Availability/Setup (3min)*

Publishing Results (40sec)

Improvements Summary

Tests              Before     After     Improvement
Unit tests         29 min     17 min    41%
Functional tests   56 min     34 min    39%
WebDriver tests    39 min     21 min    46%
Overall            124 min    72 min    42%

* Additional ca. 5% improvement expected once the new git clone strategy is consistently rolled out everywhere
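The improvement column follows from (before - after) / before. The helper below recomputes it from the table as a consistency check of the published numbers; the three rows also sum to the overall 124 → 72 minutes. `Improvement` is an illustrative name.

```java
// Recomputes the "Improvement" column: relative reduction in build time.
final class Improvement {
    /** Percentage reduction from before to after, rounded to a whole percent. */
    static long percent(double beforeMinutes, double afterMinutes) {
        return Math.round((beforeMinutes - afterMinutes) / beforeMinutes * 100);
    }
}
```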

The Quality Follows

But that's still bad

We want a CI feedback loop of a few minutes at most

Splitting The Codebase

Resistance against splitting

The last attempt: Magic Machine

Decide with high confidence (e.g. > 95%) which subset of tests to run based on the committed changes

Magic Machine

• Looking at Bamboo history (analysing correlation between changes and failures)

• Matching test and production code by package and transitive imports

• Code instrumentation (Clover, Emma, AspectJ)

• Run the most frequently failing tests first
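Two of the ideas above combine naturally: use per-test coverage data (the kind of mapping a code instrumentation tool could produce) to select the tests that exercise any changed production class, then order them so the historically most failing ones run first. The sketch below is hypothetical throughout: `TestSelector`, the coverage map and the failure counts are illustrative inputs, not the Magic Machine's actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical change-based test selection with "most often failing first" ordering.
final class TestSelector {
    /**
     * @param coverage       test name -> production classes it exercises
     * @param changedClasses production classes touched by the commit
     * @param pastFailures   test name -> number of past failures
     */
    static List<String> select(Map<String, Set<String>> coverage,
                               Set<String> changedClasses,
                               Map<String, Integer> pastFailures) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : coverage.entrySet()) {
            for (String cls : entry.getValue()) {
                if (changedClasses.contains(cls)) {
                    selected.add(entry.getKey());
                    break; // one matching class is enough to select the test
                }
            }
        }
        // most failures first; ties broken by name for a deterministic order
        Comparator<String> byFailuresDesc =
                Comparator.comparingInt((String test) -> pastFailures.getOrDefault(test, 0)).reversed();
        selected.sort(byFailuresDesc.thenComparing(Comparator.<String>naturalOrder()));
        return selected;
    }
}
```

The hard part, which this sketch does not address, is the confidence requirement: coverage misses reflection, configuration and UI-level effects, which is where history-based correlation has to help.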

Inevitable Split - Fears

• Organizational concerns - understanding, managing, integrating, releasing

• Mindset change - if something worked for 10 years, why change it?

• We damned ourselves with big buckets for all tests - where do they belong?

Magic Machine strikes back

With heavy use of brain, common sense and expert judgement

Splitting code base

• Step 0 - JIRA Importers Plugin (3 years ago)

• Step 1 - New Issue View and Navigator (JIRA 6.0)

We are still escaping hell. Hell sucks in your soul.

Conclusions

• Visibility and problem awareness help

• Maintaining a huge testbed is difficult and costly

• Measure the problem

• No prejudice - no sacred cows

• Automated tests are not a one-off investment; they are a continuous journey

• Performance is a damn important feature

Do you want to help? We are hiring in Gdańsk

• Principal Java Developer

• Development Team Lead

• Java and Scala Developers

• UX Designer

• Front-End Developer

• QA Engineer

Visit us at the booth or apply at http://www.atlassian.com/company/careers

Images - Credits

• Turtle - by Jonathan Zander, CC-BY-SA-3.0

• Loading - by MatthewJ13, CC-SA-3.0

• Magic Potion - by Koolmann1, CC-BY-SA-2.0

• Merlin Tool - by L. Mahin, CC-BY-SA-3.0

• Choose Pills - by *rockysprings, CC-BY-SA-3.0

Thank You!