20 ways to shaft your split testing : conversion conference

20 ways to Shaft your Split testing @OptimiseOrDie

Upload: craig-sullivan

Post on 20-Aug-2015


TRANSCRIPT

20 ways to Shaft your Split testing

@OptimiseOrDie

@OptimiseOrDie

• UX, Analytics, Testing and Innovation

• Started doing testing & CRO 2004

• Split tested over 40M visitors in 19 languages

• 60+ mistakes with AB testing

• I’ve made every one of them

• Like riding a bike…

• Get in touch for workshops, skill transfer, CRO methodology design, training and programme mentoring…


Hands on!


Timeline

Tested stupid ideas, lots

Most AB or MVT tests are bullshit

Discovered AB testing

Triage, Triangulation,

Prioritisation, Maths

Zen Plumbing

AB Test Hype Cycle


Oppan Gangnam Style!


#1 : You’re doing it in the wrong place


There are 4 areas a CRO expert always looks at:

1. Inbound attrition (medium, source, landing page, keyword, intent and many more…)
2. Key conversion points (product, basket, registration)
3. Processes, lifecycles and steps (forms, logins, registration, checkout, onboarding, emails, push)
4. Layers of engagement (search, category, product, add)
5. Use visitor flow reports for attrition – very useful.
6. For key conversion points, look at loss rates & interactions
7. Processes and steps – look at funnels or make your own
8. Layers and engagement – make a ring model
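A rough sketch of the "make your own" funnel idea: given visitor counts at each step, work out the loss rate between adjacent steps so the leakiest one stands out. The step names and counts are invented for illustration, and `funnel_loss` is a hypothetical helper, not something from an analytics package.

```python
from typing import List, Tuple


def funnel_loss(steps: List[Tuple[str, int]]) -> List[Tuple[str, str, float]]:
    """Return (from_step, to_step, loss_rate) for each adjacent pair of steps."""
    losses = []
    for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:]):
        # Loss rate = share of visitors at step A who never reach step B.
        loss = 1 - count_b / count_a if count_a else 0.0
        losses.append((name_a, name_b, round(loss, 3)))
    return losses


# Hypothetical counts, purely illustrative.
funnel = [
    ("product", 50_000),
    ("basket", 20_000),
    ("checkout", 9_000),
    ("complete", 6_300),
]

losses = funnel_loss(funnel)
worst = max(losses, key=lambda row: row[2])  # the step pair losing the most people

for row in losses:
    print(row)
print("biggest leak:", worst[0], "->", worst[1])
```

The same shape works for a ring model: swap funnel steps for layers (bounce, engage, outcome) and feed in the counts for each layer.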

Examples – Concept

Bounce

Engage

Outcome


Examples – 16-25Railcard.co.uk

Bounce

Login to Account

Content Engage

Start Application

Type and Details

Eligibility

Photo

Complete


Examples – Guide Dogs

Bounce

Content Engage

Donation Pathway

Donation Page

Starts process

Funnel steps

Complete


Within a layer

Page 1

Page 2

Page 3

Page 4

Page 5

Exit

Deeper Layer

Email

Like

Contact

Wishlist

Micro Conversions

#1 : Make a Money Model

• Get to know the flow and loss (leaks) inbound, inside and through key processes or conversion points.
• Once you know the key steps you’re losing people at and how much traffic you have – make a money model.
• 20,000 see the basket page – what’s the basket page to checkout page ratio?
• Estimate how much you think you can shift the key metric (e.g. basket adds, basket -> checkout)
• What downstream revenue or profit would that generate?
• Sort by the money column
• Congratulations – you’ve now built the world’s first IT plan for growth with a return on investment estimate attached!
• I’ll talk more about prioritising later – but a good real world analogy for you to use:

Think like a store owner!

If you can’t refurbish the entire store, which floors or departments will you invest in optimising?

Wherever there is:

• Footfall
• Low return
• Opportunity
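Here is one way the money model could be sketched as a small table you sort by the money column. Only the 20,000 basket views come from the slide; every rate, value and the `Opportunity` class itself are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    monthly_visitors: int        # footfall reaching this step
    current_rate: float          # e.g. basket -> checkout ratio today
    estimated_rate: float        # what you think you can shift it to
    value_per_conversion: float  # downstream revenue or profit per extra conversion

    @property
    def monthly_uplift(self) -> float:
        """Extra money per month if the estimated shift is achieved."""
        extra = self.monthly_visitors * (self.estimated_rate - self.current_rate)
        return extra * self.value_per_conversion


# Illustrative figures only (apart from the 20,000 basket page views).
model = [
    Opportunity("registration form", 5_000, 0.50, 0.55, 8.0),
    Opportunity("basket -> checkout", 20_000, 0.40, 0.44, 30.0),
    Opportunity("product -> basket add", 60_000, 0.10, 0.11, 12.0),
]

# "Sort by the money column"
ranked = sorted(model, key=lambda o: o.monthly_uplift, reverse=True)
for o in ranked:
    print(f"{o.name:<24} {o.monthly_uplift:>10,.0f}")
```

The estimates are guesses, so treat the ranking as a way to decide where to look first rather than as a forecast.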


#2 : Your hypothesis is crap!

Insight - Inputs #FAIL

• Competitor copying
• Guessing
• Dice rolling
• An article the CEO read
• Competitor change
• Panic
• Ego
• Opinion
• Cherished notions
• Marketing whims
• Cosmic rays
• Not ‘on brand’ enough
• IT inflexibility
• Internal company needs
• Some dumbass consultant
• Shiny feature blindness
• Knee jerk reactions

#2 : These are the inputs you need…

Insight - Inputs

• Segmentation
• Surveys
• Sales and Call Centre
• Session Replay
• Social analytics
• Customer contact
• Eye tracking
• Usability testing
• Forms analytics
• Search analytics
• Voice of Customer
• Market research
• A/B and MVT testing
• Big & unstructured data
• Web analytics
• Competitor evals
• Customer services

#2 : Brainstorming the test

• Check your inputs
• Assemble the widest possible team
• Share your data and research
• Design Emotive Writing guidelines

#2 : Emotive Writing - example

Customers do not know what to do and need support and advice
• Emphasize the fact that you understand that their situation is stressful
• Emphasize your expertise and leadership in vehicle glazing and that you will help them get the best solution for their situation
• Explain what they will need to do online and during the call-back so that they know what the next steps will be
• Explain that they will be able to ask any other questions they might have during the call-back

Customers do not feel confident in assessing the damage
• Emphasize the fact that you will help them assess the damage correctly online

Customers need to understand the benefits of booking online
• Emphasize that the online booking system is quick, easy and provides all the information they need regarding their appointment and general cost information

Customers mistrust insurers and find dealing with their insurance situation very frustrating
• Where possible communicate the fact that the job is most likely to be free for insured customers, or good value for money for cash customers
• Show that you understand the hassle of dealing with insurance companies – emphasise that you will handle their insurance paperwork for them, freeing them of this burden

Some customers cannot be bothered to take action to fix their car glass
• Emphasize the consequences of not doing anything, e.g. ‘It’s going to cost you more if the chip develops into a crack’

#2 : THE DARK SIDE

“Keep your family safe and get back on the road fast with Autoglass.”


#2 : NOW YOU CAN BEGIN

• You should have inputs, research, data, guidelines
• Sit down with the team and prompt with 12 questions:
  – Who is this page (or process) for?
  – What problem does this solve for the user?
  – How do we know they need it?
  – What is the primary action we want people to take?
  – What might prompt the user to take this action?
  – How will we know if this is doing what we want it to do?
  – How do people get to this page?
  – How long are people here on this page?
  – What can we remove from this page?
  – How can we test this solution with people?
  – How are we solving the user’s needs in different and better ways than other places on our site?
  – If this is a homepage, ask these too (bit.ly/1fX2RAa)

#2 : PROMPT YOURSELF

• Check your UX or Copywriting guidelines.
• Use Get Mental Notes (www.GetMentalNotes.com)
• What levers can we apply now?
• Create a hypothesis:

“WE BELIEVE THAT DOING [A] FOR PEOPLE [B] WILL MAKE OUTCOME [C] HAPPEN. WE'LL KNOW THIS WHEN WE SEE DATA [D] AND FEEDBACK [E]”

#2 : THE FUN BIT!

• Collaborative Sketching

• Brainwriting
• Refine and Test!

We believe that doing [A] for People [B] will make outcome [C] happen.

We’ll know this when we observe data [D] and obtain feedback [E]. (reverse)
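One possible way to keep the team honest about the template: a tiny record that refuses to produce a hypothesis while any of the [A] to [E] slots is blank. The `Hypothesis` class and the example wording are hypothetical, loosely based on the windscreen example earlier in the deck.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    change: str    # [A] what we will do
    audience: str  # [B] who it is for
    outcome: str   # [C] what we expect to happen
    data: str      # [D] the data that will show it
    feedback: str  # [E] the feedback that will show it

    def statement(self) -> str:
        # Refuse to build a hypothesis with any slot left blank.
        for slot, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"hypothesis is missing: {slot}")
        return (
            f"We believe that doing {self.change} for {self.audience} "
            f"will make {self.outcome} happen. We'll know this when we see "
            f"{self.data} and {self.feedback}."
        )


example = Hypothesis(
    change="clearer next-step copy on the booking page",
    audience="people unsure how to assess their windscreen damage",
    outcome="more completed online bookings",
    data="a higher booking completion rate",
    feedback="fewer 'what happens next?' questions on call-backs",
)
print(example.statement())
```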


#2 : Solutions

• You need multiple tool inputs
  – Tool decks are here : www.slideshare.net/sullivac
• Collaborative, Customer connected team
  – If you’re not doing this, you’re hosed
• Session replay tools provide vital input
  – Get vital additional customer evidence
• Simple page Analytics don’t cut it
  – Invest in your analytics, especially event tracking
• Ego, Opinion, Cherished notions – fill gaps
  – Fill these vacuums with insights and data
• Champion the user
  – Give them a chair at every meeting

#2 : HYPOTHESIS DESIGN SUMMARY

• Inputs – get the right stuff
• Research, Guidelines, Data
• Framing the problem(s)
• Questions to get you going
• Use card prompts for Psychology
• Create a hypothesis
• Collaborative Sketching
• Brainwriting
• Refine and Check Hypothesis
• Instrument and Test

#3 : No analytics integration

• Investigating problems with tests
• Segmentation of results
• Tests that fail, flip or move around
• Tests that don’t make sense
• Broken test setups
• What drives the averages you see?

#4 : The test will finish after you die

A B B A

These Danish porn sites are so hardcore!

We’re still waiting for our AB tests to finish!

• Use a test length calculator like this one: visualwebsiteoptimizer.com/ab-split-test-duration/
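For a feel of what a test length calculator does under the hood, here is a minimal sketch using the textbook sample-size approximation for comparing two conversion rates. It is not the formula used by the linked calculator (which I have not checked). The 5% significance level, 80% power, traffic figures and function names are all assumptions.

```python
import math
from statistics import NormalDist


def visitors_per_variant(base_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed per variant (two-sided two-proportion z-test)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def days_to_run(daily_visitors: int, variants: int, per_variant: int) -> int:
    """Days needed for the whole test, rounded up to full weeks."""
    days = math.ceil(per_variant * variants / daily_visitors)
    return math.ceil(days / 7) * 7


# Hypothetical page: 3% conversion, hoping for a 10% relative lift, 2,000 visitors a day.
needed = visitors_per_variant(base_rate=0.03, relative_lift=0.10)
print(needed, "visitors per variant,", days_to_run(2_000, 2, needed), "days")
```

Halve the lift you expect and the required sample roughly quadruples, which is how low-traffic tests end up finishing after you die.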

#5 : You get false results

The 95% Stopping Problem

• Many people use 95, 99% ‘confidence’ to stop
• This value is unreliable
• Read this Nature article : bit.ly/1dwk0if
• You can hit 95% early in a test
• If you stop, it could be a false positive
• Tools need to be smarter about inference
• This 95% thingy – it’s last on your list for reasons to stop testing
• Let me explain

#5 : When to stop

• Self stopping is a huge problem:
  – “I stopped the test when it looked good”
  – “It hit 20% on Thursday, so I figured – time to cut and run”
  – “We need test time for something else. Looks good to us”
  – “We’ve got a big sample now so why not finish it today?”
• False Positives and Negatives
  – If you cut part of a business cycle, you bias the segments you have in the test.
  – So if you ignore weekend shoppers by stopping your test on Friday, that will affect results
  – The other problem is FALSE POSITIVES and FALSE NEGATIVES

Running every test to the end:

                         Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations   Insignificant  Insignificant  Significant!   Significant!
After 500 observations   Insignificant  Significant!   Insignificant  Significant!
End of experiment        Insignificant  Significant!   Insignificant  Significant!

Stopping as soon as a result looks significant:

                         Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations   Insignificant  Insignificant  Significant!   Significant!
After 500 observations   Insignificant  Significant!   trial stopped  trial stopped
End of experiment        Insignificant  Significant!   Significant!   Significant!

The 95% Stopping Problem


abtestguide.com/calc/

62.5cm +/- 1cm

9.1% ± 0.5

9.3% ± 0.5

9.1% ± 0.2

9.3% ± 0.2

9.1% ± 0.1

9.3% ± 0.1

Graph is a range, not a line:

9.1 ± 0.3%

9.1 ± 0.9%

9.1 ± 1.9%
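To put numbers on "a range, not a line", here is a small sketch that turns conversions and visitors into a rate plus an approximate 95% margin of error, using the plain normal approximation. The traffic levels are hypothetical, so the margins will not match the ± values on the slide exactly, and the overlap check is only a rough rule of thumb rather than a proper significance test.

```python
import math


def rate_with_margin(conversions: int, visitors: int, z: float = 1.96):
    """Conversion rate and its approximate 95% margin of error."""
    rate = conversions / visitors
    margin = z * math.sqrt(rate * (1 - rate) / visitors)
    return rate, margin


def ranges_overlap(a, b) -> bool:
    """True when the two (rate, margin) ranges overlap, i.e. no clear winner yet."""
    (rate_a, margin_a), (rate_b, margin_b) = a, b
    return abs(rate_a - rate_b) <= margin_a + margin_b


# Same 9.1% vs 9.3% rates at three made-up traffic levels.
for visitors in (10_000, 100_000, 1_000_000):
    a = rate_with_margin(round(visitors * 0.091), visitors)
    b = rate_with_margin(round(visitors * 0.093), visitors)
    print(f"{visitors:>9,} visitors: "
          f"A {a[0]:.1%} ± {a[1]:.2%}, B {b[0]:.1%} ± {b[1]:.2%}, "
          f"overlap={ranges_overlap(a, b)}")
```

The rates never change; only the width of the range does, and the two ranges only pull apart once there is enough traffic behind them.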

“You should know that stopping a test once it’s significant is deadly sin number 1 in A/B testing land. 77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point.”
Ton Wesseling, Online Dialogue

“I always tell people that you need a representative sample if your data needs to be valid. What does ‘representative’ mean? First of all you need to include all the weekdays and weekends. You need different weather, because it impacts buyer behaviour. But most important: Your traffic needs to have all traffic sources, especially newsletter, special campaigns, TV,… everything! The longer the test runs, the more insights you get.”
Andre Morys, Web Arts
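The stopping problem is easy to see in a simulation. This sketch runs A/A tests (both arms share the same true rate) and compares how often a test looks "significant" at any of several peeks with how often it does at the planned end. The run counts, batch size, 10% rate and seed are arbitrary assumptions, and it makes no attempt to reproduce the 77% figure quoted above, which depends on how often and how long you look.

```python
import math
import random


def is_significant(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   z_crit: float = 1.96) -> bool:
    """Two-sided two-proportion z-test at roughly 95% confidence."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    return abs(conv_a / n_a - conv_b / n_b) / se > z_crit


def simulate_aa(runs: int = 300, looks: int = 10, batch: int = 200,
                rate: float = 0.1, seed: int = 1):
    """Return (share significant at ANY look, share significant at the final look)."""
    rng = random.Random(seed)
    hit_any = hit_final = 0
    for _run in range(runs):
        conv_a = conv_b = n = 0
        ever = last = False
        for _look in range(looks):
            # Both arms have the same true rate: any "winner" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            last = is_significant(conv_a, n, conv_b, n)
            ever = ever or last
        hit_any += ever
        hit_final += last
    return hit_any / runs, hit_final / runs


peeking_rate, fixed_rate = simulate_aa()
print(f"significant at some peek: {peeking_rate:.0%}, at the planned end: {fixed_rate:.0%}")
```

Stopping at the first significant peek turns every one of those "some peek" hits into a declared winner, even though nothing was different between the arms.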

Three Articles you MUST read

“Statistical Significance does not equal Validity” http://bit.ly/1wMfmY2

“Why every Internet Marketer should be a Statistician” http://bit.ly/1wMfs1G

“Understanding the Cycles in your site” http://mklnd.com/1pGSOUP

Business & Purchase Cycles

• Customers change
• Your traffic mix changes
• Markets, competitors
• Be aware of all the waves
• Always test whole cycles
• Minimum 2 cycles (wk/mo)
• Don’t exclude slower buyers

[Chart labels: Start Test, Finish, Avg Cycle]

When to stop?

• MINIMUM two business cycles (week/mo.)
• MINIMUM of 1 purchase cycle
• MINIMUM 250 outcomes/conversions per creative
• MORE if relative difference is low
• ALWAYS test full weeks
• KNOW what marketing and cycles are doing
• RUN a test length calculator - bit.ly/XqCxuu
• SET your test run time
• Run it
• Stop it
• Analyse the data
• When do I run over? Not enough data…
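The minimums above can be written down as a simple pre-flight check. This is only a sketch: `RunStatus` and `reasons_to_keep_running` are hypothetical names, the cycle lengths are whatever you measured for your own site, and the check covers the countable rules only (it cannot tell you what marketing is doing).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunStatus:
    days_run: int
    business_cycle_days: int     # e.g. 7 for a weekly cycle
    purchase_cycle_days: int     # typical time from first visit to purchase
    conversions: Dict[str, int]  # outcomes per creative


def reasons_to_keep_running(s: RunStatus, min_outcomes: int = 250) -> List[str]:
    """Return the unmet minimums; an empty list means the checklist is satisfied."""
    reasons = []
    if s.days_run < 2 * s.business_cycle_days:
        reasons.append("fewer than two business cycles")
    if s.days_run < s.purchase_cycle_days:
        reasons.append("less than one purchase cycle")
    if s.days_run % 7 != 0:
        reasons.append("not a whole number of weeks")
    for creative, outcomes in s.conversions.items():
        if outcomes < min_outcomes:
            reasons.append(f"only {outcomes} outcomes for {creative}")
    return reasons


# Hypothetical test, ten days in.
status = RunStatus(days_run=10, business_cycle_days=7, purchase_cycle_days=5,
                   conversions={"A": 300, "B": 240})
for reason in reasons_to_keep_running(status):
    print("keep running:", reason)
```

Note that significance does not appear in the check at all, in keeping with it being last on the list of reasons to stop.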


#6 : You peek too early!

#6 : The early stages of a test…

• Ignore the graphs. Don’t draw conclusions. Don’t dance. Calm down.
• Get a feel for the test but don’t do anything yet!
• Remember – in A/B - 50% of returning visitors will see a new shiny website!
• Until your test has had at least 2 business cycles and 250+ outcomes, don’t bother even getting remotely excited!
• Watching regularly is good though. You’re looking for anything that looks really odd – if everyone is looking (but not concluding) then oddities will get spotted.
• All tests move around or show big swings early in the testing cycle. Here is a very high traffic site – it still takes 10 days to start settling. Lower traffic sites will stretch this period further.

#7 : No QA testing for the AB test?

#7 – BIG SECRET!

• Over 40% of tests have had QA issues.
• Over £20M in browser conversion issues!

Browser testing
• www.crossbrowsertesting.com
• www.browserstack.com
• www.spoon.net
• www.cloudtesting.com
• www.multibrowserviewer.com
• www.saucelabs.com

Tablets & Mobiles
• www.deviceanywhere.com
• www.perfectomobile.com

FREE Device lab!
• www.opendevicelab.com

#7 : What other QA testing should I do?

• Testing from several locations (office, home, elsewhere)
• Testing the IP filtering is set up
• Test tags are firing correctly (analytics and the test tool)
• Test as a repeat visitor and check session timeouts
• Cross check figures from 2+ sources
• Monitor closely from launch, recheck, watch
• WATCH FOR BIAS!

#8 : Tests are random and not prioritised

Once you have a list of potential test areas, rank them by opportunity vs. effort.

The common ranking metrics that I use include:

• Opportunity (revenue, impact)
• Dev resource
• Time to market
• Risk / Complexity

Make yourself a quadrant diagram and plot them
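A quick sketch of the quadrant idea: score each candidate on opportunity and effort, drop it into one of four boxes, and rank by opportunity per unit of effort. The 1 to 10 scales, the quadrant labels and the example ideas are my own assumptions, not the author's scoring scheme.

```python
from typing import NamedTuple


class Idea(NamedTuple):
    name: str
    opportunity: int  # 1 (low) to 10 (high): revenue, impact
    effort: int       # 1 (low) to 10 (high): dev resource, time to market, risk / complexity


def quadrant(idea: Idea, midpoint: int = 5) -> str:
    """Place an idea in one of four opportunity-vs-effort boxes."""
    high_value = idea.opportunity > midpoint
    low_effort = idea.effort <= midpoint
    if high_value and low_effort:
        return "do first"
    if high_value:
        return "plan"
    if low_effort:
        return "quick tweak"
    return "park"


# Hypothetical backlog.
ideas = [
    Idea("new checkout flow", 9, 8),
    Idea("basket page copy", 8, 2),
    Idea("site search rebuild", 4, 9),
    Idea("footer links", 2, 1),
]

ranked = sorted(ideas, key=lambda i: i.opportunity / i.effort, reverse=True)
for idea in ranked:
    print(f"{idea.name:<22} {quadrant(idea)}")
```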

#9 : Velocity or Scope problems

[Chart: Conversion over Months (0, 6, 12, 18)]

#9 : Widen the optimisation scope

“Rather than try and improve one thing by 10% - which would be very, very difficult to do, we go and find 1,000 things and improve them all by a fraction of a per cent, which is totally do-able.”

#9 : Solutions

• Give Priority Boarding for opportunities
  – The best seats reserved for metric shifters
• Release more often to close the gap
  – More testing resource helps, analytics ‘hawk eye’
• Kaizen – continuous improvement
  – Others call it JFDI (just f***ing do it)
• Make changes AS WELL as tests, basically!
  – These small things add up as well as compounding effort
• Run simultaneous tests
  – With analytics integration, decoding this becomes easy
• Online Hair Booking – over 100 tiny tweaks
  – No functional changes at all – 37% improvement
• Completed in-between product releases
  – The added lift for 10 days work, worth 360k

#11 : Your test fails

• Learn from the failure! If you can’t learn from the failure, you’ve designed a crap test.
• Next time you design, imagine all your stuff failing. What would you do? If you don’t know or you’re not sure, get it changed so that a negative becomes insightful.
• So : failure itself at a creative or variable level should tell you something.
• On a failed test, always analyse the segmentation and analytics
• One or more segments will be over and under
• Check for varied performance
• Now add the failure info to your Knowledge Base:
• Look at it carefully – what does the failure tell you? Which element do you think drove the failure?
• If you know what failed (e.g. making the price bigger) then you have very useful information
• You turned the handle the wrong way
• Now brainstorm a new test

#12 : The test is ‘about the same’

• Analyse the segmentation
• Check the analytics and instrumentation
• One or more segments may be over and under
• They may be cancelling out – the average is a lie
• The segment level performance will help you (beware of small sample sizes)
• If you genuinely have a test which failed to move any segments, it’s a crap test – be bolder
• This usually happens when it isn’t bold or brave enough in shifting away from the original design, particularly on lower traffic sites
• Get testing again!
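A small made-up example of segments cancelling out: overall, A and B convert identically, yet B is well up on desktop and well down on mobile. The numbers are invented purely to show the effect, and with real data each segment would need enough outcomes to be trusted.

```python
from typing import Dict, Tuple

# (conversions, visitors) per segment and variant. Invented numbers.
Results = Dict[str, Dict[str, Tuple[int, int]]]

results: Results = {
    "desktop": {"A": (500, 10_000), "B": (600, 10_000)},
    "mobile": {"A": (400, 10_000), "B": (300, 10_000)},
}


def rate(conversions: int, visitors: int) -> float:
    return conversions / visitors


def overall(data: Results, variant: str) -> float:
    """Conversion rate for a variant across all segments combined."""
    conversions = sum(seg[variant][0] for seg in data.values())
    visitors = sum(seg[variant][1] for seg in data.values())
    return rate(conversions, visitors)


def lift_by_segment(data: Results) -> Dict[str, float]:
    """Relative lift of B over A within each segment."""
    return {name: rate(*seg["B"]) / rate(*seg["A"]) - 1 for name, seg in data.items()}


print(f"overall: A {overall(results, 'A'):.2%} vs B {overall(results, 'B'):.2%}")
for name, lift in lift_by_segment(results).items():
    print(f"{name}: B is {lift:+.0%} against A")
```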


#13 : The test keeps moving around

• There are three reasons it is moving around
  – Your sample size (outcomes) is still too small
  – The external traffic mix, customers or reaction has suddenly changed or
  – Your inbound marketing driven traffic mix is completely volatile (very rare)
• Check the sample size
• Check all your marketing activity
• Check the instrumentation
• If no reason, check segmentation

#14 : The test has flipped on me

• Something like this can happen:

• Check your sample size. If it’s still small, then expect this until the test settles.

• If the test does genuinely flip – and quite severely – then something has changed with the traffic mix, the customer base or your advertising. Maybe the PPC budget ran out? Seriously!

• To analyse a flipped test, you’ll need to check your segmented data. This is why you have a split testing package AND an analytics system.

• The segmented data will help you to identify the source of the shift in response to your test. I rarely get a flipped one and it’s always something changing on me, without being told. The heartless bastards.

#15 : Should I run an A/A test first?

• No – and this is why:
  – It’s a waste of time
  – It’s easier to test and monitor instead
  – You are eating into test time
  – Also applies to A/A/B/B testing
  – A/B/A running at 25%/50%/25% is the best
• Read my post here : http://bit.ly/WcI9EZ
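One reading of the A/B/A suggestion: split traffic 25% / 50% / 25%, serve the same control to both A arms, and compare the two A arms with each other while the real test runs. The sketch below shows a deterministic split by hashing the visitor ID. The function name, experiment key and ID format are hypothetical, and real testing tools handle assignment for you.

```python
import hashlib


def assign(visitor_id: str, experiment: str = "exp-1") -> str:
    """Deterministically place a visitor in A1 (25%), B (50%) or A2 (25%)."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable number from 0 to 99
    if bucket < 25:
        return "A1"
    if bucket < 75:
        return "B"
    return "A2"


# Check the split on some synthetic visitor IDs.
counts = {"A1": 0, "B": 0, "A2": 0}
for i in range(20_000):
    counts[assign(f"visitor-{i}")] += 1
print(counts)
```

Because the assignment depends only on the visitor ID and the experiment key, a returning visitor always lands in the same arm.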

#16 : Nobody feels the test

• You promised a 25% rise in checkouts - you only see 2%
• Traffic, Advertising, Marketing may have changed
• Check they’re using the same precise metrics
• Run a calibration exercise
• I often leave a 5 or 10% stub running in a test
• This tracks old creative once new one goes live
• If conversion is also down for that one, BINGO!
• Remember – the AB test is an estimate – it doesn’t precisely record future performance
• This is why infrequent testing is bad
• Always be trying a new test instead of basking in the glory of one you ran 6 months ago. You’re only as good as your next test.

#17 : You forgot about Mobile & Tablet

• If you’re AB testing a responsive site, pay attention
• Content will break differently on many screens
• Know thy users and their devices
• Use Bango or Google Analytics to define a test list
• Make sure you test mobile devices & viewports
• What looks good on your desk may not be for the user
• Harder to design cross device tests
• You’ll need to segment mobile, tablet & desktop response in the analytics or AB testing package
• Your personal phone is not a device mix
• Ask me about making your device list
• Buy core devices, rent the rest from deviceanywhere.com

#18 : Oh shit – no traffic

• If small volumes, contact customers – reach out.
• If data volumes aren’t there, there are still customers!
• Drive design from levers you can apply – game the system
• Pick clean and simple clusters of change (hypothesis driven)
• Use a goal at an earlier ring stage or funnel step
• Beware of using clickthroughs when attrition is high on the other side
• Try before and after testing on identical time periods (measure in analytics model)
• Be careful about small sample sizes (<100 outcomes)
• Are you working automated emails?
• Fix JFDI, performance and UX issues too!

• Forget MVT or A/B/N tests – run your numbers
• Test things with high impact – don’t be a wuss!
• Use UX, Session Replay to aid insight
• Run a task gap survey (4Q style)
• Run a dropped basket survey (LF style)
• Run a general survey + check social + other sites
• Run sitewide tests that appear on all pages or large clusters of pages:
  – UVPs (“We are a cool brand”), USPs (“Free returns!”), UCPs (“10% off today”).
  – Headers, Footers, Nudge Bars, USP bars, footer changes, Navigation, Product pages, Delivery info etc.

#19 : I chose the wrong test type

• A/B testing – good for:
  – A single change of content or design layout
  – A group of related changes (e.g. payment security)
  – Finding a new and radical shift for a template design
  – Lower traffic pages or shorter test times
• Multivariate testing – good for:
  – Higher traffic pages
  – Groups of unrelated changes (e.g. delivery & security)
  – Multiple content or design style changes
  – Finding specific drivers of test lifts
  – Testing multiple versions (e.g. click here, book now, go)
  – Where you need to understand strong and weak cross variable interactions
  – Don’t use to settle arguments or sloppy thinking!

Netherlands A/B Shift Example

Previous winner +7.25%

+8.19% additional lift

#20 – Other flavours of testing

• Micro testing (tiny change) – good for:
  – Proving to the boss that testing works
  – Demonstrating to IT that it works without impact
  – Showing the impact of a seemingly tiny change
  – Proof of concept before larger test
• Funnel testing – good for:
  – Checkouts
  – Lead gen
  – Forms processes
  – Quotations
  – Any multi-step process with data entry
• Fake it and Build it – good for:
  – Testing new business ideas
  – Trying out promotions on a test sample
  – Estimating impact before you build
  – Helps you calculate ROI
  – You can even split test entire server farms

“Congratulations! Today you’re the lucky winner of our random awards programme. You get all these extra features for free, on us. Enjoy.”

Top F***ups for 2014

1. Testing in the wrong place
2. Your hypothesis inputs are crap
3. No analytics integration
4. Your test will finish after you die
5. You don’t test for long enough
6. You peek before it’s ready
7. No QA for your split test
8. Opportunities are not prioritised
9. Testing cycles are too slow
10. You don’t know when tests are ready
11. Your test fails
12. The test is ‘about the same’
13. Test flips behaviour
14. Test keeps moving around
15. You run an A/A test and waste time
16. Nobody ‘feels’ the test
17. You forgot you were responsive
18. You forgot you had no traffic
19. You ran the wrong test type
20. You didn’t try all the flavours of testing

WE’RE ALL WINGING IT

2004 Headspace

What I thought I knew in 2004

Reality

2014 Headspace

What I know I know

On a good day

Guessaholics Anonymous

Rumsfeldian Space

The 5 Legged Optimisation Barstool

#1 Smart Talented Polymath People

Flexible and Agile teams


Fittest? Agile!


#2 : Analytics Investment (tools, people, dev time)


#3 : User research and insight


#3 : THE BEST IDEAS COME FROM?

#4 : GREAT COPYWRITING

“On the average, five times as many people read the headline as read the body copy. When you have written your headline, you have spent eighty cents out of your dollar.”
David Ogilvy

“In 9 years and 40M split tests with visitors, the majority of my testing success came from playing with the words.”
@OptimiseOrDie

#5 : Split Testing Tools

• Google Content Experiments: bit.ly/Ljg7Ds
• Optimizely: www.optimizely.com
• Visual Website Optimizer: www.visualwebsiteoptimizer.com
• Multi Armed Bandit Explanation: bit.ly/Xa80O8
• New Machine Learning Tools: www.conductrics.com, www.rekko.com

The 5 Legged Optimisation Barstool

#1 Culture & Team
#2 Toolkit & Analytics investment
#3 UX, CX, Service Design, Insight
#4 Persuasive Copywriting
#5 Experimentation (testing) tools

READ STUFF


#5 : FIND STUFF

@danbarker Analytics
@fastbloke Analytics
@timlb Analytics
@jamesgurd Analytics
@therustybear Analytics
@carmenmardiros Analytics
@davechaffey Analytics
@priteshpatel9 Analytics
@cutroni Analytics
@avinash Analytics
@Aschottmuller Analytics, CRO
@cartmetrix Analytics, CRO
@Kissmetrics CRO / UX
@Unbounce CRO / UX
@Morys CRO / Neuro
@UXFeeds UX / Neuro
@Psyblog Neuro
@Gfiorelli1 SEO / Analytics
@PeepLaja CRO
@TheGrok CRO
@UIE UX
@LukeW UX / Forms
@cjforms UX / Forms
@axbom UX
@iatv UX
@Chudders Photo UX
@JeffreyGroks Innovation
@StephanieRieger Innovation
@BrianSolis Innovation
@DrEscotet Neuro
@TheBrainLady Neuro
@RogerDooley Neuro
@Cugelman Neuro
@Smashingmag Dev / UX
@uxmag UX
@Webtrends UX / CRO

#5 : LEARN STUFF

Baymard.com
Lukew.com
Smashingmagazine.com
ConversionXL.com
Medium.com
Whichtestwon.com
Unbounce.com
Measuringusability.com
RogerDooley.com
Kissmetrics.com
Uxmatters.com
Smartinsights.com
Econsultancy.com
Cutroni.com
www.GetMentalNotes.com

#12 : The Best Companies…

• Invest continually in analytics instrumentation, tools, people
• Use an Agile, iterative, cross-silo, one team project culture
• Prefer collaborative tools to having lots of meetings
• Prioritise development based on numbers and insight
• Practice real continuous product improvement, not SLEDD*
• Are fixing bugs, cruft, bad stuff as well as optimising
• Source photos and content that support persuasion and utility
• Have cross channel, cross device design, testing and QA
• Segment their data for valuable insights, every test or change
• Continually reduce cycle (iteration) time in their process
• Blend ‘long’ design, continuous improvement AND split tests
• Make optimisation the engine of change, not the slave of ego

* Single Large Expensive Doomed Developments

THE FUTURE OF TESTING

Projects? Questions? Mail me!

@OptimiseOrDie

Mail : [email protected]

Deck : slideshare.com/sullivac

Linkedin : linkd.in/pvrg14