On creating A/B tests (properly!) – by a developer

by Damian Dawber

posted on: April 6th 2018

While CRO is growing, many a tech-savvy marketer and the vast majority of developers hold perfunctory and misguided attitudes towards the technical implementation of A/B tests.

The consensus runs roughly as follows:

  • A/B tests are disposable / it’s cost-effective to churn out tests / the implementation will never see the light of day / A/B tests are short-lived
  • A/B tests are easy / a proof of concept can work within the limitations of a given tool or platform / you can write meaningful A/B tests without writing any code

These opinions might hold true sometimes – and didn’t someone change the colour of a button and conversion increased by 20% and everyone was a wizard all of a sudden? – but if you’re playing the long game, these attitudes will bite you on the bum.

WYSIWYG editor

Many A/B testing platforms offer drag and drop / what-you-see-is-what-you-get test building interfaces that allow A/B tests to be created without writing a single line of code.

Adding images, changing text, modifying the colour of buttons and so on can all be accomplished with ease, giving marketers and developers the ability to change the look and feel of a page without touching the underlying code.

In theory, then, simple A/B tests can be created using this method. And so it goes without saying that while more complex tests should be deferred to experienced coders, some tests can be built utilising a WYSIWYG editor (and maybe just a tad of CSS, JavaScript and HTML). Right?

Wrong.

This approach limits even the simplest of tests for the following reasons:

  • Limited or no control over event tracking – let’s say you add a banner just above the footer: we might reasonably want to know (a) who clicked it, (b) whether users hovered over it but did not click it, and (c) whether users scrolled sufficiently far down the page to see the banner at all.
  • Difficulty iterating on a test – future iterations on a test may either be impossible or limited by the scope of the original A/B test – what works for you now might not work for you in future.
  • Limited scope = limited thinking – if you’re limited in what you can build, an insight-driven hypothesis will be limited in scope too. That a simple experiment is sufficient to test a given hypothesis isn’t to say that the hypothesis is as strong as it could have been were you given more scope in execution. A stronger hypothesis makes results more meaningful and useful to the business, and turns each test into a learning opportunity for future tests.
  • Subtleties – so you want to change the colour of a link but aren’t really able to control its hover styling, its behaviour, animation effects on elements, … – these are par-for-the-course considerations for experienced developers.
  • Re-usability and portability – I might reasonably expect to be able to build on old code, but changes to the way that a platform works are outside of our control. When I write the code myself, copies of it exist outside of the platform, so should I wish to move to another platform in future it is easy to port that code over.
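
As a rough illustration of the tracking questions in the first bullet (clicked, hovered without clicking, scrolled far enough to see the banner), here is a minimal TypeScript sketch. The event names, `isInView` and `BannerTracker` are hypothetical and not part of any testing platform; the `send` callback stands in for whatever analytics call your setup uses.

```typescript
// Hypothetical event names for a banner experiment; not tied to any platform.
type BannerEvent = "seen" | "hover" | "click";

// True when any part of the element overlaps the visible viewport.
// All values are in pixels measured from the top of the document.
function isInView(
  elementTop: number,
  elementHeight: number,
  scrollY: number,
  viewportHeight: number
): boolean {
  const elementBottom = elementTop + elementHeight;
  const viewportBottom = scrollY + viewportHeight;
  return elementBottom > scrollY && elementTop < viewportBottom;
}

// Forwards each event type at most once per page view to a sender callback.
class BannerTracker {
  private fired: { [name: string]: boolean } = {};
  private send: (event: BannerEvent) => void;

  constructor(send: (event: BannerEvent) => void) {
    this.send = send;
  }

  record(event: BannerEvent): void {
    if (this.fired[event]) {
      return;
    }
    this.fired[event] = true;
    this.send(event);
  }

  // Question (b): the user hovered over the banner but never clicked it.
  hoveredWithoutClick(): boolean {
    return this.fired["hover"] === true && this.fired["click"] !== true;
  }
}
```

In a browser, `record("click")` and `record("hover")` would be called from `click` and `mouseenter` listeners, and `record("seen")` from a scroll handler once `isInView` returns true. This is the kind of control a drag and drop editor rarely exposes.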

Code Quality

Code quality in the realm of A/B tests might reasonably be summarised as follows: (a) well-structured, (b) reusable, (c) DRY (do not repeat thyself).

You’re not putting a team in space (or are you? If so do get in touch) but solid development principles do apply.

While it’s true that the code utilised in A/B tests isn’t always going to be the code written into the site following the build of a winning test, and that A/B tests are often short-lived, running for only a few weeks, code quality matters for a number of reasons:

  • A/B tests evolve – an A/B test is likely to be improved and iterated upon down the line – building on top of poor / unstructured code is inefficient, gives rise to bugs and limits our ability to write quality code in future without rewriting the original code. Mostly it gives one a headache when having to look at old tests that are poorly written.
  • An A/B test often becomes a 100% test – if a test wins it will often be served to 100% of traffic before the test is built into the underlying application codebase. Even though the A/B test only ran at, say, a 50/50 traffic split for a few weeks, it’s now going to live at a 0/100 traffic split indefinitely. But how long will it live at 100% for before going into production? The answer is – sometimes a long time. And having poor code out in the wild over a period of time means that there is, with every passing day, a greater chance of things breaking when developers work on other parts of the site.
  • Some features of a test are reusable – (a) sometimes developers write really good code and yet it never makes its way into component or library form to be used by others – it’s forever locked away inside a test that lives and is then forgotten about; (b) a hypothesis may well be satisfied by reusing functionality from an older test, in which case it’s going to be far easier to utilise that code if it’s well-written.
  • Site developers will use the code to inform the production code – while the code won’t be used directly, it will likely be ported over into the core application’s codebase – handing over reams of spaghetti code is detrimental to that process.
  • Bad code is bad code – whatever proportion of traffic is being served your code, having poor code in production is bad in and of itself. Poorly written code is usually going to be inefficient, giving rise to increased page load and DOM rendering times and thus negatively impacting a user’s experience.

The importance of process

At User Conversion, we have a strong development process, some of the key features of which are as follows:

Unique experiment names

Unique identifiers relate to the wider business process in that experiments need to be clearly defined and uniquely identified – from a development point of view, unique identifiers should be considered in a few ways:

  • Namespacing components in your code
  • Namespaced CSS and HTML – unique identifiers ensure that elements do not overlap across experiments or interfere with the underlying code
  • As part of your development build process

Local Development

While we utilise A/B testing platforms to execute scripts, much of our development is done locally – as part of our build process, scripts and CSS are uploaded to an endpoint that is accessible over HTTPS and so we are able to test those scripts quickly by injecting them into the page.

Task runners

Experiment creators are strongly encouraged to use task runners for things such as compiling ES6 to ES5, support for Sass, support for next generation CSS, JavaScript and CSS file minification, and so on. Some of the tools we use are:

  • NPM for package management
  • Babel for ES6 / ES7 compilation
  • Sass
  • PostCSS / autoprefixer
  • Rollup for file bundling

Testing tests

QA is four-pronged at User Conversion. From a development perspective alone, however, we focus on two core QA functions:

  • Tech QA – we employ people whose job is to test experiments in the context of client websites across devices. The testing is largely functional, with Tech QA acting as a bridge between developers, researchers and analysts – ensuring that experiment builds meet exacting standards both in terms of aesthetics and user experience, and that the tools required for proper analysis are in place.
  • Code-level testing – our focus is on testing reusable components at code level with a view to ensuring that changes to core components do not break, and thereby impact, any of the tests using them.
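
As a sketch of what code-level testing of a reusable component might look like, the example below checks a small banner-rendering function. `renderBanner`, `escapeHtml` and `expectEqual` are hypothetical names invented for illustration; a real project would more likely use an established test runner than a hand-rolled check.

```typescript
// Escapes characters that would otherwise be interpreted as HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical reusable component: returns banner markup as a string.
function renderBanner(experimentId: string, message: string): string {
  return (
    '<div class="' + experimentId + '__banner">' + escapeHtml(message) + "</div>"
  );
}

// Minimal check helper so the example needs no test framework.
function expectEqual(actual: string, expected: string): void {
  if (actual !== expected) {
    throw new Error("Expected " + expected + " but got " + actual);
  }
}

// Plain text passes through unchanged inside the namespaced wrapper.
expectEqual(
  renderBanner("UC042", "Free delivery"),
  '<div class="UC042__banner">Free delivery</div>'
);

// Markup in the message is escaped rather than injected into the page.
expectEqual(
  renderBanner("UC042", "<b>Sale</b> & more"),
  '<div class="UC042__banner">&lt;b&gt;Sale&lt;/b&gt; &amp; more</div>'
);
```

If a later change to `renderBanner` breaks either expectation, the failure surfaces before any experiment that depends on the component is affected.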

Events

Events, events and more events.

For all the above, a solid A/B test implementation is of no use to anyone if the analysis is in any way flawed or incomplete. While A/B testing platforms feed data into Google Analytics and similar, tracking behaviour in a useful way is going to vary across tests.

My general maxim is: track everything!

In practice, that maxim is tempered by two things: (a) Google Analytics implements event rate limiting, and (b) it’s cumbersome to track every conceivable action or behaviour.

But you should certainly be thinking about what might be useful in helping your analysts understand how an experiment is performing. This could mean, for example, tracking user scroll behaviour (for example if an element is never seen we might exclude that traffic from any analysis); you may need to track hover, touch, swipe and pinch behaviours; you might track exit intent or time on page behaviours; and so on.

If it’s measurable and efficient to do so then track it.
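
One way to reconcile "track everything" with a rate limit is to queue events and release them at a controlled pace. The sketch below uses a simple token bucket; the class name, the capacity and the refill rate are placeholder assumptions and do not reflect Google Analytics' actual quotas, which you should check in its documentation. Time is passed in explicitly so the behaviour is easy to test.

```typescript
interface TrackedEvent {
  category: string;
  action: string;
}

// Token-bucket queue: each sent event costs one token; tokens refill over time.
class ThrottledEventQueue {
  private pending: TrackedEvent[] = [];
  private tokens: number;
  private lastRefill: number;
  private capacity: number;
  private refillPerSecond: number;
  private send: (event: TrackedEvent) => void;

  constructor(
    send: (event: TrackedEvent) => void,
    capacity: number, // assumed burst size, not a documented quota
    refillPerSecond: number, // assumed refill rate, not a documented quota
    startTimeMs: number
  ) {
    this.send = send;
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = startTimeMs;
  }

  // Queues an event and sends whatever the current budget allows.
  track(event: TrackedEvent, nowMs: number): void {
    this.pending.push(event);
    this.flush(nowMs);
  }

  // Refills tokens for the elapsed time, then sends queued events in order.
  flush(nowMs: number): void {
    const elapsedSeconds = (nowMs - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = nowMs;
    while (this.tokens >= 1 && this.pending.length > 0) {
      const next = this.pending.shift();
      if (next === undefined) {
        break;
      }
      this.tokens -= 1;
      this.send(next);
    }
  }

  pendingCount(): number {
    return this.pending.length;
  }
}
```

In a browser you would pass `Date.now()` as the time argument and call `flush` on a timer so that queued events drain even when the user is idle.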

Think bigger

This article focuses on utilising third-party platforms to inject JavaScript code into websites in order to create split tests. Much of the code written, then, is to work within the constraints of these frameworks and the focus tends to be on UI changes on page load.

Although this is not about us, at User Conversion we do think of the bigger picture.

Where we’re able to work directly on websites to develop A/B tests we do so, and similarly we will help design APIs that can be used to facilitate A/B tests; we create remote data stores for test analysis and user data storage; we work on experiments that take into consideration a user’s behaviour across sessions; we utilise knowledge of users to tailor their experience. I should add that CRO isn’t just about creating A/B tests.

Where developers might inform the wider efforts to improve websites and applications, they should do so. For example, we’re currently writing models that will help a client process customer feedback using machine learning… Ultimately it is about utilising whatever tool is right for the job. As developers, we’re fortunate to be informed by excellent researchers and analysts.

Thereafter it’s as simple as taking A/B test development seriously.


Damian Dawber

I have over 8 years’ experience working with front- and back-end languages and frameworks. I originally studied theoretical physics before moving into the arena of programming and wrote a book on a JavaScript library for creating web-based vector graphics. Got to love cricket and dogs, too.
