A/B testing is a powerful practice and, used well, can yield huge results. However, with the myriad of cowboy blog posts out there (need I mention button colours?) and low-level white papers such as "101 ways to increase your conversions" (#linkbait), the industry has been suckered into a more tactical way of thinking - tactical practices that, ultimately, don't cause any significant uplift.
It is for this reason, and others I'll explain within this blog post, that we at User Conversion maintain that...
[blockquote color="#7ecec7" bordercolor="#7ecec7" author="User Conversion"]If you're not testing perception or behaviour, you're not testing[/blockquote]
There are, arguably, experiments that can be classed under what Peep Laja from Conversion XL calls JFDI ("Just Do It") - a classification for experiments he calls 'no-brainers'. Now, we don't really believe in no-brainers where we are, but we do believe in usability issues and bug fixes. The two are slightly different.
This mentality leads us to questions like "does everything need to be tested?". The answer is no, but we do need to prioritise what we test - which is a whole other story.
So what should we test?
Bug fixes address elements that are broken, or issues within the site journey that simply don't work.
Often, during the discovery period for a site - working through the journey yourself, or watching users do so - we'll naturally spot issues and broken elements, no matter how big your site is (sorry!). Beyond the bugs we find naturally as we work through a website, there are also more formal ways to identify them.
- Use Google Analytics
These issues often arise from screen resolution or browser nuances. Going to Google Analytics > Audience > Technology > Screen Resolution, or deep-diving into different browser versions using this report in association with BrowserStack, can often identify previously unknown issues or bugs.
- Use My Crowd
My Crowd sources website 'testers' to find bugs on your website. They charge on a bug-by-bug basis, dependent on the severity of the bug: 'large' bugs cost more than 'small' bugs, and so forth.
In short, any bug fixes should just be fixed. What would we test - the impact of a broken element vs a fixed element? It's also worth relating the bug to what it's worth to the company. For example, if we notice a bug that is preventing 60% of IE10 users from checking out - what is that worth to us? If < 2% of users use IE10 as a browser, but fixing the bug increases the conversion rate for that browser by 35% - that could be an extra 50 transactions a month!
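To make the arithmetic behind that kind of "what is this bug worth?" estimate explicit, here's a minimal sketch. All the inputs (monthly sessions, IE10 share, baseline conversion rate) are hypothetical numbers chosen for illustration, not figures from any real site:

```python
# Hypothetical figures: estimating the monthly value of fixing a browser-specific bug.
monthly_sessions = 100_000   # assumed total monthly traffic
ie10_share = 0.02            # < 2% of users on IE10
baseline_cr = 0.07           # assumed conversion rate for that segment

ie10_sessions = monthly_sessions * ie10_share        # sessions affected by the bug
current_transactions = ie10_sessions * baseline_cr   # transactions today
relative_lift = 0.35                                 # 35% CR increase from the fix
extra = current_transactions * relative_lift         # extra transactions per month

print(f"Extra transactions per month: {extra:.0f}")
```

Multiply the extra transactions by average order value and the business case for "just fix it" usually makes itself.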
Usability improvements are different. Often these are classed as 'no-brainers', deriving from 'best practice' or 'common sense'. At User Conversion we believe there is no such thing as a 'no-brainer'; that being said, there are patterns, and some elements are more common and more fixable than others. These elements are often more subjective than objective in nature, and it therefore takes an experienced practitioner to identify, prioritise and classify these types of experiments accordingly.
Examples might include:
- You notice that more users are using the back button than the breadcrumbs which is what you really want them to use. Your proposed improvement is to underline the breadcrumbs to add affordance.
- Insight suggests that users are misunderstanding the search bar, pressing enter rather than clicking search - perhaps therefore missing the predictive search recommendations, which convert well. Would replacing the icon with a search button be worth testing, to encourage clicks over keyboard returns?
- Although only 5% of users scroll to the bottom of the page to view the footer, insight suggests that users want to contact the company and / or prove its existence. A solution is therefore to add a contact link to the footer to improve usability for those users that wish to contact the company.
Would you test these? What are you testing and do you expect huge returns from such perceived improvements?
It goes without saying that all recommendations should be validated by data first and backed up with a solid hypothesis containing actionable insight, but such amends can be classified as usability improvements that would not necessarily bring much, if any, noticeable uplift. They border on UX improvements - a sub-set of conversion rate optimisation - and often don't shift the user's thought process or flow, or impact the user to any great degree.
Ask yourselves these questions when identifying such issues:
- What proportion of your traffic does this issue affect?
- Taking that proportion of traffic, how much are they worth to you?
- At what stage of the funnel are you amending and, ergo, what is the propensity to purchase from the user?
- How long will this experiment take to be statistically significant?
With regards to the final point, there may be more JFDIs on a low-traffic site than a high-traffic one. Take testing the impact of a more prominent search bar, for example: on a low-traffic site, where it might take weeks or even months to determine a winner, it could be classified as a usability improvement - and as a result, we should just do it (providing, of course, there's still a solid hypothesis behind it, validated with data and backed up by insight).
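To see why a low-traffic site can rule out a test on duration alone, here's a rough sketch of the standard sample-size estimate for a two-proportion z-test (normal approximation, 95% confidence, 80% power). The baseline conversion rate, expected lift and daily traffic are all hypothetical:

```python
import math

def sample_size_per_variant(p1, p2):
    """Visitors needed per variant to detect p1 -> p2 at alpha=0.05, power=0.8."""
    z_a = 1.96  # two-sided alpha = 0.05
    z_b = 0.84  # power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical low-traffic site: 3% baseline CR, hoping for a 10% relative lift
n = sample_size_per_variant(0.03, 0.033)
daily_visitors_per_variant = 250  # assumed split of daily traffic
weeks = n / daily_visitors_per_variant / 7

print(f"{n} visitors per variant, roughly {weeks:.0f} weeks")
```

With these assumptions the test would run for the best part of seven months - exactly the situation where classifying the change as a usability improvement and just doing it becomes the pragmatic call.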
What we can do when implementing direct changes to the website is monitor their impact. We always recommend tracking your website using Google Tag Manager, and often liken a website to an iceberg: only 10% of your site interaction is URL-based; the other 90% is the mass under the surface - non-URL-defined interactions. By creating custom segments, we can compare interaction with our (say) prominent search bar a couple of weeks later to see the before-and-after effect. How many more users used the search bar? Of those who did, what is their before-and-after conversion rate, indicating their propensity to purchase? What about their bounce rate and other secondary or micro-conversion metrics?
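The before-and-after comparison for such a segment is simple arithmetic once the tracking is in place. The sketch below uses invented figures for a "used search" segment (the kind you might build from a Google Tag Manager event fed into a custom segment):

```python
# Hypothetical "used search" segment, two weeks before and after making
# the search bar more prominent. All counts are invented for illustration.
before = {"search_users": 1_200, "search_conversions": 60}
after  = {"search_users": 1_900, "search_conversions": 114}

cr_before = before["search_conversions"] / before["search_users"]  # 5.0%
cr_after  = after["search_conversions"] / after["search_users"]    # 6.0%

print(f"Search usage: {before['search_users']} -> {after['search_users']}")
print(f"CR of searchers: {cr_before:.1%} -> {cr_after:.1%}")
```

Here both the size of the segment and its conversion rate moved - the two questions (did more users search, and did searchers convert better?) need to be answered separately, since either can improve without the other.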
For JFDIs, where testing does come in handy is in testing iterations, or multivariate (MVT) testing. If there are multiple solutions to a problem, testing iterations of that solution works well to see which provides the largest increase, if any.
As a result, there are those improvements that don't require testing, leaving the ones that should be tested.
VWO state that "Almost anything on your website that affects visitor behavior can be A/B tested". Whilst that might be true, not everything affects user behaviour. They start their list with 'headlines' and 'sub-headlines', and we would ask: would such a change truly affect user behaviour? Perception, perhaps - but not behaviour.
This is where subjectivity often enters the fold, and where an experienced pair of hands comes into play: what counts as 'changing behaviour'? The answer is, no doubt, itself subjective. There is, for example, the possibility of affecting user behaviour directly or indirectly. This is where prioritisation is required.
Examples might include:
- Testing whether, when a product is added to the basket, the user is taken directly to the basket or, instead, a notification appears
- Implementing products on the homepage to direct users straight to those products instead of re-routing them through a category structure
- Reducing, or increasing, the amount of fields on your enquiry form
Taking the last point as an example: when testing form fields for an entertainment event company, Oli Gardner reduced the number of form fields to reduce friction and directly affect the behaviour of the user. Makes sense, and best-practice theory would agree. However, the result was a 14% drop in conversion rate.
This is exactly why it's important to test these changes.
Improvements that have a direct impact on behaviour should always be tested. Changing user behaviour can directly affect conversion rate and have a dramatic impact. For us as conversion optimisers, this is great; we have the potential to shift user behaviour dramatically in a positive direction. But it leaves the strong possibility that, if not tested, changing user behaviour could dramatically hurt conversion rates - as in the Oli Gardner example above.
There's a reason why headline testing is so popular within the world of A/B testing. Take the example given by Adam Mordecai, where two headlines were tested on an ad and one got 200x more views than the control. Why? Because they were changing perception - in that specific example, shifting perception towards both curiosity and mystery at the same time.
Take another example - the famous Obama campaign test cited by Optimizely (surely they have to retire this example soon, right?). In it, a 'family image' gave a noticeable 13.1% uplift. Why? Appealing to users' emotional connection in order to alter their perception of the campaign was clearly a motivating factor, if a sub-conscious one.
"Why does this work?" isn't the question we should be asking ourselves, but rather: why should this be tested?
Well, we're testing hypotheses and validating them with an implementation in a controlled environment (though 'controlled' can certainly be debated). Only in practice can these experiments tell us whether a solution will fail or succeed; otherwise it remains hypothetical. Even with all the data in the world, and even if an experiment is validated ten times over by multiple sources, a hypothesis still remains a hypothesis: an untested, theoretical belief.
Paradigm shifts in the user model should generally be tested, as they are harder to quantify - perception especially. Not only is this the riskiest type of testing, but because it affects core user motivations, it also brings with it the most value.
In addition - as with usability testing, only more so here - when testing perception specifically there are an infinite number of solutions, and it's down to us as conversion optimisers to find which solution, or combination of solutions, works best. Say, for example, insight suggests that users are worried about payment security. How many ways can we reassure them and reduce that anxiety? Infinite. That's iterative testing at its finest.
An effective and structured conversion rate optimisation program is one that has the sole objective of improving the commercial standing of a company. We often describe it as an effective use of your marketing spend. As a result, it is vital to keep the program itself effective by prioritising amends and spending resource on those experiments that are 'true' experiments.
It's up to you to determine what a 'true' experiment is. For us, amongst many other factors, that means asking: "Does this experiment affect perception or behaviour?" As a pre-qualifying question, do ask yourself this before experimenting.