Better CRO: Best Practices for the Experimentation Document

Conversion rate optimization (CRO) is the process of studying a user-facing experience, whether a website, an app, or a marketing campaign, in order to improve it. In enterprise-scale websites, experimentation programs can be extensive and multi-layered. Imagine an experimentation program scaled up ten times. That’s five tests running simultaneously and another five in development or QA, waiting on results from the five that are running. That’s the simplified story; in practice, some experiments will get initial approval and then be rejected by stakeholders, and a diligent CRO practitioner will always be creating new test ideas and presenting them to stakeholders so the experimentation pipeline does not run dry.

All experiments need to be documented. Creating and managing well-structured experimentation documents (exdocs for short) will save the CRO practitioner a great deal of time and give stakeholders clear reporting on an experiment’s details.

What’s In an Experimentation Document?

The experimentation document has the following information:

Problem statement
Hypothesis
Mock-ups
The KPIs and KBRs
Technical requirements
Experiment data
Experiment conclusions
Next steps

Our Example and Filenaming Convention

The first step in keeping an enterprise-level CRO program nice and tidy is to create an agreed-upon naming convention for all documents about, and references to, a given experiment.

Let’s go through an example of an experimentation document for a fictional large international airline called “Pronto Air.”

You can use whatever naming convention you like, as long as everyone agrees on it and it’s documented. For this example, the exdoc naming convention will include codes for the following pieces of information:

URL/domain
Business division
Step in path
Element in question
Challenger
Date (ending in go-live date of experiment)

URLs

The CRO practitioner’s remit is to improve conversions for two URLs: prontoair.com (B2C passenger facing) and prontocargo.com (B2B site for air freight reservations).

“PA” will represent the retail passenger facing site prontoair.com and “PC” will represent the cargo business web site. This naming example will only use the retail site as most readers will be familiar with the purchase process of a plane ticket. The theoretical exdoc has a prefix “PA” to define which domain is being experimented upon.

Division

Pronto Air has several divisions that are pretty siloed and almost treated as separate entities. Here’s how we’ll shorthand them:

flight booking (FB)
hotel and car reservations (HC)
loyalty program (LP)
gift cards (GC)

So far the exdoc file name will start with “PA_FB”.

Step in Path/Element in Question

In this experiment it is believed that the seat selection tool in the flight booking path is causing confusion and, in turn, drop-offs from the purchase funnel that result in fewer sales. The seat selector is not available for all flights, so its appearance on the page is dynamic depending on the flight selected. However, the seat selector does appear on “Step 3” of the purchase funnel, and that nomenclature is understood by all. Let’s refer to step 3 as “S3” and the seat selector as “SS.”

The exdoc name now reads “PA_FB_S3_SS,” which indicates that the experiment takes place on prontoair.com (PA) in the flight booking path (FB) on step 3 of that path (S3) and in the seat selection tool (SS).

In a healthy enterprise-level CRO program, dozens, perhaps hundreds, of experiments will be conducted in a given year. If you want to look up all the tests that have occurred on the seat selector tool, you need only search by “S3_SS” to bring up the relevant files.
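As a rough illustration of that kind of lookup, here is a minimal Python sketch. The function name find_exdocs and every file name except the seat selector one are hypothetical, invented only to show the idea. The sample names also carry the challenger and date segments described in the sections that follow.

```python
# Hypothetical exdoc file names. Only the first follows this article's
# worked example; the others are made up for illustration.
exdoc_files = [
    "PA_FB_S3_SS_Brighter_Selected_Seat_2020.12.27",
    "PA_FB_S3_SS_Larger_Seat_Map_2021.03.15",
    "PA_FB_S2_FL_Sort_By_Price_2021.01.10",
    "PA_LP_S1_SU_Shorter_Signup_Form_2021.02.01",
]


def find_exdocs(file_names, code):
    """Return the file names whose underscore-separated codes contain `code`."""
    # Wrapping both sides in underscores makes "S3_SS" match only whole
    # segments, so a code such as "S3_SSX" would not be picked up by accident.
    wrapped = f"_{code}_"
    return [name for name in file_names if wrapped in f"_{name}_"]


seat_selector_tests = find_exdocs(exdoc_files, "S3_SS")
print(seat_selector_tests)
```

In practice the same search works in any file browser or shared drive; the point is only that consistent codes make past experiments easy to retrieve.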

Challenger

Let’s say our hypothesis is that when users select their preferred seat, the seat changes color, but the shade is too light. For certain visitors and certain hardware display settings, it may leave the visitor wondering whether the selection happened at all. The experiment will therefore test a more visible seat selection color (the “challenger”) against the current design (the “control”), and we’ll call it “brighter selected seat.” Note that we only name the challenger and not the control.

Our file name has grown to: “PA_FB_S3_SS_Brighter_Selected_Seat”

Date

The last part of the filename is the date. The date should be changed as the experimentation document moves through the production/approval process, and the final date should be when the test goes live. For example:

“PA_FB_S3_SS_Brighter_Selected_Seat_2020.12.25” date stamp reflects stakeholder sign off.
“PA_FB_S3_SS_Brighter_Selected_Seat_2020.12.26” date stamp reflects requested change to audience targeting.
“PA_FB_S3_SS_Brighter_Selected_Seat_2020.12.27” shows the go-live date and at this point the file name is locked down for the length of the experiment.
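If the file names are generated rather than typed by hand, the convention is easier to keep consistent. The following Python sketch assembles a name from its parts. The function build_exdoc_name and its parameter names are assumptions made for illustration, not part of any tool.

```python
from datetime import date


def build_exdoc_name(domain, division, step, element, challenger, stamp):
    """Join the convention's codes into one exdoc file name (hypothetical helper)."""
    # "brighter selected seat" -> "Brighter_Selected_Seat"
    challenger_part = "_".join(word.capitalize() for word in challenger.split())
    # The date stamp uses the year.month.day layout from the examples above.
    date_part = stamp.strftime("%Y.%m.%d")
    return "_".join([domain, division, step, element, challenger_part, date_part])


name = build_exdoc_name(
    "PA", "FB", "S3", "SS", "brighter selected seat", date(2020, 12, 27)
)
print(name)  # PA_FB_S3_SS_Brighter_Selected_Seat_2020.12.27
```

Re-running the helper with a new date at each approval step produces the renamed file, and the go-live date becomes the final stamp.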

The Problem Statement

The problem statement is the “why” of an experiment.

The problem statement identifies which part of the user experience might be causing friction in a purchase funnel or another type of conversion. This statement can be gleaned from quantitative or qualitative data.

Quantitative data: Usually web analytics. By slicing and dicing the web analytics one can discover points of struggle for the customer to get to the next page or other goal you have set out for them to complete. Using the seat selector example, it has been noted that flights without the seat selection option convert at higher rates.
Qualitative data: Qualitative data consists of things like survey comments, customer service calls, and focus groups. Qualitative data is important because it’s a direct response from the customer. While the web analytics can show us that the funnel drop off is pronounced at the seat selection tool, it can’t say why. The all-important “why” comes from people-based qualitative data.

In the seat selection example, we have the analytics showing the funnel drop off at the seat selection tool, but they don’t tell us “why.” If we go through dozens of survey comments, though, we see that users have commented “can’t select seat, made reservation on the phone.” Now we have quantitative data from the web analytics AND a qualitative data point backing that up. It’s not that visitors don’t want to choose their seat (take that option away and watch the complaints flood in); it’s that they feel they cannot do so given the current user experience.

With that information, our problem statement can be made: “The seat selector is causing confusion and leading to high drop off rates, based on evidence of higher drop off rates for flights involving the seat selector and a user observation.”

The Hypothesis

The hypothesis is the “what” of the experiment.

Hypothesis: “Upon selecting a seat for a given flight, the color of the selected seat only changes slightly and may lack visibility. The lack of certainty in this step may cause users to switch channels and use the phone or abandon the purchase entirely.”

The Mock-ups

A picture is worth a thousand words. Make a mockup of the challenger design and place it in the exdoc alongside the control experience, so the reviewer can easily compare the changes to the current design. “Make it look like this” is the best specification you can give a programmer.

The KPIs and KBRs

It’s always important to define the key performance indicators (KPIs) and key business requirements (KBRs) for each experiment. The KBRs are more global in nature; the KPIs are more tactical. For example, an ecommerce site that sells electronics might have a KBR of “increase take rate of protection plans” and KPIs of “purchased with plan,” “put plan in cart” and “visited plan page.” The KPIs are ordered by importance: “purchased with plan” is the main target for success, as it directly relates to the KBR. However, should the experiment not show any lift on the “purchased with plan” metric, we can look to the other KPIs for lift in engagement. Those data points can then be used to determine what got the visitor interested enough to put a plan in their cart or learn more about the plans, even if that curiosity didn’t carry through to checkout.
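To make the idea of ordered KPIs concrete, here is a small Python sketch that walks the KPIs from the electronics example in priority order and reports the first one showing lift. Everything beyond the KPI names is an assumption: the counts are invented, "lift" is taken to mean the relative change in conversion rate between control and challenger, and the helper names are hypothetical. The sketch also ignores statistical significance, which a real analysis would need to check.

```python
# KPIs in priority order, taken from the electronics example above.
kpi_order = ["purchased with plan", "put plan in cart", "visited plan page"]

# Hypothetical (conversions, visitors) per experience. Not real results.
control = {
    "purchased with plan": (100, 10000),
    "put plan in cart": (300, 10000),
    "visited plan page": (900, 10000),
}
challenger = {
    "purchased with plan": (100, 10000),
    "put plan in cart": (360, 10000),
    "visited plan page": (990, 10000),
}


def relative_lift(control_counts, challenger_counts):
    """Relative change in conversion rate, challenger versus control."""
    control_rate = control_counts[0] / control_counts[1]
    challenger_rate = challenger_counts[0] / challenger_counts[1]
    return (challenger_rate - control_rate) / control_rate


def first_kpi_with_lift(order, control_data, challenger_data):
    """Return the highest-priority KPI with positive lift, or None."""
    for kpi in order:
        if relative_lift(control_data[kpi], challenger_data[kpi]) > 0:
            return kpi
    return None


best_kpi = first_kpi_with_lift(kpi_order, control, challenger)
print(best_kpi)
```

With these made-up counts the primary KPI is flat, so the sketch falls through to the secondary KPIs, mirroring the reasoning in the paragraph above.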

Technical Requirements

The technical specs are largely for the programmer’s use, although they could easily be of interest to other stakeholders. Since we already have our “make it look like this” mock-up, the technical specs may not be necessary at all except for details of the changes to the control. Using our Pronto Air seat selector example, the technical specs could be as basic as “Color change: #32a892” indicating the hex value of the color, so the programmer is not left guessing what color the mock-up intends.

In some workflows a programmer might develop the code but not QA the work, or put the code into an experimentation platform. In that case the technical requirements page would be the location for the code change along with instructions on where to swap it out in the challenger.

Other data that would be good to put here: any browsers to be excluded from the experiment, audience targeting details and so forth.

Experiment Data

The experiment data page should be the master record of data for a given experiment, and the CRO practitioner needs to control that data tightly. Using Pronto Air as an example, what if the New York Jets and their staff bought up all the first-class seats and a big chunk of the business-class seats on a given flight to accommodate the players’ large physiques? Suppose the person making that booking lands in the challenger experience. That single purchase would be an outlier in the data, making the test appear more successful than it really is. If one does not control the data, false narratives and assumptions can take hold.

When you remove outliers such as the Jets’ first-class seat purchases (or apply any other data smoothing technique) for presentations to stakeholders, always note the filtering method on the experiment data page. Total honesty and transparency are critical for the experiment data page. As data comes in and is smoothed (if necessary), the page becomes the source of information and analysis for meetings and communications with stakeholders.
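One way to keep the filtering honest is to produce the note at the same moment the filter is applied. The Python sketch below does this with a simple seats-per-booking cutoff. The bookings, the cutoff of nine seats, and the helper name filter_outliers are all assumptions made for illustration; the real rule should be whatever the practitioner and stakeholders agree on.

```python
# Hypothetical challenger bookings: (booking id, seats, revenue in dollars).
# These rows are invented for illustration and are not real data.
bookings = [
    ("B001", 1, 420.0),
    ("B002", 2, 810.0),
    ("B003", 60, 54000.0),  # a team-sized block purchase, like the Jets example
    ("B004", 1, 390.0),
]

MAX_SEATS = 9  # assumed cutoff for a "normal" booking


def filter_outliers(rows, max_seats):
    """Split off oversized bookings and describe exactly what was removed."""
    kept = [row for row in rows if row[1] <= max_seats]
    removed = [row for row in rows if row[1] > max_seats]
    removed_ids = ", ".join(row[0] for row in removed)
    note = (
        f"Removed {len(removed)} booking(s) with more than "
        f"{max_seats} seats: {removed_ids}"
    )
    return kept, note


kept, filter_note = filter_outliers(bookings, MAX_SEATS)
filtered_revenue = sum(row[2] for row in kept)
print(filtered_revenue)  # 1620.0
print(filter_note)       # Removed 1 booking(s) with more than 9 seats: B003
```

The note string can be pasted straight onto the experiment data page next to the filtered figures, so anyone reading the results can see what was excluded and why.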

Experiment Conclusions

On this page of the exdoc, the CRO practitioner will summarize the findings from the experiment data page: special insights, trends, and the data written out in plain language, because some people just don’t like numbers.

The experiment conclusions page should only be filled in once a given experiment has either been taken down for underperformance or been put in line for base code development to make the successful challenger experience the new control experience.

Next Steps

The next steps page of the exdoc will take the information from the experiment conclusions page and iterate on the findings for possible next tests. To go back to our seat selection example test for Pronto Air, it might say something like “The increased brightness of a selected seat had a positive impact on flight bookings. Due to this finding it is recommended to revisit all CTA buttons and selection experiences in general. One interesting place to start would be….”

The purpose of the next steps page is to keep the program running with fresh test ideas and to communicate that this is an ongoing process.

In Summary

If you’re organized and follow the method above for making an exdoc, you’ll have everything you need in one place for the full life cycle of a test and the testing program’s history.

It will contain:

The problem and the hypothesis, to ensure experiments are not chosen randomly and that there was logic to why a given experiment took place.
The mock-ups so it’s 100% clear to business users and developers what the changes are and how they’ll look on the page.
The KBRs and KPIs to ensure that there is an understanding of overall business needs (the KBR) and that there is agreement to the KPIs chosen as the primary and secondary goals for a given experiment.
The technical specs, which communicate to development resources what to put in their code or, conversely, what code a developer has created for the practitioner to enter into an experimentation platform.
The experiment data page, which is perhaps the most important page. It will contain all the data, in numeric form, that shows how a test performed. It needs to be communicated clearly that the experiment data page, not the testing platform, is the master record, as test data sometimes requires refinement to get to the truth.
The experiment conclusions and next steps pages, which show that the test has provided insight (even if the challenger experience was negative on the KPIs) and that the new learning, or learnings, will lead to a more informed testing program that stays on a steady pace.