Sunday, October 9, 2011

A/B Testing and Experimentation as a Competitive Advantage

I've touched on the idea of experimentation as a competitive advantage in my Harrah's, CapOne, and Taylorism blogs, but I've avoided tackling it directly so far because the topic is overwhelming. Experimentation is central to the scientific method, is a whole discipline within statistics (Design of Experiments), and is rapidly shaping the course of digital marketing. In one enlightening demonstration of its importance, Jeff Bezos fired a web design firm for changing the website without running an experiment first. The best indicator of experimentation's growing importance, though, is the emphasis placed on it in the McKinsey Big Data white paper.


Why all the fuss over experimentation? Because experimentation is data-driven decision making. If your decision-making process doesn’t use statistically valid experimentation, then you are a dinosaur.

A/B testing is the first step in creating an experimentation-based competitive advantage. Let’s start with an example.


A direct marketer sends letters to 100 potential customers, randomly assigning 50 of them to Group A (the control) and 50 to Group B (the test group). The ‘A’ letters are sent on the normal stationery, while the ‘B’ letters are sent on ornate, colored, watermarked stationery. Customers who respond are tracked when they place an order, and the marketer tallies the results. If Group B has a significantly higher response rate than Group A, the marketer adopts the new stationery as standard.
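
To make "significantly higher" concrete, here is a minimal sketch of one common way to compare the two response rates: a one-sided two-proportion z-test. The response counts (5 of 50 for Group A, 12 of 50 for Group B) and the function name are made up for illustration and do not come from any real campaign. With groups this small the normal approximation is rough, so an exact test such as Fisher's would be a reasonable alternative.

```python
import math


def two_proportion_z_test(resp_a, n_a, resp_b, n_b):
    """One-sided test of whether Group B's response rate exceeds Group A's.

    Returns (z, p_value), where p_value = P(Z >= z) under the null
    hypothesis that both groups share the same response rate.
    """
    p_a = resp_a / n_a
    p_b = resp_b / n_b
    # Pooled rate: best estimate of the common rate if the stationery has no effect
    p_pool = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("Response rates of 0% or 100% in both groups cannot be tested")
    z = (p_b - p_a) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical tallies: 5 of 50 'A' letters and 12 of 50 'B' letters led to an order
z, p_value = two_proportion_z_test(resp_a=5, n_a=50, resp_b=12, n_b=50)
adopt_new_stationery = p_value < 0.05
print(f"z = {z:.2f}, p = {p_value:.3f}, adopt = {adopt_new_stationery}")
```

With these made-up counts the p-value falls below the conventional 0.05 threshold, so the marketer would adopt the new stationery.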


This technique enables objective comparison of alternatives and makes iterative, incremental improvements possible. Although the method was pioneered in direct mail marketing, it has become a standard tool in digital marketing. One extreme example involved Google deciding between two shades of blue for a webpage. True to their reputation, they A/B tested 41 different shades of blue to determine which inspired the most clicks. 


Google and Amazon aren’t the only big names using this technique. Netflix, Zynga, eHarmony, and Microsoft are on the list too, along with every internet advertiser using Google Website Optimizer.


Moving Beyond A/B Testing

A/B testing is ideally suited to comparing two alternatives. It is the first tool in the experimenter’s toolkit, but it is easily overwhelmed by non-incremental comparisons in which many factors vary at once. For example, if you wanted to know the optimal conditions for car engine performance, you would need to vary temperature, humidity, air pressure, and fuel octane, and then compare the results. There are thousands of such combinations, so we need more powerful tools to answer such questions. I will save the discussion of more sophisticated methods for a later blog.
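
A rough sketch of why the combinations pile up so quickly: the factor names come from the engine example above, but the specific levels are hypothetical and were picked only to make the counting concrete.

```python
import math
from itertools import product

# Hypothetical levels for each factor (illustrative values only)
factors = {
    "temperature_c": range(-10, 50, 10),  # 6 levels
    "humidity_pct": range(10, 100, 10),   # 9 levels
    "pressure_kpa": range(80, 110, 5),    # 6 levels
    "fuel_octane": (87, 89, 91, 93),      # 4 levels
}

# Every combination of levels is one distinct operating condition to test
conditions = list(product(*factors.values()))
n_conditions = len(conditions)

# Comparing the conditions head to head, two at a time, A/B style
n_pairwise_tests = math.comb(n_conditions, 2)

print(n_conditions, n_pairwise_tests)
```

Even with this coarse grid there are 1,296 conditions and 839,160 possible head-to-head comparisons, which is why pairwise testing stops being practical once several factors are in play.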


Quantification of Benefits


A recent New York Times article, “When There’s No Such Thing as Too Much Information,” reported that companies using data-driven decision making (DDDM) were 6% more productive than could be explained by other factors. I would point to experimentation as the differentiator, citing the following success stories from Google Website Optimizer:

  • Mattress Liquidators increased online leads 5,000%
  • Doba.com increased conversions 70% and sign-ups 50%
  • Jigsaw Health increased conversions 60%
  • Tourism BC boosted conversion rates by 7 percent over previous campaigns
  • BC Finance saw an increase in conversion rates exceeding 15 points
  • Moishe's Moving Systems increased its lead collection rate by 50%
  • Dr. Gary Berger improved conversions 225% with Google Website Optimizer
  • Calyx Flowers used Website Optimizer to drive a 14% increase in the number of customers adding items to cart

Note that Google Website Optimizer is free and turnkey, so every performance improvement went straight to the bottom line.


Commentary




Data Driven Decision Making Requires a Culture of Discipline: People don’t want to abandon their intuition when making decisions, so the corporate culture needs to encourage them to use DDDM. Furthermore, accurately quantifying the effect of marketing campaigns requires that you leave many customers out as a control group. Forgoing these incremental revenues for the sake of quantification accuracy will always be a difficult sell and will require cultural support.


Led Astray By Economics Classes: So many business people learned economics that we’ve lost touch with the scientific method. This is because much of economics is unscientific. For example, the axiom of the rational consumer was first contradicted by Kahneman and Tversky and is now regularly refuted in behavioral economics; the axiom of ‘normally distributed financial market returns’ was refuted by Eugene Fama and Nassim Taleb; and John Nash showed that the pursuit of self-interest does not necessarily lead to market-optimal outcomes. And yet none of these blatant contradictions prevents laissez-faire economics or ‘rational consumerism’ from being taught in economics courses.


There is no other ‘quantitative’ discipline where the axioms have proven so flawed. This is largely because nothing in economic theory is tested. ‘Laws’ are ascribed to historical patterns which do not hold in different time periods or countries. For example, the Phillips curve is taught in every macroeconomics class, but data has never supported the existence of this law outside of the original data set that was used.


Is it any wonder then, that after this training, most business people do not experiment but seek interpretations of historical data to support their conclusions?


Local Optima vs. Global Optima: A/B testing is ‘iterative optimization’. It allows you to improve a tiny bit every day, but eventually your results reach an optimum and stop improving. The question then is, “How do we find a new starting point?” The lucrative way to answer this question is with vision. Amazon Silk achieved a 200x improvement in browser speed by re-thinking the browser. Apple took over the smartphone market by re-thinking the phone rather than by creating a slightly improved BlackBerry. These radical improvements do not result from iterative optimization, but from insight and new technology.
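
The difference between a local and a global optimum can be sketched with a toy hill climb. Everything here is hypothetical: `scores` is an invented list in which each index stands for a design variant and each value for its measured performance, and `hill_climb` mimics a run of A/B tests that only ever compares the current design with its nearest neighbors.

```python
# Hypothetical performance of 11 neighboring design variants (index = variant)
scores = [1, 3, 5, 4, 2, 1, 3, 6, 9, 7, 4]


def hill_climb(start):
    """Move to the better adjacent variant until neither neighbor is an improvement."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(scores)]
        best = max(neighbors, key=lambda i: scores[i])
        if scores[best] <= scores[current]:
            return current  # no neighbor beats the current design
        current = best


stuck_at = hill_climb(0)      # incremental testing from the existing design
after_jump = hill_climb(6)    # same process after a bold move to a new starting point

print(stuck_at, scores[stuck_at])      # 2 5
print(after_jump, scores[after_jump])  # 8 9
```

Starting from variant 0, the incremental process settles on variant 2 and never discovers the better design at variant 8. Only a different starting point, the kind that comes from vision rather than from testing, gets it there.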





“Putting a Bolder Face on Google.” By Laura M. Holson. The New York Times. February 28, 2009.


“When There’s No Such Thing as Too Much Information.” By Steve Lohr. The New York Times. April 23, 2011.


http://www.hackingnetflix.com/2010/11/netflixs-neil-hunt-says-that-netflix-ab-tests-everything.html


http://thinkvitamin.com/design/how-eharmony-kills-the-romance-with-ab-testing/



http://en.wikipedia.org/wiki/A/B_testing


http://en.wikipedia.org/wiki/Design_of_experiments