This is another response to a post on Dan Ariely’s blog. This time, he was ranting about companies not trying to validate what they do (or plan to do) through scientific experiments. In principle, I’m with him, but I do have some reservations:
I’m not sure what kind of experiments you were proposing, but I think I can see where these folks are coming from, even though I disagree with their decision and the double-think behind their “logic”. The problem companies have is that an experiment necessarily creates an inconsistent customer experience. If customers find the different experimental groups equally pleasant, that’s not so bad. What stores fear is that customers won’t like one of the test conditions and may be driven away. After all, the whole point of an experiment is to see whether customers react differently; otherwise, there wouldn’t be any “actionable” results.
I had this exact experience at a web company that I used to work for. In general, web companies seem more willing to conduct experiments on their users in order to discover what works “best” (e.g. what leads to more sales). Unfortunately, I’ve found that even when they do decide to conduct an experiment, they’re not always willing to take reasonable measures to ensure accurate and objective results, or they don’t know how.
For example, statistics tells us that removing a test condition during an experiment yields invalid results. This should seem completely intuitive to anyone who has taken middle school science, yet my company was perfectly willing to remove experimental groups (behind my back, and despite my protests) while an experiment was still in progress. If that wasn’t bad enough, the experimental groups they summarily declared “losers” showed no statistically significant difference from the others. I think these misguided decisions were the result of multiple failings:
- Failing to read the reports correctly – The reports (from Google Website Optimizer) made good use of visuals, so this was really inexcusable.
- Not understanding what “statistically significant difference” means – I’ll grant that determining whether a difference is statistically significant requires some sophistication, but when the report calculates it for you, ignoring it amounts to knowingly and willingly going against science.
- Being too impatient – The hope was that by concentrating our users into the remaining test conditions, we would reach a conclusion more quickly. But this was a fairly popular website, which meant we did not need to run an experiment very long before sufficient data was gathered anyway.
- Being afraid of the impact the experiment itself might have on users – My boss was particularly paranoid about experiment code slowing down the site and driving away users. But if you’re not willing to pay the cost of a temporarily slower site in order to answer a question definitively, then the question is of such low value that you shouldn’t be spending development time setting up the experiment in the first place.
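As an aside, even without a report that does it for you, checking whether a difference in conversion rates is statistically significant takes only a few lines. Here is a minimal sketch of a two-sided, two-proportion z-test in Python; the visitor and conversion counts are hypothetical, purely for illustration:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical numbers: 5,000 visitors per group, 100 vs. 120 conversions
z, p = two_proportion_z_test(100, 5000, 120, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p > 0.05 here: no significant difference
```

With these made-up numbers, a 2.0% vs. 2.4% conversion rate is not significant at the conventional 0.05 level; declaring a “loser” on that basis is exactly the mistake described above.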
I think they equated being scientific with being bureaucratic: not decisive, or, to use an industry buzzword, not “agile”.
In addition to our statistical problems, we intentionally failed to control for important factors such as weekly usage patterns. I advocated running each experiment for at least one full week, so that it would include our Saturday users as well as our Monday users, but my appeals fell on deaf ears. This was further evidence that impatience and fear of the experiment’s impact (problems 3 and 4 above) were undermining our ability to conduct experiments with reasonable rigor.
What this experience taught me is that we don’t always want companies to run experiments, because some companies will do them flagrantly wrong. I’m not talking about subtle mistakes, either; these would be obvious to any eighth grader. In such cases, it may be better for companies to forgo experiments entirely: at least they would avoid misleading results, as well as the Bionicals phenomenon, to which I was violently subjected.