Friday, August 13, 2010

Does Analysis Of Competing Hypotheses Really Work? (Thesis Months)

The recent announcement that collaborative software based on Richards Heuer's famous methodology, Analysis of Competing Hypotheses, would soon be open-sourced was met with much joy in most quarters but some skepticism in others.

The basis for the skepticism seems to be the lack of hard evidence that ACH actually improves forecasting accuracy.  While this was not the only (and may not have been the most important) reason why Heuer created ACH, it is certainly a question that bears asking.
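For those who haven't worked through Heuer's book, the mechanics of ACH are simple enough to sketch in a few lines of code: list your hypotheses, rate each piece of evidence for consistency against every hypothesis, and tentatively favor the hypothesis with the fewest inconsistencies. The toy matrix below (hypotheses, evidence and ratings all invented for illustration) shows the core scoring step:

```python
# Minimal sketch of ACH's core scoring step. Hypotheses, evidence,
# and consistency ratings below are invented for illustration.
# Ratings: "CC" very consistent, "C" consistent, "N" neutral,
# "I" inconsistent, "II" very inconsistent.

INCONSISTENCY_WEIGHT = {"CC": 0, "C": 0, "N": 0, "I": 1, "II": 2}

hypotheses = ["H1: Gregoire wins", "H2: Rossi wins"]

# Each row: (evidence item, rating vs. H1, rating vs. H2).
matrix = [
    ("Statewide poll shows Gregoire +5",    "C", "I"),
    ("Strong Republican primary turnout",   "I", "C"),
    ("Incumbent approval rating above 50%", "C", "II"),
]

# ACH scores by *inconsistency*: the working favorite is the hypothesis
# the evidence does the least to rule out, not the one with the most support.
scores = {h: 0 for h in hypotheses}
for evidence, *ratings in matrix:
    for hypothesis, rating in zip(hypotheses, ratings):
        scores[hypothesis] += INCONSISTENCY_WEIGHT[rating]

for hypothesis, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{score} inconsistency points: {hypothesis}")
```

That inconsistency-counting convention (supportive evidence counts for nothing; only disconfirming evidence discriminates) is the heart of Heuer's approach and the piece most analysts find counterintuitive.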

No matter how good a methodology is at organizing information, creating an analytic audit trail, easing the production burden, etc., the most important element of any intelligence methodology would seem to be its ability to increase the accuracy of the forecasts it generates (over what is achievable through raw intuition).

With a documented increase in forecasting accuracy, analysts should be willing to put up with almost any tedium associated with the method.  A methodology that actually decreases forecasting accuracy, on the other hand, is almost certainly not worth considering, much less implementing.  Methods which match raw intuition in forecasting accuracy really have to demonstrate that the ancillary benefits derived from the method are worth the costs associated with achieving them.

It is with this in mind that Drew Brasfield set out to test ACH in his thesis work while here at Mercyhurst.  His research into ACH and the results of his experiments are captured in his thesis, Forecasting Accuracy And Cognitive Bias In The Analysis Of Competing Hypotheses (full text below or you can download a copy here).

To test ACH, Drew used 70 students, all familiar with ACH, divided between a control group and an experimental group.  The groups were asked to research and estimate the results of the 2008 Washington State gubernatorial election between Democrat Christine Gregoire and Republican Dino Rossi (Gregoire won the election by about 6 percentage points).  The students were given a week in September 2008 to independently work on their estimates of who would win the election in November.

The results were in favor of ACH in terms of both forecasting accuracy and bias.  In Drew's words, "The findings of the experiment suggest ACH can improve estimative accuracy, is highly effective at mitigating some cognitive phenomena such as confirmation bias, and is almost certain to encourage analysts to use more information and apply it more appropriately."

The results of the experiment are displayed in the graphs below:
Statistical purists will argue that the results did not clear the traditional 95% confidence (p < 0.05) threshold, suggesting that the accuracy difference may be due to chance. True enough. What is clear, though, is that ACH doesn't hurt forecasting accuracy, and this result, combined with the other results from the experiment (see below), strongly suggests that Drew's characterization of ACH is correct.
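To make the statistical objection concrete: with two groups of roughly 35 students each, even a sizeable gap in hit rates can fail to clear the p < 0.05 bar. The sketch below runs a standard two-proportion z-test on invented counts (the real figures are in the thesis) just to show the arithmetic:

```python
# Two-proportion z-test using only the standard library. The counts
# below are invented for illustration; the actual figures are in the thesis.
from math import erf, sqrt

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Return (z statistic, two-sided p-value) for H0: p_a == p_b."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal CDF tail
    return z, p_value

# Hypothetical: 24 of 35 correct with ACH vs. 18 of 35 without.
z, p = two_proportion_z(24, 35, 18, 35)
print(f"z = {z:.2f}, p = {p:.3f}")  # p ~ 0.14: a 17-point gap, still "chance"
```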

Because Drew captured the political affiliation of his test subjects before he conducted his experiment, he was able to sort those subjects more or less evenly into the control and experimental groups.  Here again, ACH comes away looking pretty good:
The chart may be a bit confusing at first, but the bottom line is that Republicans were far more likely to accurately forecast the eventual victory of the Democratic candidate if they used ACH.  Here again, the statistics suggest that chance might play a larger role than normal (an effect exacerbated by the even smaller sample sizes for this test).  At the least, however, these results are consistent with the first set of results and, again, do nothing to suggest that ACH does not work.
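The small-sample caveat is easy to illustrate. Once the 70 subjects are split by party affiliation, each cell of the comparison holds only a handful of people, and an exact test is the honest choice. Here is a sketch using scipy's Fisher exact test on an invented 2x2 table (the real subgroup counts are in the thesis):

```python
# Fisher's exact test on an invented 2x2 table of Republican subjects
# (correct vs. incorrect forecast, by group). Counts are hypothetical.
from scipy.stats import fisher_exact

#               correct  incorrect
table = [[7, 3],   # ACH group
         [3, 7]]   # control (intuition) group

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# p ~ 0.18: with only ten subjects per row, even a 70% vs. 30% hit-rate
# split cannot be distinguished from chance at the 0.05 level.
```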

Drew's final test is the one that helps clarify any fuzziness in the results so far.  Here he was looking for evidence of confirmation bias -- that is, analysts searching for facts that tend to confirm their hypotheses instead of looking at all facts objectively.  He was able to find statistically significant amounts of such bias in the control group and almost none in the experimental group:
It is difficult for me to imagine a method that works so well at removing biases yet does not also improve forecasting accuracy. In short, based on the results of this experiment, concluding that ACH doesn't improve forecasting accuracy (due to the statistical fuzziness) would also require one to conclude that biases don't matter when it comes to forecasting accuracy. This is an arguable hypothesis, I suppose, but not where I would put my money...
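Drew's actual measure of confirmation bias is laid out in the thesis. One simple way to operationalize the idea, sketched below with invented data, is to ask what fraction of the evidence a subject cites supports the hypothesis that subject started with; an unbiased search of a balanced evidence pool should hover near 50%:

```python
# One plausible way to operationalize confirmation bias (Drew's actual
# measure is described in the thesis). All data below are invented.

def confirmation_share(initial_pick, cited_evidence):
    """Fraction of cited evidence items that favor the subject's initial pick."""
    supporting = sum(1 for favors in cited_evidence if favors == initial_pick)
    return supporting / len(cited_evidence)

# Each subject: initial pick, then which candidate each cited item favored.
subjects = {
    "control subject": ("Rossi", ["Rossi", "Rossi", "Rossi", "Gregoire"]),
    "ACH subject":     ("Rossi", ["Rossi", "Gregoire", "Gregoire",
                                  "Rossi", "Gregoire"]),
}

for label, (pick, cited) in subjects.items():
    share = confirmation_share(pick, cited)
    print(f"{label}: {share:.0%} of cited evidence confirms the initial pick")
# Shares persistently above ~50% against a balanced evidence pool are
# the signature of confirmation bias; ACH's evidence-first workflow is
# designed to pull that number back down.
```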

The most interesting part of the thesis, in my opinion, is the conclusion.  Here Drew makes the case that the statistical fuzziness was a result of the kind of problem tested, not of the methodology.  He suggests that "ACH may be less effective for an analytical problem where the objective probabilities of each hypothesis are nearly equal."

In short, when the objective probability of an event approaches 50%, ACH may no longer have the resolution necessary to generate an accurate forecast.  Likewise, as objective reality approaches either 0% or 100%, ACH becomes increasingly less necessary, as the correct estimative conclusion is more or less obvious to the "naked eye". Close elections, like the one in Washington State in 2008, may therefore be beyond the resolving power of ACH.
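This resolution argument is easy to demonstrate with a toy model. Suppose (all numbers invented) that both intuition and ACH get better at picking the favorite as the outcome gets more lopsided, with ACH starting 20 points ahead. The measurable edge then collapses at both ends of the probability scale:

```python
# Toy model of the "resolving power" argument. Let p be the objective
# probability of one outcome and d = |2p - 1| its lopsidedness. Assume
# (numbers invented) intuition backs the favorite with probability
# min(1, 0.5 + d) and ACH with min(1, 0.7 + d): both improve as the
# answer gets obvious, with ACH starting 20 points ahead.

def expected_accuracy(skill, p):
    # Right when you back the favorite, wrong when you back the long shot.
    return skill * max(p, 1 - p) + (1 - skill) * min(p, 1 - p)

for p in (0.50, 0.55, 0.65, 0.80, 0.95):
    d = abs(2 * p - 1)
    intuition = expected_accuracy(min(1.0, 0.5 + d), p)
    ach = expected_accuracy(min(1.0, 0.7 + d), p)
    print(f"objective p = {p:.2f}: ACH edge = {ach - intuition:+.3f}")

# The edge is zero at p = 0.50 (nothing beats a coin flip on a coin-flip
# race) and zero again once the outcome is lopsided enough to be obvious
# to the naked eye -- it only shows up in between, which is exactly
# Drew's point about the 2008 Washington race.
```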

Like much good science, Drew's thesis has generated a new testable hypothesis (one we are, in fact, in the process of testing!).  It is definitely worth the time it takes to read.

Forecasting Accuracy and Cognitive Bias in the Analysis of Competing Hypotheses