Saturday, December 6, 2008

Review: Ian Ayres’ Super Crunchers

“You must be a Super Cruncher!”

Over the past year, several people have told me that. They had read Ian Ayres’ book Super Crunchers, and whatever he was talking about, that must be what I do.

From the first time I heard it, I disliked the term “Super Cruncher.” If it referred to a machine that rendered cars into pellets, that would be fine. But as a description of someone who does data mining and analytics, it doesn’t work for me unless that someone is a comic-book character.

So I avoided the book despite numerous accusations of my involvement with Super Crunchery. Yet when I eventually bought the paperback, Super Crunchers won me over.

For the general reader, it provides an engaging tour of real-world, data-driven decisions and their increasing effect in business, medicine, and government. For example:

  • An algorithm that successfully predicts the best wines of a given year before they are even shipped
  • A Web site, Farecast, that not only shows current fares but advises when to buy based on predictions of whether a fare will go up or down
  • A medical campaign to save 100,000 lives from six improved procedures, derived from a statistical analysis of hospital-related mortality data
  • An analysis of whether longer prison sentences affect whether prisoners commit crimes after their release

Such examples appear throughout the book. Ayres organizes them into a larger story about how the science of data-driven decisions works. Of course he covers how predictions can be made from existing data, but he also highlights the value of generating data specifically to answer questions. For example, “Instead of being satisfied with a historical analysis of consumer behavior, CapOne proactively intervenes in the market by running randomized experiments. In 2006, it ran more than 28,000 experiments—28,000 tests of new products, new advertising approaches, and new contract terms.”

Although he only discusses it near the end of the book, Ayres rightly raises the risk of people overrelying on algorithms. That is, much data mining occurs on data that is noisy, incomplete, inadvertently biased during collection, and otherwise on the edge of a garbage-in/garbage-out scenario. Algorithms, and the applications thereof, can have flaws. Even randomized trials can ask the wrong questions, sample the wrong audience, or otherwise put an unseen tilt on reality.

In this context, Ayres makes the point that human intuition and expertise will always have a role in sanity-checking results, as well as framing the questions to ask and choosing the methodologies for answering them. In fact, I’d agree strongly with Ayres’ statement, “The future belongs to the Super Cruncher who can work back and forth and back between his intuitions and numbers.”

Finally, regarding the book’s title that I dislike so much, I still dislike it as a label for people who do data mining and analytics. But it appears to be an effective title for selling books. To his credit, Ayres tested the title against an alternate, using a Google text-ad campaign. “Super Crunchers” got 63% more clickthroughs than the alternate, “The End of Intuition.” Although I could quibble with the quality or number of alternatives in Ayres’ test, the fact that the book made it to the New York Times Business Bestseller list suggests the title did its part of the job.
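For readers curious how a title test like this can be checked, here is a minimal sketch in Python. It computes the relative lift in clickthrough rate and a standard two-proportion z-test. The click and impression counts are hypothetical, chosen only so the lift comes out to 63%; the review does not report the actual numbers from Ayres’ ad campaign, and this is not necessarily how he analyzed it.

```python
import math


def relative_lift(clicks_a, views_a, clicks_b, views_b):
    """Relative increase of A's clickthrough rate over B's (0.63 means 63% more)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    return rate_a / rate_b - 1.0


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test with a pooled rate; returns (z, two-sided p-value)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_a - rate_b) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (assumptions, not data from the book):
# title A shown 10,000 times with 163 clicks, title B shown 10,000 times with 100 clicks.
lift = relative_lift(163, 10_000, 100, 10_000)
z, p_value = two_proportion_z(163, 10_000, 100, 10_000)
print(f"lift = {lift:.0%}, z = {z:.2f}, p = {p_value:.5f}")
```

The same 63% lift could be convincing or meaningless depending on how many impressions sit behind it, which is why the test needs the raw counts and not just the percentage.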

So I’d say Super Crunchers deserves its success, and I’d recommend it without hesitation to the general reader.

That said, even if judged by the standards of a largely anecdotal and nontechnical book, Super Crunchers will no doubt catch flak from some in the data-mining field. Most obviously, many of Ayres’ examples do not qualify for his definition of Super Crunching, which involves “really big” data sets. He never specifically quantifies “really big,” but I can all but guarantee that the wine-prediction example above and many of the social-science examples in the book are based on a few thousand to perhaps tens of thousands of records. By today’s standards, those are nowhere near “really big” data sets.

In addition, one could argue that Ayres’ coverage of randomized trials is actually about the opposite of Super Crunching. That is, the point of most randomized trials is to generate a clean and relatively small data set that answers a question. Done right, there is little need for sophisticated crunching.
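To illustrate how little crunching a well-run randomized trial can require, here is a minimal Python sketch: assign units to two groups at random, then compare the group averages. The function names and the tiny outcome lists are hypothetical, made up for illustration, and are not taken from the book.

```python
import math
import random
import statistics


def randomize(units, seed=None):
    """Randomly split units into two equal-sized groups (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated, control):
    """Estimated treatment effect and its standard error (unequal variances allowed)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, se


# Hypothetical example: ten customer IDs split at random into two groups.
treatment_ids, control_ids = randomize(range(10), seed=1)

# Hypothetical outcomes measured after the experiment (made-up numbers).
effect, std_err = difference_in_means([3, 4, 5], [1, 2, 3])
print(f"estimated effect = {effect}, standard error = {std_err:.3f}")
```

Because random assignment balances the groups on average, a plain difference in means is often all the analysis needed, with no large data set or elaborate model involved.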

Yet these objections just reinforce that Super Crunchers and Super Crunching are somewhat misleading labels for what the book is actually about. Although this will irritate insiders, I suspect general readers won’t care about the semantic distinctions but will benefit from the wider coverage beyond large-scale data mining.

Here’s a link to Super Crunchers at Amazon.com.
