Words & Numbers: October 2005

Sunday, October 30, 2005

When the Means Become the Ends

Organizations can easily do the wrong thing by mistaking the means for the ends. Following are a couple of examples I ran across in the past few days.

Misleading Metrics

Airlines are subject to on-time rankings, a means to demonstrate reliability and thus customer satisfaction. However, these rankings are only based on non-stop flights, so they don’t consider flights that act as connections. And thus we have problems like what Ross Mayfield reports here:

It was abundantly clear last night when my connection was delayed that the airline industry is running on the wrong metrics. Half of the plane missed their connecting flight, most by minutes, when doors were still open, but gates closed — for sake of on-time-departure. The last planes left within a half an hour and we were left stranded in Virginia without hotel rooms in the vicinity.

Much of the airline industry operates via the “hub and spoke” method of multi-hop trips, so this kind of scenario is not a fluke. While it’s true that airlines will sometimes hold the connecting flight in close situations, the on-time metric creates a perverse incentive not to do so, even though the metric exists as a proxy for customer satisfaction.

Method versus Mission

Former CIA Director George Tenet used to describe the CIA’s business as “stealing secrets.” In the CIA’s Studies in Intelligence journal, Stephen Mercado critiques this mindset for conflating a method (stealing secrets) with the organization’s mission, which is to provide actionable intelligence about national security. Mercado argues that another method—improving the CIA’s analysis of freely available information (“open sources”) such as from foreign newspapers—is more effective yet underutilized:

Despite numerous surveys putting the contribution of open sources anywhere from 35 to 95 percent of the intelligence used in the government, [open-sources intelligence’s] share of the overall intelligence budget has been estimated at roughly 1 percent.

Mercado argues that the “stealing secrets” mindset is so deeply ingrained in the CIA’s culture that the method has become the mission. After making an argument for open-sources intelligence, he advocates doubling its budget—to 2% of the overall intelligence budget—which is apparently a radical proposal.

Livening Up Asexual-Fungus Research

Earlier this week, Imperial College London issued a press release about research on Penicillium marneffei, an asexual fungus. In this context, asexual means an organism reproduces without a mate, cloning itself.

In what must have been a desperate attempt to get Cosmopolitan magazine to pick up their story, they titled the press release “Lack of sex could be a signpost to extinction, claim researchers.”

For the record, here it is.

Wednesday, October 26, 2005

Word of the Day: Deroach

I came across the word deroach back in my SRI days, when I often talked to people from the cable-television industry. In that industry, deroach referred to the process of refurbishing a set-top box after it had been returned by a customer. (Example usage: “Those boxes need deroaching.”)

Apparently, analog set-top boxes of the day sometimes ended up as unintentional roach motels, and thus the term. Unlike the better known debug, which brings to mind thoughtful diagnosis of a subtle problem, deroach is thoughtless disposal of an obvious problem—shake ’em out and move on.

I’m blogging this topic because when I searched Google for deroach, I was surprised to find only references to the word as a name for places and people. So for future searchers of the non-name deroach, perhaps you will find your answer in this entry.

And for the rest of you who have inadvertently read this far, that concludes today’s intersection of electronics, entomology, and etymology.

Sunday, October 23, 2005

War on the Wane?

Kudos to Chris Anderson on TEDBlog for highlighting a recent study about armed conflict worldwide, or more to the point, the lessening amount of it. Since peaking in 1992, the number of armed conflicts has dropped 40%. Larger conflicts (those with more than 1,000 battle deaths) are down 80%. As Chris asks, shouldn’t this be news?

For a quick take, read Chris’ post. Or for an executive summary of the research, see the Human Security Report 2005’s Overview.

Because it wasn’t in the report and the source data was easy to get, I created my own illustration of the good news (below). It shows, from 1946 to 2004, the number of nations along with the number of armed conflicts. This relationship matters because most conflicts occur within nations, as with insurgencies and civil wars. So the more nations there are, the more venues for conflicts within nations.

And yet...

I didn’t show it in the graph, but if you divide the number of conflicts by the number of nations, 2003 and 2004 are the two lowest years in the data set.
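The normalization is a simple ratio: conflicts divided by nations, year by year, then sorted. Here is a minimal Python sketch, assuming the source data have been loaded into a dictionary keyed by year. The figures in `sample` are made-up placeholders chosen only to exercise the code, not the Human Security Report’s numbers.

```python
def conflicts_per_nation(data):
    """Return {year: conflicts / nations} for every year in data.

    data maps year -> (number_of_armed_conflicts, number_of_nations).
    """
    return {year: conflicts / nations for year, (conflicts, nations) in data.items()}


def lowest_years(data, k=2):
    """Return the k years with the lowest conflicts-per-nation ratio, lowest first."""
    ratios = conflicts_per_nation(data)
    return sorted(ratios, key=ratios.get)[:k]


# HYPOTHETICAL toy values, not real conflict or nation counts.
sample = {
    1992: (10, 100),
    2002: (8, 100),
    2003: (6, 110),
    2004: (6, 108),
}

print(lowest_years(sample))  # [2003, 2004]
```

With real data, the same two functions would reproduce the “two lowest years” observation directly from the counts.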

Although the number of armed conflicts is still well above zero, it is encouraging to see this forest from the usual trees.

Costing Out Email’s Manifest Destiny

Robert X. Cringely recently explored the cost of all 202 million American Internet users’ having Gmail accounts that actually consume the free 1 gigabyte of storage. Let’s call it the manifest destiny of email storage.

Cringely enumerated the costs of hard-drive hardware and data-center power necessary to make that storage available. He took the total, $30 million, to be a big number in relation to a “free” (that is, advertising-based) email service.

In a great response, Ethan Stock finished the math to show that, even with Cringely’s assumptions multiplied by five, the capital-expenditure cost of one gigabyte of email per American Internet user is 62 cents, and the yearly operational expenditure is 8 cents. As Ethan indicates, the news here should not be how expensive it is but rather how cheap it is. Paying for it requires well less than a dollar per year in advertising fees (meaning Google’s cut of the advertising spent) per Gmail user.
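To see why “well less than a dollar” holds, here is the arithmetic as a small Python sketch. It uses Ethan’s per-user figures (62 cents of capital expenditure, 8 cents of yearly operating expenditure). The worst case charges all the hardware to the first year; the three-year hardware lifespan in the second calculation is an assumption for illustration, not a figure from either writer.

```python
def first_year_cost_per_user(capex_per_user, opex_per_user_per_year):
    """Worst case: all capital cost lands in year one, plus one year of operations."""
    return capex_per_user + opex_per_user_per_year


def amortized_yearly_cost_per_user(capex_per_user, opex_per_user_per_year, lifespan_years):
    """Spread the capital cost evenly over an assumed hardware lifespan."""
    return capex_per_user / lifespan_years + opex_per_user_per_year


# Ethan Stock's per-user figures, in US dollars.
CAPEX = 0.62
OPEX = 0.08

print(round(first_year_cost_per_user(CAPEX, OPEX), 2))  # 0.7
# ASSUMED 3-year hardware life, for illustration only.
print(round(amortized_yearly_cost_per_user(CAPEX, OPEX, 3), 2))  # 0.29
```

Either way, the yearly bill per user stays under a dollar.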

To be fair, near the end of his piece Cringely raised the ante by saying the addition of pictures and video will raise the cost by two orders of magnitude. However, that is a future scenario. Over the time it takes to happen, hardware costs and operational efficiencies will have continued to improve. Not to mention, the average American’s email storage requirements circa 2005 are well less than the gigabyte that Cringely posited before raising the ante. So the real costs have a lower starting point.

Of course, it’s always possible to create scenarios where these services become uneconomic. For me, however, the lesson here is how much can be economic.

Finally, it’s worth noting that while Cringely thought that realizing email’s manifest destiny would be hard and expensive, he still thought it would happen. And thus, in classic American fashion, the question between him and Ethan is not about whether something that seems improbably ambitious will happen but rather when it will happen.

Tuesday, October 18, 2005

John Battelle’s The Search

I recently read The Search by John Battelle. It’s about how search has become central to the Internet economy and why today’s search businesses, epitomized by Google, are the beginning of something even bigger.

John talked to me for the book, so I’m not trying to be Mr. Objective here. I’ll just offer a perspective on its main theme.

To start, let me say that The Search dishes generous helpings of insider history about Google and other search players, past and present. John had access to most of the key people involved, so a lot of the quotes represent first-hand, new stuff. Given today’s rampant Googlephilia, the book would have been plenty successful if it stopped there.

The Search’s distinction, however, is that it threads Google’s story into a larger theme about what John calls the Database of Intentions: “the aggregate results of every search ever entered, every result list ever tendered, and every path taken as a result.” (For the technical crowd, he does not mean “aggregate” in the analytical sense of summarizing/abstracting data; rather, he is referring to all the disparate bits of detailed behavioral data, accumulated together.)
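The distinction between the two senses of “aggregate” can be made concrete. Here is a minimal Python sketch with hypothetical search events (the users, queries, and example.com URLs are invented for illustration): one structure keeps every detailed record, while the other summarizes and throws the detail away.

```python
from collections import Counter

# HYPOTHETICAL search events: (user, query, clicked_url).
events = [
    ("u1", "sony vaio", "http://example.com/vaio-review"),
    ("u2", "sony vaio", "http://example.com/vaio-store"),
    ("u1", "charles fourier", "http://example.org/fourier"),
]

# "Aggregate" in John's sense: every detailed record, accumulated together.
database_of_intentions = list(events)

# "Aggregate" in the analytical sense: a summary that discards the detail.
query_counts = Counter(query for _, query, _ in events)

print(len(database_of_intentions))  # 3 individual records survive
print(query_counts["sony vaio"])    # 2, but who clicked what is gone
```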

No one company owns the Database of Intentions. It is spread among millions of Web sites, each of which collects its own data, as well as other network-based media (mobile applications, Tivo-style services, and so on). It turns out that the big search companies have among the largest concentrations of such data. Moreover, John argues that Goto.com/Overture was first, and Google has so far been best, at commercializing it on a large scale.

They have done so in a simple but extremely effective way, selling advertising associated with search keywords. When you search for “Sony VAIO,” you are expressing intent, massively qualifying yourself to a certain set of companies: How much is it worth to Sony to appear next to search results for “Sony VAIO”? What about Sony’s competitors? How about all the possible retailers of Sony VAIO products? Bidding is open for this and a practically unlimited number of other keyword combinations. And the kicker is, companies only pay when you click their ads, so the incentive to participate is high. Even a one-person small business can sign up with a credit card. Hundreds of thousands have.
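The two mechanics in that paragraph, open bidding on a keyword and paying only on a click, fit in a few lines. This is a deliberately simplified Python sketch, not Overture’s or Google’s actual system: real search-ad auctions use more elaborate ranking and pricing rules, whereas this one ranks purely by bid and charges the full bid. The advertiser names and bid amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    keyword: str
    bid_per_click: float  # dollars offered for one click (hypothetical values below)


def rank_ads(ads, keyword):
    """Return the ads bidding on this keyword, highest bid first (simplified ranking)."""
    return sorted((ad for ad in ads if ad.keyword == keyword),
                  key=lambda ad: ad.bid_per_click, reverse=True)


def charge(ad, clicked):
    """Pay-per-click: the advertiser owes money only if the ad was clicked."""
    return ad.bid_per_click if clicked else 0.0


ads = [
    Ad("Hypothetical Retailer", "sony vaio", 0.40),
    Ad("Sony", "sony vaio", 0.55),
    Ad("Hypothetical Competitor", "sony vaio", 0.35),
]

ranked = rank_ads(ads, "sony vaio")
print([ad.advertiser for ad in ranked])  # ['Sony', 'Hypothetical Retailer', 'Hypothetical Competitor']
print(charge(ranked[0], clicked=False), charge(ranked[0], clicked=True))  # 0.0 0.55
```

An impression that nobody clicks costs the advertiser nothing, which is why the incentive to participate is so high.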

Now, let’s reinforce two important points:

This new ad marketplace allows targeting down to super-specific niches. An example: In an attempt to get highly obscure, I searched for “Charles Fourier,” the utopian-socialist philosopher of the 1800s. I got an ad targeted to people researching him. The company behind this ad might have paid a nickel for my click on its ad, but as John summarizes the business model, it’s “a billion dollars, one nickel at a time.” A substantial number of those nickels represent new spending in the ad economy, from small businesses that have never had a venue beyond the yellow pages, or from bigger businesses (like Amazon.com) that can efficiently sell products with niche appeal along with mainstream products.
Companies pay for ad responses, not ad impressions. This is a major change. Because a large percentage of advertisers can calculate what an ad response is worth to their business, they are ready to spend more on advertising than ever, for as long as it continues to pay back.
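“What an ad response is worth” usually reduces to a break-even calculation: the expected profit from one click is the fraction of clicks that turn into sales times the profit per sale. Here is a minimal Python sketch, assuming a hypothetical advertiser whose conversion rate and profit figures are invented for illustration.

```python
def value_per_click(conversion_rate, profit_per_conversion):
    """Expected profit from one click: the most an advertiser can pay and still break even."""
    return conversion_rate * profit_per_conversion


def still_pays_back(cost_per_click, conversion_rate, profit_per_conversion):
    """True while each click is expected to earn at least what it costs."""
    return value_per_click(conversion_rate, profit_per_conversion) >= cost_per_click


# HYPOTHETICAL advertiser: 2% of clicks become sales, each sale earns $25 profit.
print(round(value_per_click(0.02, 25.0), 2))  # 0.5
print(still_pays_back(0.05, 0.02, 25.0))      # True: a nickel click pays back
print(still_pays_back(0.75, 0.02, 25.0))      # False
```

As long as the price per click stays below that break-even value, spending more on ads means earning more, which is the sense of “as long as it continues to pay back.”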

That last phrase—“as long as it continues to pay back”—is crucial. It brings us to The Search’s big idea: that the Database of Intentions, in primitive form, is the key factor behind the current system’s rise. Whereas a traditional advertising venue would be bragging if it could offer 31 flavors of content or demographics, this new system allowed search engines to sell millions of flavors of intent. These were sufficiently targeted to make the pay-per-click business model a winner. And the whole thing was automatable enough to allow a marketplace where anyone could participate.

It’s a powerful combination of factors, yet John’s bigger point is that these factors are only starting to play out. They will apply to other media, like television, soon enough. And as the Database of Intentions evolves and is mined more intelligently, it will make “as long as it continues to pay back” go longer and longer for the average advertiser. With billions of extra dollars awaiting ever more efficient forms of advertising and micro-targeting, further growth is at hand.

This extra efficiency can come from many angles. The Googles of the world will get smarter about mining and leveraging full clickstreams, not just your keywords-of-the-moment. Sites, or networks of sites, specialized in certain areas will make search and advertising more powerful by injecting domain-specific knowledge into their systems. John cites GlobalSpec, a site specialized in engineering parts, as an “intelligent island” that has manually created associations of concepts from its world. These associations—a Semantic Web (that is, web of meaning)—in turn can make searching and advertising smarter. Meanwhile, an informal version of the Semantic Web is emerging with user-created tags for various Web pages, which provides yet another kind of grist for the intelligence mill.

Finally, I would not underestimate the potential of people taking their data into their own hands, something that John does not delve into. In a world driven by data about intent, your intent becomes a kind of currency. As I’ve written elsewhere, the next Google may be a company that does not use data about people as a proprietary asset but rather becomes an asset manager for people’s data. Given the Database of Intentions’ privacy and societal implications, which John raises in the book, this type of approach has much to offer. It may well also be economically optimal.

Whatever happens, The Search makes clear that we’re far closer to the beginning than the end of this important story. Kudos to John for going deep to tell it, conceptualize it, and popularize it.

Wednesday, October 12, 2005

Bubble Calibration Instrument

After the 2005 Web 2.0 conference, angst is rising about whether irrational exuberance has returned. In an effort to address this situation, I have designed a Bubble Calibration Instrument, pictured below.

Naming In Need Of Taming

Amid China’s real-estate boom, new housing developments are appearing with aspirational names like “Aladdin Gardens” and “White House Mini District.” Alarmed that many of these names have a foreign influence, the city of Kunming is taking action. As reported in China Daily:

“The fashion for foreign sounding names on buildings is a loss to native culture and reflects poor taste,” [Kunming Communist Party Secretary] Yang said in remarks reported by the official Xinhua News Agency. “We must correct this practice immediately.”

So does the French government have a new ally in the battle against cultural imperialism? Sort of. Turns out the policy’s casualties will include “Paris of the East Plaza” and “French Gardens.”

In related news, I was recently talking to someone from China who mentioned “Tycoon City” and “Live Like a Kaiser” as further candidates for housing developments with naming in need of taming.

Tuesday, October 11, 2005

Weather Entrepreneurs

Here are a couple news items about unusual, weather-related entrepreneurial efforts:

The Economist profiled a Canadian engineer, Louis Michaud, who wants to create artificial tornadoes as a source of power. If you’ve ever seen a wind farm, you know that humans already get power from wind. The traditional challenge has been to engineer ever more efficient wind turbines to convert wind to power. By contrast, Michaud is attempting to engineer more powerful wind. In essence, he wants to create the conditions that give rise to a natural tornado. The result would be a real tornado, albeit one (according to Michaud) confined to a single place and controlled in intensity, and thus instrumentable for generating power.
The New York Times had a long article about companies attempting to do business in the Arctic. Included is the story of Pat Broe, who in 1997 bought a disused port in northern Canada, paying the Canadian government $7 (yes, $7; $10 Canadian at the time). But now, with the Arctic ice cap having shrunk to its smallest size on record, Arctic shipping lanes are becoming possible for ever longer stretches of the year. For some ships, these lanes can offer shortcuts that save thousands of miles. And conveniently, Broe now has a port along one of the key routes. He’s estimating potential revenues up to $100 million yearly. He also owns the rail line out of the port, which he snagged after the Canadian government denationalized it.

Apparently, playing the weather futures markets wasn’t enough fun for these guys.

Sunday, October 9, 2005

My Freakonomics Encounter

This weekend I had the opportunity to chat with Steven Levitt, professor of economics at the University of Chicago and co-author/subject of the best-selling book Freakonomics. Levitt is famous for finding unexpected answers to real-world questions via quantitative analyses. For example, which is more dangerous to a child: a household with a gun or a household with a swimming pool? (Answer: When Levitt looked at U.S. child-death data, he found that, pool for pool and gun for gun, a child was roughly 100 times more likely to die from a residential swimming pool than from a gun.)
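A fair version of the pool-versus-gun question divides deaths by how many pools or guns there are, rather than comparing raw death counts. Here is a minimal Python sketch of that relative-risk arithmetic. The counts are made-up toy placeholders, not Levitt’s figures.

```python
def risk_per_unit(deaths, units):
    """Deaths per unit of exposure (for example, per residential pool or per gun)."""
    return deaths / units


def relative_risk(deaths_a, units_a, deaths_b, units_b):
    """How many times riskier exposure A is than exposure B, unit for unit."""
    return risk_per_unit(deaths_a, units_a) / risk_per_unit(deaths_b, units_b)


# HYPOTHETICAL toy counts, chosen only to show the calculation.
pool_vs_gun = relative_risk(deaths_a=2, units_a=1_000, deaths_b=1, units_b=10_000)
print(round(pool_vs_gun))  # 20
```

Normalizing by exposure is what turns “which kills more children overall?” into the more useful “which is more dangerous to have in your home?”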

If such nuggets interest you, you’ll love Freakonomics. His co-author Stephen Dubner does for Levitt what Michael Lewis, author of Moneyball, did for baseball’s sabermetricians and quants, bringing the numbers to life with well-told stories.

That said, the in-person version of Levitt was remarkably similar to the voice of the book. He’s not an ivory-tower type that Dubner had to decode for the world. In fact, he comes across as instinctively interesting. By that I mean his research seems motivated entirely by what intrigues him personally, but when he talks about it, you can’t help wanting to follow along: Is sumo wrestling rigged? Do real estate agents act in your best interest? Is the 1990s’ drop in crime related to the legalization of abortion twenty years earlier?

Levitt tends to focus on societal questions, but those of us in business analytics should thank him. Through Freakonomics, he is getting ordinary people interested in the value of using data and analytics to understand problems. Given that the alternatives—conventional wisdom, intuition, and “common sense”—are much easier for people to relate to, this is progress.

A few other Levitt resources for those interested:

Freakonomics Blog — Levitt and Dubner keep the freak-out going, including pointers to their pieces that appear in The New York Times.
Levitt’s Papers — If you’re accustomed to academic papers and college-level math, you’ll find Levitt an unusually clear writer. Also, the economics part of his work is more prominent here than in Freakonomics. I particularly liked the empirical analysis of gambling in the National Football League.
Treating HIV Doesn’t Pay — Levitt mentioned this research about AIDS in Africa by Emily Oster. She used a Freakonomics-style analysis that generated surprising conclusions about the most effective way to minimize loss of life, given the fixed amount of money available. It may make disquieting reading, but there’s no question it matters.

Thursday, October 6, 2005

Attention Trust, Clickstreams, and the Meaning-Mining Problem

Attention Trust is an organization designed to let Web users take control of data they generate online. For example, the organization recently announced a Firefox plug-in that lets you “record” your clickstream. The idea is that you could potentially share it with companies, presumably for something of value in return (a better site experience, product recommendations, money, whatever). TechCrunch has the most straightforward description I’ve seen.

How exactly such clickstream sharing will work is apparently to be determined. A big challenge will be what I call the meaning-mining problem: having just a clickstream is like having just an index without the book; to make the clickstream useful, you need to understand what it points to.

Let’s illustrate with an example. A clickstream is just a sequence of URLs that you visited. A URL like...

http://www.amazon.com/exec/obidos/tg/detail/-/B000005JA8

...is a 56-character text string that has little meaning by itself; it only points to meaning. Request the URL’s page and you’ll see a music CD, Trout Mask Replica. It’s an album originally released in 1969 by Captain Beefheart, a relatively avant-garde artist. Further bits of significant detail are available either directly on the Amazon.com page or from secondary sources, which are another jump away. For example, if we know we are dealing with a music artist, a secondary source might be All Music Guide’s moods. For Captain Beefheart, they include “difficult,” “eccentric,” “cerebral,” “manic,” and “uncompromising.”
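Taking the URL apart shows how little it carries on its own. Here is a minimal Python sketch using the standard library’s `urllib.parse`. Treating the last path segment as a product identifier is an assumption about this particular URL’s layout, not a documented Amazon format.

```python
from urllib.parse import urlparse

url = "http://www.amazon.com/exec/obidos/tg/detail/-/B000005JA8"

parts = urlparse(url)
product_id = parts.path.rstrip("/").split("/")[-1]  # ASSUMED: last segment is the item ID

print(len(url))      # 56
print(parts.netloc)  # www.amazon.com
print(product_id)    # B000005JA8
```

Nothing in those pieces says “Captain Beefheart,” “1969,” or “uncompromising.” That meaning lives on the page the URL points to and in the secondary sources beyond it.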

Now, if I’m a marketer, these bits of meaning provide clues about what the person who clicked this URL might like—not just in music but also in other media and many consumer-product categories.

Obviously, our example URL is a single data point, which can be misleading. But clickstreams tend to comprise lots of data points, especially if collected continuously over periods of time. So if you’ve been researching a car on the Web over the past few weeks, I know more than a few auto companies that would love to see your clickstream. Or, to be more precise, they’d love to mine the meaning of your clickstream: What category of car are you looking for? What brands are you considering? What price range are you considering? And so on.

The meaning-mining problem is important because these types of high-value questions are answerable if you can start with a relevant clickstream. But the meaning-mining problem is hard because machines are still mediocre at getting from the clickstream to reliably useful meaning. Of course, a human could do the job, following each link and then the secondary sources, but that doesn’t scale.

The vision of a Semantic Web is meant to help machines with these kinds of problems. In the meantime, today’s search engines can get part way there by extracting meaningful features, like keywords, from Web pages. Now that major search engines like Yahoo and Google have open APIs, I expect someone to make a Web service that takes a sequence of URLs and returns a set of coherent keywords that collectively “profile” the URLs’ immediate content. It will be a productive start.
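The core of such a service could be quite small: pull keywords for each URL, then keep the ones that recur across the clickstream. Here is a minimal Python sketch. The `extract_keywords` callable is a hypothetical stand-in for real page fetching plus a search engine’s keyword extraction, and `FAKE_INDEX` is an invented stub so the example runs offline.

```python
from collections import Counter
from typing import Callable, Iterable, List


def profile_clickstream(urls: Iterable[str],
                        extract_keywords: Callable[[str], List[str]],
                        top_n: int = 5) -> List[str]:
    """Return the top_n keywords that recur across the pages in a clickstream.

    extract_keywords is HYPOTHETICAL: anything that maps a URL to keywords.
    """
    counts = Counter()
    for url in urls:
        # Count each keyword once per page so one wordy page cannot dominate.
        counts.update(set(extract_keywords(url)))
    return [keyword for keyword, _ in counts.most_common(top_n)]


# HYPOTHETICAL stub standing in for fetching pages and extracting keywords.
FAKE_INDEX = {
    "http://example.com/suv-review": ["suv", "safety", "mileage"],
    "http://example.com/suv-prices": ["suv", "price", "dealer"],
    "http://example.com/hybrid-suv": ["suv", "hybrid", "mileage"],
}

clicks = list(FAKE_INDEX)
print(profile_clickstream(clicks, lambda u: FAKE_INDEX.get(u, []), top_n=2))  # ['suv', 'mileage']
```

Counting per page rather than per word is one design choice among several; a real service would also need to weigh recency and filter out generic words.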

Speaking of which, Attention Trust deserves credit for its own productive start in bringing these types of issues to higher prominence. Tools like the clickstream recorder are especially useful because they bring tangibility to what otherwise tend to be academic-ish discussions.

However, delivering on the promise of putting people in control of their data will likely take a bigger player than Attention Trust. A lot of resources will be necessary to address the meaning-mining problem, as well as several other technical and practical (chicken/egg-style) obstacles. The success factors are:

Potential access to users’ full clickstreams (by owning the operating system, as Microsoft does; by being an Internet Service Provider (ISP) like AOL and Microsoft are; by partnering with ISPs like various search engines do; by having a browser toolbar, which could operate like Attention Trust’s clickstream recorder).
Proximity to a huge number of users who can quickly generate a critical mass of use for the technology.
A massive technical infrastructure to collect and mine clickstreams’ meanings and to make those meanings exchangeable among individuals and companies (or, for that matter, among individuals and other individuals).

Microsoft, AOL, and the major search engines are the obvious candidates, although eBay and Amazon.com are possibilities too. It will be interesting to see which, if any, of these companies is first to make the necessary mindshift, and take the necessary risks, to go from using data about people as a proprietary asset to becoming an asset manager for people’s data.
