Thursday, October 6, 2005

Attention Trust, Clickstreams, and the Meaning-Mining Problem

Attention Trust is an organization designed to let Web users take control of data they generate online. For example, the organization recently announced a Firefox plug-in that lets you “record” your clickstream. The idea is that you could potentially share it with companies, presumably for something of value in return (a better site experience, product recommendations, money, whatever). TechCrunch has the most straightforward description I’ve seen.

How exactly such clickstream sharing will work is apparently to be determined. A big challenge will be what I call the meaning-mining problem: having just a clickstream is like having just an index without the book; to make the clickstream useful, you need to understand what it points to.

Let’s illustrate with an example. A clickstream is just a sequence of URLs that you visited. A URL like... is a 57-character text string that has little meaning by itself; it only points to meaning. Request the URL’s page and you’ll see a music CD, Trout Mask Replica. It’s an album originally released in 1969 by Captain Beefheart, a relatively avant-garde artist. Further significant details are available either directly on the page or from secondary sources, which are another jump away. For example, if we know we are dealing with a music artist, a secondary source might be All Music Guide’s moods. For Captain Beefheart, they include “difficult,” “eccentric,” “cerebral,” “manic,” and “uncompromising.”
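To make the "index without the book" point concrete, here is a minimal Python sketch of what a machine can read off a URL without requesting the page. The URLs and the `surface_features` helper are made up for illustration; they are not the URL discussed above or anything Attention Trust's recorder produces.

```python
from urllib.parse import urlparse

# Hypothetical clickstream: an ordered list of visited URLs (made-up examples).
clickstream = [
    "http://shop.example.com/product/0000000000",
    "http://music.example.org/artist?id=12345",
]

def surface_features(url: str) -> dict:
    """Return the only things the URL string itself reveals."""
    parts = urlparse(url)
    return {"host": parts.netloc, "path": parts.path, "query": parts.query}

for url in clickstream:
    # Host, path, and query are visible; the album, artist, and moods are not.
    print(surface_features(url))
```

Everything a marketer would actually care about (which album, which artist, which moods) only appears after following the link and, often, a secondary source.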

Now, if I’m a marketer, these bits of meaning provide clues about what the person who clicked this URL might like—not just in music but also in other media and many consumer-product categories.

Obviously, our example URL is a single data point, which can be misleading. But clickstreams tend to comprise lots of data points, especially if collected continuously over periods of time. So if you’ve been researching a car on the Web over the past few weeks, I know more than a few auto companies that would love to see your clickstream. Or, to be more precise, they’d love to mine the meaning of your clickstream: What category of car are you looking for? What brands are you considering? What price range are you considering? And so on.

The meaning-mining problem is important because these types of high-value questions are answerable if you can start with a relevant clickstream. But the meaning-mining problem is hard because machines are still mediocre at getting from the clickstream to reliably useful meaning. Of course, a human could do the job, following each link and then the secondary sources, but that doesn’t scale.

The vision of a Semantic Web is meant to help machines with these kinds of problems. In the meantime, today’s search engines can get part way there by extracting meaningful features, like keywords, from Web pages. Now that major search engines like Yahoo and Google have open APIs, I expect someone to make a Web service that takes a sequence of URLs and returns a set of coherent keywords that collectively “profile” the URLs’ immediate content. It will be a productive start.
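As a rough sketch of what such a service might do internally, the Python below takes a sequence of URLs and returns the keywords that appear on the most pages. All names here (`profile_clickstream`, `fetch_text`, `FAKE_PAGES`) are assumptions for illustration, and simple word counting stands in for whatever feature extraction a real search-engine API would offer. Page text is supplied by a caller-provided function so the example runs without network access.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

# Tiny illustrative stopword list; a real service would use a much fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for",
             "by", "on", "with", "it", "this", "that"}

def profile_clickstream(urls: Iterable[str],
                        fetch_text: Callable[[str], str],
                        top_n: int = 5) -> List[str]:
    """Return the top_n keywords ranked by how many pages mention them."""
    counts = Counter()
    for url in urls:
        words = re.findall(r"[a-z]+", fetch_text(url).lower())
        # Count each word once per page so one long page cannot dominate.
        counts.update({w for w in words if w not in STOPWORDS and len(w) > 2})
    # Sort by page count (descending), then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

# Made-up page contents standing in for fetched Web pages.
FAKE_PAGES = {
    "http://example.com/a": "Compact hybrid sedan review: the hybrid sedan gets great mileage",
    "http://example.com/b": "Hybrid sedan pricing and mileage compared",
    "http://example.com/c": "Best hybrid deals this month",
}

print(profile_clickstream(FAKE_PAGES, FAKE_PAGES.__getitem__, top_n=3))
# ['hybrid', 'mileage', 'sedan']
```

Even this crude profile hints at the car-shopping questions above, though it says nothing about brands or price ranges unless those words happen to recur across pages.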

Speaking of which, Attention Trust deserves credit for its own productive start in bringing these types of issues to higher prominence. Tools like the clickstream recorder are especially useful because they bring tangibility to what otherwise tend to be academic-ish discussions.

However, delivering on the promise of putting people in control of their data will likely take a bigger player than Attention Trust. A lot of resources will be necessary to address the meaning-mining problem, as well as several other technical and practical (chicken/egg-style) obstacles. The success factors are:

  1. Potential access to users’ full clickstreams (by owning the operating system, as Microsoft does; by being an Internet Service Provider (ISP), as AOL and Microsoft are; by partnering with ISPs, as various search engines do; or by having a browser toolbar, which could operate like Attention Trust’s clickstream recorder).
  2. Proximity to a huge number of users who can quickly generate a critical mass of use for the technology.
  3. A massive technical infrastructure to collect and mine clickstreams’ meanings and to make those meanings exchangeable among individuals and companies (or, for that matter, among individuals and other individuals).

Microsoft, AOL, and the major search engines are the obvious candidates, although an eBay or are possibilities too. It will be interesting to see which, if any, of these companies is first to make the necessary mindshift—and take the necessary risks—to go from using data about people as a proprietary asset to becoming an asset manager for people’s data.

1 comment:

  1. Hi Steve,
    I'm the Executive Director of AttentionTrust. Great post--very thought provoking. I linked to it the other day on our blog, and I just added another post with some excerpts and comments at Thanks for joining in the discussion--I look forward to staying in touch.