Tuesday, September 11, 2012

At Responsys

An update from the professional front: I have joined Responsys as SVP Product Management.

Responsys is a software-as-a-service provider for interactive marketing: It helps companies communicate with customers via interactive channels like email, web, social, mobile, and display ads. As people increasingly spend time online, these channels are where marketers want to be. Plus, compared to traditional channels like television or print, in which you get the same message as everyone else, interactive channels promise to be more targeted and personalized—so you get what’s relevant for you.

Having been in the interactive marketing field since the beginning, I believe in this promise. It is good for companies and customers alike. The challenge is to make it real. Responsys is a leader in doing so, having gone public in 2011 (ticker symbol MKTG) after growing straight through the late 2000s recession.

So, I’m excited to help grow something that already has substantial scale, in a market that keeps renewing itself with new channels and technologies. If you want to join me, we have many opportunities across the company for like-minded people.

Monday, September 3, 2012

Back in the Bay Area

I knew I was back in the Bay Area when:

  • On the second day, exiting Highway 92, I was behind a Google self-driving car.

  • My new town’s waste-collection system gave me three cans: a big one for recycling, a big one for composting, and a small one for garbage.

  • I was walking along a trail, wearing an E*Trade hat I randomly acquired in the past. A guy walking the other way asked, “Do you know what E*Trade closed at today?”

Yes, after four-plus years in West Hartford, Connecticut, we—my wife, daughter, and I—are back. As I and others have said before, West Hartford is great. But for us, the Bay Area is home.

Thursday, July 26, 2012

Omnivark’s Time and Place

Today saw the final edition of Omnivark, my personal project to teach a computer to identify great writing. Each day, Omnivark would pick three pieces of new, nonfiction writing on the Web, plus a book. I’m proud of Omnivark’s quality during its six-month run. (Feel free to click a random day from the archive, and see what you think.)

So why stop now? Omnivark was something different and fun to do during the past months while I was also doing consulting. However, that period was an in-between time, from when I completed the Intelligent Cross-Sell integration at RichRelevance until my family’s move back to the Bay Area, which is happening in August. Soon after, I’ll be resuming my normal career—more on that in a future post.

Suffice it to say, Omnivark was of a certain place and time, which are changing. I enjoyed creating it; I hope you enjoyed reading it. If you did, here are some suggestions for alternatives to keep your tank topped with great reads.

And finally, for those interested in how the technology worked:

Omnivark’s Division of Labor

As I taught a computer to recognize great writing, the division of labor between me (the teacher) and the computer changed over time. The Omnivark software started relatively stupid and ended relatively smart.

At the beginning, I was heavily influencing Omnivark’s daily picks of great writing on the Web. This was necessary to create a training set of great writing for Omnivark’s algorithms, so they could learn to find other great writing. Over time, and with a lot of experimentation, Omnivark became smart enough to do most of the work in choosing great reads for an edition.

But to be clear, there was a big distance between Omnivark doing most of the work and doing all of it. Most of the work was Omnivark’s finding and scoring hundreds—sometimes over a thousand—of candidate pieces a day. I still needed to pick the best of Omnivark’s best, factoring in issues like diversity of topics and sources.

Unless I was working on the algorithms, I would need to read only perhaps ten of the top-scoring candidates. From them, I was often able to find two of the three Web picks for an edition. I’d find the third pick by scanning further down the list of candidates for an interesting headline, or I would find it from my own normal Web browsing, or I’d occasionally get a great suggestion from an Omnivark reader.
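The daily routine above—score everything, then hand a human only the top of the list—can be sketched as follows. This is a hypothetical illustration, not Omnivark’s actual code; `score` here is a toy stand-in for the real (undisclosed) style model.

```python
def score(text):
    """Toy stand-in for a style model: reward moderate sentence length."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_len = len(words) / max(len(sentences), 1)
    # Peak score near 20 words per sentence, falling off on either side.
    return -abs(avg_len - 20)

def top_candidates(candidates, n=10):
    """Return the n highest-scoring texts for a human editor to review."""
    return sorted(candidates, key=score, reverse=True)[:n]
```

The key design point is that the model never publishes directly: it only shrinks hundreds of candidates down to a short list that a person can actually read.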

The Omnivark algorithms were capable of rating not only an entire piece but also individual sentences. Once in a while (maybe one in ten times), I’d agree with Omnivark’s choice for the best sentence to use as a quote from the piece. The low agreement was due to Omnivark’s simply judging a sentence for artfulness, whereas I was also judging how well a sentence indicated what the piece was about. Also, I could easily see when multiple sentences, or fragments of sentences, were better than the best sentence—a far from easy task for software.
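Sentence-level rating like this might look as follows—a minimal sketch, assuming the same kind of scoring used on whole pieces can be applied to single sentences. `artfulness` is a made-up placeholder signal, not Omnivark’s actual sentence model.

```python
import re

def artfulness(sentence):
    """Toy sentence signal: more distinct words scores higher."""
    words = sentence.split()
    return len(set(w.lower() for w in words))

def best_quote(text):
    """Return the highest-scoring sentence as a candidate pull quote."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    return max(sentences, key=artfulness, default="")
```

Note that this sketch can only ever pick one whole sentence, which matches the limitation described above: recognizing that two sentences, or fragments of sentences, together make a better quote is a much harder problem.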

The fourth and final pick in every Omnivark edition was from a book. This turned out to be a distinct challenge because book excerpts are often in PDFs or special viewer applications such as Amazon’s “Look Inside.” Automating their extraction was different enough from everything else I was doing that I ended up picking the books manually, guided by high user reviews and official endorsements.

So, except for the book picks, the division of labor shifted nicely from human to computer. In terms of replacing a human’s hours, Omnivark got maybe 90% of the way there. However, that last 10%’s hours are a lot harder to automate than the first 90%’s. They involve creative judgment such as knowing when different picks go well together, or recognizing that a certain sentence captures the essence of a larger point. Perhaps someday a computer will do that too, but there will always be the need to model specific humans’ judgments; otherwise, it would be like having the same editor for every magazine. In that sense, humans will always be the teachers.

Thursday, June 21, 2012

Stylistic Signals

As Omnivark trawls the Web for new, great writing, it has two distinct tasks. First, where does it find the candidates—the articles, essays, and blog posts—that might be great writing? My previous post, Following the Elites, was about this challenge.

Second, once Omnivark has a set of candidates, how does it know which few are great? For example, given an entire issue of The New Yorker, what is the best thing in it?

The New Yorker’s editor might say it’s all great. And different readers will surely have different opinions of what’s best. So to clarify: In this case best means most like the structure and style of other great reads. (The other great reads were classified as such by a human expert.)

Note that we are comparing texts’ forms, not their topics. So, given a great read about a boar-hunting congressman, Omnivark will try to find more pieces that are written like that, as opposed to more pieces about boar-hunting congressmen.

This is an important distinction. Most text-analytics systems do topic-matching (find more boar-hunting congressmen). Omnivark is about style-matching. Omnivark will measure a new piece of writing against the characteristics of great writing that Omnivark has already modeled. Those characteristics include statistical, semantic, and structural properties of the text. Some examples:

  • Simple statistical properties include the text’s total number of words, the average number of words per sentence, and the average number of sentences per paragraph. These simple metrics are better for filtering out the bad than discerning the best among the good. However, more complex metrics (such as the ratio of nouns to adjectives) resonate with certain writing styles.

  • Semantic properties refer to the meanings of the words used. This is tricky because we want to capture how word choices correlate with style but not with topic. We don’t care that boar appears a lot in the boar-hunting piece; we do care about the artful use of certain adjectives, adverbs, and other flavoring words that make the prose more expressive.

  • Structural properties include how sentences and paragraphs are put together. For example, the use of balanced or parallel phrases is an indicator of expressive writing, as is the use of similes and metaphors. Detecting these structures in a general way is hard.

In the world of search engines like Google, these properties are called signals. Omnivark’s job is to know the signals that best predict great writing. As an extra twist, because great writing takes different forms, Omnivark needs to employ different configurations of signals.
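One simple way to realize “different configurations of signals,” as described above, is a weight vector per form of writing, applied to the same named signals. The configuration names, signal names, and weights below are illustrative assumptions, not Omnivark’s actual setup.

```python
# Hypothetical per-genre weightings over a shared set of signal names.
CONFIGS = {
    "essay":   {"words_per_sentence": 0.5, "flavor_density": 2.0},
    "profile": {"words_per_sentence": 0.2, "simile_markers": 1.0},
}

def combined_score(sig, config):
    """Weighted sum of whichever signals this configuration cares about.

    sig: dict mapping signal names to values; missing signals count as 0.
    """
    weights = CONFIGS[config]
    return sum(weights[name] * sig.get(name, 0.0) for name in weights)
```

A trained classifier could learn these weights from the human-labeled examples, but even hand-set weights capture the idea that an essay and a profile are judged on different mixes of the same evidence.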

Behind the scenes, I built a tool that makes exploring for signals relatively easy. A new signal can be tested in real time on a set of training texts diverse in style and quality.
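The core of such an exploration tool might be a function that takes a candidate signal and reports how well it separates the “great” pile from the rest of the labeled training texts. The separation metric below (a simple gap between group means) is my assumption; the real tool’s internals were never published.

```python
def separation(signal_fn, labeled_texts):
    """Measure how well a candidate signal separates great from not-great.

    labeled_texts: list of (text, is_great) pairs.
    Returns mean(signal on great) - mean(signal on the rest).
    """
    great = [signal_fn(t) for t, g in labeled_texts if g]
    rest = [signal_fn(t) for t, g in labeled_texts if not g]
    if not great or not rest:
        return 0.0
    return sum(great) / len(great) - sum(rest) / len(rest)
```

With something like this, trying a new signal is one function call: a large positive gap suggests the signal discerns quality, while a gap near zero means it fires on good and mediocre writing alike.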

For me, this exploration for stylistic signals is the most interesting part of creating Omnivark. Having taught writing, I have reasonably good instincts for prose quality. However, knowing it when you see it is different from generalizing that knowledge so a computer can apply it. In practice, it’s easy to identify signals that find great writing but also find a lot of mediocre writing. It is much harder to find the signals that cleanly discern the best from the rest.