Monday, January 30, 2006

Pandora and Last.fm: Nature vs. Nurture in Music Recommenders

Over the past week, there has been some blog talk (Fred Wilson, TechCrunch, David Porter) comparing music-recommendation services Pandora and Last.fm. I’ve been using both for the past couple months, making notes along the way. The idea was that I’d eventually have something to say. That might as well be now.

Both services allow you to specify a favorite artist, based on which you immediately receive an Internet audio stream of similar music. When I tell people this is possible—that you can have a personalized streaming radio station—most are astonished. So let’s start by saying that what these and similar services do is cool. How Pandora and Last.fm do it is an interesting compare-and-contrast.

Nature versus Nurture

Algorithmically, Pandora versus Last.fm is something like the nature versus nurture debate. Taking the nature side, Pandora’s recommendations are based on the inherent qualities of the music. Give Pandora an artist or song, and it will find similar music in terms of melody, harmony, lyrics, orchestration, vocal character and so on. Pandora likes to call these musical attributes “genes” and its database of songs, classified against hundreds of such attributes, the “Music Genome Project.”
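
To make the nature side concrete, here is a minimal Python sketch of content-based matching. It assumes each song has been scored from 0 to 1 on a few attributes, and it ranks candidates by cosine similarity to the seed. The attribute names, the scores, and the use of cosine similarity are all assumptions for illustration. How Pandora actually weighs its "genes" is not public.

```python
import math

# Hypothetical attribute scores (0.0 to 1.0); not Pandora's real genes or scale.
SONGS = {
    "seed_song":   {"electronic": 0.9, "minor_key": 0.8, "warm_vocals": 0.1},
    "synth_track": {"electronic": 0.8, "minor_key": 0.7, "warm_vocals": 0.2},
    "ballad":      {"electronic": 0.1, "minor_key": 0.2, "warm_vocals": 0.9},
}

def cosine(u, v):
    """Cosine similarity between two attribute dictionaries (missing keys count as 0)."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(seed, catalog):
    """Rank every other song by how closely its attributes match the seed's."""
    scores = {name: cosine(catalog[seed], attrs)
              for name, attrs in catalog.items() if name != seed}
    return sorted(scores, key=scores.get, reverse=True)

print(most_similar("seed_song", SONGS))  # ['synth_track', 'ballad']
```

Nothing here depends on who listens to what. The ranking comes entirely from the songs' own attributes.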

On the nurture side (as in, it’s all about the people around you), Last.fm is a social recommender. It knows little about songs’ inherent qualities. It just assumes that if you and a group of other people enjoy many of the same artists, you will probably enjoy other artists popular with that group.

Like Last.fm, most music-discovery systems have been social recommenders, also known as collaborative filters. Although much of the academic work in the area has focused on improving the matching algorithms, Last.fm’s innovation has been in improving the data the algorithms work on. Last.fm does so by offering users an optional plug-in that automatically monitors their media-player software, so whatever they listen to, whether it came from Last.fm or not, can be incorporated into their Last.fm profiles and used as the basis for recommendations. Compared with relying on users to enter preferences manually, this automatic and comprehensive data capture provides far better grist for the data mill.
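
The nurture side can be sketched just as briefly. The following Python example uses simple co-occurrence counting, one basic form of collaborative filtering. It finds everyone who listens to the seed artist and counts which other artists show up in their profiles. The profiles are invented stand-ins for what a listening plug-in might log, and Last.fm’s actual algorithm is not described here.

```python
from collections import Counter

# Hypothetical listening profiles: user -> set of artists the plug-in logged.
PROFILES = {
    "ann":  {"Gary Numan", "Kraftwerk", "Skinny Puppy"},
    "bob":  {"Gary Numan", "Kraftwerk", "Killing Joke"},
    "cara": {"Gary Numan", "Kraftwerk"},
    "dev":  {"Bob Marley", "James Brown"},
}

def co_listened(seed, profiles):
    """Count how many of the seed artist's listeners also play each other artist."""
    counts = Counter()
    for artists in profiles.values():
        if seed in artists:
            counts.update(a for a in artists if a != seed)
    return counts.most_common()

result = co_listened("Gary Numan", PROFILES)
print(result[0])  # ('Kraftwerk', 3)
```

The code never looks at the music itself. It only knows that the same people tend to play these artists together.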

A side note: In my years of analytics and data mining, a recurring theme is that better algorithms are nice but better data is nicer. That’s because a large number of smart people have evolved the best data-mining algorithms for various scenarios; thus, further improvements tend to be incremental. By contrast, whatever data you happen to be using in a project has probably had no priming for analytical use. Thus, improving how you acquire, clean, and transform that data can have disproportionately large benefits. The catchphrase for the negative version of this is “garbage in, garbage out,” although one could just as easily say, “the more signal in, the more signal out.”

Surfacing New Artists

Pandora and Last.fm are both about helping people discover new music, so let’s consider their approaches in terms of discovering truly “new” music—that is, artists who are just appearing on the music scene. If we assume that both services put new artists into their database at the same rate, Last.fm will be slower in surfacing them as recommendations. This is due to the “cold start” problem that afflicts social recommenders: Before something new can become recommendable, it needs time to accumulate enough popularity to rise above the system’s noise level. In contrast, because Pandora is only comparing songs’ inherent qualities—not who they’re popular with—it should be able to recommend a new artist the first day that artist is in the system. That said, I wouldn’t be surprised if Pandora did a little biasing of recommendations by popularity, which it measures as people use the service.
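
The cold-start difference fits in a small Python sketch. "New Synth Act" is a hypothetical artist added today. A social recommender with a minimum-support threshold (an assumed stand-in for the "noise level") ignores it because only one listener has played it alongside the seed. A content-based matcher picks it up at once because its attribute scores, which are also invented here, resemble the seed’s.

```python
import math
from collections import Counter

# Hypothetical data: "New Synth Act" is brand new, so only "eve" has played it.
PROFILES = {
    "ann": {"Gary Numan", "Kraftwerk"},
    "bob": {"Gary Numan", "Kraftwerk"},
    "eve": {"Gary Numan", "New Synth Act"},
}
# Hypothetical attribute scores: (electronic, minor_key).
ATTRIBUTES = {
    "Gary Numan": (0.9, 0.8),
    "Kraftwerk": (0.95, 0.6),
    "New Synth Act": (0.9, 0.75),
}

def social_candidates(seed, profiles, min_support=2):
    """Artists co-listened with the seed by at least min_support users."""
    counts = Counter()
    for artists in profiles.values():
        if seed in artists:
            counts.update(a for a in artists if a != seed)
    return {a for a, n in counts.items() if n >= min_support}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def content_candidates(seed, attributes, min_similarity=0.95):
    """Artists whose attributes are close to the seed's, whether or not anyone listens yet."""
    return {a for a, vec in attributes.items()
            if a != seed and cosine(attributes[seed], vec) >= min_similarity}

print(social_candidates("Gary Numan", PROFILES))  # {'Kraftwerk'}
```

Once a second listener plays both artists, the new act clears `min_support` and the social side catches up. That waiting period is the cold start.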

Partisans of Last.fm might retort that, in practice, Pandora will be slower at getting new artists and music into its database because of Pandora’s classification bottleneck—that is, the time necessary for a Pandora employee to classify each song on hundreds of musical attributes. With that bottleneck, Pandora can’t just classify everything as it comes in the door. By contrast, Last.fm does not need to do manual classification. With its software plug-in continually updating people’s preferences, Last.fm has a virtual army of talent scouts constantly finding new things, which Last.fm can integrate into its database automatically.

(Leaky) Locked Loops

Pandora people might counter that Last.fm’s army of talent scouts is compromised by its relative uniformity. That is, a social recommender tends to reward people who are like those who already use the system. If there are already many people in Last.fm with similar tastes to you, you’ll get good recommendations; if not, then maybe not. And if you don’t get good recommendations, are you going to keep feeding the system data? Probably not, and thus we have a self-perpetuating in-group/out-group situation. The result is a “locked loop,” whereby a social recommender gets stuck in certain genres and styles.

But with a social music recommender, a truly locked loop is unlikely. The reason is “leakage”: A population that shares the same core musical tastes will have enough variance in secondary tastes to allow for a continually expanding spectrum, albeit with much slower expansion in certain genres than others. Here’s an example of how slow that expansion can be. When I checked Last.fm’s similar artists to the reggae legend Bob Marley, first on the list was James Brown, followed by The Chemical Brothers, then Aerosmith. (If you’re reading this well after January 30, 2006, beware that Last.fm’s system is continually evolving, so the lists these links point to will probably have changed.) Other reggae acts appear further down, but the unlikely top choices suggest that Marley has been brought into the system more as a distant secondary choice than as a primary choice with other acts in his genre. A quick check of Aerosmith’s similar artists confirms this: Marley is 41st on the list, way behind various likelier suspects.

While better non-reggae recommendations are easy to imagine for Marley, they probably won’t appear until Marley’s primary fans are better represented on Last.fm. Then the quality non-reggae choices can emerge from his core fans’ secondary choices.

For the sake of comparison, when I put Marley into Pandora, I got something like a reggae radio station at first, which then drifted into other stuff over time.

Why versus What

Pandora is less subject to the echo chamber of overly like minds, but it has its own fundamental challenge in its reliance on matching songs’ “genes.” This rules out connections between songs or artists that don’t fit Pandora’s modeling and matching of musical qualities—which, in turn, puts enormous pressure on Pandora’s specific approach to be correct. In other words, Pandora’s success hinges on a theory, and a specific implementation of that theory, about why music recommendations work. By contrast, Last.fm simply describes what goes together according to its audience and then makes relatively simple inferences from that. So if there are hidden factors that Pandora isn’t explicitly capturing, Last.fm is at least capturing them indirectly.

It’s not hard to find cases where Pandora’s approach runs aground, although the system’s lack of transparency makes it difficult to know where the problem lies. For example, it’s hard to explain Pandora’s initial choices for Gary Numan (he of “Cars” fame). With Numan as the seed, Pandora gave me syrupy pop tunes by Orchestral Manoeuvres in the Dark and the Human League. Yes, each artist’s most famous material was from the same time and was primarily electronic, but the latter two really miss the Numan aesthetic, which is more like supercooled liquid metal than warm syrup. Pandora went on to do somewhat better, but not great, with subsequent tunes.

In comparison, Last.fm immediately delivered Numan-appropriate songs from Assemblage 23, Killing Joke, Kraftwerk, and Skinny Puppy, eventually drifting into less relevant territory. Still, Pandora partially redeemed itself with an inspired pick: “Out of Control” by Ric Ocasek (former leader of The Cars), an obscure cut from an artist who is far from an obvious connection to Gary Numan.

Last.fm’s Delivery versus Pandora’s Promise

I raise the Numan example because it exemplifies my experiences with Last.fm and Pandora. Having used a wide range of artists as seeds, I found Last.fm better than Pandora at delivering songs that I liked or at least didn’t feel compelled to skip, which is the most important thing when I’m listening while doing something else. The exception was when the seed artist had not hit critical mass in the Last.fm system, per the Marley example. Meanwhile, Pandora had more misses but was more likely to surface something truly out of left field, as with the Ric Ocasek example.

As a result, both Pandora and Last.fm have maintained a place in my music-listening world. However, ultimately I think Pandora has greater promise because it is far easier for Pandora to incorporate Last.fm’s functionality than the other way around. This point is important because, just as with the nature versus nurture argument, the best answer is likely to involve elements of both camps. That said, Pandora’s advantage comes at a significant cost to its business, with all the manual work it entails. At this point, Pandora is not delivering proportionally more benefit for that cost—which is why I used the word “promise” above.

Pandora Possibilities

The key to Pandora’s changing the game is to take better advantage of its exclusive, hard-to-replicate metadata about music. Users may never be able to objectively judge the quality of recommendations among different services, but they can definitely tell the difference between services with unique ways of getting to recommendations. For example, I’d like to see Pandora expose some of its internal attributes as dials for the user to control. If I put in the singer Paul Westerberg (former leader of The Replacements), I’d like to tell the system to match more strongly along his lyrical style rather than by the fact he has a “gravelly male voice” (which is one of the things Pandora said it was matching on). It’s easy to picture many other creative uses of Pandora’s metadata, both in terms of a recommender and other applications.
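
A user-facing dial could be as simple as a per-attribute weight in the distance calculation. The Python sketch below uses a weighted Euclidean distance. The attribute names, scores, and weights are hypothetical, and the method is an illustration, not how Pandora matches songs. With equal weights, the candidate that shares the gravelly voice ranks first. Turning up the lyrics dial reverses the order.

```python
import math

# Hypothetical scores for a Westerberg-like seed and two candidate songs.
SEED = {"lyrical_style": 0.9, "gravelly_voice": 0.9}
CANDIDATES = {
    "voice_match": {"lyrical_style": 0.5, "gravelly_voice": 0.9},
    "lyric_match": {"lyrical_style": 0.9, "gravelly_voice": 0.3},
}

def weighted_distance(a, b, weights):
    """Euclidean distance where each attribute's squared gap is scaled by a user dial."""
    return math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))

def rank(seed, candidates, weights):
    """Closest candidates first under the given dial settings."""
    return sorted(candidates, key=lambda name: weighted_distance(seed, candidates[name], weights))

print(rank(SEED, CANDIDATES, {"lyrical_style": 1.0, "gravelly_voice": 1.0}))
# ['voice_match', 'lyric_match']
print(rank(SEED, CANDIDATES, {"lyrical_style": 4.0, "gravelly_voice": 1.0}))
# ['lyric_match', 'voice_match']
```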

I also wonder why Pandora continues to employ hundreds of attributes. In the world of modeling preferences, hundreds of variables typically can be consolidated down to a much smaller number with nearly the same predictive power. Typically, you start with a large number of variables as a kind of fishing expedition and then, over time, reduce the set down to those that are doing most of the work. The reduced set can be part of the original set and/or new variables derived specifically for predictive power. For a labor-intensive business like Pandora’s, being able to cut the number of variables in half (or a lot more) would help contain the costs. And if there’s good reason not to consolidate attributes, I would still be wondering how to innovate in streamlining the production process just as much as how to innovate in the customer-facing part of the business.

Bowling or Batting?

A final thought: What Last.fm and Pandora do is hard. The people who built these services deserve a lot of credit. Given the ambitious scope, it’s easy to find examples where each of the services comes up short. However, it’s worth considering what the yardstick should be. Should we expect spot-on recommendations like a pro bowler expects a strike every time? Or is this more like the baseball batter, who is happy to get a hit one in three times? Whatever the metaphor, the fact that these services do enough right to retain a substantial number of users is good news, because the features and quality will only get better. So when you try Last.fm and/or Pandora, be sure to give them enough time—and enough different starting points—to show their best stuff.

Sunday, January 22, 2006

User Agreements: Stop the Madness

At some point in recent history, user agreements for consumer services got out of control. We all have ignored multiple screens of legalese just to get to the “I Agree” button. The irony is, now that everyone ignores such agreements, it doesn’t matter how ridiculously long they are. So they just keep getting worse.

For example, I got an email asking me to review Hertz #1 Gold’s updated terms and conditions. So I followed the link to the Hertz Web site and found the usual dense thicket of verbiage—which went on for 39 pushes of the “Page Down” key before I reached the bottom, where I could register my agreement. Out of morbid curiosity, I did a “Print Preview” and found that if I printed the screen on standard 8.5x11 paper, it would be 47 pages long.

Now I know that the rental-car business has lots of issues, including different laws for different states and countries. But people, there must be a better way.

Here’s an initial suggestion: The introduction says, “It is not necessary to read Terms and Conditions for rentals in countries in which you are not enrolled to use Gold.” Then why are you showing me those T&Cs in the first place? Under other circumstances, the Hertz Web site knows who I am and what I am enrolled in. Why the sudden amnesia?

To those involved in creating and implementing such agreements, stop the madness!

Thursday, January 19, 2006

BusinessWeek Does the Math

In Math Will Rock Your World, BusinessWeek covers analytics and data mining efforts at various companies. The positioning of everything as “math” is strange, but maybe that’s the hook the writer or editor needed. Whatever.

As a long-form commercial for my line of work, it’s great. And those not into “math” might even find it interesting because it focuses on the applications, not the algorithms.

Wednesday, January 18, 2006

All the Shah’s Men

Stephen Kinzer’s All the Shah’s Men, a 2003 book I read recently, is a history of the CIA’s 1953 coup in Iran. It was the CIA’s first successful regime change, toppling Mohammad Mossadegh, the elected prime minister. However, Kinzer argues that the near-term win was a long-term loss, planting the seeds of the 1979 Iranian revolution and the virulent anti-U.S. sentiments that came with it.

The highlights:

It was about oil. Mossadegh incurred the wrath of the British by nationalizing the British-run company that had the exclusive franchise on Iran’s oil. The British, in turn, packaged their frustrations with Mossadegh into scary scenarios about Iranian instability, baiting the Cold Warriors in the Eisenhower administration. The British wanted their oil franchise back, and the Americans wanted to ensure that Iranian oil didn’t somehow end up under Soviet influence.

The British could not let go of the colonial mindset. As Iran’s overlord in the early 20th century, the British had cut oil deals that left little for Iran. Before and during Mossadegh’s reign, Britain had opportunities to correct this imbalance and defuse tensions. Yet even when American companies began doing 50/50 partnerships with oil-producing countries, Britain refused to go beyond modest concessions from its original deal. The British reasoning was that its involvement in Iran was a noble act of modernization, duly compensated by a long-term contract, and that the ungrateful Iranians needed to be kept in line. These rationalizations of the colonial order, even as its foundations were crumbling, precluded a true British partnership with Iran.

Mossadegh also had trouble letting go of the colonial mindset. Having gained his fame—and, at some level, his identity—as the opposition, he had trouble envisioning solutions that involved the British. While this single-mindedness was instrumental in bringing him to power, it was an obstacle later, as it confirmed for the British that they were dealing with (in their view) ungrateful and irrational natives.

A key enabler for the coup was the relative openness of Iranian society at the time. Mossadegh’s reign had, by Middle East standards, relatively broad freedom of the press and assembly, factors the CIA exploited. The Agency bought off newspaper editors to plant articles aimed at destabilizing Mossadegh’s rule. And in the critical moments of the coup itself, the CIA hired thugs to mount violent demonstrations in favor of Mossadegh, with the purpose of provoking violent counter-demonstrations and clashes to deepen the chaos. The clerics of the Iranian revolution would later point to this exploitation of freedoms as a reason against granting them.

The coup’s result was to make Iran’s monarch, the Shah, its absolute ruler for the next quarter century. In that time, the Shah was suitably pro-Western albeit increasingly repressive at home. His crushing of political dissent radicalized the opposition, which eventually broke through in the 1979 revolution, rallying a significant vanguard of the population against the Shah and his main backer, the United States.

Meanwhile, within the CIA of the Cold War era, the Iran operation’s perceived success was taken as a model for getting things done. Or as another review of All the Shah’s Men put it, the Iran operation...

...got the CIA into the regime-change business for good—similar efforts would soon follow in Guatemala, Indonesia, and Cuba—but that the Agency has had little success at that enterprise, while bringing itself and the United States more political ill will, and breeding more untoward results, than any other of its activities.

This point of view fairly represents the book’s bigger-picture perspective about unintended consequences. And if you didn’t follow the link, you might be surprised to find that the quote comes from a historian on staff at the CIA, writing in the CIA’s official journal.

Sunday, January 15, 2006

Interstellar Magnetic Slinky

Aside from being a potentially great band name, the title of this post refers to a press release from UC Berkeley called “Astronomers find magnetic Slinky in Orion.”

I point to it because, in the world of press releases, having a title that rises above the noise is crucial. To understand the challenge, let’s look at the press release’s lead paragraph:

Astronomers announced today what may be the first discovery of a helical magnetic field in interstellar space, coiled like a snake around a gas cloud in the constellation of Orion.

Getting from that to “Astronomers find magnetic Slinky in Orion” is the difference between something that won’t get attention and something that will.

It also helps to have some eye-catching imagery:

Thursday, January 12, 2006

Airborne: Neither Vitamin nor Aspirin

In product marketing, it’s often said that you are either selling aspirin (making the customer’s pain go away) or vitamins (making a normal situation better). So how should we view Airborne, a line of products associated with preventing colds?

Last Sunday’s New York Times tells us that Airborne had $90 million in sales in 2004, despite carrying a disclaimer on the box that says Airborne is “not intended to diagnose, treat, cure or prevent any disease.” Technically, Airborne’s main product is a dietary supplement, untested by the Food and Drug Administration.

The Times article all but grimaces at Airborne’s success via folksy marketing, which includes cartoons on the box and the tag line “Created by a Teacher!” But are the people who collectively buy $90 million worth of Airborne irrational and/or deluded? When you think of Airborne as a medicine, perhaps so. But when you think of it as an insurance product, where a few dollars may buy you better odds on a plane flight during cold season, it’s more plausible.

Thinking that way makes Airborne seem like a vitamin. And if we were to look at the ingredients, Airborne would indeed qualify for something in the vitamin or herbal remedy aisle. But here’s the twist: Per the folksy marketing, Airborne targets the everyman and everywoman, not the echinacea-chomping types who frequent the vitamin aisle. The result is a cross-over product: a vitamin (literally) that is often sold next to the aspirin (figuratively)—or, to untangle the medicinal metaphor, next to the Tylenol and Sudafed, both of which Airborne now outsells.

So, although the “Created by a Teacher!” positioning is enough to make me, and maybe you, reject Airborne out of hand, the lesson here is that millions of people were waiting for this product and, more important, its positioning. They just needed their vitamins repackaged as something more like, but not quite, aspirin.

Thursday, January 5, 2006

Laying Down the Law on Dating

Thanks to Isaac for pointing me to Mr. Yoest’s Ten Simple Rules for Dating My Daughter, a funny read whether or not you have a daughter.

And if you like that writerly style, check out W. Bruce Cameron, whose original piece Mr. Yoest enhanced with a little extra guns ’n ammo.

Wednesday, January 4, 2006

Product Name-O-Rama

Our pediatrician’s office is in a building that opened in 1973. Although the pediatrician has a tablet PC in every examination room, there remains some legacy equipment, identifiable by the groovy product names.

For example, this is not a scale; it’s a “Health O Meter”:

And don’t just use an elevator when you can use a “Selectomatic”:

Sunday, January 1, 2006

Microwave Oven Usability

In the time around my daughter’s birth, I was in and around the hospital a lot. Different wards have their own galley kitchens, each with a communal microwave oven. In using the microwave I was puzzled to see that it almost always had a small amount of time remaining from the previous person, who apparently stopped before the countdown was complete.

The hospital microwave was meant to be used as follows: Push the “Power” button, then select a power level, then push the “Time” button (which does nothing if you push it before pushing the “Power” button), then key in a cooking time, then push the “Start” button.

It’s a relatively standard sequence for microwave ovens, but if you don’t know it, the front panel gives no hints. However, there is an “Add 30 Sec” button, which I found myself using as a shortcut, pushing multiple times until it showed the time I needed. In my case, the time I needed was divisible by 30 seconds, so I used the full countdown. But at some point I realized that most people probably did the same thing with the “Add 30 Sec” button, except they pushed “Stop” when their time (which was not divisible by 30) had counted down.
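
The shortcut’s appeal is easy to quantify. The Python sketch below counts button presses for both routes. It makes several assumptions that aren’t stated above. Selecting a power level takes a single key. The time is typed as M:SS digits without a colon. Each "Add 30 Sec" press also starts the oven. Stopping early costs one "Stop" press.

```python
import math

def keypad_presses(seconds):
    """Power, one power-level key (assumed), Time, the M:SS digits, Start."""
    minutes, secs = divmod(seconds, 60)
    digits = len(str(minutes * 100 + secs))  # 1:45 is keyed as "145", i.e., 3 digits
    return 1 + 1 + 1 + digits + 1

def add30_presses(seconds):
    """Round up to whole 30-second presses (assumed to auto-start), plus Stop if needed."""
    presses = math.ceil(seconds / 30)
    return presses + (1 if seconds % 30 else 0)

print(keypad_presses(105), add30_presses(105))  # 7 5
```

Under these assumptions, the "Add 30 Sec" route never needs more presses than the keypad for any time up to three minutes. So the improvisation isn’t laziness. It is the efficient choice for typical reheating times.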

The lesson: A microwave with 16 buttons was being used as if it had two, even by me, though I knew how to use it the intended way.

Meanwhile, at home we have a relatively new microwave oven, a GE SpaceMaker 2.0. It attempts to be more user-friendly, with a touchscreen interface, including various forms of help. Here is the top-level screen:

It also has shortcuts for cooking or reheating food types like “popcorn” and “fish.” While fine for popcorn, the feature usually disappoints due to the variance in cooking required for different types of “fish,” “rice,” “fresh vegetables,” and such.

Still, I applaud GE for trying; it’s a step forward. But now that we’ve got a touchscreen instead of fixed buttons, can’t we be simpler? For example, why not let me push “Start” from a list of the last four settings I’ve used? Something like this:

Task     Power   Time
Cook     10      1:45   [Start]
Cook     10      4:00   [Start]
Reheat   5       0:45   [Start]
Cook     10      2:30   [Start]

Because we tend to use the microwave for a few things frequently, and for everything else rarely, this feature could be highly effective—at least for home use, not necessarily for communal use. Another way of doing it would be like speed dial on phones, where a setting could be recalled by pressing 1, 2, etc. (The SpaceMaker 2.0 has user-configurable “custom” buttons, but they don’t have the same level of intuitiveness that a “recent list” or “speed dial” feature would have.)
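
A "recent list" is a small most-recently-used buffer, and position in that list can double as the speed-dial number. Here is a minimal Python sketch of the idea. The `Setting` and `RecentSettings` names are hypothetical and don’t describe any real oven’s firmware. Reusing a setting moves it to the top instead of duplicating it. A fifth distinct setting pushes out the oldest one.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Setting:
    task: str
    power: int
    time: str  # "M:SS"

class RecentSettings:
    """Keeps the last few distinct settings, most recent first."""

    def __init__(self, size=4):
        self._items = deque(maxlen=size)  # appendleft drops the oldest from the right when full

    def record(self, setting):
        if setting in self._items:
            self._items.remove(setting)  # reuse moves a setting to the top, no duplicates
        self._items.appendleft(setting)

    def speed_dial(self, n):
        return self._items[n - 1]  # "press 1" recalls the most recent setting

    def listing(self):
        return list(self._items)

recent = RecentSettings()
for s in [Setting("Cook", 10, "2:30"), Setting("Reheat", 5, "0:45"),
          Setting("Cook", 10, "4:00"), Setting("Cook", 10, "1:45")]:
    recent.record(s)
print(recent.speed_dial(1))  # Setting(task='Cook', power=10, time='1:45')
```

Recording the four settings oldest first reproduces the on-screen list above, with the 1:45 cook setting on top.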

The point: At the hospital, people faced with 16 buttons improvised their way to the goal by using only two. In that spirit, the touchscreen’s promise is to offer not just more choices but the right choices, in context. The fact that, even with the SpaceMaker 2.0, I still often use the “30 Sec Express” button multiple times—and, in terms of minimizing button pushes, it is rational to do so—means more improvements are waiting to be made.