Saturday, November 22, 2008

SVD for You and Me

During Personify’s heyday, we were featured in a Business 2.0 article. It doesn’t seem to exist on the Web anymore, but the main thing I remember was a sentence that talked about Personify’s “algorithm-based software”—a phrase as useless as describing a car engine as “moving-parts-based.”

Certainly no one fed the author that phrase, and I doubt he or she invented it. More likely, the author wrote something that actually described our software, which an editor took the liberty of simplifying—to the point of pointlessness—for Business 2.0’s audience. Such things happen. It generated some smirks around the office, and that was that.

I tell this story because this weekend’s New York Times Magazine has a welcome counterpoint: an article about the Netflix Prize that could easily have hand-waved the details, per “algorithm-based software,” but instead made the details approachable and interesting for an audience even more general than Business 2.0’s.

Ironically, the algorithmic star of the article is singular value decomposition (SVD), a core component of, you guessed it, Personify’s algorithm-based software. Author Clive Thompson and his editor deserve credit for explaining SVD in everyday language, sprinkling in plenty of movie examples from the Netflix contest. I’ll let you read the explanation in the article (link below), but understanding SVD matters to Thompson’s larger questions: how predictable human tastes are, and whether humans are limited in their ability to comprehend why certain predictions work.

A final irony: The New York Times Company may be running SVD to analyze behavior related to, among other things, Thompson’s article. I say that because The Times was a Personify customer, and last I heard (as of mid-2008), they were still running it at terabyte scale, six years after we discontinued official support. Just goes to show, many weird connections exist out there.

Now, on to the main attraction: Thompson’s “If You Liked This, You’re Sure to Love That” in The New York Times Magazine. Enjoy.

Saturday, November 15, 2008

Water-Based Data Centers

Every once in a while, an idea comes along that combines breakthrough creativity with utter practicality. Google’s water-based data center is such an idea.

The problem: A huge cost in Google’s business is building and running data centers. An individual data center houses thousands of computers that together run Google’s services. Operating the computers requires a lot of power. Moreover, because many computers are packed tightly together, keeping them cool is a major issue.

The solution: Float a data center on an ocean ship. The power comes from converting tidal motion into energy. The cooling comes from sea water, pumped through the facility. An undersea cable carries the data to shore. If that sounds good, also consider that the facility may not be subject to taxes or government regulation, depending on where the ship is positioned. Finally, the transportation world already has an infrastructure for moving standardized containers around. Thus, Google can build its data center in container-sized modules, truck them to a container port, and drop them on a ship.

Of course, implementing this idea will face many obstacles. Maybe it will never work because of devils in the details. Whatever the outcome, the idea itself deserves praise just for the ingenuity.

Sunday, November 9, 2008

Poundstone’s Gaming the Vote

In the United States, the normal way to elect an official goes like this: Each voter picks a single candidate, and the candidate with the most votes wins. Call it a simple plurality vote—plurality referring to the requirement that a candidate only needs the most votes, not necessarily a majority. In presidential elections, the Electoral College complicates matters, but even the Electoral College is (mostly) based on combining states’ simple-plurality votes into a higher-level vote.

Although every presidential election renews the debate about the Electoral College, it’s worth asking a more fundamental question: Electoral College or not, is simple-plurality voting the best way to do an election?

In Gaming the Vote, William Poundstone argues it is far from the best way. The main problem is vote splitting, which can occur when an election has three or more candidates. An example: 60% of voters would vote for either Alice or Bob, who have similar policies. However, because each of these voters must choose either Alice or Bob, the total ends up split: Alice gets 30%, and Bob gets 30%. Meanwhile, only 40% of voters would ever vote for Carol, whose policies are the exact opposite of Alice’s and Bob’s. But because Carol has no vote-splitting competition, she wins the election, 40% to 30% to 30%. Thus, 60% of the electorate gets exactly the opposite of what it wanted.
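The Alice, Bob, and Carol arithmetic can be sketched in a few lines of Python. This is only an illustration: the electorate of 100 voters and the full rankings behind each first choice are my assumptions (the example above gives only first-choice percentages), and the function names are made up for this sketch.

```python
from collections import Counter

# Hypothetical electorate of 100 voters mirroring the example above.
# Each ballot lists candidates from most to least preferred. The full
# rankings are an assumption; the text only gives first choices.
ballots = (
    [("Alice", "Bob", "Carol")] * 30
    + [("Bob", "Alice", "Carol")] * 30
    + [("Carol", "Alice", "Bob")] * 40
)

def plurality_winner(ballots):
    """Count only each voter's first choice; the most votes wins."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0][0], dict(tally)

def head_to_head(ballots, a, b):
    """Return (voters ranking a above b, voters ranking b above a)."""
    a_wins = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
    return a_wins, len(ballots) - a_wins

winner, tally = plurality_winner(ballots)
print("Plurality winner:", winner, tally)
print("Alice vs. Carol:", head_to_head(ballots, "Alice", "Carol"))
print("Bob vs. Carol:", head_to_head(ballots, "Bob", "Carol"))
```

Under these assumed rankings, Carol wins the plurality count even though either Alice or Bob would beat her 60 to 40 in a two-way race, which is the vote-splitting problem in miniature.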

If that seems abstract, recall the U.S. presidential election of 2000. Gore and Bush were in a close race that came down to a disputed count in Florida, which Bush won by 537 votes in the certified tally. Meanwhile, the Green Party candidate, Ralph Nader, had received more than 97,000 votes in Florida. In post-election opinion polls, Nader voters preferred Gore to Bush by 2 to 1. So without Nader as the vote-splitting spoiler, Gore would likely have won Florida and the presidency.

Spoilers are enough of a problem when they occur naturally, but Poundstone makes the case that spoilers are increasingly being engineered. He profiles several races from 2006 where supporters of one campaign sponsored a competing campaign as a spoiler against the leading candidate. Not all were successful, but Poundstone’s larger point is that sponsored spoilers are cost-efficient bets, even if they don’t succeed. For example, in the 2006 campaign for a Pennsylvania U.S. Senate seat, supporters of Republican Rick Santorum paid $66,000 to aid the Green Party candidate’s drive to get on the ballot. As Poundstone observes, paying $66,000 to help the Green Party siphon off even 1% of the Democratic candidate’s vote is a lot cheaper than Santorum’s actually gaining that amount for himself, which would require million-dollar ad buys. (However, this particular ploy backfired when the Green candidate failed to make the ballot anyway.)

So, spoilers are bad. What to do?

The good news is, plenty of alternate voting systems exist. The bad news is, they all have flaws.

In 1948, economist Kenneth Arrow created a logical proof showing, in Poundstone’s words, “vote splitting and worse paradoxes can corrupt almost any reasonable way of voting.” In the 1970s, the Gibbard-Satterthwaite Theorem further showed that no ranked voting system with three or more candidates can be immune to strategic voting, where it is in the rational self-interest of some voters not to vote their true preferences. For example, in a “ranked choice” election, where voters rank each candidate, incentives arise to bury the serious rivals to one’s favored candidate at the bottom of the ballot. If everyone does that, the winner can end up being a pawn in the middle: some unqualified candidate whom no one wanted to win but who happened to get a lot of second-place votes.
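To make the burying scenario concrete, here is a minimal sketch using the Borda count, one well-known way of scoring ranked ballots. The choice of Borda, the four candidates, and the faction sizes are all my assumptions for illustration; the point is only to show how a pawn can win on second-place votes.

```python
def borda_scores(ballots):
    """Borda count: on a ballot of n candidates, first place earns n-1
    points, second place n-2, and so on down to 0 for last place.
    `ballots` is a list of (ranking, number_of_voters) pairs."""
    scores = {}
    for ranking, count in ballots:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + count * (n - 1 - place)
    return scores

def winner(scores):
    """Candidate with the highest total score."""
    return max(scores, key=scores.get)

# Hypothetical honest ballots: everyone ranks the pawn last.
honest = [
    (("A", "B", "C", "Pawn"), 34),
    (("B", "C", "A", "Pawn"), 33),
    (("C", "A", "B", "Pawn"), 33),
]

# Hypothetical strategic ballots: each faction keeps its favorite first,
# lifts the pawn to second, and buries the serious rivals below it.
strategic = [
    (("A", "Pawn", "B", "C"), 34),
    (("B", "Pawn", "C", "A"), 33),
    (("C", "Pawn", "A", "B"), 33),
]

print("Honest:", winner(borda_scores(honest)), borda_scores(honest))
print("Strategic:", winner(borda_scores(strategic)), borda_scores(strategic))
```

With honest ballots, A wins narrowly and the pawn scores nothing. Once every faction buries its rivals, the pawn collects a second-place vote from all 100 voters and beats everyone.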

Poundstone provides a tour of various systems, down to the details. Interleaved with the theories, he dishes real-world examples and anecdotes from a rogues’ gallery of elections. (“The 1844 race was enlivened by the interesting claim that Henry Clay had broken every one of the ten commandments.”)

Poundstone also profiles various academics in the field. Individually, they seem like dedicated pursuers of truth, but collectively they achieve little beyond savaging each other’s favored theories. Meanwhile, the one person who has made clear progress in getting alternative voting systems accepted in several U.S. cities—Rob Richie of the advocacy organization FairVote—has done so by advocating the system that’s easiest to sell to politicians and the public: instant runoff voting (IRV). According to Poundstone, practically every academic agrees IRV is better than simple-plurality voting, but because IRV is not their preferred system, most academics have attacked Richie as they have attacked each other.

Given its fratricidal nature, the movement for better election systems leaves much to be desired as a movement. But even if the movement’s players worked together rather than against each other, they would still be on an uphill climb. The United States’ two major parties, because they already are the major parties, have more to lose than gain from changing the rules. Yet they often control the rules, directly or indirectly.

Poundstone raises these issues to no conclusion, although he does conclude that a particular election system is best. It’s called range voting, and its main feature is rating candidates on a numeric scale rather than ranking them in a sequence. The key thing range voting achieves is representing intensity of preferences, which turns out to address many issues.
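As a rough illustration of how range voting captures intensity, here is a sketch that revisits the Alice, Bob, and Carol electorate from earlier in this post. The 0-to-10 scale and every individual score are my assumptions, chosen only so that Alice and Bob supporters rate the other similar candidate highly and Carol supporters rate both of them low.

```python
# Each entry: (scores given by one kind of voter, number of such voters).
# All scores are hypothetical, on an assumed 0-10 scale.
range_ballots = [
    ({"Alice": 10, "Bob": 8, "Carol": 0}, 30),   # Alice supporters
    ({"Alice": 8, "Bob": 10, "Carol": 0}, 30),   # Bob supporters
    ({"Alice": 2, "Bob": 1, "Carol": 10}, 40),   # Carol supporters
]

def range_totals(ballots):
    """Sum every voter's score for every candidate."""
    totals = {}
    for scores, count in ballots:
        for candidate, score in scores.items():
            totals[candidate] = totals.get(candidate, 0) + score * count
    return totals

totals = range_totals(range_ballots)
num_voters = sum(count for _, count in range_ballots)
averages = {c: t / num_voters for c, t in totals.items()}
range_winner = max(totals, key=totals.get)

print("Totals:", totals)
print("Averages:", averages)
print("Range-voting winner:", range_winner)
```

Because Alice and Bob supporters can both give high marks to the candidate they did not put first, the 60% bloc no longer splits. Under these assumed scores Alice comes out ahead, with Carol last.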

Notably, range voting came from outside the election-systems academic community. A mathematician simulated a wide variety of voting schemes to see which one was best in practice, since in theory they all have some problem or other. He found that range voting had the best fidelity in representing voters’ true preferences, across simulated elections with various configurations of candidates and voting strategies.
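A toy version of this kind of experiment might look like the sketch below. To be clear, it is not the mathematician’s actual simulation: the uniform random utilities, the all-honest voters, the two systems compared, and the "regret" measure (how far the elected candidate falls short of the candidate with the highest total utility) are all assumptions I picked to keep the example short.

```python
import random

def simulate(trials=2000, voters=50, candidates=4, seed=0):
    """Average regret of plurality and range voting over random elections.

    Regret = (total utility of the best possible winner)
             - (total utility of the candidate actually elected).
    Lower is better; 0 means the system always picked the best candidate.
    """
    rng = random.Random(seed)
    regret = {"plurality": 0.0, "range": 0.0}
    for _ in range(trials):
        # utilities[v][c]: how happy voter v would be if candidate c won.
        utilities = [[rng.random() for _ in range(candidates)]
                     for _ in range(voters)]
        totals = [sum(u[c] for u in utilities) for c in range(candidates)]
        best = max(totals)

        # Plurality: each honest voter names only a favorite.
        first_choices = [0] * candidates
        for u in utilities:
            first_choices[u.index(max(u))] += 1
        plurality_pick = first_choices.index(max(first_choices))

        # Range: each honest voter rescales utilities so that the least
        # liked candidate scores 0 and the favorite scores 1.
        range_scores = [0.0] * candidates
        for u in utilities:
            lo, hi = min(u), max(u)
            span = (hi - lo) or 1.0  # guard against identical utilities
            for c in range(candidates):
                range_scores[c] += (u[c] - lo) / span
        range_pick = range_scores.index(max(range_scores))

        regret["plurality"] += best - totals[plurality_pick]
        regret["range"] += best - totals[range_pick]
    return {system: total / trials for system, total in regret.items()}

if __name__ == "__main__":
    for system, avg in simulate().items():
        print(f"{system}: average regret {avg:.3f}")
```

A fuller experiment would also vary the number of candidates, the distribution of preferences, and the share of strategic voters, which is the kind of coverage the paragraph above describes. How closely the output of this toy tracks the published findings depends entirely on the assumptions listed here.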

Although Poundstone apparently could not elicit from the academic community a fatal flaw with range voting, neither could he get an endorsement. So the book does not have the satisfaction of ending with a thumping “case closed!” If anything, the case is just being opened.

With Gaming the Vote, Poundstone has done a service by making these important but obscure issues worth a read for the interested voter.