Keeping the good stuff out at Yahoo! Answers

This is, I think, an amusing and instructive tale. I’m a bit sorry to be telling it, because I have a lot of friends at Yahoo! (especially in the Research division), and I respect the organization. The point is not to criticize Yahoo! Answers, however: keeping pollution out is a hard problem for user-contributed content information services, and that their system is imperfect is a matter for sympathy, not scorn.
While preparing for my recent presentation at Yahoo! Research, I wondered whether Yahoo! Mail was still using the Goodmail spam-reduction system (which is based on monetary incentives). I couldn’t find the answer with a quick Google search, nor by searching the Goodmail and Yahoo! corporate web sites (Goodmail claims that Yahoo! is a current client, but there was no information about whether Yahoo! is actually using the service, or what impact it is having).

So, I thought, this is a great chance to give Yahoo! Answers a try. I realize the question answerers are not generally Yahoo! employees, but I figured some knowledgeable people might notice the question. Here is my question, in full:

Is Yahoo! Mail actually using Goodmail’s Certified Email? In 2005 Yahoo!, AOL and Goodmail announced that the former 2 had adopted Goodmail’s “Certified Email” system to allow large senders to buy “stamps” to certify their mail (see e.g., The Goodmail home page currently states that this system is available at Yahoo!. Yet I can find nothing about it searching Yahoo!Mail Help, etc. My question: I the system actually being used at Yahoo!Mail? Bonus: Any articles, reports, etc. about its success or impacts on user email experience?

A day later I received the following “Violation Notice” from Yahoo! Answers:

You have posted content to Yahoo! Answers in violation of our Community Guidelines or Terms of Service. As a result, your content has been deleted. Community Guidelines help to keep Yahoo! Answers a safe and useful community, so we appreciate your consideration of its rules.

So, what is objectionable about my question? It is not profane or a rant. It is precisely stated (though compound), and I provided background context to aid answerers (and so they knew what I already knew).
I dutifully went and read the Community Guidelines (CG) and the Terms of Service (TOS), and I could not figure out what I had violated. I had heard elsewhere that some people do not like TinyURLs because it is not clear where you are being redirected, and thus they might be used to maliciously direct traffic. But I saw nothing in the CG or TOS that prohibited URLs in general, or TinyURLs specifically.
So I used the link they provided to appeal the deletion. A few days later I received a reply that cut-and-pasted the information from the Yahoo! Answers help page explaining why content is deleted. This merely repeated what I had been told in the first message (since none of the other categories applied): my content was in violation of the CG or TOS. But, for the second time, no information was provided on how the content violated these rules.
Another address was provided to appeal the decision, so I wrote a detailed message to that address, explaining my question, and my efforts to figure out what I was violating. A few days later, I got my third email from Yahoo! Answers:

We have reviewed your appeal request. Upon review we found that your
content was indeed in violation of the Yahoo! Answers Community
Guidelines, Yahoo! Community Guidelines or the Yahoo! Terms of Service. As a result, your content will remain removed from Yahoo! Answers.

Well… Apparently it’s clear to others that my message violates the CG or the TOS, but no one wants to tell me what the violation actually is. Three answers, all three with no specific explanation. Starting to feel like I’m a character in a Kafka novel.
At this point, I laughed and gave up (it was time for me to travel to Yahoo! to give my — apparently dangerous and community-guideline-violating — presentation anyway).
I have to believe that there is something about the use of a URL, a TinyURL, or the content to which I pointed that is a violation. I’ve looked, and found many answers that post URLs (not surprisingly) to provide people with further information. Perhaps the problem is that I was linking to a Goodmail press release on their web site, and they have a copyright notice on that page? But does Yahoo! really think providing a URL is “otherwise make available any Content that infringes any patent, trademark, trade secret, copyright” (from the TOS)? Isn’t that what Yahoo’s search engine does all the time?
End of story.
Moral? Yahoo! Answers is a user-contributed content platform. Like most such platforms, it is fundamentally an open-access publishing platform. There will be people who want to publish content that is outside the host’s desired content scope. How to keep out the pollution? Yahoo! uses a well-understood, expensive method to screen: labor. People read the posted questions and make determinations about acceptability. But, as with any screen, there are Type I (false positive) and Type II (false negative) errors: acceptable content gets removed, and pollution gets through. Screening polluting content is hard.
(My question probably does violate something, but surely the spirit of my question does not. I had a standard, factual, reference question, ironically, to learn a fact that I wanted to use in a presentation to Yahoo! Research. A bit more clarity about what I was violating and I would have contributed desirable content to Yahoo! Answers. Instead, a “good” contributor was kept out.)
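To make the two kinds of screening error concrete, here is a minimal sketch in Python. It is only an illustration: the function name `evaluate_screen` and the sample data are invented for this example and say nothing about how Yahoo! actually moderates. A "false positive" here means acceptable content that the screen removed (my question, I would argue), and a "false negative" means pollution that the screen let through.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    false_positives: int        # acceptable posts that were removed
    false_negatives: int        # polluting posts that were kept
    false_positive_rate: float  # share of acceptable posts removed
    false_negative_rate: float  # share of polluting posts kept


def evaluate_screen(posts):
    """Tally screening errors.

    posts: iterable of (is_pollution, was_removed) boolean pairs.
    """
    fp = fn = good = bad = 0
    for is_pollution, was_removed in posts:
        if is_pollution:
            bad += 1
            if not was_removed:
                fn += 1
        else:
            good += 1
            if was_removed:
                fp += 1
    fp_rate = fp / good if good else 0.0
    fn_rate = fn / bad if bad else 0.0
    return ScreenResult(fp, fn, fp_rate, fn_rate)


# Hypothetical data: four acceptable posts (one wrongly removed)
# and two polluting posts (one wrongly kept).
sample = [
    (False, False), (False, False), (False, True), (False, False),
    (True, True), (True, False),
]
result = evaluate_screen(sample)
print(result.false_positive_rate, result.false_negative_rate)
```

A stricter screen pushes the second rate down and the first rate up, which is why a screen that makes no errors of either kind is so hard to build.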

Presentation at Yahoo! Research on user-contributed content

Yahoo! Research invited me to speak in their “Big Thinkers” series at the Santa Clara campus on 12 March 2008. My talk was “Incentive-centered design for user-contributed content: Getting the good stuff in, Keeping the bad stuff out.”
My hosts wrote a summary of the talk (that is a bit incorrect in places and skips some of the main points, but is reasonably good), and posted a video they took of the talk. The video, unfortunately, focuses mostly on me without my visual presentation, panning only occasionally to show a handful of the 140 or so illustrations I used. The talk is, I think, much more effective with the visual component. (In particular, it reduces the impact of the amount of time I spend glancing down to check my speaker notes!)
In the talk I present a three-part story: user-contributed content (UCC) problems are unavoidably incentive-centered design (ICD) problems; ICD offers a principled approach to design; and ICD works in practical settings. I described three main incentives challenges for UCC design: getting people to contribute; motivating quality and variety of contributions; and discouraging “polluters” from using the UCC platform as an opportunity to publish off-topic content (such as commercial ads, or spam). I illustrated with a number of examples in the wild, and a number of emerging research projects on which my students and I are working.

ICD for home computer security

Ph.D. student Rick Wash and I are applying ICD tools to the problem of home computer security. Metromode (an online magazine) recently published an article featuring our project.

One of the major threats to home computers is viruses that install bots, creating botnets. These bots are code that uses the computer’s resources to perform something on behalf of the bot owner. Most commonly, the bots become spam-sending engines, so that spammers can send mail from thousands of home computers, making it harder to block the spam by originating IP (and also saving them the cost of buying and maintaining a server farm). Bots, of course, may also log keystrokes and try to capture bank passwords and credit card numbers.
The problem is crawling with incentives issues. Unlike first-generation viruses, bots tend to be smarter about avoiding detection. In particular, they watch the process table, and limit themselves to using CPU cycles when other programs are not using many. That way, a normal home user may not see any evidence that he or she has a virus: the computer does not seem to slow down noticeably (but while the user is away from the machine the bot may be running full tilt sending out spam). So, the bot doesn’t harm its host much, but it harms others (by spreading spam and the bot virus itself, and possibly through other harmful activity such as denial-of-service attacks on other hosts). This is a classic negative externality: the computer owner has little incentive (and often little appropriate knowledge) to stop the bot, but others suffer. How to get the home computer user to protect his or her machine better?
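The externality argument can be put in a few lines of arithmetic. The sketch below is purely illustrative: the function names and all of the dollar figures are made up for this example, and they are not measurements from our project. The point is only that the owner compares the protection cost with his or her own harm, while society would compare it with the total harm.

```python
def owner_protects(private_harm, protection_cost):
    """The owner weighs only the harm he or she personally bears."""
    return private_harm > protection_cost


def socially_worthwhile(private_harm, external_harm, protection_cost):
    """Society counts the harm to everyone, including other hosts."""
    return private_harm + external_harm > protection_cost


# Hypothetical yearly figures (in dollars) for one infected home computer.
private_harm = 5       # the bot throttles itself, so the owner barely notices
external_harm = 200    # spam and attacks borne by other people
protection_cost = 40   # time and money needed to secure the machine

print(owner_protects(private_harm, protection_cost))
print(socially_worthwhile(private_harm, external_harm, protection_cost))

# Extra private benefit needed before the owner would choose to protect.
incentive_gap = max(0, protection_cost - private_harm)
print(incentive_gap)
```

With these invented numbers the owner does nothing even though protection is clearly worthwhile overall, and the gap is the amount of additional benefit a design would have to hand the owner to change that choice.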
We are developing a social firewall that integrates with standard personal firewall services to provide the user additional benefits (motivating them to use the service), while simultaneously providing improved security information to the firewalls employed by other users.
We don’t have any papers released on this new system yet, but for some of the foundational ideas, see “Incentive-Centered Design for Information Security”, ICEC-07.

The Pause of Mr. Clause

One of my students, who had a difficult week, mentioned that s/he was looking around for “downward social comparisons” to feel better. This phrase comes from the social psychology literature on motivations. The idea is that people are motivated by how they perceive themselves doing on some criterion relative to others. More recent versions of this distinguish between downward and upward comparisons.
This incident reminded me of one of my favorite stories. It was told by Arlo Guthrie back in the late 60s, around the time he became famous not for being the son of Woody, but for being the composer and performer of “Alice’s Restaurant”. AR is a long (18 minute) story told to a strummed motif, a “zygote of a melody” (to paraphrase Ani Difranco). Arlo told lots of stories less famous, too. One he would tell before singing “The Pause of Mr. Clause” (which is a song about how the FBI would be very suspicious of Santa Clause (sic), given his long beard, red clothes — is he a commie? — and what’s in that pipe that he’s smoking, anyway?). The story is the classic downward social comparison story. Here it is, copied from the version published in This is the Arlo Guthrie Book (Amsco Music Publishing, NY, 1969):

“During these hard days and hard weeks, everybody always has it bad once in a while. You have a bad time of it and you always have a friend that says, ‘Hey, man, you ain’t got it that bad. Look at that guy!’ and you look at that guy and he’s got it worse than you. And it makes you feel better that there’s somebody that got it worse than you. But think of the last guy! Nobody’s got it worse than that guy! Nobody in the whole world! That guy — he’s so alone in the world that he doesn’t even have a street to lay in for a truck to run him over. Nothin’s happenin’ for that cat!
And all that he has to do to create a little excitement in his life is to bum a dime from somewhere, call up the FBI, say ‘FBI’ — they say, ‘Yes’…say, ‘I dig Uncle Ho and Chairman Mao, and their friends are comin’ over for dinner!’ Click. Hang up the phone. And within two minutes (and not two minutes from when he hangs up the phone, but two minutes from when he first put the dime in) they got 30,000 feet of tape rolling! Files on tape. Pictures, movies, dramas, actions on tape — and then they send out half a million people all over the entire world…the globe…to find out all they can about this guy!
‘Cause there’s a number of questions involved in this guy. I mean, if he was the last guy in the world, how’d he get a dime to call the FBI? There are plenty of people that aren’t the last guys that can’t get dimes! He comes along and he gets a dime! I mean, if he had to bum a dime to call the FBI, how was he gonna serve dinner for all those people? How could the last guy make dinner for all those people? And if he could make dinner, and was gonna make dinner, then why did he call the FBI?
They find out all of those questions within two minutes! And that’s a great thing about America. I mean, this is the only country in the world — well, it’s not the only country in the world that can find stuff out in two minutes, but it’s the only country in the world that would take two minutes for that guy! Other countries would say, ‘Hey — he’s the last guy. Screw him.’ But in America, there is no discrimination and there is no hypocrisy ’cause they’ll get anybody. And that’s a wonderful thing about America.”

UCC incentives the old-fashioned way

Ben Kaufman announced Kluster at TED 2008. This is a business through which businesses can solicit user-contributed content: innovative technology or product ideas, business solutions, etc. Why would anyone give a for-profit company good innovation ideas? For a cash incentive… Businesses post challenges with a cash bonus, and Kluster has a scheme for paying out the bonus to people whose ideas are successful. (It also runs a prediction market on the side for wagers on which of the proposed ideas will succeed.) No volunteers here: this UCC is compensated in the traditional form of tournament prizes.
Two similar businesses, at least, are already operating: InnoCentive and Cambrian House.
Think you’re smart, but don’t have time or capital to turn your ideas into businesses? Go sell your ideas online!
(Based on reporting in Putting Innovation in the Hands of a Crowd – New York Times)

Looking for (well-paid, highly-trained, very busy) volunteers

The Peer to Patent project is one of my favorite recent examples of a user-contributed content (UCC) project, not because it has been very successful (yet), but because it demonstrates the surprising and important ways that UCC may benefit society. It’s not all Wikipedia and social networking!
Peer to Patent is a project started by Prof. Beth Noveck and her Do Tank group at NYU Law School. The US Patent Office adopted it for a one-year pilot starting 15 June 2007. It is a system to post patent applications for public comment, in particular seeking suggestions about possible prior art, to help USPTO examiners determine whether a patent should be granted. It was motivated by a widely held sense, particularly in the area of software and business process patents, that the USPTO has been overwhelmed by the number of applications and the advances in technology in recent years, and that many patents have been granted which can have the effect of stifling new innovation. During the first six months of the pilot, over 1800 people registered to participate, and over 150 prior art references were submitted on the 24 patent applications that can be reviewed through this system.

In the February 2008 issue of the Communications of the ACM, Andy Oram published a column about the project in which he discussed the incentives challenges that may stand in the way of success. First, of course, not just any user is likely to be able to make quality contributions: to be useful, a contributor must have serious expertise in the area of the patent in order to be able to understand the application well enough to recognize possible prior art, and must know the literature well enough to identify the prior art. That’s not a lot of people, and they aren’t the type who have a lot of underpaid hours to volunteer. Indeed, he quotes Jon Bentley of Avaya Labs who points out that the whole essence of patenting is making money, and that the people in the best position to contribute may be those least interested in doing so.
One of the hopes of the project is that it is the monetary incentive itself (provided not by Peer to Patent, but indirectly) that will induce people to contribute: the contributors would be competitors. That is, if some company is using technology on which a patent is proposed, or is developing something similar, it will have a financial interest in seeing that the patent is not granted. Thus, it might be the one to put the time in to review the application and propose the prior art. Although such companies are interested parties, as Oram says, “prior art is prior art no matter who finds it”.
Interesting problem, and I’m looking forward to seeing whether or not Peer to Patent can succeed (and I hope it does, because I tend to think that too many software and business patent applications are approved).