User maybe-not-contributed content

Curiouser and curiouser.
Last week Jonathan Tasini, a free-lance writer, filed a lawsuit on behalf of himself and other bloggers who contributed — well maybe not contributed — their work to the Huffington Post site. His complaint is that Huffington Post sold itself to AOL for $315 million and did not share any of the gain with the volunteer — well maybe not volunteer — writers.
The lawsuit complaint makes fun reading, as these things go.
The main gripe (other than class warfare: it’s unfair!) seems to be that HuffPo “lured” (paragraph 2) writers to contribute their work not for payment but for “exposure (visibility, promotion and distribution)”, yet did not provide “a real and accurate measure of exposure” (paragraph 103). However, as far as I can see, there is no claim that HuffPo ever told its writers that HuffPo would not be earning revenue, nor a promise that it would provide any page view or other web analytic data.
How deceived was Tasini? He’s no innocent. In fact, he volunteers (oops! there’s that word again) in the complaint that he runs his own web site, that he posts articles to it written by volunteers, and that he earned revenue from the web site (paragraph 15). And he was the lead plaintiff in the famous (successful) lawsuit against the New York Times when it tried to resell freelance writer content to digital outlets (not authorized in its original contracts with the writers). And, gosh, though he was “lured” into writing for the HuffPo, and was “deceived” into thinking it was a “free forum for ideas”, he didn’t notice that they sold ads and were making money during the several years in which he contributed 216 articles to the site. That’s a pretty powerful fog of deception! Maybe Arianna Huffington should work for the CIA.

New article characterizing crowdsourcing on the web

There is an article in the new issue of CACM on “Crowdsourcing systems on the World-Wide Web”, by Anhai Doan, Raghu Ramakrishnan, and Alon Y. Halevy. In it they offer a definition of crowdsourcing systems, characterize them along nine dimensions, and discuss some of the dimensions as challenges.
It’s a useful review article, with many examples and a good bibliography. The characterization in nine dimensions is clear and I think mostly useful.
I’m particularly pleased to see that they have given prominent attention to the incentive-centered design issues on which I (and this blog) have focused for years. Indeed, they define crowdsourcing systems in terms of four incentive problems that must be solved (distinguishing them from, say, crowd management systems that only address three of the questions). They define crowdsourcing as “A system that enlists humans to help solve a problem defined by the system owners, if in doing so it addresses four fundamental challenges:

  • How to recruit and retain users?
  • What contributions can users make?
  • How to combine user contributions to solve the target problems?
  • How to evaluate users and their contributions?”

The first and second are the “getting stuff in” (contribution) problem about which I write: how to get people to make the effort to contribute something to the good of others? The fourth is the quality incentive problem, which I usually separate into “getting good stuff in” (positive quality), and “keeping bad stuff out”.

Even academics pollute Amazon reviews (updated)

[Oops. Turns out that Orlando Figes himself was the poison pen reviewer, and that he simply compounded his dishonesty by blaming his wife. That’s got to put a bit of strain on the home life.]
That people use pseudonyms to write not-arm’s-length book reviews on Amazon is no longer news.
But I couldn’t resist pointing out this new case, if nothing else as an especially fun example to use in teaching. Dr. Stephanie Palmer, a senior law (!) lecturer at Cambridge University (UK), was outed by her husband, Prof. Orlando Figes, for writing reviews under a pseudonym that savaged the works of his rivals, while also writing a review of a book by her husband that called it a “beautiful and necessary” account, written by an author with “superb story-telling skills.” Story-telling, indeed.
A closing comment by the editor of the Times Literary Supplement, which broke the story: “What is new and is regrettable is when historians use the law to stifle debate and to put something in the paper which is untrue….[Figes’s] whole business is replacing a mountain of lies with a few truths”.
Via The Guardian.

Yelp’s new idea

Yelp, the local business user-contributed review site, has a well-known set of manipulative incentive problems. First, businesses might want to write overly positive reviews of themselves (under pseudonyms). Second, they might want to write negative reviews of their competitors. Third, they might want to pay Yelp to remove negative reviews of themselves. This last has received a lot of attention, including a class action suit against Yelp alleging that some of its salespeople extort businesses into paying to remove unfavorable reviews.
Yelp has always filtered reviews, trying to remove those that it suspects are biased either too positive or too negative. But of course it makes both Type I and Type II errors, and some of the Type IIs (filtering out valid reviews) may be at the root of some of the extortion claims (or not).
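To make the two kinds of filtering error concrete, here is a minimal sketch that counts them for a set of reviews whose true status is assumed known. The function name, the data layout and the sample reviews are all hypothetical; nothing here reflects how Yelp actually filters or audits reviews.

```python
def filter_errors(reviews):
    """Count the two kinds of filtering mistakes.

    reviews: list of (is_genuine, was_filtered) pairs. Both flags are
    hypothetical labels for illustration, not real Yelp data.
    """
    genuine = [r for r in reviews if r[0]]
    biased = [r for r in reviews if not r[0]]

    # Valid reviews that the filter removed (the error the post
    # links to the extortion claims).
    genuine_filtered = sum(1 for _, filtered in genuine if filtered)
    # Biased reviews that the filter let through.
    biased_shown = sum(1 for _, filtered in biased if not filtered)

    return {
        "genuine_filtered": genuine_filtered,
        "biased_shown": biased_shown,
        "genuine_filtered_rate": genuine_filtered / len(genuine) if genuine else 0.0,
        "biased_shown_rate": biased_shown / len(biased) if biased else 0.0,
    }


# Made-up sample: three genuine reviews (one wrongly filtered) and
# two biased reviews (one wrongly shown).
sample = [(True, False), (True, True), (False, True), (False, False), (True, False)]
result = filter_errors(sample)
```

Lowering one error rate usually raises the other, which is why no filter can promise to make neither mistake.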
Yelp has now made a rather simple, but I suspect quite favorable change: it is making all filtered reviews visible (on another page). This transparency, it hopes, will let users see that it is even-handed in its filtering, and that its errors are not themselves biased (or influenced).
Embracing transparency is a strategy that seems to work more often than not in this Web 2.0 age of the Internet. I think it will here. Most folks will never bother to look at the filtered-out reviews, and thus will rely on the very reviews that Yelp thinks are most reliable. Those who do look, if Yelp is indeed being even-handed, will probably find the filtering interesting, but will ignore these reviews in choosing which business to frequent. The main risk to Yelp is likely to be that imitators will better be able to reverse-engineer their filtering formulae.

Crowd-sourcing combats information asymmetry

Jonathan Zinman and Eric Zitzewitz studied ski resorts’ claims about snowfall. They found that, relative to government snow reports, ski resorts claim 23% more snowfall for weekend days than for weekdays. Seems a pretty clear case of deceptive advertising to draw in the business, with the risk (of being sued for deception, or of damaging reputation) taken more when the payoff is higher (weekends).
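As a rough illustration of the kind of comparison involved, the sketch below computes how much more resorts report on weekends than on weekdays, relative to a government benchmark. The records, field layout and function names are invented for illustration, and this simple ratio of averages is an assumption of mine, not the estimation method the authors used.

```python
from statistics import mean

# Hypothetical daily records:
# (is_weekend, resort_reported_inches, government_reported_inches)
records = [
    (False, 4.0, 4.0),
    (False, 2.0, 2.0),
    (True, 5.0, 4.0),
    (True, 2.5, 2.0),
]


def mean_ratio(rows):
    """Average of resort-reported over government-reported snowfall."""
    return mean(resort / gov for _, resort, gov in rows)


def weekend_excess(rows):
    """Fractional amount by which weekend reporting exceeds weekday reporting."""
    weekday = mean_ratio([r for r in rows if not r[0]])
    weekend = mean_ratio([r for r in rows if r[0]])
    return weekend / weekday - 1.0


excess = weekend_excess(records)  # 0.25 for the made-up data above
```

In this toy data the resort matches the government figure on weekdays and overstates it by a quarter on weekends, so the excess is 25%.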
Deceptive advertising is a standard case of asymmetric information, hidden characteristics variety. The resort has better information, and chooses what to report.
What incentives to induce honesty? As I mentioned above, there are at least two obvious incentives: avoiding a lawsuit (by a government agency or perhaps a class action on behalf of disgruntled customers), and avoiding a loss of customer goodwill if they realize the resort is routinely lying.
How to increase those incentives (since apparently they have not been enough to prevent at least some deception)? One way is to raise the fine or other penalties if prosecuted. Another way, particularly for the reputation effect, is to reduce the cost of getting better information to the consumers.
And…Zinman and Zitzewitz found that the deception has decreased since the release of an iPhone app that aggregates skier reports of local conditions in real time (and that the reduction in exaggeration is much more notable at resorts that have good iPhone reception).
Crowdsourcing: reducing asymmetric information problems.
(Zinman must be pretty happy to have found a co-author with whom he gets first billing in co-authored papers…no mean feat.)
(Via Erin Krupka and the Marginal Revolution blog.)

The problem isn’t going away: who pays for open publishing?

This isn’t so much a specialized incentive-centered design story as a good example of a central problem in information economics: information may “want to be free” (Barlow 1996) but it isn’t free to maintain or distribute. Who is going to pay?
The arXiv project was started in the 90s as an e-print archive for rapid (pre-journal publication) distribution of research papers in high-energy physics. Paul Ginsparg, a physicist at Los Alamos National Lab, started and maintained it for a number of years. It became a vital service not just for the original community, but for many other scholarly fields (including math, statistics and computer science). It currently houses about 600,000 articles, all freely available to anyone with an Internet connection.
Cornell University took responsibility for the project several years ago. The university librarian reports that Cornell spends about $400,000 a year to maintain and enhance this ever-growing resource, which benefits researchers across the world. It earns no direct revenue from the project. How long is that sustainable?
According to this posting in The Chronicle of Higher Education, Cornell is now asking the 200 institutions whose researchers account for about 75% of the downloads to voluntarily pay $4000 a year to support the project. Not a lot perhaps (the cost of a handful of subscriptions to journals in the related fields), but a request for an ongoing commitment to a charitable contribution. How likely is that to work? How sustainable? National Public Radio in the US receives some government funding, and fully 23% of its budget from corporate advertising. (It is called “sponsorship” because the corporations only get “announcements” and not “advertisements”, but as someone whose father made his living in a Madison Avenue advertising firm doing “corporate image advertising”, this is advertising. Corporations spend the money to enhance their brand image and reputation, on the belief that this increases their sales.)