Category Archives: Spam

Everything (of value) is for sale

There’s a truism that bothers many (except economists): if a good or service has value to some people and can be produced by someone else at a cost below that value, a market will emerge. This disturbs many because it holds as true for areas of dubious morality (sexual transactions) and outright immorality (human trafficking and slavery) as it does for lawn mowing and automobiles.
Likewise for online activities, as I’ve documented many times here. You can buy Twitter followers, Yelp reviews, likes on Facebook, votes on Reddit. And, of course, Wikipedia, where you can buy pages or edits, or even (shades of The Sopranos) “protection”.
Here is an article that reports at some length on large scale, commercialized Wikipedia editing and page management services. Surprised? Just another PR service, like social media management services provided by every advertising / marketing / image management service today.

Humans paid by the robots

Not a new story, but the New York Times reports some interesting details (including prices) of human farms hired by robots (well, not really) to solve CAPTCHAs.
Macduff Hughes, at Google, captures the main point I’ve been making for years: screening out unwanted intruders is an economic problem, and CAPTCHAs are an economic (signaling) mechanism that tries to raise the price of entry high enough to keep the bad guys out.
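That economic logic can be made concrete with a back-of-the-envelope sketch. The numbers and the function below are hypothetical illustrations (not figures from the article): a CAPTCHA screens out an attacker only when the cost of defeating it exceeds the expected value of what it protects.

```python
# Hypothetical sketch of CAPTCHA economics: the CAPTCHA deters an
# attacker only if solving it costs more than the protected resource
# is worth to the attacker.

def attack_is_profitable(value_per_account, solve_cost_per_1000, solve_success_rate):
    """Is it worth paying a solving farm to defeat this CAPTCHA?"""
    cost_per_attempt = solve_cost_per_1000 / 1000.0
    expected_value = value_per_account * solve_success_rate
    return expected_value > cost_per_attempt

# Hypothetical: an account worth $0.01 to a spammer, a farm charging
# $1 per 1000 solved CAPTCHAs with a 90% success rate.
print(attack_is_profitable(0.01, 1.00, 0.90))    # True: the attack still pays
print(attack_is_profitable(0.0005, 1.00, 0.90))  # False: the price screen works
```

The point of the sketch is that the defense is a price, not a wall: lowering the value of what a single solved CAPTCHA buys is as effective as making the CAPTCHA harder.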

New UCC opportunity, new opportunity for manipulation and spam

Google has made available a striking set of new features for search, which it calls SearchWiki. If you are logged in to a Google account, when you search you can add or delete results and re-order them (changes you will see again if you repeat the same search), and post comments (which can be viewed by others).
But the comments are user-contributed content: this is a relatively open publishing platform. If others search on the same keyword(s) and select “view comments”, they will see what you entered, which might be advertising, political speech, whatever. As Lauren Weinstein points out, this is an obvious opportunity for pollution, and (to a lesser extent, in my humble opinion, because there is no straightforward way to affect the behavior of other users) manipulation. In fact, he finds that comment wars and nastiness started within hours of SearchWiki’s availability:

It seems inevitable that popular search results in particular will quickly become laden with all manner of “dueling comments” which can quickly descend into nastiness and even potentially libel. In fact, a quick survey of some obvious search queries shows that in the few hours that SearchWiki has been generally available, this pattern is *already* beginning to become established. It doesn’t take a lot of imagination to visualize the scale of what could happen with the search results for anybody or anything who is the least bit controversial.

Lauren even suggests that lawsuits are likely by site owners whose links in Google become polluted, presumably claiming they have some sort of property right in clean display of their beachfront URL.

Presentation at Yahoo! Research on user-contributed content

Yahoo! Research invited me to speak in their “Big Thinkers” series at the Santa Clara campus on 12 March 2008. My talk was “Incentive-centered design for user-contributed content: Getting the good stuff in, Keeping the bad stuff out.”
My hosts wrote a summary of the talk (which is a bit inaccurate in places and skips some of the main points, but is reasonably good), and posted a video of the talk. The video, unfortunately, focuses mostly on me without my visual presentation, panning only occasionally to show a handful of the 140 or so illustrations I used. The talk is, I think, much more effective with the visual component. (In particular, the visuals draw attention away from the amount of time I spend glancing down to check my speaker notes!)
In the talk I present a three-part story: UCC problems are unavoidably ICD problems; ICD offers a principled approach to design; and ICD works in practical settings. I described three main incentives challenges for UCC design: getting people to contribute; motivating quality and variety of contributions; and discouraging “polluters” from using the UCC platform as an opportunity to publish off-topic content (such as commercial ads, or spam). I illustrated with a number of examples in the wild, and a number of emerging research projects on which my students and I are working.

MetaFilter manipulated by nonprofit that reports on honesty and reliability of nonprofits

The New York Times today reported that the Executive Director of a nonprofit research organization manipulated the Ask MetaFilter question service to steer users to his organization’s site.
This is particularly piquant because the manipulator founded his organization (GiveWell) as a nonprofit to help people evaluate the quality (presumably, including reliability!) of nonprofit charitable organizations, and GiveWell itself is supported by charitable donations.
The manipulation was simple, and reminiscent of the well-publicized book reviews by authors and their friends on Amazon: the executive pseudonymously posted a question asking where he could go to get good information about charities, and then under his own name (but without identifying his affiliation) answered his own question by pointing to his own organization.
When discovered, the GiveWell board invoked old-fashioned incentives: they demoted the Executive Director (and founder), docked his salary, and required him to attend a professional development training program. Of course, the expected cost of being caught and punished was evidently not a sufficient incentive ex ante, but the organization apparently hopes that imposing the ex post punishment will motivate him to behave in the future, and that publicizing it will similarly motivate other employees. The publicity provides an additional incentive: the ED’s reputation has been severely devalued, presumably reducing his expected future income and sense of well-being as well.
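The ex ante logic here reduces to a simple expected-cost comparison, which can be sketched as a toy deterrence calculation (all numbers below are hypothetical, not taken from the story):

```python
# Toy deterrence calculation (hypothetical numbers): manipulation is
# deterred ex ante only if the expected punishment outweighs the gain.

def deterred(gain, detection_prob, penalty):
    """True if expected cost of punishment exceeds the gain from cheating."""
    return detection_prob * penalty > gain

# Suppose the manipulation is worth $5,000, detection happens with
# probability 0.1, and the penalty (salary, demotion) is $20,000:
print(deterred(gain=5000, detection_prob=0.1, penalty=20000))  # False: not deterred
# Publicity raises both the odds of detection and the reputational penalty:
print(deterred(gain=5000, detection_prob=0.5, penalty=60000))  # True
```

This is why publicizing the punishment matters: it shifts both terms on the cost side of the inequality for every future would-be manipulator.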

UCC search arrives…manipulation and pollution to follow soon

Jimmy Wales announced the release of the public “alpha” of his new, for-profit search service, Wikia Search. The service is built on a standard search engine, but its primary feature is that users can evaluate and comment on search results, building a user-contributed content database that Wikia hopes will improve search quality, making Wikia a viable but open (and hopefully profitable) alternative to Google.
Miguel Helft, writing for the New York Times, was quick to note that such a search service might be quite vulnerable to manipulation:

Like other search engines and sites that rely on the so-called “wisdom of crowds,” the Wikia search engine is likely to be susceptible to people who try to game the system, by, for example, seeking to advance the ranking of their own site. Mr. Wales said Wikia would attempt to “block them, ban them, delete their stuff,” just as other wiki projects do.

The tension is interesting: Wikia promotes itself as a valuable alternative to Google largely because its search and ranking algorithms are open, so that users know more about why some sites are being selected or ranked more highly than others.

“I think it is unhealthy for the citizens of the world that so much of our information is controlled by such a small number of players, behind closed doors,” [Wales] said. “We really have no ability to understand and influence that process.”

But although the search and ranking algorithms may be public, whether searches are being manipulated through user-contributed content will not be so obvious. It is far from clear which approach is more dependable and “open”. Wikia’s success apparently will depend on its ad hoc and technical methods for “blocking, banning and deleting” manipulation.

Op-ed in Wall Street Journal advocates hybrid solution to spam

Three researchers published an op-ed in today’s Wall Street Journal (subscription only) suggesting that two practical methods to greatly reduce spam are now technically workable, but will not be implemented without cooperation on standards by the major email providers. They urge the providers to agree on a hybrid system:

To break this logjam, we advocate a hybrid system that would allow email users to choose their preferred email system. Those who want anonymity and no incremental cost for email can continue to send emails under the current system, without authentication and without sender bonds. Those who want the lowest costs and don’t care about anonymity (most legitimate businesses would likely fall into this category) can send email that is user authenticated, but not bonded. People who want anonymity but are willing to pay to demonstrate the value they place on the recipient’s attention can post a bond. Payment could be made anonymously via a clearinghouse, using the electronic equivalent of a tiny traveler’s check bundled with each message. Those with especially high-value messages can make them both authenticated and bonded.

The authors are Jonathan Koomey (Lawrence Berkeley National Labs), Marshall van Alstyne (Boston U) and Erik Brynjolfsson (MIT Sloan).
The ideas are not new; they are trying to create public pressure. The authentication system in play is DKIM, a standard approved by the IETF earlier this year. The sender bond method was detailed in a paper by Thede Loder, Rick Wash and van Alstyne. Loder has started a company offering the service (Boxbe); Wash is currently one of my Ph.D. students (though he did this research while working with van Alstyne while Marshall was my colleague at Michigan).
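The hybrid scheme in the quoted passage can be sketched as a recipient-side acceptance policy. This is a minimal illustration with made-up message fields and a made-up bond threshold, not any provider’s actual API or the authors’ implementation:

```python
# Sketch of the op-ed's hybrid email policy: a message may be
# authenticated (e.g., via DKIM), bonded (sender stakes money on the
# recipient's attention), both, or neither.
from dataclasses import dataclass

@dataclass
class Message:
    authenticated: bool  # sender identity verified (e.g., DKIM-signed)
    bond: float          # dollars staked; forfeited if flagged as spam

def classify(msg: Message, min_bond: float = 0.05) -> str:
    """Route a message under the four sender choices the op-ed describes."""
    if msg.authenticated or msg.bond >= min_bond:
        # Authenticated senders risk their identity; bonded senders
        # risk their money. Either signal is costly to fake at scale.
        return "inbox"
    # Legacy email: anonymous and unbonded, so it gets ordinary filtering.
    return "filter"

print(classify(Message(authenticated=False, bond=0.0)))   # "filter"
print(classify(Message(authenticated=False, bond=0.10)))  # "inbox"
print(classify(Message(authenticated=True, bond=0.0)))    # "inbox"
```

The design point is that both signals impose a real cost on bulk spammers (reusable identity or forfeitable cash) while costing a legitimate one-off sender almost nothing.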

Spam as security problem

Here is the blurb Rick Wash and I wrote for the USENIX paper (slightly edited for later re-use) about spam as a security problem ripe for ICD treatment. I’ve written a lot about spam elsewhere in this blog!

Spam (and its siblings spim, splog, spit, etc.) exhibits a classic hidden information problem. Before a message is read, the sender knows much more about its likely value to the recipient than does the recipient herself. The incentives of spammers encourage them to hide the relevant information from the recipient to get through the technological and human filters.

While commercial spam is not a traditional security problem, it is closely related due to the adversarial relationship between spammers and email users. Further, much spam carries security-threatening payloads: phishing and viruses are two examples. In the latter case, the email channel is just one more back door access to system resources, so spam can have more than a passing resemblance to hacking problems.

Spamming Web 2.0

The New York Times today ran a short note highlighting CNET’s story about commercial spamming of Digg.com and similar sites. Companies are being paid upwards of $15,000 to get a product placed on the front page of Digg, and most recently a top-30 Digger admitted that he entered into an agreement to help elevate a new business to the front page of Digg (and solicited the other top-30 Diggers to participate).
The world was pretty darned excited when it discovered email (for most people, in the early 1990s). Spam followed in a big way within a year or two. It’s clear to me that we’re on the same trajectory with user-contributed content sites on the Web. There is an ever-increasing need for incentive-centered designs to help keep the bad stuff out.