Wednesday, January 19, 2005


Wouldn’t It Be Nice?


…if we were older, then— oops

Anyway, Google has led an initiative to take the steam out of various forms of online link pollution / abuse by proposing (and implementing, with the participation of a lot of vendors) an extension to the venerable anchor tag: rel="nofollow". I hope it works. Fast uptake and wide implementation of this really promises to drain the monetary incentive from blogspam / refer(r)er spam / trackback spam / guestbook spam / wiki spam. If the people doing this realize that their links won’t be spidered, hopefully they just won’t bother any more.
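For anyone rolling their own, here is a rough idea of what the server side of this might look like: a minimal Python sketch that re-emits a visitor's comment HTML with rel="nofollow" forced onto every link. The names NofollowRewriter and add_nofollow are made up for illustration, this is not how Blosxom or any particular blog tool does it, and it assumes the comment has already been through whatever tag whitelisting you normally apply.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Hypothetical helper: re-emit HTML with rel="nofollow" on every <a>."""

    def __init__(self):
        # Leave entities alone so they can be passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _open_tag(self, tag, attrs, closer=">"):
        if tag == "a":
            # Drop any rel value the commenter supplied, then add our own.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        return f"<{tag}{rendered}{closer}"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs, closer=" />"))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html):
    """Return comment_html with every link marked rel="nofollow"."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


print(add_nofollow('<a href="http://example.com/">hi</a>'))
# prints: <a href="http://example.com/" rel="nofollow">hi</a>
```

Note that this sketch silently drops HTML comments and declarations, which is arguably what you want for visitor-supplied text anyway. It only marks the links; it does nothing to stop the spam from being posted in the first place.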

The problem, however, as Mark Pilgrim stated a while back, is that these jokers don’t read weblogs, they just write to them. The time/financial economics of spamming are such that it’s cheaper for a spammer to blindly spew on thousands of sites without error checking than it is for them to spend time individually checking their work to make sure that their scripts are working as intended.

My own weblog is a perfect example. Over the life of this site, I’ve basically had 3 spammers who have accounted for upwards of 99% of all the spam attempts. The earliest was a guy I refer to as “Unca Philtie”, since his primary method was to produce hundreds of fake refer(r)erals from a set of shell Blogspot blogs which linked to various Paris “Horsey” Hilton pr0nsites. For months after I removed the refer(r)er display from this site, he continued to bombard me with several hundred requests a day. It was cheaper for him to keep bombarding me with requests than it was for him to check his logs.

The second was the “Greets from me” guy, a pr0n comment spammer, so named because he actually signs his spams that way (still active). Of the three, he is certainly the cleverest, as he’s occasionally made adjustments to his scripts to adapt to my countermeasures. He only has access to a limited number of zombie hosts, so I’ve largely been able to keep him neutralized at the firewall. I usually get a flood from him about once a week or so, when he picks up a few more proxies.

The third is ol’ Joe Incest, so named because all his incoming pr0n spams mention incest. They also mention a veritable cornucopia of other things, many of which I am certain are still impossible until we as a species evolve a few more protuberances and/or orifices, but I digress. I trust that the most straightforward way to defeat a spammer who always spams the same topics is self-evident. His script also has a pretty glaring bug (that I’m not going to help him fix by going into detail) that keeps any of his spams from ever even showing up here at all. Indeed, I expect that this bug would bite him on any Blosxom blog, so he must not be paying attention, which reinforces the point I made earlier on…

Anyway, it would be nice if this new initiative shows results, but we should certainly be prepared for the possibility that these assholes may just leave their buggy, braindead, and above all fast-cheap-nasty scripts running forever. After all, how often do you bother to clean up your crontabs?


:: Dave Walker (EST/EDT) [+]

:: [/tech/computers/internet]

:: Comments (8)

Comments:

Title:

Date: 1/19/2005 13:51:36

Response:
I don't think anyone is claiming it will remove 100% of comment spam. You and I remove the incentive for comment spammers by cleaning up our sites. Unfortunately there are a bunch of people who aren't paying as close attention to their comments as we do. Having rel="nofollow" in the default templates for popular tools will remove the incentive for spamming the sites of people who aren't paying attention. Right now getting PageRank is the default for a comment spammer. If this is widely adopted, getting PageRank will be the exception and there will be less incentive for new comment spammers to spend their time setting up new cron jobs. Eventually the cost of the time spent setting up spamming tools will outweigh the negligible benefit of getting PageRank.

Title: Adoption

Date: 1/19/2005 15:11:14

Response:
Dave, I think the problem falls back to adoption. If anything less than all blog hosters adopt this, then it leaves an open avenue and everybody pays. I think it's a matter of the community remembering that if we don't persist, then we don't win. In one month, we have to look at all the blog hosters and say, hey you, why are you not doing this?

d.w. wrote:

Title:

Date: 1/19/2005 15:26:48

Response:

Randy -- I agree. I think the key thing is that all of the biggest hosted services (Blogger, LiveJournal, Typepad, MSN) are supporting it out of the box. That means a huge pool of sites will get this implementation "for free". Self-hosters tend to be aware of the impact of blogspam already, so a large chunk will quickly implement / have already implemented it.


ssp wrote:

Title: Two things...

Date: 1/20/2005 08:10:14

Response:
First: This may look like a nice approach but I feel that Google is just trying to cheat here. The beauty of Google used to be that it could look at arbitrary sites and make sense of them in terms of relevance for search results. Most notably, it could deal with web sites as they are. Now it tries to change web sites just because the web has evolved and Google's algorithms aren't good enough anymore. Google should improve their programming rather than waste people's time with this.

Second: I had a lot of that incest pron stuff as well on my site. I am still wondering what the point is. Isn't the thing that makes incest bad, or 'interesting', the fact that the people are actually relatives? I fail to see how this translates to films... Well, probably nothing I really need to worry about.

pc4media wrote:

Title: Technorati Tag Spam

Date: 1/20/2005 11:46:23

Response:
The following is a collection of random thoughts about NO FOLLOW and Technorati Tags.

ssp wrote:

Title: Me, again...

Date: 1/20/2005 13:30:45

Response:
Another thing I fail to understand: Aren't the spam links we get for poker or porn actually relevant for those topics? People looking for those terms on Google may actually _want_ to visit just the sites placing them. So Google certainly shouldn't ignore them.

And yet another one: Why should _I_ care if Google delivers sub-standard results for such search terms?

P.S. Funny new form you've got here, with the text below. Perhaps remove the borders on this table and make the comment field slightly higher?

d.w. wrote:

Title:

Date: 1/21/2005 09:07:12

Response:

Sven -- if you were actually in the market for some online Texas hold 'em or some super-special Uncle ****ing porn, wouldn't you rather your search turn up links based on what other cousin-banging card players are honestly linking to, rather than whichever site hired the most spammers?

I really don't care what Google returns for those searches, true, but it's very much in my interest to discourage the links from showing up on my site, if only on purely aesthetic grounds, not to mention the added server load (considerable and negative, on dynamically generated blogs like Blosxom and Wordpress) generated by the robots continually bombarding me with bogus posts.


Title:

Date: 8/5/2005 02:30:10

Response:
Actually, link spam is a huge problem. You may think it reduces relevance, but that is the exact reason for rel="nofollow". Google wants the most relevant websites at the top of search results, not some spamming website.




Beer & Pretzels -- Breakfast of Champions.