Thursday, May 26, 2011

Anyone still listening?

I haven't blogged much this year, for a variety of reasons. Sorry about that.

Got a lot going on. Anyone even still listening?

Probably not, don't blame ya.

Tuesday, May 24, 2011

Whitelisting, Not Blacklisting, to Stop Bots!

Really getting sick of repeating myself, because people just don't seem to get that, when it comes to blocking bots, blacklisting doesn't fucking work. Blacklisting means wasting time chasing bots through access logs and maintaining huge-ass .htaccess files that slow Apache down and hurt server performance, and it's easily bypassed by changing a single character in the user agent name.

Whitelisting, on the other hand, only tells the server what can pass; everything else bounces. Whitelists are usually short: Googlebot, Slurp, Bingbot, valid browsers, and nothing else. That's a fast list to process, and it doesn't slow Apache down whatsoever.
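
To make the idea concrete, here's a rough Python sketch of the whitelist check. It's an illustration of the logic only, not my actual server config: the names `ALLOWED_CRAWLERS`, `BROWSER_PATTERN`, and `is_allowed` are made up for the example, and the browser pattern is a deliberately loose assumption about what a "valid browser" user agent looks like.

```python
import re

# Assumed allow-list: the crawler names mentioned above, matched case-insensitively.
ALLOWED_CRAWLERS = ("googlebot", "slurp", "bingbot")

# Assumed (and deliberately loose) pattern for mainstream browser user agents.
BROWSER_PATTERN = re.compile(r"^(Mozilla|Opera)/\d+\.\d+ \(")


def is_allowed(user_agent):
    """Return True only if the user agent is on the whitelist; everything else bounces."""
    if not user_agent:
        return False  # a blank user agent is never on the list
    lowered = user_agent.lower()
    if any(name in lowered for name in ALLOWED_CRAWLERS):
        return True
    return BROWSER_PATTERN.match(user_agent) is not None


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
               "libwww-perl/5.805",
               ""):
        print(repr(ua), "->", "pass" if is_allowed(ua) else "bounce")
```

Notice the list never grows when a new bot shows up. Anything that doesn't match simply bounces, so there's nothing to chase in the logs.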

Then install a script to monitor for things that access robots.txt, spider trap pages, natural spider traps like your legal and privacy pages, plus speedy or greedy accesses, and you've pretty much solved your scraper problems.
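
A monitoring script along those lines could look something like this sketch. Everything in it is an assumption for illustration: the trap paths, the `MAX_HITS` and `WINDOW_SECONDS` thresholds, and the `AccessMonitor` class are hypothetical, and it only flags clients for a closer look rather than blocking them outright, since real people do read legal and privacy pages.

```python
from collections import defaultdict, deque

# Hypothetical paths to watch: robots.txt, a spider trap page, and natural traps.
TRAP_PATHS = {"/robots.txt", "/trap.html", "/legal.html", "/privacy.html"}
MAX_HITS = 20          # assumed threshold for "speedy or greedy" access
WINDOW_SECONDS = 10.0  # assumed sliding window length


class AccessMonitor:
    def __init__(self):
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.flagged = set()            # ips that deserve a closer look

    def record(self, ip, path, timestamp, is_known_crawler=False):
        """Record one request; return True if this client is now flagged."""
        if is_known_crawler:
            return False  # whitelisted crawlers are expected to read robots.txt
        if path in TRAP_PATHS:
            self.flagged.add(ip)
        window = self.hits[ip]
        window.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_HITS:
            self.flagged.add(ip)
        return ip in self.flagged


if __name__ == "__main__":
    monitor = AccessMonitor()
    monitor.record("203.0.113.7", "/index.html", 0.0)
    print(monitor.record("203.0.113.7", "/robots.txt", 1.0))
```

In practice you'd feed it from your access log or a request hook, and decide for yourself what happens to a flagged address.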

But for fuck's sake, use your goddamn brain and WHITELIST or you're just wasting your fucking time and inviting scrapers, not blocking them.