Saturday, February 11, 2006

Scrape Me Up Scotty

Finally I've had an out-of-this-world scraping experience: someone tried to offload about 500 pages via a satellite. Thanks to the fine people over at DirecPC for hosting this little bandit, my scraper-stopping efforts have now entered the realm of aerospace.

I can see the headlines now:
Earthly Bot Stopper Blocks E.T. Scrapers

It's safe to hazard a guess Ray Bradbury never, in his wildest imagination, thought people would use satellites to steal websites.

Friday, February 10, 2006

Scraper Sites are GOOD?

Usually I have a good deal of respect for Martinibuster as he's a cool dude and has some good insights into a lot of SEO and webmastering topics. However, when I read his article "Scraper Sites are Good for You - Surrender Your Content" my first thought was to print it on toilet paper so I could give it the proper respect it deserved.

Come on Martini, you can't be serious about letting people waste your bandwidth, steal your content, and spam the SERPs with your own junk just to give you links?

Martini claims "Scrapers: There is Nothing You Can do About it" which is totally wrong! I'm stopping those bots, AlexK's script snares them, and BotBuster and Bandwidth Protector claim they will also stop them. I can't comment on or endorse the latter products as I've never used them, but they do exist, as do some others, so webmasters not stopping scrapers are just being lazy or cheap at this point.

Then Martini makes me spit soda across my keyboard with "Surrender to the Scrapers... It is Better for You". What a load of crap, my friend. Just ask Aaron Pratt of SEOBUZZBOX how good surrendering to the scrapers is: he's being lumped into supplemental results because the scraper aggregators are being indexed first and Google thinks Aaron is the duplicate content, which is just wrong.

So Martini says "scrapers do more good to you than harm" which gets the bullet list:

  • Scrapers can put your content in supplemental results
  • Scrapers can rank above you in the SERPs and get the first shot at AdSense and affiliate income using your content
  • High speed scrapes of 100s of pages/second are like DDoS attacks and knock servers offline until they stop scraping, keeping visitors from clicking your ads
  • When servers can't respond due to scrape attacks, Google, Yahoo and MSN get timeouts on pages and SERPs drop
  • Rampant scraping can run up your bandwidth charges and you pay for their excess
Nope, no harm no foul, nothing wrong with scraping.

What I can state with certainty is that since I started blocking scrapers my SERPs and REVENUE are both up substantially, and that's about the only major change I've made to my site recently.

Try reading one of Martini's other articles that made sense instead. He's really a nice guy, just slightly misguided on the topic of scrapers is all.

P.S. Doesn't scraper and bot stopping sound like a great session topic for PubCon Vegas this year?

Thursday, February 09, 2006

Referer Spammer Revenge!

Today I caught some referer spammer bombarding my web server by hitting the same page over and over and over with only the referer changing.

To put it mildly, this pissed me off.

When I started looking up each domain I noticed something fascinating in that they all had the same AdSense account and were all registered at GoDaddy.

Recent topics on ThreadWatch about GoDaddy locking abusers' domains provided a true inspiration today. Instead of wasting my time putting this asshole in my banned list of IPs and domains to keep him out, as would be my normal routine, I reported him to both AdSense and GoDaddy abuse and will be waiting and watching to see if either of them takes action.

If GoDaddy shuts his entire array of websites down, this will be the best defensive action to take against referer spamming yet. I'll post the results, if any, when I see the domains go offline.

BTW, I also added referer spamming detection to my bot blocker today so I'll be snagging more of these idiots on a regular basis if this is a huge problem. I'm considering making the bot blocker automatically perform a whois on the domains when this is detected and send automatic abuse letters to the proper parties.

This could be fun ;)
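
For anyone wanting to try something similar, the pattern described above (one client hitting the same page over and over with only the referer changing) can be roughly sketched in Python. This is not the actual bot blocker code; the class name `RefererSpamDetector`, the thresholds, and the example addresses are all made up for illustration.

```python
from collections import defaultdict, deque
import time


class RefererSpamDetector:
    """Flags a client that requests one path with many different referers.

    Hypothetical sketch: the thresholds are guesses and would need tuning.
    """

    def __init__(self, max_referers=5, window_seconds=60.0):
        self.max_referers = max_referers
        self.window_seconds = window_seconds
        # (ip, path) -> deque of (timestamp, referer), oldest first
        self._hits = defaultdict(deque)

    def is_spam(self, ip, path, referer, now=None):
        now = time.time() if now is None else now
        hits = self._hits[(ip, path)]
        hits.append((now, referer))
        # Drop hits that have aged out of the time window.
        while hits and now - hits[0][0] > self.window_seconds:
            hits.popleft()
        # Count distinct non-empty referers seen for this client and page.
        distinct = {ref for _, ref in hits if ref}
        return len(distinct) > self.max_referers


detector = RefererSpamDetector(max_referers=3, window_seconds=60)
for i in range(5):
    flagged = detector.is_spam("192.0.2.1", "/page.html",
                               f"http://spam{i}.example/", now=float(i))
    print(i, flagged)
```

A normal visitor who reloads a page keeps sending the same referer, so only the count of distinct referers per client and page is compared against the limit.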

Wednesday, February 08, 2006

Polish Robot You Can't Pronounce

Only a Polish robot would be looking for Polish websites on my server in Texas - sigh.

Anyway, we found Szukacz trying to snoop around but alas, it slammed into the great wall protecting my site.

Looks like it supports robots.txt according to the web page but who knows.

Burf Barf Puke

Someone in jolly old England unleashed Norbert the Spider upon my site, which appears to be sent on behalf of the fine people at Burf who claim "BURF - Alternative Search Engine and Entertainment Portal"

Alternative to what, finding what I'm actually looking for?

Then I decided to click on the "Your Ad Here" link just to see what one gets for the money to advertise with Burf and its placement on a truckload of search sites I've never even heard about.

Well, it's cheap advertising but then again my URL would probably get about as much exposure putting it on the bottom of my shoe.

Yacy my Assy

Filed under "Who Needs Another Crappy Search Engine" we find something called Yacy that claims to be a P2P distributed search engine, whatever the fuck that means. I'm translating that back into English as a free-for-all scrapefest bandwidth waster.

They claim it's "Easy to install!"

I claim it's "Easy to block!"

Toodles.

Monday, February 06, 2006

Anti-Social Bookmarking

Well, here comes the new leech of the week: Susie came crawling and went head first into an error page. Sych2It claims that "Dead and out of date links are automatically reported" but I'm not sure how they would know, as they got a slap in the face instead of a web page.

Fine German Engineering? HA!

The AnonyMouse proxy slammed head first into my bot blocker.

Come on people, if you're gonna waste your fucking time writing an anonymous proxy server the least you can do is attempt to fake being a browser instead of making your user agent string your domain name.

Give me a goddamn break.
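
To show how little effort it takes to catch this kind of thing, here's a rough Python sketch of a user agent check that rejects strings that are nothing but a domain name or URL. It is only an illustration, not the real bot blocker: the function name `looks_like_browser`, the token list, and the `proxy.example` strings are assumptions, and a serious filter would need far more than this.

```python
import re

# Prefixes most browsers send at the start of their user agent string.
# (Assumed short list for illustration; real filters use much more.)
BROWSER_TOKENS = ("mozilla/", "opera/")

# A bare hostname or URL, e.g. "proxy.example" or "http://proxy.example/".
DOMAIN_ONLY = re.compile(
    r"^(https?://)?([a-z0-9-]+\.)+[a-z]{2,}(/\S*)?$", re.IGNORECASE
)


def looks_like_browser(user_agent):
    """Return True only if the user agent at least pretends to be a browser."""
    ua = (user_agent or "").strip()
    if not ua:
        return False  # missing or blank user agent
    if DOMAIN_ONLY.match(ua):
        return False  # the whole string is just a domain name or URL
    return ua.lower().startswith(BROWSER_TOKENS)


print(looks_like_browser("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
print(looks_like_browser("http://proxy.example/"))
```

Anything that fakes a browser string sails right past a check like this, which is exactly why it's only a first line of defense.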