Saturday, September 23, 2006

Search Engine Spammers Extraordinaire

OK, these idiots made the classic mistake of scraping one of MY pages, so they're about to get outed in a massive way. Unfortunately, in this case I didn't get an IP address, and my content was already missing from their site thanks to the slow crawl and index of MSN, but a little research proved this was a HUGE operation of mind-blowing proportions.

I got bored checking all the domains: some are hosted in the same place, some aren't, and there are too many to look at, but it's all spam. Perhaps it's the same person, or perhaps a bunch of idiots running some automatic website-generating tools.

The sites tend to come in 3 flavors: AdSense-monetized articles, AdSense-monetized scraper sites (scroll WAY down) and AdSense + Shareasale sites.

Just search for the phrase "When we had a difficult think about this project" in Google, Yahoo or MSN and you'll see a shitload of pages from these search engine spammers.

Also, try a search for the phrase "Foraging for the best file on" in Google, Yahoo and MSN and see more shitloads of pages.

These sites repeat all sorts of key phrases, and you can use them to bust more and more of these sites, like this "Everyones path is incomparable and everyone" one on Google or Yahoo.

And even more shit like "If you've worked with a portal" on Yahoo.

Someone noticed their terms were hijacked in these bullshit pages and blogged about their suspicions about what's going on.

Seriously though, I bet I could write a script to identify and locate all the bullshit spammers using their common phrases, as they're so easy to spot once you have a data sample like this to analyze.
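Here's roughly what such a script might look like. It's a minimal sketch, not a finished spam hunter: it only checks text you've already fetched against the phrases quoted above, and the names (SPAM_PHRASES, find_spam_phrases, looks_like_spam) are made up for illustration.

```python
# Hypothetical sketch: flag page text that contains known spammer boilerplate.
# The phrase list comes from the searches above; everything else is assumed.
SPAM_PHRASES = [
    "When we had a difficult think about this project",
    "Foraging for the best file on",
    "Everyones path is incomparable and everyone",
    "If you've worked with a portal",
]


def normalize(text):
    """Lowercase and collapse whitespace so line wraps don't hide a phrase."""
    return " ".join(text.lower().split())


def find_spam_phrases(page_text, phrases=SPAM_PHRASES):
    """Return the known phrases that show up in page_text, in list order."""
    haystack = normalize(page_text)
    return [p for p in phrases if normalize(p) in haystack]


def looks_like_spam(page_text, threshold=1):
    """True when at least `threshold` known phrases appear in the page."""
    return len(find_spam_phrases(page_text)) >= threshold


if __name__ == "__main__":
    sample = "Foraging for the best\n file on widgets. If you've worked with a portal you know."
    print(find_spam_phrases(sample))
```

Raising the threshold to 2 or more cuts down on false hits from a legit page that happens to quote one of the phrases, like this post does.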

Spam, spam, fucking spam, and not so smart fucking spammers.

Whitelist OPT-IN htaccess file

People are always asking me how to build an OPT-IN .htaccess file, which I advocate as opposed to the traditional blacklist methods.

The problem with OPT-IN is that it's VERY unforgiving, and you really need to check your visitor stats and make sure you're letting in all the crawlers that are sending you traffic.

Below is a bare-bones sample of how it works. Anything not in the list gets a 403 Forbidden error, so you'll probably need to add more items and refine this for your particular website.

Sample .htaccess file for Apache 2.0:

#allow just search engines we like, we're OPT-IN only

#a catch-all for Google
BrowserMatchNoCase Googlebot good_pass
BrowserMatchNoCase Mediapartners-Google good_pass

#a couple for Yahoo
BrowserMatchNoCase Slurp good_pass
BrowserMatchNoCase Yahoo-MMCrawler good_pass

#looks like all MSN starts with MSN or Sand
BrowserMatchNoCase ^msnbot good_pass
BrowserMatchNoCase SandCrawler good_pass

#don't forget ASK/Teoma
BrowserMatchNoCase Teoma good_pass
BrowserMatchNoCase Jeeves good_pass

#allow Firefox, MSIE, Opera etc., will punt Lynx, cell phones and PDAs, don't care
BrowserMatchNoCase ^Mozilla good_pass
BrowserMatchNoCase ^Opera good_pass

#Let just the good guys in, punt everyone else to the curb
#which includes blank user agents as well

#note: <Limit> only restricts the methods listed here, other methods are not covered
<Limit GET POST PUT HEAD>
order deny,allow
deny from all
allow from env=good_pass
</Limit>

Just save the above as a file named ".htaccess" in your httpdocs or root web folder in your hosting account, and all the crazy bots abusing your site will get bounced from now on.

Remember, anything not listed will no longer have access, so be careful and make sure everything your site needs to allow is in the list.
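Since OPT-IN is so unforgiving, it can help to dry-run the list against user agents pulled from your own stats before going live. The sketch below is only an approximation of the sample above: it assumes BrowserMatchNoCase behaves like a case-insensitive, unanchored regex search, and the names ALLOWED_PATTERNS and is_allowed are hypothetical.

```python
import re

# Same patterns as the sample .htaccess above; keep this list in sync by hand.
ALLOWED_PATTERNS = [
    r"Googlebot",
    r"Mediapartners-Google",
    r"Slurp",
    r"Yahoo-MMCrawler",
    r"^msnbot",
    r"SandCrawler",
    r"Teoma",
    r"Jeeves",
    r"^Mozilla",
    r"^Opera",
]

# Assumption: case-insensitive search, like BrowserMatchNoCase.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in ALLOWED_PATTERNS]


def is_allowed(user_agent):
    """True if the user agent would earn good_pass; blank agents never do."""
    if not user_agent:
        return False
    return any(rx.search(user_agent) for rx in _COMPILED)


if __name__ == "__main__":
    for ua in ["msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
               "Lynx/2.8.5rel.1 libwww-FM/2.14",
               ""]:
        print(repr(ua), "->", "allowed" if is_allowed(ua) else "403")
```

Run your real user agent strings through is_allowed and anything that comes back as 403 but sends you traffic needs its own line in the .htaccess.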

Enjoy.

Googlebot Validation

Google has finally completed a DNS project that will allow us to use a simple reverse and forward DNS check to verify it's really, truly, honestly Googlebot and not a cheap imitation, or Google crawling through a proxy, or anything else you can imagine.

I'm so sick of explaining why you might need this and what it solves that you'll just have to follow a few links and read the threads at these various places.
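For the impatient, the check can be sketched in a few lines. This is a rough illustration, not Google's code: it assumes genuine crawler hosts resolve under googlebot.com or google.com (confirm the suffixes against Google's own post), and verify_googlebot plus its injectable lookup parameters are hypothetical names so the logic can be tried without touching the network.

```python
import socket

# Assumption: hostnames of genuine Google crawlers end in one of these suffixes.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS, IP address to hostname.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the hostname must sit under a Google crawler domain.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward DNS, the hostname must map back to the same IP.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Calling verify_googlebot with just the visitor's IP from your logs does the real lookups. The forward step matters because anyone who controls their own reverse DNS can claim a googlebot.com hostname, but they can't make Google's DNS point that name back at their IP.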

Here's the official How To Verify Googlebot post on Google's blog.

Then you can check out what's been said about How To Verify Googlebot on Matt's blog.

Then a couple of threads on WMW about Verifying Googlebot that should answer any other questions on this topic.

Thanks again to Matt for getting this project finished!

Tuesday, September 19, 2006

How Important Are Plurals

Many people ignore plurals when they optimize their website and miss a lot of opportunity for additional search engine traffic.

Here are a few trend examples:

Take a look at plumbing, plumber and plumbers and you'll note that the plural plumbers is searched for just as often as the singular plumber.

How about teaching, teacher and teachers, where all 3 run very close and teachers appears to dominate the search trend by a thin margin.

Last but not least, something closer to home with blogger and bloggers, where blogger clearly stands out as the dominant term but bloggers gets enough searches to merit ranking for the plural.

So don't forget to rank for your keyword plurals or someone else will rank there instead of you and they will KICK YOUR S!

Request from India

Just when I thought it was going to be a boring day, I got a link-exchange spam from one of those wonderful Indian SEOs who wouldn't know how to promote a website to save his own life.

I'm actually shocked this email didn't include the usual threat that "you have 24 hours to confirm a reciprocal link before we remove yours from our site".

Boy, doesn't this shit look familiar:

Dear Webmaster
Greetings from India

Happened to visit your Webpages : [FILL IN BLANK OF SPAM RECIPIENT HERE] & liked it very much.
Would like to request you to have a look on our site :: [FILL IN BLANK OF SITE BEING SPAMMED HERE]

Hope you'll like this site. We are trying our best to spam the shit out of everyone in the name of India, You can help us by just adding our link on your wonderful website. And these exchanging link with good quality websites is beneficial for both the site to get a good ranking in search engines & that will help both of us in driving Traffic.

So We request you to add our link at your Website

Here is the Link Information of our Sites ::

URL : [LINK TO OFF TOPIC SHIT GOES HERE]
Link Text : We Spamma U Ass
Desc. : That's Right, This is Spam, its no more a dream!!

Just do let us know if this acceptable for you.
Hope to have quick & positive response.
Thanks in Advance

Best Regards
Sendjay Sumspam
Spamming-Our-Ass-Off.com

BTW, if you're the Indian fuckhead sending this shit, FUCK NO I WON'T LINK TO YOUR SITE you goddamn moron.

Just a lovely way to start the day.

Say it with spam.