Wednesday, July 11, 2007

Are Domain Parks Playing Unfairly in Google?

John Andrews has been writing about the domainers becoming publishers:

The next wave of the competitive internet has arrived, and it’s driven by the Domainers. No, not parked pages, and no, not typo squatters. Domainers as publishers.
After reading the post I was thinking "So what? They'll still have to fight for SE traffic just like everyone else, except for the added advantage of premium domain names, which get type-in traffic and maybe rank a little better."

Well, I was sorely mistaken in thinking it would be anywhere close to a level playing field, as the domainers are using their domain park networks to generate many thousands of backlinks in Google and Yahoo.

My initial investigation of these backlinks showed different links on the live sites I visited than in the Google or Yahoo cache, which means they might be cloaking. The cached pages always contained specific links to their publisher sites on the parked pages when the search engines crawled, but it'll be hard to prove it wasn't coincidence unless this situation persists over time.
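The spot check above boils down to diffing the set of links a normal visitor sees against the set the search engine cached. Here's a minimal sketch of that comparison; the HTML snippets and URLs are made up for illustration, and the crude regex is only good enough for a quick manual check, not a real HTML parser:

```python
import re

def extract_links(html):
    """Pull href targets out of a chunk of HTML (crude, but fine for a spot check)."""
    return set(re.findall(r'href=["\']([^"\']+)["\']', html, re.IGNORECASE))

def cloaking_suspects(live_html, cached_html):
    """Links present in the engine's cached copy but missing from the live page.

    A non-empty result doesn't prove cloaking -- pages change between
    crawls -- but links that only ever show up in the cache are worth watching.
    """
    return extract_links(cached_html) - extract_links(live_html)

# Hypothetical snippets standing in for a live parked page and its cached copy:
live = '<a href="http://example.com/ads">sponsored</a>'
cached = ('<a href="http://example.com/ads">sponsored</a>'
          '<a href="http://publisher.example.com/">publisher site</a>')

print(cloaking_suspects(live, cached))
```

Run this against snapshots taken over several weeks and coincidence gets easier to rule out.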

The real question is why do the search engines index domain park sites in the first place?

The lame answer you'll get is "in case they turn into an actual website".

OK, crawl the sites, fine, but why should those parked pages show up in the search results or be allowed to influence PageRank before they become an actual site of value?

We all know the an$wer to that que$tion a$ well.

Proxy Hijacking Humor

Instead of all the serious posts about Google Proxy Hijacking it's time for a little bit of humor, very little, my apologies in advance.

Riddle:

Q: What do you call thousands of PhDs who can't stop simple proxy hijacking of your website?

A: Google!

Knock Knock Joke:

a: KNOCK KNOCK!

b: Who's there?

a: Proxy!

b: Proxy who?

a: Proxy who Google crawls through to hijack your site!

Brain Teaser:

What does the following URL represent in Google SERPs?

http://someproxysite.com/nph-page.pl/000000A/http/www.airplane.com

Answer: If you said "Airplane Hijacking" you are correct!

And now, a sad light bulb joke:

Q: How many proxy sites does it take to screw in a light bulb?

A: None. Proxy sites get Google to hijack a light bulb that's already screwed in.

More airplane humor:

Q: What's the difference between a website and a 747?

A: Proxy sites can't get Google to hijack a 747!

Last but not least...

Q: What do you call a good proxy site?

A: Offline.

Ok, you can groan, boo and hiss now.

Sunday, July 08, 2007

Dynamic Robots.txt is NOT Cloaking!

If I read just one more post that claims using dynamic robots.txt files is a form of CLOAKING it might be enough to drive me so far over the edge that it would make "going postal" look pale by comparison.

For the last time, I'm going to explain why it's NOT CLOAKING to the mental midgets that keep clinging to this belief so they will stop this idiotic chant once and for all.

Cloaking is a deceptive practice used to trick visitors into clicking on links in the search engine and then showing the visitor something else altogether, a bait and switch. Technically speaking, cloaking is a process where you show specific page content to a search engine that crawls and indexes your site, and show different content to people who visit your site via those search results from that search engine.

Robots.txt files are never indexed in a search engine, therefore they will never appear in the search results for that search engine, therefore a human will never see robots.txt in the search engine, click on it, and see a different result on your website.

See? NO FUCKING CLOAKING INVOLVED!

Since the robots.txt file is only for robots, and humans shouldn't be looking at your robots.txt file in the first place, showing the human "Disallow: /" is perfectly valid even though you may show an actual robot other things, as the human isn't allowed to crawl anyway.
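The idea above can be sketched in a few lines: serve the real crawl rules to recognized robots and a blanket "Disallow: /" to everything else. This is a minimal illustration, not a production handler; the bot names, paths, and substring matching are assumptions for the sketch (real user-agent verification should also check reverse DNS, since user agents are trivially forged):

```python
TRUSTED_BOTS = ("googlebot", "slurp", "msnbot")  # hypothetical whitelist

# Illustrative rules only -- the paths are made up for this sketch.
REAL_RULES = "User-agent: *\nDisallow: /private/\nDisallow: /cgi-bin/\n"
LOCKDOWN = "User-agent: *\nDisallow: /\n"

def robots_txt_for(user_agent):
    """Serve the real crawl rules only to recognized robots; browsers,
    scrapers, and everything else get a blanket Disallow: / because
    they were never supposed to be crawling in the first place."""
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in TRUSTED_BOTS):
        return REAL_RULES
    return LOCKDOWN

print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(robots_txt_for("Mozilla/5.0 (Windows NT 5.1)"))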

Let's face it, some of the stuff in our robots.txt file might be information we don't want people looking at or hacking around as it's just that: PRIVATE.

Additionally, robots.txt tells all of the scrapers and various bad bots which user agents are allowed, so if you're permitting some less-than-secure bot to crawl your site, the scrapers can spoof that user agent to gain unfettered crawl access.

Dynamic robots.txt is ultimately about security, not cloaking. Nosy people or unauthorized bots that request robots.txt are sometimes instantly flagged as denied and blocked from further site access, so keep your nose out and you won't have any problems.
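That flag-and-block tripwire can be sketched like so. Everything here is illustrative: the class name, the whitelist, and the in-memory set are assumptions for the sketch (a real deployment would verify bots properly and expire entries rather than block forever):

```python
class RobotsTripwire:
    """Treat a robots.txt fetch from an unrecognized user agent as a tell:
    record the client and deny its subsequent requests."""

    TRUSTED_BOTS = ("googlebot", "slurp", "msnbot")  # hypothetical whitelist

    def __init__(self):
        self.blocked = set()

    def handle(self, ip, path, user_agent):
        """Return True if the request should be served, False if denied."""
        if ip in self.blocked:
            return False
        ua = (user_agent or "").lower()
        if path == "/robots.txt" and not any(b in ua for b in self.TRUSTED_BOTS):
            self.blocked.add(ip)  # nosy visitor: flag now, deny from here on
            return False
        return True

tw = RobotsTripwire()
print(tw.handle("203.0.113.9", "/robots.txt", "Googlebot/2.1"))   # recognized bot
print(tw.handle("198.51.100.7", "/robots.txt", "Mozilla/5.0"))    # nosy visitor
```

Once flagged, the nosy visitor's later requests for ordinary pages are refused too, which is exactly the "blocked from further site access" behavior described above.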

If you still think it's cloaking, consider becoming a temple priest for the goddess Hathor as a career in logical endeavors will probably be too elusive.