Thursday, October 30, 2008

JadynAve Bot Wants Your Local Data

If you have a bunch of local data like I do, then you'd better protect it, because JadynAve's Local Business Search appears to be coming after your site with their JadynAveBot!

The bot didn't ask for robots.txt, and their robot page has no information whatsoever except an invitation to email them if you have any questions. Big whoop.

Here's the IP and user agent:

38.99.186.40
"Mozilla/5.0 (compatible; JadynAveBot; +http://www.jadynave.com/robot"

I wouldn't bother trying to add them in robots.txt since they didn't ask for robots.txt.

This is a job for .htaccess!

A little research revealed they have also crawled without the "bot" in their user agent so you'll just want to block anything with "jadynave" in it.
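
The matching rule itself is tiny. Here's a rough Python sketch of the logic only; the function name and the blocked list are my own illustrative assumptions, and on a real server you'd express the same case-insensitive substring test in .htaccess.

```python
# Hypothetical helper: names here are illustrative assumptions, not a real API.
BLOCKED_SUBSTRINGS = ("jadynave",)


def is_blocked_agent(user_agent: str) -> bool:
    """Return True if the user agent contains a blocked substring, ignoring case."""
    agent = user_agent.lower()
    return any(token in agent for token in BLOCKED_SUBSTRINGS)
```

Matching on the lowercase substring catches both the "JadynAveBot" agent and the variant without "bot" in it.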

Tuesday, October 28, 2008

Suspected Copyright Offenses

Something amusing hit my site from .t-dialin.net, which appears to be .t-online.de, or the German version of T-Mobile.

I see the following IP and user agent:

84.153.98.95 "Verdacht Vergehen nach UrhG"

Which Google translates into:

Suspected offenses under the Copyright Act

Well isn't that just the cutest little user agent to get caught in a bot blocker?

Now I've had my chuckle for the evening, back to work...

Why Does Copyscape/GoogleAlert Hide?

I've never really played around with Copyscape/GoogleAlert much, but I noticed it tries to completely hide its presence when accessing a server, which isn't cool.

Not that I'm a fan of plagiarism, as my copy of the DMCA is almost worn out from use, but I'm even less of a fan of sneaky web crawlers that pretend to be shit they aren't.

The IP that Copyscape uses:

212.100.254.105 -> www.googlealert.com

The Copyscape user agent:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

This is located in a Rackspace range, so if you're already blocking Rackspace then you probably won't be bothered by Copyscape in the first place:

inetnum: 212.100.254.64 - 212.100.254.127
descr: Rackspace Managed Hosting

Of course you might not want to block this if you actually use Copyscape, as it will become quite useless.
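
If you want to test an IP against that inetnum range, here's a minimal Python sketch using the standard ipaddress module. The variable and function names are my own assumptions for illustration; only the range endpoints come from the whois output.

```python
import ipaddress

# Endpoints taken from the inetnum line in the whois output.
first = ipaddress.ip_address("212.100.254.64")
last = ipaddress.ip_address("212.100.254.127")

# Convert the start/end pair into CIDR blocks (hypothetical variable name).
RACKSPACE_BLOCKS = list(ipaddress.summarize_address_range(first, last))


def in_blocked_range(ip: str) -> bool:
    """Return True if the IP falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RACKSPACE_BLOCKS)
```

The range works out to a single /26, and the Copyscape IP 212.100.254.105 lands inside it.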

Monday, October 27, 2008

Viewzi's Meta Search Engine Taking Screenshots Without Permission

Here we go again with yet another visual search engine, this one called Viewzi, taking a bunch of screen shots without checking robots.txt for permission.

In this case it's a meta search engine, and technically the search engines Viewzi culls from have been given permission to crawl, but Viewzi itself was never given access permission.

Here's the Viewzi user agent:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b4pre) Gecko/2008022910 Viewzi/0.1

They appear to have just replaced the word Firefox with their user agent name Viewzi instead of adding Viewzi to the agent, which is kind of crappy: they don't even give Firefox attribution for the code being used to make the screen shots.

Viewzi currently crawls from the "compute-1.amazonaws.com" range of IPs so if you're already blocking amazonaws.com then you've blocked Viewzi already.
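
For anyone wondering what "blocking amazonaws.com" boils down to, here's a hedged Python sketch of a hostname suffix check plus a reverse DNS lookup. The function names and the suffix list are my own assumptions, and the lookup simply lets the visitor through if the IP has no reverse DNS entry.

```python
import socket

# Hypothetical block list: hostname suffixes to refuse.
BLOCKED_SUFFIXES = ("amazonaws.com",)


def host_is_blocked(hostname: str) -> bool:
    """Return True if the hostname is a blocked domain or a subdomain of one."""
    host = hostname.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in BLOCKED_SUFFIXES)


def ip_is_blocked(ip: str) -> bool:
    """Reverse-resolve the IP and apply the suffix check; unresolvable IPs pass."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    return host_is_blocked(hostname)
```

Checking for the leading dot keeps an unrelated domain that merely ends in the same letters from being blocked by accident.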

Sorry, but you won't get any Viewzi of my sites until you learn to play nice.