Thursday, January 10, 2008

Scraping South of the Border

Never really had much of a problem with scrapers from Mexico before, but today one came bouncing through Megared's proxy server:

200.52.167.3 [customer-CLN-167-3.megared.net.mx.] requested 11 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

200.52.167.8 [customer-CLN-167-8.megared.net.mx.] requested 156 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

200.52.167.4 [customer-CLN-167-4.megared.net.mx.] requested 31 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

200.52.167.9 [customer-CLN-167-9.megared.net.mx.] requested 36 pages as "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

They were just speeding through real fast, asking just for pages and nothing else: the typical scraper.
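
For anyone who wants to spot that pattern in their own logs, here's a rough sketch in Python. It's not what I actually run; the Hit record, the asset extension list and the thresholds are all made up for illustration, and you'd have to parse your own log format into those records first. The idea is just to flag any IP that pulled a bunch of pages in a hurry and never asked for a single image, stylesheet or script.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical list of extensions treated as page assets rather than pages.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


@dataclass
class Hit:
    ip: str
    path: str
    timestamp: float  # seconds since epoch


def is_asset(path: str) -> bool:
    # Strip any query string before checking the extension.
    return path.split("?", 1)[0].lower().endswith(ASSET_EXTENSIONS)


def find_scrapers(hits, min_pages=10, max_seconds_per_page=2.0):
    """Return IPs that fetched many pages quickly and never requested an asset."""
    pages = defaultdict(list)  # ip -> timestamps of page requests
    fetched_assets = set()     # IPs that requested at least one asset
    for hit in hits:
        if is_asset(hit.path):
            fetched_assets.add(hit.ip)
        else:
            pages[hit.ip].append(hit.timestamp)

    suspects = []
    for ip, times in pages.items():
        if ip in fetched_assets or len(times) < min_pages:
            continue
        # Average time per page across the whole visit (assumed threshold).
        elapsed = max(times) - min(times)
        if elapsed / len(times) <= max_seconds_per_page:
            suspects.append(ip)
    return sorted(suspects)
```

Both thresholds are guesses and would need tuning per site. A real browser sitting behind the same proxy will normally ask for images and CSS too, so the "no assets" test is what does most of the work here.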

Not much you can do about proxy servers and IP pools without punishing the innocent, except set it to challenge all future accesses, but that's a bit extreme for a single instance.

It's not like the crazy shit that comes from airtelbroadband.in, but that's a different blog post.

2 comments:

Anonymous said...

This seems more like a network of scrapers. It spikes during the summer months and also reappears during the Christmas holidays. It is not limited to Mexico either - I've seen it from Brazilian ISPs, Chinese IP ranges, Russian IP ranges and Eastern European IP ranges.

Anonymous said...

Anything coming from south of the Alamo is grouped into one category. I'm seeing many botnets and scraper networks using Central American and South American IPs.

When the log files show probing for known exploitable files, an IP from "south of the Alamo" is usually involved. IPs involved in probing are banned for about a day (a random number of hours between 18 and 27).
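
A randomized ban like that could be sketched roughly as follows. This is only an illustration, not the actual setup: the function names and the in-memory dict are hypothetical, and a real deployment would more likely store the expiry in a firewall rule or a database.

```python
import random
from datetime import datetime, timedelta

# Hypothetical in-memory ban table: ip -> time the ban expires.
bans = {}


def ban_ip(ip, now, rng=random):
    """Ban an IP for a random whole number of hours between 18 and 27."""
    hours = rng.randint(18, 27)  # randint includes both endpoints
    bans[ip] = now + timedelta(hours=hours)
    return bans[ip]


def is_banned(ip, now):
    """True while the IP's ban has not yet expired."""
    expiry = bans.get(ip)
    return expiry is not None and now < expiry
```

Randomizing the length presumably makes it harder for a bot to learn exactly when it can come back.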

Anything that "trips" gets 5 to 10 cloaked pages and is then hit with a Captcha. Humans will get another Captcha after a random number of pages.

.cn, .ru, most Slavic countries and Pacific Rim telcos are banned at the firewall.