Is there a whitelist of the RogerBot IP Addresses?

EricCholis

I'm all for letting Roger crawl my site, but it's not uncommon for malicious spiders to spoof the User-Agent string. Having a whitelist of Roger's IP addresses would be immensely useful!

chill986

Samantha (of the Moz team) suggested I have my client whitelist Rogerbot - so you are saying simply whitelist Rogerbot as a useragent? Is there any other information I need to provide?

cleanteam

Gotcha thanks for the response, Aaron.

AaronWheeler

Hey Kalen! Rogerbot is the crawler we use to gather data on websites for Moz Analytics and the Mozscape link index. Here's his info: http://moz.com/help/pro/what-is-rogerbot-.

I wish I could give you IP addresses, but they change all the time since we host Roger in the cloud. There's not even a reliable range of IPs to give you. You can totally whitelist the useragent rogerbot, but that's the only reliable information about the crawler you can go off of. I hope that helps but let me know if there's any other solution you can think of. Thank you!

cleanteam

Hi Aaron,

I'm not totally sure what RogerBot is, but I was also interested in a list of IPs to white list. We just completed a search crawl and are checking out the Crawl Diagnostics. It's hit some 503 errors b/c it's triggering our DoS filter.

Is there a way to get the IP addresses behind this crawl in order to white list them?

Thanks,
Kalen

AaronWheeler

Hey there Outside!

I totally understand your concerns, but unfortunately we don't have a static IP we can give you for Rogerbot. He's crawling from the cloud so his IP address changes all the time! As you know, you can allow him in Robots.txt but that's the only way to do it for now. We have a recent post about why this may be risky business: http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo

Hope that helps!

EricCholis

Personally, I've run across spiders that search for entry points and exploits in common CMS, e-commerce, and CRM web applications. For example, there was a recent Wordpress bug that could be exploited to serve malicious content (read: virus) to visiting users.

Spoofing the User-Agent string is elementary at best, and wouldn't fool any sys admin worth a salt. All you have to do is a WHOIS on the requested IP to help identify it's origin.

I'm a bit of a data geek, so I like to grep through log files to see things that won't show up in Analytics that require Javascript.

scanlin

Out of curiosity (and because I don't know), what is the advantage for a malicious spider to spoof the User-Agent string? I mean, I understand this hides its identity, but why does a spider need to hide its identity? And what can a malicious spider do that a browsing human can't do? I haven't taken any action to prevent robots from anything on my site. Should I?

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Is there a whitelist of the RogerBot IP Addresses?

Browse Questions

Explore more categories

Related Questions

Restrict rogerbot for few days

New URL, new physical address, New Name. 30 point drop in Domain Authority. Yikes.

Rogerbot does not catch all existing 4XX Errors

Rogerbot did not crawl my site ! What might be the problem?

Its been over a month, rogerbot hasn't crawled the entire website yet. Any ideas?

Rogerbot getting cheeky?

Rogerbot Ignoring Robots.txt?

Does anyone know of a crawler similar to SEOmoz's RogerBot?