Why doesn't Moz crawler follow robots.txt?
-
It is crawling the entire site, and there is stuff we do not want it to. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, rel=canonical doesn't tell crawlers which pages not to crawl, only which ones not to index.
-
It has been used correctly. The site is a Magento site and they have it built in. There are a lot of filters for products so it uses rel=canonical to tell Google which to index.
-
rel=canonical is not really a robots instruction. It is there to help with duplicate content: where you have the same or similar pages, you're telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
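To make the distinction concrete, here is a minimal Python sketch of how a crawler might read a canonical tag. The page markup and the example.com URL are made up for illustration, and this is not how Rogerbot or Google actually implement it. The point is that the tag lives inside the page, so a crawler has to fetch the page before it can see the hint:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values; attribute values may be None
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html):
    """Return the first canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical filtered product page, for illustration only
sample_page = """
<html><head>
<link rel="canonical" href="https://www.example.com/shoes">
</head><body>Shoes, filtered by colour</body></html>
"""

print(find_canonical(sample_page))
```

So a filtered URL that carries a canonical tag will still show up in a crawl; only a robots.txt rule keeps the crawler from requesting it in the first place.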
-
Hi There,
Rel=canonical tags tell robots which page, out of many, is the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://mza.bundledseo.com/learn/seo/canonicalization
I feel you have not used it correctly. Check the above article and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
Have to agree with the above: Rogerbot does listen to the robots.txt file, unlike Bing. While they are getting better, Bing ignores the robots.txt file frequently.
I've analysed quite a few server logs over the years and Roger has always listened to the file; it's usually a mistake in the robots file.
There is an option to test your robots.txt file in Google Search Console (GSC). While this tests whether Google will crawl the page, Roger usually follows the same instructions as Google.
However, if you are still pretty certain that Roger is ignoring robots.txt, please DM me your server logs and your website and I will take a look and analyse them for you (free of course).
Thanks
Andy
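If you want a first look at your own logs, a rough Python sketch along these lines may help. It assumes the access log is in Combined Log Format and that the crawler's user-agent string contains "rogerbot"; the sample log line, IP address, and user-agent text are invented for illustration:

```python
import re

# Pulls the request path and user agent out of a Combined Log Format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def rogerbot_hits(log_lines, disallowed_prefixes):
    """Return paths requested by rogerbot that fall under a disallowed prefix."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a request line in the assumed format
        if "rogerbot" not in match.group("agent").lower():
            continue
        path = match.group("path")
        if path.startswith(tuple(disallowed_prefixes)):
            hits.append(path)
    return hits


# Made-up log lines, for illustration only
sample_log = [
    '203.0.113.5 - - [05/Oct/2015:12:20:48 +0000] "GET /tmp/cache.html HTTP/1.1" 200 512 "-" "rogerbot/1.0 (hypothetical)"',
    '203.0.113.5 - - [05/Oct/2015:12:20:49 +0000] "GET /products/ HTTP/1.1" 200 2048 "-" "rogerbot/1.0 (hypothetical)"',
    '198.51.100.7 - - [05/Oct/2015:12:20:50 +0000] "GET /tmp/cache.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(rogerbot_hits(sample_log, ["/cgi-bin/", "/tmp/", "/junk/"]))
```

An empty result suggests the crawler is staying out of the disallowed directories; any paths listed are worth comparing against the exact rules in your robots.txt.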
-
All major search engines, including Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard "robots exclusion protocol" for communicating with web crawlers and web robots.
If you wish to exclude specific content from all search engines, you can use the following sample code as a reference to block specific directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block only Moz's Rogerbot from crawling specific sections of your website, you can use the following reference code:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
Regards,
Vijay
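As a quick sanity check, the Rogerbot-only rules above can be run through Python's standard urllib.robotparser module. This is only a sketch: the example.com URLs are placeholders, and the standard-library parser is a stand-in for whatever matching Rogerbot does internally, which may differ in edge cases:

```python
from urllib.robotparser import RobotFileParser

# The Rogerbot-only rules from the post above
ROBOTS_TXT = """\
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs, for illustration only
checks = [
    ("rogerbot", "https://www.example.com/tmp/cache.html"),
    ("rogerbot", "https://www.example.com/products/"),
    ("Googlebot", "https://www.example.com/tmp/cache.html"),
]

for agent, url in checks:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent:10} {url} -> {verdict}")
```

With these rules, only the rogerbot request under /tmp/ comes back blocked. Other crawlers are unaffected because no "User-agent: *" group is present.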
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he's not following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting the file here: https://mza.bundledseo.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://mza.bundledseo.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to [email protected]!
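For a rough first pass before using a proper checker, a few of the most common formatting slips can be caught with a short Python sketch like the one below. The checks and the list of accepted fields are assumptions chosen for illustration, not a full validator and not the logic used by Moz's or Google's tools:

```python
# Fields commonly seen in robots.txt; anything else is flagged as suspicious.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' between field and value"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "empty User-agent value"))
        elif field in {"allow", "disallow"}:
            if not seen_user_agent:
                problems.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


# Deliberately broken sample, for illustration only
broken = """\
Disallow: /cgi-bin/
User-agent rogerbot
User-agent: rogerbot
Disallow: tmp/
Disallow: /junk/
"""

for number, message in lint_robots_txt(broken):
    print(f"line {number}: {message}")
```

A clean result here does not guarantee the file behaves as intended, so it is still worth confirming with the checker linked above.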