Why doesn't Moz crawler follow robots.txt?
-
It is crawling the entire site, including sections we do not want it to crawl. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl - just not to index them
-
It has been used correctly. The site is a Magento site and they have it built in. There are a lot of filters for products so it uses rel=canonical to tell Google which to index.
-
rel=canonical is not really a robots instruction - it is there to help with duplicate copy where you have the same or similar pages, and you're telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
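To see what a crawler is actually being told on a given page, it can help to pull the canonical tag out of the HTML yourself. The following is only a minimal sketch using Python's standard library; the `CanonicalFinder` class, the `find_canonicals` helper, and the example.com URLs are made up for illustration and are not part of Moz or Magento.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens, so split before matching.
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonicals.append(attr_map.get("href", ""))


def find_canonicals(html):
    """Return the canonical URLs declared in an HTML string (hypothetical helper)."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical filtered product page that points at its unfiltered version.
sample_page = """
<html><head>
<link rel="canonical" href="https://example.com/shoes.html">
</head><body>Filtered listing: /shoes.html?color=red</body></html>
"""

print(find_canonicals(sample_page))
```

A page with no canonical tag returns an empty list, and a page that declares more than one is worth a closer look. Either way, the tag only states a preferred URL; it does not stop the page from being crawled.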
-
Hi There,
Rel=canonical tags tell robots which page, out of many similar ones, should actually be indexed.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://mza.bundledseo.com/learn/seo/canonicalization
I feel you have not used it correctly. Check the above article and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
Have to agree with the above. Rogerbot does listen to the robots.txt file, unlike Bing - while they are getting better, Bing ignores the robots.txt file frequently.
I've analysed quite a few server logs over the years and Roger has always listened to the file - it's usually a mistake in the robots.txt file.
There is an option to test your robots.txt file in Google Search Console (GSC) - while this tests whether Google will crawl the page, Roger usually has the same instructions as Google.
However if you are still pretty certain that Roger is ignoring robots.txt please DM your Server Logs and your website and I will take a look and analyse it for you (free of course).
Thanks
Andy
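The server-log check described above could be sketched roughly as follows. This is an illustration under several assumptions, not a definitive tool: it assumes the logs are in the common Apache/Nginx "combined" format, the `blocked_hits` helper is hypothetical, and the sample log lines (including the user-agent string) are made up. Python's `urllib.robotparser` is also its own implementation and may not interpret every rule exactly as rogerbot does.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed "combined" log format: request line, status, size, referrer, user agent.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def blocked_hits(log_lines, robots_txt, bot_token="rogerbot"):
    """Return paths the bot requested even though robots_txt disallows them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        # Skip lines that do not parse or that belong to other user agents.
        if not match or bot_token not in match.group("agent").lower():
            continue
        if not parser.can_fetch(bot_token, match.group("path")):
            hits.append(match.group("path"))
    return hits


robots_txt = """\
User-agent: Rogerbot
Disallow: /tmp/
Disallow: /junk/
"""

# Made-up log lines; the user-agent strings are placeholders for this example.
sample_log = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET /tmp/cache.html HTTP/1.1" 200 512 "-" "rogerbot/1.0 (made-up UA for this example)"',
    '203.0.113.5 - - [10/Oct/2016:13:55:40 +0000] "GET /products/shoes.html HTTP/1.1" 200 2048 "-" "rogerbot/1.0 (made-up UA for this example)"',
    '198.51.100.7 - - [10/Oct/2016:13:56:01 +0000] "GET /junk/old.html HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
]

print(blocked_hits(sample_log, robots_txt))
```

An empty result suggests the crawler stayed out of the disallowed paths for the period the log covers. Any path that does show up is a concrete example to include when asking for help.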
-
All major search engines, as well as Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” used to communicate with web crawlers and other web robots.
If you wish to exclude specific content from all search engines, you can use the following sample code as a reference for blocking specific directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block only Moz's Rogerbot from crawling specific sections of your website, you can use the following reference code to block specific areas or directories:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond; I will be happy to answer them.
Regards,
Vijay
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he's not following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting the file here: https://mza.bundledseo.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://mza.bundledseo.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
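For a quick local check alongside an online tester, Python's standard `urllib.robotparser` module can evaluate a set of rules against a user agent and URL. The sketch below reuses the sample Rogerbot rules from earlier in this thread; the `is_allowed` helper and the example.com URLs are assumptions made for illustration, and this parser is not guaranteed to interpret every rule the same way rogerbot does.

```python
from urllib.robotparser import RobotFileParser

# Sample rules from earlier in the thread, blocking only Rogerbot.
rules = """\
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules let user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
for agent, url in [
    ("rogerbot", "https://example.com/tmp/file.html"),
    ("rogerbot", "https://example.com/products/"),
    ("Googlebot", "https://example.com/tmp/file.html"),
]:
    print(agent, url, is_allowed(rules, agent, url))
```

If a URL you expected to be blocked comes back as allowed, the usual suspects are a misspelled directive, a missing leading slash in the path, or rules placed under the wrong `User-agent` group.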
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to [email protected]!