Why doesn't the Moz crawler follow robots.txt?
-
It is crawling the entire site, and there is stuff we do not want it to. Please advise.
-
Which I am ok with, but why am I getting duplicate content?
-
Yes, it doesn't tell them which pages not to crawl - just not to index them
-
It has been used correctly. The site is a Magento site and they have it built in. There are a lot of filters for products so it uses rel=canonical to tell Google which to index.
-
rel=canonical is not really a robots instruction - it is there to help with duplicate content, where you have the same or similar pages and you're telling search engines which page is the preferred one.
If you don't want pages crawled, you have to tell search engines in the robots.txt file.
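To see which canonical URL a filtered page actually declares, a quick check along these lines may help. It is only a sketch using Python's standard html.parser; the sample page and the example.com URLs are made up, and find_canonicals is a hypothetical helper name, not part of Magento or any Moz tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html):
    """Return all canonical URLs declared in an HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Made-up page standing in for a filtered product listing.
SAMPLE_PAGE = """<html><head>
<title>Shoes, filtered by colour</title>
<link rel="canonical" href="https://example.com/shoes">
</head><body></body></html>"""

if __name__ == "__main__":
    print(find_canonicals(SAMPLE_PAGE))
```

An empty list, or more than one URL, would suggest the tag is missing or duplicated on that page. Either way, the tag does not stop the page from being crawled.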
-
Hi There,
Rel=canonical tags tell robots which page, out of many, is the one to index.
For SEOs, canonicalization refers to individual web pages that can be loaded from multiple URLs. This is a problem because when multiple pages have the same content but different URLs, links that are intended to go to the same page get split up among multiple URLs. This means that the popularity of the pages gets split up. Unfortunately for web developers, this happens far too often because the default settings for web servers create this problem.
https://mza.bundledseo.com/learn/seo/canonicalization
I feel you have not used it correctly. Check the above article and see if it helps.
Thanks,
Vijay
-
So I made a mistake: it isn't the robots.txt that is the issue. I am getting hit with a ton of duplicate content penalties, so I figured that was it. The problem is that I have pages with rel=canonical tags that it is ignoring. Does Roger not read those?
-
Hi
I have to agree with the above: Rogerbot does obey the robots.txt file. Bing, in my experience, is getting better but still frequently ignores it.
I've analysed quite a few server logs over the years and Roger has always obeyed the file - it's usually a mistake in the robots.txt file.
There is an option to test your robots.txt file in Google Search Console (GSC). That tests whether Google will crawl the page, but Roger usually follows the same instructions as Google.
However, if you are still pretty certain that Roger is ignoring robots.txt, please DM me your server logs and your website and I will take a look and analyse them for you (free, of course).
Thanks
Andy
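For anyone who wants a first pass over their own logs before sending them, a rough sketch like the one below can list the requests a crawler made to paths you meant to block. It assumes the common "combined" access log format and that the crawler's user-agent string contains "rogerbot"; the sample lines, IP addresses and agent strings are made up, and rogerbot_hits is a hypothetical helper name.

```python
import re

# Matches the request, status, size, referrer and user-agent parts of a
# "combined" format access log line (an assumption about your server setup).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def rogerbot_hits(log_lines, disallowed_prefixes):
    """Return paths requested by a rogerbot-like agent under a disallowed prefix."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a line in the assumed format
        if "rogerbot" not in match.group("agent").lower():
            continue  # some other client
        path = match.group("path")
        if any(path.startswith(prefix) for prefix in disallowed_prefixes):
            hits.append(path)
    return hits


# Made-up sample lines; the agent strings are placeholders, not real ones.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2018:13:55:36 +0000] "GET /tmp/a.html HTTP/1.1" 200 2326 "-" "rogerbot (made-up sample agent)"',
    '192.0.2.1 - - [10/Oct/2018:13:55:40 +0000] "GET /products/shoes HTTP/1.1" 200 5120 "-" "rogerbot (made-up sample agent)"',
    '192.0.2.7 - - [10/Oct/2018:13:56:02 +0000] "GET /tmp/b.html HTTP/1.1" 200 1024 "-" "SomeOtherBot (made-up sample agent)"',
]

if __name__ == "__main__":
    print(rogerbot_hits(SAMPLE_LOG, ["/cgi-bin/", "/tmp/", "/junk/"]))
```

Simple prefix matching is a simplification of how robots.txt rules are evaluated, so treat any hits as a prompt to look closer rather than as proof.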
-
All major search engines, as well as Moz's crawler Rogerbot and the Internet Archive, respect robots.txt, the standard “robots exclusion protocol” for communicating with web crawlers and other web robots.
If you wish to exclude specific content from all search engines, you can use the following sample code as a reference for blocking specific directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
However, if you want to block only Moz's Rogerbot from crawling specific sections of your website, you can use the following reference code to block specific areas / directories from Rogerbot:
User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
I hope this helps. If you have specific questions, please feel free to respond and I will be happy to answer them.
Regards,
Vijay
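P.S. If it helps, here is a minimal sketch for checking rules like the ones above before relying on them. It uses Python's standard urllib.robotparser; the example.com URLs are placeholders and is_allowed is a hypothetical helper name. The standard-library parser only approximates how any particular crawler, Rogerbot included, reads the file.

```python
from urllib.robotparser import RobotFileParser

# The two groups from the post above, combined into one robots.txt body.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

User-agent: Rogerbot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the parsed rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs: one under a blocked directory, one outside it.
    for url in ("https://example.com/tmp/page.html",
                "https://example.com/products/shoes"):
        print(url, is_allowed(ROBOTS_TXT, "rogerbot", url))
```

Pasting your real robots.txt into ROBOTS_TXT and trying the URLs you expect to be blocked is a quick way to spot a formatting slip.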
-
Hi there! Moz's crawler, rogerbot, does follow robots.txt. When he's not following robots.txt, it's usually because the robots.txt file is formatted improperly. Learn more about formatting your file here: https://mza.bundledseo.com/learn/seo/robotstxt
For more information on Roger, including how to block him, head here: https://mza.bundledseo.com/help/guides/moz-procedures/what-is-rogerbot
And if you want to test your formatting, try the Robots Checker here: https://support.google.com/webmasters/answer/6062598
If you're still unable to determine why rogerbot is crawling your site, feel free to write in to [email protected]!