Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
If you're still running into trouble, I would suggest writing in to us at [email protected] so we can take a closer look at the Campaign and what could be going on there.
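To illustrate the point about Allow ordering, here is a minimal sketch using Python's standard-library robots.txt parser, which applies the first rule that matches. It is not rogerbot, and the `Allow: /` rule and the `/page/2/` URL are hypothetical stand-ins, since the site's actual robots.txt is not quoted in this thread.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules: str, url: str, agent: str = "rogerbot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rule sets: the same two rules in a different order.
ALLOW_FIRST = "User-agent: *\nAllow: /\nDisallow: /page/\n"
ALLOW_LAST = "User-agent: *\nDisallow: /page/\nAllow: /\n"

# Hypothetical URL whose path starts with /page/.
URL = "https://needquest.com/page/2/"

# Python's parser stops at the first matching rule, so order matters here.
allow_first_result = can_crawl(ALLOW_FIRST, URL)  # Allow: / matches first
allow_last_result = can_crawl(ALLOW_LAST, URL)    # Disallow: /page/ matches first

print(allow_first_result, allow_last_result)  # True False
```

Parsers that follow RFC 9309 choose the longest matching rule regardless of order, so whether reordering changes anything depends on how a given crawler resolves conflicts.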
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder**/page/**4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ is found in many pages we DO want to crawl and index ... and we only want here to disallow those page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
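One possible explanation is that robots.txt rules are matched against the start of the URL path, so `Disallow: /page/` only covers paths that begin with /page/. The sketch below checks this with Python's standard-library parser, which is not rogerbot. The rule set is reduced to the single directive quoted in this thread, and the second URL is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "rogerbot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumption: only the directive quoted in the thread, under a catch-all agent.
RULES = """\
User-agent: *
Disallow: /page/
"""

# URL from the thread: /page/ appears in the middle of the path.
PAGINATED_TAG_URL = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"
# Hypothetical URL whose path starts with /page/.
ROOT_PAGE_URL = "https://needquest.com/page/4/"

tag_page_allowed = is_allowed(RULES, PAGINATED_TAG_URL)
root_page_allowed = is_allowed(RULES, ROOT_PAGE_URL)

print(tag_page_allowed, root_page_allowed)  # True False
```

Crawlers that support the `*` wildcard described in RFC 9309 would accept a pattern such as `Disallow: /*/page/` for mid-path matches. Whether rogerbot honors that pattern is an assumption to confirm with Moz; Python's parser above does not support wildcards.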
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at [email protected] and we'll do our best to sort things out with you.