Moz "Crawl Diagnostics" doesn't respect robots.txt
-
Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:
- Duplicate content
- Overly dynamic URLs
- Duplicate Page Titles
The duplicate content & URLs it's found are all blocked in the robots.txt so why am I seeing these errors?
Here's an example of some of the robots.txt that blocks things like dynamic URLs and directories (which Moz bot ignored):
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/
Many thanks for any info on this issue.
-
Hi Si, has this issue been resolved?
-
Hey Si,
Thanks for writing in. We don't seem to be having an overarching issue with our crawler ignoring robots.txt files, so I did some research in Google Webmaster Tools. It looks like most crawlers require an asterisk in the Disallow directive to recognize that all pages of a dynamic URL are being disallowed. The "Pattern Matching" section of this resource should give you more information about setting up the robots.txt with the correct Disallow directives to block those pages: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
If you add the asterisk to the Disallow directive and you are still seeing these pages crawled, it would help if you emailed your campaign information to our support desk at [email protected] so our engineers can look into this more directly.
I hope this helps.
Chiaryn
-
If you have an "index,(no)follow" meta tag on those pages, I think they will be crawled even though you have them blocked in robots.txt. So adding "noindex" to those pages might make it work as you want it to.
-
Is the / actually in the URL at that spot? Or is your link like http://www.example.com/abcd?p=147?
If you give an example full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, the crawler shouldn't find that stuff, but give us more info if you're able.
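To make the point about the asterisk and the position of the / concrete, here is a minimal Python sketch of Google-style pattern matching, where a rule is matched from the start of the path, `*` stands for any run of characters, and a trailing `$` anchors the end of the URL. The function name `rule_matches` is made up for illustration, and this is only an assumption about how a wildcard-aware crawler behaves, not Moz's actual crawler code.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Check a Disallow pattern against a URL's path plus query string.

    Hypothetical helper, assuming Google-style semantics: the pattern must
    match from the start of the path, '*' matches any run of characters,
    and a trailing '$' means the URL must end there.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the beginning of the string (prefix match).
    return re.match(regex, url_path) is not None


# The rule from the question only covers query strings on the root URL.
print(rule_matches("/?mode=", "/?mode=list"))       # True
print(rule_matches("/?mode=", "/abcd?mode=list"))   # False

# With an asterisk, the same parameter is covered on any path.
print(rule_matches("/*?mode=", "/abcd?mode=list"))  # True

# Directory rules are plain prefix matches.
print(rule_matches("/reviews/", "/reviews/123"))    # True
```

Under these assumptions, a rule such as `Disallow: /?p=*&` would not cover a URL like http://www.example.com/abcd?p=147, because the path does not begin with `/?p=`.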