Robots.txt anomaly
-
Hi,
I'm monitoring a site that's had a design relaunch, with a new robots.txt added.
Over the week since launch, Webmaster Tools has shown a steadily increasing number of blocked URLs (now at 14).
In the robots.txt file, though, there are only 12 lines with the Disallow command. Could this be occurring because one line can refer to more than one page/URL? They all look like single URLs, for example:
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
etc., etc.
And is it normal for the number of URLs blocked by robots.txt in Webmaster Tools to increase steadily over time, as opposed to them all being identified straight away?
Thanks in advance for any help, advice or clarity on why this may be happening.
Cheers
Dan
-
Many thanks for that, Dan!
-
As far as I know, the important thing is that your feed shows up in feed readers. Can you subscribe to and view your RSS feed in a variety of different feed readers?
Yes, so long as the ? is only used in URLs that would produce duplicate content, or content you would not want crawled, it will have that effect.
-Dan
-
Many thanks for your comments, Dan!
So it doesn't matter that the feed is not going to be crawled? Don't we usually want feeds to be crawled?
Blocking anything with a ? is surely good then, isn't it, since it prevents all the duplicate content etc. that one gets from search results?
Yes, my client's webmaster set it up.
-
Hi Dan
I see no reason to disallow the feed like that by default, unless there is some reason I don't know about. But it won't harm anything either.
The second part blocks any URL that contains a ? (question mark). This would block anything that has a parameter in the URL, most commonly a search query, pagination, filtering settings, etc.
As far as I'm aware, this is not going to damage the site, but it's not the default setting. Did someone set it up that way for you?
My robots.txt shows the default WordPress settings: http://www.evolvingseo.com/robots.txt
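To make the matching concrete, here is a minimal Python sketch of how a crawler following Google's documented wildcard rules (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match) would treat the four Disallow lines discussed in this thread. The helper names `rule_to_regex` and `is_blocked` and the sample paths are hypothetical, and the sketch ignores Allow lines and user-agent groups, so treat it as an illustration rather than Google's actual parser.

```python
import re
from typing import Iterable


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    Assumes Google-style semantics: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, and everything else is
    matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then glue the pieces back together with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: Iterable[str]) -> bool:
    """True if any Disallow rule matches the path (Allow lines are ignored)."""
    # re.match only tries position 0, which gives the prefix behaviour.
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# The four Disallow lines quoted in the thread.
RULES = ["/feed", "*/feed", "/?", "/*?"]

# Hypothetical paths on the client's site.
for path in ["/?s=seo", "/blog/page/2?orderby=date",
             "/category/news/feed/", "/feedback-form", "/about/"]:
    print(path, "->", "blocked" if is_blocked(path, RULES) else "allowed")
```

With these four rules only `/about/` comes out as allowed. Note that `/feed`, being a plain prefix, also catches `/feedback-form`, and `/*?` catches any URL with a query string whatever the parameter is for, which makes it broader than blocking internal search alone.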
-
Hi Dan
Yes, please find them below. Could you also confirm whether the bottom 2 lines refer to blocking internal search results?
Disallow: /feed
Disallow: */feed
Disallow: /?
Disallow: /*?
Many thanks
Dan
-
Hi Dan
Can you share the exact line disallowing RSS?
Thanks!
-Dan
-
Sorry, one more question: I see that the webmaster has disallowed the feeds in the robots.txt file. Is this normal/desirable? I would have thought one would want RSS feeds crawled by Google.
-
Nice one, cheers Jesse!
-
Your assumption is correct. The disallows you listed are directories, not pages. Therefore, anything within the plugins folder will be disallowed, and the same goes for the cache and themes folders.
So you may have multiple files (and I'm sure you do) within each of those folders, and each one counts as a separate blocked URL.
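A quick way to see why a few directory lines can account for more blocked URLs than there are Disallow lines is Python's standard-library `urllib.robotparser`, which applies Disallow values as plain path prefixes (it does not implement Google's `*`/`$` wildcards, but none are needed for these rules). The `example.com` URLs below are hypothetical stand-ins for files a WordPress site would typically keep in those folders.

```python
from urllib.robotparser import RobotFileParser

# Directory rules like the ones quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical file URLs on the site.
urls = [
    "https://example.com/wp-content/plugins/contact-form/style.css",
    "https://example.com/wp-content/plugins/seo/script.js",
    "https://example.com/wp-content/cache/page-1.html",
    "https://example.com/wp-content/themes/mytheme/logo.png",
    "https://example.com/wp-content/uploads/photo.jpg",
    "https://example.com/blog/hello-world/",
]

# A URL is blocked when any Disallow value is a prefix of its path.
blocked = [u for u in urls if not parser.can_fetch("Googlebot", u)]
print(f"{len(blocked)} of {len(urls)} URLs blocked by 3 Disallow lines")
for u in blocked:
    print("  blocked:", u)
```

Each matching file is its own blocked URL, which would also fit a count that grows as Google discovers more of them. Because the match is a plain prefix, `Disallow: /wp-content/plugins` (no trailing slash) would also catch a hypothetical `/wp-content/plugins-old/` folder.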
-
Related Questions
-
Adding your sitemap to robots.txt
Hi everyone, Best practice question: When adding your sitemap to your robots.txt file, do you add the whole sitemap at once or do you add different subcategories (products, posts, categories,..) separately? I'm very curious to hear your thoughts!
Technical SEO | WeAreDigital_BE
-
Google is Still Blocking Pages Unblocked 1 Month ago in Robots
I manage a large site with over 200K indexed pages. We recently added a new vertical to the site that was 20K pages. We initially blocked the pages using robots.txt while we were developing/testing. We unblocked the pages 1 month ago. The pages are still not indexed at this point. 1 page will show up in the index with an omitted results link. Upon clicking the link you can see the remaining un-indexed pages. Looking for some suggestions. Thanks.
Technical SEO | Tyler123
-
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and other content available in JavaScript and jQuery. My Webmaster Tools account is showing that some jQuery links are blocked by robots.txt. Sorry, I'm not a developer or designer. I want to know: is there any impact of this on my SEO? And how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | hammadrafique
-
Removal request for entire catalog. Can be done without blocking in robots?
A bunch of thin-content (catalog) pages were modified with "follow, noindex" a few weeks ago. The site has been completely re-crawled, and the related cache shows that these pages were not indexed again. So it's good, I suppose 🙂 But all of them are still in the main Google index and show up from time to time in SERPs. Will they eventually disappear, or do we need to submit a removal request? The problem is we really don't want to add these pages to robots.txt (they are passing link juice down to the product pages). Thanks!
Technical SEO | LocalLocal
-
Robots.txt Question
In the past, I had blocked a section of my site (i.e. domain.com/store/) by placing the following in my robots.txt file: "Disallow: /store/" Now, I would like the store to be indexed and included in the search results. I have removed the "Disallow: /store/" from the robots.txt file, but approximately one week later a Google search for the URL produces the following meta description in the search results: "A description for this result is not available because of this site's robots.txt – learn more" Is there anything else I need to do to speed up the process of getting this section of the site indexed?
Technical SEO | davidangotti
-
How to add a disclaimer to a site but keep the content accessible to search robots?
Hi, I have a client with a site regulated by the UK FSA (Financial Services Authority). They have to display a disclaimer which visitors must accept before browsing. This is for real, not like the EU cookie compliance debacle 🙂 Currently the site 302 redirects anyone not already cookied (as having accepted) to a disclaimer page/form. Do you have any suggestions or examples of how to require acceptance while maintaining accessibility? I'm not sure just using a jQuery lightbox would meet the FSA's requirements, as it wouldn't be shown if JS was not enabled. Thanks, -Jason
Technical SEO | GroupM_APAC
-
Warnings for "blocked by meta-robots"/"meta robots nofollow": how to resolve?
Hello, I see hundreds of notices for blocked by meta-robots/meta robots nofollow and it appears it is linked to the comments on my site which I assume I would not want to be crawled. Is this the case and these notices are actually a positive thing? Please advise how to clear them up if these notices can be potentially harmful for my SEO. Thanks, Talia
Technical SEO | M80Marketing
-
Robots.txt file question? Never seen this command before
Hey Everyone! Perhaps someone can help me. I came across this command in the robots.txt file of our Canadian corporate domain. I looked around online but can't seem to find a definitive answer (slightly relevant). the command line is as follows: Disallow: /*?* I'm guessing this might have something to do with blocking php string searches on the site?. It might also have something to do with blocking sub-domains, but the "?" mark puzzles me 😞 Any help would be greatly appreciated! Thanks, Rob
Technical SEO | RobMay