Robots.txt: how to exclude sub-directories correctly?
-
Hello here,
I am trying to figure out the correct way to tell search engines (SEs) to crawl this:
http://www.mysite.com/directory/
But not this:
http://www.mysite.com/directory/sub-directory/
or this:
http://www.mysite.com/directory/sub-directory2/sub-directory/...
But since I have thousands of sub-directories with almost infinite combinations, I can't list definitions like the following in any manageable way:
disallow: /directory/sub-directory/
disallow: /directory/sub-directory2/
disallow: /directory/sub-directory/sub-directory/
disallow: /directory/sub-directory2/subdirectory/
etc...
I would end up having thousands of definitions to disallow all the possible sub-directory combinations.
So, is the following a correct, better and shorter way to define what I want:
allow: /directory/$
disallow: /directory/*
Would the above work?
Any thoughts are very welcome! Thank you in advance.
Best,
Fab.
-
I mentioned both. You add a meta robots to noindex and remove from the sitemap.
-
But Google is still free to index a link/page even if it is not included in the XML sitemap.
-
Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is allowed in the sitemap.
-
I am using WordPress with the Enfold theme (ThemeForest).
I want some files to be accessible to Google, but they should not be indexed.
Here is an example: http://prntscr.com/h8918o
I have currently blocked some JS directories/files using robots.txt (check screenshot).
But because of this I am not able to pass the Mobile-Friendly Test on Google: http://prntscr.com/h8925z (check screenshot)
Is it possible to allow access but use something like a noindex tag in the robots.txt file? Or is there any other way out?
-
Yes, everything looks good, Webmaster Tools gave me the expected results with the following directives:
allow: /directory/$
disallow: /directory/*
Which allows this URL:
http://www.mysite.com/directory/
But doesn't allow the following one:
http://www.mysite.com/directory/sub-directory2/...
This page also gives an update similar to mine:
https://support.google.com/webmasters/answer/156449?hl=en
I think I am good! Thanks
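
For anyone who wants to sanity-check this outside Webmaster Tools, here is a minimal sketch of the matching logic as I understand it from Google's robots.txt documentation: "*" matches any run of characters, "$" anchors the end of the URL, the longest matching pattern wins, and on a tie the allow rule wins. The function names (pattern_to_regex, is_allowed) are made up for this illustration, and this is an approximation under those assumptions, not the code any real crawler runs.

```python
import re

def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex (hypothetical helper).

    Assumes Google-style matching: '*' matches any sequence of characters,
    a trailing '$' anchors the end of the path, everything else is literal
    and matched as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs, where directive is
    'allow' or 'disallow'. Assumed precedence: longest pattern wins,
    and on equal length 'allow' beats 'disallow'.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if not pattern:
            continue  # empty patterns are skipped in this sketch
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is crawlable.
    return True if best is None else best[1]

# The two directives from this thread.
RULES = [("allow", "/directory/$"), ("disallow", "/directory/*")]

# The same setup without the allow line, for comparison.
DISALLOW_ONLY = [("disallow", "/directory/*")]

if __name__ == "__main__":
    for path in ["/directory/", "/directory/sub-directory/", "/other/"]:
        print(path, is_allowed(RULES, path), is_allowed(DISALLOW_ONLY, path))
```

Under these assumptions, the two-line version keeps /directory/ crawlable and blocks everything below it, while the disallow line on its own also blocks /directory/ itself, because "*" can match nothing at all. That is why the allow line with "$" is needed.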
-
Thank you Michael, it is my understanding then that my idea of doing this:
allow: /directory/$
disallow: /directory/*
Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.
In the meantime, if anyone else has more ideas about all this and can confirm it for me, that would be great!
Thank you again.
-
I've always stuck to Disallow and followed -
"This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"
http://www.robotstxt.org/robotstxt.html
From https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt this seems contradictory:
/* is equivalent to "/" (the trailing wildcard is ignored).
I think this post will be very useful for you - http://moz.com/community/q/allow-or-disallow-first-in-robots-txt
-
Thank you Michael,
Google and other SEs actually recognize the "allow:" command:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The fact is: if I don't specify that, how can I be sure that the following single command:
disallow: /directory/*
Doesn't prevent SEs from spidering the /directory/ index, which I want crawled?
-
As long as you don't have directories somewhere in /* that you want indexed, then I think that will work. There is no "Allow" field in the original robots.txt standard, so you don't need the first line, just:
disallow: /directory/*
You can test it out here: https://support.google.com/webmasters/answer/156449?rd=1