Robots.txt Syntax

DRSearchEngOpt

I have been having a hard time finding any decent, recent information on robots.txt syntax, and I want to verify a few things as a review for myself. I often need to block particular directories, parameters, and parameter values in URLs. I want to make sure that I am doing this in the most efficient way possible and thought you guys could help.

So let's say I want to block a particular directory called "this" and this would be an example URL:

www.domain.com/folder1/folder2/this/file.html
or
www.domain.com/folder1/this/folder2/file.html

In order for me to block any URL that contains this folder anywhere in the URL I would use:

User-agent: *
Disallow: /this/

Now let's say I have a parameter "that" I want to block. Sometimes it is the first parameter in the URL and sometimes it isn't. Would it look like this?

User-agent: *
Disallow: ?that=
Disallow: &that=

What about if there is only one value I want to block for "that" and the value is "NotThisGuy":

User-agent: *
Disallow: ?that=NotThisGuy
Disallow: &that=NotThisGuy

My big questions are: what is the most efficient way to block a particular parameter, and what is the most efficient way to block a particular parameter value? Is there a better way to deal with ? and & when the parameter can come either first or later in the URL? Secondly, is there a list somewhere of all the syntax that can be used in a robots.txt file and what it means?

Thanks!

MichaelC-15022

My advice is to go easy with robots.txt--it's a bit like dynamite, powerful, but can take your leg (or entire website) off.

I like this checker:

http://tool.motoricerca.info/robots-checker.phtml

If your file looks OK after running that checker, then run it through the built-in Google one as well.
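
If you want a quick local sanity check as well, here is a minimal sketch using Python's standard-library urllib.robotparser. That parser does plain prefix matching with no wildcard support, so it behaves like a crawler that follows only the original standard. The allowed() helper and the RULES constant are names made up for this illustration; the URLs are the example ones from the question.

```python
from urllib.robotparser import RobotFileParser

# The rule set from the question, as it would appear in robots.txt.
RULES = """\
User-agent: *
Disallow: /this/
"""

def allowed(url, rules=RULES, agent="*"):
    """Return True if `agent` may fetch `url` under the given rules.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

# A path that starts with /this/ is blocked.
print(allowed("http://www.domain.com/this/file.html"))                   # False
# The same folder deeper in the path is NOT blocked by a prefix rule.
print(allowed("http://www.domain.com/folder1/this/folder2/file.html"))   # True
```

In other words, with prefix matching "Disallow: /this/" only covers URLs whose path begins with /this/, not URLs that merely contain that folder somewhere.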

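Note that the original robots.txt standard DOES NOT have wildcards. Google and Bing support "*" and "$" as extensions, but other crawlers may simply ignore them. Apparently this doesn't stop a ton of people from using wildcards without testing them, and for crawlers that don't support them the rules have no effect.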
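
For crawlers that do document wildcard support (Google's robots.txt documentation and RFC 9309 describe "*" for any run of characters and "$" for end of URL), the matching can be sketched roughly as below. This is an illustration under that assumption, not any engine's actual code, and it ignores Allow rules and longest-match precedence. rule_to_regex() and is_blocked() are hypothetical helper names, and /page.html is a made-up path.

```python
import re

def rule_to_regex(rule):
    """Translate one Disallow path into a regex (sketch only).

    '*' matches any run of characters; a trailing '$' anchors the
    rule to the end of the URL. Everything else is literal.
    """
    anchored_end = rule.endswith("$")
    if anchored_end:
        rule = rule[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored_end else ""))

def is_blocked(rule, path_and_query):
    """True if the rule matches from the start of the path (prefix match)."""
    return rule_to_regex(rule).match(path_and_query) is not None

# Folder at any depth: the plain prefix rule misses it, the wildcard rule does not.
print(is_blocked("/this/", "/folder1/folder2/this/file.html"))    # False
print(is_blocked("/*/this/", "/folder1/this/folder2/file.html"))  # True

# Parameter "that" as the first parameter or a later one.
print(is_blocked("/*?that=", "/page.html?that=1"))                # True
print(is_blocked("/*?that=", "/page.html?a=1&that=1"))            # False
print(is_blocked("/*&that=", "/page.html?a=1&that=NotThisGuy"))   # True
```

Under those assumptions, a folder that can appear at any depth needs both /this/ and /*/this/, and a parameter that may come first or later needs both /*?that= and /*&that= (or the same pair with the value appended to block a single value).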

Another reason to avoid disallow in robots.txt is that if you disallow the engines from looking at a page's contents, then you're ALSO stopping the link juice that might have flowed to other pages it links to.

So let's say you have 100 pages on your site that you're currently blocking with disallow in robots.txt. If instead, you put a meta robots "noindex,follow" in each of those pages, then every page linked to from those 100 pages (i.e. everything in your main menu) would get an extra 100 internal links worth of link juice.
