Robots.txt, does it need preceding directory structure?

Milian

Do you need the entire preceding path in robots.txt for it to match?

For example:

I know that if I add Disallow: /fish to robots.txt, it will block:

/fish
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything

But would it also block:

en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
en/fish.php?id=anything

(Examples taken from the Robots.txt Specifications.) I'm hoping it actually won't match; that way, writing this particular robots.txt will be much easier!

Basically, I want to block many URLs that contain BTS-, such as:

http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob

But I have other pages in subfolders that I do not want blocked, and their URLs also contain BTS-, such as:

http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy

Thanks for listening

Milian

Yes, this is what I thought, but I wanted a second opinion.

I wouldn't actually need a wildcard after BTS-, though, as leaving the rule open-ended is the same as using a trailing wildcard:

/fish* is equivalent to "/fish" -- the trailing wildcard is ignored.

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Thanks for the link, I'll take a look.

PinpointDesigns

You're right that **Disallow: /fish** in the robots file blocks all of those initial links. It would not block the en/ versions, though, because a rule is matched from the start of the URL path. If you wanted to block the same pages inside the /en/ folder, you would need Disallow: /en/fish
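To make the matching rule concrete, here is a minimal Python sketch of how a Disallow path is compared against a URL path, assuming the Google-style behaviour described in the specification linked in this thread (prefix matching, `*` as a wildcard, `$` as an end anchor). The function name `rule_matches` is made up for illustration; this is not how any particular crawler is actually implemented.

```python
import re


def rule_matches(rule_path: str, url_path: str) -> bool:
    """Return True if a robots.txt rule path matches the given URL path.

    Hypothetical helper: assumes Google-style matching, where the rule is
    compared from the start of the path, '*' matches any run of characters,
    and a trailing '$' anchors the rule to the end of the path.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    if anchored:
        pattern += "$"
    # re.match only anchors at the start, which gives prefix matching.
    return re.match(pattern, url_path) is not None


# Paths from the question: root-level URLs match, subfolder URLs do not.
print(rule_matches("/fish", "/fish/salmon.html"))
print(rule_matches("/fish", "/en/fish/salmon.html"))
print(rule_matches("/BTS-", "/BTS-something"))
print(rule_matches("/BTS-", "/somesubfolder/BTS-thingy"))
```

Under these assumptions, a rule of /BTS- only catches the root-level pages, which is the behaviour the question is hoping for.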

You could use a wildcard in the robots.txt file to do something along the lines of Disallow: /BTS-*

This _should_ work, but it's always worth checking with a tool to make sure it's all implemented correctly. Distilled did a post a while back about a JS bookmarklet that lets you test whether a page is blocked by robots.txt, which can be found here: http://www.distilled.net/blog/seo/js-bookmarklet-for-checking-if-a-page-is-blocked-by-robots-txt/

In addition to this, you could use the 'blocked URLs' tool in Google Webmaster Tools (GWT) to see whether the pages are successfully blocked once you've implemented the code.
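If you'd rather sanity-check the rule locally before uploading it, a rough sketch using Python's standard-library `urllib.robotparser` is below. It assumes a single rule, Disallow: /BTS-, written without the trailing wildcard because this parser does plain prefix matching. The URLs are the example ones from the question, and the result only reflects this parser, not necessarily how every search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents for this check (not taken from a real site).
robots_lines = [
    "User-agent: *",
    "Disallow: /BTS-",
]

parser = RobotFileParser()
parser.parse(robots_lines)

urls = [
    "http://www.example.com/BTS-something",
    "http://www.example.com/BTS-thingybob",
    "http://www.example.com/somesubfolder/BTS-thingy",
    "http://www.example.com/anothersubfolder/BTS-otherthingy",
]

# can_fetch returns False when the URL is disallowed for that user agent.
results = {url: parser.can_fetch("*", url) for url in urls}
for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

It's still worth confirming with the tools mentioned above, since they show how Google itself interprets the file.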

Hope this helps!
