Best practice for disallowing URLs with robots.txt

centurysafety

Hi Everybody,

We are currently trying to tidy up the crawl errors that appear when we crawl the site. At first glance we were very worried, to say the least: 17,000+ errors. But after looking closer at the report, we found that the majority of these errors were caused by bad URLs featuring:

Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
Color - For example: "?color=91"
Price - For example: "?price=650-700"
Order - For example: "?dir=desc&order=most_popular"
Page - For example: "?p=1&standards=704"
Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"

My question, as a novice with robots.txt: what would be the best practice for stopping URLs featuring these from being crawled?

Any advice would be appreciated!

TimHolmes

If you are looking to disallow URL parameters, you could use something like the following as a convention.

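Disallow: /*? blocks every URL that contains a query string (a bare Disallow: /? only matches query strings on the homepage). If you want to be more precise and target specific parameters, use one rule per parameter, such as Disallow: /*?dir=, Disallow: /*?order= and Disallow: /*?p=. There have been a few Moz questions of this type over the last few years, if you do look to remove the parameters.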
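To see how rules like these behave against the URLs in the question, here is a minimal sketch in Python. It assumes Google-style matching, where a rule is compared from the start of the path and `*` stands for any run of characters. The host `example.com`, the `/workwear` path and the helper names are made up for illustration, and the sketch ignores Allow rules and rule precedence.

```python
import re
from urllib.parse import urlsplit

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL, as in Google's robots.txt handling.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url, disallow_patterns):
    """Return True if any Disallow pattern matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start of the string, like a robots.txt rule.
    return any(rule_to_regex(p).match(target) for p in disallow_patterns)

# Hypothetical rule set: one pattern per parameter, covering both the
# first position (?name=) and later positions (&name=) in the query string.
PARAMS = ["color", "price", "dir", "order", "p"]
DISALLOW = [f"/*{sep}{name}=" for name in PARAMS for sep in "?&"]

SAMPLE_URLS = [
    "https://example.com/workwear?color=91",
    "https://example.com/workwear?price=650-700",
    "https://example.com/workwear?dir=desc&order=most_popular",
    "https://example.com/workwear?p=1&standards=704",
    "https://example.com/workwear",
]

for url in SAMPLE_URLS:
    print(url, "blocked" if is_blocked(url, DISALLOW) else "allowed")
```

Under the same assumptions, `is_blocked("https://example.com/workwear?color=91", ["/?"])` comes back False, because a rule without a wildcard only matches when the query string sits directly on the root path. Writing the separators out as `?p=` and `&p=` also keeps the short `p` rule from catching unrelated parameters whose names merely end in "p".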

Also try to ensure that the product pages you have listed are well canonicalised and point to the original product. A good review on how to do this can be found here. In most cases this will be enough to remove any indexation/duplicate issues.

JordanLowry

First, I assume you have Webmaster Tools set up?

It has a robots.txt tester tool in which you can try out different patterns to make sure you get the syntax right. For example, color would be blocked by: Disallow: /*?color= and the other parameters would follow more or less the same format.

If you are confused, I highly recommend reading through Moz's robots.txt best practices guide before you make any changes. Be sure to test everything in Webmaster Tools (Search Console) > robots.txt Tester.
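Alongside the Search Console tester, plain path rules (such as ones for the login and currency-switch URLs in the question) can be sanity-checked locally. The sketch below uses Python's standard-library `urllib.robotparser`. As far as I know, that module does simple prefix matching and does not interpret `*` wildcards, so it is only suitable for path-prefix rules like these. The host `example.com`, the shortened URLs and the assumption that both paths hang off the site root are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; assumes both paths start at the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /customer/account/login/
Disallow: /directory/currency/switch/
"""

def build_parser(robots_txt):
    """Parse robots.txt text held in a string rather than fetched over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Shortened stand-ins for the long encoded URLs in the question.
TEST_URLS = [
    "https://example.com/customer/account/login/referer/abc/",
    "https://example.com/directory/currency/switch/currency/GBP/uenc/abc/",
    "https://example.com/workwear",
]

for url in TEST_URLS:
    allowed = parser.can_fetch("*", url)
    print(url, "allowed" if allowed else "blocked")
```

This is only a quick local check. The tester in Search Console remains the reference for how Google itself reads the file.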

Let me know if you run into any problems.
