Robots.txt in subfolders and hreflang issues
-
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations:
UK site - https://www.clientname.com/uk/ - robots.txt location: https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: https://www.clientname.com/us/robots.txt
We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US.
They have the following hreflang tags across all pages:
We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously).
Search Console says there are no hreflang tags at all.
Additionally, each site's robots.txt file links to its corresponding sitemap files. However, in the robots.txt tester on Search Console, each property shows only the robots.txt file for https://www.clientname.com, even though when you actually navigate to that URL (https://www.clientname.com/robots.txt) you're redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location.
Any suggestions how we can remove UK listings from Google US and vice versa?
-
Hi there!
Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. There should be exactly one robots.txt file per host, located at the root; robots.txt files placed inside sub-folders are ignored by crawlers:
A **robots.txt** file is a file at the root of your site that indicates those parts of your site you don't want accessed by search engine crawlers.
From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
You shouldn't be blocking Google from either site, and the misplaced robots.txt files may well be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your two UK and US sitemap files.
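As a sketch of the consolidated setup described above (the sitemap filenames `sitemap-index.xml`, `sitemap-uk.xml`, and `sitemap-us.xml` are hypothetical — use whatever your WordPress SEO plugin generates):

```text
# https://www.clientname.com/robots.txt
User-agent: *
Allow: /

Sitemap: https://www.clientname.com/sitemap-index.xml
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.clientname.com/sitemap-uk.xml</loc></sitemap>
  <sitemap><loc>https://www.clientname.com/sitemap-us.xml</loc></sitemap>
</sitemapindex>
```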
You should ensure every page carries a complete, reciprocal set of hreflang directives (each page referencing both its own URL and its alternate). Hopefully after these changes you will see things start to get better. Good luck!
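For reference, a minimal reciprocal hreflang set that every page in both sections would carry might look like this (the page URLs are placeholders):

```html
<link rel="alternate" hreflang="en-gb" href="https://www.clientname.com/uk/page/" />
<link rel="alternate" hreflang="en-us" href="https://www.clientname.com/us/page/" />
<link rel="alternate" hreflang="x-default" href="https://www.clientname.com/" />
```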
Related Questions
-
Redirecting issue
Please, I have a domain name, miaroseworld.com. I redirected it (301 redirect) to one of my domain names, but I am having issues with that website, so I decided to redirect it to my new site. However, Moz is still showing the redirect to the previous website. Even when I change the redirect in Search Console, it still shows a redirect to the previous site.
Technical SEO | UniversalBlog
I have two robots.txt pages for www and non-www version. Will that be a problem?
There are two robots.txt pages. One for www version and another for non-www version though I have moved to the non-www version.
Technical SEO | ramb
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | zeepartner
Are robots.txt wildcards still valid? If so, what is the proper syntax for setting this up?
I've got several URL's that I need to disallow in my robots.txt file. For example, I've got several documents that I don't want indexed and filters that are getting flagged as duplicate content. Rather than typing in thousands of URL's I was hoping that wildcards were still valid.
Technical SEO | mkhGT
Block Domain in robots.txt
Hi. We had some URLs that were indexed in Google from a www1-subdomain. We have now disabled the URLs (returning a 404 - for other reasons we cannot do a redirect from www1 to www) and blocked via robots.txt. But the amount of indexed pages keeps increasing (for 2 weeks now). Unfortunately, I cannot install Webmaster Tools for this subdomain to tell Google to back off... Any ideas why this could be and whether it's normal? I can send you more domain infos by personal message if you want to have a look at it.
Technical SEO | zeepartner
Issue Duplicate Page Title
I'm having some really strange issues with duplicate page titles and I can't seem to figure out what's going on. I just got a new crawl from SEOMOZ and it's showing some duplicate page titles. http://www.example.com/blog/ http://www.example.com/blog/page/2/ http://www.example.com/blog/page/3/ Repeat .............. I have no idea what's going on, how these were duplicated, or how to correct it. Does anyone have a chance to take a look and see if you can figure out what's happening and what I need to do to correct the errors? I'm using Wordpress and all in one SEO plugin. Thanks so much!
Technical SEO | KLLC
Confirming Robots.txt code deep Directories
Just want to make sure I understand exactly what I am doing. If I place this in my robots.txt: Disallow: /root/this/that By doing this I want to make sure that I am ONLY blocking the /that/ directory and anything beneath it. I want to make sure that /root/this/ still stays in the index; it's just the /that/ directory I want gone. Am I correct in understanding this?
Technical SEO | cbielich
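The path-prefix behaviour asked about above can be checked locally with Python's standard-library `urllib.robotparser` (the rule below mirrors the question's hypothetical `Disallow` line):

```python
from urllib import robotparser

# Hypothetical robots.txt content mirroring the rule in the question.
rules = """\
User-agent: *
Disallow: /root/this/that
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# /root/this/ is still crawlable; /root/this/that and anything
# beginning with that prefix is blocked.
print(rp.can_fetch("*", "/root/this/"))           # True
print(rp.can_fetch("*", "/root/this/that"))       # False
print(rp.can_fetch("*", "/root/this/that/page"))  # False
```

Note that the standard-library parser follows the original robots.txt prefix semantics; Google additionally supports `*` and `$` wildcards, which this module does not model.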
Robots.txt
Hello Everyone, The problem I'm having is not knowing where to have the robots.txt file on our server. We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog) where we're trying to disallow certain directories from being crawled for SEO purposes... Would having the blog in a sub-directory still require its own robots.txt? Or can I reference the directories I don't want crawled within the blog using the root robots.txt file? Thanks for your insight on this matter.
Technical SEO | BailHotline