Duplicate site (disaster recovery) being crawled and creating two indexed search results

OpenTable

I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact copy of the toptable.co.uk domain.

Unfortunately, Google has crawled the uk-www.gtm.opentable.com site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable; they appear to the user as an entirely separate domain. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as the gtm host is part of the opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm.

There seem to be two potential fixes. Which is best for this case?

1) use robots.txt to block Google from crawling the .gtm site

2) canonicalize the gtm URLs to toptable.co.uk

In general, Google seems to recommend a canonical change, but in this special case it seems a robots.txt change could be best.

Thanks in advance to the SEOmoz community!

Dr-Pete

It's a little tricky. Andrea is right that robots.txt is not great for removal once pages/domains are indexed. However, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
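Because the disaster recovery host serves an exact copy of the primary site, the robots.txt would have to differ by hostname for this route to work. Here is a minimal Python sketch of that idea, assuming the application can read the requested Host header. The function names are hypothetical, and the check uses the standard library's urllib.robotparser, not Google's own parser:

```python
from urllib.robotparser import RobotFileParser

# Hostname taken from the question; everything else here is illustrative.
BACKUP_HOST = "uk-www.gtm.opentable.com"


def robots_txt_for_host(host: str) -> str:
    """Return a robots.txt body: block everything on the backup host only."""
    hostname = host.split(":")[0].lower()  # drop any port, ignore case
    if hostname == BACKUP_HOST:
        return "User-agent: *\nDisallow: /\n"
    # An empty Disallow value means nothing is blocked.
    return "User-agent: *\nDisallow:\n"


def is_crawlable(host: str, path: str, agent: str = "Googlebot") -> bool:
    """Check a path against the robots.txt that would be served for host."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for_host(host).splitlines())
    return parser.can_fetch(agent, f"http://{host}{path}")
```

A CNAME only changes DNS, so during a real failover crawlers and visitors should still request toptable.co.uk and receive the open rules. Only requests made directly to the gtm hostname would get the blocking file.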

Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to do that, so you'll need to keep the paths open.
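To illustrate what 1-to-1 means here, the following Python sketch maps each backup URL to the same path and query string on the primary host, then builds either a canonical link element or a 301 response from it. The two hostnames come from the question; the function names and the (status, headers) return shape are assumptions made for illustration, not part of any particular framework:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

BACKUP_HOST = "uk-www.gtm.opentable.com"  # from the question
PRIMARY_HOST = "toptable.co.uk"           # from the question


def primary_url(url: str) -> str:
    """Map a backup-site URL to the same path and query on the primary host."""
    parts = urlsplit(url)
    if parts.hostname != BACKUP_HOST:
        return url  # already on the primary (or an unrelated) host
    return urlunsplit(parts._replace(netloc=PRIMARY_HOST))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(primary_url(url))}">'


def redirect_for(url: str):
    """Return (status, headers) for a 301 on backup URLs, else None."""
    target = primary_url(url)
    if target == url:
        return None
    return 301, {"Location": target}
```

Either signal only works if Google can still fetch the gtm pages, so it should not be combined with a robots.txt block on the same host.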

josh-riley

First, if the pages are already indexed, then robots.txt won't make them go away. A noindex meta tag on the pages is the better solution. This allows search engines to "read" your page, see the noindex tag, and then work to remove the pages from the index. A robots.txt block doesn't necessarily accomplish the same result.
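Since both hosts serve the same pages, the noindex signal would need to be emitted only when a request arrives on the backup hostname. A rough Python sketch, assuming the Host header is available to the page template or response layer (the helper names are hypothetical). It shows both the meta tag and the equivalent X-Robots-Tag response header:

```python
BACKUP_HOST = "uk-www.gtm.opentable.com"  # from the question


def is_backup_host(host: str) -> bool:
    """True when the request was made directly to the backup hostname."""
    hostname = host.split(":")[0].lower()  # drop any port, ignore case
    return hostname == BACKUP_HOST


def robots_meta_tag(host: str) -> str:
    """Meta tag to place in <head>; empty string on the primary host."""
    if is_backup_host(host):
        return '<meta name="robots" content="noindex">'
    return ""


def robots_headers(host: str) -> dict:
    """Header-based alternative to the meta tag, useful for non-HTML files."""
    if is_backup_host(host):
        return {"X-Robots-Tag": "noindex"}
    return {}
```

As with canonical tags, the backup pages have to stay crawlable for the tag or header to be seen.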

OlegKorneitchouk

If you can do a 1-to-1 page canonicalization (each page on the backup .com site is canonicalized to the equivalent page on the .co.uk), then I would do that.

Otherwise, I would noindex the backup site.
