Duplicate site (disaster recovery) being crawled and creating two indexed search results

OpenTable

I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact copy of the toptable.co.uk domain.

Unfortunately, Google has crawled the uk-www.gtm.opentable.com site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable; they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as the gtm site is part of the opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm.

There seem to be two potential fixes. Which is best for this case?

1) Use robots.txt to block Google from crawling the .gtm site

2) Canonicalize the gtm URLs to toptable.co.uk

In general, Google seems to recommend a canonical change, but in this special case it seems a robots.txt change could be best.

Thanks in advance to the SEOmoz community!

Dr-Pete

It's a little tricky. Andrea is right that robots.txt is not great for removal once pages/domains are indexed, but you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution; it's a delicate procedure.
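Since the disaster recovery site is an exact copy, both hostnames would normally serve the same robots.txt, so the block would have to be keyed on the hostname. Here is a minimal Python sketch of that idea, assuming the web server can see the request's Host header. The function names and constants are hypothetical and only the two hostnames come from the question.

```python
from urllib.robotparser import RobotFileParser

# Hostnames taken from the question; everything else is illustrative.
PRIMARY_HOST = "toptable.co.uk"
DR_HOST = "uk-www.gtm.opentable.com"


def robots_txt_for_host(host: str) -> str:
    """Return the robots.txt body to serve for the given Host header."""
    if host.lower() == DR_HOST:
        # Block all crawling of the disaster recovery hostname.
        return "User-agent: *\nDisallow: /\n"
    # Any other hostname (including the primary domain) stays crawlable.
    return "User-agent: *\nDisallow:\n"


def is_crawlable(host: str, path: str, agent: str = "Googlebot") -> bool:
    """Check the generated rules with the standard-library robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for_host(host).splitlines())
    return parser.can_fetch(agent, f"http://{host}{path}")
```

Keying on the Host header matters for failover: when toptable.co.uk is CNAMEd to the .gtm site, requests should still arrive with the toptable.co.uk hostname, so the primary domain would remain crawlable during a disaster.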

Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to see those signals, so you'll need to keep the paths open (that is, not blocked by robots.txt).
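To illustrate what 1-to-1 means here, this rough Python sketch maps each URL on the disaster recovery host to the same path and query string on the primary domain. The result could be used either as the canonical link target or as the destination of a 301 redirect. The helper names are hypothetical, and it assumes the two sites really do share identical paths, as the question describes.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hostnames taken from the question; everything else is illustrative.
PRIMARY_HOST = "toptable.co.uk"
DR_HOST = "uk-www.gtm.opentable.com"


def primary_url(url: str) -> str:
    """Map a URL on the disaster recovery host to the same page on the primary."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != DR_HOST:
        return url  # Not a disaster recovery URL, so leave it untouched.
    # Swap only the hostname; scheme, path, query and fragment are preserved.
    return urlunsplit(
        (parts.scheme, PRIMARY_HOST, parts.path, parts.query, parts.fragment)
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page at `url`."""
    href = escape(primary_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

For example, `primary_url("http://uk-www.gtm.opentable.com/some/path?page=2")` gives `http://toptable.co.uk/some/path?page=2` (the path is made up for illustration).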

josh-riley

First, if the pages are already indexed, then a robots.txt block won't make them go away. A noindex meta tag on the pages is the better solution. This allows search engines to "read" your page, see the noindex tag, and then work to remove the pages from the index. A robots.txt block doesn't necessarily accomplish the same result.
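As a sketch of how the noindex signal could be limited to the backup hostname only, the Python below returns either the meta tag or the equivalent `X-Robots-Tag` response header depending on the request's Host header. The function names are hypothetical, and how they would plug into the actual site templates or server is an assumption.

```python
# Hostname taken from the question; everything else is illustrative.
DR_HOST = "uk-www.gtm.opentable.com"

NOINDEX_META = '<meta name="robots" content="noindex">'


def robots_meta_for_host(host: str) -> str:
    """Return the tag to place in <head>, or '' for hosts that should stay indexed."""
    return NOINDEX_META if host.lower() == DR_HOST else ""


def extra_headers_for_host(host: str) -> dict[str, str]:
    """Return the same signal as an HTTP response header instead of a meta tag."""
    return {"X-Robots-Tag": "noindex"} if host.lower() == DR_HOST else {}
```

Either form only works if the crawler can still fetch the pages, so it should not be combined with a robots.txt block on the same hostname.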

OlegKorneitchouk

If you can do a 1-to-1 page canonicalization (each page on the .gtm site is canonicalized to the equivalent page on toptable.co.uk), then I would do that.

Otherwise, I would noindex the backup site.
