CDN Being Crawled and Indexed by Google

Scott-Thomas

I'm doing an SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. Two subdomains from the CDN are being crawled and indexed, and a small number of organic search visitors have arrived through them. So in a small number of cases, the CDN-hosted content is outranking the root domain.

It's a huge duplicate content issue (tens of thousands of URLs being crawled). What's the best way to prevent the crawling and indexing of a CDN like this? Exclude it via robots.txt?

Additionally, the use of relative canonical tags (instead of absolute ones) appears to be contributing to the problem. As I understand it, these canonical tags tell the search engines that each subdomain is the "home" of the content/URL.
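To illustrate the canonical point, here is a minimal Python sketch of how a relative canonical href resolves against whichever host served the page. The hostnames `www.example.com` and `cdn1.example.com` and the helper `resolve_canonical` are hypothetical, used only for illustration; the resolution itself is standard URL joining.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url: str, canonical_href: str) -> str:
    """Resolve a canonical href the way a crawler resolves any link:
    relative to the URL the page was fetched from."""
    return urljoin(page_url, canonical_href)


# Hypothetical URLs: the same page served from the origin and from a CDN subdomain.
origin_page = "https://www.example.com/products/widget"
cdn_page = "https://cdn1.example.com/products/widget"

relative_href = "/products/widget"
absolute_href = "https://www.example.com/products/widget"

# A relative canonical follows the host that served the page,
# so the CDN copy ends up declaring itself canonical.
relative_on_origin = resolve_canonical(origin_page, relative_href)
relative_on_cdn = resolve_canonical(cdn_page, relative_href)

# An absolute canonical always points at the origin, whichever host served it.
absolute_on_cdn = resolve_canonical(cdn_page, absolute_href)

print(relative_on_origin)
print(relative_on_cdn)
print(absolute_on_cdn)
```

With the relative href, the copy on the CDN host resolves to a CDN URL, which matches the behaviour described above; with the absolute href, both copies point at the root domain.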

Thanks!

Scott

irvingw

It sounds like you have a handle on the problem. Here is what I would do:

1. Verify the subdomains in Google Webmaster Tools (WMT).

2. Block the CDN subdomains with robots.txt.

3. Request site removal in WMT for the subdomains.

4. Make the canonicals absolute.

Keep the blocked subdomains in WMT. When you log in, you will see a message next to the subdomains saying "Critical issue with your site", which is just telling you that the site is blocked. I like to keep them in there so I can see they're still blocked.
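As a rough sketch of the robots.txt step, the Python below checks that a blanket-disallow file blocks Googlebot on a CDN host while the origin's file still allows crawling. It assumes the CDN subdomains can serve their own robots.txt, separate from the root domain's; the hostnames, file contents, and the helper `is_blocked` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of the robots.txt served only on the CDN subdomains.
CDN_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Assumed contents of the robots.txt on the root domain (nothing disallowed).
ORIGIN_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


cdn_blocked = is_blocked(CDN_ROBOTS_TXT, "https://cdn1.example.com/products/widget")
origin_blocked = is_blocked(ORIGIN_ROBOTS_TXT, "https://www.example.com/products/widget")

print("CDN blocked:", cdn_blocked)
print("Origin blocked:", origin_blocked)
```

The same check could be pointed at the live files with `RobotFileParser.set_url` and `read` to confirm, alongside the WMT message, that the subdomains are still blocked.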
