Temporarily suspend Googlebot without blocking users

lzhao

We'll soon be launching a redesign, on a new platform, migrating millions of pages to new URLs.

How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture.

GWT's recommendation is to 503 all pages - including robots.txt, but that also makes the site invisible to real site visitors, resulting in significant business loss. Bad answer.

I've heard some recommendations to disallow all user agents in robots.txt. Any answer that puts the millions of pages we already have indexed at risk is also a bad answer.
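For reference, this is what the "disallow all" suggestion amounts to. It is a minimal sketch that uses Python's standard `urllib.robotparser` to show how a compliant crawler would read such a file. The `can_crawl` helper and the example.com URL are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay away from every path.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that places no restrictions (empty Disallow value).
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/any/page"  # placeholder URL
    print(can_crawl(DISALLOW_ALL, "Googlebot", url))
    print(can_crawl(ALLOW_ALL, "Googlebot", url))
```

Note that a disallow rule only stops crawling. A blocked crawler cannot see redirects or noindex tags on those pages, which is part of why this option worries me for URLs that are already indexed.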

Thanks

lzhao

So it seems like we've gone full circle.

The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture."

Sounds like the answer is, 'that's not possible'.

Igal_Zeifman

Putting a noindex/nofollow on an indexed URL will remove it from SERPs. Some URLs will still show for direct searches (using the URL itself as the keyword), but even then they will appear as bare links without any title or description details.

Using a 301 redirect will remove the old page from index, regardless of noindex/nofollow.

If you use a noindex/nofollow on the new URL, neither the old URL nor the new one will show.

lzhao

Thank you, Ruth!

Can I ask a clarifying question?

If I put a noindex/nofollow on the new urls, wouldn't the result be the same as if I put noindex/nofollow on the indexed urls? There is only one instance of each page - and all of the millions of indexed URLs will be redirecting to new urls.

Here is my assumption: if I put noindex/nofollow on the new urls - a search bot will crawl the old url, follow the redirect to the new url, detect the noindex/nofollow, and then drop the old, indexed url from their index. Is that the wrong assumption?

RuthBurrReedy

I would use robots.txt to noindex the whole website as well - but just the new pages, not the old ones. Then when you're ready to be crawled, remove the robots.txt entry and Fetch as Googlebot to get re-crawled. You may fall out of the index for a day or two but should quickly be re-indexed.

Another solution would be to use the meta robots tag to noindex each page individually (if there's a way to do that in your CMS; obviously adding the tags by hand wouldn't be scalable), and then remove the tags when you're ready to be crawled. That may increase your chances of getting re-crawled and re-indexed sooner.
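If you go the meta robots route, it helps to spot-check that the tag is actually present before launch and actually gone afterwards. Below is a rough sketch using Python's standard `html.parser`. The `robots_directives` helper and the sample markup are assumptions for illustration, not anything your CMS provides, and it only looks at the `<meta name="robots">` tag, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # content looks like "noindex, nofollow"; split it into single directives
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page as it might look while the new URLs are held back.
SAMPLE_PAGE = """
<html>
  <head>
    <title>New page</title>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>Content</body>
</html>
"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

You could run something like this over a sample of rendered pages from the new platform, once with the tags on and again after removing them.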

lzhao

Thanks for the response, Mark.

It sounds as if you tried this on a few new pages.

I'm talking about millions of existing pages.

Would you robots.txt noindex your entire website? Seems like you'd run a huge risk of being dumped from the index entirely.

Intergen

I recommend a robots.txt noindex, nofollow.

That way people can still see the pages; they just aren't indexed in Google yet.

As we developed some new pages on one of our sites, we did this. We could still view the pages and send them to the people we wanted feedback from, but no one else knew they were there.
