Temporarily suspend Googlebot without blocking users
-
We'll soon be launching a redesign, on a new platform, migrating millions of pages to new URLs.
How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture.
GWT's (Google Webmaster Tools) recommendation is to return a 503 for all pages, including robots.txt, but that also makes the site unavailable to real visitors, resulting in significant business loss. Bad answer.
I've heard some recommendations to disallow all user agents in robots.txt. Any answer that puts the millions of pages we already have indexed at risk is also a bad answer.
Thanks
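For anyone weighing the robots.txt route, here is a minimal Python sketch of what a blanket disallow tells a compliant crawler. It uses the standard library's urllib.robotparser; the example.com URL and the can_crawl helper name are made up for illustration. It only answers "may this user agent fetch this URL?" and says nothing about what happens to pages that are already indexed.

```python
from urllib.robotparser import RobotFileParser

# The "disallow all user agents" robots.txt mentioned in the question.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL, not taken from the thread.
EXAMPLE_URL = "https://www.example.com/any/page"
print(can_crawl(BLOCK_ALL, "Googlebot", EXAMPLE_URL))
```

Swapping BLOCK_ALL for the robots.txt you actually plan to ship is a quick way to check which paths stay open before the launch.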
-
So it seems like we've gone full circle.
The initial question was, "How can I tell Google (and other crawlers) to temporarily (a day or two) ignore my site? We're hoping to buy ourselves a small bit of time to verify redirects and live functionality before allowing Google to crawl and index the new architecture."
Sounds like the answer is, 'that's not possible'.
-
Putting a noindex/nofollow on an indexed URL will remove it from the SERPs, although some URLs will still show for a direct search (using the URL itself as the keyword). Even then they will appear as bare links without any title/description details.
Using a 301 redirect will remove the old page from the index, regardless of noindex/nofollow.
If you also put a noindex/nofollow on the new URL, neither the old nor the new URL will show.
-
Thank you, Ruth!
Can I ask a clarifying question?
If I put a noindex/nofollow on the new URLs, wouldn't the result be the same as if I put noindex/nofollow on the indexed URLs? There is only one instance of each page, and all of the millions of indexed URLs will be redirecting to new URLs.
Here is my assumption: if I put noindex/nofollow on the new URLs, a search bot will crawl the old URL, follow the redirect to the new URL, detect the noindex/nofollow, and then drop the old, indexed URL from its index. Is that the wrong assumption?
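To make that chain concrete, here is a rough Python sketch of the check a crawler would effectively perform: follow the redirect from the old URL, then look for a robots meta tag on the page it lands on. It runs against a hypothetical in-memory SITE dictionary rather than a live server, and the paths, the resolve function, and the max_hops limit are all assumptions for illustration. It does not claim to reproduce how Google actually treats the old URL; it only shows what would be visible at the end of the chain.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Set, Tuple

# Hypothetical response record: (status code, redirect target or None, HTML body).
Response = Tuple[int, Optional[str], str]


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def resolve(url: str, site: Dict[str, Response], max_hops: int = 5):
    """Follow 301/302 hops and return (final_url, hops, has_noindex)."""
    hops = 0
    while True:
        status, location, body = site[url]
        if status in (301, 302) and location is not None:
            hops += 1
            if hops > max_hops:
                raise RuntimeError("too many redirects starting near " + url)
            url = location
            continue
        parser = RobotsMetaParser()
        parser.feed(body)
        return url, hops, "noindex" in parser.directives


# Made-up example: one old URL redirecting to one new, noindexed URL.
SITE: Dict[str, Response] = {
    "/old-page": (301, "/new-page", ""),
    "/new-page": (200, None,
                  '<html><head><meta name="robots" content="noindex, nofollow">'
                  "</head><body>New page</body></html>"),
}

print(resolve("/old-page", SITE))
```

The same loop, pointed at real HTTP responses instead of the dictionary, could double as the redirect verification pass mentioned in the original question.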
-
I would use robots.txt to noindex the website as well, but just the new pages, not the old ones. Then, when you're ready to be crawled, remove the robots.txt entry and use Fetch as Googlebot to get re-crawled. You may fall out of the index for a day or two but should quickly be re-indexed.
Another solution would be to use the meta robots tag to noindex each page individually (if there's a way to do that in your CMS; obviously adding the tags by hand wouldn't be scalable), and then remove the tags when you're ready. That may increase your chances of getting re-crawled and re-indexed sooner.
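As a sketch of how the per-page meta robots tag could be driven from one place instead of edited by hand, here is a small Python example. Everything in it is hypothetical: the LAUNCH_HOLD switch, the robots_meta_tag and render_head functions, and the idea that your CMS templates can call a helper like this. The only fixed piece is the tag itself.

```python
from html import escape

# Hypothetical site-wide switch: set to False once the new URLs may be indexed.
LAUNCH_HOLD = True


def robots_meta_tag(hold: bool = LAUNCH_HOLD) -> str:
    """Return the meta robots tag while the hold is on, otherwise an empty string."""
    if not hold:
        return ""
    return '<meta name="robots" content="noindex, nofollow">'


def render_head(title: str, hold: bool = LAUNCH_HOLD) -> str:
    """Build a minimal <head> that includes the tag only during the hold."""
    parts = ["<title>" + escape(title) + "</title>"]
    tag = robots_meta_tag(hold)
    if tag:
        parts.append(tag)
    return "<head>" + "".join(parts) + "</head>"


print(render_head("New product page", hold=True))
print(render_head("New product page", hold=False))
```

With a single flag like this, removing the tag from millions of pages is one configuration change rather than a per-page edit.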
-
Thanks for the response, Mark.
It sounds as if you tried this on a few new pages.
I'm talking about millions of existing pages.
Would you noindex your entire website via robots.txt? Seems like you'd run a huge risk of being dumped from the index entirely.
-
I recommend a robots.txt noindex, nofollow.
That way people can still see the pages; they just aren't indexed in Google yet.
We did this while developing some new pages on one of our sites. We could still view the pages and send them to the people we wanted feedback from, but no one else knew they were there.