Removing a site from Google's index
-
We have a site we'd like to have pulled from Google's index. Back in late June, we disallowed robot access to the site through the robots.txt file and added a robots meta tag with "noindex, nofollow" directives. The expectation was that Google would eventually crawl the site and remove it from the index in response to those tags. The problem is that Google hasn't come back to crawl the site since late May. Is there a way to speed up this process and communicate to Google that we want the entire site out of the index, or do we just have to wait until it's eventually crawled again?
-
OK. That was not abundantly clear on first reading. Thank you for your help.
-
Thank you for pointing that out, Arlene. I do see it now.
The statement before that line is of key importance for an accurate quote. "If you own the site, you can verify your ownership in Webmaster Tools and use the verified URL removal tool to remove an entire directory from Google's search results."
It could be worded better, but what they are saying is that AFTER your site has already been removed from Google's index via the URL removal tool, THEN you can block it with robots.txt. The URL removal tool will remove the pages and keep them out of the index for 90 days. That's when changing the robots.txt file can help.
-
"Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site)."
The above is a quote from the page. You have to expand the section I referenced in my last comment. I'm just re-posting Google's own words.
-
I thought you were offering a quote from the page. It seems that is your summarization. I apologize for my misunderstanding.
I can see how you could reach that conclusion, but it is not accurate. Robots.txt does not ensure a page won't get indexed. I always recommend using the noindex tag, which should be 100% effective for the major search engines.
-
Go here: http://www.google.com/support/webmasters/bin/answer.py?answer=164734
Then expand the option further down the page that says: "I want to remove an entire site or the contents of a directory from search results"
They basically instruct you to block all robots in the robots.txt file, then request removal of your site. Once it's removed, the robots file will keep it from getting back into the index. They also recommend putting a "noindex" meta tag on each page to ensure nothing will get picked up. I think we have it taken care of at this point. We'll see.
-
Arlene, I checked the link you offered but I could not locate the quote you offered anywhere on the page. I am sure it is referring to a different context. Using robots.txt as a blocking tool is fine BEFORE a site or page is indexed, but not after.
-
I used the removal tool and just entered a "/" which put in a request to have everything in all of my site's directories pulled from the index. And I have left "noindex" tags in place on every page. Hopefully this will get it done.
Thanks for your comments, guys!
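For anyone who wants to double-check that the "noindex" tag really is present on a page, here is a minimal sketch using Python's standard html.parser module. The helper name has_noindex and the sample HTML strings are made up for illustration, and it only looks at the robots meta tag in the HTML it is given; it does not fetch pages or tell you anything about what Google has actually indexed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow"
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noindex(html: str) -> bool:
    """Hypothetical helper: True if the HTML carries a robots meta noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Sample pages (assumed markup, for illustration only)
PAGE_WITH_TAG = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
PAGE_WITHOUT_TAG = "<html><head><title>Home</title></head></html>"
PAGE_MISSPELLED = '<html><head><meta name="robots" content="no index, no follow"></head></html>'

print(has_noindex(PAGE_WITH_TAG))
print(has_noindex(PAGE_WITHOUT_TAG))
print(has_noindex(PAGE_MISSPELLED))
```

Note that the directive has to be the single word "noindex"; the third sample, with a space in it, is not recognized by this check.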
-
We blocked robots from accessing the site because Google told us to. This is straight from the webmaster tools help section:
Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site).
-
I have Webmaster Tools set up, but I don't see an option to remove the whole site. There is a URL removal tool, but there are over 700 pages I want pulled out of the index. Is there an option in Webmaster Tools to have the whole site pulled from the index?
-
Actually, since you have access to the site, you can leave the robots.txt set to disallow, provided you go into Google Webmaster Tools, verify your site, and request removal of your entire site. Let me know if you'd like a link with more information on this. Verification will involve adding an HTML file or meta tag to your site to prove you have ownership.
-
Thank you. Didn't realize we were shooting ourselves in the foot.
-
Hi Arlene.
The problem is that by blocking the site with robots.txt, you are preventing Google from re-crawling it, so Google cannot see the noindex tag. If you have properly placed the noindex tag on all the pages of your site, modify your robots.txt file to allow Google to crawl the site. Once that happens, Google will begin crawling your site and will then be able to deindex your pages.
The only other suggestion is to submit a sitemap and/or remove the "nofollow" tag. With the nofollow tag on all your pages, Google may visit your site a single page at a time, since you are telling the crawler not to follow any links it finds. You are blocking its normal discovery of your site.
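To illustrate why the robots.txt block gets in the way, here is a small sketch using Python's standard urllib.robotparser module. It only approximates how a well-behaved crawler reads robots.txt rules (it is not Google's actual parser), and the function name, the two robots.txt bodies, and the example.com URL are placeholders I made up for the example.

```python
from urllib.robotparser import RobotFileParser


def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Hypothetical helper: would a compliant crawler identifying as Googlebot fetch this URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Blocks every crawler from the whole site (the setup described in the question)
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# An empty Disallow value places no restriction on crawling
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Placeholder URL, for illustration only
PAGE = "http://www.example.com/any-page.html"

print(googlebot_can_fetch(BLOCK_ALL, PAGE))  # False: the page is never fetched
print(googlebot_can_fetch(ALLOW_ALL, PAGE))  # True: the page can be fetched
```

With the first file in place, the crawler never requests the page at all, so whatever meta tags the page contains go unread. With the second, the page can be fetched and the noindex tag on it can be seen.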