Removing a site from Google's index
-
We have a site we'd like to have pulled from Google's index. Back in late June, we disallowed robot access to the site through the robots.txt file and added a robots meta tag with "no index,no follow" commands. The expectation was that Google would eventually crawl the site and remove it from the index in response to those tags. The problem is that Google hasn't come back to crawl the site since late May. Is there a way to speed up this process and communicate to Google that we want the entire site out of the index, or do we just have to wait until it's eventually crawled again?
-
ok. Not abundantly clear upon first reading. Thank you for your help.
-
Thank you for pointing that out Arlene. I do see it now.
The statement before that line is of key importance for an accurate quote. "If you own the site, you can verify your ownership in Webmaster Tools and use the verified URL removal tool to remove an entire directory from Google's search results."
It could be worded better but what they are saying is AFTER your site has already been removed from Google's index via the URL removal tool THEN you can block it with robots.txt. The URL removal tool will remove the pages and keep them out of the index for 90 days. That's when changing the robots.txt file can help.
-
"Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site)."
The above is a quote from the page. You have to expand the section I referenced in my last comment. Just re-posting google's own words.
-
I thought you were offering a quote from the page. It seems that is your summarization. I apologize for my misunderstanding.
I can see how you can make that conclusion but it not accurate. Robots.txt does not ensure a page wont get indexed. I always recommend use of the noindex tag which should be 100% effective for the major search engines.
-
Go here: http://www.google.com/support/webmasters/bin/answer.py?answer=164734
Then expand the option down below that says: "<a class="zippy zippy-track zippy-collapse" name="RemoveDirectory">I want to remove an entire site or the contents of a directory from search results"</a>
They basically instruct you to block all robots in the robots.txt file, then request removal of your site. Once it's removed, the robots file will keep it from getting back into the index. They also recommend putting a "noindex" meta tag on each page to ensure nothing will get picked up. I think we have it taken care of at this point. We'll see
-
Arlene, I checked the link you offered but I could not locate the quote you offered anywhere on the page. I am sure it is referring to a different context. Using robots.txt as a blocking tool is fine BEFORE a site or page is indexed, but not after.
-
I used the removal tool and just entered a "/" which put in a request to have everything in all of my site's directories pulled from the index. And I have left "noindex" tags in place on every page. Hopefully this will get it done.
Thanks for your comments guys!
-
We blocked robots from accessing the site because Google told us to. This is straight from the webmaster tools help section:
Note: To ensure your directory or site is permanently removed, you should use robots.txt to block crawler access to the directory (or, if you’re removing a site, to your whole site).
-
I have webmaster tools setup, but I don't see an option to remove the whole site. There is a URL removal tool, but there are over 700 pages I want pulled out of the index. Is there an option in webmaster tools to have the whole site pulled from the index?
-
Actually, since you have access to the site, you can leave the robots.txt at disallowed -- if you go into Google Webmaster Tools, verify your site, and request removal of your entire site. Let me know if you'd like a link on this with more information. This will involve adding an html file or meta tag to your site to verify you have ownership.
-
Thank you. Didn't realize we were shooting ourselves in the foot.
-
Hi Arlene.
The problem is that when you blocked the site with robots.txt, you are preventing Google from re-crawling your site so they cannot see the noindex tag. If you have properly placed the noindex tag on all the pages in your site, then modify your robots.txt file to allow Google to see your site. Once that happens Google will begin crawling your site and then be able to deindex your pages.
The only other suggestion is to submit a sitemap and/or remove the "nofollow" tag. With the nofollow tag on all your pages, Google may visit your site for a single page at a time since you are telling the crawler not to follow any links it finds. You are blocking it's normal discovery of your site.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
These days on Google results, it also shows the site map. I submitted my company's sitemap and it still does not show?What am I doing wrong?
Look at the image in the link. I want my company to look like the "pluralsight" website in Google. I want it to show the sitemap. I have already submitted the sitemap to Google few days back, what am I doing wrong? search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&q=pluralsight&oq=pluralsight&aqs=chrome..69i57j0l5.11024j0j8
Technical SEO | | Deein0 -
Site Not Being Indexed
Hey Everyone - I have a site that is being treated strangely by google (at least strange to me) The site has 24 pages in the sitemap - submitted to WMT'S over 30 days ago I've manually triggered google to crawl the homepage and all connecting links as well and submitted a couple individually. Google has been parked the indexing at 14 of the 24 pages. None of the unindexed URL's have Noindex or follow tags on them - they are clearly and easily linked to from other places on the site. The site is a brand new domain, has no manual penalty history and in my research has no reason to be considered spammy. 100% unique handwritten content I cannot figure out why google isn't indexing these pages. Has anyone encountered this before? Know any solutions? Thanks in advance.
Technical SEO | | CRO_first0 -
How to remove the specific link from Google Listed Index?
I am working on SEO for an e-commerce client. When I search for brand name in the Google it displays the top link with tabular index of categories. Whereas I want to remove the category called Coffee from the tabular index because it redirecting to the Home page which is not relevant. For your ref. attached is the screenshot. pXjdaCH.png
Technical SEO | | mountain.penguine0 -
Micro-site homepage not being indexed
http://www.reebok.com/en-US/reebokonehome/ This is a homepage for an instructor network micro-site on Reebok.com The robots.txt file was excluding the /en-US/ directory, we've since removed that exclusion, and resubmitted this URL for indexing via Google Webmaster but we are still not seeing it in the index. Any advice would be very helpful, we may be missing some blocking issue or perhaps we just need to wait longer?
Technical SEO | | PatrickDugan0 -
Website is not indexed in Google
Hi Guys, I have a problem with a website from a customer. His website is not indexed in Google (except for the homepage). I could not find anything that can possibly be the cause. I already checked the robots.txt, sitemap, and plugins on the website. In the HTML code i also couldn't find anything which makes indexing harder than usual. This is the website i am talking about: http://www.xxxx.nl/ (Dutch) The only thing that i am guessing now is the Google sandbox, but even that is quite unlikely. I hope you guys discover something i could not find! Thanks in advance 🙂
Technical SEO | | B.Great0 -
Does a CMS inhibit a site's crawlability?
I smell baloney but I could use a little backup from the community! My client was recently told by an SEO that search engines have a hard time getting to their site because using a CMS (like WordPress) doesn't allow "direct access to the html". Here is what they emailed my client: "Word Press (like your site is built with) and other similar “do it yourself” web builder programs and websites are not good for search engine optimization since they do not allow direct access to the HTML. Direct HTML access is needed to input important items to enhance your websites search engine visibility, performance and creditability in order to gain higher search engine rankings." Bots are blind to CMSs and html is html, correct? What do you think about the information given by the other SEO?
Technical SEO | | Adpearance0 -
Can I format my H1 to be smaller than H2's and H3's on the same page?
I would like to create a web design with 12px H1 and for sub headings on the page to be more like 24px. Will search engines see this and dislike it? The reason for doing it is that I want to put a generic page title in the banner, and more poetic headings above the main body. Example: Small H1: Wholesale coffee, online coffee shop and London roastery Large h2: Respect the bean... Thanks
Technical SEO | | Crumpled_Dog
Scott0 -
404's and duplicate content.
I have real estate based websites that add new pages when new listings are added to the market and then deletes pages when the property is sold. My concern is that there are a significant amount of 404's created and the listing pages that are added are going to be the same as others in my market who use the same IDX provider. I can go with a different IDX provider that uses IFrame which doesn't create new pages but I used a IFrame before and my time on site was 3min w/ 2.5 pgs per visit and now it's 7.5 pg/visit with 6+min on the site. The new pages create new content daily so is fresh content and better on site metrics (with the 404's) better or less 404's, no dup content and shorter onsite metrics better? Any thoughts on this issue? Any advice would be appreciated
Technical SEO | | AnthonyLasVegas0