How can I get unimportant pages out of Google?
-
Hi Guys,
I have a (newbie) question. Until recently I didn't have my robots.txt set up properly, so Google indexed around 1,900 pages of my site, but only 380 of those are real pages; the rest are all /tag/ or /comment/ pages from my blog. I have now set up the sitemap and robots.txt properly, but how can I get the other pages out of Google? Is there a trick, or will it just take a little time for Google to drop the pages?
Thanks!
Ramon
-
If you want to remove an entire directory, you can exclude that directory in robots.txt, then go to Google Webmaster Tools and request a URL removal. You'll have an option to remove an entire directory there.
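As a hedged sketch of the robots.txt exclusion described above (the directory names are taken from the question's example, not from a real file), blocking the blog's tag and comment directories might look like:

```
User-agent: *
Disallow: /tag/
Disallow: /comment/
```

With a rule like this in place, Google Webmaster Tools' URL removal feature offers an option to remove the whole blocked directory rather than one URL at a time.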
-
No, sorry. What I meant is: if you mark the folder as Disallow in robots.txt, that alone will not remove pages that are already indexed.
The meta tag works differently: when the spiders crawl a page again and see the noindex tag, they will remove it from the index.
So don't add the directory to robots.txt before the pages have been removed from the search engines.
First, put the noindex tag on all the pages you want removed. It takes anywhere from a week to a month for them to drop out. After they are removed, add the folders you don't want indexed to your robots.txt.
After that, you don't need to worry about the tags.
I say this because if you add the block to robots.txt first, the search engine stops reading those pages, so it will never see the meta noindex tag. That's why you must first remove the pages with the noindex tag and only then add the block to robots.txt.
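A minimal sketch of step one of the sequence described above (the comment about placement is an assumption; the tag itself is standard):

```html
<!-- Step 1: add to the <head> of every page you want removed from the index -->
<meta name="robots" content="noindex">
```

Only after these pages have dropped out of the index would you move to step two and add the `Disallow` rules for those folders to robots.txt.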
Hope this has helped.
João Vargas
-
Thanks Vargas. If I choose noindex, I should remove the disallow rule from robots.txt first, right?
I understood that if a page has a noindex tag and is also disallowed in robots.txt, the SE will keep it indexed, because it can never see the tag. Is that true?
-
To remove the pages you want, you need to add this tag:
<meta name="robots" content="noindex">
If you want internal and external link relevance to keep passing through these pages, use:
<meta name="robots" content="noindex, follow">
If you also block the directory in robots.txt, you only need the tag on the current URLs; search engines will not index new ones.
Personally, I don't like using the Google URL remover, because if someday you want those folders indexed again, they won't be. At least, that has happened to me.
The noindex tag works very well for removing unwanted content; within a month or so the pages will be removed.
-
Yes. It's only a secondary level aid, and not guaranteed, yet it could help speed up the process of devaluing those pages in Google's internal system. If the system sees those, and cross-references to the robots.txt file it could help.
-
Thanks guys for your answers....
Alan, do you mean that I should place the tag below on all the pages that I want out of Google?
-
I agree with Alan's reply. Try the canonical tag first. If you don't see any change, remove the URLs in GWT.
-
There's no bulk page removal form, so you'd need to submit every URL one at a time, and even then it's not a guaranteed way. You could consider getting a canonical tag on those specific pages that points to a different URL on your blog, such as an appropriate category page or the blog home page. That could help speed things up, but canonical tags themselves are only "hints" to Google.
Ultimately it's a time and patience thing.
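As a hedged sketch of the canonical approach suggested above (the domain and target URL are made up for illustration), a tag page pointing at its category page might carry:

```html
<!-- placed in the <head> of the /tag/ page; the target URL is hypothetical -->
<link rel="canonical" href="https://www.example.com/blog/category-page/">
```

Since canonical tags are only hints, Google may still choose to keep the original URL indexed; this just nudges it toward consolidating on the target page.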
-
It will take time, but you can help it along by using the url removal tool in Google Webmaster Tools. https://www.google.com/webmasters/tools/removals