Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
-
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google.
Our developer has told us that these urls are created by a module and are not "real" pages in the CMS.
They would like to add the following to our robots.txt file
Disallow: /catalog/product/gallery/
QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index?
We don't want these pages to be found.
-
That's why I mentioned: "eventually". But thanks for the added information. Hopefully it's clear now for the original poster.
-
Looking at this video - https://www.youtube.com/watch?v=KBdEwpRQRD0&feature=youtu.be Matt Cutts advises to use the noindex tag on every individual page. However, this is very time consuming if you're dealing wit a large volume of pages.
The other option he recommends is to use the robots.txt file as well as the URL removal tool in GWMT, Although this is the second choice option, it does seem easier for us to implement than the noindex tag.
-
Hi,
Yes, if you put any url in the robots.txt it will not be shown in the search results after some time even if your pages were already indexed. Because when your disallow urls in the robots.txt , Google will stop crawling that page and eventually will stop indexing those pages.
-
Hi Nico
Great response thanks.
This is certainly something I'm taking into consideration and will question my developer about this.
-
Thanks Thomas.
I'm now finding out from my developer is we are able to noindex these pages with the meta robots.
If this is something that isn't possible, it's likely that we'll add to the robots.txt as you did.
Either way I think will be progress to different degrees.
-
I don' think Martijn's statement is quite correct as I have made different experiences in an accidental experiment. Crawling is not the same as indexing. Google will put pages it cannot crawl into the index ... and they will stay there unless removed somehow. They will probably only show up for specific searches, though
Completely agree, I have done the same for a website I am doing work with, ideally we would noindex with meta robots however that isn't possible. So instead we added to the robots.txt, the number of indexed pages have dropped, yet when you search exactly it just says the description can't be reached.
So I was happy with the results as they're now not ranking for the terms they were.
-
I don' think Martijn's statement is quite correct as I have made different experiences in an accidental experiment. Crawling is not the same as indexing. Google will put pages it cannot crawl into the index ... and they will stay there unless removed somehow. They will probably only show up for specific searches, though
In September 2015 I catapulted a website from ~3.000 to 130.000 indexed pages (roughly). 127.000 were essentially canonicalised duplicates (yes, it did make sense) but also blocked by robots.txt - but put into the index nonetheless. The problem was a dynamically generated parameter, always different, always blocked by robots.
The title was equal to the link text; the description became "A description for this result is not available because of this site's robots.txt – learn more." (If Google cannot crawl a URL Google will usually take titles from links pointing to that URL). No sign of disappearing. In fact, Google was happy to add more and more to its index ...
At the start of December 2015 I removed the robots.txt block - Google could now read the canonicals or noindex on the URLs ... the pages only began dropping out, slowly and in bunches of a few thousand in March 2016 - probably due to the very low relevancy and crawl budget assigned to them. Right now there are still about 24.000 pages in the index.
So my answer would be: No - disabling crawling in the robots.txt will NOT remove a page from the index. For that you need to noindex them (which sometimes also works if done in robots.txt, I've heard). Disallowing URLs in the robots.txt will very likely drop pages to the end of useful results, though, as Andy described. (I don't know if this has any influence on the general evaluation of the site as a whole; I'd guess not.)
Regards
Nico
-
Thanks Martijn. This is what I was assuming would happen. However, I got a confusing message from my developer which said the following,
"won't remove the URL's from the index but it will mean that they will only show up for very specific searches that customers are extremely unlikely to use. It will also increase Asgard's crawl budget as Google and Bing won't try to crawl these URLs. Would you be happy with this solution?"
I would tend to still agree with your statement though.
-
Yes they will be eventually. As you disallow Google to crawl the URLs it will probably start hiding the descriptions for some of these image pages soon as they can't crawl them anymore. Then at some point they'll stop looking at them at all.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Search Console - Indexed Pages
I am performing a site audit and looking at the "Index Status Report" in GSC. This shows a total of 17 URLs have been indexed. However when I look at the Sitemap report in GSC it shows 9,000 pages indexed. Also, when I perform a site: search on Google I get 24,000 results. Can anyone help me to explain these anomalies?
Intermediate & Advanced SEO | | richdan0 -
How to make Google index your site? (Blocked with robots.txt for a long time)
The problem is the for the long time we had a website m.imones.lt but it was blocked with robots.txt.
Intermediate & Advanced SEO | | FCRMediaLietuva
But after a long time we want Google to index it. We unblocked it 1 week or 8 days ago. But Google still does not recognize it. I type site:m.imones.lt and it says it is still blocked with robots.txt What should be the process to make Google crawl this mobile version faster? Thanks!0 -
Will adding 1000's of outbound links to just a few website impact rankings?
I manage a large website that hosts 1000's of business listings that comprise an area that covers 7 state counties. Currently a category page (such as lodging) hosts a group of listings which then link to it's own page. From these pages links are present directly to the business it represents. The client is proposing that we change all listings to link to the representative county website and remove the individual pages. This essentially would create 1000's of external links to 7 different websites and remove 1000's of pages from our site.
Intermediate & Advanced SEO | | Your_Workshop
Does anyone have thoughts on how adding 1000's of links (potentially upwards of 3000) to only 7 websites (that I would deem relevant links) would affect SEO? I know if 1000's of links are added pointing to 1000's of websites the site can be considered a link farm, but I can't find any info online that speaks of a case like this.0 -
Google’s Hummingbird and Keyword Cannibalization
My client wants to have keywords added on every product with the product name , apparently some seo guru told him that hummingbird is all about key phrases and long tail keywords. As i know hummingbird lends to understand the intent and contextual meaning of the query. The issue is if I add the keywords on for e.g oak furniture on all of my product title,And we are using zen-cart platform and it will change the internal anchor text on the product listing page. It will cause a Cannibalization issue. Question1. I just need help to reply to client that adding keyword can cause detrimental to ranking. Question 2. If i am wrong then do we need to re optimise the site. I have read http://moz.com/blog/how-to-solve-keyword-cannibalization Many thanks.
Intermediate & Advanced SEO | | Adnan.Hassan.Khan0 -
Why is our page will not being found by google?
Hi, We have a page that went live nearly 2 months ago. https://www.invoicestudio.com/Secure/InvoiceTemplate Why does google not notice it. Both site: URL's return nothing. site:www.invoicestudio.com/Secure/InvoiceTemplate site:www.invoicestudio.com/Secure This is an important page for us and do not understand why google doesn't like it. Hope you can help Thanks Andrew
Intermediate & Advanced SEO | | Studio330 -
We're indexed in Google News, any tips or suggestions for getting traffic from news?
We have a news sitemap, and follow all best practices as outlined by Google for news. We are covering breaking stories at the same time as other publications, but have only made it to the front page of Google News once in the last few weeks. Does anyone have any tips, recommended reading, etc for how to get to the front page of Google News? Thanks!
Intermediate & Advanced SEO | | nicole.healthline0 -
No matter what I do, my website isn't showing up in search results. What's happening?
I've checked for meta-robots, all SEO tags are fixed, reindexed with google-- basically everything and it's not showing up. According to SEOMoz all looks fine, I am making a few fixes, but nothing terribly major. It's a new website, and i know it takes a while, but there is no movement here in a month. Any insights here?
Intermediate & Advanced SEO | | Wabash0 -
What would your Seo tactic's be for this
Hiya guys... Just a quicken, So my forum, talknightlife.co.uk is currently 10th on google for "nightlife forum" I have about 15 back links, 26 page autority. Now what i'm trying to do, which everyone else is doing, is trying to move it up a couple of spots maybe to 5th or something. What would your tactics be, I'm disregarding all the crap I read in the forums etc, you guys on here tend to have the best explanation. Let it rip 🙂 Cheers guys Luke.
Intermediate & Advanced SEO | | Lukescotty0