How to get a large number of urls out of Google's Index when there are no pages to noindex tag?
-
Hi,
I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.
The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. Googlebot would crawl a link that looks like it points to the client's own domain and wind up on Amazon or somewhere else with some affiliate code. Googlebot would then take the original link on the client's domain and index it, even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.
I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.
Any ideas? Thanks! Best... Michael
P.S.,
All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.
-
If no HTML pages actually exist for the URLs you want to remove from Google's index, I'd probably use the HTTP header instead, specifically the X-Robots-Tag directive:
- https://yoast.com/x-robots-tag-play/
- https://www.searchenginejournal.com/x-robots-tag-simple-alternate-robots-txt-meta-tag/67138/
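Even if you have no HTML page, you can always send an HTTP header. HTTP headers simply let the client and server pass extra information along with each request (GET, POST, etc.) and each response. X-Robots-Tag is a response header, so it can be attached to any response your server sends, HTML or not.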
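This is the only thing I can think of that would really help. Some people might suggest robots.txt wildcards, but robots.txt controls crawling, not indexation, so those answers wouldn't really be worth anything to you. Worse, if Googlebot is blocked from crawling the /item/ URLs, it never fetches them and so never sees any noindex header you add.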
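To make the crawling-vs-indexing point concrete, here is a small sketch using Python's standard-library `urllib.robotparser`. That parser is not Google's own, so treat this as an illustration under the assumption that both treat a simple prefix rule the same way. The rule and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule someone might propose for the /item/ URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /item/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

item_url = "https://exmpledomain.com/item/some-product"  # hypothetical /item/ URL
other_url = "https://exmpledomain.com/about"             # hypothetical normal page

# The rule only blocks *crawling*: a compliant crawler never requests the URL,
# so it never sees a noindex header or a 410 status served for it.
print(parser.can_fetch("Googlebot", item_url))   # → False
print(parser.can_fetch("Googlebot", other_url))  # → True
```

In other words, a Disallow rule would hide exactly the signals (header or status code) that tell Google to drop the URLs.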
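The other thing you could do (maybe combined with the X-Robots stuff) is to make all of those URLs serve status code 410 (Gone) instead of 404 (Not Found). A 410 says the resource has been intentionally and permanently removed, whereas a 404 only says nothing is there right now, without saying whether that is permanent.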
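The two suggestions in this answer (an X-Robots-Tag header, and a 410 status for the /item/ URLs) can be sketched as a small Python WSGI app. This is a minimal sketch, not the client's actual setup: it assumes every unwanted URL starts with /item/ (as the question says), that the existing forwarding is a plain redirect, and `affiliate_target_for` is a hypothetical placeholder for whatever lookup the site really does. Whether Google acts on a noindex attached to a redirect response (Option A) isn't confirmed here, so treat that branch as an experiment; Option B is the 410-plus-header combination.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import quote

# Assumption from the question: every unwanted URL lives under /item/.
ITEM_PREFIX = "/item/"


def affiliate_target_for(path: str) -> str:
    """Hypothetical stand-in for the site's real affiliate lookup."""
    return "https://affiliate.example/out?src=" + quote(path, safe="")


def make_item_cleanup_app(keep_affiliate_redirect: bool):
    """Build a WSGI app that handles /item/ URLs in one of two ways.

    keep_affiliate_redirect=True  -> keep the redirect, add X-Robots-Tag: noindex
    keep_affiliate_redirect=False -> answer 410 Gone, plus X-Robots-Tag: noindex
    """

    def app(environ: Dict[str, str], start_response: Callable[..., None]) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")
        if path.startswith(ITEM_PREFIX):
            if keep_affiliate_redirect:
                # Option A: affiliate links keep working; the header asks
                # search engines not to index this URL.
                start_response("302 Found", [
                    ("Location", affiliate_target_for(path)),
                    ("X-Robots-Tag", "noindex"),
                    ("Content-Length", "0"),
                ])
                return [b""]
            # Option B: tell crawlers the URL is permanently gone.
            body = b"410 Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("X-Robots-Tag", "noindex"),
            ])
            return [body]
        # Everything outside /item/ is served normally (placeholder body).
        body = b"regular page\n"
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def call_app(app, path: str) -> Tuple[str, Dict[str, str], bytes]:
    """Invoke a WSGI app directly, without a server, and collect its response."""
    captured: Dict[str, object] = {}

    def start_response(status: str, headers: List[Tuple[str, str]], exc_info=None) -> None:
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


gone_app = make_item_cleanup_app(keep_affiliate_redirect=False)
status, headers, _ = call_app(gone_app, "/item/some-product")
print(status, headers["X-Robots-Tag"])  # 410 Gone noindex
```

Because one prefix check covers all 150K URLs, there is no per-URL work. For a quick local check, the same app can be served with `wsgiref.simple_server.make_server` and inspected with any HTTP client.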
Related Questions
eCommerce Replatforming URLs
We are in the process of re-platforming our eCommerce site to Magento 2. For the most part, the majority of site content will remain the same. Unfortunately, on our current platform we have been inconsistent with the use of .html as a URL suffix. As a result, our category and product pages are half and half: /stainless-steel-hardware.html & /stainless-steel-hardware. We are considering taking the opportunity to clean up and standardize our URLs (drop the .html from all URLs on the new site and 301 redirect these to the same URL without the .html). Our concern is that many of the .html pages are good categories with strong PageRank, and I've read many articles about PageRank loss from 301 redirects. We are debating internally whether it really makes sense to take an SEO hit for something as seemingly small as dropping the .html from the URL. It would be a no-brainer if we were taking the opportunity to change to more SEO-friendly, natural-language URLs. However, our current URLs appear acceptable apart from the inconsistent suffix. Thanks in advance for any insight on how you would approach this!
Intermediate & Advanced SEO | BoatOutfitters
301s: Do we keep the old sitemap to assist Google with this?
Hello Mozzers, we have restructured our site and have done many 301 redirects to our new URL structure. I have seen that one of my competitors has done something similar, but they have kept the old sitemap, I guess to assist Google with their 301s as well. At present we only have our new sitemap active, but am I missing a trick by not having the old one there as well to assist Google with the 301s? Thanks, Pete
Intermediate & Advanced SEO | PeteC120
After reading of Google's so called "over-optimization" penalty, is there a penalty for changing title tags too frequently?
In other words, does title tag change frequency hurt SEO? After changing my title tags, I have noticed a steep decline in impressions, but an increase in CTR and rankings. I'd like to once again change the title tags to try and regain impressions. Is there any penalty for changing title tags too often? From SEO forums online, there seems to be a bit of confusion on this subject...
Intermediate & Advanced SEO | Felix_LLC
What is a "Bad Link" in Google's eyes? Low DA?
Hi there, I'm going through my link profile and I noticed I have a few links that are from <10 DA sites. One has a DA of 6. Should I remove these? Aside from any referral traffic I receive from these links (I know there is none), are these links hurting me? What should I look out for in a site I may guest post on? Thanks! Travis
Intermediate & Advanced SEO | Travis-W
If I only Link to Page via Sitemap, can it still get indexed?
Hi there! I am creating a ton of content for specific geographies. Is it possible for these pages to get indexed if I only put them in my sitemap and don't link to them through my actual site (though the pages will be live)? Thanks! Travis
Intermediate & Advanced SEO | Travis-W
How to do a 301 redirect for URLs with this structure?
In an effort to clean up my URLs, I'm trying to shorten them by using a 301 redirect in my .htaccess file. How would I set up a rule to redirect all URLs with a specific structure to a new, shorter URL? For example, this URL:
http://www.yakangler.com/articles/reviews/other-reviews/item/article-title
should become:
http://www.yakangler.com/reviews/article-title
So, in the example above, dynamically redirect all URLs with /articles/reviews/other-reviews/item/ in them to /reviews/, so that
http://www.yakangler.com/articles/reviews/boat-reviews/item/1550-review-nucanoe-frontier
http://www.yakangler.com/articles/reviews/other-reviews/item/1551-review-spyderco-salt
http://www.yakangler.com/articles/reviews/fishing-gear-reviews/item/1524-slayer-inc-sinister-swim-tail
would be...
http://www.yakangler.com/reviews/1550-review-nucanoe-frontier
http://www.yakangler.com/reviews/1551-review-spyderco-salt
http://www.yakangler.com/reviews/1524-slayer-inc-sinister-swim-tail
with one 301 redirect rule in my .htaccess file.
Intermediate & Advanced SEO | mr_w
Why will Google not index my pages?
About 6 weeks ago we moved a subcategory out to become a main category using all the same content. We also removed hundreds of old products and replaced these with new variation listings to remove duplicate content issues. The problem is Google will not index 12 critical pages, and our rankings have slumped for the keywords in the categories. What can I do to entice Google to index these pages?
Intermediate & Advanced SEO | Towelsrus
Tool to calculate the number of pages in Google's index?
When working with a very large site, are there any tools that will help you calculate the number of links in the Google index? I know you can use site:www.domain.com to see all the links indexed for a particular URL. But what if you want to see the number of pages indexed for 100 different subdirectories (i.e. www.domain.com/a, www.domain.com/b)? Is there a tool to help automate the process of finding the number of pages from each subdirectory in Google's index?
Intermediate & Advanced SEO | nicole.healthline