How to get a large number of urls out of Google's Index when there are no pages to noindex tag?
-
Hi,
I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.
The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. GoogleBot would crawl a link that looks like it's to the client's same domain and wind up on Amazon or somewhere else with some affiiiate code. GoogleBot would then grab the original link on the clients domain and index it... even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.
I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.
Any ideas? Thanks! Best... Michael
P.S.,
All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.
-
If no pages which support web coding actually exist for the URLs you want to remove from Google's index, I'd probably use the HTTP header instead. Maybe use the X-Robots directives:
- https://yoast.com/x-robots-tag-play/
- https://www.searchenginejournal.com/x-robots-tag-simple-alternate-robots-txt-meta-tag/67138/
Even if you have no page with web-code, you can always have a HTTP Header. A HTTP header simply allows a client and / or server to fire additional information through 'requests' (post / get etc).
This is the only thing I can think of which would really help. Some people might suggest robots.txt wildcards, but robots.txt handles crawling and not indexation (so those answers wouldn't really be worth anything to you)
The other thing you could do (maybe combine this with the X-Robots stuff) is to get all of those URLs to serve status code 410 (gone) instead of 404 (temporarily gone, but coming back)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Alternate page with proper canonical tag Status: Excluded in Google webmaster tools.
In Google Webmaster Tools, I have a coverage issue. I am getting this error message: Alternate page with proper canonical tag Status: Excluded. It gives the below blog post page as an example. Any idea how to resolve? At one time, I was using handl utm grabber, but the plugin is deactivated on my website. https://www.savacations.com/turrialba-costa-ricas-garden-city/?utm_source=deleted&utm_medium=deleted&utm_term=deleted&utm_content=deleted&utm_campaign=deleted&gclid=deleted5.
Intermediate & Advanced SEO | | Alancito0 -
A client rebranded a few years ago and doesn't want to be associated with it's old brand name. He wishes not to appear when the old brand is searched in Google, is there something we can do?
The problem is there was redirection between the old branded site and the new one, and now when you type in the name of the old brand, the new one comes up. I have desperately tried to convince this client there is nothing we can do about it, dozens of news articles crop up with the two brands together as this was a hot topic a few years ago, but just in case I missed something I thought I'd ask the community of experts here on Moz. An example for this would be Tyco Healthcare that became covidien in 2007. When you type tyco healthcare, covidien crops up here and there. Any ideas? Thanks!
Intermediate & Advanced SEO | | Netsociety0 -
Google Search Analytics How to Get Search Keywords for a Page?
How do I get the keywords coming into a page on the new Google Webmaster Tools Search Analytics? Used to be there in the old version. You would just view your most popular urls and when you expanded the urls you would see the terms driving the traffic. How do I see the most popular keyword queries for a given page in the new tool? Alternatively can I still use the old tool somehow?
Intermediate & Advanced SEO | | K-WINTER0 -
Problem: Magento prioritises product URL's without categories?
HI there, we are moving a website from Shoptrader to Magento, which has 45.000 indexations.
Intermediate & Advanced SEO | | onlinetrend
yes shoptrader made a bit of a mess. Trying to clean it up now. there is a 301 redirect list of all old URL's pointing to the new one product can exist in multiple categories want to solve this with canonical url’s for instance: shoptrader.nl/categorieA/product has 301 redirect towards magento.nl/nl/categorieA/product shoptrader.nl/categorieA/product-5531 has 301 redirect towards magento.nl/nl/categorieA/product shoptrader.nl/categorieA/product¤cy=GBP has 301 redirect towards magento.nl/nl/categorieA/product shoptrader.nl/categorieB/product has 301 redirect towards magento.nl/nl/categorieB/product, has canonical tag towards magento.nl/nl/categorieA/product shoptrader.nl/categorieB/product?language=nl has 301 redirect towards magento.nl/nl/categorieB/product, has canonical tag towards magento.nl/nl/categorieA/product Her comes the problem:
New developer insists on using /productname as canonical instead of /category/category/productname, since Magento says so. The idea is now to redirect to /category/category/productname and there will be a canonical URL on these pages pointing to /productname, loosing some link juice twice. So in the end indexation will take place on /productname … if Google picks it up the 301 + canonical. Would be more adviseable to direct straight to /productname (http://moz.com/community/q/is-link-juice-passed-through-a-301-and-a-canonical-tag), but I prefer to point to one URL with categories attached. Which has more advantages(?): clear menustructure able to use subfolders in mobile searchresults missing breadcrumb What would you say?0 -
Pagination and View All Pages Question. We currently don't have a canonical tag pointing to View all as I don't believe it's a good user experience so how best we deal with this.
Hello All, I have an eCommerce site and have implemented the use rel="prev" and rel="next" for Page Pagination. However, we also have a View All which shows all the products but we currently don't have a canonical tag pointing to this as I don't believe showing the user a page with shed loads of products on it is actually a good user experience so we havent done anything with this page. I have a sample url from one of our categories which may help - http://goo.gl/9LPDOZ This is obviously causing me duplication issues as well . Also , the main category pages has historically been the pages which ranks better as opposed to Page 2, Page 3 etc etc. I am wondering what I should do about the View All Page and has anyone else had this same issue and how did they deal with it. Do we just get rid of the View All even though Google says it prefers you to have it ? I also want to concentrate my link juice on the main category pages as opposed being diluted between all my paginated pages ? - Does anyone have any tips on how to best do this and have you seen any ranking improvement from this ? Any ideas greatly appreciated. thanks Peter
Intermediate & Advanced SEO | | PeteC120 -
Content From One Domain Mysteriously Indexing Under a Different Domain's URL
I've pulled out all the stops and so far this seems like a very technical issue with either Googlebot or our servers. I highly encourage and appreciate responses from those with knowledge of technical SEO/website problems. First some background info: Three websites, http://www.americanmuscle.com, m.americanmuscle.com and http://www.extremeterrain.com as well as all of their sub-domains could potentially be involved. AmericanMuscle sells Mustang parts, Extremeterrain is Jeep-only. Sometime recently, Google has been crawling our americanmuscle.com pages and serving them in the SERPs under an extremeterrain sub-domain, services.extremeterrain.com. You can see for yourself below. Total # of services.extremeterrain.com pages in Google's index: http://screencast.com/t/Dvqhk1TqBtoK When you click the cached version of there supposed pages, you see an americanmuscle page (some desktop, some mobile, none of which exist on extremeterrain.com😞 http://screencast.com/t/FkUgz8NGfFe All of these links give you a 404 when clicked... Many of these pages I've checked have cached multiple times while still being a 404 link--googlebot apparently has re-crawled many times so this is not a one-time fluke. The services. sub-domain serves both AM and XT and lives on the same server as our m.americanmuscle website, but answer to different ports. services.extremeterrain is never used to feed AM data, so why Google is associating the two is a mystery to me. the mobile americanmuscle website is set to only respond on a different port than services. and only responds to AM mobile sub-domains, not googlebot or any other user-agent. Any ideas? As one could imagine this is not an ideal scenario for either website.
Intermediate & Advanced SEO | | andrewv0 -
How can I see all the pages google has indexed for my site?
Hi mozers, In WMT google says total indexed pages = 5080. If I do a site:domain.com commard it says 6080 results. But I've only got 2000 pages in my site that should be indexed. So I would like to see all the pages they have indexed so I can consider noindexing them or 404ing them. Many thanks, Julian.
Intermediate & Advanced SEO | | julianhearn0 -
Understanding Google's keyword tool...
When I type in Google a keyword like : boot camp I get results that show Bootcamp (one word) traffic in the tens of thousands. I see many words combined. Does this mean that tens of thousands of people every month are misspelling that keyword? How should I interpret this in terms of anchor texting? I would hate to deliberately misspell it on my website just to get traffic. For those interested, my website is: http://ultimatebasictraining.com/admin/ (currently revaming my http://ultimatebasictraining.com website)
Intermediate & Advanced SEO | | StreetwiseReports0