Sanity Check: NoIndexing a Boatload of URLs
-
Hi,
I'm working with a Shopify site that has about 10x more URLs in Google's index than it really ought to. This equals thousands of urls bloating the index. Shopify makes it super easy to make endless new collections of products, where none of the new collections has any new content... just a new mix of products. Over time, this makes for a ton of duplicate content.
My response, aside from making other new/unique content, is to select some choice collections with KW/topic opportunities in organic and add unique content to those pages, while noindexing the other 90% of excess collection pages.
The thing is, as far as I could find, there's no way to just upload a list of URLs to Shopify to tag noindex. And it's too time consuming to do this one URL at a time, so I wrote a little script to add a noindex tag (not nofollow) to pages that share various identical title tags, since many of them do. This saves some time, but I have to be careful not to inadvertently noindex a page I want to keep.
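A minimal sketch of that selection step in Python, for illustration only. It assumes a crawl export of (URL, title) pairs is already available. The sample URLs, titles, the `keep` set and the function name are all hypothetical, and it does not talk to Shopify at all. It only decides which pages would get the tag.

```python
from collections import defaultdict

# Meta robots value: noindex without nofollow, so links are still crawled.
NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'


def noindex_candidates(pages, keep):
    """Return URLs whose title is shared with at least one other page,
    excluding any URL explicitly listed in `keep`."""
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title.strip().lower()].append(url)

    candidates = []
    for urls in by_title.values():
        if len(urls) > 1:  # title is duplicated across pages
            candidates.extend(u for u in urls if u not in keep)
    return sorted(candidates)


# Hypothetical crawl export: (url, title) pairs.
pages = [
    ("/collections/boots", "Boots | Example Store"),
    ("/collections/boots-sale", "Boots | Example Store"),
    ("/collections/boots-new", "Boots | Example Store"),
    ("/collections/sandals", "Sandals | Example Store"),
]
keep = {"/collections/boots"}  # pages that must stay indexed

to_noindex = noindex_candidates(pages, keep)
```

The explicit `keep` set is the safeguard against noindexing a page that should stay in the index: anything listed there is skipped even when its title is duplicated.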
Here are my questions:
-
Is this what you would do? To me it seems a little crazy that I have to do this by title tag, although faster than one at a time.
-
Would you follow it up with a deindex request (one url at a time) with Google or just let Google figure it out over time?
-
Are there any potential negative side effects from noindexing 90% of what Google is already aware of?
-
Any additional ideas?
Thanks! Best... Mike
-
-
Hi Michael
The problem you have is the very low value content that exists on all of those pages and the complete impossibility of writing any unique Titles, Descriptions and content. There are just too many of them.
With a footwear client of mine I noindexed a huge slug of tags, taking the page count down by about 25%. We saw an immediate 22% increase in organic traffic in the first month (March 18th 2017 - April 17th 2017). The duplicates were all size and colour related. Since canonicalising (I'm English lol) more content and taking the site from 25,000 pages to around 15,000, the site is now 76% ahead of last year for organics. This is real, measurable change.
Now the arguments:
Canonicalisation
How are you going to canonicalise 10,000+ pages? Unless you have some kind of magic bullet you are not going to be able to, but let's look at the logic.
Say we have a page of Widgets (brand) and they come in 7 sizes. When the range is fully in stock, all of the brand/size pages will be identical to the brand page, apart from the title & description. So it would make sense to canonicalise back to the brand. Even when sizes start to run out, every size will still be on the brand page. So size is a subset of the brand page.
Similar, but not the same, for colour. If colour is a tag then every colour-sorted page will be on the brand page. So really they are the same page, just a slimmer selection. Now I accept that the brand page will contain all colours, as it did all sizes, but the similarity is so great (95% of the content being the same apart from the colour) that it makes sense to call them the same.
So for me Canonicalisation would be the way to go but it's just not possible as there are too many of them.
Noindex
The upside of noindex is that it is generally easier to put on the page, as there is no target URL to specify. The downside is that the page is then not indexed in Google, so you lose a little juice. I would argue, by the way, that the chances of being found in Google for a size page are extremely slim: less than 2% of visits came from size pages before we junked them, and most of those were from a newsletter, so in reality it is <1% and not worth bothering about. You could leave off the nofollow so that Google crawls through all of the links on the pages, which is the better option.
Considering your problem, and having experience of a number of sites with the same problem, noindex is your solution.
I hope that helps
Kind Regards
Nigel - Carousel Projects.
-
Hi Chris & Nigel,
Thank you for the considered responses. Good points about canonicalizing. A part I find frustrating is that the shared title tag across dozens or hundreds of pages will be across many different products/groups of products. So, the title tag is not a solid way to group canonicals.
Since the url patterns vary, I don't see how I could group these by which dozens or hundreds canonicalize to which one page, let alone make the change in Shopify other than one page at a time. My understanding is that this title tag manipulation is the only handle Shopify gives for making these bulk changes.
Gah!
So, here are my follow up questions:
-
How big of a negative is this in its as-is state, and how much better will noindexing most of the 90% make it Google Organic-wise? I ask because even the BS title tag to noindex project is a huge time suck.
-
If more is ever revealed about how to more efficiently group and canonicalize in Shopify, would adding the canonical after noindexing capture that lost authority later or would the previous noindex have irretrievably lost that?
-
Given all that, would you continue as I am?
Thanks! Best... Mike
-
-
Hi Mike
I see this a lot with sites that have a ton of tag groups. One site I am working on has 50,000 pages in Google, caused by tags appending themselves to every version of a URL; the site only has 400 products. Example:
Site/size-4
Site/womens/size-4
Site/womens/boots/size-4
Site/womens/boots/ankle/size-4
Site/womens/clarks/boots/size-4
Etc etc. If there are other tags like colour and features, this can cause a huge 3 dimensional matrix of additional pages that can slow down the crawl of the site. Google may not crawl all of the site as a result.
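To make the multiplication concrete, here is a rough Python sketch. The category paths follow the example URLs above, but the tag values and the `tag_urls` helper are made up for illustration, and it assumes every tag can be appended to every path, which is the worst case.

```python
from itertools import product

# Hypothetical category paths and tag values; not taken from a real site.
paths = ["", "womens", "womens/boots", "womens/boots/ankle", "womens/clarks/boots"]
sizes = [None, "size-4", "size-5", "size-6"]  # None means "no size tag"
colours = [None, "black", "brown"]            # None means "no colour tag"


def tag_urls(paths, *tag_groups):
    """Build every path + tag combination that the platform could expose."""
    urls = set()
    for path, *tags in product(paths, *tag_groups):
        parts = [path] + [t for t in tags if t]
        urls.add("Site/" + "/".join(p for p in parts if p))
    return urls


urls = tag_urls(paths, sizes, colours)
page_count = len(urls)  # 5 paths x 4 size options x 3 colour options
```

Each extra tag group multiplies the total again, which is how a small catalogue can balloon into tens of thousands of crawlable URLs.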
If it's possible to canonicalise then that is the best option, as juice and follows are retained. Very often it would be the page with the tag lopped off that the tagged page should cite.
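The "tag lopped off" rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the tag patterns in `TAG_PATTERN` are guesses at what a tag segment looks like, the function names are hypothetical, and "Site" stands in for a real absolute URL.

```python
import re

# Assumed tag patterns; a real site would need its own list.
TAG_PATTERN = re.compile(r"^(size-\d+|colour-[a-z]+)$")


def canonical_target(url):
    """Strip trailing tag segments so the tagged page points at its parent."""
    segments = url.rstrip("/").split("/")
    while len(segments) > 1 and TAG_PATTERN.match(segments[-1]):
        segments.pop()
    return "/".join(segments)


def canonical_link(url):
    """Render the link element that would go in the tagged page's <head>."""
    return f'<link rel="canonical" href="{canonical_target(url)}">'


example = canonical_target("Site/womens/boots/size-4")
```

Here `example` is "Site/womens/boots", while a page without a trailing tag maps to itself. Whether the rule can actually be applied in bulk depends on what the platform lets you template, which is the sticking point in this thread.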
In extreme circumstances I would consider noindexing the pages as they offer very skinny content and rubbish Meta because it's impossible to handle them individually. I have seen significant improvement in organics as a result.
Personally I don't think it's enough to simply leave Google to figure it out although I have seen some sites with very high DA get away with it.
To be honest I am pretty shocked that Shopify doesn't have a feature to cope with this
Regards
Nigel
Carousel Projects.
-
Hello Michael Johnson and Mozzers,
I have seen Shopify do this a few times, though I do not have clients on that particular platform at the moment. It is frustrating. You're right to want to resolve this issue. Between duplicate content, authority conflicts, and an inflated crawl budget, one issue or another is bound to hold back site performance.
Is this what you would do? Not immediately, no. I want to see those pages canonicalized. That way, your preferred pages get all the juice back from their respective canonical link. Is this an option for you?
Deindex request... and side effects? Canonical tags would make this part irrelevant (yay less work!). To be thorough though: I'd let Google figure it out unless you have strong evidence your crawl budget is maxed. And I don't see any negative side effects from noindexing duplicate content. If worst comes to worst, you have a good plan.
Shape that content,
CopyChrisSEO and the Vizergy Team