150+ Pages of URL Parameters - Mass Duplicate Content Issue?
-
Hi, we run a large e-commerce site, and while checking through GWT we came across these URL parameters; we're now wondering if we have a duplicate content issue.
If so, we're wondering what the best way to fix them is: is this a task for GWT or a rel=canonical task?
Many of the URLs are driven by the filters on our category pages and are coming up like this: page04%3Fpage04%3Fpage04%3Fpage04%3F (see the image for more).
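For reference, %3F is just the percent-encoded form of "?", so it looks like the filters are appending the same query fragment over and over. A quick Python sketch (using the string from above) shows what the browser/crawler actually sees:

```python
from urllib.parse import unquote

# The parameter string reported by GWT (copied from above)
raw = "page04%3Fpage04%3Fpage04%3Fpage04%3F"

# %3F decodes to "?", so this is a chain of repeated
# query separators tacked onto the same page
decoded = unquote(raw)
print(decoded)  # page04?page04?page04?page04?
```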
Does anyone know if these links are duplicate content and if so how should we handle them?
Richard
-
Hi Richard
Honestly, I really don't know. Part of me wants to say: "Surely Google will know this isn't deliberate, manipulative duplicate content." You could take a couple of those URLs and run some Google searches with them. Do:
site:www.example.com/page?query1
info:www.example.com/page?query1
With the first search, if your URL hasn't been indexed, that's a good thing. For the second, if the info: search returns the original URL (without the parameters), that's also good: it means Google is counting the parameterised version as just a variation to be ignored. However, if it returns the result with the parameters, that indicates Google is indexing the parameterised version and treating it as a separate URL - raising the duplicate content risk. Silly Google!
Regardless of those results, I would look to implement the canonical tag anyway as it takes any guesswork out of the equation. And ultimately, a lot of this work with Google is guesswork as we can't see the algorithm - although it's an informed guess due to experience etc.
-
Thanks for this Tom, great answer!
So am I right in thinking that each of these URL parameters is very likely being classed as duplicate content?
-
Along with this great answer from Tom, I just wanted to add that Google also offers a resource on duplicate content, with tips.
Hope this helps as well - good luck!
-
Hi Richard
It is something you should address ASAP. While I believe that Google is a lot better at recognising 'accidental' duplicate content - i.e. URLs with URL parameters - and distinguishing it from 'deliberate' duplicate content - outright stealing someone's work or trying to rank several pages for multiple terms - that is only my assumption. To be completely sure, let's remove any chance of Google penalising these pages.
I think, in this instance, a rel=canonical tag should do the trick. You can read more on the tag in Moz's guide. Basically, on the page(s) where you're having this problem, add a "self-referring" canonical tag. For example, if the page was http://www.example.com/blue-widgets/, the tag would be:
<link rel="canonical" href="http://www.example.com/blue-widgets/" />
Make sure that, when you implement this, the pages generated with the URL parameters aren't also creating self-referring canonical tags of their own, like:
<link rel="canonical" href="http://www.example.com/blue-widgets/page04%3Fpage04%3F" />
They should all carry the canonical tag pointing at the original URL.
What this does is tell Google: "If you see any pages with this tag, we're aware they might be duplicates, but please only count and index http://www.example.com/blue-widgets/." It works just like a 301 redirect in that sense.
I think this would be the simplest solution for you to implement. If you're having problems, there would be a way of blocking access to pages with certain query/URL parameters by using the robots.txt file, but that could get quite messy.
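For completeness, that robots.txt approach might look something like this (a sketch only - the page04 pattern is taken from your screenshot, so adjust it to your actual parameter names, and note that matching against percent-encoded characters can be inconsistent between crawlers):

```
User-agent: *
# Block any URL containing a query string
Disallow: /*?
# Or, more narrowly, only the repeated filter parameter
Disallow: /*page04%3F
```

Bear in mind robots.txt only blocks crawling, not indexing, which is another reason the canonical tag is the cleaner fix here.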
Hope this helps