Roger keeps telling me my canonical pages are duplicates
-
I've got a site that's brand spanking new that I'm trying to get the error count down to zero on, and I'm basically there except for this odd problem. Roger got into the site like a naughty puppy a bit too early, before I'd put the canonical tags in, so there were a couple thousand 'duplicate content' errors. I put canonicals in (programmatically, so they appear on every page) and waited a week and sure enough 99% of them went away.
However, there's about 50 that are still lingering, and I'm not sure why they're being detected as such. It's an ecommerce site, and the duplicates are being detected on the product page, but why these 50? (there's hundreds of other products that aren't being detected). The URLs that are 'duplicates' look like this according to the crawl report:
http://www.site.com/Product-1.aspx
http://www.site.com/product-1.aspx
And so on. Canonicals are in place, and have been for weeks, and as I said there's hundreds of other pages just like this not having this problem, so I'm finding it odd that these ones won't go away.
All I can think of is that Roger is somehow caching stuff from previous crawls? According to the crawl report these duplicates were discovered '1 day ago' but that simply doesn't make sense. It's not a matter of messing up one or two pages on my part either; we made this site to be dynamically generated, and all of the SEO stuff (canonical, etc.) is applied to every single page regardless of what's on it.
If anyone can give some insight I'd appreciate it!
-
ThompsonPaul -
Thanks for that info, it pretty much nails exactly what I had discovered independently. This is an IIS7/Win2k8R2 install so luckily the rewriting is a bit easier than in previous iterations. The whole platform is hand coded by us (after the 10th ecommerce site or so you can generally do them in your sleep) so I don't have to worry about CMS implementation and the like, and luckily we already knew that about the spaces so they simply aren't allowed in the filenames. I'm in the middle of making a regex right now that is going to down-case anything in an href="" or src="" tag that will hopefully handle everything on the site side user-created or not. Will consider what to do in regards to external links a bit down the road I think.
-
Valery, you're definitely going to want to normalize your URLs to lowercase. This is a quirk of IIS that it actually respects case in URLs and will consider different case URLs as different pages.
In addition to the search engine problems it creates, it's also a major problem for usabilty - yours and your users. For example, a user who is trying to type in a direct URL can get a 404 error depending on what case they use.
More importantly, your Google Analytics will report on each of those version as separate pages, unless you write a normalizing filter into your GA profiles. Better to do that normalization for the actual site, not just your analytics
While rel=canonical can resolve a number of issues, I've always found it vastly better to correct the actual problem at its root, rather than rely on canonicalization as a catch-all. Anecdotally, I've found correcting issues like this with rewrites seems to allow affected pages to rank better than when just corrected with canonicalization. WIsh I could find time to do an actual case-study on that
Managing rewrites on IIS servers will require a plugin like asapi-rewrite as IIS doesn't handle it natively.
P.S. IIS will also allow and respect spaces in URLs. Users in Internet Explorer will see them as normal with spaces but browsers like Firefox will insert the html entity for a space (%20) into each necessary spot in the URL. This is again a mess for usability, so much better to force rewrite of all URLs to replace spaces with dashes when creating new pages. Many CMSs have plugins for this or you can also use sitewide rewrites to do it after the fact.
-
I think I get your point; the canonical is pointing to where the juice should go, but the URLs are still functionally different things. I'm guessing some sort of URL rewrite is in order, and to standardize how I do in-text links on the site (with user-editable content this part could be a pain).
-
Hey Valery,
I see those on closer inspection. I know it looks weird, but that's accurate. Your server must be UNIX or Linux so they will actually treat case as a different word.
For example: banana.com/pancakes.html would be treated differently than banana.com/PanCakes.html.
So if you have any pages generated dynamically or otherwise that differ only in case, then they will be tagged as duplicate.
In your CSV file you can see the duplicates being caused by case. I'd also be happy to help provide a few specific examples but would want to generate a ticket for you so we don't divulge any private information.
Cheers,
Joel.
-
Joel -
Thanks a lot for looking into that. The pages are very similar, so I'm not surprised they're being duplicate triggered; but what does surprise me is that they are apparently being considered duplicate to a canonical version of themselves? When I click on the duplicate list I'm expecting to see:
Product1.aspx
Product1-Blue.aspx
Product1-Red.aspx
But instead I'm seeing:
Product1.aspx
product1.aspx
product1.ASPX
And so on. The first scenario to me implies that the 3 pages are duplicate to each other, whereas the second is saying that there's either a canonical problem or I literally have different-case versions of those files.
-
Hi Valery,
I took a peek at your campaign and it looks like those few remaining duplicate pages are in fact different, but very minor differences. Basically there's pages for different sizes of things.
While being different, they vary in such minute ways that Roger see's them as duplicates.
I Hope that answers the question.
Thanks,
Joel.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why does one page rank while a similar page doesn't?
We have a blog post (actually several of them that have the same SEO characteristics) that brings a fair amount of traffic to our site (relatively speaking), and according to Google Webmaster Tools it averages in the top 10 in SERPs for various terms. This page has no external links to it, and very few internal links pointing to it. When I run Moz's On-Page Grader for the various keywords it ranks for, the page get's an F on all of them. It was not optimized for any keyword, and it isn't our best content; it was a blog post written a few years ago and forgotten about and was never promoted in any way. The topic does happen to be about something that people search for frequently. According to the keyword difficulty tool, all of the keywords it ranks for have 40-45% difficulty. We have lots of other pages on our site that we have tried to optimize and that get A's and B's in On-Page Grader, that have both internal (from the home page and main menu) and external links pointing at them, etc, but they don't rank well at all. Keyword difficulty for these keywords is in the same range, from 37 - 53%. Why does this one page rank so well when the other pages don't? Additionally, we have been looking at a competitor who has a page that ranks #1 in universal results for numerous keywords according to SEMRush, yet the page gets an F On-page Grader for those keywords. The page has 3 links to it, all from the same domain and it has a very low domain and page authority. The Domain Authority of this page is 47 and the page authority is 33 according to Open Site Explorer (compared with our DA of 30, and PA of 1), and the social metrics are a bit higher than ours, but neither has a lot (they may have 15 likes to our 10). Why does this page rank so well for them? How can we get our Page Authority higher? Thanks for any and all help.
Moz Pro | | mukunig1 -
Why am I getting all these duplicate pages?
This is going for basically all my pages, but my website has 3 'duplicates' as the rest just have 2 (no index) Why are these 3 variations counting as duplicate pages? http://www.homepage.com http://homepage.com http://www.hompage.com/index.php
Moz Pro | | W2GITeam0 -
1 page crawled ... and other errors
1. Why is only one (1) page crawled every second time you crawl my site? 2. Why do your bot not obey the rules specified in the robots.txt? 3. Why does your site constantly loose connection to my facebook account/page? This means that when ever i want to compare performance i need to re-authorize, and therefor can not see any data until next time. Next time i also need to re-authorize ... 4. Why cant i add a competitor twitter account? What ever i type i get an "uh oh account cannot be tracked" - and if i randomly succeed, the account added never shows up with any data. It has been like this for ages. If have reported these issues over and over again. We are part of a large scandinavian company represented by Denmark, Sweden, Norway and Finland. The companies are also part of a larger worldwide company spreading across England, Ireland, Continental Europe and Northern Europe. I count at least 10 accounts on Seomoz.org We, the Northern Europe (4 accounts) are now reconsidering our membership at seomoz.org. We have recently expanded our efforts and established a SEO-community in the larger scale businees spanning all our countries. Also in this community we are now discussing the quality of your services. We'll be meeting next time at 27-28th of june in London. I hope i can bring some answers that clarify the problem we have seen here on seomoz.org. As i have written before: I love your setup and you tools - when they work. Regretebly, that is only occasionally the case!
Moz Pro | | alsvik1 -
Too many on-page links
I received a warning in my most recent report for too many on-page links for the following page: http://www.fateyes.com/blog/. I can't figure out why this would be. I am counting between 60-70 including all pull downs, "read more's", archive, category and a few additional misc. links. Any ideas or suggestions on this? Or what I might do to rectify? Perhaps it's just an SEOmoz report blip... We currently don't have the post list rolling to additional pages so it's kind of passively set up to be endless, but it's in the works.
Moz Pro | | gfiedel0 -
How do I scan down to 10000 pages?
Hi very new here I have set up 5 campaigns, all of fairly large sites. It appears seomoz has scanned 4 of them down to 250 and 1 down to 10000. the one a really want to see down to 10000, my own site is the one I started scanning first well over a week ago. How do I get seomoz to scan further? Thanks
Moz Pro | | First-VehicleLeasing0 -
Duplicate page title
I own a store www.mzube.co.uk and the scam always says that I have duplicate page titles or duplicate page. What happens is thn I may have for example www.mzube.co.uk/allproducts/page1. And if I hve 20 pages all what will change from each page is the number at the end and all the rest of the page name will be the same but really the pages are if different products. So the scans think I have 20 pages the same but I havent Is this a concern as I don't think I can avoid this Hope you can answer
Moz Pro | | mzube0 -
Duplicate content & canonicals
Hi, Working on a website for a company that works in different european countries. The setup is like this: www.website.eu/nl
Moz Pro | | nvs.nim
www.website.eu/be
www.website.eu/fr
... You see that every country has it's own subdir, but NL & BE share the same language, dutch... The copywriter wrote some unique content for NL and for BE, but it isn't possible to write unique for every product detail page because it's pretty technical stuff that goes into those pages. Now we want to add canonical tags to those identical product pages. Do we point the canonical on the /be products to /nl products or visa versa? Other question regarding SEOmoz: If we add canonical tags to x-pages, do they still appear in the Crawl Errors "duplicate page content", or do we have to do our own math and just do "duplicate page content" minus "Rel canonical" ?0 -
One page per campaign?
Not quite sure if I read correctly, but is it correct that one campaign tracks only one page of my site? So if I wanted to track something like a services page, this would require a second campaign?
Moz Pro | | GroundFloorSEO0