Roger keeps telling me my canonical pages are duplicates
-
I've got a brand-new site and I'm trying to get its error count down to zero. I'm basically there except for this odd problem. Roger got into the site like a naughty puppy a bit too early, before I'd put the canonical tags in, so there were a couple thousand 'duplicate content' errors. I put canonicals in (programmatically, so they appear on every page), waited a week, and sure enough 99% of them went away.
However, about 50 are still lingering, and I'm not sure why they're being detected as duplicates. It's an ecommerce site, and the duplicates are being detected on the product pages, but why these 50? (There are hundreds of other products that aren't being detected.) The URLs that are 'duplicates' look like this according to the crawl report:
http://www.site.com/Product-1.aspx
http://www.site.com/product-1.aspx
And so on. Canonicals are in place, and have been for weeks, and as I said there's hundreds of other pages just like this not having this problem, so I'm finding it odd that these ones won't go away.
All I can think of is that Roger is somehow caching stuff from previous crawls? According to the crawl report these duplicates were discovered '1 day ago' but that simply doesn't make sense. It's not a matter of messing up one or two pages on my part either; we made this site to be dynamically generated, and all of the SEO stuff (canonical, etc.) is applied to every single page regardless of what's on it.
If anyone can give some insight I'd appreciate it!
-
ThompsonPaul -
Thanks for that info; it pretty much nails exactly what I had discovered independently. This is an IIS7/Win2k8R2 install, so luckily the rewriting is a bit easier than in previous iterations. The whole platform is hand-coded by us (after the 10th ecommerce site or so you can generally do them in your sleep), so I don't have to worry about CMS implementation and the like, and luckily we already knew about the spaces, so they simply aren't allowed in the filenames. I'm in the middle of writing a regex right now that will down-case anything in an href="" or src="" attribute, which will hopefully handle everything on the site side, user-created or not. I'll consider what to do about external links a bit down the road, I think.
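For anyone wanting to try the same idea, here is a minimal sketch of that kind of regex pass, written in Python rather than on the .NET stack the site actually runs on. The names `downcase_links` and `lowercase_url` are made up for illustration. It assumes only site-relative links should be touched, so anything with a scheme (http:, mailto:, and so on) or a leading // is left alone, and it leaves query strings and fragments as they are in case their values are case-sensitive.

```python
import re

# Matches href="..." or src="..." with single or double quotes.
# The lookbehind avoids matching attributes such as data-href.
ATTR_RE = re.compile(
    r'''(?<![\w-])(?P<attr>href|src)\s*=\s*(?P<quote>["'])(?P<url>.*?)(?P=quote)''',
    re.IGNORECASE,
)

# URLs that start with a scheme (http:, mailto:, ...) or with // are treated as external.
EXTERNAL_RE = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|//)', re.IGNORECASE)


def lowercase_url(url: str) -> str:
    """Lowercase the path of a site-relative URL, leaving query and fragment untouched."""
    if EXTERNAL_RE.match(url):
        return url  # assumption: external links are handled separately
    split_at = re.search(r'[?#]', url)
    if split_at is None:
        return url.lower()
    return url[:split_at.start()].lower() + url[split_at.start():]


def downcase_links(html: str) -> str:
    """Rewrite every href/src value in an HTML fragment to its lowercase form."""
    def repl(match: re.Match) -> str:
        quote = match.group("quote")
        return f'{match.group("attr")}={quote}{lowercase_url(match.group("url"))}{quote}'
    return ATTR_RE.sub(repl, html)
```

Absolute links back to the same site would be skipped by this sketch, so they would need a host check added. A regex pass like this is also only as reliable as the markup is regular; user-edited content with unquoted attributes would slip through.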
-
Valery, you're definitely going to want to normalize your URLs to lowercase. This is a quirk of IIS: it actually respects case in URLs and will consider URLs that differ only in case to be different pages.
In addition to the search engine problems it creates, it's also a major problem for usability, both yours and your users'. For example, a user who is trying to type in a direct URL can get a 404 error depending on what case they use.
More importantly, your Google Analytics will report on each of those versions as separate pages, unless you write a normalizing filter into your GA profiles. Better to do that normalization for the actual site, not just your analytics.
While rel=canonical can resolve a number of issues, I've always found it vastly better to correct the actual problem at its root, rather than rely on canonicalization as a catch-all. Anecdotally, I've found that correcting issues like this with rewrites seems to allow affected pages to rank better than when they're just corrected with canonicalization. Wish I could find time to do an actual case study on that.
Managing rewrites on IIS servers will require a plugin like ISAPI_Rewrite, as IIS doesn't handle it natively.
P.S. IIS will also allow and respect spaces in URLs. Users in Internet Explorer will see them as normal with spaces, but browsers like Firefox will insert the URL-encoded form of a space (%20) into each necessary spot in the URL. This is again a mess for usability, so it's much better to force a rewrite of all URLs to replace spaces with dashes when creating new pages. Many CMSs have plugins for this, or you can use sitewide rewrites to do it after the fact.
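The normalization described above (lowercase everything, swap spaces for dashes, redirect anything that doesn't already match) can be sketched roughly as follows. This is an illustration of the logic in Python, not an IIS rewrite rule, and the function names `canonical_url` and `redirect_target` are hypothetical. It assumes the dashed, lowercase version of each page actually exists on the site, and that query strings should be passed through unchanged.

```python
from typing import Optional
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with a lowercase host and path, and spaces in the path replaced by dashes."""
    parts = urlsplit(url)
    path = unquote(parts.path)              # turn %20 back into a literal space
    path = path.lower().replace(" ", "-")   # lowercase, then spaces to dashes
    path = quote(path)                      # re-encode anything that still needs it
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment)
    )


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the URL is already canonical."""
    canon = canonical_url(url)
    return canon if canon != url else None
```

A request handler or rewrite rule would call something like `redirect_target` on each incoming URL and issue a permanent redirect whenever it returns a value, so that only one version of each page is ever served.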
-
I think I get your point; the canonical is pointing to where the juice should go, but the URLs are still functionally different things. I'm guessing some sort of URL rewrite is in order, and to standardize how I do in-text links on the site (with user-editable content this part could be a pain).
-
Hey Valery,
I see those on closer inspection. I know it looks weird, but that's accurate. Your server must be UNIX or Linux, so it will actually treat a difference in case as a different word.
For example: banana.com/pancakes.html would be treated differently than banana.com/PanCakes.html.
So if you have any pages generated dynamically or otherwise that differ only in case, then they will be tagged as duplicate.
In your CSV file you can see the duplicates being caused by case. I'd also be happy to help provide a few specific examples but would want to generate a ticket for you so we don't divulge any private information.
Cheers,
Joel.
-
Joel -
Thanks a lot for looking into that. The pages are very similar, so I'm not surprised they're being duplicate triggered; but what does surprise me is that they are apparently being considered duplicate to a canonical version of themselves? When I click on the duplicate list I'm expecting to see:
Product1.aspx
Product1-Blue.aspx
Product1-Red.aspx
But instead I'm seeing:
Product1.aspx
product1.aspx
product1.ASPX
And so on. The first scenario to me implies that the 3 pages are duplicate to each other, whereas the second is saying that there's either a canonical problem or I literally have different-case versions of those files.
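To tell the two scenarios apart in a crawl export, one rough approach is to group the reported URLs by their lowercase form and look for groups with more than one spelling. The sketch below assumes the URLs have already been pulled out of the crawl CSV into a list; the function name `case_only_duplicates` is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_only_duplicates(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group URLs that are identical except for letter case.

    Returns a mapping from the lowercase URL to the sorted list of
    distinct spellings seen, keeping only groups with two or more spellings.
    """
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

Any URL that shows up in the result is a case variant of another reported URL, which points at links or redirects on the site rather than at genuinely similar product pages.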
-
Hi Valery,
I took a peek at your campaign and it looks like those few remaining duplicate pages are in fact different, but with very minor differences. Basically there are pages for different sizes of things.
While they are different, they vary in such minute ways that Roger sees them as duplicates.
I hope that answers the question.
Thanks,
Joel.