Why do I get duplicate pages when the website references the capitalized version of the URL vs. the lowercase version? www.agi-automation.com/Pneumatic-grippers.htm
-
Can I use the rel=canonical tag to fix this?
-
I'm not a pro when it comes to technical server setups, so maybe Keri can jump in with some better knowledge.
It seems to me like you have everything set up on your server correctly, and it looks like Google currently has only one version of the original page in question indexed.
Your site navigation menu points to the capitalized version of the URL, but somewhere on your site there must be a link that points to the lowercase version. That would explain how SEOmoz found the duplication when crawling your site, and if SEOmoz can find it, so can Google.
I still think you should use the rel=canonical attribute just to be safe. Again, I'm not that great at technical stuff. Sorry I couldn't be of more help here.
Tim
-
Hi Keri and Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website, the page exists as one file. The casing of the file name is irrelevant to the host machine; it wouldn't allow two files with the same name in the same directory.
To reinforce this point, you can access that file by changing the casing of the URI in any fashion (e.g., http://www.agi-automation.com/Linear-EscapeMents.htm). This does not bring up a different file each time; the server simply treats the URI as case-insensitive and pulls the file by its name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by crawling links instead of reading files. It is the crawler that assumes the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer thinks these are two different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actually duplicated page.
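To illustrate the IT team's point: under RFC 3986 the scheme and host of a URL are case-insensitive, but the path is case-sensitive, so a generic crawler (search engine bots included) treats two casings of the same path as two different URLs unless it is told otherwise. Below is a minimal Python sketch of the deduplication key such a crawler might use. The fold_path_case flag is a hypothetical stand-in for the "ignore casing" setting the IT team mentions, not an option of any real crawler, and the http:// scheme is assumed.

```python
from urllib.parse import urlsplit, urlunsplit


def dedupe_key(url: str, fold_path_case: bool = False) -> str:
    """Return the key a crawler might use to decide whether two URLs are the same page.

    Scheme and host are always lowercased (case-insensitive per RFC 3986).
    The path is lowercased only when fold_path_case is True, i.e. when the crawler
    has been told the server ignores path casing (hypothetical setting).
    The fragment is dropped because it is never sent to the server.
    """
    parts = urlsplit(url)
    path = parts.path.lower() if fold_path_case else parts.path
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


urls = [
    "http://www.agi-automation.com/Pneumatic-grippers.htm",
    "http://www.agi-automation.com/pneumatic-grippers.htm",
]

# Default behaviour: the two casings look like two different pages.
print(len({dedupe_key(u) for u in urls}))
# With path case folding enabled they collapse into one page.
print(len({dedupe_key(u, fold_path_case=True) for u in urls}))
```

Search engines do not know how a given server handles casing, which is why the default has to be the case-sensitive comparison.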
-
Excellent points, Keri. I hadn't thought about either of those issues. Using a redirect is definitely the best way to go.
-
I'd vote for doing the rewrite to the lowercase version. This gives you a couple of added benefits:
-
If people copy and paste the URL from their browser and then link to it, all of those links point to the same place.
-
Your analytics based on your URLs will be more accurate. Instead of seeing:
urla.htm 70 visits
urlb.htm 60 visits
urlB.htm 30 visits
You'll see:
urlb.htm 90 visits
urla.htm 70 visits
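The arithmetic in that example is just summing visits for URLs that differ only in case. A minimal Python sketch, reusing the illustrative file names and counts from the example (not real analytics data); merge_case_variants is a hypothetical helper, not part of any analytics tool:

```python
from collections import Counter


def merge_case_variants(visits: dict[str, int]) -> list[tuple[str, int]]:
    """Sum visit counts for URLs that differ only in letter case.

    Keys are lowercased, mirroring what a site-wide lowercase redirect does
    to future reports. Results are sorted by visits, highest first.
    """
    merged = Counter()
    for url, count in visits.items():
        merged[url.lower()] += count
    return merged.most_common()


report = {"urla.htm": 70, "urlb.htm": 60, "urlB.htm": 30}
for url, count in merge_case_variants(report):
    print(f"{url} {count} visits")
```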
-
The problem is that search engines view these URLs as two separate pages, so both pages get indexed and you run into duplication issues.
Yes, using rel=canonical is a good way to handle this. I would suggest using the lowercase version as your canonical page, so you would place a rel="canonical" link element in the head of both pages, pointing to the lowercase URL.
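Because every casing of the URL serves the same file, the tag on each variant should carry the same lowercase href. A minimal Python sketch of deriving that tag, assuming the pages can be generated or checked from a script; canonical_link is a hypothetical helper and the http:// scheme is an assumption:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a rel=canonical link element pointing at the all-lowercase form of url.

    Assumes the server serves every casing of a path from the same file, so the
    lowercase URL is always a valid target. The query string is kept unchanged.
    """
    parts = urlsplit(url)
    target = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), parts.query, "")
    )
    # escape() guards against quotes or ampersands breaking the attribute value.
    return f'<link rel="canonical" href="{escape(target, quote=True)}" />'


print(canonical_link("http://www.agi-automation.com/Pneumatic-grippers.htm"))
```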
The other option is to create a 301 redirect from the capitalized version to the lowercase version. This would ensure that anyone arriving at the page (including search engine bots) is redirected to the lowercase version.
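The redirect logic itself is small: if the requested path contains capitals, answer with a 301 pointing at the lowercase path. On a live site this would normally be a rule in the web server's own rewrite configuration; the Python WSGI middleware below is only a hypothetical sketch of that logic, assuming ASCII paths and an app mounted at the site root.

```python
from typing import Callable, Iterable


def lowercase_redirect(app: Callable) -> Callable:
    """Wrap a WSGI app so any path containing capitals gets a 301 to its lowercase form.

    Hypothetical sketch: assumes ASCII paths and an app mounted at the site root
    (SCRIPT_NAME is ignored). The query string is passed through unchanged.
    """

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        lowered = path.lower()
        if path != lowered:
            query = environ.get("QUERY_STRING", "")
            location = lowered + ("?" + query if query else "")
            # 301 = moved permanently, so browsers and search engine bots
            # treat the lowercase URL as the one to keep.
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)

    return middleware


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


app = lowercase_redirect(hello)
app({"PATH_INFO": "/Pneumatic-grippers.htm"}, lambda status, headers: print(status, headers))
```

Compared with rel=canonical, the redirect also fixes what visitors see in the address bar, which is why it helps with the copy-and-paste links and the analytics split.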
Related Questions
-
I'm struggling to understand (and fix) why I'm getting a 404 error. The URL includes "%5Bnull%20id=43484%5D", but I cannot find that anywhere in the referring URL. Does anyone know why? Thanks
Can you help me fix this 404 error, please? It appears that I have a redirect from one page to another. The referring page URL works, but it appears to link to another URL with this code at the end of the URL, %5Bnull%20id=43484%5D, which I'm struggling to find and fix. Thanks
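The suffix is percent-encoded, which can make it hard to spot in templates or page content. Decoding it with Python's standard library shows the literal text it stands for, which may be what to search for in the page source instead:

```python
from urllib.parse import unquote

suffix = "%5Bnull%20id=43484%5D"
# %5B -> "[", %20 -> " ", %5D -> "]"
print(unquote(suffix))
```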
Technical SEO | | Nichole.wynter20200 -
My WP website got attacked by malware, and now site:www.example.ca shows about 43,000 indexed pages in Google.
Hi all, my WordPress website was attacked by malware last week, and it badly affected my indexed pages in Google. Normally site:example.ca shows about 130 indexed pages on Google; now it shows about 43,000. I had my server company's tech support scan my site and clean the malware yesterday, but Google still shows the same number of indexed pages. Has anybody experienced a situation like this, and how did you fix it? Looking for help. Thanks. FILE HIT LIST:
Technical SEO | | Chophel
{YARA}Spam_PHP_WPVCD_ContentInjection : /home/example/public_html/wp-includes/wp-tmp.php
{YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-includes/wp-vcd.php
{YARA}Backdoor_PHP_WPVCD_Deployer : /home/example/public_html/wp-content/themes/oceanwp.zip
{YARA}webshell_webshell_cnseay02_1 : /home/example2/public_html/content.php
{YARA}eval_post : /home/example2/public_html/wp-includes/63292236.php
{YARA}webshell_webshell_cnseay02_1 : /home/example3/public_html/content.php
{YARA}eval_post : /home/example4/public_html/wp-admin/28855846.php
{HEX}php.generic.malware.442 : /home/example5/public_html/wp-22.php
{HEX}php.generic.cav7.421 : /home/example5/public_html/SEUN.php
{HEX}php.generic.malware.442 : /home/example5/public_html/Webhook.php0 -
My SEO friend says my website is not being indexed by Google for the keywords he has placed in the page and URL. What does that mean?
My SEO friend says my website is not being indexed by Google for the keywords he has placed in the page and URL. What does that mean? We have added some text with keywords related to each page.
Technical SEO | | AlexisWithers0 -
Is my website over-optimized for on-page SEO?
The keyword for the page is “locksmith Logan”, based in Brisbane, Queensland, Australia. Is the webpage overusing the main keyword 'Logan locksmith', and what other areas could be improved?
Technical SEO | | bondhoward0 -
Is it better to use XXX.com or XXX.com/index.html as the canonical page?
Is it better to use 301 redirects or a canonical page? I suspect canonical is easier. The question is, which is the best canonical page, YYY.com or YYY.com/index.html? I assume YYY.com, since there will be many other pages such as YYY.com/info.html, YYY.com/services.html, etc.
Technical SEO | | Nanook10 -
Www vs no-www duplicate fix?
Hi all, I have more or less published two versions of our site: one on "www" and one without. Of course, our SEO crawl flagged it as "duplicate" content/titles. My guess (hope) is that this is something that can be easily fixed on the server side, but I don't have a lot of knowledge around it. Does anyone know?
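Both this question and the ENSO Plastics one below come down to picking one host and 301-redirecting the other to it. A minimal Python sketch of that decision, using the hypothetical hosts www.example.com and example.com; on a real server the same logic would normally live in the server's rewrite or redirect configuration:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts: keep whichever version you prefer as canonical.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"


def www_redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to if url uses the non-canonical host, else None.

    urlsplit().hostname is already lowercased; any explicit port is dropped here.
    """
    parts = urlsplit(url)
    if (parts.hostname or "") == BARE_HOST:
        return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment))
    return None


print(www_redirect_target("http://example.com/ContactUs/ContactUs.html"))
```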
Technical SEO | | Becky_Converge0 -
Duplicate content /index.php/ issues
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However, I do notice that on ANY page of the site you could add /index.php and get the same page, i.e., www.mysite.com/category/article and www.mysite.com/index.php/category/article would both return the same page. How can I 301-redirect (or do something similar for) all /index.php pages to the non-index.php version? I have no desire for any page on my site to have index.php in it; there is no use for it. I'm having quite a hard time figuring this out. Again, this is basically just for the robots; the URLs the users see are perfect, and I've never had an issue with that. It's just SEOMOZ reporting duplicate content, and I've verified that to be true.
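The redirect being asked for boils down to a small path mapping. A minimal Python sketch of that mapping, with a hypothetical function name; on an Apache site the equivalent would be a mod_rewrite rule in .htaccess, and because the existing rewrite already routes requests internally through index.php, such a rule typically has to match the URL the visitor originally requested (for example via %{THE_REQUEST}) so that it does not loop.

```python
from typing import Optional

INDEX_PREFIX = "/index.php"


def index_php_redirect_target(path: str) -> Optional[str]:
    """Return the path to 301-redirect to, or None if the request should be served as-is.

    /index.php/category/article -> /category/article
    /index.php                  -> /
    anything else               -> None
    """
    if path == INDEX_PREFIX:
        return "/"
    if path.startswith(INDEX_PREFIX + "/"):
        return path[len(INDEX_PREFIX):]
    return None


print(index_php_redirect_target("/index.php/category/article"))
```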
Technical SEO | | b18turboef1 -
How do I eliminate duplicate page titles?
Almost... I repeat, almost all of my duplicate page titles show up as such because the page is being seen twice in the crawl. How do I prevent this?
Technical SEO | | ENSO
| www.ensoplastics.com/ContactUs/ContactUs.html | Contact ENSO Plastics |
| ensoplastics.com/ContactUs/ContactUs.html | Contact ENSO Plastics |
This is from the CSV... there are many more just like this. How do I cut out all of these duplicate URLs?