Why do I get duplicate pages when the website references the capitalized version of the URL vs. the lowercase one? www.agi-automation.com/Pneumatic-grippers.htm

AGIAutomation

Can I use the rel=canonical tag for this?

http://www.agi-automation.com/Pneumatic-grippers.htm
http://www.agi-automation.com/pneumatic-grippers.htm

TimKelsey

I'm not a pro when it comes to technical server setups, so maybe Keri can jump in with some better knowledge.

It seems to me like you have everything set up on your server correctly. And it looks like Google currently has only one version indexed of the original page in question.

Your site navigation menu points to the capitalized version of the URL, but somewhere on your site there must be a link that points to the lowercase version. That would explain how SEOmoz found the duplication when crawling your site, and if SEOmoz can find it, so can Google.

I still think you should use the rel=canonical attribute just to be safe. Again, I'm not that great at technical stuff. Sorry I couldn't be of more help here.

Tim

AGIAutomation

Hi Tim,

Thanks for your responses. This is what the IT team has found. Let me know your thoughts:

On the physical computer that hosts the website, the page exists as one file. The casing of the file name is irrelevant to the host machine; it wouldn't allow 2 files of the same name in the same directory.

To reinforce this point, you can access said file by camel-casing the URI in any fashion (e.g., http://www.agi-automation.com/Lin...). This does not bring up a different file each time; the server merely processes the URI as case-less and pulls the file by its name.

What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.

So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page.

AGIAutomation

Hi Keri and Tim,

Thanks for your responses. This is what the IT team has found. Let me know your thoughts:

On the physical computer that hosts the website, the page exists as one file. The casing of the file name is irrelevant to the host machine; it wouldn't allow 2 files of the same name in the same directory.

To reinforce this point, you can access said file by camel-casing the URI in any fashion (e.g., http://www.agi-automation.com/Linear-EscapeMents.htm). This does not bring up a different file each time; the server merely processes the URI as case-less and pulls the file by its name.

What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.

So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page.
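
To illustrate the IT team's point, here is a minimal Python sketch of how a link crawler that compares URLs as exact strings counts two casings as two pages, and how lowercasing collapses them into one. This is not SEOmoz's or Google's actual crawler logic; the function name `normalize_url` and the choice to lowercase the scheme, host and path are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase the scheme, host and path; leave query and fragment untouched."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


# The two links found on the site, as given in the question.
crawled = [
    "http://www.agi-automation.com/Pneumatic-grippers.htm",
    "http://www.agi-automation.com/pneumatic-grippers.htm",
]

# Exact string comparison: the crawler records two distinct pages.
distinct_exact = set(crawled)

# Case-insensitive comparison: both links collapse to one page.
distinct_normalized = {normalize_url(u) for u in crawled}
```

Under the generic URI rules the path part of a URL is case-sensitive, so a crawler can't safely assume that two casings are the same page, even though this particular server happens to ignore case.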

TimKelsey

Excellent points, Keri. I hadn't thought about either of those issues. Using a redirect is definitely the best way to go.

KeriMorgret

I'd vote for doing the rewrite to the lowercase version. This gives you a couple of added benefits:

If people copy and paste the URL from their browser then link to it, you're getting all the links going to the same place.
Your analytics based on your URLs will be more accurate. Instead of seeing:

urla.htm 70 visits
urlb.htm 60 visits
urlB.htm 30 visits

You'll see:

urlb.htm 90 visits
urla.htm 70 visits
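
The arithmetic behind that report can be sketched in a few lines of Python. `raw_visits` is a made-up stand-in for whatever your analytics tool exports, with the numbers taken from the example above; it isn't tied to any particular analytics product.

```python
from collections import Counter

# Visit counts keyed by URL exactly as logged (figures from the example above).
raw_visits = {"urla.htm": 70, "urlb.htm": 60, "urlB.htm": 30}

# Merge counts for URLs that differ only by letter case.
merged = Counter()
for url, visits in raw_visits.items():
    merged[url.lower()] += visits

# Sorted from most to least visited.
report = merged.most_common()
```

Once every request ends up on the lowercase URL, the split never shows up in the logs in the first place, so no merge step like this is needed.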

TimKelsey

The problem is that search engines view these URLs as two separate pages, so both pages get indexed and you run into duplication issues.

Yes, using rel=canonical is a good way to handle this. I would suggest using the lowercase version as your canonical page, so you would place this bit of HTML on both pages:
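
The HTML snippet seems to have dropped out of the post. As a rough sketch of the standard form of the tag, here is a small Python helper that builds it; the helper name `canonical_link_tag` is hypothetical, and the lowercase URL is the one from the question.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return a <link rel="canonical"> element for the <head> of a page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


tag = canonical_link_tag("http://www.agi-automation.com/pneumatic-grippers.htm")
# tag is now:
# <link rel="canonical" href="http://www.agi-automation.com/pneumatic-grippers.htm" />
```

The tag belongs in the `<head>` section of the page. If both casings are served from the same file, as the IT team describes elsewhere in this thread, adding it once to that file covers both URLs.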

The other option is to create a 301 redirect from the caps version to the lowercase version. This would ensure that anyone arriving at the page (including search engine bots) would end up being directed to the lowercase version.
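
How you set this up depends on your web server, which this thread doesn't identify. Purely as an illustration of the logic, here is a Python WSGI sketch; `lowercase_redirect` and `site` are hypothetical names, not something the site in question runs.

```python
def lowercase_redirect(app):
    """WSGI middleware: 301-redirect any path containing uppercase letters."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path != path.lower():
            # Build the lowercase target, keeping any query string as-is.
            location = path.lower()
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        # Path is already lowercase: hand the request to the real site.
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real site: answers every request with 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"page body"]


app = lowercase_redirect(site)
```

With this in place, a request for /Pneumatic-grippers.htm gets a 301 pointing at /pneumatic-grippers.htm, while a request for the lowercase path is served normally.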
