Internal file extension canonicalization
-
OK, no doubt this is straightforward, but I seem to be finding it hard to find a simple answer. Our website's internal pages have the extension .html, and trying to navigate to an internal URL without the .html extension results in a 404.
The question is: should a 301 be used to redirect to the extensionless URL to future-proof the site? And should internal links point to the extensionless URL for the same reason?
Hopefully that makes sense, and apologies for what I believe is a straightforward question.
-
As above:
example/abc rewrites to example/abc.html
example/abc.html redirects to example/abc
and all internal links point to example/abc
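The three rules above can be sketched as a request-routing decision. This is a minimal Python illustration, not server configuration: the `route` function and the `STATIC_FILES` set are hypothetical stand-ins for the web server and the files on disk, and the sketch assumes no page is named `index.html`.

```python
# Hypothetical set of files that exist on disk (an assumption for illustration).
STATIC_FILES = {"/abc.html", "/contact.html"}


def route(path: str) -> tuple[int, str]:
    """Decide how a server following the rules above answers a request for `path`.

    Returns one of:
      (301, new_url)   external permanent redirect that the browser follows
      (200, file_path) internal rewrite: serve this file, URL bar unchanged
      (404, path)      nothing matched
    """
    # Rule 2: the old .html URL is permanently redirected to the extensionless URL.
    if path.endswith(".html") and path in STATIC_FILES:
        return 301, path[: -len(".html")]
    # Rule 1: an extensionless URL is rewritten internally to its .html file.
    if path + ".html" in STATIC_FILES:
        return 200, path + ".html"
    return 404, path


if __name__ == "__main__":
    for requested in ("/abc", "/abc.html", "/missing"):
        print(requested, "->", route(requested))
```

In a real server the redirect should only fire for the URL the browser actually requested; otherwise the internal rewrite from `/abc` to `/abc.html` can itself trigger the redirect and loop. With Apache's mod_rewrite this is commonly handled by testing `%{THE_REQUEST}` rather than the rewritten path.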
-
Thank you for the replies.
I will try to clarify what I am trying to get at; apologies in advance for any naivety.
I understand homepage canonicalization; the confusion is about how this applies to internal pages.
Logically, I am struggling to see how internal pages are any different from a homepage in terms of the need to avoid multiple URLs for the same content, and so an extensionless URL seemed appropriate. Not to mention the benefit of cleaner URLs that are easier to link to, remember, etc.
i.e.
example/abc
example/abc.html
example/abc/index.html
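The point about avoiding multiple URLs can be made concrete with a small normalizer that collapses the usual duplicate forms of a page (`/abc`, `/abc.html`, `/abc/`, `/abc/index.html`) onto one extensionless URL. This is a sketch under stated assumptions: the `canonical_url` name and the chosen canonical form are illustrative choices, not a standard.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Collapse duplicate forms of a page URL onto one extensionless URL.

    Assumed canonical form (an illustrative choice): lowercase host, no
    trailing 'index.html', no '.html' suffix, no trailing slash except on
    the site root. The query string is kept and any #fragment is dropped.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/index.html"):
        # '/abc/index.html' -> '/abc/' (and '/index.html' -> '/').
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        # '/abc.html' -> '/abc'
        path = path[: -len(".html")]
    # '/abc/' -> '/abc'; an empty path becomes the root '/'.
    path = path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
```

Whatever form is chosen, the idea is that internal links, the 301 target, and any `rel="canonical"` tag all agree on that one URL.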
-
As Nick said, you don't need to do this, but if you do:
1. REWRITE the new URL to the old URL, as your web server needs to know the extension.
2. REDIRECT the old URL to the new one, in case you already have links to the old URLs; you don't want duplicate content.
3. Make sure that all internal links point to the new URL; you don't want unnecessary redirects, as they leak link juice.
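Step 3 can be automated at build time. The sketch below, with hypothetical `rewrite_internal_links` and `strip_html_ext` helpers, rewrites quoted `href` values that point at internal `.html` pages to their extensionless form and leaves external links, `mailto:` links, and `index.html` links alone. It uses a regular expression for brevity; a real build step would be safer with a proper HTML parser.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# href="..." or href='...' (a simplification: unquoted attributes are ignored).
HREF_RE = re.compile(r"""href=(["'])([^"']*)\1""")


def strip_html_ext(href: str, site_host: str) -> str:
    """Return the extensionless form of an internal .html link, else href unchanged."""
    parts = urlsplit(href)
    # Leave external hosts and non-web schemes such as mailto: untouched.
    if parts.scheme not in ("", "http", "https") or parts.netloc not in ("", site_host):
        return href
    filename = parts.path.rsplit("/", 1)[-1]
    # Only strip .html from ordinary pages; index.html links are left as they are.
    if not filename.endswith(".html") or filename == "index.html":
        return href
    return urlunsplit(parts._replace(path=parts.path[: -len(".html")]))


def rewrite_internal_links(html: str, site_host: str) -> str:
    """Point every internal .html link in an HTML string at its extensionless URL."""

    def _sub(match: re.Match) -> str:
        quote, value = match.group(1), match.group(2)
        return f"href={quote}{strip_html_ext(value, site_host)}{quote}"

    return HREF_RE.sub(_sub, html)
```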
-
I'm about to make a whole lot of assumptions about your website to give this answer, so just be aware.
Your website is built as static HTML, hence the .html file extension. If you're seeing websites that don't have file extensions, most likely they are using content management systems (or have some serious /folder/index.html stuff going on).
Having a file extension like .html, .aspx, or .php is not a bad thing. On websites like yours, it is required (unless you do the above subfolder thing or set up server-side rewrites) because the server is returning an actual file rather than a page dynamically generated by a CMS. It has nothing to do with future-proofing.
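The "/folder/index.html" approach mentioned above can be sketched as a one-off conversion of a flat static site: each `abc.html` becomes `abc/index.html`, which most servers will serve at `/abc/` through their default directory index. The `folder_style_path` and `convert_site` helpers are hypothetical, and the sketch assumes no `abc.html` already has a sibling `abc/index.html` that it would overwrite.

```python
from pathlib import Path


def folder_style_path(html_file: Path) -> Path:
    """Map 'abc.html' to 'abc/index.html' so the page can be served at /abc/.

    Files already named index.html are returned unchanged.
    """
    if html_file.name == "index.html":
        return html_file
    return html_file.with_suffix("") / "index.html"


def convert_site(root: Path) -> list[tuple[Path, Path]]:
    """Move every page under `root` into folder-style layout; return the moves made."""
    moves = []
    # Collect the full list first so moving files does not disturb the directory walk.
    for src in sorted(root.rglob("*.html")):
        dst = folder_style_path(src)
        if dst != src:
            moves.append((src, dst))
    for src, dst in moves:
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.rename(dst)
    return moves
```

Links inside the pages would still need updating to the new folder URLs after such a conversion.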
As for 301'ing extensionless URLs to ones with an extension... well, I don't know why you'd need to do that for your type of site.