Lately I have noticed Google indexing many files on the site without the .html extension
-
Hello,
Our site, while we convert, remains in HTML 4.0.
File names such as http://www.sample.com/samples/index.shtml are being picked up in the SERPs as http://www.sample.com/samples/, even when I use the rel="canonical" tag and specify the full file name in it, as recommended. Moz shows the truncated URL (http://www.sample.com/samples/) as having fewer incoming links than the full file name.
I am not sure whether this is causing the loss in placement I have seen recently (the Moz stats are showing a decline of late). Of course, I am aware of other possible reasons, such as not being in HTML5 yet.
Any help with this would be great.
Thank you in advance
-
Can you clarify what you're concerned about for 301 redirects in terms of link juice?
A 301 redirect doesn't pass as much link juice as a direct link, but that doesn't affect links that already point to the correct URL. It only affects the links that otherwise wouldn't pass any link juice to your end destination at all. (Though, if your canonical is working correctly, it'll pass the same amount of link juice as a 301 redirect.)
Dr. Pete goes into this a bit more over here: https://mza.bundledseo.com/community/q/do-canonical-tags-pass-all-of-the-link-juice-onto-the-url-they-point-to
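To check what canonical each page actually declares, something like the rough Python sketch below could help. It only parses HTML you have already downloaded. The names `CanonicalFinder` and `declared_canonical` are made up for illustration, and it assumes the canonical is set with a `<link>` tag in the markup rather than an HTTP header.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values and any letter case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def declared_canonical(html, page_url):
    """Return the absolute canonical URL declared in html, or None.

    None is returned when there is no canonical tag, or when the page
    declares more than one distinct canonical (conflicting tags).
    """
    finder = CanonicalFinder()
    finder.feed(html)
    # Resolve relative hrefs against the URL the page was fetched from.
    urls = {urljoin(page_url, href) for href in finder.canonicals}
    return urls.pop() if len(urls) == 1 else None
```

If the value returned for http://www.sample.com/samples/ is not the full .shtml URL, the tag itself is the first thing to fix before worrying about redirects.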
-
Many thanks for taking the time to respond Kristina.
-
I don't like to do redirects, as so many have warned of the consequences in terms of link juice.
-
No, I don't link to the pages in question using "/" rather than the ".shtml" version of the page indexed.
-
I have found a few external sources (recent linkers) that use the "/" version, but they likely did so only because they saw it displayed that way in the SERPs previously. No commercial or other affiliate sites do.
The reason I was really confused is that some pages are indexed using the "/", while others are not, with no apparent reason I could locate. The "/" versions of pages still remain on the first page for their keywords, even with far fewer domain authorities and pages linking to them (for now!). We will be moving to another platform with a different default extension, so I wonder how that will be handled. Endless mysteries.
Thank you again for your time and suggestions,
Greg
-
-
Hmm, that doesn't seem good. It's hard to say whether this is causing the decline in your rankings, but either way, you want to make sure that you're not splitting your link equity between your / and .shtml pages. Here's what I'd do:
- If you can, 301 redirect / pages to .shtml pages. Obviously, it'd be easier if the canonical worked, but it sounds like it doesn't.
- Use Screaming Frog or DeepCrawl to look through internal pages on your site to see if you're ever linking to the / version of pages rather than the .shtml pages. When Google chooses a different version of a URL over the canonical one, it's often because that's how it sees internal links pointing to the page. Make sure that you only have links to the .shtml version of the page.
- Use a tool like Moz or Ahrefs to find all external links to your site. For any links that you built, or where you have a partnership with the owners, make sure that they're linking to the .shtml version of the page. I could especially see your ad partners using / because it looks cleaner in front of URL parameters than .shtml does.
After that, wait and see if Google fixes the problem.
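If you'd rather spot-check a few pages by hand than run a full crawl, the internal-link check can be sketched in a few lines of Python. This is only an illustration, not a replacement for a crawler. The function name `directory_style_internal_links` is hypothetical, and `default_doc="index.shtml"` is an assumption based on the example URL in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def directory_style_internal_links(html, page_url, default_doc="index.shtml"):
    """Return internal links on the page whose path ends in "/".

    Each result is a pair (link_as_resolved, suggested_full_url).
    default_doc is an assumption about the server's directory default.
    """
    site_host = urlsplit(page_url).netloc.lower()
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.netloc.lower() != site_host:
            continue  # external link, not part of this check
        if parts.path == "" or parts.path.endswith("/"):
            path = parts.path or "/"
            suggested = parts._replace(path=path + default_doc).geturl()
            flagged.append((absolute, suggested))
    return flagged
```

Feed it the HTML of a page together with that page's URL, and every pair it returns is an internal link you would want to change to the suggested .shtml address.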
Also worth noting: have you thought about changing your default to /? That's more common today, so you're probably getting a lot of external links with / instead of .shtml, and you'll never be able to fix that problem. If that's a possible solution, you may want to explore it.
Good luck!
Kristina