Lately I have noticed Google indexing many files on the site without the .html extension
-
Hello,
Our site, while we convert, remains in HTML 4.0.
File names such as http://www.sample.com/samples/index.shtml are being picked up in the SERPs as http://www.sample.com/samples/, even when I use the rel="canonical" tag and specify the full file name in it, as recommended. Moz shows the truncated URL (http://www.sample.com/samples/) as having fewer incoming links than the full file name.
I am not sure whether this is causing the loss in placement I have seen recently (the Moz stats are showing a decline of late). Of course, I am aware of other possible reasons, such as not being in HTML5 yet.
Any help with this would be great.
Thank you in advance
-
Can you clarify what you're concerned about for 301 redirects in terms of link juice?
301 redirects don't carry as much link juice as a direct link, but a redirect doesn't affect links that already point to the correct URL. It only affects the links that otherwise wouldn't pass any link juice to your end destination at all. (Though, if your canonical is working correctly, it'll pass the same amount of link juice as a 301 redirect.)
Dr. Pete goes into this a bit more over here: https://mza.bundledseo.com/community/q/do-canonical-tags-pass-all-of-the-link-juice-onto-the-url-they-point-to
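If it helps to confirm that the canonical is actually present in the page as served, here is a rough Python sketch that pulls the rel="canonical" URL out of a page's HTML. The names `canonical_of` and `sample_page` are made up for illustration, and the sample markup is an assumption based on the URLs in the question, not a copy of the real page.

```python
from html.parser import HTMLParser


class _CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def canonical_of(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, modeled on the URLs in the question.
sample_page = (
    '<head><link rel="canonical" '
    'href="http://www.sample.com/samples/index.shtml"></head>'
)
print(canonical_of(sample_page))
```

If this returns None, or the "/" URL, for the pages Google is indexing without the file name, the tag itself is the first thing to fix.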
-
Many thanks for taking the time to respond Kristina.
-
I don't like to do redirects, as so many have warned of the consequences in terms of link juice.
-
No, I don't link to the pages in question using "/" rather than the ".shtml" version of the page indexed.
-
I have found that a few external sources (recent linkers) use the "/" version, but they likely did so only because they saw it displayed that way in the SERPs previously. No commercial or other affiliate sites do.
The reason I was really confused is that some pages are indexed using the "/" while others are not, for no apparent reason that I could find. The "/" versions of pages still remain on the first page for keywords, even with far fewer domain authorities and pages linking to them (for now!). We will be moving to another platform with a different default extension, so I wonder how that will be handled. Endless mysteries.
Thank you again for your time and suggestions,
Greg
-
Hmm, that doesn't seem good. It's hard to say whether this is causing the decline in your rankings, but either way, you want to make sure that you're not splitting your link equity between your / and .shtml pages. Here's what I'd do:
- If you can, 301 redirect / pages to .shtml pages. Obviously, it'd be easier if the canonical worked, but it sounds like it doesn't.
- Use ScreamingFrog or DeepCrawl to look through internal pages on your site to see if you're ever linking to the / version of pages rather than the .shtml pages. When Google chooses a different version of a URL over the canonical one, it's often because that's how it sees internal links pointing to the page. Make sure that you only have links to the .shtml version of the page.
- Use a tool like Moz or Ahrefs to find all external links pointing to your site. For any links that you built, or where you have a partnership with the site owners, make sure that they're linking to the .shtml version of the page. I could especially see your ad partners using / because it's cleaner before parameters than .shtml.
After that, wait and see if Google fixes the problem.
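For the internal-link check, a crawler will do this across the whole site, but the idea for a single page can be sketched in a few lines of Python. This is only an illustration under assumptions: `find_directory_style_links` is a made-up helper name, the sample HTML is invented, and "internal" here simply means "same host as the page being checked".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_directory_style_links(html, page_url):
    """Return same-host links on this page whose path ends in "/"."""
    collector = _AnchorCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    flagged = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.netloc == site_host and parts.path.endswith("/"):
            flagged.append(absolute)
    return flagged


# Invented page content, using the placeholder domain from the question.
sample_html = """
<a href="/samples/">Samples (directory form)</a>
<a href="/samples/index.shtml">Samples (full file name)</a>
<a href="http://other.example/">External site</a>
"""
page = "http://www.sample.com/about.shtml"
flagged_links = find_directory_style_links(sample_html, page)
print(flagged_links)  # ['http://www.sample.com/samples/']
```

Any URL this flags is an internal link that points at the / version and should be changed to the full .shtml file name.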
Also worth noting: have you thought about changing your default to /? That's more common today, so you're probably getting a lot of external links with / instead of .shtml, and you'll never be able to fix that problem. If that's a possible solution, you may want to explore it.
Good luck!
Kristina