Canonical and Sitemap issue
-
Hi all,
I was told that I could change my homepage canonical tag to match my XML sitemap. The sitemap is generated for me automatically and lists the homepage as https://www.mysite.com/index.html, yet my canonical tag is set to https://www.mysite.com.
Google currently shows https://www.mysite.com/ as indexed, but https://www.mysite.com/index.html does not appear in search results.
Can someone please tell me whether I should change the canonical to the index.html version, do nothing, or remove the canonical tag altogether?
Thank you for looking.
-
I agree with the others. Given that "https://www.mysite.com/index.html is not currently displayed in search results", in all likelihood it is being redirected to https://www.mysite.com (and should be). So you don't want to change the canonical to the index.html version of the page only to have it redirected back to https://www.mysite.com. Pointing the canonical at a URL that redirects sends Google conflicting signals, adds a needless hop, and might even create a loop.
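If you would rather confirm the redirect than assume it, you can request the index.html URL without following redirects and look at the status code and Location header. Here is a minimal sketch in Python using only the standard library. The function names are made up for illustration, and mysite.com is just the placeholder domain from the question.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_status_and_location(url):
    """Send one HEAD request and do not follow redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection
                if parts.scheme == "https" else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirects_to(request_url, status, location, expected):
    """True if the response is a redirect whose target is `expected`.

    A relative Location (such as "/") is resolved against the requested
    URL, and a trailing slash is ignored on both sides.
    """
    if status not in (301, 302, 307, 308) or not location:
        return False
    target = urljoin(request_url, location)
    return target.rstrip("/") == expected.rstrip("/")


# Example usage (needs network access, so it is left commented out):
# url = "https://www.mysite.com/index.html"
# status, location = fetch_status_and_location(url)
# print(status, location,
#       redirects_to(url, status, location, "https://www.mysite.com"))
```

A 301 with a Location of https://www.mysite.com/ would support leaving the canonical where it is. A plain 200 would mean both URLs serve the page, and the canonical tag is what tells Google which one you prefer.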
-
Thank you both. I'll leave it as it is; sadly, I'm not able to edit the XML on my side.
-
Yes, that's a good point. Canonicals are suggestions for Google, not commands.
-
I see your point, and don't worry about it. Sitemaps help Google find all of your pages and can provide certain other information, but they are not required, so there is no need to overthink them. In general, Google is pretty good at finding what it needs to find, and it will certainly find your homepage.
-
I agree with Linda here: I would leave the canonical tag as is. It is a cleaner, better-looking URL for the SERPs. If anything, manually update the XML file to reflect the canonical version of the homepage. The main purpose of the XML sitemap is to help search engines crawl and index a website. The homepage is going to be the most frequently crawled page, so Google will not have a problem finding it.
Also, do not worry about Google disliking the canonical pointing to .com instead of /index.html. If Google determines that it is not the ideal URL for its index, it will ignore the canonical tag.
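If you want to see exactly where the sitemap and the canonical disagree, a short script can compare the two. This is only a rough sketch in Python using the standard library. The helper names are hypothetical, the HTML and sitemap snippets are stand-ins built from the URLs in the question, and treating /index.html as equivalent to the bare directory URL is an assumption that holds for most servers but not all.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html_text):
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def sitemap_urls(sitemap_xml):
    """Return every <loc> value in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def strip_index(url):
    """Drop a trailing index.html and any trailing slash (assumed equivalent)."""
    if url.endswith("/index.html"):
        url = url[: -len("index.html")]
    return url.rstrip("/")


# Stand-in inputs based on the URLs in the question.
html_text = '<html><head><link rel="canonical" href="https://www.mysite.com"></head></html>'
sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.mysite.com/index.html</loc></url>
</urlset>"""

canonical = find_canonical(html_text)
urls = sitemap_urls(sitemap_xml)

exact_match = canonical in urls
same_page = strip_index(canonical) in {strip_index(u) for u in urls}

print("canonical:", canonical)
print("listed in sitemap exactly:", exact_match)
print("listed once index.html is ignored:", same_page)
```

With the inputs above, the exact comparison fails but the normalised one succeeds, which is the situation described in the question: the sitemap and the canonical name the same page under two different URLs.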
-
Hi,
Thanks. Basically, I was concerned that Google might not like that https://www.mysite.com/ was missing from the sitemap while index.html was listed, with the canonical pointing to https://www.mysite.com.
If that makes any sense....
-
What are you trying to achieve? Do you particularly want the index.html version to be the canonical? The https://www.mysite.com/ version is more straightforward and what most people would expect your homepage URL to be.
Unless there is some pressing reason to do otherwise, I'd leave it the way it is.