Sitemap url's not being indexed
-
There is an issue on one of our sites regarding many of the sitemap url's not being indexed. (at least 70% is not being indexed)
The url's in the sitemap are normal url's without any strange characters attached to them, but after looking into it, it seems a lot of the url's get a #. + a number sequence attached to them once you actually go to that url. We are not sure if the "addthis" bookmark could cause this, or if it's another script doing it.
For example
Url in the sitemap: http://example.com/example-category/0246
Url once you actually go to that link: http://example.com/example-category/0246#.VR5a
Just for further information, the XML file does not have any style information associated with it and is in it's most basic form.
Has anyone had similar issues with their sitemap not being indexed properly ?...Could this be the cause of many of these url's not being indexed ?
Thanks all for your help.
-
Anders,
Thanks for the reply. I definitely agree a self referring canonical might just be a good extra addition on these product pages, so I'm definitely adding that to our list of to do's if it does not improve.
In terms of indexing pages - We have not restricted crawl frequency, we have it set to "allow google to determine the optimal crawl rate". No other warnings found within the search console either.
Thanks for your help.
-
I agree - i probably would ignore everything after the "#".
But have you tried added a <link rel="canonical" href="http://example.com/page-url" /> to your pages and see if this will update it? Also: Add the sitemap to your robots.txt if not allready done.
Regarding indexing pages - have you restricted crawl frequency in Google Search Console, or is it set to be determined by GoogleBot? Any other warnings or messages in Search Console?
Best regards,
Anders -
Lesley,
Thanks for the confirmation on that one and the article. Since it doesn't seem like a lot of people on the site are using that address share function, I do not think it would do any harm to remove it.
At least we know the root cause of why it's doing it to the url's. Now the real question is...could it be getting in the way of indexing those url's ?...one would think not, as from what I've read, google would simply ignore what comes after the #.
Thoughts ?
Appreciate the help.
-
Patrick,
We'd prefer to keep the actual url's private, however I can provide further information to help hopefully allow the community to dissect this further:
- It's an E-commerce website, meaning many facets, filters, and possible duplicate content angles
- It seems many of the static pages (/products main page, /contact,etc) are indexed, however it seems the individual products are mostly not being indexed through the sitemap
- While the url's found in webmaster tools under "index" has also steadily been going down, it definitely doesn't correspond with the lack of pages indexed vs submitted within the sitemap
- We have checked robots.txt, and it is not blocking any important pages. (I also had them allow robots to crawl css and js so google could have full access)
- The individual product pages all have the "addthis" feature, meaning they all have a #. + number sequence added to the url's. However one would think this wouldn't be the cause of this lack of indexation ?
Thanks for your help.
-
Yes, add this is doing this to your url. I hate it, that is one reason why I do not use them.
Here is an article on how to remove them, http://support.addthis.com/customer/portal/articles/1013558-removing-all-hashtags-anchors-weird-codes-from-your-urls
-
Hi there
Could you provide you website's URL? It would help the community take a deeper look - thanks!
Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My Website's Home Page is Missing on Google SERP
Hi All, I have a WordPress website which has about 10-12 pages in total. When I search for the brand name on Google Search, the home page URL isn't appearing on the result pages while the rest of the pages are appearing. There're no issues with the canonicalization or meta titles/descriptions as such. What could possibly the reason behind this aberration? Looking forward to your advice! Cheers
Technical SEO | | ugorayan0 -
Sitemap generator partially finding list of website URLs
Hi everyone, When creating my XML sitemap here it is only able to detect a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs and all of them are index and they're not blocked by the robots.txt. Any idea why this is happening? I need to make sure all wanted URLs to be generated in an XML sitemap. Thanks!
Technical SEO | | Taysir0 -
What's with the redirects?
Hi there,
Technical SEO | | HeadStud
I have a strange issue where pages are redirecting to the homepage.Let me explain - my website is http://thedj.com.au Now when I type in www.thedj.com.au/payments it redirects to https://thedj.com.au (even though it should be going to the page https://thedj.com.au/payments). Any idea why this is and how to fix? My htaccess file is below: BEGIN HTTPS Redirection Plugin <ifmodule mod_rewrite.c="">RewriteEngine On
RewriteRule ^home.htm$ https://thedj.com.au/ [R=301,L]
RewriteRule ^photos.htm$ http://photos.thedj.com.au/ [R=301,L]
RewriteRule ^contacts.htm$ https://thedj.com.au/contact-us/ [R=301,L]
RewriteRule ^booking.htm$ https://thedj.com.au/book-dj/ [R=301,L]
RewriteRule ^downloads.htm$ https://thedj.com.au/downloads/ [R=301,L]
RewriteRule ^payonline.htm$ https://thedj.com.au/payments/ [R=301,L]
RewriteRule ^price.htm$ https://thedj.com.au/pricing/ [R=301,L]
RewriteRule ^questions.htm$ https://thedj.com.au/faq/ [R=301,L]
RewriteRule ^links.htm$ https://thedj.com.au/links/ [R=301,L]
RewriteRule ^thankyous/index.htm$ https://thedj.com.au/testimonials/ [R=301,L]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://thedj.com.au/ [L,R=301]</ifmodule> END HTTPS Redirection Plugin BEGIN WordPress <ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /
RewriteRule ^index.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]</ifmodule> END WordPress RewriteCond %{HTTP_HOST} ^mrdj.net.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.mrdj.net.au$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L] RewriteCond %{HTTP_HOST} ^mrdj.com.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.mrdj.com.au$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L] RewriteCond %{HTTP_HOST} ^thedjs.com.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.thedjs.com.au$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L] RewriteCond %{HTTP_HOST} ^theperthweddingdjs.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.theperthweddingdjs.com$
RewriteRule ^/?$ "https://thedj.com.au/" [R=301,L] RewriteCond %{HTTP_HOST} ^thedjs.net.au$ [OR]
RewriteCond %{HTTP_HOST} ^www.thedjs.net.au$
RewriteRule ^/?$ "https://thedj.com.au" [R=301,L]0 -
Drupal's Yoast
Hi. I'm wondering if anyone knows of an equivalent to Yoast for Drupal sites? Is there such a thing? I've been asked whether I could optimize a Drupal site and am wondering if the guiding principles and techniques I use for HTML and Wordpress sites can be easily transferred to a Drupal implementation, or whether I might be setting myself (and the client!) up for failure. Any observations or advice would be appreciated.
Technical SEO | | DonnaDuncan0 -
Why are URLs like www.site.com/#something being indexed?
So, everything after a hash (#) is not supposed to be crawled and indexed. Has that changed? I see a clients site with all sorts of URLs indexed like ... http://www.website.com/#!category/c11f For the above URL, I thought it was the same as simply http://www.website.com/. But they aren't, they're getting indexed and all the content on the pages with these hash tags are getting crawled as well. Thanks!
Technical SEO | | wiredseo0 -
When do you use 'Fetch as a Google'' on Google Webmaster?
Hi, I was wondering when and how often do you use 'Fetch as a Google'' on Google Webmaster and do you submit individual pages or main URL only? I've googled it but i got confused more. I appreciate if you could help. Thanks
Technical SEO | | Rubix1 -
If a page isn't linked to or directly sumitted to a search engine can it get indexed?
Hey Guys, I'm curious if there are ways a page can get indexed even if the page isn't linked to or hasn't been submitted to a search engine. To my knowledge the following page on our website is not linked to and we definitely didn't submit it to Google - but it's currently indexed: <cite>takelessons.com/admin.php/adminJobPosition/corp</cite> Anyone have any ideas as to why or how this could have happened? Hopefully I'm missing something obvious 🙂 Thanks, Jon
Technical SEO | | TakeLessons0 -
Issue with Joomla Site not showing in SERP's
Site: simpsonelectricnc dot com I'm working on a Joomla website for a local business that isn't ranking at all for any relevant keyword - including the business name. The site is only about six months old and has relatively few links. I realize it takes time to compete for even low-volume keywords, but I think something else may be preventing the site from showing up. The site is not blocked by Robots.txt (which includes a valid reference to the sitemap)
Technical SEO | | CGR-Creative
There is no duplicate content issue, the .htaccess is redirecting all non-www traffic to www version
Every page has a unique title and H1 tag.
The URL's are search-engine friendly (not dynamic either)
XML sitemap is live and submitted to Google WM Tools. Google shows that it is indexing about 70% of the submitted URL's. The site has essentially no domain authority (0.02) according to Moz - I'm assuming this is due to lack of links and short life on the web.
Until today, 98% of the pages had identical meta descriptions. Again, I realize six months is not an eternity - but the site will not even show up for "business name + city,state" searches in Google. In fact, the only way I've seen it in organic results is to search for the exact URL. I would greatly appreciate any help.0