Sitemap generator partially finding list of website URLs
-
Hi everyone,
When I create my XML sitemap here, the generator only detects a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs: all of them are indexed and none of them are blocked by robots.txt.
Any idea why this is happening? I need to make sure all the URLs I want are included in the XML sitemap.
Thanks!
-
Gaston,
Interestingly enough, by default the generator located only half of the URLs. I hope that one of those 2 fields will do the trick.
-
Hi Taysir,
I've never used that service. I suspect that the section you refer to should do the trick.
I believe you know how many URLs there are on the whole site, so you can compare how many pro-sitemaps.com finds against your own count.
Best of luck!
GR
-
Thanks for your response, Gaston. These pages are definitely not blocked by the robots.txt file. I think that it is an internal linking problem. I actually subscribed to pro-sitemaps.com and was wondering if I should use this section to add the remaining URLs that are missing from the sitemap: https://cl.ly/0k0t093f0Y1T
Do you think this would do the trick?
-
Google provides a basic template, so you could build the sitemap manually if you wished, and this link has Google listing several dozen open source sitemap generators.
If Google Webmaster Tools can't fully read the one you generated, then an alternate generator should fix that for you. Good luck!
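If you do go the manual route, a small script can write the file from a plain list of URLs. This is only a minimal sketch, assuming you already have the full URL list (including the ones the generator missed); the function name `build_sitemap` and the example.com URLs are made up for illustration and are not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, listing each URL once in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each page is listed only once
        seen.add(url)
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url  # ElementTree escapes characters such as "&"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical URLs: crawled pages plus the ones added by hand.
    pages = [
        "https://www.example.com/",
        "https://www.example.com/blog/new-post/",
        "https://www.example.com/resources/guide/",
    ]
    print(build_sitemap(pages))
```

Save the output as sitemap.xml and submit it as usual. Only the required `<loc>` element is written here; optional elements such as `<lastmod>` are left out to keep the sketch short.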
-
Hi Taysir!
Have you tried any other crawler to check whether those pages can be found?
I'd strongly suggest the Screaming Frog spider; the free version allows you to crawl up to 500 URLs. It also has a feature to create sitemaps from the crawled URLs, though I don't know if that is available in the free version.
Here is some info about that feature: XML sitemap generator - Screaming Frog
Usual reasons pages can't be found are:
- Poor internal linking
- Not having a sitemap (this is why you find out)
- Blocked resources in robots.txt
- Blocked pages with robots meta tag
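The two blocking checks in the list above can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not how any particular crawler does it: the helper names `blocked_by_robots_txt` and `has_noindex_meta` are made up for illustration, and it assumes you have already downloaded the robots.txt content and the page HTML as strings.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots_txt(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex_meta(html):
    """Return True if the page HTML carries a robots meta tag with noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Running both helpers over the 20 missing URLs would quickly rule these two causes in or out, which leaves internal linking as the more likely explanation.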
That being said, it's completely normal that Google has indexed pages that you can't find in an ad hoc crawl, because Googlebot could have found those pages through external links.
Also keep in mind that blocking pages with robots.txt or a robots meta tag will not prevent those pages from being indexed, nor will adding rules to block them get them deindexed.
Hope it helps.
Best luck
GR