Sitemap generator partially finding list of website URLs
-
Hi everyone,
When I create my XML sitemap here, the generator only detects a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs: all of them are indexable, and they're not blocked by robots.txt.
Any idea why this is happening? I need to make sure all wanted URLs are included in the generated XML sitemap.
Thanks!
-
Gaston,
Interestingly enough, by default the generator located only half of the URLs. I hope that one of those 2 fields will do the trick.
-
Hi Taysir,
I've never used that service. I suspect that the section you refer to should do the trick.
I believe you know how many URLs there are on the whole site, so you can compare how many pro-sitemaps.com finds against your own numbers. Best of luck!
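One way to make that comparison concrete is sketched below, assuming you have the generated sitemap as a string and your own list of expected URLs. The function name `missing_from_sitemap` and the example.com URLs are made up for illustration; this is not part of pro-sitemaps.com or any other tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def missing_from_sitemap(sitemap_xml, expected_urls):
    """Return the expected URLs that do not appear in the sitemap, sorted."""
    root = ET.fromstring(sitemap_xml)
    # Collect every <loc> value listed under <url> entries.
    found = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }
    return sorted(set(expected_urls) - found)

# Hypothetical data: the sitemap lists one page, but two are expected.
sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "</urlset>"
)
expected = ["https://example.com/", "https://example.com/blog/new-post"]
print(missing_from_sitemap(sitemap, expected))
```

The comparison is an exact string match, so a trailing slash or an http/https difference will show up as a missing URL.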
GR
-
Thanks for your response, Gaston. These pages are definitely not blocked by the robots.txt file. I think it is an internal linking problem. I actually subscribed to pro-sitemaps.com and was wondering if I should use this section to add the remaining URLs that are missing from the sitemap: https://cl.ly/0k0t093f0Y1T
Do you think this would do the trick?
-
Google provides a basic template, so you could build the sitemap manually if you wished, and this link has Google listing several dozen open source sitemap generators.
If Google Webmaster Tools can't fully read the one you generated, then an alternate generator should fix that for you. Good luck!
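If you go the manual route, a minimal sketch of building the file from a list of URLs could look like the following. It assumes you already have the full URL list; `build_sitemap` and the example.com URLs are hypothetical names for illustration, and only the required `<loc>` element of the sitemap protocol is written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string listing each URL once, in input order."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so each page is listed once
        seen.add(url)
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical URLs, including the kind of pages a crawler might miss.
pages = [
    "https://example.com/",
    "https://example.com/blog/new-post",
    "https://example.com/resources/guide",
]
print(build_sitemap(pages))
```

Because the list is supplied by hand, pages with poor internal linking still end up in the sitemap, which is the gap a crawler-based generator leaves.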
-
Hi taysir!
Have you tried any other crawler to check whether those pages can be found?
I'd strongly suggest the Screaming Frog spider; the free version allows you to crawl up to 500 URLs. It also has a feature to create sitemaps from the crawled URLs, though I don't know if that is available in the free version.
Here is some info about that feature: XML sitemap generator - Screaming Frog
Usual reasons for pages not being findable are:
- Poor internal linking
- Not having a sitemap (this is why you find out)
- Blocked resources in robots.txt
- Blocked pages with robots meta tag
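The robots.txt item in the list above can be checked with a short script. This is only a sketch using Python's standard `urllib.robotparser`; the helper name `blocked_urls`, the sample rules, and the example.com URLs are assumptions for illustration, and it does not cover the robots meta tag case.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # Parse the rules from text so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]

# Hypothetical robots.txt content and URLs to test.
robots = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://example.com/blog/new-post",
    "https://example.com/private/draft",
]
print(blocked_urls(robots, candidates))
```

Paste in the real robots.txt content and the missing URLs; an empty result suggests the cause is more likely internal linking than blocking.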
That being said, it's completely normal that Google has indexed pages that you can't find in an ad hoc crawl; GoogleBot could have found those pages through external links.
Also keep in mind that blocking pages with robots.txt or a robots meta tag will not prevent them from being indexed, nor will adding blocking rules make already-indexed pages drop out of the index.
Hope it helps.
Best luck
GR