How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
-
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated!
Thanks
-
I agree with Chris. With such large websites it would be advisable having a sitemap index and then splitting the index into various individual indexes such as Pages, Products, Categories, images, media, tags etc.
-
The easiest thing i can think of is to write a script that works with your dispatcher to create a site map. The format I would use is add the page and all of the "product images" on the page to the map and move to the next. At the same time I would use an auto increment variable to keep track of how many lines you have written. When you get around 50k, write out the name of the next site map file that the program will create and have them chained together this way.
-
That's a great help Chris, thank you! And thanks to all for your help!
-
Typically, a sitemap is going to include every page on the site. As Francesca said, each sitemap can be up to 50K urls and if you need multiple sitemaps then you create a sitemap index that points to the rest of the sitemaps.
-
Thanks for the feedback!
I will look into screamingfrog for sure.
@Lesley - we are using a custom platform (in house) so we don't have that functionality. The issue is that we have a lot of inventory (millions) of cars. We have built (and are releasing new functionality today) to provide internal links so that Google can crawl all the inventory easily (users can too :). My question about sitemaps has boiled down to this: Do we need to build the sitemap to include every single page (all the inventory) or do we provide a "map" so that google can find the top pages and then crawl the inventory from there. Again the site is bestride.com. If anyone wants to take a look at the site, that would be fantastic!
Thanks
-
Are you using a custom platform or an off the shelf e-commerce package? Most off the shelf packages actually have a module that can create a site map and a lot have it where you can cron it too.
-
Of course, you can also use the moz's crawl test report at http://pro.moz.com/tools/crawl-test
-
Hi Kristin,
Each sitemap.xml can support maximum 50.000 URLs. So, If you have a site with more than 100K, It'd be better to create 2 or 3 o 4 etc sitemaps.xml in order to contain all URLs. Hope it is useful.
Kind regards!
Francesca
-
You can use screamingfrog to create your sitemap. You just need to license it for crawl more than 500 URI.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Old pages not mobile friendly - new pages in process but don't want to upset current traffic.
Working with a new client. They have what I would describe as two virtual websites. Same domain but different coding, navigation and structure. Old virtual website pages fail mobile friendly, they were not designed to be responsive ( there really is no way to fix them) but they are ranking and getting traffic. New virtual website pages pass mobile friendly but are not SEO optimized yet and are not ranking and not getting organic traffic. My understanding is NOT mobile friendly is a "site" designation and although the offending pages are listed it is not a "page" designation. Is this correct? If my understanding is true what would be the best way to hold onto the rankings and traffic generated by old virtual website pages and resolve the "NOT mobile friendly" problem until the new virtual website pages have surpassed the old pages in ranking and traffic? A proposal was made to redirect any mobile traffic on the old virtual website pages to mobile friendly pages. What will happen to SEO if this is done? The pages would pass mobile friendly because they would go to mobile friendly pages, I assume, but what about link equity? Would they see a drop in traffic ? Any thoughts? Thanks, Toni
Technical SEO | | Toni70 -
Soft 404's on a 301 Redirect...Why?
So we launched a site about a month ago. Our old site had an extensive library of health content that went away with the relaunch. We redirected this entire section of the site to the new education materials, but we've yet to see this reflected in the index or in GWT. In fact, we're getting close to 500 soft 404's in GWT. Our development team confirmed for me that the 301 redirect is configured correctly. Is it just a waiting game at this point or is there something I might be missing? Any help is appreciated. Thanks!
Technical SEO | | MJTrevens0 -
Disallowing WP 'author' page archives
Hey Mozzers. I want to block my author archive pages, but not the primary page of each author. For example, I want to keep /author/jbentz/ but get rid of /author/jbentz/page/4/. Can I do that in robots by using a * where the author name would be populated. ' So, basically... my robots file would include something like this... Disallow: /author/*/page/ Will this work for my intended goal... or will this just disallow all of my author pages?
Technical SEO | | Netrepid0 -
Are the duplicate content and 302 redirects errors negatively affecting ranking in my client's OS Commerce site?
I am working on an OS Commerce site and struggling to get it to rank even for the domain name. Moz is showing a huge number of 302 redirects and duplicate content issues but the web developer claims they can not fix those because ‘that is how the software in which your website is created works’. Have you any experience of OS Commerce? Is it the 302 redirects and duplicate content errors negatively affecting the ranking?
Technical SEO | | Web-Incite0 -
ECommerce: Best Practice for expired product pages
I'm optimizing a pet supplies site (http://www.qualipet.ch/) and have a question about the best practice for expired product pages. We have thousands of products and hundreds of our offers just exist for a few months. Currently, when a product is no longer available, the site just returns a 404. Now I'm wondering what a better solution could be: 1. When a product disappears, a 301 redirect is established to the category page it in (i.e. leash would redirect to dog accessories). 2. After a product disappers, a customized 404 page appears, listing similar products (but the server returns a 404) I prefer solution 1, but am afraid that having hundreds of new redirects each month might look strange. But then again, returning lots of 404s to search engines is also not the best option. Do you know the best practice for large ecommerce sites where they have hundreds or even thousands of products that appear/disappear on a frequent basis? What should be done with those obsolete URLs?
Technical SEO | | zeepartner1 -
Mega Menus - Site Links - Bottom of the Page
Here are the questions: If you replace your top menu with a mega menu - like rei.com, target.com etc - that has dramatically more links and lots of non-optimized testimonials and calls for action, and locate the actual code of the mega menu at the bottom of the HTML , How will this affect your sitelinks? Will this now, make your on-page content more visible and indexable? Or does the Google bott dismiss this as just navigation content? In the past, I've have seen this technique work well, but that was before site links were easier to obtain. Looking at sites with virtually no navigation on their home pages and good authority, I've seen site links seemingly gleamed from alt attributes.
Technical SEO | | Runner20090 -
Sitemap coming up in Google's index?
I apologize if this question's answer is glaringly obvious, but I was using Google to view all the pages it has indexed of our site--by searching for our company and then clicking the link that says to display more results for the site. On page three, it has the sitemap indexed as if it wee just another page of our site. <cite>www.stadriemblems.com/sitemap.xml</cite> Is this supposed to happen?
Technical SEO | | UnderRugSwept0 -
Canonical on ecommerce site
I have read tons of guides about canonical implementaiton but still am confused about how I should best use it. On my site with tens of thousands of urls and thousands of afiiliates and shopping networks sending traffic, is it smart to simply add the tag to every page and redirect to the same url. In doing this would that solve the problem of a single page having many different entrances with different tracking codes? Is there a better way to handle this? Also is there any potential problems with rolling out the tag to all pages if they are simply refrencing themselves in the tag? Thanks in advance.
Technical SEO | | Gordian0