Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs come from the facet options, since each facet combination yields a different URL. However, we need to do a deeper analysis of these URLs to confirm that this is the only reason so many are being returned.
Does anyone know if there are any negatives of having so many URLs crawled, other than the fact that Google only spends so much time crawling a site? Is the number of URLs something that should be concerning?
Any insight appreciated!
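For the deeper analysis mentioned above, one possible starting point is to group a crawl export by path and by the set of query parameter names, to see which facet combinations account for most of the URLs. This is only a rough sketch: the function name `summarize_urls` and the sample URLs are made up for illustration, and a real export would be read from a file rather than a hard-coded list.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def summarize_urls(urls):
    """Count URLs per (path, sorted parameter names) pattern."""
    patterns = Counter()
    for url in urls:
        parts = urlsplit(url)
        # Only the parameter names matter here, not their values.
        names = tuple(sorted({name for name, _ in
                              parse_qsl(parts.query, keep_blank_values=True)}))
        patterns[(parts.path, names)] += 1
    return patterns


# Hypothetical sample standing in for a crawl export.
sample = [
    "https://www.example.com/shoes?color=blue&size=9",
    "https://www.example.com/shoes?size=10&color=red",
    "https://www.example.com/shoes?color=green",
    "https://www.example.com/shoes",
]

summary = summarize_urls(sample)
for (path, names), count in summary.most_common():
    print(count, path, names)
```

Patterns with very high counts relative to the number of real products or categories are the likely facet-driven ones; anything that does not fit a known facet pattern is worth a closer look.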
-
Agree with the points above with one exception. Yes, you have to find a way to deal with duplicate and low-quality content at scale. Yes, robots.txt, nofollow links and index sitemaps are your friends. I would not use rel=canonical unless I had to. Better to get those extra pages de-indexed and then not let Google crawl the URLs with the extra parameters to start with. Why waste Google's time crawling pages that are just re-sorted versions of another? If you use the directives wisely you probably "only" have 200,000 pages worth crawling if you have that many sort parameters.
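As a rough illustration of blocking parameterized URLs, the sketch below generates robots.txt rules for a list of query parameters. The parameter names `sort` and `order` and the helper `robots_rules` are assumptions for the example, not taken from the site in question. It relies on the `*` wildcard, which Google documents as supported in robots.txt paths; two rules per parameter are emitted so that `sort=` does not also match a parameter such as `resort=`.

```python
def robots_rules(param_names, user_agent="*"):
    """Build robots.txt rules that disallow URLs carrying the given parameters."""
    lines = [f"User-agent: {user_agent}"]
    for name in param_names:
        # Parameter appears first in the query string.
        lines.append(f"Disallow: /*?{name}=")
        # Parameter appears after another parameter.
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines) + "\n"


# Hypothetical sort parameters; replace with the ones your site really uses.
rules = robots_rules(["sort", "order"])
print(rules)
```

Keep the order described above in mind: once a URL is disallowed, Google can no longer crawl it to see a noindex directive, so the de-indexing step has to come first.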
Good luck!
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
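A minimal sketch of that agreement check, assuming `color` and `pkgsize` are the facets that do not substantially change the page: it derives the canonical URL by dropping those parameters, then flags any sitemap entry that is not its own canonical. The helper names are hypothetical, and the `https://` scheme is added to the example URLs so they parse cleanly.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: facets that do not substantially change the page content.
NON_CANONICAL_PARAMS = {"color", "pkgsize"}


def canonical_url(url):
    """Drop the non-canonical facet parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in NON_CANONICAL_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))


def sitemap_conflicts(sitemap_urls):
    """Return sitemap entries that point somewhere other than their canonical."""
    return [url for url in sitemap_urls if canonical_url(url) != url]


# Hypothetical sitemap contents based on the example URLs above.
sitemap = [
    "https://www.mysite.com/product.php?pid=12345",
    "https://www.mysite.com/product.php?pid=12345&color=red&pkgsize=100",
]
conflicts = sitemap_conflicts(sitemap)
print(conflicts)
```

The comparison is a plain string match, so a URL whose query string is encoded differently from what `urlencode` produces would also be flagged; treat the output as a list to review rather than a definitive verdict.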
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain in the behind. That being said, most site auditing tools will allow you to make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404s are going to be inevitable. This could lead to some negative SEO impacts, so you will have to keep on top of them.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. This would likely be the biggest concern on my mind and worth looking into. Going through that many pages will be impossible, so it might be worth taking a look at the link profile and determining where most of your links are coming from. If these are coming from spammy sites, you may have a problem there.
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (Home, Landing Pages) are properly handled and optimized and you shouldn't have too much trouble. I would add that large, unwieldy .htaccess files can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob