Are there any negative side effects of having millions of URLs on your site?
-
After a site upgrade, we found that we have over 3.7 million URLs on our site. Many of these URLs are due to the facet options. Each facet combination yields a different URL. However, we need to do a deeper analysis into these URLs to see if this is the only reason why so many are returning.
Does anyone know if there are any negatives of having so many URLs crawled, other than the fact that Google only spends so much time crawling a site? Is the number of URLs something that should be concerning?
Any insight appreciated!
-
Agree with the points above with one exception. Yes, you have to find a way to deal with duplicate and quality content at scale. Yes, robots.txt, nofollow links and index sitemaps are your friends. I would not use rel=canonical unless I had to. Better to get those extra pages de-indexed and then not let Google crawl the URLs with the extra parameters to start with. Why waste Google's time crawling pages that are just re-sorted versions of another? If you use the directives wisely, you probably "only" have 200,000 pages worth crawling, even with that many sort parameters.
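To get a rough feel for how many pages are really worth crawling, one option is to take a crawl export and collapse URLs that differ only by sort-type parameters. Here is a minimal Python sketch of that idea. The parameter names `sort` and `order` and the helper names are my own assumptions for illustration, so substitute whatever your site actually uses:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical names of parameters that only re-sort a page; adjust for your site.
SORT_PARAMS = {"sort", "order"}

def strip_sort_params(url):
    """Drop sort-only parameters and put the remaining ones in a stable order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in SORT_PARAMS]
    kept.sort()  # so ?a=1&b=2 and ?b=2&a=1 collapse to the same key
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def count_unique_pages(urls):
    """Count distinct pages once sort-only variations are collapsed."""
    return len({strip_sort_params(u) for u in urls})
```

Run `count_unique_pages` over the full list of crawled URLs and compare the result with the raw URL count. A large gap suggests that most of the crawl is going to re-sorted duplicates.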
Good luck!
-
I'll echo Robert's concern about duplicate content. If those facet combinations are creating many pages with very similar content, that could be an issue for you.
If, let's say, there are 100 facet combinations that create essentially the same basic page content, then consider taking facet elements that do NOT substantially change the page content, and use rel=canonical to tell Google that those are all really the same page. For instance, let's say one of the facets is packaging size, and product X comes in boxes of 1, 10, 100, or 500 units. Let's say another facet is color, and it comes in blue, green, or red. Let's say the URLs for these look like this:
www.mysite.com/product.php?pid=12345&color=blue&pkgsize=1
www.mysite.com/product.php?pid=12345&color=green&pkgsize=10
www.mysite.com/product.php?pid=12345&color=red&pkgsize=100
You would want to set the rel=canonical on all of these to:
www.mysite.com/product.php?pid=12345
Be sure that your XML sitemap, your on-page meta robots, and your rel=canonicals are all in agreement. In other words, if a page has meta robots "noindex,follow", it should NOT show up in your XML sitemap. If the pages above have their rel=canonicals set as described, then your sitemap should contain www.mysite.com/product.php?pid=12345 and NONE of the three example URLs with the color and pkgsize parameters above.
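If you want to sanity-check that agreement in bulk, something along these lines might help. This is only a sketch in Python: it assumes `color` and `pkgsize` are the facets that do not change the page (as in the example URLs above), the function names are made up for illustration, and it ignores URL fragments:

```python
from urllib.parse import parse_qsl, urlencode

# Facets assumed NOT to change page content, taken from the example URLs above.
NON_CANONICAL_PARAMS = {"color", "pkgsize"}

def canonical_url(url):
    """Return the URL with the non-canonical facet parameters removed."""
    base, _, query = url.partition("?")
    kept = [(k, v) for k, v in parse_qsl(query) if k not in NON_CANONICAL_PARAMS]
    return base + ("?" + urlencode(kept) if kept else "")

def sitemap_conflicts(sitemap_urls, noindex_urls=()):
    """Return sitemap entries that are noindexed or are not their own canonical."""
    noindex = set(noindex_urls)
    return [u for u in sitemap_urls if u in noindex or canonical_url(u) != u]
```

Feed `sitemap_conflicts` the URLs from your XML sitemap plus the list of pages carrying meta robots noindex. Anything it returns is a URL where the sitemap disagrees with your canonicals or your meta robots.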
-
There are several concerns to be addressed with this scenario:
- Organization
This is going to be very difficult to keep track of. If you are well-organized or the pages will not need much adjusting, this is probably okay.
- Duplicate Content
This is going to be a pain in the behind. That being said, most site auditing tools will allow you to make adjustments as necessary.
- Broken Links
With a site of this size, broken links and 404s are going to be inevitable. These can hurt your SEO, so you will have to stay on top of them.
- Hacking
This is a big reason why some sites have enormous numbers of URLs. This would likely be the biggest concern on my mind and is worth looking into. Going through that many pages will be impossible, so it might be worth taking a look at the link profile and determining where most of your links are coming from. If these are coming from spammy sites, you may have a problem there.
All this being said, the size of a website is normally not a cause for concern. Just make sure that your main pages (Home, Landing Pages) are properly handled and optimized and you shouldn't have too much trouble. I would add that large, unwieldy .htaccess files can result in slower loading times, which can impact your rankings with Google.
Let me know if there is anything specific concerning you and I will be happy to help. Congrats on the upgrade and hope it works out!
Rob