Crawl Budget and Faceted Navigation
-
Hi, we have an ecommerce website with facetted navigation for the various options available.
Google has 3.4 million webpages indexed. Many of which are over 90% duplicates.
Due to the low domain authority (15/100) Google is only crawling around 4,500 webpages per day, which we would like to improve/increase.
We know, in order not to waste crawl budget we should use the robots.txt to disallow parameter URL’s (i.e. ?option=, ?search= etc..). This makes sense as it would resolve many of the duplicate content issues and force Google to only crawl the main category, product pages etc.
However, having looked at the Google Search Console these pages are getting a significant amount of organic traffic on a monthly basis.
Is it worth disallowing these parameter URL’s in robots.txt, and hoping that this solves our crawl budget issues, thus helping to index and rank the most important webpages in less time.
Or is there a better solution?
Many thanks in advance.
Lee.
-
Hello, I have also been in a similar situation. What I did was to disallow the urls with parameters using the robots.txt and place (in only the pages with parameters) the following two html tags:
This will expressly indicate to google not to index these pages. I still have some errors but I guess they will disappear in a few months.
Regards
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawling/indexing of near duplicate product pages
Hi, Hope someone can help me out here. This is the current situation: We sell stones/gravel/sand/pebbles etc. for gardens. I will take a type of pebbles and the corresponding pages/URL's to illustrate my question --> black beach pebbles. We have a 'top' product page for black beach pebbles on which you can find different types of quantities (differing from 20kg untill 1600 kg). There is not any search volume related to the different quantities The 'top' page does not link to the pages for the different quantities The content on the pages for the different quantities is not exactly the same (different price + slightly different content). But a lot of the content is the same. Current situation:
Intermediate & Advanced SEO | | AMAGARD
- Most pages for the different quantities do not have internal links (about 95%) But the sitemap does contain all of these pages. Because the sitemap contains all these URL's, google frequently crawls them (I checked the logfiles) and has indexed them. Problems: Google spends its time crawling irrelevant pages --> our entire website is not that big, so these quantity URL's kind of double the total number of URL's. Having url's in the sitemap that do not have an internal link is a problem on its own All these pages are indexed so all sorts of gravel/pebbles have near duplicates. My solution: remove these URL's from the sitemap --> that will probably stop Google from regularly crawling these pages Putting a canonical on the quantity pages pointing to the top-product page. --> that will hopefully remove the irrelevant (no search volume) near duplicates from the index My questions: To be able to see the canonical, google will need to crawl these pages. Will google still do that after removing them from the sitemap? Do you agree that these pages are near duplicates and that it is best to remove them from the index? A few of these quantity pages do have intenral links (a few procent of them) because of a sale campaign. So there will be some (not much) internal links pointing to non-canonical pages. Would that be a problem? Thanks a lot in advance for your help! Best!1 -
Navigation Menu - Whats too much
Ive always had pages set up for a lot of our products and had these in the navigation menu. For instance we sell Solar Control Window Film which helps with heat, glare and UV. We then have a navigation menu something like this: Solar Window Film
Intermediate & Advanced SEO | | Fozzy1609
Heat Control window Films
Anti glare window film
UV window film
etc etc Ihave this for all my services and products. I have unique content on each. My question is this. Would I be better having the naviation menu with links to all the seperate services we offer
OR
Should I have it linking to the main services and then the related services from within the page> For example Ill have just Solar Window Film in the navigation and then on the page it would internally link to the heat related section and the glare related section etc. Im wondering whether my sub pages would suffer because theyre not linked to from every page with the second method or whether it would help in some way0 -
Our parent company has included their sitemap links in our robots.txt file - will that have an impact on the way our site is crawled?
Our parent company has included their sitemap links in our robots.txt file. All of their sitemap links are on a different domain and I'm wondering if this will have any impact on our searchability or potential rankings.
Intermediate & Advanced SEO | | tsmith1310 -
Layered navigation and hiding nav from user agent
I am trying to deal with the duplicate content issues presented by Magento's layered navigation feature (aka faceted navigation). I installed Amasty's Improved Navigation extension (https://amasty.com/improved-layered-navigation.html) and it offers the option to hide the layered navigation from specific user agents (ie googlebot, bingbot, etc). This seems like cloaking to me and I hesitate to try it, unless hiding faceted navigation from specific user agents is known to be acceptable to Google (white hat practice). Does anyone know if this the case?
Intermediate & Advanced SEO | | Kyle_M0 -
Can't crawl website with Screaming frog... what is wrong?
Hello all - I've just been trying to crawl a site with Screaming Frog and can't get beyond the homepage - have done the usual stuff (turn off JS and so on) and no problems there with nav and so on- the site's other pages have indexed in Google btw. Now I'm wondering whether there's a problem with this robots.txt file, which I think may be auto-generated by Joomla (I'm not familiar with Joomla...) - are there any issues here? [just checked... and there isn't!] If the Joomla site is installed within a folder such as at e.g. www.example.com/joomla/ the robots.txt file MUST be moved to the site root at e.g. www.example.com/robots.txt AND the joomla folder name MUST be prefixed to the disallowed path, e.g. the Disallow rule for the /administrator/ folder MUST be changed to read Disallow: /joomla/administrator/ For more information about the robots.txt standard, see: http://www.robotstxt.org/orig.html For syntax checking, see: http://tool.motoricerca.info/robots-checker.phtml User-agent: *
Intermediate & Advanced SEO | | McTaggart
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /modules/
Disallow: /plugins/
Disallow: /tmp/0 -
Do search engines crawl links on 404 pages?
I'm currently in the process of redesigning my site's 404 page. I know there's all sorts of best practices from UX standpoint but what about search engines? Since these pages are roadblocks in the crawl process, I was wondering if there's a way to help the search engine continue its crawl. Does putting links to "recent posts" or something along those lines allow the bot to continue on its way or does the crawl stop at that point because the 404 HTTP status code is thrown in the header response?
Intermediate & Advanced SEO | | brad-causes0 -
SEOmoz is only crawling 2 pages out of my website
I have checked on Google Webmaster and they are crawling around 118 pages our of my website, store.itpreneurs.com but SEOmoz is only crawling 2 pages. Can someone help me? Thanks Diogo
Intermediate & Advanced SEO | | jslusser0 -
SEO for Global Navigations
I did my first SEO audit from the book SEO Secrets by Danny Dover on my new website at http://melo4.melotec.com:4010/ In the book he says to disable Javascript and see if the global navigation still works. So when I did that the dropdown menus in my navigation don't show. I'm assuming this is a problem but when I check the cache text only version of the site, the dropdowns are in the text only version. Are their any experienced SEO's out their who can weigh in on this issue? Should I have my developer redo the navigation without any javascript? Thanks, Shawn
Intermediate & Advanced SEO | | Romancing0