Robots.txt - Googlebot - Allow... what's it for?
-
Hello - I just came across this in robots.txt for the first time, and was wondering why it is used? Why would you have to proactively tell Googlebot to crawl JS/CSS and why would you want it to? Any help would be much appreciated - thanks, Luke
User-Agent: Googlebot
Allow: /.js
Allow: /.css
-
Thanks Tom - that's very useful - appreciated - and thanks also Clever PhD re: the robots.txt tester info - Luke
-
Just as a follow-up to Tom's great post. If you were wanting to test a robots.txt setup, especially if you were using a wildcard or using an allow combined with a disallow, Google Search Console under the Crawl section has a robots.txt Tester. You will see your most recent robots.txt file there that Google has a copy of. You can then modify that version and then enter a URL at the bottom to see if everything is set correctly or not. It is pretty handy, especially if you have a big robots.txt file. Note that this tool does not change how Google crawls your site or your robots.txt file, it is just for testing. Once you find the configuration that works, you would still need to update the robots.txt on your server.
-
Hi Luke
As you have correctly assumed, that particular robots command would be pointless.
The Googlebot does follow allow commands (while other ones do not), but it should only be used if it is an exception to a disallow rule.
So, for example, if you had a rule that blocked pages within a sub-directory, with:
Disallow: /example/*
You could create an allow rule that indexes a specific page within that directory to be indexed, like:
Allow: /example/page.html
Couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule." (Google Source). In this example, because the more specific rule is the allow rule, that will prevail. It is also best practice to put your "allow" rules at the top of the robots.txt file.
But in your example, if they have allow rules for JS and CSS files without having disavow rules for those directories/paths etc - it's a waste of space. Google will attempt to crawl anything it can by default - unless you disavow access.
TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.
Hope this helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Magento 1.9 SEO. I have product pages with identical On Page SEO score in the 90's. Some pull up Google page 1 some won't pull up at all. I am searching for the exact title on that page.
I have a website built on Magento 1.9. There are approximately 290,000 part numbers on the site. I am sampling Google SERP results. About 20% of the keywords show up on page 1 position 5 thru 10. 80% don't show up at all. When I do a MOZ page score I get high 80's to 90's. A page score of 89 on one part # may show up on page one, An identical page score on a different part # can't be found on Google. I am searching for the exact part # in the page title. Any thoughts on what may be going on? This seems to me like a Magento SEO issue.
Intermediate & Advanced SEO | | CTOPDS0 -
Robots.txt wildcards - the devs had a disagreement - which is correct?
Hi – the lead website developer was assuming that this wildcard: Disallow: /shirts/?* would block URLs including a ? within this directory, and all the subdirectories of this directory that included a “?” The second developer suggested that this wildcard would only block URLs featuring a ? that come immediately after /shirts/ - for example: /shirts?minprice=10&maxprice=20 BUT argued that this robots.txt directive would not block URLS featuring a ? in sub directories - e.g. /shirts/blue?mprice=100&maxp=20 So which of the developers is correct? Beyond that, I assumed that the ? should feature a * on each side of it – for example - /? - to work as intended above? Am I correct in assuming that?
Intermediate & Advanced SEO | | McTaggart0 -
Blocking Certain Site Parameters from Google's Index - Please Help
Hello, So we recently used Google Webmaster Tools in an attempt to block certain parameters on our site from showing up in Google's index. One of our site parameters is essentially for user location and accounts for over 500,000 URLs. This parameter does not change page content in any way, and there is no need for Google to index it. We edited the parameter in GWT to tell Google that it does not change site content and to not index it. However, after two weeks, all of these URLs are still definitely getting indexed. Why? Maybe there's something we're missing here. Perhaps there is another way to do this more effectively. Has anyone else ran into this problem? The path we used to implement this action:
Intermediate & Advanced SEO | | Jbake
Google Webmaster Tools > Crawl > URL Parameters Thank you in advance for your help!0 -
How do I tell if competitor's links are good?
One strategy I have seen recommended over and over is to look at your competitor's back links and see if any could be relevant for your site and worth pursuing. My question is how do I evaluate a link and not end up pursuing some penalized site? I would guess checking for Google index is a good idea since some of the webmasters may not be aware they are penalized. Is it DA and whether they are indexed alone? Many sites I have seen have DA in the teens but are legitimate in our industry. Should they not be considered due to low DA? Also I see links from directories on many competitor sites. Seems a controversial subject, but assuming the directory is industry specific, is it OK? Thanks in advance!
Intermediate & Advanced SEO | | Chris6610 -
- Truth ? ''link building isn't considered a suitable way of promotion as per recent search engine updates''
I need SEO. A SEO consultant said: ''link building isn't considered a suitable way of promotion as per recent search engine updates'' they mention: ''Therefore we would be undertaking a range of promotional exercises such as blog postings, social book marking, press release, etc that are more effective for ensuring best possible rankings for the website.'' Do you agree? Thank you
Intermediate & Advanced SEO | | BigBlaze2051 -
Ecommerce URL's
I'm a bit divided about the URL structure for ecommerce sites. I'm using Magento and I have Canonical URLs plugin installed. My question is about the URL structure and length. 1st Way: If I set up Product to have categories in the URL it will appear like this mysite.com/category/subcategory/product/ - and while the product can be in multiple places , the Canonical URL can be either short or long. The advantage of having this URL is that it shows all the categories in the breadcrumbs ( and a whole lot more links over the site ) . The disadvantage is the URL Length 2nd Way: Setting up the product to have no category in the URL URL will be mysite.com/product/ Advantage: short URL. disadvantage - doesn't show the categories in the breadcrumbs if you link direct. Thoughts?
Intermediate & Advanced SEO | | s_EOgi_Bear1 -
Do links from twitter count in SEOMoz's Toolbar link count?
I am using the Chrome extension and looking at a SERP, when a page is said to have 2000 incoming links, does that include tweets with a link back to this page? What about retweets. Are those counted separately or as one? And what about independent tweets that have exactly the same content (tweet text + link)
Intermediate & Advanced SEO | | davhad0 -
Will Google Visit Non-Canonicalized Page Again and Return Its Page's Original Ranking?
I have 2 questions about canonicalization. 1. Will Google ever visit Page A again if after it has been canonicalized to Page B? 2. If Google will still visit Page A and found that it is not canonicalizing to Page B already, will the original rankings and traffic of Page A returned to the way before it's canonicalized? Thanks.
Intermediate & Advanced SEO | | globalsources.com0