Robots.txt - Googlebot - Allow... what's it for?
-
Hello - I just came across this in a robots.txt file for the first time and was wondering why it is used. Why would you have to proactively tell Googlebot to crawl JS/CSS, and why would you want it to? Any help would be much appreciated - thanks, Luke
User-Agent: Googlebot
Allow: /*.js
Allow: /*.css
-
Thanks Tom - that's very useful - appreciated - and thanks also Clever PhD re: the robots.txt tester info - Luke
-
Just as a follow-up to Tom's great post: if you want to test a robots.txt setup, especially if you are using a wildcard or combining an allow with a disallow, Google Search Console has a robots.txt Tester under the Crawl section. It shows the most recent copy of your robots.txt file that Google has fetched. You can modify that version and then enter a URL at the bottom to see whether everything is set correctly. It is pretty handy, especially if you have a big robots.txt file. Note that this tool does not change how Google crawls your site or your robots.txt file; it is just for testing. Once you find the configuration that works, you still need to update the robots.txt on your server.
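If you want to sanity-check rules outside of Search Console too, here is a minimal sketch using Python's standard-library robots.txt parser (the rules below are hypothetical, just to show an Allow acting as an exception to a Disallow). Keep in mind that urllib.robotparser follows the original robots.txt spec - it checks rules in file order and does not support Google's wildcard or longest-match behaviour - so the Search Console tester remains the authority for how Googlebot itself behaves. That order-sensitivity is also one practical reason for the best practice, mentioned in Tom's post, of putting Allow rules at the top of the file.

import urllib.robotparser

# Hypothetical rules for illustration: the Allow is an exception to the Disallow.
rules = """
User-agent: Googlebot
Allow: /example/page.html
Disallow: /example/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The specific page is permitted because the Allow line matches first...
print(rp.can_fetch("Googlebot", "/example/page.html"))   # True
# ...while everything else under /example/ stays blocked.
print(rp.can_fetch("Googlebot", "/example/other.html"))  # False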
-
Hi Luke
As you have correctly assumed, that particular robots.txt directive would be pointless.
Googlebot does follow Allow directives (some other crawlers do not), but Allow should only be used as an exception to a Disallow rule.
So, for example, if you had a rule that blocked pages within a sub-directory, with:
Disallow: /example/*
You could create an Allow rule that lets a specific page within that directory be crawled, like:
Allow: /example/page.html
Couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule." (Google Source). In this example, because the more specific rule is the allow rule, that will prevail. It is also best practice to put your "allow" rules at the top of the robots.txt file.
But in your example, if they have Allow rules for JS and CSS files without Disallow rules covering those directories/paths, it's a waste of space. Google will attempt to crawl anything it can by default - unless you disallow access.
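Where Allow rules like that do earn their keep is when a broader Disallow would otherwise block the CSS and JS that Google needs to render your pages. A hypothetical example (made-up paths), relying on the same longest-match logic quoted above:
User-Agent: Googlebot
Disallow: /assets/
Allow: /assets/*.js
Allow: /assets/*.css
Here the scripts and stylesheets stay crawlable even though the rest of /assets/ is blocked. Without a Disallow in the picture, the Allow lines add nothing.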
TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.
Hope this helps.
Related Questions
-
SEO Best Practices regarding Robots.txt disallow
I cannot find hard and fast direction about the following issue: it looks like the robots.txt file on my server has been set up to disallow "account" and "search" pages within my site, so I am receiving warnings from the Google Search Console that URLs are being blocked by robots.txt (Disallow: /Account/ and Disallow: /?search=). Do you recommend unblocking these URLs? I'm getting a warning that over 18,000 URLs are blocked by robots.txt ("Sitemap contains urls which are blocked by robots.txt"). It seems that I wouldn't want that many URLs blocked? Thank you!!
Intermediate & Advanced SEO | | jamiegriz0 -
Magento: Should we disable old URLs or delete the page altogether
Our developer tells us that we have a lot of 404 pages that are being included in our sitemap, and the reason for this is that we have put 301 redirects on the old pages to new pages. We're using Magento and our current process is to simply disable the old page, which then makes it a 404. We then redirect this page using a 301 redirect to a new relevant page. The reason for redirecting these pages is that the old pages are still being indexed in Google. I understand 404 pages will eventually drop out of Google's index, but I was wondering if we were somehow preventing them dropping out of the index by redirecting the URLs, causing the 404 pages to be added to the sitemap. My questions are: 1. Could we simply delete the entire unwanted page, so that it returns a 404 and drops out of Google's index altogether? 2. Because the 404 pages are in the sitemap, does this mean they will continue to be indexed by Google?
Intermediate & Advanced SEO | | andyheath0 -
Getting into Google News, URLs & Sitemaps
Hello, I know that one of the 'technical requirements' to get into Google News is that the URLs have unique numbers at the end, BUT that requirement can be circumvented if you have a Google News sitemap. I've purchased the Yoast Google News Sitemap plugin (https://yoast.com/wordpress/plugins/news-seo/) BUT just found out that you cannot submit a Google News sitemap until you are accepted into Google News. Thus, my question is: do you need to add the digits to the URLs temporarily until you get in and can submit a Google News sitemap, or is it OK to apply without them and take care of the sitemap after you get in? If anyone has any other tips about getting into Google News that would be great! Thanks!
Intermediate & Advanced SEO | | stacksnew0 -
Could this be seen as duplicate content in Google's eyes?
Hi I'm an in-house SEO and we've recently seen Panda-related traffic loss along with some of our main keywords slipping down the SERPs. Looking for possible Panda-related issues, I was wondering if the following could be seen as duplicate content. We've got some very similar holidays (travel company) on our website. While they are different, I'm concerned they may be seen as creating content that is too similar: http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/the-wildlife-and-beaches-of-kenya.aspx http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/ultimate-kenya-wildlife-and-beaches.aspx http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/wildlife-and-beach-family-safari.aspx They do all have unique text but, as you can see from the titles, they are very similar (note: from an SEO point of view the tabbed content is all within the same page at source level). At the top level of the holiday pages we have a filtered search: http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays.aspx These pages have a unique introduction, but the content snippets being pulled into the boxes are drawn from each of the individual holiday pages. I'm just concerned that these could be introducing some duplication issues. Any thoughts?
Intermediate & Advanced SEO | | KateWaite0 -
Shoemaker with ugly shoes: Agency site performing badly, what's our best bet?
Hi everyone, We're a web agency and our site www.axialdev.com is not performing well. We have very little traffic from relevant keywords. Local competitors with worse On-page Grader scores and very few backlinks outrank us. For example, we're 17th for the keyword "agence web sherbrooke" in Google.ca in French. Background info: In the past, we included 3 keyword-rich links in the footer of every site we made (hundreds of sites by now). We're working to remove those links on poor sites and to use a single nofollow link on our best sites. Since this is on-going and we know we won't be able to remove everything, our link profile sucks (OSE). We have a lot of sites on our C-Block, some of poor quality. We've never received a manual penalty. Still, we've disavowed links as a precaution after running Link D-Tox. We receive a lot of traffic via our blog, where we used to post technical articles about Drupal, Node.js, plugins, etc. These visits don't drive business. Only a third of our organic visits come from Canada. What are our options? Change domain and delete the current one? Disallow the blog except for a few good articles, hoping it helps Google understand what we really do? Keep donating to AdWords? Any help greatly appreciated! Thanks!
Intermediate & Advanced SEO | | AxialDev2 -
What's the news on sitewide nofollow links and anchor text penalties
Is it possible to be penalized for sitewide nofollow links because of anchor text penalties, even if you use branded anchor text?
Intermediate & Advanced SEO | | BobGW0 -
Best way to view Global Navigation bar from GoogleBot's perspective
Hi, Links in the global navigation bar of our website do not show up when we look at the Google cache --> text-only version of the page. These links use style="display:none;" when we looked at the HTML source. But if I use the "user agent switcher" add-on in Firefox and set it to Googlebot, the links in the global nav are displayed. I am wondering what is the best way to find out if Google can/cannot see the links. Thanks for the help! Supriya.
Intermediate & Advanced SEO | | SShiyekar0 -
301 Redirect All URLs - WWW -> HTTP
Hi guys, This is part 2 of a question I asked before which got partially answered; I clicked "question answered" before I realized it only fixed part of the problem, so I think I have to post a new question now. I have an Apache server, I believe, on Host Gator. What I want to do is redirect every URL to its corresponding alternative (www redirects to http). So, for example, if someone typed in www.mysite.com/page1 it would take them to http://mysite.com/page1. Here is the code that has made all of my site's links go from WWW to HTTP, which is great, but the problem is that if you try to access the WWW version by typing it in, it still works, and I need it to redirect. It's important because Google has been indexing SOME of the URLs as http and some as WWW, and my site was just HTTP for a long time until I made the mistake of switching it; now I'm having a problem with duplicate content and such. I updated it in Webmaster Tools, but I need to do this regardless for other SEs. Thanks a ton!
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^www.yourdomain.com [NC]
RewriteRule ^(.*)$ http://yourdomain.com/$1 [L,R=301]
Intermediate & Advanced SEO | | DustinX0