Dilemma about "images" folder in robots.txt
-
Hi, Hope you're doing well.
I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
-
Yup my images send me traffic from Google images on most of my sites and attractive images attract hotlinks as well. At the moment people are hosting their images on a different domain (cdn) and are still being credited with the images but I haven't tried to do that myself ie I don't know if they've set some "ownership" somewhere and somehow.
-
I recommend allowing Google to crawl those images. Google optimizes its crawl rate and once it has done a complete crawl it will understand how often to crawl certain areas of your site. My main concern would be that you are losing potential rankings and indexing from those images - if they are unique and high quality you definitely want them to index the images, understand the file names, and appropriately index them.
I wouldn't be concerned about Google bot eating up your server resources. If it does become a problem, then you can go back and adjust the bot access through the robots.txt, as you've done already. However, I would let them in first and only react if it becomes a problem.
I have tens of thousands of product images accessed by the google bot and it is no concern to my ecommerce company and the server resources. I'm not saying that it can't be a potential problem, but the benefit outweighs the risk of it being one - I choose a reactive stance in this situation.
Closely monitor your Google Webmaster Tools account, watch the crawl rate and statistics, and if it becomes an issue then decide on which image folders should or shouldn't be indexed.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
[Very Urgent] More 100 "/search/adult-site-keywords" Crawl errors under Search Console
I just opened my G Search Console and was shocked to see more than 150 Not Found errors under Crawl errors. Mine is a Wordpress site (it's consistently updated too): Here's how they show up: Example 1: URL: www.example.com/search/adult-site-keyword/page2.html/feed/rss2 Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword/page2.html Example 2 (this surprised me the most when I looked at the linked from data): URL: www.example.com/search/adult-site-keyword-2.html/page/3/ Linked From: www.example.com/search/adult-site-keyword-2.html/page/2/ (this is showing as if it's from our own site) http://a-spammy-adult-site.com/search/adult-site-keyword-2.html Example 3: URL: www.example.com/search/adult-site-keyword-3.html Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword-3.html How do I address this issue?
Intermediate & Advanced SEO | | rmehta10 -
Robots.txt gone wild
Hi guys, a site we manage, http://hhhhappy.com received an alert through web master tools yesterday that it can't be crawled. No changes were made to the site. Don't know a huge amount about the robots.txt configuration expect that using Yoast by default it sets it not to crawl wp admin folder and nothing else. I checked this against all other sites and the settings are the same. And yet 12 hours later after the issue Happy is still not being crawled and meta data is not showing in search results. Any ideas what may have triggered this?
Intermediate & Advanced SEO | | wearehappymedia0 -
Have a Robots.txt Issue
I have a robots.txt file error that is causing me loads of headaches and is making my website fall off the SE grid. on MOZ and other sites its saying that I blocked all websites from finding it. Could it be as simple as I created a new website and forgot to re-create a robots.txt file for the new site or it was trying to find the old one? I just created a new one. Google's website still shows in the search console that there are severe health issues found in the property and that it is the robots.txt is blocking important pages. Does this take time to refresh? Is there something I'm missing that someone here in the MOZ community could help me with?
Intermediate & Advanced SEO | | primemediaconsultants0 -
Use Canonical or Robots.txt for Map View URL without Backlink Potential
I have a Page X with lots of unique content. This page has a "Map view" option, which displays some of the info from Page X, but a lot is ommitted. Questions: Should I add canonical even though Map View URL does not display a lot of info from Page X or adding to robots.txt or noindex, follow? I don't see any back links coming to Map View URL Should Map View page have unique H1, title tag, meta des?
Intermediate & Advanced SEO | | khi50 -
Should I disallow via robots.txt for my sub folder country TLD's?
Hello, My website is in default English and Spanish as a sub folder TLD. Because of my Joomla platform, Google is listing hundreds of soft 404 links of French, Chinese, German etc. sub TLD's. Again, i never created these country sub folder url's, but Google is crawling them. Is it best to just "Disallow" these sub folder TLD's like the example below, then "mark as fixed" in my crawl errors section in Google Webmaster tools?: User-agent: * Disallow: /de/ Disallow: /fr/ Disallow: /cn/ Thank you, Shawn
Intermediate & Advanced SEO | | Shawn1240 -
Should I NOFOLLOW my "Add To Cart" buttons?
Hello and Merry Christmass Should I NOFOLLOW my "Add To Cart" buttons? My e-commerce site has hundreds of products. Content wise, there is no real value to the reader on that page (besides for some testimonials and "why here" sentences). So it is not a page you'd want / expect to find in the SERPs. Also, with hundreds of links pointing to this page it would be "stronger" than other important pages which doesn't make sense. Last but not least, if I have limited time that the bots are on my site, why keep sending them to a non important page. This is why I am leaning to nofollowing the "add to cart" buttons and looking for reinforcements. Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
Trailing slash and rel="canonical"
Our website is in a directory format: http://www.website.com/website.asp Our homepage display URL is http://www.website.com which currently matches our to eliminate the possibility of duplicate content. However, I noticed that in the SERPs, google displays the homepage with a trailing slash http://www.website.com/ My question: should I change the rel="canonical" to have a trailing slash? I noticed one of our competitors uses the trailing slash in their rel="canonical" Do potential benefits outweigh the risks? I can PM further information if necessary. Thanks for the assistance in advance...
Intermediate & Advanced SEO | | BethA0 -
Google Said "Repeat the search with the omitted results included."
We have some pages targeting the different countries but with the Near to Similar content/products, just distinguished with the country name etc. one of the page was assigned to me for optimizing. two or three Similar pages are ranked with in top 50 for the main keyword. I updated some on page content to make it more distinguish from others. After some link building, I found that this page still not showing in Google result, even I found the following message on the google. "In order to show you the most relevant results, we have omitted some entries very similar to the 698 already displayed.
Intermediate & Advanced SEO | | alexgray
If you like, you can repeat the search with the omitted results included." I clicked to repeat omitted result and found that my targeted url on 450th place in google (before link building this was not) My questions are Is google consider this page low quality or duplicate content? Is there any role of internal linking to give importance a page on other (when they are near to similar)? Like these pages can hurt the whole site rankings? How to handle this issue?0