Crawling image folders / crawl allowance
-
We recently removed /img and /imgp from our robots.txt file, thus allowing Googlebot to crawl our image folders. We're not sure why these were blocked in the first place, but we opened them up in response to an email from Google Product Search about not being able to crawl our images, which can hurt (and has hurt) our traffic from Google Shopping.
My question is: will allowing Google to crawl our image files eat up our 'crawl allowance'? We wouldn't want Google to not crawl/index certain pages, and ding our organic traffic, because more of our allotted crawl bandwidth is getting chewed up crawling image files.
Outside of the not-very-detailed crawl stats graphs in Webmaster Tools, what's the best way to check how frequently and how deeply our site is getting crawled?
Thanks all!
-
I accidentally blocked my images folder recently as well and had 100% of my products disallowed from Google Shopping within 48 hours. It sounds like blocking is not an option: they need to crawl your images folder to make sure you have valid images in your product listings.
-
If your rankings are improving, then good move!
-
Hey Richard,
We were previously blocking Googlebot from crawling our images at all (by disallowing /img/ and /imgp/ in our robots.txt file). We removed this block after receiving this email from Google:
Thank you for participating in Google Product Search. It has come to our attention that a robots.txt file is preventing us from crawling some or all of the images on your site. In order for us to access and display the images you provide in your product listings, we'd like you to modify your robots.txt file to allow user-agent 'googlebot' to crawl your site.
Failure for Google to access your images may affect the visibility of your items on Google Product Search and Product Ad results.
While I totally agree that image traffic will not convert like standard traffic, it is free and, who knows, we may just pick up a few sales from it. Of course, if this eats up a disproportionate amount of our crawl allowance relative to the value it brings (including avoiding any penalties from Google Product Search), we'd be better off leaving the block on.
By way of an update, it looks like our rankings have started to improve in Google Product Search. We first experienced a drop in rankings and traffic from Product Search on 4/16 and removed the block from robots.txt on 4/22.
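For anyone who wants to double-check a change like this before deploying it, here is a minimal sketch using Python's standard urllib.robotparser. The rule sets, the `User-agent: *` grouping, and the example.com URLs are assumptions for illustration; only the /img/ and /imgp/ paths come from our actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "before" and "after" robots.txt contents. Only the two
# Disallow paths are taken from the thread; the rest is assumed.
OLD_RULES = """User-agent: *
Disallow: /img/
Disallow: /imgp/
"""

NEW_RULES = """User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image_url = "https://www.example.com/img/cup.jpg"  # hypothetical URL
    print("old rules:", is_allowed(OLD_RULES, "Googlebot", image_url))
    print("new rules:", is_allowed(NEW_RULES, "Googlebot", image_url))
```

Note that the standard-library parser only approximates how Google reads robots.txt, so treat this as a sanity check and not as a guarantee of Googlebot's behavior.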
-
Why do you need Google to reach inside your img folder? Images display on the page and are indexed then. Sure, if you are selling images, then I can see the need for this, but to just crawl the img folder??
If it is not huge, I do not see it penalizing you. I would make sure all images are named using keywords, as crawling pic001.jpg, pic002.jpg, product01.jpg, or logo.gif will not do you any good anyway.
Also, I find that the traffic coming from Google image searches is poor. No one searching to purchase a coffee cup looks in Google Images to do so. Conversely, if someone is searching for images of coffee cups to use in whatever, having them click over to your site is a waste of time. They are just going to grab the image and go, leaving your metrics a mess.
I hope that helps.
-
It may affect your crawl allowance, but that depends on the size of your site, its PageRank, trust, etc.
One of the best ways to determine crawl depth and whether you have any issues is to create separate sitemaps for your most important content or areas of your site. You could also create an image sitemap.
Then you can monitor these over time, which will give you a good picture of which content is being crawled and indexed well and which content or images are not. This may also help you find out whether the site structure is too deep or whether you need to link more to deeper content in order to improve crawling and indexation.
Hope this helps.
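As a rough illustration of the image sitemap idea, here is a minimal Python sketch that builds one with the standard library. The function name, the input shape (page URL mapped to its image URLs), and the example.com URLs are all hypothetical; the two namespaces are the standard sitemap namespace and Google's image sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict[str, list[str]]) -> str:
    """Build sitemap XML listing each page together with the images it shows."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical product page and image URLs.
    print(build_image_sitemap({
        "https://www.example.com/products/coffee-cup": [
            "https://www.example.com/img/coffee-cup-front.jpg",
            "https://www.example.com/img/coffee-cup-side.jpg",
        ],
    }))
```

Submitting a file like this separately from your page sitemaps should let you compare submitted versus indexed counts for images on their own.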
-
Personally, I wouldn't try to figure out the impact by looking at crawl stats. I'd be more focused on end results. Have we had an increase in organic traffic or in conversions from Google Shopping since we opened it up, or has either of these gone down?
That's what matters, and is the only real indicator as to whether it was a wise move or not.
-
You could check your server stats to see who is accessing your site; this should tell you which bots are visiting your pages and when. I don't know what control panel you are using for your site, but if you are using cPanel, I am sure there are tutorials online to help you find this information.
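If you can download the raw access logs, a short script can split Googlebot requests into image and page hits per day. This is only a sketch: it assumes the common Apache/Nginx "combined" log format, the function name is made up, and it matches on the user-agent string alone, which can be spoofed.

```python
import re
from collections import Counter

# Matches the date, request path, and user agent in a "combined" format line.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Image folders named in this thread; adjust to your own site.
IMAGE_PREFIXES = ("/img/", "/imgp/")


def googlebot_hits_by_section(lines):
    """Count Googlebot requests per (day, section), section = images or pages."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "googlebot" not in match.group("agent").lower():
            continue
        path = match.group("path")
        section = "images" if path.startswith(IMAGE_PREFIXES) else "pages"
        counts[(match.group("day"), section)] += 1
    return counts


if __name__ == "__main__":
    # "access.log" is a hypothetical file name.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for (day, section), hits in sorted(googlebot_hits_by_section(log).items()):
            print(day, section, hits)
```

Comparing the daily "pages" count before and after removing the block would show whether image crawling is actually displacing page crawling.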