Robot.txt File Not Appearing, but seems to be working?
-
Hi Mozzers,
I am conducting a site audit for a client, and I am confused about what they are doing with their robot.txt file. GWT (Google Webmaster Tools) shows that a file exists and that it is blocking about 12K URLs (image attached). It also shows that the file was downloaded successfully 10 hours ago. However, when I go to the robot.txt file link, the page is blank.
Could they be doing something advanced to block URLs while hiding the file from users? It appears to be blocking log-ins correctly, but I would like to know for sure that it is working as intended. Any advice on this would be most appreciated. Thanks!
Jared
-
There is an old WebmasterWorld thread that explains how to hide the robots.txt file from browsers, though I'm not sure why one would do this:
http://www.webmasterworld.com/forum93/74.htm
Perhaps they are doing something like this?
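One way to check for this kind of user-agent-based serving is to request the file with different User-Agent headers and compare what comes back. Below is a minimal sketch, assuming Python 3; `www.example.com` is a placeholder for the client's domain, and the two User-Agent strings and helper names are illustrative assumptions rather than anything taken from the site. Note that a server that verifies crawlers by IP address or reverse DNS would not be fooled by a spoofed header, so identical responses do not rule out cloaking.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions, not taken from the thread).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def build_request(url, user_agent):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=10):
    """Return (status_code, body_length); body_length is 0 on an HTTP error."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as err:
        return err.code, 0


def compare(url):
    """Fetch the URL once per User-Agent and return {label: (status, length)}."""
    return {label: fetch(url, agent) for label, agent in USER_AGENTS.items()}


if __name__ == "__main__":
    # www.example.com is a placeholder; substitute the site being audited.
    for label, result in compare("http://www.example.com/robots.txt").items():
        print(label, result)
```

If the "browser" request returns an empty body while the "googlebot" request returns rules, the file is probably being served conditionally.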
-
I verified that I was checking /robots.txt. I had trouble checking whether it exists under the non-www version because everything redirects to the www version. I also checked whether the file itself was being blocked, and it is not.
I went to Archive.org (Wayback Machine), and I can see the robot.txt file in previous versions of the site. I cannot, however, view it online, even though Google says they are downloading it successfully, and the robots.txt file is successfully blocking URLs from the search index.
-
Be sure you are visiting /robots.txt. In all of your copy above, you are referencing robot.txt.
Also, check whether it is only showing up on the www version of the site or on the non-www version.
To confirm that it's working, you can test URLs from your website within Google Webmaster Tools. Go to Crawl->Blocked URLs and scroll down to the bottom.
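If you can get a copy of the rules (for example from the Wayback Machine snapshot), you can also test URLs against them locally. Here is a rough sketch using Python's standard `urllib.robotparser`; the rules and URLs below are made up for illustration, and `is_blocked` is a hypothetical helper name. Python's parser does plain prefix matching and may not interpret wildcard patterns the way Google does, so treat the result as a sanity check and rely on the Webmaster Tools tester for the final word.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; the client's real file is not shown here.
SAMPLE_RULES = """\
User-agent: *
Disallow: /login/
Disallow: /account/
"""


def is_blocked(rules_text, url, user_agent="*"):
    """Return True if the robots.txt rules disallow the URL for this user agent."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs; replace with URLs from the site being audited.
    for url in (
        "http://www.example.com/login/",
        "http://www.example.com/products/widget",
    ):
        print(url, "blocked" if is_blocked(SAMPLE_RULES, url) else "allowed")
```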