What is the sense of robots.txt?
-
Using robots.txt to prevent search engine from indexing the page is not a good idea. so what is the sense of robots.txt? just for attracting robots to crawl sitemap?
-
While your robots.txt file is not the best means to control search engines, it does have a purpose. To respond to your questions:
-
the file does not "attract" any robots, but robots who do visit can learn a bit about your site and understand what content you don't wish to be crawled
-
you can block parts of your site that you feel have no value for indexing such as Keri mentioned your "print" version of pages, or overlays pages, or login pages, etc.
The idea is that you own the website, and you can have a measure of control over it. You can disallow specific crawlers, etc. although it's up to each crawler whether they actually respect your wishes.
More details can be read at: http://www.robotstxt.org/
-
-
There are often times pages you don't want indexed, and that's what robots.txt is there for. These are just some things you may not want indexed:
- Premium content for subscription-only members
- Your admin directory
- Printable versions of pages
- Development servers
You keep things you don't want out of the index, and you also don't waste the crawl budgets of the search engines on stuff that's not what you want in the engines in the first place.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file. Other than perhaps using up crawl budget, are there any other negative implications?
Technical SEO | | richdan0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Is having no robots.txt file the same as having one and allowing all agents?
The site I am working on currently has no robots.txt file. However, I have just uploaded a sitemap and would like to point the robots.txt file to it. Once I upload the robots.txt file, if I allow access to all agents, is this the same as when the site had no robots.txt file at all; do I need to specify crawler access on can the robots.txt file just contain the link to the sitemap?
Technical SEO | | pugh0 -
Robots.txt to disallow /index.php/ path
Hi SEOmoz, I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ urls despite using a program to handle these issues. The URLs cause indexation errors with google (404). Now, I fixed this issue once before, but the problem persist. So I thought, instead of wasting more time, couldnt I just disallow all paths containing /index.php/ ?. I don't use that extension, but would it cause me any problems from an SEO perspective? How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
Technical SEO | | Mikkehl0 -
How long does it take for traffic to bounce back from and accidental robots.txt disallow of root?
We accidentally uploaded a robots.txt disallow root for all agents last Tuesday and did not catch the error until yesterday.. so 6 days total of exposure. Organic traffic is down 20%. Google has since indexed the correct version of the robots.txt file. However, we're still seeing awful titles/descriptions in the SERPs and traffic is not coming back. GWT shows that not many pages were actually removed from the index but we're still seeing drastic rankings decreases. Anyone been through this? Any sort of timeline for a recovery? Much appreciated!
Technical SEO | | bheard0 -
Competition links make no sense
Hello everybody, I used the open site explorer to check where my competitor has links and try to put mine there too. However I am extremely confused with the results. Eg the first link to my competitor coming from a domain with authority 91, is a download file. The other one is a link from ups, the courier service. When I click on it I get an access denied.The other one comes from samsung and when I click on it, I download an swf file. Next one, fcc.gov and it downloads a wp file. If I keep clicking on these links, in the end I am going to get a virus or something and learn nothing about what my competitor does. Any one have a clue how they managed to get linked like that?
Technical SEO | | polyniki0 -
What to do about "blocked by meta-robots"?
The crawl report tells me "Notices are interesting facts about your pages we found while crawling". One of these interesting facts is that my blog archives are "blocked by meta robots". Articles are not blocked, just the archives. What is a "meta" robot? I think its just normal (since the article need only be crawled once) but want a second opinion. Should I care about this?
Technical SEO | | GPN0 -
Search Engine Blocked by Robot Txt warnings for Filter Search result pages--Why?
Hi, We're getting 'Yellow' Search Engine Blocked by Robot Txt warnings for URLS that are in effect product search filter result pages (see link below) on our Magento ecommerce shop. Our Robot txt file to my mind is correctly set up i.e. we would not want Google to index these pages. So why does SeoMoz flag this type of page as a warning? Is there any implication for our ranking? Is there anything we need to do about this? Thanks. Here is an example url that SEOMOZ thinks that the search engines can't see. http://www.site.com/audio-books/audio-books-in-english?audiobook_genre=132 Below are the current entries for the robot.txt file. User-agent: Googlebot
Technical SEO | | languedoc
Disallow: /index.php/
Disallow: /?
Disallow: /.js$
Disallow: /.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /utm
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Sitemap:0