Robots.txt and canonical tag
-
The SEOmoz post at http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts says:
If you have a robots.txt disallow in place for a page, the canonical tag will never be seen.
Does this mean that if a page is disallowed by robots.txt, spiders do NOT read its HTML code?
-
Thanks Ryan for explaining things very clearly.
-
What we know is there have been many cases where a page that is blocked in robots.txt has appeared in search results. The explanation provided is that robots.txt blocks crawlers during normal site visits, but not necessarily on visits where they are following links from other sites.
-
If spiders follow links to an article on my site, will they read its contents then? If the canonical tag is on the article page itself, will the canonical tag be seen?
-
Daylan offered a great answer, but I would like to add one exception. When crawlers from the major search engines visit your site, they will honor your robots.txt file. Sometimes, however, they follow links from other sites to an article on your site, and during that particular visit they will not see the robots.txt file and will index your page.
This is one of the reasons why your robots.txt file should be used as minimally as possible, and when it is used you should have a backup process in place such as the canonical or noindex tag on a page.
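To make the "backup" idea concrete, here is a rough sketch of how a well-behaved crawler might read the canonical and noindex tags, and why it can only do so when it is actually allowed to fetch the page. The class name, function name, and sample page are hypothetical and only for illustration; real search engine crawlers are far more involved.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects the canonical URL and the robots meta directives from HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = attrs.get("content")


def read_directives(html, allowed_by_robots_txt):
    """Return (canonical, robots) as a compliant crawler would see them.

    A crawler that obeys a disallow never downloads the HTML,
    so it cannot see either tag.
    """
    if not allowed_by_robots_txt:
        return None, None
    parser = HeadTagParser()
    parser.feed(html)
    return parser.canonical, parser.robots


# Hypothetical page, used only for illustration.
PAGE = """<html><head>
<link rel="canonical" href="http://www.example.com/article">
<meta name="robots" content="noindex, follow">
</head><body>Article text</body></html>"""

print(read_directives(PAGE, allowed_by_robots_txt=True))
print(read_directives(PAGE, allowed_by_robots_txt=False))
```

The second call returns nothing for either tag: if the page is disallowed and the crawler respects that, the tags in the HTML are never read.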
-
Thanks, Daylan, for your quick response. I just wanted a second opinion that the canonical tag will never be seen if a page is disallowed.
-
That's correct in most cases.
It works like this: a robot wants to visit a website URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:
User-agent: *
Disallow: /
The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
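For anyone who wants to try the check described above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the user agent string "ExampleBot" are made up for this example; the rules are the two lines quoted above. It only shows what a compliant robot would decide, not what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a robot that obeys robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical user agent name.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/welcome.html"))  # False
```

Because the robot stops here, it never requests welcome.html, which is why any canonical tag inside that page goes unseen.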
More information available here about: