What to do about "blocked by meta-robots"?
-
The crawl report tells me "Notices are interesting facts about your pages we found while crawling". One of these interesting facts is that my blog archives are "blocked by meta robots".
Articles are not blocked, just the archives.
What is a "meta" robot?
I think it's normal (since each article only needs to be crawled once), but I want a second opinion. Should I care about this?
-
Meta robots refers to the <meta name="robots"> tag in the page's <head> section. This usually happens when a blog is set up with an SEO plugin such as All In One SEO, which lets you choose manually which content is blocked. It's common to block archives, tags, and other sections, on the theory that allowing these to be crawled could either cause duplicate content issues or drain link value from the primary category navigation.
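To make the meta robots tag concrete, here is a minimal sketch of how a crawler might read it from a page, assuming a Python crawler built on the standard library's html.parser. The sample archive page, its "noindex,follow" value, and the name robots_directives are hypothetical; real crawlers also honour crawler-specific tags such as name="googlebot", which this sketch ignores.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, not attribute values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # "noindex,follow" -> ["noindex", "follow"]
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the lowercased meta robots directives found in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical archive page as an SEO plugin might render it.
archive_page = """<html><head>
<title>2012 Archives</title>
<meta name="robots" content="noindex,follow">
</head><body>...</body></html>"""

print(robots_directives(archive_page))  # ['noindex', 'follow']
print("noindex" in robots_directives(archive_page))  # True
```

A crawl tool that finds "noindex" here would report the page as "blocked by meta robots", while "follow" still lets the links on the archive page be followed.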
-
In general, there are two ways you can block crawlers from indexing your content.
-
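You can add a Disallow entry to your robots.txt file.
-
You can add a robots meta tag with a noindex directive to your pages.
In either case, you are telling search engines: "please do not list this content." (Strictly speaking, a Disallow rule stops crawling rather than indexing, so a disallowed URL can still appear in results without a description; noindex asks engines to drop the page once they crawl it.)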
In general, you would not want to block your archives. There can certainly be specific cases where you only want the public to see your most current content, in which case you can block them.
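If you do decide to block archives with the robots.txt route, Python's standard urllib.robotparser offers a quick way to check which URLs a Disallow rule actually covers. This is a minimal sketch: the /archives/ path and the example.com URLs are hypothetical, and real WordPress archive URLs depend on your permalink settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of date-based archives
# while leaving individual articles crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://example.com/archives/2012/05/",
    "https://example.com/my-first-article/",
):
    print(url, parser.can_fetch("*", url))
# https://example.com/archives/2012/05/ False
# https://example.com/my-first-article/ True
```

Disallow rules match by path prefix, so "/archives/" covers every URL beneath it but nothing outside it.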
-
Related Questions
-
Robots.txt & meta noindex--site still shows up on Google Search
I have set up my robots.txt like this: "User-agent: *" followed by "Disallow: /", and I have this meta tag in my <head> on a WordPress site set up with SEO Yoast: <meta name="robots" content="noindex,follow"/>. I did "Fetch as Google" in my Google Search Console. My website is still showing up in the search results, and the listing says: "A description for this result is not available because of this site's robots.txt". This site has not shown up for years, and now it is ranking above the site I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird, and I'm confused how a site with little content that has not been updated for years can rank higher than a site that is constantly updated and improved.
-
Where did the "Location" go, on Google SERP?
In order to emulate different locations, I've always done a Google query, then used the "Location" button under "Search Tools" at the top of the SERP to define my preferred location. It seems to have disappeared in the past few days? Anyone know where it went, or if it's gone forever? Thanks!
-
Meta Keywords - Should I define them myself?
Hi all, I'm sure this has been answered somewhere, but I couldn't find it. SEOQuake etc. suggest you should define meta keywords. However, I was under the impression that this was not best practice. Can anyone confirm what I should do / what is best practice? Cheers, Bowey
-
How can I Style Long "List Posts" in WordPress?
Hi all, I have been working on a list post that spans over 100 items. Each item on the list has a quick blurb to explain it, an image, and a few resource links. I am trying to find an attractive way to present this long list post in WordPress. I have seen several sites with long list posts; however, they place their items one on top of the other, which yields a VERY long page, and the end user has to do a lot of scrolling. Others turn their lists into slideshows, but I have no data on how slides perform against 10-mile-long lists that load on one page. I would like to do something similar to what List25.com does: they present about 5-10 items per page and seem to use pagination. I understand the pagination part; however, is there a shortcode plugin to format lists in an attractive way, just like List25?
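For the pagination part, one option (not mentioned in the question) is WordPress's built-in <!--nextpage--> tag, which splits a single post into several pages, provided the theme outputs the page links (typically via wp_link_pages()). The helper below is a hypothetical sketch that chunks the items and inserts that tag every N items; it assumes each item is already a ready-made HTML snippet, so nothing is escaped.

```python
def paginate_list_post(items, per_page=10):
    """Hypothetical helper: build post content with a <!--nextpage--> break
    every `per_page` items. Items are assumed to be trusted HTML snippets."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    pages = []
    for start in range(0, len(items), per_page):
        chunk = items[start:start + per_page]
        lis = "\n".join(f"  <li>{item}</li>" for item in chunk)
        # The <ol start="..."> attribute keeps numbering continuous across pages.
        pages.append(f'<ol start="{start + 1}">\n{lis}\n</ol>')
    return "\n<!--nextpage-->\n".join(pages)


items = [f"Item {n}" for n in range(1, 101)]
content = paginate_list_post(items, per_page=10)
print(content.count("<!--nextpage-->") + 1)  # 10
```

With 100 items and 10 per page this yields 10 pages; the styling of each item would still come from the theme or a plugin.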
-
Block bad crawlers
Hi! How are you? I've been working on some of my sites and noticed that I'm getting lots of crawls by search engines that I'm not interested in ranking well on. My question is the following: do you have a list of badly behaved search engines that take lots of bandwidth and don't send much (or good) traffic? If so, do you know how to block them using robots.txt? Thanks for the help! Best wishes, Ariel
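Blocking a specific crawler in robots.txt means giving it its own User-agent group with "Disallow: /", while other crawlers stay unrestricted. The sketch below uses a hypothetical token, ExampleBadBot (substitute the token you see in your server logs), and checks the result with Python's urllib.robotparser. Keep in mind that robots.txt is only a request: a genuinely badly behaved bot can ignore it, and then blocking has to happen at the server (by user agent or IP).

```python
from urllib.robotparser import RobotFileParser

# "ExampleBadBot" is a hypothetical user-agent token. The empty Disallow
# in the * group leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleBadBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("ExampleBadBot/2.1", "https://example.com/page"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
```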
-
Staging & Development areas should not be indexable (i.e. noindex/nofollow in meta robots, etc.)
Hi, I take it that if there's a staging or development area on a subdomain for a site, whose content is therefore usually duplicate, then it should not be indexable, i.e. noindexed and nofollowed in meta robots? This would prevent duplicate content problems, as well as stop people not involved in the project from seeing work in progress or finding it accidentally in search engine listings. Also, if there's no such info in meta robots, is there any other way it may have been made non-indexable, or at least had the duplicate content problem removed by canonicalising each page to the equivalent page on the live site? In the case in question, I am finding it listed in the SERPs when I search for the staging/dev area URL, so I presume this needs urgent attention? Cheers, Dan
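One common way to keep staging and dev copies out of the index is to emit a noindex,nofollow robots meta tag on every page served from those hosts, and nothing on the live host. The function below is a minimal sketch of that switch; the "staging." and "dev." prefixes and the function name are hypothetical. For the tag to be seen, the staging host must not also be disallowed in robots.txt, and HTTP authentication is a stronger option if the goal is also to keep people out.

```python
# Hypothetical host prefixes: adjust to your own staging/dev subdomains.
NON_PRODUCTION_PREFIXES = ("staging.", "dev.")


def robots_meta_for_host(host):
    """Return the robots meta tag to place in <head> for a given host.

    Non-production hosts get noindex,nofollow; production gets an empty
    string, leaving the default (index, follow) behaviour.
    """
    host = host.lower()
    if host.startswith(NON_PRODUCTION_PREFIXES):
        return '<meta name="robots" content="noindex,nofollow">'
    return ""


print(robots_meta_for_host("staging.example.com"))
print(repr(robots_meta_for_host("www.example.com")))  # ''
```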
-
Accidentally checked the privacy setting in WP to "not index" and dropped in rank... how can I fix this?
I recently rebuilt a static website as a WordPress site. In the privacy settings, "Ask search engines not to index this site" was checked and I didn't notice. I had a top-ranking website; now it's completely gone from Google and everywhere else. I have unchecked it and resubmitted a sitemap to Google. Does anyone know if this is permanent damage, or if there is something else I can do to help fix this? I'm freaking out.
-
Robots.txt and canonical tag
In the SEOmoz post http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it says: "If you have a robots.txt disallow in place for a page, the canonical tag will never be seen." Does this mean that if a page is disallowed by robots.txt, spiders DO NOT read its HTML code?
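The quoted point follows from the order of operations: a compliant crawler checks robots.txt before requesting a URL, so for a disallowed page it never downloads the HTML and never sees any canonical (or noindex) tag in it. The toy crawler below models that order; it is a simplified illustration, not how any particular search engine is implemented, and the names crawl and CanonicalParser, the PAGES dict standing in for HTTP fetches, and the /print/ path are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def crawl(url, robots_txt, fetch_html, agent="*"):
    """Toy crawler: consult robots.txt first, read the HTML only if allowed."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        return None  # page is never fetched, so its canonical tag is never seen
    parser = CanonicalParser()
    parser.feed(fetch_html(url))
    parser.close()
    return parser.canonical


# Hypothetical in-memory "site" standing in for real HTTP requests.
PAGES = {
    "https://example.com/print/widget":
        '<head><link rel="canonical" href="https://example.com/widget"></head>',
}
ROBOTS = "User-agent: *\nDisallow: /print/\n"

print(crawl("https://example.com/print/widget", ROBOTS, PAGES.get))  # None
print(crawl("https://example.com/print/widget", "User-agent: *\nDisallow:\n", PAGES.get))
# https://example.com/widget
```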