Have you ever seen or experienced a page being indexed even though its website is blocked by robots.txt?
-
Hi all,
We use robots.txt files and meta robots tags to stop bots from crawling a website or individual pages. Usually robots.txt is applied site-wide, with the expectation that none of the pages will get indexed. But there is a catch: a page from a site blocked by robots.txt can still be indexed by Google, because the crawler may find a link to that page somewhere else on the internet, as stated in the last paragraph here. I wonder whether this really happens in practice, i.e. whether blocked pages have actually been indexed.
And if we use meta robots tags at the page level, do we still need to block via robots.txt? Can we use both techniques at the same time?
Thanks
-
Hi vtmoz,
The most reliable way to prevent a page from being indexed is a meta robots tag with the _noindex_ parameter.
Robots.txt, on the other hand, helps conserve your server resources and prevents Google from crawling new pages that don't yet have the meta robots tag. And yes, it's very common to see indexed pages even when the robots.txt file blocks the entire website.
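For reference, this is what a robots.txt that blocks an entire site looks like. The key point is that it only stops crawling, not indexing: Google can still index a blocked URL it discovers through external links, it just can't read the page (or any noindex tag on it):

```
# Blocks all compliant crawlers from fetching any URL on the site.
# It does NOT guarantee the URLs stay out of the index.
User-agent: *
Disallow: /
```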
If what you want is to remove the pages from the index, follow these steps:
- Allow the whole website (or at least the specific pages/sections) to be crawlable in robots.txt
- Add the robots meta tag with the "noindex,follow" parameters
- Wait several weeks; 6 to 8 weeks is a fairly good window, or just follow up on those pages
- Once you have the result you want (all the desired pages de-indexed), re-block those pages with robots.txt
- DO NOT erase the meta robots tag.
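The steps above can be sketched as follows (the /private/ path is a hypothetical example; use your own section):

```html
<!-- Step 2: add to the <head> of each page you want removed from the index -->
<meta name="robots" content="noindex,follow">
```

```
# Step 4: only AFTER the pages have dropped out of the index,
# re-block them in robots.txt
User-agent: *
Disallow: /private/
```

The order matters: if you block the pages in robots.txt first, Google can never crawl them to see the noindex tag.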
Hope it helps.
Best of luck.
GR.