Robots.txt & meta noindex--site still shows up on Google Search
-
I have set up my robots.txt like this:
User-agent: *
Disallow: /and I have this meta tag in my on a Wordpress site, set up with SEO Yoast
name="robots" content="noindex,follow"/>
I did "Fetch as Google" on my Google Search Console
My website is still showing up in the search results and it says this:
"A description for this result is not available because of this site's robots.txt"
This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.
-
CleverPhd,
Really since to see a detailed yet to the point answer.
Thanks for contributing, and being in the Moz community.
Regards,
Vijay
-
Thanks for that clarification CleverPhD, forgot to mention that.
-
This one has my vote. You have to allow them access in order to see that you don't want the pages indexed. If you block them from seeing this rule...well they won't be able to see it.
-
Just to be clear on what Logan said. You have to allow Google to crawl your site by opening up your robots.txt to Google so it can see your noindex directive that is on each of the pages. Otherwise Google will never "see" the noindex directive on your pages.
Likewise, on sitemap.xml. If you are not allowing Google to crawl the sitemap (because you are blocking it with robots.txt) then Google will not read the sitemap, find all your pages that have the noindex directive on them and then remove those pages from the index.
A great article is here
https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466
From the mouth of Google "Important! For the noindex meta tag to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex tag, and the page can still appear in search results, for example if other pages link to it."
The other point that logan makes is that Google might list your site if there are enough sites linking to it. The steps above should take care of this, as you are deindexing the page, but here is what I am thinking he is referencing
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google will include a site that is blocked in robots.txt if enough pages link to it, even if they have not crawled the url.
You can go into Search Console and find all the links that they say are pointing to your site. You can also use tools like CognitiveSEO or Ahrefs, Majestic or Moz etc and gather up all of those sites to find links to your site and include those in a disavow file that you put into Search Console and tell Google to ignore all of those links to your site.
Secret bonus method. Putting a noindex directive in your robots
https://www.deepcrawl.com/knowledge/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
This allows you to manage your noindex directives in your robots.txt. Makes it easier as you can control all your noindex directives from a central location and block whole folders at a time. This would stop Google from crawling AND indexing pages all in one page and you can just leave the rest of the site alone and not worry about if a noindex tag should or should not be on a certain page.
Good luck!
-
As mentioned by Logan,noindex meta tag
is the most effective way to remove indexed pages. It sometimes takes time, you have to submit the right sitemap.xml which cover the pages/post you wish to get removed from google index.
-
I did read that about the robots.txt and that is why I added the noindex.
I use SEO Yoast for sitemap.xml, so shouldn't all my pages be there? I believe they are because I just looked at it a couple days ago.
So are you saying I should look through my backlink profile (WMT) and try to remove any backlinks?
Would 'Fetch as Google' not ping Google to tell them to recrawl?
Thanks for your help.
-
Hi,
First things first, it's a common misconception that the robots.txt disallow: / will prevent indexing. It's only indented to prevent crawling, which is why you don't get a meta description pulled into the result snippet. If you have links pointing to that page and a disallow: / on your robots, it's still eligible for indexation.
Second, it's pretty weird that the noindex tag isn't effective, as that's the only sure-fire way to get de-indexed intentionally. I would recommend creating an XML sitemap for all URLs on that domain that are noindex'd and resubmit that in Search Console. If Google hasn't crawled your site since adding the noindex, they don't know it's there. In my experience, forcing them to recrawl via XML submission has been effective at getting noindex noticed quicker.
I would also recommend taking a look at the link profile and removing any possible links pointing to your noindex pages, this will help future attempts at indexing.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Meta description issue on Google
Hello, I have a small issue on Google with our Meta Description tag not always being properly displayed. If you search for the term: Globe Car (in two words), everything is being displayed properly: http://screencast.com/t/YQCUkJnk Now do the same search for the term GlobeCar (in one word) and the meta tag set into our homepage seems to be totallly ignored and Google is now displaying something that is generated from out of their hat: http://screencast.com/t/K0KeeRGSgspV Anyone has an idea what would cause this? Thanks!
Technical SEO | | GlobeCar1 -
Why has google stopped showing one domain and switching to showing another that points to the same website?
My client has a website here: http://www.savannahchiropractic.com/ The site ranked really well for searches for the term "savannah chiroprtactor". Sometime last week (April 21 mobilegeddon?) the site slipped in rankings, but reappeared under the domain name: "www.whitemarshchiropractic.com" Both URLs go to the exact same site. and I don't even think that the whitemarshchripractic.com site is a redirect. I'm not sure how it's setup, but it's definitely not a duplicate site. The ranking is still lower than the savannahchiropractic.com site, but now appears. Anyone have any insights on this?
Technical SEO | | aj6130 -
Google showing wrong title
Hi, Can anyone assist a newbie please? My keyword 'Security Systems' is giving me position 1 on page 1, but the title it is using is not the page title. I am assuming for some reason it has made it up, please see below. The actual title tag says - <colgroup><col width="420"></colgroup>
Technical SEO | | DaddySmurf
| Security systems | wireless | battery powered | Police Approved | CSS | Google is showing - Compound Security Systems: Wireless Security Systems | Battery ... <cite>www.compoundsecurity.co.uk/</cite>Manufacturers & suppliers of The Mosquito Device & Professional industry compliant and Police recommended battery powered wireless security systems.Contact us - Mosquito Anti-Loitering Devices - Security Equipment - Installers<iframe class="SEOmoz-iframe" src="chrome-extension://eakacpaijcpapndcfffdgphdiccmpknp/html/serpbar.html#{" settings":{"dmr":false,"dmt":false,"mr":true,"mt":true,"serp-overlay":true,"toolbar-enabled":true,"subdomain-metrics":false,"serp-panel-open":true,"toolbar-position":"bottom"},"lsdata":{"feid":1245,"fipl":315,"fmrp":4.717881718426137,"fmrr":3.169235978744307e-9,"ftrp":5.245402699173709,"ftrr":8.283399271183101e-8,"fuid":15756,"pda":47.20670031521297,"peid":1266,"pid":315,"pmrp":4.634793943915049,"pmrr":1.137485303730264e-7,"ptrp":4.949510322693934,"ptrr":1.6938187788875737e-7,"puid":15777,"ueid":510,"uemrp":4.693787066897207,"uemrr":2.520503065617982e-10,"uid":1060,"uipl":191,"ujid":1038,"umrp":4.96103855611017,"umrr":5.477715129904515e-10,"upa":55.929113684261466,"utrp":6.066855084050327,"utrr":2.1655670926371268e-10},"user":{"level":"pro","options":{},"proenabled":true,"aclurl":"http:="" moz.com="" users="" level?src="mozbar","loginURL":"https://mza.bundledseo.com/login","logoutURL":"https://mza.bundledseo.com/logout","apiRequest":"url-metrics","cols":"133146078688","expired":false,"accessId":"pro-RGFkZHlTbXVyZg%3D%3D","expires":1374249469,"signature":"M%2Bh%2FXh4OS%2BAMtT2j6kuntWSTvNI%3D","displayName":"DaddySmurf"},"serp":1,"host":"www.compoundsecurity.co.uk"}"" scrolling="no"></iframe> If anyone can tell me how to correct this, i would very much appreciate it. Regards, Si0 -
Site is not displaying in Search Engines
My site is www.deoveritas.com it is in magento framework and it has a blog section in wordpress. When I enter Site:www.deoveroitas.com in google it shows all blog links in search result. The homepage and other innerpages are not getting displayed in search results at all. I even tried searching for www.deoveritas.com/about-us and it displays blogs in result. Checked Google webmaster fetch as google and it was index and successful. Can you please help me with this. Is my site de-indexed or banned by Google? the same issue is on Bing and Yahoo search engines too. Please help Thank you.
Technical SEO | | tpt.com0 -
How ro write a robots txt file to point to your site map
Good afternoon from still wet & humid wetherby UK... I want to write a robots text file that instruct the bots to index everything and give a specific location to the sitemap. The sitemap url is:http://business.leedscityregion.gov.uk/CMSPages/GoogleSiteMap.aspx Is this correct: User-agent: *
Technical SEO | | Nightwing
Disallow:
SITEMAP: http://business.leedscityregion.gov.uk/CMSPages/GoogleSiteMap.aspx Any insight welcome 🙂0 -
Google Search Parameters
Couple quick questions. Is using the parameter pws=0 still useful for turning off personalization? Is there a way to set my location as a URL parameter as well? For instance, I want to set my location to United States, can this be done with a URL param the same way as pws=0?
Technical SEO | | nbyloff0 -
Robots.txt Sitemap with Relative Path
Hi Everyone, In robots.txt, can the sitemap be indicated with a relative path? I'm trying to roll out a robots file to ~200 websites, and they all have the same relative path for a sitemap but each is hosted on its own domain. Basically I'm trying to avoid needing to create 200 different robots.txt files just to change the domain. If I do need to do that, though, is there an easier way than just trudging through it?
Technical SEO | | MRCSearch0 -
Site not being Indexed that fast anymore, Is something wrong with this Robots.txt
My wordpress site's robots.txt used to be this: User-agent: * Disallow: Sitemap: http://www.domainame.com/sitemap.xml.gz I also have all in one SEO installed and other than posts, tags are also index,follow on my site. My new posts used to appear on google in seconds after publishing. I changed the robots.txt to following and now post indexing takes hours. Is there something wrong with this robots.txt? User-agent: * Disallow: /cgi-bin Disallow: /wp-admin Disallow: /wp-includes Disallow: /wp-content/plugins Disallow: /wp-content/cache Disallow: /wp-content/themes Disallow: /wp-login.php Disallow: /wp-login.php Disallow: /trackback Disallow: /feed Disallow: /comments Disallow: /author Disallow: /category Disallow: */trackback Disallow: */feed Disallow: */comments Disallow: /login/ Disallow: /wget/ Disallow: /httpd/ Disallow: /*.php$ Disallow: /? Disallow: /*.js$ Disallow: /*.inc$ Disallow: /*.css$ Disallow: /*.gz$ Disallow: /*.wmv$ Disallow: /*.cgi$ Disallow: /*.xhtml$ Disallow: /? Disallow: /*?Allow: /wp-content/uploads User-agent: TechnoratiBot/8.1 Disallow: ia_archiverUser-agent: ia_archiver Disallow: / disable duggmirror User-agent: duggmirror Disallow: / allow google image bot to search all imagesUser-agent: Googlebot-Image Disallow: /wp-includes/ Allow: /* # allow adsense bot on entire siteUser-agent: Mediapartners-Google* Disallow: Allow: /* Sitemap: http://www.domainname.com/sitemap.xml.gz
Technical SEO | | ideas1230