Robots.txt and robots meta
-
I have an odd situation. I have a CMS that serves a global robots.txt containing the generic:
User-Agent: *
Allow: /
I also have one CMS site that must never be indexed. I've read in various places (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt controls crawling whereas the meta tag controls indexing. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta tag?
-
I see. Have you considered putting it behind an htpasswd?
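For reference, password-protecting a staging site is usually a small config change. A minimal sketch, assuming an Apache server; the file paths and realm name are placeholders:

```apache
# .htaccess at the root of the staging site
AuthType Basic
AuthName "Staging - authorized users only"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```

The password file itself is created once with `htpasswd -c /etc/apache2/.htpasswd someuser`. This keeps spiders out entirely, since they can't fetch any pages to index.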
-
I can control it (it's a custom piece of software) but it's not as easy a fix as adding a meta to the template.
The main problem is we have a junk TLD we use to test new ideas off the live server (it lets clients give us feedback), but it gets spidered and indexed and starts ranking for client sites before they're ready to go live on their own TLD. This means we have to compete against ourselves (even with a 301 in place). There's nothing sensitive or it would live behind a password.
-
Do you need to control access to the site beyond the SERPs? I would not rely on robots.txt to shield any sensitive data.
For a breakdown of robots.txt and robots meta tags, check out http://www.robotstxt.org/robotstxt.html and http://www.searchtools.com/robots/robots-meta.html/, and for a great post on using these standards in SEO, check out http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
I am also concerned that you are unable to control your robots.txt! If your CMS doesn't let you do that and overwrites it when you change it manually, you have some major control problems on your hands that you should remedy.
-
Blocking it in robots.txt will not guarantee that your site stays out of Google's index; blocked URLs can still be indexed from external links. A meta robots NOINDEX should ensure Google does not show your pages when someone searches for them. For the tag to work, though, the page must remain crawlable: the spider has to fetch the page to read the tag, so do not also block it in robots.txt.
It is important to note that Googlebot and other spiders will continue to visit your pages.
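To put that answer in concrete terms, this is the usual de-indexing tag, a minimal sketch placed in the `<head>` of every page on the site that must stay out of the index:

```html
<!-- Tells compliant crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

The catch, again, is that the crawler has to fetch the page to see the tag, so the robots.txt must keep allowing the site (as the generic Allow: / does). For non-HTML files, the equivalent is the `X-Robots-Tag: noindex` HTTP header.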
Related Questions
-
Robots.txt in subfolders and hreflang issues
A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations:
Technical SEO | | lauralou82
UK site - https://www.clientname.com/uk/ - robots.txt location: https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: https://www.clientname.com/us/robots.txt
We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US. They have the following hreflang tags across all pages: We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously). Search Console says there are no hreflang tags at all. Additionally, we have a robots.txt file on each site which links to the corresponding sitemap files, but when viewing the robots.txt tester in Search Console, each property shows the robots.txt file for https://www.clientname.com only, even though when you actually navigate to this URL (https://www.clientname.com/robots.txt) you get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt depending on your location. Any suggestions how we can remove UK listings from Google US and vice versa?
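For anyone comparing notes, a reciprocal hreflang set for this /uk/ and /us/ split would look roughly like the following sketch (illustrative only, using the URLs from the question); every page must list its alternates, and the referenced pages must link back:

```html
<link rel="alternate" hreflang="en-gb" href="https://www.clientname.com/uk/" />
<link rel="alternate" hreflang="en-us" href="https://www.clientname.com/us/" />
<link rel="alternate" hreflang="x-default" href="https://www.clientname.com/" />
```

Note that a geo-redirect on /robots.txt like the one described can keep crawlers from ever seeing the per-site files, which would also explain the Search Console tester behavior.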
My SERP meta description is displaying 315 characters...
Hi Mozzers, We have recently taken the #2 spot for our main keyword in the Google UK SERP. I just checked again and we have dropped to #4 and our meta description is no longer there, as it has been replaced with some homepage content... 315 characters of homepage content right up to the full stop. I'm a little confused. A couple of our competitors' meta descriptions are showing the same: extra-long homepage text instead. Is there something totally normal and harmless causing this, or do I need to be monitoring/changing something? Has Google made an update to allow for longer meta descriptions? Any advice appreciated!
Technical SEO | | SanjidaKazi
Very strange: META descriptions not showing
Hello, Since Panda 4.0 launched, all of my optimized META descriptions have disappeared from Google.
Technical SEO | | MarcelMoz
A while ago, I posted a question about this problem here: http://moz.com/community/q/all-meta-descriptions-gone. I know about Google's own will to decide which META description gets shown, and also about unique content in the descriptions. All pages had an optimized description before Panda 4.0 and there were no troubles at all, which tells me something else is going on. I tested some things:
Rewrote 50 descriptions to very unique ones; only five got indexed. This tells me that duplicate content in the descriptions is not the problem (they have never been 100% duplicate; product type was a variable which was always different for each page).
Removed the cache in GWT and fetched again as Google; didn't help. The pages I tested have all been indexed again without showing the optimized descriptions.
More information:
The first time I changed some META descriptions and fetched the pages again in GWT, Google picked up my new META descriptions and showed them. A few days later, most of them disappeared again (so Google is aware of the description but seems to ignore it).
Some pages show the optimized description when I change my search query (only a few; mostly the optimized description never gets shown).
The technique is OK. The source code shows the right optimized description. META robots isn't blocking anything except NOODP/NOYDIR (it always has blocked those).
Websites using the exact same CMS, website template, and META descriptions (style and build-up) do not have these problems.
I compared elements like the place of the description in the source code, usage of meta robots, og:description, crawl-delay in robots.txt, and special characters in descriptions between websites that show optimized descriptions vs. websites that don't. I can't find any connection.
Something I noticed is a change in my robots.txt file: my webmaster has added the following command:
Crawl-delay: 2
Might this have to do with my problem? I guess it doesn't. I did some research and there are more websites suffering this problem besides mine, which tells me it must be Google (and so Panda 4.0) that is responsible for this change. I really want my optimized descriptions back. Does anybody have an idea what to do?
Thanks in advance. Marcel
Multi-domain content and meta data feed
Hi, I am working with a client whose web developer has offered to build a CMS that auto-feeds meta data and product descriptions (on-page content) to two different websites which have two completely different URLs (primary domain names) associated with them. Please see the screenshots attached for examples. The entire reason this has been offered is to avoid duplicate content issues. The client has two e-commerce websites but only one content management system that can update both simultaneously. The work-around shown in the screenshots is the developer's attempt at ensuring that both sites have unique meta data and on-page content associated with each product. Can anyone advise whether they foresee that this may cause any issues from an SEO perspective? Thanks in advance
Technical SEO | | SteveK64
Robots.txt
www.mywebsite.com/details/home-to-mome-4596
www.mywebsite.com/details/home-moving-4599
www.mywebsite.com/details/1-bedroom-apartment-4601
www.mywebsite.com/details/4-bedroom-apartment-4612
We have so many pages like this, and we do not want Google to crawl these pages, so we added the following code to robots.txt:
User-agent: Googlebot
Disallow: /details/
Is this code correct?
Technical SEO | | iskq
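The rule above can be sanity-checked with Python's standard-library robots.txt parser; a quick sketch using the URLs from the question:

```python
from urllib.robotparser import RobotFileParser

# The rules proposed in the question.
rules = [
    "User-agent: Googlebot",
    "Disallow: /details/",
]

parser = RobotFileParser()
parser.parse(rules)

# Everything under /details/ should be disallowed for Googlebot...
blocked = not parser.can_fetch("Googlebot", "http://www.mywebsite.com/details/home-moving-4599")
# ...while other paths remain crawlable.
allowed = parser.can_fetch("Googlebot", "http://www.mywebsite.com/contact")
print(blocked, allowed)  # → True True
```

So yes, the syntax is correct for keeping Googlebot from crawling those pages. Keep in mind Disallow only stops crawling; the URLs can still end up in the index via external links, so add a meta robots noindex if de-indexing is the actual goal.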
Our homepage currently uses a Meta refresh. Is it worth $1,000 to get it fixed?
Look at http://www.ccisolutions.com After the meta refresh takes place, the homepage URL looks like this: http://www.ccisolutions.com/StoreFront/IAFDispatcher?iafAction=showMain I am trying to convince management that it is worth spending $1,000 with our current provider to get it fixed. It is my understanding that this meta refresh could be preventing the value of our homepage from being passed down to our category pages, etc. Can anyone give me something concrete that I can use to convince management that the fix is worth $1,000? Or is it not worth fixing?
Technical SEO | | danatanseo
Getting home page content at top of what robots see
When I click on the text-only cache of nlpca(dot)com's home page, http://webcache.googleusercontent.com/search?q=cache:UIJER7OJFzYJ:www.nlpca.com/&hl=en&gl=us&strip=1, our H1 and body content are at the very bottom. How do we get the H1 and content to the top of what the robots see? Thanks!
Technical SEO | | BobGW
Robots.txt
Hi there, My question relates to the robots.txt file. Would this statement: /*/trackback block domain.com/trackback and domain.com/fred/trackback? Peter
Technical SEO | | PeterM22
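Python's standard urllib.robotparser doesn't reliably handle wildcard rules, so here is a rough sketch of Googlebot-style pattern matching to reason about it (assumptions: `*` matches any run of characters, `$` anchors the end, and rules otherwise match as path prefixes):

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Googlebot-style robots.txt path matching (sketch):
    '*' matches any sequence of characters, '$' anchors the end,
    and a rule otherwise matches any path it is a prefix of."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    # re.match anchors at the start only, giving prefix semantics.
    return re.match(regex + ("$" if anchored else ""), path) is not None

print(rule_matches("/*/trackback", "/fred/trackback"))       # True: blocked
print(rule_matches("/*/trackback", "/trackback"))            # False: not blocked
print(rule_matches("/*/trackback", "/fred/trackback/page"))  # True: prefix match
```

Under that matching, /*/trackback blocks domain.com/fred/trackback but not domain.com/trackback, because the pattern requires a second slash before "trackback"; a separate Disallow: /trackback rule would be needed to cover both.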