What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
-
Now that Google considers subdomains as part of the TLD, I'm a little leery of testing robots.txt with something like:
staging.domain.com
User-agent: *
Disallow: /
for fear it might get www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages, but that would require a lot more work.
-
Just make sure that when/if you copy the staging site over to the live domain, you don't also copy over the robots.txt, .htaccess, or whatever other means you use to block the staging site from being indexed. Otherwise your shiny new site will be blocked too.
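A quick pre-deploy check can help catch that mistake. The following is only a minimal sketch in Python using the standard library's urllib.robotparser. The function name blocks_all_crawlers and the sample robots.txt strings are made up for illustration, and it only detects a blanket block in the wildcard (User-agent: *) group. It does not look at .htaccess rules or meta tags.

```python
from urllib.robotparser import RobotFileParser


def blocks_all_crawlers(robots_txt: str) -> bool:
    """Return True if the wildcard group of this robots.txt disallows the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() answers "may this user agent crawl this path?"
    return not parser.can_fetch("*", "/")


# Hypothetical file contents: what staging uses vs. what live should use.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"
LIVE_ROBOTS = "User-agent: *\nDisallow: /private/\n"

staging_is_blocked = blocks_all_crawlers(STAGING_ROBOTS)
live_is_blocked = blocks_all_crawlers(LIVE_ROBOTS)

if live_is_blocked:
    raise SystemExit("robots.txt blocks everything; do not deploy this to the live domain")
```

Running something like this against the robots.txt that is about to go live would stop the deploy if the staging file slipped through.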
-
I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.
-
Matt/Ryan-
Great discussion, thanks for the input. The staging.domain.com is just one of the domains we don't want indexed. Some of them still need to be accessed by the public, some like staging could be restricted to specific IPs.
I realize after your discussion that I probably should have used a different example of a sub-domain. On the other hand, it might not have sparked the discussion, so maybe it was a good example.
-
.htaccess files can be placed at any directory level of a site so you can do it for just the subdomain or even just a directory of a domain.
-
Staging URLs are typically only used for testing, so rather than a deny rule I would recommend a specific ALLOW for only the IP addresses that should have access.
I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.
You can also use .htaccess to require a username and password. It is simple, and you can give the login to clients if that is a concern/need.
-
Correct.
-
Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site via almost any IP. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache fixing the issue.
The best solution is always going to be a noindex meta tag on the pages you do not wish to be indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. I cannot imagine any circumstance where you would modify your .htaccess file to block Googlebot.
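To spot-check that the noindex tag really is on the pages you care about, something like the following could work. It is a minimal sketch in Python using only the standard library's html.parser. The names RobotsMetaFinder and has_noindex and the sample HTML strings are assumptions for illustration, and it only looks at the meta robots tag in the HTML, not at HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a meta robots noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical pages for illustration.
STAGING_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Staging</body></html>"
)
LIVE_PAGE = "<html><head><title>Live</title></head><body>Live</body></html>"

staging_has_noindex = has_noindex(STAGING_PAGE)
live_has_noindex = has_noindex(LIVE_PAGE)
```

One thing to keep in mind when combining the two methods: a crawler has to be able to fetch a page to see its noindex tag, so a page that is also disallowed in robots.txt never gets its tag read.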
-
Hi Matt.
Perhaps I misunderstood the question but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP it would prevent visitors from accessing the content which I don't believe is the goal.
-
Interesting, I hadn't thought of using .htaccess to block Googlebot. Thanks for the suggestion.
-
Thanks Ryan. So you don't see any issues with de-indexing the main site if I created a second robots.txt file, e.g.
http://staging.domain.com/robots.txt
User-agent: *
Disallow: /
That was my initial thought, but when Google announced they consider sub-domains part of the TLD I was afraid it might affect the http://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?
-
Use an .htaccess file to only allow from certain ip addresses or ranges.
Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm
-
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Place a robots.txt file in the root of the subdomain.
User-agent: *
Disallow: /
This method will block the subdomain while leaving your primary domain unaffected.
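The reason this is safe is that crawlers request robots.txt separately for each hostname, so the file at staging.domain.com only governs staging.domain.com. Here is a rough sketch of that behaviour in Python with the standard library's urllib.robotparser. The helper name parser_for and the contents assumed for the www robots.txt are hypothetical. The staging rules are the ones from this answer.

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of one host's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Served at http://staging.domain.com/robots.txt (rules from the answer above).
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"
# Assumed contents of http://www.domain.com/robots.txt: nothing disallowed.
WWW_ROBOTS = "User-agent: *\nDisallow:\n"

# Each host gets its own parser, mirroring how each host has its own file.
staging = parser_for(STAGING_ROBOTS)
www = parser_for(WWW_ROBOTS)

staging_allowed = staging.can_fetch("Googlebot", "http://staging.domain.com/any-page")
www_allowed = www.can_fetch("Googlebot", "http://www.domain.com/any-page")

print("staging crawlable:", staging_allowed)
print("www crawlable:", www_allowed)
```

The staging rules never enter the www parser, which is the same separation a crawler applies when it reads each host's file on its own.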