SEOMOZ crawler is still crawling a subdomain despite disallow
-
This is for our client with a subdomain. We only want to analyze their main website as this is the one we want to SEO. The subdomain is not optimized so we know it's bound to have lots of errors.
We added the disallow code when we started and it was working fine. We only saw the errors for the main domain and we were able to fix them. However, just a month ago, the errors and warnings spiked up and the errors we saw were for the subdomain.
As far as our web guys are concerned. the disallow code is still there and was not touched. User-agent: rogerbot Disallow: /
We would like to know if there's anything we might have unintentionally changed or something we need to do so that the SEOMOZ crawler will stop going through the subdomain.
Any help is greatly appreciated!
-
Thanks Peter for your assistance.
Hope to hear from the SEOMOZ team soon with regards to this issue.
-
John,
Thanks for writing in! I would like to take a look at which project you guys were working with that this is happening. I will go ahead and start a ticket so we can better answer your questions You should hear from me soon!
Best,
Peter
Moz Help Team. -
I have heard of this before recently, I think possibly the moz crawler all or sometimes now just ignores the disallow because it is not a usual S.E crawler.
Hopefully one of the staff can provide some insight in this for you.
All the best.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Effect on SEO with growing number of subdomains
Since a few days I'm having some concernes on our website structure regarding SEO. Since I can't find similar cases I'm curious if the Moz community maybe have a few thoughts on the issue I'm facing The situation is as follow: For every new client our company (hosting) receives through www.example.com a new subdomain is created. This subdomain is an backup of the original website of the client and is very much irrelevant to our business. Google can also crawl these subdomains and index them. Productvariant 1: clientxxx1.productX.example.com
Intermediate & Advanced SEO | | Steven87
Productvariant 2: clientxxx1.productY.example.com
Productvariant 3: cleintxx10.productZ.example.com So I think above situation is far from ideal and I think it can cause problems. The problems we could be facing where Im thinking of are: no control over content (spam, low quality, bad optimised pages) duplicate sites (the backup on our subdomain and the original one of the client) impossible to make/manage a property for each subdomain in search console. Huge amount of subdomains which could influence crawl/indexation by Google. Maybe there are some more issues we could face where I didn't think of? The most common fix would be to use an other domain for the backups like client1.host-example.com and prevent Google from crawling it. This way www.example.com wouldn't be affected. So my questions basically are: 1. How much will this influence rankings for www.example.com
2. Are there any similar cases and what effect did it have on rankings/crawl/indexation when it got fixed / didn't got fixed?0 -
Subdomain replaced domain in Google SERP
Good morning, This is my first post. I found many Q&As here that mostly answer my question, but just to be sure we do this right I'm hoping the community can take a peak at my thinking below: Problem: We are relevant rank #1 for "custom poker chips" for example. We have this development website on a subdomain (http://dev.chiplab.com). On Saturday our live 'chiplab.com' main domain was replaced by 'dev.chiplab.com' in the SERP. Expected Cause: We did not add NOFOLLOW to the header tag. We also did not DISALLOW the subdomain in the robots.txt. We could have also put the 'dev.chiplab.com' subdomain behind a password wall. Solution: Add NOFOLLOW header, update robots.txt on subdomain and disallow crawl/index. Question: If we remove the subdomain from Google using WMT, will this drop us completely from the SERP? In other words, we would ideally like our root chiplab.com domain to replace the subdomain to get us back to where we were before Saturday. If the removal tool in WMT just removes the link completely, then is the only solution to wait until the site is recrawled and reindexed and hope the root chiplab.com domain ranks in place of the subdomain again? Thank you for your time, Chase
Intermediate & Advanced SEO | | chiplab0 -
XML and Disallow
I was just curious about any potential side effects of a client Basically utilizing a catch-all solution through the use of a spider for generating their XML Sitemap and then disallowing some of the directories in the XML sitemap in the robots.txt. i.e.
Intermediate & Advanced SEO | | DRSearchEngOpt
XML contains 500 URLs
50 URLs contain /dirw/
I don't want anything with /dirw/ indexed just because they are fairly useless. No content, one image. They utilize the robots.txt file to " disallow: /dirw/ " Lets say they do this for maybe 3 separate directories making up roughly 30% of the URL's in the XML sitemap. I am just advising they re-do the sitemaps because that shouldn't be too dificult but I am curious about the actual ramifications of this other than "it isn't a clear and concise indication to the SE and therefore should be made such" if there are any. Thanks!0 -
[Need advice!] A particular question about a subdomain to subfolder switch
Hello Moz Community! I really was hoping to get your help on a issue that is bothering me for a while now. I know there is a lot of about this topic but I couldn’t find a good answer for my particular question. We are running several web applications that are similar but are also different from each other. Right now, each one has its own subdomain (which was mainly due to technical reasons). Like this: webapp1.rootdomain.com, webapp2.rootdomain.com etc. Our root domain currently points with 301 to webapp1.rootdomain.com. Now, we are thinking about making two changes: changing to a subfolder level like this: rootdomain.com/webapp1 , rootdomain.com/webapp2 etc. Changing our rootdomain to a landing page (lisitng all the apps) and take out the 301 to webapp1 We want to do these changes mainly for SEO reasons. I know that the advantages are not so clear between subdomain/subfolder but we think it could be the right way to go to push the root domain and profit more from juice passing to the different apps. The problem is that we had a bad experience when we first switched from our first wep app (rootdomain.com) to an subdomain (webapp1.rootdomain.com) to set them equal with the other apps. Our traffic dropped a lot and it took us 6 weeks to get back on the same level as before. Maybe it was the 301 not passing all juice or maybe it was the switch to the subdomain. We are not sure. So, I guess my question is do you think it is the right thing to do for web apps to go with subfolders to pass more juice from root to subfolders? Will it bring again huge drops in traffic once we make that change? Is it worth taking that risk or initial drop because it will pay off in the future? Thanks a lot in advance! Your answers would help me a lot.
Intermediate & Advanced SEO | | ummaterial0 -
Would spiders successfully crawl a page with two distinct sets of content?
Hello all and thank you in advance for the help. I have a coffee company that sell both retail and wholesale products. These are typically the same product, just at different prices. We are planning on having a pop up for users to help them self identify upon their first visit asking if they are retail or wholesale clients. So if someone clicks retail, the cookie will show them retail pricing throughout the site and vice versa for those that identify themselves as wholesale. I can talk to our programmer to find out how he actually plans on doing this from a technical standpoint if it would be of assistance. My question is, how will a spider crawl this site? I am assuming (probably incorrectly) that whatever the "default" selection is (for example, right now now people see retail pricing and then opt into wholesale) will be the information/pricing that they index. So long story short, how would a spider crawl a page that has two sets of distinct pricing information displayed based on user self identification? Thanks again!
Intermediate & Advanced SEO | | ClayPotCreative0 -
When crawls occur - when will my links show up in Open Site Explorer
Hello everyone, I've been building links for a while now and none of them show up in Explorer. My domain authority hasn't changed for about a month or so. When does Google do crawls and when does SEOMoz do crawls? Thanks
Intermediate & Advanced SEO | | Harbor_Compliance0 -
NOINDEX content still showing in SERPS after 2 months
I have a website that was likely hit by Panda or some other algorithm change. The hit finally occurred in September of 2011. In December my developer set the following meta tag on all pages that do not have unique content: name="robots" content="NOINDEX" /> It's been 2 months now and I feel I've been patient, but Google is still showing 10,000+ pages when I do a search for site:http://www.mydomain.com I am looking for a quicker solution. Adding this many pages to the robots.txt does not seem like a sound option. The pages have been removed from the sitemap (for about a month now). I am trying to determine the best of the following options or find better options. 301 all the pages I want out of the index to a single URL based on the page type (location and product). The 301 worries me a bit because I'd have about 10,000 or so pages all 301ing to one or two URLs. However, I'd get some link juice to that page, right? Issue a HTTP 404 code on all the pages I want out of the index. The 404 code seems like the safest bet, but I am wondering if that will have a negative impact on my site with Google seeing 10,000+ 404 errors all of the sudden. Issue a HTTP 410 code on all pages I want out of the index. I've never used the 410 code and while most of those pages are never coming back, eventually I will bring a small percentage back online as I add fresh new content. This one scares me the most, but am interested if anyone has ever used a 410 code. Please advise and thanks for reading.
Intermediate & Advanced SEO | | NormanNewsome0 -
Pros and Cons of new subdomain and redirecting old subdomain?
Hi guys, I am thinking about redirecting the ranking subdoman to a new subdomain which has my main keyword within it. I am trying to outrank an exact match competitor and don't seem to be able to do so. For example: page.example.com would be my existing ranking URL which is very powerful, and I am thinking about redirecting this to a new URL being keywordpage.example.com What are your thoughts on do this? Thanks B
Intermediate & Advanced SEO | | HaymarketMediaGroupLtd0