Severe rank drop due to overwritten robots.txt
-
Hi,
Last week we updated Drupal core on our website and accidentally overwrote our custom robots.txt, which blocked hundreds of pages, with the default Drupal robots.txt. Several hours after that happened (and before we caught the mistake), our rankings dropped from mostly first and second position in Google organic results to the middle and bottom of the first page.
My theory is that we flooded the index with very low-quality pages all at once, which raised a red flag and got us de-ranked.
We have since fixed the robots.txt and have been re-crawled, but we have not seen our rankings return.
Is this a safe assumption about what happened? I haven't seen any other sites in the retail vertical getting hit yet by anything resembling a Panda 2.3-type update.
Will we see a return in our results anytime soon?
Thanks,
Justin
-
Your present approach is correct. Ensure all these pages are tagged as noindex for now. Remove the block from robots.txt and let Google and Bing crawl these pages.
I would suggest waiting until you are confident all the pages have been removed from Google's index, then checking Yahoo and Bing. If you decide that robots.txt is the best solution for your company, you can replace the disallows after confirming your site is no longer affected by these pages.
I would also suggest that, going forward, you ensure any new pages on your site that you do not wish to have indexed always include the appropriate meta tag. If this issue happens again, you will have a layer of protection in place.
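To be concrete, a minimal version of that tag, placed in the <head> of each page you want kept out of the index (the page itself is just a placeholder here), would look something like this:

    <!-- inside the <head> of any page you do not want indexed -->
    <meta name="robots" content="noindex, follow">

The "follow" part tells crawlers they may still pass through the links on the page even though the page itself should not be indexed.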
-
We're fairly confident at this point that we flooded the index with about 15,000 low-quality URLs all at once. Something similar happened a few years back, although we didn't flood the index then; those were newer pages that were low quality and could have been seen as spam, since they had no real content beyond AdSense, so we removed them with a disallow in robots.txt.
We are adding the noindex meta tag to all of these pages. You're saying we should remove the disallow in robots.txt so Googlebot can crawl these pages and see the noindex tag?
We are a very large site and we're crawled often. We're a PR7 site, and our Moz Domain Authority is 79/100, down from 82.
We're hoping these URLs will be removed quickly; I don't think there is a way of removing 15k URLs in Google Webmaster Tools without setting off flags either.
-
There is no easy answer for how long it will take.
If your theory about the ranking drop being caused by these pages being added is correct, then as these pages are removed from Google's index, your site should improve. The timeline depends on the size of your site, your site's DA, the PA and links for these particular pages, etc.
If it were my site, I would mark the calendar for August 1st to review the issue. I would check all the pages that were mistakenly indexed to be certain they had been removed. After that, I would check the rankings.
-
Hi Ryan,
Thanks for your response. You are correct: we have found that some of the pages that should be excluded are still indexed. We are now going to use the noindex, follow meta tag on these pages because we can't afford to have them indexed; they are intended for clients/users only, are very low quality, and have been flagged before.
Now, how long until we see our rankings move back? That's the real question.
Thanks so much for your help.
Justin
-
That's a great answer, Ryan. Just out of curiosity, would it hurt to look at the cached version of any pages that are indexed? I'd be curious to know whether the date they were cached is right around when the robots.txt was changed. I know it wouldn't alter his course of action, but it might add further confirmation that this caused the problem.
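For example, something along these lines in Google (the path is only a placeholder) should pull up the cached copy along with its snapshot date:

    cache:mysite.com/some-previously-blocked-page/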
-
Justin,
Based on the information you provided, it's not possible to determine whether the robots.txt file was part of the issue. You need to investigate the matter further. In Google, enter a query designed to find some of the previously blocked content. For example, let's assume your site is about SEO but you published a blog post reviewing the latest Harry Potter movie. You may have used robots.txt to block that article because it is unrelated to your site's focus. Perform a search for "Harry Potter site:mysite.com", replacing mysite.com with your main web address. If the search returns your article, then you know the content was indexed. Try this approach for several of the previously blocked areas of your website.
If you find this content in SERPs, then you need to have it removed. The best thing to do is add the "noindex, follow" tags to all these pages, then remove the block from your robots.txt file.
The problem is that with the block in place in your robots.txt file, Google cannot see the new meta tag and does not know to remove the content from its index.
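As a rough sketch (the path below is only a placeholder, not your actual structure), the temporary change is simply removing the Disallow rule so the pages can be crawled and the noindex tag read:

    # Before - crawlers are blocked and never see the noindex tag
    User-agent: *
    Disallow: /private-client-pages/

    # After (temporary) - crawling allowed so the noindex tag can be read
    User-agent: *
    Disallow:

Once the pages have dropped out of the index, you can restore the Disallow line if you still want the block in place.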
One last item to mention. Google does have a URL removal tool but that would not be appropriate in this instance. That tool is designed to remove a page which causes direct damage by being in the index. Trade secrets or other confidential information can be removed with this tool.