Panda Updates - robots.txt or noindex?
-
Hi,
I have a site that I believe has been impacted by the recent Panda updates. Assuming that Google has crawled and indexed several thousand pages that are essentially the same, and that the site has now crossed the threshold at which the Panda update picks it out, what is the best way to proceed?
Is it enough to block the pages from being crawled in the future using robots.txt, or would I need to remove the pages from the index using the meta noindex tag? Of course, if I block the URLs with robots.txt, then Googlebot won't be able to access the pages in order to see the noindex tag.
Does anyone have any previous experience of doing something similar?
Thanks very much.
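A minimal Python sketch of the conflict raised in the question, assuming a crawler that honours robots.txt: a noindex tag only counts as "seen" if robots.txt lets the crawler fetch the page in the first place. It uses the standard-library urllib.robotparser, which implements the basic robots exclusion rules rather than every Google extension (wildcards, longest-match precedence), so treat it as illustrative. The function name crawler_sees_noindex and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k: (v or "") for k, v in attrs}  # attribute names arrive lowercased
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def crawler_sees_noindex(robots_txt, url, html, agent="Googlebot"):
    """True only if robots.txt allows fetching `url` AND its HTML carries noindex."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        # Crawling is blocked, so the page body (and its meta tag) is never read.
        return False
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(crawler_sees_noindex("User-agent: *\nDisallow:\n", "https://example.com/dupes/1", page))         # True
print(crawler_sees_noindex("User-agent: *\nDisallow: /dupes/\n", "https://example.com/dupes/1", page))  # False
```

The second call is the situation described in the question: the noindex is on the page, but the Disallow rule stops a compliant crawler from ever reading it.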
-
This is a good read: http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
I think you should be careful with robots.txt, because blocking the bot's access will not cause Google to remove the content from its index. It will simply list the URL with a note saying it isn't quite sure what's on the page. I would use noindex to clear out the index first, before attempting robots.txt exclusion.
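The sequencing suggested here (noindex first, robots.txt later) can be sketched as a small helper that only writes Disallow lines for paths that carried noindex and have since dropped out of the index. How you establish `still_indexed` (for example, by spot-checking with site: queries) is an assumption, and the function name and paths are hypothetical.

```python
def staged_robots_txt(noindexed_paths, still_indexed):
    """Stage 2 of a noindex-then-robots.txt cleanup (hypothetical helper).

    Only paths that carried noindex AND are no longer indexed get a
    Disallow line; anything still indexed stays crawlable so Googlebot
    can keep seeing its noindex tag.
    """
    ready = sorted(set(noindexed_paths) - set(still_indexed))
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in ready]
    if not ready:
        lines.append("Disallow:")  # an empty Disallow value blocks nothing
    return "\n".join(lines) + "\n"


print(staged_robots_txt({"/dupes/a", "/dupes/b"}, still_indexed={"/dupes/b"}), end="")
# User-agent: *
# Disallow: /dupes/a
```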
-
Yes, both. If a page is linked to from another site, Google will spider that other site and follow your link without hitting the robots.txt, and the page could get indexed if there is no noindex on it.
-
Indeed, try both.
Irving +1
-
Both. Block the lowest-quality, lowest-traffic pages with noindex, and block the folder in robots.txt.
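A rough Python sketch of picking the "lowest quality, lowest traffic" pages for noindex and producing the folder rule separately. The Page fields, the quality score, the thresholds and the /print/ folder are all hypothetical placeholders for your own audit data. Note the caveat from the original question: if a noindexed page also sits inside the folder blocked by robots.txt, Googlebot can no longer fetch it to see the tag, so this sketch keeps the two sets apart.

```python
from dataclasses import dataclass


@dataclass
class Page:
    path: str
    monthly_visits: int
    quality_score: float  # hypothetical 0-1 score from your own content audit


def pages_to_noindex(pages, max_visits=10, max_quality=0.3):
    """Paths that are BOTH low-traffic and low-quality (thresholds are placeholders)."""
    return [p.path for p in pages
            if p.monthly_visits <= max_visits and p.quality_score <= max_quality]


def folder_block(folder):
    """robots.txt text that blocks crawling of an entire folder."""
    return f"User-agent: *\nDisallow: {folder}\n"


pages = [
    Page("/products/a", monthly_visits=3, quality_score=0.1),
    Page("/products/b", monthly_visits=500, quality_score=0.2),  # low quality but gets traffic
    Page("/guides/x", monthly_visits=2, quality_score=0.9),      # low traffic but good content
]
print(pages_to_noindex(pages))  # ['/products/a']
print(folder_block("/print/"), end="")
```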
-
Related Questions
-
Search Results not Updating (Title, Description, and URL)
Issue: I recently discovered that my site was accessible via both HTTP and HTTPS. The site has used a rel canonical tag to point to the HTTP version, but Google+ was pointing to HTTPS. The title, description, and URL shown in the results for the homepage are HTTPS, while other pages are HTTP, etc. Steps taken to resolve: this week I did the following: 301'd all non-checkout pages to the HTTP version; switched the Google+ URL to the HTTP version and added a new post with an HTTP link to the homepage; used Webmaster Tools to recrawl and reindex the site; resubmitted the XML sitemap. No luck... the site is still not updating. Any advice would be greatly appreciated. Thanks all! Site is Here
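The first step in this post (301ing every non-checkout HTTPS URL to its HTTP twin) boils down to a small decision function. This is a framework-agnostic Python sketch; CHECKOUT_PREFIXES and the example URLs are hypothetical, and your server or CMS would still need to issue the actual 301 using whatever the function returns.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical path prefixes that must stay on HTTPS.
CHECKOUT_PREFIXES = ("/checkout", "/cart")


def redirect_target(url):
    """Return the HTTP URL to 301 to, or None when no redirect is needed."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None  # already HTTP
    if parts.path.startswith(CHECKOUT_PREFIXES):
        return None  # checkout pages keep HTTPS
    # Same host, path and query string; only the scheme changes.
    return urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))


print(redirect_target("https://www.example.com/about?ref=g"))  # http://www.example.com/about?ref=g
print(redirect_target("https://www.example.com/checkout/pay"))  # None
```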
Intermediate & Advanced SEO | AhlerManagement -
I have two sitemaps which partly duplicate - one is blocked by robots.txt but can't figure out why!
Hi, I've just found two sitemaps. One of them is a .php file and represents part of the site structure; the second is a .txt file that lists every page on the website. The .txt file is blocked via the robots exclusion protocol (which doesn't seem very logical, as it's the only full sitemap). Any ideas why a developer might have done that?
Intermediate & Advanced SEO | McTaggart -
Meta No INDEX and Robots - Optimizing Crawl Budget
Hi, some time ago a few thousand pages got into Google's index. They were "product pop-up" pages: exact duplicates of the actual product page, but as a "quick view". So I removed them via GWT and also put a meta noindex on these pop-up overlays to stop them being indexed and causing duplicate content issues. They are no longer in the index as far as I can see; I did a site:www.mydomain.com/ajax search and nothing appears. So can I block these off now with robots.txt to optimize my crawl budget? Thanks
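If the pop-up URLs all live under /ajax, as the site: check suggests, the rule is a single Disallow line. This Python sketch uses the standard-library urllib.robotparser to check which URLs the rule would block before deploying it; the quick-view and product URL patterns are hypothetical. Robots.txt rules are prefix matches, so Disallow: /ajax would also block a page such as /ajaxian-page.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /ajax
"""


def is_blocked(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """True if a crawler honouring robots_txt may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


for url in ("http://www.mydomain.com/ajax/quick-view?id=123",  # hypothetical pop-up URL
            "http://www.mydomain.com/product/123",             # hypothetical product URL
            "http://www.mydomain.com/ajaxian-page"):           # caught by the prefix match
    print(url, is_blocked(url))
```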
Intermediate & Advanced SEO | bjs2010 -
Penguin Update, what I've noticed
Hi guys, I have spent two days looking at our site and our competitors' sites after the update, and three things jump out straight away for us. I am in the travel industry, and we are still on the first page for the major keywords, but in the 8 to 10 region; we were 2 to 5. 1. The sites that have moved up both have shops selling merchandise, which is not the main focus of their site. Has anyone else spotted that sites with an e-commerce section have benefited from the latest update? 2. Sites we have links from, although they look like travel sites, may be themed differently by Google. Does anybody know a good tool that will help determine what theme a site is? Images, design and content don't always seem to be a good indicator; I think the backlinks to a domain have a big effect on how Google themes the site you get the link from. Any tool that will help speed up this process would be great. We need more quality links from travel sites (or at least what Google thinks is a travel-related site). 3. The competitors who have done well seem to have 45% of their links pointing to the home page; we only had 28%, so we are now focusing on links to the home page. We don't really stand out from the top 10 sites in any other way in terms of other indicators, like branded keywords vs. money-making keywords. Any thoughts or feedback would be great.
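Point 3 is plain arithmetic over a backlink export: the share of links whose target is the home page. A minimal Python sketch, assuming you already have a list of link target URLs; the sample data below is hypothetical and chosen only to reproduce the 28% figure.

```python
from urllib.parse import urlsplit


def home_page_link_share(target_urls):
    """Percentage of backlinks that point at the home page (path '' or '/')."""
    if not target_urls:
        return 0.0
    home = sum(1 for url in target_urls if urlsplit(url).path in ("", "/"))
    return 100.0 * home / len(target_urls)


# Hypothetical export: 7 of 25 links point at the home page.
links = ["http://www.example.com/"] * 7 + ["http://www.example.com/tours/scotland"] * 18
print(home_page_link_share(links))  # 28.0
```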
Intermediate & Advanced SEO | PottyScotty -
Panda'd - and I think I know how to fix it...
Hi, I have a non-core site that seems to have been affected by a Panda refresh in late December (http://www.seomoz.org/google-algorithm-change#2012). For the longest time I couldn't figure out why this site, which is full of high-quality, expert-level content, would get dinged. I made several moves to try to eliminate duplicate content, even though I couldn't find evidence of it; it's a WordPress site, so there are lots of opportunities to introduce it accidentally through archives, tags and whatnot. The classic SEO mistake I was making was forgetting about a type of post we were publishing to support one of our email campaigns. On most sites there's always something you aren't optimizing, and that's the stuff that can really create unintended issues in Google, because the decisions made on those pieces are often driven more by the operational needs of other campaigns than by search strategy. These posts are thin little articles, written by humans, but the text is actually submitted to another external site, published there, and then recreated as the content that the email campaign links to. These posts are segregated from the normal feed on the WordPress site, and the last time I had reviewed this content, we were not using a creation method that involved publishing it to Facebook first. But OK, I'm going to stop indexing this content; that's a given. I believe that is the Panda issue. I could be wrong, but it makes sense, since otherwise this is maybe the least likely site to be affected by Panda that I've ever been involved with. Do I do anything else after fixing a Panda issue? Is there a reconsideration request for this or something? Should I send a singing telegram to Cutts? I researched a few articles, and there wasn't much on what to do after you've fixed it, other than wait. Just wondering if anyone else who fixed a Panda thang used any communication channel to let Google know. Thanks!
Intermediate & Advanced SEO | reallygoodstuff -
Not using a robot command meta tag
Hi SEOmoz peeps. I was doing some research on robots meta tags and found a couple of major sites that are not using them. If you check out the code for these (http://www.amazon.com, http://www.zappos.com, http://www.zappos.com/product/7787787/color/92100, http://www.altrec.com/), you will not find a meta robots line. Of course you need the line for any noindex, nofollow, or noarchive pages. However, for pages you want crawled and indexed, is there any benefit to not having the line at all? Thanks!
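Google documents that when a page has no robots meta tag, it is treated as index, follow, which is why large sites can omit the line on pages they want indexed. This Python sketch shows that defaulting logic; it only reads the generic <meta name="robots"> tag, and if several are present it keeps the last one, which is a simplification. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Captures the content of <meta name="robots">, or None if absent."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.content = attr.get("content") or ""  # last tag wins (simplification)


def effective_directives(html):
    """Return (indexable, followable); both default to True when no tag is present."""
    parser = RobotsMetaParser()
    parser.feed(html)
    tokens = set()
    if parser.content:
        tokens = {t.strip().lower() for t in parser.content.split(",")}
    # "none" is shorthand for noindex, nofollow.
    indexable = not (tokens & {"noindex", "none"})
    followable = not (tokens & {"nofollow", "none"})
    return indexable, followable


print(effective_directives("<html><head><title>Shoes</title></head></html>"))  # (True, True)
print(effective_directives('<meta name="robots" content="noindex, nofollow">'))  # (False, False)
```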
Intermediate & Advanced SEO | STPseo -
How to determine the correct number of ad units post-Panda
What guidelines are you using to determine the correct number of ad units? Also is it number of units per page or the size of the ads (visually)? Any additional guidance or links you can point me to regarding ads in a post-Panda world would be helpful.
Intermediate & Advanced SEO | nicole.healthline -
Noindex term for pages
I have a client who has a VirtueMart site, and all of the "ask a question on this product" pages had been indexed by Google. I have managed to get a noindex meta tag into the ask-a-question page. Will these be dropped from the index the next time they are crawled and Google sees the noindex?
Intermediate & Advanced SEO | webseoservices