Panda Updates - robots.txt or noindex?
-
Hi,
I have a site that I believe has been impacted by the recent Panda updates. Assuming that Google has crawled and indexed several thousand pages that are essentially the same and the site has now passed the threshold to be picked out by the Panda update, what is the best way to proceed?
Is it enough to block the pages from being crawled in the future using robots.txt, or would I need to remove the pages from the index using the meta noindex tag? Of course if I block the URLs with robots.txt then Googlebot won't be able to access the page in order to see the noindex tag.
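The catch described above (a robots.txt block hides the noindex tag from the crawler) can be shown with a small Python sketch. It uses only the standard library and models a generic, well-behaved crawler, not Googlebot's actual implementation. The `/duplicates/` folder, the example.com URL, the sample HTML and the function name `crawler_sees_noindex` are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the folder holding the near-duplicate pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /duplicates/
"""

# Hypothetical page that carries a meta noindex tag.
PAGE_HTML = (
    '<html><head><meta name="robots" content="noindex,follow"></head>'
    "<body>Near-duplicate page</body></html>"
)


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            if "noindex" in attr_map.get("content", "").lower():
                self.noindex = True


def crawler_sees_noindex(robots_txt, url, html, user_agent="Googlebot"):
    """Return True only if a compliant crawler may fetch the page AND finds noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # Blocked by robots.txt: the HTML is never downloaded,
        # so the noindex tag inside it is never read.
        return False
    meta = RobotsMetaParser()
    meta.feed(html)
    return meta.noindex


if __name__ == "__main__":
    url = "http://www.example.com/duplicates/page-1"
    print(crawler_sees_noindex(ROBOTS_TXT, url, PAGE_HTML))  # blocked, tag unseen
    print(crawler_sees_noindex("", url, PAGE_HTML))          # crawlable, tag seen
```

With the Disallow rule in place the function returns False even though the page contains the tag, which is exactly the problem raised in the question.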
Does anyone have any previous experience of doing something similar?
Thanks very much.
-
This is a good read: http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
I think you should be careful with robots.txt, because blocking the bot's access will not cause Google to remove the content from its index. The pages will simply be listed with a message saying Google is not quite sure what's on the page. I would use noindex to clear the pages out of the index first, before attempting robots.txt exclusion.
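The ordering suggested in this reply (noindex first, robots.txt only afterwards) can be written down as a tiny decision helper. This is only a sketch: the `index_status` data is hypothetical and would have to be gathered by hand, for example by spot-checking URLs with site: queries, and the function name `next_step` is invented here.

```python
def next_step(index_status):
    """Decide what to do with a folder of thin pages.

    index_status maps each URL to True if it still appears in the index
    (hypothetical input, collected manually).
    """
    still_indexed = sorted(url for url, indexed in index_status.items() if indexed)
    if still_indexed:
        # Pages must stay crawlable so the noindex tag can still be read.
        return "keep noindex, do not block in robots.txt yet", still_indexed
    # Everything has dropped out, so blocking the folder is now safe.
    return "add the robots.txt Disallow rule", []


if __name__ == "__main__":
    status = {
        "http://www.example.com/duplicates/page-1": False,
        "http://www.example.com/duplicates/page-2": True,
    }
    action, remaining = next_step(status)
    print(action)
    print(remaining)
```

The design choice is simply to treat the robots.txt block as the last step, taken only once no URL in the folder remains indexed.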
-
Yes, both. If a page is linked to from another site, Google will spider that other site, find your link, and the page could still get indexed despite the robots.txt block if there is no noindex on it.
-
Indeed try both.
Irving +1
-
Both. Block the lowest-quality, lowest-traffic pages with noindex, and block the folder in robots.txt.
Related Questions
-
HTTPS Update - 1 Category Dropped Out of Google
Hi, We updated to HTTPS last week. We haven't had any major issues and most categories on the site are OK, apart from one. We have completely dropped out of the Google rankings for our Dollies section: https://www.key.co.uk/en/key/dollies-load-movers-door-skates We've always ranked well on the first page for a number of keywords; now we're out of the top 100. I am trying to hunt for an issue but I can't seem to find one. Can anyone advise? Thanks 🙂
Intermediate & Advanced SEO | | BeckyKey0 -
Best practice for disallowing URLS with Robots.txt
Hi Everybody, We are currently trying to tidy up the crawling errors which are appearing when we crawl the site. On first viewing, we were very worried to say the least: 17000+. But after looking closer at the report, we found the majority of these errors were being caused by bad URLs featuring:
Currency - For example: "directory/currency/switch/currency/GBP/uenc/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL3dvcmt3ZWFyP3ByaWNlPTUwLSZzdGFuZGFyZHM9NzEx/"
Color - For example: ?color=91
Price - For example: "?price=650-700"
Order - For example: ?dir=desc&order=most_popular
Page - For example: "?p=1&standards=704"
Login - For example: "customer/account/login/referer/aHR0cDovL2NlbnR1cnlzYWZldHkuY29tL2NhdGFsb2cvcHJvZHVjdC92aWV3L2lkLzQ1ODczLyNyZXZpZXctZm9ybQ,,/"
My question now is as a novice of working with Robots.txt, what would be the best practice for disallowing URLs featuring these from being crawled? Any advice would be appreciated!
Intermediate & Advanced SEO | | centurysafety0 -
Robots.txt question
I noticed something weird in Google's robots.txt tester. I have this line in my robots.txt: Disallow: display= Whatever URL I give it to test, it says the URL is blocked and points to this line. The line is meant to block pages like http://www.abc.com/lamps/floorlamps?display=table but if I test http://www.abc.com/lamps/floorlamps or any other page, it shows as blocked due to Disallow: display= Am I doing something wrong, or is Google just acting strange? I don't think pages without display= are blocked in reality.
Intermediate & Advanced SEO | | rbai0 -
NoIndexing Massive Pages all at once: Good or bad?
If you have a site with a few thousand high-quality, authoritative pages and tens of thousands of thin search-results and tag pages, and you noindex,follow the thin pages all at once, will Google see this as a good or a bad thing? I am only trying to do what Google's guidelines suggest, but since I have so many pages indexed on my site, will throwing the noindex tag on ~80% of them (the thin content pages) negatively impact my site?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
What About Google Panda Update 22?
Maybe I haven't found the threads or whatever but I haven't seen lots of posts about the latest Google Panda update from November 21-22 on SEOmoz. Panda 22 is not even listed here: http://www.seomoz.org/google-algorithm-change Until November 21st, Google killed 3 of 5 websites I own through their Panda updates (never got hit by Penguin updates as I got only original content), accounting for about 25% of my income. Fortunately, the 2 remaining websites gained more traffic throughout the summer of 2012 so my income almost got back to 100% even though I got the "Unnatural Links" warning in Google Webmaster Tools in July. Since then, I did a huge link cleanup and according to the Link Detox Tool (from another SEO service), the number of "toxic links" went from about 350 to 50. Back link reports is as follow: 8% (52) Toxic Links; 57% (382) Suspicious Links; 35% (235) Healthy Links; Out of the 382 suspicious, most of them are coming from the same domain and they are all directories to which my website has been submitted automatically (not using any specific keyword anchor). On the opposite, healthy links are coming from different domains so I like to think they have a stronger impact than suspicious links. That said, my two remaining websites were still doing well until November 21 where it got hit by the Panda. Now traffic has dropped by 55% and income has dropped by 75% (yes I'll have to look for a job within a year if I don't fix this). (I want to add that none of my websites are "thin websites". One has over 1500 pages of content and the other has about 500 pages. All websites have content added 3 to 5 times a week.) What I don't get is that all my "money keywords" are still ranked in the top 10 results on Google according to multiple tools / services I use, yet the impressions dropped from 50% to 75% for those keywords?!? I have a feeling that this time it's not only a drop in ranking. There's a drop in impressions caused by something else. 
Is it caused by an emphasis on local search? Are they showing more ads and fewer organic results? But here's the "funny part": for the last 5 years, I was never able to advertise my website on Google AdWords. Each time, I got a quality score of about 4/10, only to see it drop to 1/10 within a few hours of launching the campaign. On November 22nd, I built new PPC campaigns based on the exact same PPC campaigns I had in the past (same keywords, same ads, same landing pages). Guess what? Now the quality score is between 7/10 and 10/10 (most of them have 10/10) for the exact same PPC campaign! What a "coincidence", huh?
Intermediate & Advanced SEO | | sbrault740 -
301 redirect or Robots.txt on an interstitial page
Hey guys, I have an affiliate tracking system that works like this: an affiliate puts a certain code on his site, for example www.domain.com/track/aff_id. This URL leads to a page where the hit is counted and analysed, and which then 302 redirects to my sales page with the affiliate's ID in the URL: www.mysalespage.com/?=aff_id. However, we've noticed recently that one affiliate seems to be ranking for our own name, and the URL Google indexed was his tracking URL (domain.com/track/aff_id). This is strange because there is absolutely nothing on that page; it's just an interstitial page so that our stats tracking software can properly filter hits. To stop the affiliate's URL from showing up in the SERPs, I've come up with 2 solutions: 1 - Change the redirect on his track page to a 301 redirect. 2 - Change our robots.txt to block all domain.com/track/ pages from being indexed. My question is: if I 301 redirect instead of 302, will I keep the affiliates from outranking me for my own name AND pass on link juice, or should I simply block Google from crawling the interstitial tracking pages?
Intermediate & Advanced SEO | | CrakJason0 -
Penguin Update Issues.. What would you recommend?
Hi, We've been pretty badly hit by this Penguin update. Site traffic is down 40-50%. We suspect it's for a couple of reasons:
1) Google is saying we have duplicate content. E.g. for a given category we will have 4-5 pages of content (products), so it's saying pagenum=2, pagenum=3 etc. are duplicate pages. We've implemented rel=canonical so that pagenum=2 points to the original category, e.g. http://mydomain/widgets.aspx We've even specified pagenum as a URL parameter that paginates. Google still hasn't picked up these changes. How long does it take? It's been about a week.
2) They're saying we have soft 404 errors. E.g. when we remove a category or product, we point users to a "category or page not found" page. Is it best to block Googlebot from crawling these pages by specifying them in robots.txt? We really don't care about these category or product pages. How best to handle this?
3) There are some bad directories and crawlers that have crawled our website but have put up incorrect links, so we've got something like 1700 "product not found" errors. I'm sure that's taking up a lot of crawling time. How do we tell Google not to bother with these links coming from specific sources, e.g. ignore all links coming from xxx.com?
Any help will be much appreciated as this is killing our business. Jay
Intermediate & Advanced SEO | | ConservationM0 -
Article Submissions - Still Worth it After Panda Update?
Are article submissions still relevant after the Panda update? Many of these sites (e.g. ezinearticles) are still suffering from the Panda update.
Intermediate & Advanced SEO | | qlkasdjfw0