Most Painless way of getting Duff Pages out of SE's Index
-
Hi,
I've had a few issues that have been caused by our developers on our website. Basically we have a pretty complex method of automatically generating URL's and web pages on our website, and they have stuffed up the URL's at some point and managed to get 10's of thousands of duff URL's and pages indexed by the search engines.
I've now got to get these pages out of the SE's indexes as painlessly as possible as I think they are causing a Panda penalty.
All these URL's have an addition directory level in them called "home" which should not be there, so I have:
www.mysite.com/home/page123 instead of the correct URL www.mysite.com/page123
All these are totally duff URL's with no links going to them, so I'm gaining nothing by 301 redirects, so I was wondering if there was a more painless less risky way of getting them all out the indexes (IE after the stuff up by our developers in the first place I'm wary of letting them loose on 301 redirects incase they cause another issue!)
Thanks
-
If all /home directories should be removed, and they all correlate directly to the same URL without the /home then the best means to correct the issue would be a single regex expression to redirect the URLs.
Once you add the redirect Google's index should be cleaned up in about 30 days.
-
Add robots.txt to stop Google crawling the subdirectories you don't want shown such as /home
More information is available on http://www.seomoz.org/learn-seo/robotstxt
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google indexed wrong pages of my website.
When I google site:www.ayurjeewan.com, after 8 pages, google shows Slider and shop pages. Which I don't want to be indexed. How can I get rid of these pages?
Intermediate & Advanced SEO | | bondhoward0 -
Site Structure: How do I deal with a great user experience that's not the best for Google's spiders?
We have ~3,000 photos that have all been tagged. We have a wonderful AJAXy interface for users where they can toggle all of these tags to find the exact set of photos they're looking for very quickly. We've also optimized a site structure for Google's benefit that gives each category a page. Each category page links to applicable album pages. Each album page links to individual photo pages. All pages have a good chunk of unique text. Now, for Google, the domain.com/photos index page should be a directory of sorts that links to each category page. Alternatively, the user would probably prefer the AJAXy interface. What is the best way to execute this?
Intermediate & Advanced SEO | | tatermarketing0 -
What's the best way to check Google search results for all pages NOT linking to a domain?
I need to do a bit of link reclamation for some brand terms. From the little bit of searching I've done, there appear to be several thousand pages that meet the criteria, but I can already tell it's going to be impossible or extremely inefficient to save them all manually. Ideally, I need an exported list of all the pages mentioning brand terms not linking to my domain, and then I'll import them into BuzzStream for a link campaign. Anybody have any ideas about how to do that? Thanks! Jon
Intermediate & Advanced SEO | | JonMorrow0 -
Urgent Site Migration Help: 301 redirect from legacy to new if legacy pages are NOT indexed but have links and domain/page authority of 50+?
Sorry for the long title, but that's the whole question. Notes: New site is on same domain but URLs will change because URL structure was horrible Old site has awful SEO. Like real bad. Canonical tags point to dev. subdomain (which is still accessible and has robots.txt, so the end result is old site IS NOT INDEXED by Google) Old site has links and domain/page authority north of 50. I suspect some shady links but there have to be good links as well My guess is that since that are likely incoming links that are legitimate, I should still attempt to use 301s to the versions of the pages on the new site (note: the content on the new site will be different, but in general it'll be about the same thing as the old page, just much improved and more relevant). So yeah, I guess that's it. Even thought the old site's pages are not indexed, if the new site is set up properly, the 301s won't pass along the 'non-indexed' status, correct? Thanks in advance for any quick answers!
Intermediate & Advanced SEO | | JDMcNamara0 -
Adding Orphaned Pages to the Google Index
Hey folks, How do you think Google will treat adding 300K orphaned pages to a 4.5 million page site. The URLs would resolve but there would be no on site navigation to those pages, Google would only know about them through sitemap.xmls. These pages are super low competition. The plot thickens, what we are really after is to get 150k real pages back on the site, these pages do have crawlable paths on the site but in order to do that (for technical reasons) we need to push these other 300k orphaned pages live (it's an all or nothing deal) a) Do you think Google will have a problem with this or just decide to not index some or most these pages since they are orphaned. b) If these pages will just fall out of the index or not get included, and have no chance of ever accumulating PR anyway since they are not linked to, would it make sense to just noindex them? c) Should we not submit sitemap.xml files at all, and take our 150k and just ignore these 300k and hope Google ignores them as well since they are orhpaned? d) If Google is OK with this maybe we should submit the sitemap.xmls and keep an eye on the pages, maybe they will rank and bring us a bit of traffic, but we don't want to do that if it could be an issue with Google. Thanks for your opinions and if you have any hard evidence either way especially thanks for that info. 😉
Intermediate & Advanced SEO | | irvingw0 -
How to do a 301 redirect for url's with this structure?
In an effort to clean up my url's I'm trying to shorten them by using a 301 redirect in my .htaccess file. How would I set up a rule to grab all urls with a specific structure to a new shorter url examples: http://www.yakangler.com/articles/reviews/other-reviews/item/article-title http://www.yakangler.com/reviews/article-title So in the example above dynamically redirect all url's with /articles/reviews/other-reviews/item/ in it to /reviews/ so http://www.yakangler.com/articles/reviews/boat-reviews/item/1550-review-nucanoe-frontier http://www.yakangler.com/articles/reviews/other-reviews/item/1551-review-spyderco-salt http://www.yakangler.com/articles/reviews/fishing-gear-reviews/item/1524-slayer-inc-sinister-swim-tail would be... http://www.yakangler.com/reviews/1550-review-nucanoe-frontier http://www.yakangler.com/reviews/1551-review-spyderco-salt http://www.yakangler.com/reviews/1524-slayer-inc-sinister-swim-tail with one 301 redirect rule in my .htaccess file.
Intermediate & Advanced SEO | | mr_w0 -
Webmaster Index Page significant drop
Has anyone noticed a significant drop in indexed pages within their Google Webmaster Tools sitemap area? We went from 1300 to 83 from Friday June 23 to today June 25, 2012 and no errors are showing or warnings. Please let me know if anyone else is experiencing this and suggestions to fix this?
Intermediate & Advanced SEO | | datadirect0 -
There's a website I'm working with that has a .php extension. All the pages do. What's the best practice to remove the .php extension across all pages?
Client wishes to drop the .php extension on all their pages (they've got around 2k pages). I assured them that wasn't necessary. However, in the event that I do end up doing this what's the best practices way (and easiest way) to do this? This is also a WordPress site. Thanks.
Intermediate & Advanced SEO | | digisavvy0