New CMS - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS.
Over the last 10 years or so, we've used three different CMSs on our current domain. As expected, this has resulted in lots of URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel="canonical".
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts. However, according to Google Webmaster Tools' 'Not Found' report, there are literally over 100,000 additional URLs out there it's trying to find.
My question: is there an advantage to using robots.txt to stop search engines from crawling some of these older directories? Currently we allow everything, using only page-level robots meta tags to disallow where necessary.
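For context, a blanket block on the retired directories would look something like the following. The directory names and patterns here are purely made up; the real ones would come from whatever paths the old CMSs actually used:

```
User-agent: *
Disallow: /old-cms/
Disallow: /legacy/
Disallow: /*.asp$
```

Google and Bing both honor the `*` and `$` wildcards shown here, even though they aren't part of the original robots.txt standard.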
Thanks!
-
Great stuff. Thanks again for your advice; much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (webmaster tools, for Google and Bing, is probably the best place to check) and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
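For reference, the wildcard matching those old rules relied on can be sketched roughly like this. It's a simplified approximation of how Google matches `Disallow` patterns (prefix match, `*` for any characters, `$` to anchor the end), not a full implementation of the spec:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt Disallow rule covers a URL path.

    Simplified Google-style matching: '*' matches any run of
    characters, a trailing '$' anchors the rule to the end of the
    path; otherwise the rule is a plain prefix match.
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

def blocked_paths(paths, disallow_rules):
    """Return the subset of paths covered by any Disallow rule."""
    return [p for p in paths
            if any(rule_matches(r, p) for r in disallow_rules)]
```

Running your 'Not Found' export through something like this shows which of those dead URLs the old wildcards were (or weren't) actually covering.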
Any idea what kind of impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 all at once, that can sometimes raise red flags with Google. Some kind of mass removal may prevent problems caused by Google crawling thousands of Not Founds all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
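As a sketch of that two-step approach (the folder name here is hypothetical):

```
User-agent: *
Disallow: /old-site-archive/
```

With that rule live, the Remove URLs feature in Google Webmaster Tools can be pointed at the directory to request removal of everything under it, rather than waiting for robots.txt alone to take effect.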
-
Absolutely. Not Founds and no-content pages are a concern. Addressing them will help your rankings.
-
Thanks a lot! I should have been a little more specific, but my exact question is this: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or rankings?
Thanks!
-
That's a loaded question without knowing exactly what you're doing, but let me offer this advice: stop the bleeding with robots.txt. This is the easiest way to quickly resolve that many "Not Founds".
Then you can slowly pick away at the issue and figure out whether some of the "Not Founds" really have content and are simply pointing to the wrong place.
On a recent project we had over 200,000 additional URLs returning "Not Found". We stopped the bleeding, then slowly over the course of a month, spending a couple of hours a week, found another 5,000 pages of content that we redirected correctly and removed from the robots.txt block.
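For anyone facing the same grind, the batch-redirect step can be scripted. This is a rough sketch assuming Apache's mod_alias; the paths are placeholders, and the actual mapping still has to come from your crawl data:

```python
def redirect_rules(url_map):
    """Turn an old-path -> new-path mapping into Apache mod_alias
    'Redirect 301' directives, one per line, sorted for stable diffs."""
    return ["Redirect 301 {} {}".format(old, new)
            for old, new in sorted(url_map.items())]

# Example mapping (paths invented for illustration)
rules = redirect_rules({
    "/old-cms/about.html": "/about/",
    "/old-cms/contact.html": "/contact/",
})
```

Appending the generated lines to the site's Apache config (or an `.htaccess` file) keeps the redirect list reviewable and repeatable as you uncover more pages.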
Good luck.