New CMS system - 100,000 old URLs - use robots.txt to block?
-
Hello.
My website has recently switched to a new CMS system.
Over the last 10 years or so, we've used three different CMS platforms on our current domain. As you'd expect, this has left us with a lot of legacy URLs.
Up until this most recent iteration, we were unable to 301 redirect or use any page-level indexation techniques like rel='canonical'.
Using SEOmoz's tools and GWMT, I've been able to locate and redirect all pertinent, PageRank-bearing "older" URLs to their new counterparts; however, according to Google Webmaster Tools' 'Not Found' report, there are over 100,000 additional URLs out there that it's still trying to find.
My question is: is there an advantage to using robots.txt to stop search engines from crawling some of these older directories? Currently we allow everything, and only use page-level robots meta tags to disallow where necessary.
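For illustration, something along these lines, where the directory names are only placeholders and not our real paths:

```
User-agent: *
# Directories left over from the previous CMS platforms (made-up names):
Disallow: /old-cms-1/
Disallow: /old-cms-2/
# Google and Bing also honour * wildcards for patterns such as old query strings:
Disallow: /*?printview=
```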
Thanks!
-
Great stuff... thanks again for your advice, much appreciated!
-
It can be really tough to gauge the impact - it depends on how suddenly the 404s popped up, how many you're seeing (Webmaster Tools, for both Google and Bing, is probably the best place to check), and how that number compares to your overall index. In most cases, it's a temporary problem and the engines will sort it out and de-index the 404'ed pages.
I'd just make sure that all of these 404s are intentional and none are valuable pages or occurring because of issues with the new CMS itself. It's easy to overlook something when you're talking about 100K pages, and it could be more than just a big chunk of 404s.
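If it helps, one way to spot-check a sample is to export the 'Not Found' list from Webmaster Tools and run the URLs through a quick script. This is only a rough sketch in Python (the file name and sample size are placeholders, and it assumes the third-party requests library is installed):

```
import requests

def check_urls(path, limit=500):
    # Assumes a plain text file with one URL per line (placeholder name below).
    with open(path) as f:
        urls = [line.strip() for line in f if line.strip()]
    for url in urls[:limit]:
        try:
            # HEAD keeps it light; switch to requests.get if a server mishandles HEAD.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            status = resp.status_code
        except requests.RequestException as exc:
            status = "error: {}".format(exc)
        # Anything that is NOT a 404/410 deserves a closer look --
        # it may be a live page that should be kept or 301'd instead of blocked.
        if status not in (404, 410):
            print(status, url)

if __name__ == "__main__":
    check_urls("not_found_urls.txt")  # placeholder file name
```

Anything that shows up in that output is worth a manual look before it gets written off as a dead page.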
-
Thanks for the advice! The previous website did have a robots.txt file with a few wildcards declared. A lot of the URLs I'm seeing are NOT indexed anymore and haven't been for many years.
So, I think the 'stop the bleeding' method will work, and I'll just have to proceed with investigating and applying 301s as necessary.
Any idea what kind of an impact this is having on our rankings? I submitted a valid sitemap, crawl paths are good, and major 301s are in place. We've been hit particularly hard in Bing.
Thanks!
-
I've honestly had mixed luck with using Robots.txt to block pages that have already been indexed. It tends to be unreliable at a large scale (good for prevention, poor for cures). I endorsed @Optimize, though, because if Robots.txt is your only option, it can help "stop the bleeding". Sometimes, you use the best you have.
It's a bit trickier with 404s ("Not Found"). Technically, there's nothing wrong with having 404s (it's a perfectly valid signal for SEO), but if you create 100,000 of them all at once, that can sometimes raise red flags with Google. Some kind of mass removal may head off problems caused by Google crawling thousands of 'not found' pages all at once.
If these pages are isolated in a folder, then you can use Google Webmaster Tools to remove the entire folder (after you block it). This is MUCH faster than Robots.txt alone, but you need to make sure everything in the folder can be dumped out of the index.
-
Absolutely. 'Not found' pages and empty content are a concern. Steering the crawlers away from them will help your rankings.
-
Thanks a lot! I should have been a little more specific... but my exact question is: if I move the crawlers' attention away from these 'Not Found' pages, will that benefit the indexation of the now-valid pages? Are the 'Not Founds' really a concern? Will this help my indexation and/or ranking?
Thanks!
-
It's a loaded question without knowing exactly what you are doing... but let me offer this advice: stop the bleeding with robots.txt. It's the quickest way to deal with that many "not found" errors.
Then you can slowly pick away at the issue and figure out whether some of the "not founds" actually have content and are simply pointing visitors (and crawlers) to the wrong place.
On a recent project we had over 200,000 additional URLs reported as "not found". We stopped the bleeding, then over the course of a month, spending a couple of hours a week, found another 5,000 pages of real content that we redirected correctly and removed from the robots.txt block.
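If you're on Apache, an individual rule of that kind might look roughly like this (the paths below are placeholders for illustration only; IIS and nginx have their own equivalents):

```
# Sketch only -- all paths here are placeholders, not real URLs.
# One-to-one redirect for an old page that still has value (mod_alias):
Redirect 301 /old-cms/widgets/blue-widget.html /products/widgets/blue-widget/

# Or a pattern-based rule when an old directory maps cleanly onto a new one:
RedirectMatch 301 ^/old-cms/articles/(.*)$ /blog/$1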
Good luck.