Best way to handle indexed pages you don't want indexed
-
We've had a lot of pages indexed by google which we didn't want indexed. They relate to a ajax category filter module that works ok for front end customers but under the bonnet google has been following all of the links.
I've put a rule in the robots.txt file to stop google from following any dynamic pages (with a ?) and also any ajax pages but the pages are still indexed on google.
At the moment there is over 5000 pages which have been indexed which I don't want on there and I'm worried is causing issues with my rankings.
Would a redirect rule work or could someone offer any advice?
-
Gavin Since you have added the noindex in the pages, the best way is to let Google crawl those pages, see the noindex and remove them. The other option is to keep everything as is and request these parameter pages via your Google Webmaster Console. Option 1: You never know how long it takes Option 2: This should happen relatively fast I would therefore suggest keeping everything as is and doing a removal request.
-
Right... We think we've been able to get the code noindex code into the dodgy pages. The only way we could think of doing it without breaking the user interface was to put this rule into the PHP.
if(!empty($_SERVER['HTTP_X_REQUESTED_WITH']) && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest')
{normal code
}
else
{echo '';
echo '';
echo '';
echo '';
echo '';
echo '404';
echo '';
echo '';
}Its rendering ok for us front end, if anyone would like to test... I'm just hopeful it would work for google?
http://www.outdoormegastore.co.uk/cycling/cycling-clothing/protective-clothing.html?ajax=1
One thing I am not sure about is how google is going to revisit the said pages. I have put in various rules to the robots.txt files as well as the url parameter handling in webmaster tools to prevent any future pages from being followed... Would these rules need to be removed?
-
The AJAX URLs are used by the site, though, right (for visitors)? If you 404 them, you may be breaking the functionality and not just impacting Google.
Another problem is that, if these pages are no longer crawlable, and you add a page-level directive (whether it's a 404, 301, canonical, NOINDEX, etc.), Google won't process those new instructions. So, they could get stuck in the index. If that's the case, ti may actually be more effective to block the "ajax=" parameter with parameter handling in Google Webmaster Tools (there's a similar option in Bing).
If you know the path is cut and this isn't a recurrent problem, that could be the fastest short-term solution. You do need to monitor, though, as they can re-enter the index later.
-
Gavin, that's a more generic response. In this scenario, unless you can make a 404 happen, it won't work and therefore is not applicable. Noindex and / or the canonical tag are the choices and I would try and get those going if possible.
-
Thanks for all of the replies... My best option seems to be the meta noindex rule but the nature of the pages that are getting indexed are just one long ajax string with no access to the header are. I hope I have already 'prevented' google from following the links in the future by adding the rules to robots.txt but I'm now desperate to clean up (cure) the existing ones.
My next thought would be to put a rule in htaccess and redirect anything with ajax in the url to a 404 page?
I'm worried that this may have even worse side effects with rankings but its based on this article that google publish: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=59819
"To remove a page or image, you must do one of the following:
- Make sure the content is no longer live on the web. Requests for the page must return an HTTP 404 (not found) or 410 status code
What would your thoughts be on this?
-
Definitely review George's comment as you need to figure out why they're being crawled. As Andrea said, any solution takes time, I'm sorry to say. Robots.txt is not a good solution for getting pages removed that are already indexed, especially in bulk. It's better at prevention than cure.
META NOINDEX can be effective, or you could rel=canonical these pages to the appropriate non-AJAX URL - not sure exactly how the structure is set up. Those are probably the two fastest and more powerful approaches. Google parameter handling (in Webmaster Tools) is another option, but it's a bit unpredictable whether they honor it and how quickly.
You can only do mass removal if everything is in a folder, if I recall. There's no way to bulk remove unless all of the pages are structurally under one root URL.
-
I'm not sure if you're aware or not, but I think I know why Google is indexing these pages.
Right now, you are outputting URLs into your source code of your page in the form of a JavaScript function call similar to the following:
I believe this is because your page (and this function call) is programmatically created. Instead of outputting the whole URL to the page, you could output only what needs to be there.
For example:
Then change the signature of the JavaScript function so that it accepts this new input and builds the URL from your inputs:
function initSlider(price, low, high, category, subcategory, product, store, ajax, ?) {
// build URL
var URL = 'http://www.outdoormegastore.co.uk/' + category + '/' + subcategory + '/' + product + '.html?_' + store + '&' + ajax;
// continue...
}
Right now, because that URL is being outputted to the page, I think Google sees it as a URL it should follow and index. If you build this URL with the function in an external JavaScript file, I don't think it will be indexed.
Your developer(s) should know what I'm talking about.
Hope this helps!
-
If they are already indexed, it's going to take time for Google to recrawl, read the tag and get them to fall out, so patience will be key. It's not a quick thing to undo.
If the pages are all in one location, you can add a disallow robots/text to Webmaster Tools command to prevent that entire folder from being indexed, but again, it's already done so you are going to have to wait for all those pages to fall out.
-
Thanks for the quick reply! I'm desperate to get these removed as soon as possible now. I've got webmaster tools access but requesting over 5,000 pages to be removed one by one will take too long. You can't do page removal in bulk can you?
I'm going to work on the noindex option
-
OMG, that does not look good. I completely understand. The best way in my opinion would be to add a noindex meta tag on these pages and let Google crawl them. Once they re-index them with the noindex, that should take care of the problem. However, be careful since you want to make sure that noindex tag does not appear on your real pages, just the AJAX ones.
Another option might be to consider the canonical tag, but then technically these pages are not duplicate pages, they just should not exist. Are you verified and using the Google Webmaster Console ? If yes, see if you can get some of these pages excluded via the URL removal tool. The best way is to add the noindex tag in my opinion.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
'duplicate content' on several different pages
Hi, I've a website with 6 pages identified as 'duplicate content' because they are very similar. This pages looks similar because are the same but it show some pictures, a few, about the product category that's why every page look alike each to each other but they are not 'exactly' the same. So, it's any way to indicate to Google that the content is not duplicated? I guess it's been marked as duplicate because the code is 90% or more the same on 6 pages. I've been reviewing the 'canonical' method but I think is not appropriated here as the content is not the same. Any advice (that is not add more content)?
Technical SEO | | jcobo0 -
Pages Indexed Not Changing
I have several sites that I do SEO for that are having a common problem. I have submitted xml sitemaps to Google for each site, and as new pages are added to the site, they are added to the xml sitemap. To make sure new pages are being indexed, I check the number of pages that have been indexed vs. the number of pages submitted by the xml sitemap every week. For weeks now, the number of pages submitted has increased, but the number of pages actually indexed has not changed. I have done searches on Google for the new pages and they are always added to the index, but the number of indexed pages is still not changing. My initial thought was as new pages are added to the index, old ones are being dropped. But I can't find evidence of that, or understand why that would be the case. Any ideas on why this is happening? Or am I worrying about something that I shouldn't even be concerned with since new pages are being indexed?
Technical SEO | | ang1 -
Inconsistent page titles in SERP's
I encountered a strange phenomenon lately and I’d like to hear if you have any idea what’s causing it. For the past couple of weeks I’ve seen some our Google rankings getting unstable. While looking for a cause, I found that for some pages, Google results display another page title than the actual meta title of the page. Examples http://www.atexopleiding.nl Meta title: Atex cursus opleider met ruim 40 jaar ervaring - Atexopleiding.nl Title in SERP: Atexopleiding.nl: Atex cursus opleider met ruim 40 jaar ervaring http://www.reedbusinessopleidingen.nl/opleidingen/veiligheid/veiligheidskunde Meta title: Opleiding Veiligheidskunde, MBO & HBO - Reed Business Opleidingen Title in SERP: Veiligheidskunde - Reed Business Opleidingen http://www.pbna.com/vca-examens/ Meta title: Behaal uw VCA diploma bij de grootste van Nederland - PBNA Title in SERP: VCA Examens – PBNA I’ve looked in the source code, fetched some pages as Googlebot in WMT, but the title shown in the SERP doesn’t even exist in the source code. Now I suspect this might have something to do with the “cookiewall” implemented on our sites. Here’s why: Cookiewall was implemented end of January The problem didn’t exist until recently, though I can’t pinpoint an exact date. Problem exists on both rbo.nl, atexopleiding.nl & pbna.com, the latter running on Silverstripe CMS instead of WP. This rules out CMS specific causes. The image preview in the SERPS of many pages show the cookie alert overlay However, I’m not able to technically prove that the cookiescript causes this and I’d like to rule out other any obvious causes before I "blame it on the cookies" :). What do you think?
Technical SEO | | RBO0 -
The number of pages indexed on Bing DROPPED significantly.
I haven't signed in to bing webmaster tool for a while. and I found that Bing is not indexing my site properly all of a sudden. IT DROPPED SIGNIFICANTLY Any idea why it is behaving this way? (please check the attachment) INg1o.png
Technical SEO | | joony20080 -
Which is The Best Way to Handle Query Parameters?
Hi mozzers, I would like to know the best way to handle query parameters. Say my site is example.com. Here are two scenarios. Scenario #1: Duplicate content example.com/category?page=1
Technical SEO | | jombay
example.com/category?order=updated_at+DESC
example.com/category
example.com/category?page=1&sr=blog-header All have the same content. Scenario #2: Pagination example.com/category?page=1
example.com/category?page=2 and so on. What is the best way to solve both? Do I need to use Rel=next and Rel=prev or is it better to use Google Webmaster tools parameter handling? Right now I am concerned about Google traffic only. For solving the duplicate content issue, do we need to use canonical tags on each such URL's? I am not using WordPress. My site is built on Ruby on Rails platform. Thanks!0 -
My Domain Authority is high but don't rank in serps
So i'm a beginner/intermediate SEO and uptil about 3 weeks ago i enjoyed Top 3 rankings for all my keywords(VIrtual Assistant,Virtual Assistants, Virtual Personal Assistant,Virtual Personal Assistants and so on) for my site www.247VirtualAssistant.com. All of a sudden i dropped in rankings and can't figure out why. I ran a link analysis and nothing looks like it changed, in fact i still command much higher domain authority than my competition, but i'm stuck on the bottom of the 2nd page. I can't tell if i'm being penalized, if the other sites all of sudden just outperformed me or something else is happening here. I've also noticed a lot of "dancing" in my serps, I've been in 2nd last position on the 2nd page, then 1st of the third page, then last on the 2nd page and so on. Can someone please help me make sense of this?? Thanks! Thomas, a very confused an desperate website owner
Technical SEO | | Shajan0 -
What is the best way to optimize a page for a magazine
Hi i have a serious problem with a website that i am building http://www.cheapflightsgatwick.com/ with reference to letting the search engines know what the magazine is about. I am building a holiday magazine which will focus on holiday news, cheap deals and holiday reviews. I am wanting the home page to feature for the following keywords holiday news, holiday magazine, holiday ideas, best holiday deals, but the problem i have is, i have tried putting an introduction on the home page but it looks out of place, so what is the best way for me to let google know about what the site is about and to get it ranking well in the search engines any help and advice would be great
Technical SEO | | ClaireH-1848860 -
If you only want your home page to rank, can you use rel="canonical" on all your other pages?
If you have a lot of pages with 1 or 2 inbound links, what would be the effect of using rel="canonical" to point all those pages to the home page? Would it boost the rankings of the home page? As I understand it, your long-tail keyword traffic would start landing on the home page instead of finding what they were looking for. That would be bad, but might be worth it.
Technical SEO | | watchcases0