Dozens of duplicate homepages indexed, then blocked: how do we remove them from Google's cache?
-
Hi community,
Due to a WordPress plugin issue, many copies of our homepage were indexed in Google under strange, anonymous URLs. We blocked them afterwards, but they still appear in the SERPs. I wonder whether they are causing trouble for our website, especially since they are exact copies of our homepage. How do we remove these pages from Google's cache? Is that the right approach?
Thanks
-
Hi Nigel,
Thanks for the suggestion. I'm going to use the "Remove URLs" tool in GSC. The URLs were created by a bug in the Yoast SEO plugin. Very unfortunate; we are paying for a mistake that wasn't ours.
Does removing them from the SERPs also remove them from Google's index? Or will Google still consider them and just stop showing them? My concern is: we have blocked them anyway, but will they distract from our ranking efforts as long as they sit in the results and in the cache?
Thanks
-
Thanks!
I agree - I have just done a similar clean-up by:
1. Don't let them be created
2. Redirect all previous versions!
One site I just worked on had 8 versions of the home page! lol
http
https
/index.php
/index.php/
A mess!
We stopped them all being created and 301'd all versions just in case they were indexed anywhere or linked externally.
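For anyone doing the same consolidation, a minimal `.htaccess` sketch (Apache with mod_rewrite; `https://www.example.com/` is a placeholder for your own preferred version):

```apache
RewriteEngine On

# Force https and the www host in a single 301 hop
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

# Collapse direct requests for /index.php and /index.php/ back to the root.
# Checking THE_REQUEST avoids redirecting WordPress's own internal rewrites.
RewriteCond %{THE_REQUEST} \s/index\.php/?\s [NC]
RewriteRule ^index\.php/?$ https://www.example.com/ [R=301,L]
```

This is a sketch, not a drop-in: test redirect rules on a staging copy first, since a wrong pattern here can loop the whole site.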
Cheers
-
It's certainly true that in SEO, just as in medicine and any number of other fields, prevention beats cleanup. If your website doesn't take its medicine, you get problems like this one.
I think your advice here was really good.
-
Good solid advice
They can be created in any number of ways, but it's normally simple enough to specify the preferred URL on the server and then redirect any variations in .htaccess, such as those with www (if the non-www version is preferred), those with a trailing slash at the end, etc.
A self-referencing canonical on every page will sort out any other duplicates.
As for getting rid of them - the Search Console way is the quickest. If the URLs no longer exist after that, they won't be re-indexed unless they are linked from somewhere else; in that case they will 301 via .htaccess, so it shouldn't be a problem.
If you 410 them, you will lose any benefit from links pointing at those pages, and it's a bad experience for visitors. Always 301, never 410, if it is a version of the home page.
410s are fine for old pages you never want to see in the index again, but not for a home page version.
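In `.htaccess` terms, the 301-versus-410 distinction looks something like this (both paths are hypothetical examples, using mod_alias directives):

```apache
# A home page *version* keeps its link equity: 301 it to the canonical URL
Redirect 301 /index.php https://www.example.com/

# A genuinely dead page you never want indexed again can be served a 410
Redirect gone /old-promo-page/
```

Note that `Redirect` matches URL prefixes, so rules like these need care on a WordPress site where many paths pass through index.php.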
Regards
Nigel
-
It's likely that you don't have access to edit the code behind these weird plugin URLs. As such, normal techniques like a meta noindex tag in the HTML may be non-viable.
You could use the HTTP header (server-level stuff) to help you out. I'd advise adding two strong directives to the afflicted URLs through the HTTP header so that Google gets the message:
-
Use the X-Robots-Tag deployment of the noindex directive on the affected URLs, at the HTTP header (not the HTML) level. The linked page describes the normal HTML implementation, but also the X-Robots-Tag implementation, which is the one you need (scroll down a bit)
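As a sketch, the header can be set from `.htaccess` for a matching URL pattern (this assumes Apache 2.4 with mod_headers, and the `?p=` query pattern is purely hypothetical; substitute whatever pattern the plugin actually generated):

```apache
<IfModule mod_headers.c>
  # Send a noindex directive in the HTTP response header
  # for the rogue plugin-generated URLs only
  <If "%{QUERY_STRING} =~ /^p=\d+/">
    Header set X-Robots-Tag "noindex, nofollow"
  </If>
</IfModule>
```

Verify with `curl -I` on one of the affected URLs that the `X-Robots-Tag` header is actually being sent before relying on it.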
-
Serve status code 410 (gone) on the affected URLs
That should prompt Google to de-index those pages. Once they are de-indexed, you can use robots.txt to block Google from crawling those URLs in the future (which will stop the problem from happening again!)
It's important to de-index the URLs before you do any robots.txt blocking. If Google can't crawl the affected URLs, it can't see the information (in the HTTP header) telling it to de-index those pages
Once Google is blocked from both indexing and crawling these pages, it should stop caching them too
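Once the rogue URLs have actually dropped out of the index, a robots.txt rule along these lines would stop them being recrawled (again, the `?p=` pattern is only an illustration; match whatever the plugin really produced):

```
User-agent: *
Disallow: /*?p=
```

Remember the ordering: add this only after de-indexing is complete, for the reason above.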
Hope that helps
-
+1 for "Make sure that they are not created in the first place" haha
-
Hi again vtmoz!
1. Make sure that they are not created in the first place
2. Make sure that they are not in the sitemap
3. Go to Search Console and remove any you do not want - it will say temporary removal, but they will not come back if they are not in the site structure or the sitemap.
More:
https://support.google.com/webmasters/answer/1663419?hl=en
Note: Always self-canonicalize the home page to stop versions with UTM codes (created by Facebook, Twitter, etc.) from appearing in SERPs
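For reference, a self-referencing canonical is just a link tag in the head of the home page, pointing at its own clean URL (Yoast normally outputs this for you; the domain here is a placeholder):

```html
<!-- In the <head> of the home page -->
<link rel="canonical" href="https://www.example.com/" />
```

Any `?utm_source=...` variant that gets crawled then points Google back to the clean URL.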
Regards
Nigel