Problems with too many indexed pages
-
A client of ours has not been able to rank very well over the last few years. They are a big brand in our country, have more than 100 offline stores, and have plenty of inbound links.
Our main issue has been that they have too many indexed pages. Before we started, they had around 750,000 pages in the Google index. After a bit of work we got it down to 400,000-450,000. During our latest push we used the robots meta tag with "noindex, nofollow" on all pages we wanted out of the index, along with a canonical pointing to the correct URL. Nothing was done to robots.txt to block the crawlers from entering the pages we wanted out.
Our aim is to get it down to roughly 5000+ pages. They just passed 5000 products + 100 categories.
I added this about 10 days ago, but nothing has happened yet. Is there anything I can do to speed up the process of getting all the pages out of the index?
The page is vita.no if you want to have a look!
-
Great! Please let us know how it goes so we can all learn more about it.
Thanks!
-
Thanks for that! What you are saying makes sense, so I'm going to go ahead and give it a try.
-
"Google: Do Not No Index Pages With Rel Canonical Tags"
https://www.seroundtable.com/noindex-canonical-google-18274.html
This is still being debated by people and I'm not saying it is "definitely" your problem. But if you're trying to figure out why those noindexed pages aren't coming out of the index, this could be one thing to look into.
John Mueller is a Webmaster Trends Analyst for Google.
Good luck.
-
Isn't the whole point of using canonical to give Google a pointer to the page it was originally meant to be?
So if you have a category on shop.com/sub..
Using filters and/or pagination you then get:
shop.com/sub?p=1
shop.com/sub?color=blue.. and so on! Both of those pages then need a canonical, and we don't want either of them indexed, so by using both canonical and noindex we tell Google "don't index this page (noindex), here is the original version of it (canonical)".
Or did I misunderstand something?
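To make the setup I'm describing concrete, here is a minimal Python sketch of the rule as I understand it. The parameter names in VARIANT_PARAMS and both function names are just my assumptions for illustration, not how the shop platform actually does it.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

# Hypothetical set of query parameters that only create variants of a page.
VARIANT_PARAMS = {"p", "color"}


def canonical_for(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def is_variant(url: str) -> bool:
    """True if the URL carries at least one pagination/filter parameter."""
    query = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(name in VARIANT_PARAMS for name, _ in query)
```

So for https://shop.com/sub?p=1, is_variant() is true and canonical_for() gives https://shop.com/sub, which is the page that gets both the noindex and the canonical in our current setup.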
-
Hello Inevo,
Most of the time when this happens it's just because Google hasn't gotten around to recrawling the pages and updating their index after seeing the new robots meta tag. It can take several months for this to happen on a large site. Submit an XML sitemap and/or create an HTML sitemap that makes it easy for them to get to these pages if you need it to go faster.
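If it helps, a sitemap like that can be generated with a few lines of Python. This is only a rough sketch using the standard library; build_sitemap is a made-up helper name, and the URL list would have to come from your own crawl or database export.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Example: list the pages you want Google to recrawl.
    print(build_sitemap(["http://www.vita.no/duft?p=2"]))
```

Keep in mind that the sitemap protocol caps a single file at 50,000 URLs, so on a site this size the list would need to be split across several files.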
I had a look and saw some conflicting instructions that Google could be having a problem with.
The paginated version (e.g. http://www.vita.no/duft?p=2) of the page has a rel canonical tag pointing to the first page (e.g. http://www.vita.no/duft/). Yet it also has a noindex tag while the canonical page has an index tag. And each page has its own unique title (Side 2 ... Side 3 | ...). I would remove the rel canonical tag on the paginated pages since they probably don't have any PageRank worth giving to the canonical page. This way it is even clearer to Google that the canonical page is to be indexed and the others are not, instead of saying they are the same page. The same is true of filter pages: http://www.vita.no/gavesett/herre/filter/price-400-/ .
I don't know if that has anything to do with your issue of index bloat, but it's worth a try. I did find some paginated pages in the index.
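If you want to audit this across the whole site rather than spot-checking, something along these lines could flag the pages that send both signals. It is a sketch under my own assumptions: HeadSignals and has_conflict are names I made up, it only compares the canonical href as an exact string, and you would need to fetch the HTML yourself.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the robots meta directives and the rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Split "noindex, nofollow" into individual lowercase directives.
            self.robots.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def has_conflict(page_url, html):
    """True when a page is noindex but its canonical points to another URL."""
    signals = HeadSignals()
    signals.feed(html)
    points_elsewhere = (
        signals.canonical is not None and signals.canonical != page_url
    )
    return "noindex" in signals.robots and points_elsewhere
```

Run against http://www.vita.no/duft?p=2 as it is marked up today, this would report a conflict; after removing the canonical from the paginated pages it should not.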
There also appears to be about 520 blog tag pages indexed. I typically set those to be noindex,follow.
Also remove all paginated pages and any other page that you don't want indexed from your XML sitemaps if you haven't already.
At least for the filter pages, since /filter/ is its own directory, you can use the URL removal tool in GWT. It does have a directory-level removal feature. Of course there are only 75 of these indexed at this moment.
-
My advice would be to create a fresh sitemap and upload it to Google Webmaster Tools. I'm not sure about the timing, but I will second Donna: it will take time for the pages to get out of the Google index.
There is one hack that I used for one page on my website, but I'm not sure if it will work for 1000+ pages.
I actually removed a page on my website using Google's temporary removal request. It kicked the page out of the index for 90 days, and in the meantime I added the URL to the robots.txt file, so it was gone quickly and never returned to the Google listings.
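One caveat if you go the robots.txt route: a URL that is blocked there can't be crawled, so Google will not see any noindex tag on it. Before adding rules, you can test them locally with Python's built-in robots.txt parser. The rule below is only an example I made up, not the site's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /filter/ rule is only an illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gavesett/herre/filter/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With that example rule, the filter URL http://www.vita.no/gavesett/herre/filter/price-400-/ comes back as not crawlable, while http://www.vita.no/duft/ is still allowed.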
Hope this helps.
-
Hi Inevo,
I had a similar situation last year and am not aware of a faster way to get pages deindexed. You're feeding WMT an updated sitemap, right?
It took 8 months for the excess pages to get dropped off my client's site. I'll be listening to hear if anyone knows a faster way.