Problems with to many indexed pages
-
A client of our have not been able to rank very well the last few years. They are a big brand in our country, have more than 100+ offline stores and have plenty of inbound links.
Our main issue has been that they have to many indexed pages. Before we started we they had around 750.000 pages in the Google index. After a bit of work we got it down to 400-450.000. During our latest push we used the robots meta tag with "noindex, nofollow" on all pages we wanted to get out of the index, along with canonical to correct URL - nothing was done to robots.txt to block the crawlers from entering the pages we wanted out.
Our aim is to get it down to roughly 5000+ pages. They just passed 5000 products + 100 categories.
I added this about 10 days ago, but nothing has happened yet. Is there anything I can to do speed up the process of getting all the pages out of index?
The page is vita.no if you want to have a look!
-
Great! Please let us know how it goes so we can all learn more about it.
Thanks!
-
Thanks for that! What you are saying makes sense, so I'm going to go ahead and give it a try.
-
"Google: Do Not No Index Pages With Rel Canonical Tags"
https://www.seroundtable.com/noindex-canonical-google-18274.htmlThis is still being debated by people and I'm not saying it is "definitely" your problem. But if you're trying to figure out why those noindexed pages aren't coming out of the index this could be one thing to look into.
John Mueller (see screenshot below) is a Webmaster Trends Analyst for Google.
Good luck.
-
Isn't the whole point of using canonical to give Google a pointer of what page it is originally meant to be?
So if you have a category on shop.com/sub..
Using filter and/or pagenation you then get:
shop.com/sub?p=1
shop.com/sub?color=blue.. and so on! Both those pages then need canonical and neither do we want them index, so we by using both canonical and noindex tell Google to "don't index this page (noindex), here is the original version of it (canonical)".
Or did I misunderstand something?
-
Hello Inevo,
Most of the time when this happens it's just because Google hasn't gotten around to recrawling the pages and updating their index after seeing the new robots meta tag. It can take several months for this to happen on a large site. Submit an XML sitemap and/or create an HTML sitemap that makes it easy for them to get to these pages if you need it to go faster.
I had a look and see some conflicting instructions that Google could possibly be having a problem with.
The paginated version ( e.g. http://www.vita.no/duft?p=2 ) of the page has a rel canonical tag pointing to the first page (e.g. http://www.vita.no/duft/ ). Yet it also has a noindex tag while the canonical page has an index tag. And each page has its own unique title (Side 2 ... Side 3 | ...) . I would remove the rel canonical tag on the paginated pages since they probably don't have any pagerank worth giving to the canonical page. This way it is even more clear to Google that the canonical page is to be indexed, and the others are not to be - instead of saying they are the same page. The same is true of filter pages: http://www.vita.no/gavesett/herre/filter/price-400-/ .
I don't know if that has anything to do with your issue of index bloat, but it's worth a try. I did find some paginated pages in the index.
There also appears to be about 520 blog tag pages indexed. I typically set those to be noindex,follow.
Also remove all paginated pages and any other page that you don't want indexed from your XML sitemaps if you haven't already.
At least for the filter pages, since /filter/ is its own directory, you can use the URL removal tool in GWT. It does have a directory-level removal feature. Of course there are only 75 of these indexed at this moment.
-
My advice would be to include a fresh sitemap and upload it Google Webmaster tool. Not sure about time but I will second Donna, this will take time for the pages to get out of the Google Index.
There is one hack that I used for one page on my website but not sure if it will work for 1000+ pages.
I actually removed a page on my website using Google’s temporary removal request. It kicked the page out of the index for 90 days and in the mean time I added the link in the robots.txt file so it gone quickly and never returned back in the Google listing.
Hope this helps.
-
Hi lnevo,
I had a similar situation last year and am not aware of a faster way to get pages deindexed. You're feeding WMT an updated sitemap right?
It took 8 months for the excess pages to get dropped off my client's site. I'll be listening to hear if anyone knows a faster way.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Carwling and indexing problems
hi, i have noticed since my site was upgraded that google is taking a long time to publish my articles. before the upgrade google would publish the article straight away, but now it takes an average of around 4 days. the article i am talking about at the moment is here http://www.in2town.co.uk/celebrities-in-the-news/stuart-hall-has-his-prison-sentence-for-sex-crimes-doubled-to-30-months now i have a blog here on blogger and the article was picked up within six mins http://showbizgossipandnews.blogspot.co.uk/2013/07/stuart-hall-has-his-prison-sentence-for.html so i am just wondering what the problem is and what i need to solve this my problem is, my site is mostly a news site so it is no good to me if google is publishing new stories every four days, any help would be great.
Technical SEO | | ClaireH-1848860 -
What is meant by to many on page links
I have just done the report for my site http://www.in2town.co.uk and it says i have 246 on page links but i am not sure how come i have got that many. I know i have a large number of links and in the old days it says that you should keep the links under 100 but now with website speed and the net, people are saying this is no longer listened to. A report i read said that the links should not confuse the reader or put them off, so i am just wondering what your thoughts are on a site with over a 100 links on the home page and also if my site does have to many links what should i do about it. I cannot understand why it is showing 246 when i do not see that many on the page, any advice would be great
Technical SEO | | ClaireH-1848860 -
Page titles in browser not matching WP page title
I have an issue with a few page titles not matching the title I have In WordPress. I have 2 pages, blog & creative gallery, that show the homepage title, which is causing duplicate title errors. This has been going on for 5 weeks, so its not an a crawl issue. Any ideas what could cause this? To clarify, I have the page title set in WP, and I checked "Disable PSP title format on this page/post:"...but this page is still showing the homepage title. Is there an additional title setting for a page in WP?
Technical SEO | | Branden_S0 -
Product Pages Outranking Category Pages
Hi, We are noticing an issue where some product pages are outranking our relevant category pages for certain keywords. For a made up example, a "heavy duty widgets" product page might rank for the keyword phrase Heavy Duty Widgets, instead of our Heavy Duty Widgets category page appearing in the SERPs. We've noticed this happening primarily in cases where the name of the product page contains an at least partial match for the desired keyword phrase we want the category page to rank for. However, we've also found isolated cases where the specified keyword points to a completely irrelevent pages instead of the relevant category page. Has anyone encountered a similar issue before, or have any ideas as to what may cause this to happen? Let me know if more clarification of the question is needed. Thanks!
Technical SEO | | ShawnHerrick0 -
Index page
To the SEO experts, this may well seem a silly question, so I apologies in advance as I try not to ask questions that I probably know the answer for already, but clarity is my goal I have numerous sites ,as standard practice, through the .htaccess I will always set up non www to www, and redirect the index page to www.mysite.com. All straight forward, have never questioned this practice, always been advised its the ebst practice to avoid duplicate content. Now, today, I was looking at a CMS service for a customer for their website, the website is already built and its a static website, so the CMS integration was going to mean a full rewrite of the website. Speaking to a friend on another forum, he told me about a service called simple CMS, had a look, looks perfect for the customer ... Went to set it up on the clients site and here is the problem. For the CMS software to work, it MUST access the index page, because my index page is redirected to www.mysite.com , it wont work as it cant find the index page (obviously) I questioned this with the software company, they inform me that it must access the index page, I have explained that it wont be able to and why (cause I have my index page redirected to avoid duplicate content) To my astonishment, the person there told me that duplicate content is a huge no no with Google (that's not the astonishing part) but its not relevant to the index and non index page of a website. This goes against everything I thought I knew ... The person also reassured me that they have worked within the SEO area for 10 years. As I am a subscriber to SEO MOZ and no one here has anything to gain but offering advice, is this true ? Will it not be an issue for duplicate content to show both a index page and non index page ?, will search engines not view this as duplicate content ? Or is this SEO expert talking bull, which I suspect, but cannot be sure. Any advice would be greatly appreciated, it would make my life a lot easier for the customer to use this CMS software, but I would do it at the risk of tarnishing the work they and I have done on their ranking status Many thanks in advance John
Technical SEO | | Johnny4B0 -
132 pages reported as having Duplicate Page Content but I'm not sure where to go to fix the problems?
I am seeing “Duplicate Page Content” coming up in our
Technical SEO | | danatanseo
reports on SEOMOZ.org Here’s an example: http://www.ccisolutions.com/StoreFront/product/williams-sound-ppa-r35-e http://www.ccisolutions.com/StoreFront/product/aphex-230-master-voice-channel-processor http://www.ccisolutions.com/StoreFront/product/AT-AE4100.prod These three pages are for completely unrelated products.
They are returning “200” status codes, but are being identified as having
duplicate page content. It appears these are all going to the home page, but it’s
an odd version of the home page because there’s no title. I would understand if these pages 301-redirected to the home page if they were obsolete products, but it's not a 301-redirect. The referring page is
listed as: http://www.ccisolutions.com/StoreFront/category/cd-duplicators None of the 3 links in question appear anywhere on that page. It's puzzling. We have 132 of these. Can anyone help me figure out
why this is happening and how best to fix it? Thanks!0 -
Duplicate index.php/webpage pages on website. Help needed!
Hi Guys, Having a really frustrating problem with our website. It is a Joomla 1.7 site and we have some duplicate page issues. What is happening is that we have a webpage, lets say domain.com/webpage1 and then we also have domain.com/index.php/webpage1. Google is seeing these as duplicate pages and is causing me some real SEO problems. I have tried setting up a 301 redirect but it wn't let me redirect /index.php/webpage1 to /webpage1. Anyone have any ideas or plugins that can be used to sort this out? Any help will be really appreciated! Matt.
Technical SEO | | MatthewBarby0 -
Keywords Ranking Dropped from 1st Page to Above 5th Page
Hello, My site URL is http://bit.ly/161NeE and our site was ranked first page for over hundred keywords before March, 30. But all of a sudden, all the keywords on first page dropped to 5th or 6th page. When we search for our site name without ".com", the results appeared on first page are all from other sites. And our page can only be seen on 6th page. We think we have been penalized by Google. But we don't know the exact reason. Can anyone please help? Some extra info on our site: 1. We have been building links by posting blog, articles and PR. All the articles are unique, written by the writers we hire. It has been working fine all the time. We also varied the anchor text a lot. 2. We didn't make any change to the website. But one real problem with our site is that the server is very slow recently and when google crawl our website, many errors were found, mostly 503, 404 errors. And the total number of errors have reach to over 50,000. Do you think this might be a problem for Google not displaying us on first page? Our technicals are working hard to solve server problem. And if it is solved, shall our rankings be back? Please advise. Thanks.
Technical SEO | | Milanoocom0