Smaller Index
-
Hi guys,
We run a price comparison website with thousands of pages. Most of them are product pages with thin content: only price information and a product image, with no product details or customer reviews.
We are planning to focus on fewer product categories by adding reviews, details, better images, etc., and I would like to know whether I should keep the "not-so-good" products in the other categories or remove them from the index to improve the domain's average content quality.
Our index size is 200k pages, and we plan to focus on 10k pages at most.
Thanks for your help.
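For concreteness, here is a rough sketch of how we are thinking about triaging the pages. The field names and scoring thresholds below are hypothetical, just to illustrate the idea, not our actual data model:

```python
# Hypothetical triage: rank pages by content-quality signals and keep
# at most `max_keep` of them; everything else gets removed (404).
# The fields ('has_reviews', 'has_details', 'monthly_visits') are
# invented for illustration.
def triage(pages, max_keep=10_000):
    def quality(page):
        score = 0.0
        if page.get("has_reviews"):
            score += 2.0
        if page.get("has_details"):
            score += 1.0
        # Cap the traffic contribution so visits alone can't save a page.
        score += min(page.get("monthly_visits", 0) / 100.0, 3.0)
        return score

    ranked = sorted(pages, key=quality, reverse=True)
    keep, remove = [], []
    for page in ranked:
        if len(keep) < max_keep and quality(page) > 0:
            keep.append(page)
        else:
            remove.append(page)
    return keep, remove
```

A zero-score page is dropped even if the keep quota isn't full, so pure stubs never survive the cut.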
-
Hello Pedro,
I think you are making a very wise decision. If you have already been throttled by Panda this could be what you need to bring the site out of it. If not, this could be what you need to save you from a future update. In fact, Matt Cutts recently answered a question about this sort of thing:
http://www.youtube.com/watch?v=adocBLGQoYE
Note: The question is about "no results" pages, but he discusses similar scenarios as well.
These sorts of "stub pages" have been a thorn in Google's side for many years, and rest assured Google will continue to find ways of keeping them out of the index, including punishing the good content on sites that use them.
As Infant Raj mentioned below, be sure the removed URLs return a 404 status code in the HTTP header, which will ensure more prompt removal from the index than if they were to redirect or return a 200 status code. I'd ignore the first paragraph of his answer, though.
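For example, if the site runs on Apache (an assumption on my part), you could force the retired product URLs to return a 410 Gone, which signals permanent removal even more explicitly than a 404, with a mod_rewrite rule along these lines; the /products/old/ path is just a placeholder for whatever pattern matches your URLs:

```apacheconf
# Hypothetical .htaccess sketch: return 410 Gone for retired product pages.
# Adjust the pattern to match your actual URL structure.
RewriteEngine On
RewriteRule ^products/old/ - [G,L]

# Or, to serve a plain 404 instead:
# RewriteRule ^products/old/ - [R=404,L]
```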
-
If those less significant pages aren't entry pages for organic or referral traffic, you can remove them. Otherwise, it's not a good idea to remove those pages just to reduce the number of indexed pages.
If those pages are removed, make sure you add a custom 404 page to handle the 404 errors.
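As a sketch of how that custom 404 page might be wired up, again assuming Apache, with a placeholder file path:

```apacheconf
# Hypothetical: serve a branded 404 page. ErrorDocument with a local
# path preserves the 404 status code while showing friendlier content.
ErrorDocument 404 /errors/not-found.html
```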
Related Questions
-
Sitemap - 200 out of 2100 pages indexed
I submitted the .xml sitemap in Google Webmaster Tools and only 200 out of 2100 pages were indexed. Why is that, and what can I do?
Content Development | Madlena
-
One story stands out for not getting indexed?
All of our stories published today (20-Jun-2013) got indexed by Google except this one: http://coed.com/2013/06/20/heres-a-video-of-kate-upton-topless-on-a-horse/. Does anyone out there have any clue about that? Thanks in advance.
Content Development | COEDMediaGroup
-
In index but not in SERPs
Hi, I have a situation with a client site which is quite frustrating. Basically, most recent blog posts (by that I mean from the last couple of months) are failing to reach the SERPs (actually, one has, and a couple did in the early days, but it took months for them to arrive). Previously, the blog posts were indexed very quickly, often instantly. Now, I've checked WMT etc. and I've submitted each post manually, but still nothing. The sitemap is valid. However, pages (not blog posts) seem to be getting into the SERPs very quickly. Another complication: if I search site:www.domainname.com and set the date filter to a month, I can see some of the earlier blog posts in that result set. However, if I scrape a bit of unique content from one of those posts and search for it, nothing shows in the SERPs. And my Moz report tells me that the page is not to be found in the top 50 either (so I'm confident these pages are not in the SERPs). Any ideas why this would happen to just blog posts? Is it something to do with the parent blog landing page perhaps being too strong in the rankings? Any ideas appreciated. Thanks.
Content Development | KMUK
-
How to make new content indexed faster by Google
I would like to know what I can do. Normally it takes Google around 3 days to index my content. I have a sitemap and switched the crawl rate to the fastest setting in my Webmaster Tools. I also tried fetching my homepage as Googlebot and submitting it to the index with all linked pages, but even so my content takes around 3 days, if not more, to get indexed. I publish around 20 posts a week. My SEOmoz page authority is 48. Some of my competitors' sites seem to get their content indexed the same day. What else can be done?
Content Development | sebastiankoch
-
Our blog is indexed by Google web search but does not show up in Google Blogs. Why not, and how can I fix this?
We have a pretty simple blog: http://www.aviawest.com/blog. I've noticed our articles aren't showing up in Google Blogs, even though they appear in web search, and we submitted to http://blogsearch.google.com/ping a month ago. Anyone have some insight here?
Content Development | Aviawest
-
Index.html vs. default.html
Hi, I have a website that is about 7 years old. I had been using index.html as the home page. When I redesigned my site about 3 months ago, I changed it to default.html. The old index.html page was still on my server; I just realized my mistake. All of my links to the home page lead to the new default.html; however, people are still landing on the old index.html. I have changed the old index.html to the new design, but that means I have two "home" pages out there. Should I delete one? Should I leave them both there but use the canonical tag on one so it is not considered duplicate content? What is best for my rankings?
Content Development | bhsiao
-
Please help me stop Google indexing HTTPS pages on my WordPress site
I added SSL to my WordPress blog because that was the only way to get a dedicated IP address for my site at my host. Now I am noticing Google has started indexing posts as both http and https. Can someone please help me force Google not to index https, as I am sure it's like having duplicate content? All help is appreciated. So far I have added this to the top of my .htaccess file:
RewriteEngine on
Options +FollowSymlinks
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots.txt$ robots_ssl.txt
And added robots_ssl.txt with the following:
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: /
But https pages are still being indexed. Please help.
Content Development | rookie123
-
Index PDF files but redirect to site
Hi, One of our clients has tons of PDFs (manuals, etc.) and frequently gets good rankings for the direct PDF links. While we're happy about the PDFs attracting users' attention, we'd like to redirect users to the page where the original PDF link is published and avoid people opening the PDF directly. In short, we'd like to index the PDFs but show users the PDF link within a page; how should we proceed to do that? Thanks, GM
Content Development | gmellak