Why do I have so many extra indexed pages?
-
Stats-
Webmaster Tools Indexed Pages- 96,995
Site: Search- 97,800 Pages
Sitemap Submitted- 18,832
Sitemap Indexed- 9,746
I went through the search results through page 28 and every item it showed was correct. How do I figure out where these extra 80,000 items are coming from? I tried crawling the site with screaming frog awhile back but it locked because of so many urls. The site is a Magento site so there are a million urls, but I checked and all of the canonicals are setup properly. Where should I start looking?
-
It ended up being my search results. I was able to use the site operator to break it down.
-
To ensure Screaming Frog can handle the crawl you could chunk up the site and crawl it in parts, e.g. by each subdirectory. This can be done within the 'configuration' menu under 'include'. There's loads of tutorials online.
You can also use exclude to ensure it doesn't crawl unnecessary pages, images or scripts for example on wordpress I often block wp-content
Definitely sounds like a problem with query parameters being indexed though and its often good to ensure these are addressed in the search console.
-
1. Your first one is interesting. I actually haven't been in there before. There are 96 rows and everyone of them is set to let Googlebot Decide. Do you think I should change that up?
2. Not sure on how many images we have but it is a lot. Not we do not have an image sitemap.
I tried Screaming Frog and it couldn't handle it. After about 1.5 million urls it kept locking up. I just setup a free trial for Deep Crawl. It can only do 10,000 but I will see if it has anything worthwhile.
-
- Have you checked out the parameters settings in Google Search Console to find out how many pages Google has found for your site with the same parameters? That might give some insights on that side.
- How many images do you have across the site? Do you have image sitemaps for these kind of pages.
What I would advise + what you've already been trying is to get a full crawl by either using ScreamingFrog or Deepcrawl. This will provide you with better insights into how many pages a search engine can really find.
-
I wouldn't say it is doing fine. Before I started they launched a new site and messed up the 301 redirects. Traffic hasn't recovered yet.
For Robots I am using the Inchoo robots.txt-http://inchoo.net/ecommerce/ultimate-magento-robots-txt-file-examples/ maybe it is a parameters issue, but I can't figure out how to see all my indexed pages.
I tried doing a search for both inurl:= site:www.site.com and inurl:? site:www.site.com and nothing showed up unless I am missing something.
I can't figure out how to check if some of the canonicalized urls are indexed. The pages are all identical though.
We have less then 100 out of stock items.
-
As long as your organic traffic is doing fine I shouldn't be too concerned. That being said:
- Is your robots.txt or search console disallowing crawler access to parameters like '?count=' or '?color='?
- Is your robots.txt disallowing crawler access to urls that have a 'noindex' but were indexed before they got noindex?
- You can also take a couple of parameters from your site and test if any url's have been indexed, by using the 'inurl:parameter site:www.site.com' query.
- Are some of the canonicalized urls indexed anyway? This may indicate that page content is different enough for Google to index both versions.
- If there's a ton of articles that go in and out of stock and use dynamic ID's, Google may keep these in their index. Do out of stock articles return a 404 or are they kept alive?
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
SEO: How to change page content + shift its original content to other page at the same time?
Hello, I want to replace the content of one page of our website (already indexeed) and shift its original content to another page. How can I do this without problems like penalizations etc? Current situation: Page A
Intermediate & Advanced SEO | | daimpa
URL: example.com/formula-1
Content: ContentPageA Desired situation: Page A
URL: example.com/formula-1
Content: NEW CONTENT! Page B
URL: example.com/formula-1-news
Content: ContentPageA (The content that was in Page A!) Content of the two pages will be about the same argument (& same keyword) but non-duplicate. The new content in page A is more optimized for search engines. How long will it take for the page to rank better?0 -
HTTP Pages Indexed as HTTPS
My site used to be entirely HTTPS. I switched months ago so that all links in the pages that the public has access to are now http only. But I see now that when I do a site:www.qjamba.com, the results include many pages with https in the beginning (including the home page!), which is not what I want. I can redirect to http but that doesn't remove https from the indexing, right? How do I solve this problem? sample of results: Qjamba: Free Local and Online Coupons, coupon codes ... **<cite class="_Rm">https://www.qjamba.com/</cite>**One and Done savings. Printable coupons and coupon codes for thousands of local and online merchants. No signups, just click and save. Chicnova online coupons and shopping - Qjamba **<cite class="_Rm">https://www.qjamba.com/online-savings/Chicnova</cite>**Online Coupons and Shopping Savings for Chicnova. Coupon codes for online discounts on Apparel & Accessories products. Singlehop online coupons and shopping - Qjamba <cite class="_Rm">https://www.qjamba.com/online-savings/singlehop</cite>Online Coupons and Shopping Savings for Singlehop. Coupon codes for online discounts on Business & Industrial, Service products. Automotix online coupons and shopping - Qjamba <cite class="_Rm">https://www.qjamba.com/online-savings/automotix</cite>Online Coupons and Shopping Savings for Automotix. Coupon codes for online discounts on Vehicles & Parts products. Online Hockey Savings: Free Local Fast | Qjamba **<cite class="_Rm">www.qjamba.com/online-shopping/hockey</cite>**Find big online savings at popular and specialty stores on Hockey, and more. Hitcase online coupons and shopping - Qjamba **<cite class="_Rm">www.qjamba.com/online-savings/hitcase</cite>**Online Coupons and Shopping Savings for Hitcase. Coupon codes for online discounts on Electronics, Cameras & Optics products. Avanquest online coupons and shopping - Qjamba <cite class="_Rm">https://www.qjamba.com/online-savings/avanquest</cite>Online Coupons and Shopping Savings for Avanquest. Coupon codes for online discounts on Software products.
Intermediate & Advanced SEO | | friendoffood0 -
First Link on Page Still Only Link on Page?
Bruce Clay and others did some research and found that the first link on the page is the most important and what is accredited as the link. Any other links on the page mean nothing. Is this still true? And in that case, on an ecommerce site with category links in the top navigation (which is high on the code), is it not useful to link to categories in the content of the page? Because the category is already linked to on that page. Thank you, Tyler
Intermediate & Advanced SEO | | tylerfraser0 -
Whats the best way to remove search indexed pages on magento?
A new client ( aqmp.com.br/ )call me yestarday and she told me since they moved on magento they droped down more than US$ 20.000 in sales revenue ( monthly)... I´ve just checked the webmaster tool and I´ve just discovered the number of crawled pages went from 3.260 to 75.000 since magento started... magento is creating lots of pages with queries like search and filters. Example: http://aqmp.com.br/acessorios/lencos.html http://aqmp.com.br/acessorios/lencos.html?mode=grid http://aqmp.com.br/acessorios/lencos.html?dir=desc&order=name Add a instruction on robots.txt is the best way to remove unnecessary pages of the search engine?
Intermediate & Advanced SEO | | SeoMartin10 -
Optimize the category page or a content page?
Hi, We wish to start ranking on a specific keyword ("log house prices" in italian). We have two options on what pages we should optimize for this keyword: A long content page (1000+ words with images) Log houses category page, optimized for the keyword (we have 50+ houses on this page, together with a short price summary). I would think that we have better chances with ranking with option nr.2 , but then we can't use that page for ranking with a more short-tail keyword (like "log houses"). What would you suggest? Is there maybe a third option for this?
Intermediate & Advanced SEO | | JohanMattisson0 -
Hidden keywords - how many per page?
Hi All, We have a booking website we want to optimize for keywords we cannot really show, because some of our partners wouldn't want it. We figured we can put said keywords or close synonyms onpage in various places that are not too dangerous though (e.g. image names, image alt tags, URLs, etc.). The question is how much keywords we can target though? We know keyword stuffing is detrimental, and we will not start to create long URLs stuffed with keywords, same for H1 tags or page titles. So how many is acceptable/not counterproductive? Thanks!
Intermediate & Advanced SEO | | Philoups0 -
Duplicate content on index.htm page
How do I avoid duplicate content on the index.htm page . I need to redirect the spider from the /index.htm file to the main root of http://www.manandhisvan.com.au and hence avoid duplicate content. Does anyone know of a foolproof way of achieving this without me buggering up the complete site Cheers Freddy
Intermediate & Advanced SEO | | Fatfreddy0 -
Removing pages from index
Hello, I run an e-commerce website. I just realized that Google has "pagination" pages in the index which should not be there. In fact, I have no idea how they got there. For example, www.mydomain.com/category-name.asp?page=3434532
Intermediate & Advanced SEO | | AlexGop
There are hundreds of these pages in the index. There are no links to these pages on the website, so I am assuming someone is trying to ruin my rankings by linking to the pages that do not exist. The page content displays category information with no products. I realize that its a flaw in design, and I am working on fixing it (301 none existent pages). Meanwhile, I am not sure if I should request removal of these pages. If so, what is the best way to request bulk removal. Also, should I 301, 404 or 410 these pages? Any help would be appreciated. Thanks, Alex0