Crawling/indexing of near duplicate product pages
-
Hi,
Hope someone can help me out here. This is the current situation:
We sell stones, gravel, sand, pebbles, etc. for gardens. I will use one type of pebble, black beach pebbles, and its pages/URLs to illustrate my question.
- We have a 'top' product page for black beach pebbles that lists the available quantities (ranging from 20 kg to 1600 kg).
- There is no search volume for the individual quantities.
- The 'top' page does not link to the pages for the different quantities.
- The content on the quantity pages is not identical (different price and slightly different copy), but most of it is the same.
Current situation:
- Most pages for the different quantities (about 95%) do not have internal links.
- But the sitemap does contain all of these pages.
- Because the sitemap contains all these URLs, Google frequently crawls them (I checked the log files) and has indexed them.
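For anyone who wants to repeat the log file check, here is a minimal sketch of how it could be done. It assumes the server writes the common "combined" access log format and that quantity URLs end in something like `-20kg`; that URL pattern, the sample log lines, and the function names are made up for illustration. Matching on the user-agent string alone can also count bots that only pretend to be Googlebot.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent parts of a
# "combined" format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def is_quantity_url(path):
    # Hypothetical URL scheme: quantity pages end in "-<number>kg".
    return re.search(r"-\d+kg$", path) is not None

def googlebot_hits(lines, url_filter):
    """Count requests per path whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        if "Googlebot" in match.group("agent") and url_filter(match.group("path")):
            hits[match.group("path")] += 1
    return hits

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Invented sample lines, only to show the expected input shape.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Jan/2024:10:00:00 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2024:11:00:00 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2024:12:00:00 +0000] "GET /black-beach-pebbles-1600kg HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2024:13:00:00 +0000] "GET /black-beach-pebbles HTTP/1.1" 200 6144 "-" "{GOOGLEBOT}"',
    f'203.0.113.7 - - [10/Jan/2024:14:00:00 +0000] "GET /black-beach-pebbles-20kg HTTP/1.1" 200 5120 "-" "{BROWSER}"',
]

if __name__ == "__main__":
    for path, count in googlebot_hits(SAMPLE_LOG, is_quantity_url).most_common():
        print(count, path)
```

Running the same count again a few weeks after changing the sitemap would show whether the crawl frequency of the quantity URLs actually drops.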
Problems:
- Google spends its time crawling irrelevant pages. Our entire website is not that big, so these quantity URLs roughly double the total number of URLs.
- Having URLs in the sitemap that do not have an internal link is a problem in its own right.
- All these pages are indexed, so every sort of gravel/pebbles has near duplicates in the index.
My solution:
- Remove these URLs from the sitemap. That will probably stop Google from regularly crawling these pages.
- Put a canonical on the quantity pages pointing to the top product page. That will hopefully remove the irrelevant (no search volume) near duplicates from the index.
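The two steps could be sketched roughly like this. This is only an illustration under assumptions: the `example.com` URLs and the "-20kg" style suffix are hypothetical, and the real mapping from a quantity page to its top product page depends on how the shop builds its URLs.

```python
import re
from html import escape

# Hypothetical URL scheme: a quantity page is the top product URL
# followed by "-<number>kg".
QUANTITY_SUFFIX = re.compile(r"-\d+kg$")

def canonical_target(url):
    """Return the top product URL for a quantity URL, else the URL itself."""
    return QUANTITY_SUFFIX.sub("", url)

def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_target(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

def sitemap_urls(all_urls):
    """Keep only URLs that are their own canonical (drops quantity pages)."""
    return [url for url in all_urls if canonical_target(url) == url]

if __name__ == "__main__":
    urls = [
        "https://www.example.com/black-beach-pebbles",
        "https://www.example.com/black-beach-pebbles-20kg",
        "https://www.example.com/black-beach-pebbles-1600kg",
    ]
    print(canonical_tag(urls[1]))
    print(sitemap_urls(urls))
```

Deriving both the canonical tag and the sitemap from the same function keeps them consistent, so the sitemap never lists a URL that declares a different page as its canonical.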
My questions:
- To be able to see the canonical, Google will need to crawl these pages. Will Google still do that after they are removed from the sitemap?
- Do you agree that these pages are near duplicates and that it is best to remove them from the index?
- A few of these quantity pages (a few percent of them) do have internal links because of a sale campaign. So there will be some (not many) internal links pointing to non-canonical pages. Would that be a problem?
Thanks a lot in advance for your help!
Best!
-
Hi Joseph, thanks for your reply, really helpful! A 301 redirect is not really an option, because these quantity URLs are sometimes used for promotions and need to stay reachable. Therefore I guess canonicals are the second-best solution.
We will implement the solution I described and see what happens. Thanks again!
-
Hello there,
To answer your questions,
1. Google will still crawl your pages even if they are not in the sitemap, unless you disallow them in your robots.txt.
2. If the content is similar and the main difference is the quantity, couldn't you consolidate them into a single page that lists all the quantities your company sells, and then 301 redirect the other pages to the consolidated one?
3. It doesn't seem likely to cause any problems or hurt your SEO performance, but you could always change these links to point to the canonical URL.
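On point 1, a quick way to confirm that robots.txt does not block the quantity pages (a blocked page cannot be crawled, so its canonical would never be seen) is Python's standard `urllib.robotparser`. This is just a sketch: the robots.txt content and the `example.com` URLs below are invented, and the standard-library parser may not match Google's own rule handling in every edge case.

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Invented robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

if __name__ == "__main__":
    for url in (
        "https://www.example.com/black-beach-pebbles-20kg",
        "https://www.example.com/checkout/cart",
    ):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```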
Hope this helps,
Joseph Yap