Duplicate pages, overly dynamic URLs and long URLs in Magento
-
Hi there,
I’ve just completed the first crawl of my Magento site and SEOmoz has picked up thousands of duplicate pages, overly dynamic URLs and long URLs. They are caused by the sort function, which appends variables to the URL when sorting products (e.g. www.example.com?dir=asc&order=duration).
I’m not particularly concerned that this will affect our rankings, as Google has stated that they are familiar with the structure of popular CMSs, and Magento is pretty popular.
However, it completely dominates my crawl diagnostics, so I can’t see whether there are any real underlying issues.
Does anyone know a way of preventing this?
Cheers,
Al. -
You should use the Yoast Robots extension to fix almost all the duplicate content.
http://www.magentocommerce.com/magento-connect/yoast-metarobots.html
When using 2.0 Magento connect: http://connect20.magentocommerce.com/community/Yoast_MetaRobots
for 1.0 use: magento-community/Yoast_MetaRobots
Also use canonical URLs. You can find the settings in the admin panel:
System - Configuration - Catalog - Canonical links for categories
System - Configuration - Catalog - Canonical links for products
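For anyone wondering what the canonical link buys you here, below is a rough Python sketch of the idea. It is not Magento's own code: the function names are made up for illustration, and the list of parameters to strip (dir, order, limit) is an assumption based on the parameters mentioned in this thread. It shows how a sorted URL maps back to one canonical URL.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed sort/paging parameters, taken from the ones mentioned in this thread.
SORT_PARAMS = {"dir", "order", "limit"}


def canonical_url(url: str) -> str:
    """Return the URL with the sort parameters removed and other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMS
    ]
    # Rebuild the URL without a fragment; an empty query drops the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the <head> of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'
```

With this, every sorted variant of a category page points at the same unsorted URL, which is the behaviour you want from the canonical setting.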
-
I'm actually a fan of selectively (programmatically) NOINDEX'ing like that. I find that the GWT parameter blocking doesn't always scale well. I'm running into a lot of clients trying to use it on 100s or 1000s (or millions, actually) of pages and Google is mostly ignoring it. Very frustrating.
We're working on features to let you ignore certain warnings/notices if you feel they don't apply, but I do believe in being proactive about indexation issues. I think they matter a lot more than they used to, especially post-Panda.
I would double-check to see if there's a Magento plug-in to help, as this could be a common problem. Unfortunately, we don't have any Magento experts on-staff. I'll leave this open as a discussion question, in case any members have specific expertise.
-
Is it worth trying to tackle this programmatically, e.g. if the URL includes dir=, limit= or order=, then include a noindex meta tag on that page?
It’s easy to exclude these parameters in Google Webmaster Tools, but again I’d really like to reduce the number of errors reported by SEOmoz, as I currently have 10,000 errors due to duplicate content!
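To make the idea concrete, here is a minimal sketch of the check in Python (the real thing would live in a Magento template or extension, so treat this as pseudocode that happens to run). The function name is hypothetical, and using "noindex, follow" rather than plain "noindex" is my own assumption, so that links on the sorted pages can still be followed.

```python
from urllib.parse import parse_qs, urlsplit

# The three parameters named above; extend the set if the site adds more.
SORT_PARAMS = {"dir", "limit", "order"}


def robots_meta_for(url: str) -> str:
    """Return a noindex meta tag if the URL carries a sort parameter, else ''."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if not SORT_PARAMS.isdisjoint(query):
        # Assumption: keep "follow" so crawlers can still follow links on the page.
        return '<meta name="robots" content="noindex, follow">'
    return ""
```

Matching on parsed parameter names rather than a substring search avoids false hits on URLs that merely contain "order" somewhere in the path.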
-
Hey Harald, Thanks for your response. I came across that article whilst googling the issue, but it doesn't specifically deal with the duplicate URLs being crawled and included in SEOmoz reports. As I say, I'm not too worried about any negative impact here, as I've implemented canonical URLs and I have a sitemap. However, it ruins my SEOmoz crawl diagnostic report by creating thousands of errors. Cheers, Al.
-
Hi Almenzies, As you mentioned, SEOmoz is reporting thousands of pages with duplicate content issues. Below is a link to an article that addresses them:
Solving the Duplicate Content Issues in Magento.
I hope this resolves your query.