Duplicate URL errors when URLs are unique
-
Hi All,
I'm running through the Moz Analytics site crawl report and it is showing numerous duplicate URL errors, but the URLs appear to be unique. I can see that most of each URL is the same, but shouldn't the different brands make them unique to one another?
http://www.sierratradingpost.com/clearance~1/clothing~d~5/tech-couture~b~33328/
http://www.sierratradingpost.com/clearance~1/clothing~d~5/zobha~b~3072/
Any ideas as to why these would be shown as duplicate URL errors?
-
There is a long article on the dev blog about how they determine whether pages are duplicates - check https://moz.com/devblog/near-duplicate-detection/ - it's quite technical, but this is the part that might interest you:
"This leads to one of the questions we get asked a lot: Why do I see duplicate content warnings in the context of Custom Crawl for pages that I see as different. Ultimately, it’s always because of the same reason: because no dechroming is done, there is a small amount of unique content relative to the total content. One of the places where this crops up a lot is web stores, where there’s a large amount of chrome layout, but only a short product description associated with it."
Dechroming: removing things like the navigation, footer, etc. from the page (the exact definition is in the article).
If you compare both pages, apart from the image and product title there isn't much difference between them, so the crawler sees only a very small percentage of content that differs and marks the pages as duplicates (the sketch below illustrates the effect).
Dirk
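To make Dirk's point concrete, here is a minimal sketch of shingle-based near-duplicate detection in Python. It is not Moz's actual implementation - the tokenization, shingle size, and the sample "chrome" text are all assumptions for illustration - but it shows why two pages that share a big template score as near-duplicates:

```python
def shingles(text, k=3):
    """Return the set of k-word shingles for a piece of text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    return len(a & b) / len(a | b)

# The shared "chrome": navigation, brand list, footer text, etc.
chrome = ("clearance clothing footwear handbags luggage home decor men women "
          "kids all brands sign up for deal alerts free shipping on orders "
          "over fifty dollars customer service order status returns shipping "
          "info about us careers privacy policy terms of use site map")

# The unique content is little more than a short brand blurb on each page.
page_a = chrome + " tech couture activewear tops and leggings on clearance"
page_b = chrome + " zobha yoga apparel and accessories on clearance"

sim = jaccard(shingles(page_a), shingles(page_b))
print(f"similarity: {sim:.2f}")  # most shingles are shared chrome, so this is high
```

On real category pages the template dwarfs the product blurb far more than in this toy example, pushing the similarity closer to 1.0, which is presumably what trips the duplicate flag.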
Related Questions
-
Duplicate Content with ?Page IDs in WordPress
Hi there, I'm trying to figure out the best way to solve a duplicate content problem I have due to the page IDs that WordPress automatically assigns to pages. I know that in order to resolve this I have to use canonical URLs, but the problem is I can't figure out the URL structure. Moz is showing me thousands of duplicate content errors that are mostly related to page IDs. For example, this is how a page's URL should look on my site; Moz is telling me there are 50 duplicate content errors for this page. The page ID for this page is 82, so the duplicate content errors appear as follows, and so on for 47 more pages. The problem repeats itself with other pages as well. My permalinks are set to "Post Name", so I know that's not the issue. What can I do to resolve this? How can I use canonical URLs to solve this problem? (A sketch of one way to check canonicals follows below.) Any help will be greatly appreciated.
On-Page Optimization | SpaMedica
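A minimal sketch, under assumptions, of one way to verify the fix for the question above: WordPress (core or an SEO plugin) should emit a rel=canonical tag on each ?page_id= variant pointing at the post-name URL, and you can spot-check that it really does. The domain, page ID, and pretty URL below are hypothetical placeholders, and the regex is deliberately naive:

```python
import re
import urllib.request

def canonical_of(url):
    """Fetch a page and return the href of its rel=canonical tag, if any."""
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    # Naive: assumes rel appears before href; fine for a spot check only.
    m = re.search(r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)',
                  html, re.IGNORECASE)
    return m.group(1) if m else None

pretty = "https://example.com/sample-page/"    # the post-name permalink
duplicate = "https://example.com/?page_id=82"  # the auto-generated variant

tag = canonical_of(duplicate)
if tag == pretty:
    print("OK: the ?page_id variant declares the pretty URL as canonical")
else:
    print(f"Problem: canonical is {tag!r}, expected {pretty!r}")
```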
WordPress sitemap URL problem causing WMT errors
The following types of links are appearing in my Webmaster Tools crawl errors report under 'other'. I've noticed they are in my sitemaps (I run WordPress and use a plugin called Google XML Sitemaps). How do I get rid of this error? (A sketch for finding such entries follows below.) http://www.musicliveuk.com/bands/postname%/
On-Page Optimization | SamCUK
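For the sitemap question above, a first step is to list every malformed entry before fixing or resubmitting the sitemap. A minimal sketch, assuming a local sitemap.xml in the standard sitemap namespace; it flags URLs containing a stray "%" that doesn't start a valid percent-escape, like the trailing "%/" in the example:

```python
import re
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")  # "%" not starting a valid escape

def malformed_urls(sitemap_path):
    """Return the <loc> entries whose URLs contain an invalid percent sign."""
    tree = ET.parse(sitemap_path)
    return [loc.text for loc in tree.findall(".//sm:loc", NS)
            if loc.text and BAD_ESCAPE.search(loc.text)]

for url in malformed_urls("sitemap.xml"):
    print("malformed:", url)
```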
Will "internal 301s" have any effect on page rank or the way in which an SE see's our site interlinking?
We've been forced (for scalability) to completely restructure our website's hierarchy. The old structure was country / city / city area, where we had about 3,500 nicely interlinked pages for relevant things like taxis, hotels, and apartments in each city. We needed to change the structure to country / region / area / city / city area. As part of the change we put in place lots of 301s for the permanent movement of pages to the new structure, and then we tried to actually change the physical on-page links too. Unfortunately we have left a good 600 or 700 links that point to the old pages but are picked up by the 301 redirects, so we're slowly going through them to ensure the links go to the new location directly, not via the 301 (a sketch for finding them follows below). So my question is (sorry for the long waffle): whilst it must surely be best practice for all on-page links to go directly to the 'right' page, are we harming our own interlinking and even 'page rank' by being tardy in working through them manually? Thanks for any help anyone can give.
On-Page Optimization | TinkyWinky
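For the internal-301 question above, the manual sweep can be automated: check each on-page link and flag any that still answer with a 301, then repoint those links at the Location reported. A minimal sketch with placeholder URLs for the old and new structures:

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None  # surface the 3xx as an error instead of following it

opener = urllib.request.build_opener(NoRedirect())

internal_links = [
    "https://example.com/uk/london/soho/taxis/",         # old structure
    "https://example.com/uk/england/london/soho/taxis/", # new structure
]

for url in internal_links:
    try:
        opener.open(url, timeout=10)
        print("OK  ", url)
    except urllib.error.HTTPError as e:
        if e.code in (301, 302, 307, 308):
            print(f"{e.code} {url} -> {e.headers.get('Location')}")
```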
Competitor's 'hidden' links harming my site?
Hi everyone, I'm new to both Moz and SEO, and am attempting to tackle our site's issues after being hit by Panda / Penguin, so would be grateful for any advice offered. I bought a website 3 years ago after the previous company that ran it went into administration. Having bought the website, it became apparent that the employees of the previous company had copied the entire site content and relaunched it with a new look / brand. Over the last 3 years they've rewritten much of the content, but there remain a lot of links from their site back to ours which have had the anchor text stripped out, and point to images on our site which have since been removed, example below... <a href="http://www.MyCompany.com/catalog/images/filename.pdf" target="_blank"><strong></strong></a> What I'm trying to understand is whether the 404 errors being returned by the broken links, and the presence of 'hidden' links on their site, is likely to reflect badly on our site or theirs? (A sketch for auditing those old link targets follows below.) I'm not interested in outing anyone here, and I realise the standard recommendation for these kinds of situations is to write to the company telling them to remove the offending content, but if at all possible I'd prefer to fix our site by improving content & links etc., rather than 'force' them to take action and inadvertently improve their own site's content / rankings. As I say, all advice gratefully received 🙂
On-Page Optimization | Sandy_M
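For the hidden-links question above, one small practical step is to audit which of the old file URLs the competitor still links to actually return 404s, so you can decide whether to 301 them somewhere useful or serve a 410. A minimal sketch; the URL is the placeholder from the question, not a real target:

```python
import http.client
import urllib.parse

old_targets = [
    "http://www.MyCompany.com/catalog/images/filename.pdf",
]

for url in old_targets:
    parts = urllib.parse.urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")  # HEAD: status only, no body
    print(conn.getresponse().status, url)
    conn.close()
```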
Errors in URLs
SEOmoz is showing quite a lot of URL errors like this: http://trampoliny.net.pl/akcesoria/pokrowiec-basic?frontend=1825cb1eea3af8ee6ee2d96617d32ff6 All these URLs use the parameter "?frontend=". In Webmaster Tools we told Google not to index this parameter. Unfortunately, at the moment we cannot set this parameter to "NOINDEX". We also don't want to use a robots.txt file. How do we get rid of these URLs in SEOmoz? (A sketch of the canonical form they should point at follows below.)
On-Page Optimization | drgoodcat
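For the ?frontend= question above, with noindex and robots.txt off the table, the usual remaining option is a rel=canonical on each parameter variant pointing at the parameter-free URL, which also tells crawlers the variants are one page. A minimal sketch of deriving that canonical form, assuming the parameter can simply be dropped:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_param(url, param="frontend"):
    """Return the URL with the given query parameter removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    return urlunsplit(parts._replace(query=urlencode(query)))

url = ("http://trampoliny.net.pl/akcesoria/pokrowiec-basic"
       "?frontend=1825cb1eea3af8ee6ee2d96617d32ff6")
print(strip_param(url))
# -> http://trampoliny.net.pl/akcesoria/pokrowiec-basic
```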
404 Error to homepage
Is there any risk in forwarding 404 links directly to the homepage? I already have a 404 page, but now Google is showing that I have lots of 404 links, and some of them I can't fix. Is there any problem for SEO if I do so?
On-Page Optimization | chandubaba
Locating Duplicate Pages
Hi, our website consists of approximately 15,000 pages; however, according to our Google Webmaster Tools account, Google has around 26,000 pages for us in its index. I have run through half a dozen sitemap generators and they all only discover the 15,000 pages that we know about. I have also thoroughly gone through the site to attempt to find any sections where we might be inadvertently generating duplicate pages, without success. It has been over six months since we did any structural changes (at which point we did 301s to the new locations), so I'd like to think that the majority of these old pages have been removed from the Google index. Additionally, the number of pages in the index doesn't appear to be going down by any discernible factor week on week. I'm certain it's nothing to worry about; however, for my own peace of mind I'd like to confirm that the additional 11,000 pages are just old results that will eventually disappear from the index and that we're not generating any duplicate content. Unfortunately there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. Obviously I know about site:domain.com, however this only returns the first 1,000 results, which all check out fine. I was wondering if anybody knew of any methods or tools we could use to identify these 11,000 extra pages in the Google index, so we can confirm that they're just old pages which haven't fallen out of the index yet and aren't going to cause us a problem? (A sketch of the comparison follows below.) Thanks guys!
On-Page Optimization | ChrisHolgate
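For the index-count question above, whatever partial list of indexed URLs can be assembled (scraped site: results, landing pages from analytics, Googlebot hits in server logs), the comparison itself is just a set difference against the sitemap. A minimal sketch; "indexed_urls.txt" is a hypothetical export, since Google offers no official download:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(path):
    """Return the set of <loc> URLs in a sitemap file."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.findall(".//sm:loc", NS) if loc.text}

known = sitemap_urls("sitemap.xml")
with open("indexed_urls.txt") as f:
    indexed = {line.strip() for line in f if line.strip()}

extras = indexed - known
print(f"{len(extras)} indexed URLs not in the sitemap:")
for url in sorted(extras):
    print(url)
```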
Value of PDFs in SEO
I have a client who has a lot of information in PDF form. They think they should move some of it over into HTML pages so it indexes better. Is there a benefit to converting these PDFs into HTML pages? It seems to me that HTML pages would be good, IF they are relevant pages that could be used online.
On-Page Optimization | lvstrickland