Duplicate URLs
-
A campaign that I ran said that my client's site had some 47,000+ duplicate pages and titles. I was wondering how I can possibly set that many 301 redirects, but a Moz help engineer said it has a lot to do with session IDs. See this set of duplicate URLs:
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring (clearly the main URL for the page)
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac00a2e0ad53eb90cb0b0304d178fc1
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac3039d0ad4af2720b3ccd2238547ab
http://www.lumberliquidators.com/ll/c/engineered-hardwood-flooring?PIPELINE_SESSION_ID=0ac071ed0ad4af292684b0746931158fTo a crawler, that looks like 4 different pages, when it's clear that they're actually all different URLs for the same page. I was wondering if some of you, maybe with experience in site architecture, would have insight into how to address this issue?
Thanks
Alan
-
Hm, have you looked into rel canonical?
If those are all stand alone pages, you will have to redirect, if they are no longer active, or if they can be replaced by the original page.
Andy is correct, those pages likely are not 'created' with intent.
You should look at what is causing this issue and start there. If not, you are going to be redirecting till the cows come home.
If you are deciding on going through 301's, you may want to take a step back and look at the folders of the entire domain. /ll/ is a folder but not a page, nor is /ll/c/.
Good Luck, Alan!
-
A quick way would be to disallow crawling of all pages starting with /?PIPELINE. That will prevent Google from seeing them. You can do this by adding the following into your robots.txt file...
Disallow: /*?PIPELINE
However, you want to get to the root cause, which will be something to do with the system generating these. Ideally, this needs to be fixed.
-Andy
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moz-Specific 404 Errors Jumped with URLs that don't exist
Hello, I'm going to try and be as specific as possible concerning this weird issue, but I'd rather not say specific info about the site unless you think it's pertinent. So to summarize, we have a website that's owned by a company that is a division of another company. For reference, we'll say that: OURSITE.com is owned by COMPANY1 which is owned by AGENCY1 This morning, we got about 7,000 new errors in MOZ only (these errors are not in Search Console) for URLs with the company name or the agency name at the end of the url. So, let's say one post is: OURSITE.com/the-article/ This morning we have an error in MOZ for URLs OURSITE.com/the-article/COMPANY1 OURSITE.com/the-article/AGENCY1 x 7000+ articles we have created. Every single post ever created is now an error in MOZ because of these two URL additions that seem to come out of nowhere. These URLs are not in our Sitemaps, they are not in Google... They simply don't exist and yet MOZ created an an error with them. Unless they exist and I don't see them. Obviously there's a link to each company and agency site on the site in the about us section, but that's it.
Moz Pro | | CJolicoeur0 -
Duplicate content issues with file download links (diff. versions of a downloadable application)
I'm a little unsure how canonicalisation works with this case. 🙂 We have very regular updates to the application which is available as a download on our site. Obviously, with every update the version number of the file being downloaded changes; and along with it, the URL parameter included when people click the 'Download' button on our site. e.g. mysite.com/download/download.php?f=myapp.1.0.1.exe mysite.com/download/download.php?f=myapp.1.0.2.exe mysite.com/download/download.php?f=myapp.1.0.3.exe, etc In the Moz Site Crawl report all of these links are registering as Duplicate Content. There's no content per se on these pages, all they do is trigger a download of the specified file from our servers. Two questions: Are these links actually hurting our ranking/authority/etc? Would adding a canonical tag to the head of mysite.com/download/download.php solve the crawl issues? Would this catch all of the download.php URLs? i.e. Thanks! Jon
Moz Pro | | jonmc
(not super up on php, btw. So if I'm saying something completely bogus here...be kind 😉 )0 -
Block Moz (or any other robot) from crawling pages with specific URLs
Hello! Moz reports that my site has around 380 duplicate page content. Most of them come from dynamic generated URLs that have some specific parameters. I have sorted this out for Google in webmaster tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same amount of duplicate content pages and, to stop it, I know I must use robots.txt. The trick is that, I don't want to block every page, but just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean this list to be able to use the feature properly in the future. I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt: User-agent: dotbot
Moz Pro | | Blacktie
Disallow: /*numberOfStars=0 User-agent: rogerbot
Disallow: /*numberOfStars=0 My questions: 1. Are the above lines correct and would block Moz (dotbot and rogerbot) from crawling only pages that have numberOfStars=0 parameter in their URLs, leaving other pages intact? 2. Do I need to have an empty line between the two groups? (I mean between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot")? (or does it even matter?) I think this would help many people as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there. Thank you for your help!0 -
Tags on my website cause duplicate content
Hi I just recently started a website and I am new to MOZ pro. What Moz pro detected on my website under high priority is that "duplicate page content" and what I realize about these duplicate page content is regarding the tags i put on my post. Because it is a wordpress blog, we are allow to add tags on the side before we publish our post. And because of these tags, it linked to the same page but different url. for example website.com/tags/whatever website.com/tags/whatever 2 and both these url direct to the same page So how do i solve this? do i just stop tagging whenever i write a post? delete all tags while it is not necessary? i seen method like 301 redirect or rel=canonical but is there anyway to solve this problem so I do not face this issue whenever i make a new post in my blog? I mean it doesnt make sense to redirect 301 to every single tags i have whenever i write a new post right? thanks guys
Moz Pro | | andzon0 -
Magento creating odd URL's, no idea why. GWT reporting 404 errors
Hi Mozzes! Problem 1 GWT and Moz, both are reporting approximately one hundred 404 errors for certain URL's. Examples shown below. We have no idea why or how these URL's are being created in Magento. Any hypothesis on the matter would be appreciated. The domain name in question is http://www.artorca.com/ These are valid URL's if /privacy is removed. The first URL is for a product, second for an artist profile and third for a CMS page 1. semi-abstract-landscape/privacy 2. jose-de-la-barra/privacy 3. seller-guide/privacy What may be the source for these URL's? What solution should we implement to fix existing 404's? 301 redirects should be fine? Problem 2 Website pages seem to also be accessible with index.php in the domain name. Example Artorca.com/index.php/URL's. Will this cause a duplicate content issue? Should we implement 301's, canonicals, or just leave as is? Cheers! MozAddict
Moz Pro | | MozAddict0 -
Why do I see a duplicate content errors when rel="canonical" tag is present
I was reviewing my first Moz crawler report and noticed the crawler returned a bunch of duplicate page content errors. The recommendations to correct this issue are to either put a 301 redirect on the duplicate URL or use the rel="canonical" tag so Google knows which URL I view as the most important and the one that should appear in the search results. However, after poking around the source code I noticed all of the pages that are returning duplicate content in the eyes of the Moz crawler already have the rel="canonical" tag. Does the Moz crawler simply not catch whether that tag is being used? If I have that tag in place, is there anything else I need to do in order to get that error to stop showing up in the Moz crawler report?
Moz Pro | | shinolamoz0 -
"Issue: Duplicate Page Content " in Crawl Diagnostics - but sample pages are not related to page indicated with duplicate content
In the crawl diagnostics for my campaign, the duplicate content warnings have been increasing, but when I look at the sample pages that SEOMoz says have duplicate content, they are completely different pages from the page identified. They have different Titles, Meta Descriptions and HTML content and often are different types of pages, i.e. product page appearing as having duplicate content vs. a category page. Anyone know what could be causing this?
Moz Pro | | EBCeller0 -
How do I delete a url from a keyword campaign
I have a couple of urls that are associated with the keywords in my campaign. They are no longer valid so how do I remove them?
Moz Pro | | PerriCline0