Locating Duplicate Pages
-
Hi,
Our website consists of approximately 15,000 pages; however, according to our Google Webmaster Tools account, Google has around 26,000 pages for us in its index.
I have run through half a dozen sitemap generators and they all discover only the 15,000 pages that we know about. I have also gone through the site thoroughly, looking for any sections where we might be inadvertently generating duplicate pages, without success.
It has been over six months since we made any structural changes (at which point we set up 301 redirects to the new locations), so I'd like to think that the majority of these old pages have been removed from the Google index. However, the number of pages in the index doesn't appear to be going down by any discernible amount week on week.
I'm certain it's nothing to worry about; however, for my own peace of mind I'd like to confirm that the additional 11,000 pages are just old results that will eventually disappear from the index and that we're not generating any duplicate content.
Unfortunately, there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. I know about site:domain.com; however, this only returns the first 1,000 results, which all check out fine.
Does anybody know of any methods or tools that we could use to identify these 11,000 extra pages in the Google index, so we can confirm that they're just old pages which haven't fallen out of the index yet and that they're not going to cause us a problem?
Thanks guys!
-
It's cool. Sorry, the point I was making is that, irrespective of what you search for, the page that is returned is http://www.refreshcartridges.co.uk/advanced_search_result.php (with nothing after the .php), and as such the search results page couldn't spawn multiple pages which could be indexed by Google.
-
Hmm, I'm not too knowledgeable about php pages. Sorry!
-
Sorry, I'm not sure what happened to that bit.ly address. The actual address of the website is www.refreshcartridges.co.uk.
Ah, I see what you mean about the search results now; however, this hopefully shouldn't be an issue because, for security (our web guy said something about injections), the URL that is returned irrespective of what is searched for is http://www.refreshcartridges.co.uk/advanced_search_result.php
Thanks again!
-
I can't get that link to work.
What I said before still applies with physical input (this is what I assumed when I said it).
For example, a user inputs the words "snakes and dogs" and clicks search. The new URL is "www.yoursite.com/search?q=snakes and dogs". All these generated search URLs need noindex meta tags, or Google will flag them as duplicate content because, for example, this page and the result for "dogs and snakes" are almost the same page.
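To make the noindex idea concrete, here is a minimal sketch in Python of how a site might decide which robots meta tag to emit. The `/search` path and the function name `robots_meta` are assumptions for illustration only, not anything taken from the site discussed here.

```python
from urllib.parse import urlsplit

# Hypothetical set of paths that serve internal search results.
# Adjust to whatever your own site actually uses.
SEARCH_PATHS = {"/search"}


def robots_meta(url: str) -> str:
    """Return the robots meta tag to place in <head> for the given URL."""
    path = urlsplit(url).path
    if path in SEARCH_PATHS:
        # Keep search result pages out of the index but let links be followed.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```

With this in place, `robots_meta("http://www.yoursite.com/search?q=snakes+and+dogs")` and the same call for "dogs and snakes" both return the noindex tag, so neither near-identical page would be indexed.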
Does that make sense?
It is in Google's Webmaster Guidelines that you should noindex these pages.
-
Many thanks for your input on this. I have actually looked at this through the HTML Improvements section of GWMT; however, it shows only a few dozen duplicated titles / descriptions, and this is simply due to some product categories being almost identical (for example, HP Deskjet 500 and HP Deskjet 500+).
-
Many thanks for your response. Our site is an eCommerce site that doesn't employ tags as such and our categories are all accounted for in the 15,000 page figure.
-
We did have this at the beginning of the year, when we used ?dispmode=grid and ?dispmode=list to change the way our results were displayed. This has since been rectified by completely removing the option, and any instance of dispmode in the URL now forces a 301 to the correct master page. There are still a few hundred instances of dispmode in the Google index, but 99% of them have fallen out now.
I have checked and double checked and we don't seem to have any issues like this at present.
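For anyone wanting to reproduce that kind of fix, a rough sketch of the redirect logic in Python follows. It assumes the "master page" is simply the same URL with the dispmode parameter removed; the function name `dispmode_redirect` is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dispmode_redirect(url: str):
    """Return the 301 target if the URL carries a dispmode parameter, else None."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop only dispmode; any other query parameters are preserved.
    kept = [(key, value) for key, value in params if key != "dispmode"]
    if len(kept) == len(params):
        return None  # nothing to strip, so no redirect is needed
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

The server would then answer with a 301 to the returned URL whenever the result is not `None`.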
-
I'm not certain this is the case, as our search engine requires physical input in order to yield a result. I don't know if it helps, but the URL is http://bit.ly/4Cogchww if you fancy taking a look.
-
Thanks for your reply. Our website does indeed force www. if someone attempts to navigate to us without the www. prefix.
-
Hi Chris,
Google Webmaster Tools has a tool that helps identify duplicate HTML, and maybe you can use that to see if the 11,000 pages are duplicates. If they are, I assume they would have duplicate title tags etc., which the tool may discover.
-
Have you checked for instances where a page parameter is being seen as another version of the same page? One of the sites I work for had an issue a few months back where every instance of a product page was being flagged as duplicate content because of an oversight. We had one of our coders write a clause into the page where every time a page loaded with a parameter such as ?color=72 it would canonicalize it to the page minus the parameter. This decreased our duplicate content warnings quickly and effectively.
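As a rough illustration of that canonical clause, here is a minimal Python sketch. It assumes every query parameter (such as ?color=72) can safely be dropped to get the canonical URL, which will not be true for every site; `canonical_link` is a hypothetical helper name.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a rel=canonical tag pointing at the URL minus query and fragment."""
    parts = urlsplit(url)
    # Assumption: the parameter-free URL is the preferred version of the page.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}">'
```

The tag goes in the page's head, so every parameterised variant points back to the one version you want indexed.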
-
It could be that your tags and categories are considered individual pages and are therefore creating their own permalinks, e.g. http://www.example.com/keyword, http://www.example.com/tag/keyword and http://www.example.com/category/keyword. Another way would be to check the sitemaps you have in Webmaster Tools and compare those to each other. Just a suggestion.
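If it helps, comparing two sitemaps can be sketched in a few lines of Python. This assumes both files are standard XML sitemaps using the sitemaps.org namespace; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text) -> set:
    """Collect every <loc> URL from a sitemap given as text or bytes."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text}


def only_in_first(first_xml, second_xml) -> set:
    """Return URLs listed in the first sitemap but missing from the second."""
    return sitemap_urls(first_xml) - sitemap_urls(second_xml)
```

Running `only_in_first` in both directions shows which URLs each sitemap has that the other lacks, which may point at tag or category permalinks that only one generator picked up.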
-
Does your website force 'www.'?
Both yourdomain.com and www.yourdomain.com are separate sites and can have different pages spidered.
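A quick Python sketch of what "forcing www." amounts to, in case it is useful. It assumes plain hostnames with no user info or port quirks, and `force_www` is just an illustrative name.

```python
from urllib.parse import urlsplit, urlunsplit


def force_www(url: str) -> str:
    """Return the URL with a www. host, leaving everything else unchanged."""
    parts = urlsplit(url)
    host = parts.netloc
    # Assumption: netloc is a bare hostname such as yourdomain.com.
    if host and not host.lower().startswith("www."):
        host = "www." + host
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

A server would 301 any request whose URL differs from `force_www(url)` to that result, so only the www. version of each page gets spidered.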
-
Be sure to try different combinations of 'site:www.domain.com' and 'site:domain.com'. They will all yield different results.
Sounds to me like you probably have an internal search engine that is generating search results pages based off the search term, and each different results page is a piece of duplicate content.