How can I best find out which URLs from large sitemaps aren't indexed?

rango

I have about a dozen sitemaps containing a total of just over 300,000 URLs. I built them carefully to include only the content that I feel is above a certain threshold.

However, Google says it has indexed only 230,000 of these URLs. How can I best work out which URLs it hasn't indexed? No errors related to these pages are showing in Webmaster Tools (WMT).

I could obviously start checking them manually, but surely there's a better way?

Audiohype

There's no obvious feature for this in Webmaster Tools, but after looking around I found this option:

http://www.aspfree.com/c/a/BrainDump/Extracting-Google-Indexed-Web-Site-Pages-Using-MS-Excel/

However, Google only displays the first 1,000 results for a site: query, so you would need to adapt and rerun it many times. From the look of it, there's no easy way.
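One way to work within that 1,000-result cap is to split the sitemap URLs into smaller groups, for example by top-level directory, and run one site: query per group. This is a minimal Python sketch, assuming a hypothetical example.com domain; the function names `directory_prefix` and `site_queries` are illustrative only. It counts how many sitemap URLs fall under each directory and flags groups that would still exceed 1,000. It also assumes that a path-restricted query such as `site:example.com/blog/` returns only pages under that path. Pages at the site root all land in one bucket whose query is not actually narrower than the whole site, so those may need checking another way.

```python
from collections import Counter
from urllib.parse import urlparse


def directory_prefix(url):
    """Return 'host/first-directory/' for a URL, or 'host/' for root-level pages."""
    parts = urlparse(url)
    segments = parts.path.split("/")[1:]  # drop the empty string before the leading "/"
    dirs = segments[:-1]                  # last element is the page name ("" for a trailing slash)
    if dirs and dirs[0]:
        return f"{parts.netloc}/{dirs[0]}/"
    return f"{parts.netloc}/"


def site_queries(urls, max_results=1000):
    """Build one hypothetical site: query per directory bucket.

    Returns (query, url_count, too_big) tuples sorted by prefix; too_big is True
    when the bucket holds more URLs than the results page would show.
    """
    counts = Counter(directory_prefix(u) for u in urls)
    return [(f"site:{p}", n, n > max_results) for p, n in sorted(counts.items())]


if __name__ == "__main__":
    sample = [
        "https://example.com/blog/a",
        "https://example.com/blog/b",
        "https://example.com/about",
        "https://example.com/shop/x/y",
    ]
    for query, n, too_big in site_queries(sample):
        print(query, n, "needs splitting" if too_big else "ok")
```

Buckets flagged as too big could be split again on their second path segment in the same way.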

There may be a tool similar to Xenu that also checks index status in Google. I've never needed one, so I'm not aware of any, but chances are something like that exists.

Good luck!

rango

Any ideas on how to go about exporting the indexed URLs?

Audiohype

Hi Peter,

I'd try exporting both the indexed URLs and your actual sitemap URLs into an Excel file, then removing every URL that appears in both lists, so that only the ones that haven't been indexed remain.

You would need to look into it, but I'm sure there's a way of matching the two lists and removing the duplicates.
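The same matching can be done outside Excel. This is a minimal Python sketch, assuming you already have the indexed URLs as a plain-text list with one URL per line; the thread doesn't establish how to obtain that export, so that input is hypothetical. The function names `urls_from_sitemap` and `unindexed` are illustrative. It reads `<url><loc>` entries from standard sitemap files (the sitemaps.org 0.9 namespace), skips sitemap index files' `<sitemap>` entries, and returns the sitemap URLs missing from the indexed list.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Collect the <loc> of every <url> entry in one sitemap document (str or bytes)."""
    root = ET.fromstring(xml_text)
    urls = set()
    for url_el in root.iter(f"{SITEMAP_NS}url"):  # <sitemap> entries in index files are ignored
        loc = url_el.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            urls.add(loc.text.strip())
    return urls


def unindexed(sitemap_docs, indexed_lines):
    """Return sitemap URLs that do not appear in the (hypothetical) indexed-URL export."""
    in_sitemaps = set()
    for doc in sitemap_docs:
        in_sitemaps |= urls_from_sitemap(doc)
    # Exact string match after trimming whitespace; no other URL normalisation is done.
    indexed = {line.strip() for line in indexed_lines if line.strip()}
    return sorted(in_sitemaps - indexed)
```

With real files, you could read each sitemap with `open(path, "rb").read()` and pass the open export file as `indexed_lines`. Because matching is exact, variants such as a trailing slash or http versus https would count as different URLs.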

Other than that I wouldn't know.

Ben
