How to identify orphan pages?

MarieHaynes

I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?

I know that Xenu Link Sleuth works, but I'm on a Mac, so that's not an option for me.

Or are there other ways to identify orphan pages?

Fr3sh3gg

DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including the number of internal links to each page. Filter this list by "No. links in" = 0, and you'll have a good list of orphaned pages.
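If you'd rather script that filter than do it in a spreadsheet, here is a minimal Python sketch that reads a crawl export saved as CSV and keeps only the rows whose inlink count is 0. The column names `URL` and `No. links in` are assumptions based on the label mentioned above, not a documented DeepCrawl export format. Change them to match the header row of your own export.

```python
import csv
import io

# Hypothetical column names: adjust to match the header row of your export.
URL_COLUMN = "URL"
INLINKS_COLUMN = "No. links in"


def zero_inlink_urls(csv_text: str) -> list[str]:
    """Return the URLs whose internal inlink count is exactly 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    orphans = []
    for row in reader:
        count = (row.get(INLINKS_COLUMN) or "").strip()
        # Skip rows with a missing or non-numeric count instead of guessing.
        if count.isdigit() and int(count) == 0:
            orphans.append(row[URL_COLUMN])
    return orphans
```

To use it on a saved export, open the file with `newline=""` and pass its contents, e.g. `zero_inlink_urls(f.read())`.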

Cheers,
Mike | Fresh Egg Australia

Cyrus-Shepard

Hi Marie!

Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (searching OSE, searching your server files). Alternatively, you could use a method like the one Agents of Value describes here.

A couple of posts that may help:

1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit

Requires the IIS Toolkit, which isn't Mac-friendly unless you're installing it on an external machine.
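The underlying idea (compare the URLs in your sitemap with the URLs a crawler can reach) doesn't need the IIS Toolkit. Here is a minimal Python sketch using only the standard library. It assumes a plain `urlset` sitemap in the standard sitemaps.org namespace (a sitemap index file would need each child sitemap fetched first) and that you already have the crawled URLs as a set of exact strings. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace (sitemaps.org schema 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml) -> set[str]:
    """Extract every <loc> value from a <urlset> sitemap (str or bytes)."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def not_crawled(sitemap_xml, crawled: set[str]) -> list[str]:
    """URLs listed in the sitemap that the crawler never reached."""
    return sorted(sitemap_urls(sitemap_xml) - crawled)
```

Reading the sitemap as bytes, e.g. `sitemap_urls(open("sitemap.xml", "rb").read())`, lets the XML parser honour the file's own encoding declaration.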

2. 4 Tips for Technical SEO

Ian has some great tips here, including:

Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
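The log-file tip boils down to a set difference between "paths visitors loaded" and "paths the crawler found". Here is a rough Python sketch, assuming your server writes Apache/Nginx-style common or combined log lines. It keeps only GET/HEAD requests that returned 200 and drops query strings. It compares paths rather than full URLs because access logs usually don't record the host. The regex and function names are illustrative, not part of any tool.

```python
import re
from urllib.parse import urlsplit

# Assumes common/combined log format, e.g.
# 1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /page HTTP/1.1" 200 512 ...
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def logged_paths(log_lines) -> set[str]:
    """Unique request paths (query string dropped) that returned 200."""
    paths = set()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status") == "200":
            paths.add(urlsplit(match.group("path")).path)
    return paths


def paths_missing_from_crawl(log_lines, crawled_urls) -> list[str]:
    """Paths that appear in the logs but were never found by the crawl."""
    crawled_paths = {urlsplit(url).path for url in crawled_urls}
    return sorted(logged_paths(log_lines) - crawled_paths)
```

A file object works directly as `log_lines`, since iterating over it yields one line at a time, so six months of logs don't have to fit in memory at once.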

3. Supposedly, Webseo has an automated option for finding orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/

Hope this helps! Let us know what works.

AgentsofValue

Well, because they are 'orphans', you probably can't find them with a spider tool alone: a crawler that follows links only reaches pages something links to! I'd recommend the following process to find your orphan pages:

1. Get a list of all the pages created by your CMS.

2. Get the list of all the pages found by Screaming Frog.

3. Add the two URL lists to Excel and find the URLs in your CMS list that are not in the Screaming Frog list.

You could probably use an Excel trick like this one:

http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
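If the Excel route gets fiddly, the same comparison takes a few lines of Python. This is a sketch, not a tool. It assumes you have both lists as plain URL strings, for example one URL per line from the CMS export and the address column of a Screaming Frog export (check your export's header for the exact column name). It also applies a light, hypothetical normalization so that `https://Example.com/post/` and `https://example.com/post` count as the same page.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop any fragment, and strip one trailing
    slash (except on the root path) so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    if len(path) > 1 and path.endswith("/"):
        path = path[:-1]
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def orphan_candidates(cms_urls, crawled_urls) -> list[str]:
    """CMS URLs that never appeared in the crawl (original spelling kept)."""
    crawled = {normalize(u) for u in crawled_urls}
    return sorted(
        u for u in cms_urls if u.strip() and normalize(u) not in crawled
    )
```

If your site serves `/post` and `/post/` as different pages, remove the trailing-slash step from `normalize`. Note that it also doesn't treat `http` and `https` versions of a URL as equivalent.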
