How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should search pages be indexed?
Hey guys, I've always believed that search pages should be no-indexed but now I'm wondering if there is an argument to index them? Appreciate any thoughts!
Technical SEO | | RebekahVP0 -
Canonical Page Question
Hi, I have a question relation to Canonical pages That i need clearing up. I am not sure that my bigcommere website is correctly configured and just wanted clarification from someone in the know. Take this page for example https://www.fishingtackleshop.com.au/barra-lures/ Canonical link is https://www.fishingtackleshop.com.au/barra-lures/ The Rel="next" link is https://www.fishingtackleshop.com.au/barra-lures/?sort=bestselling&page=2 and this page has a canonical tag as rel='canonical' href='https://www.fishingtackleshop.com.au/barra-lures/?page=2' /> Is this correct as above and working as it should or should the canonical tag for the second (pagination page) https://www.fishingtackleshop.com.au/barra-lures/?page=2 in our source code be saying rel='canonical' href='https://www.fishingtackleshop.com.au/barra-lures/' />
Technical SEO | | oceanstorm0 -
One landing page or many?
I can not understand which is the best way to target similar keywords. Do the best way is create landingpage for each long tail keyword landing page or better one but with all included keywords? On the siste i have landingpages: 1. Metal doors 1.2. Steel doors for private houses 1.3. Durvis 1.4 Metal doors for technical rooms and so on. In Latvian language it sounds ok. Some time ago for other sites it worked good but now it just does not work. I see google meses these results up and seo performance is bad. Can you suggest correct structure? Thanks metāla durvis dzīvoklim un mājai m24 metāla durvis sos serviss durvju atvēršana
Technical SEO | | Felter0 -
Why google does not remove my page?
Hi everyone, last week i add "Noindex" tag into my page, but that site still appear in the organic search. what other things i can do for remove from google?
Technical SEO | | Jorge_HDI0 -
Unavoidable duplicate page
Hi, I have an issue where I need to duplicate content on a new site that I am launching. Visitors to the site need to think that product x is part of two different services. e.g. domain.com/service1/product-x domain.com/service2/product-x Re-writing content for product x for each service section is not an option but possibly I could get over that only one product-x page is indexed by search engines. What's the best way to do this? Any advice would be appreciated. Thanks, Stuart
Technical SEO | | Stuart260 -
Why is my office page not being indexed?
Good Morning from 24 degrees C partly cloudy wetherby UK 🙂 This page is not being indexed by Google:
Technical SEO | | Nightwing
http://www.sandersonweatherall.co.uk/office-to-let-leeds/ 1st Question Ive checked robots txt file no problems, i'm in the midst of updating the xml sitemap (it had the old one in place). It only has one link from this page http://www.sandersonweatherall.co.uk/Site-Map/ So is the reason oits not being indexed just a simple case of lack if SEO juice from inbound links so the remedy lies in routing more inbound links to the offending page? 2nd question Is the quickest way to diagnose if a web address is not being indexed to cut and paste the url in the Google search box and if it doesnt return the page theres a problem? Thanks in advance, David0 -
3 pages crawled?
For some reason, my account says it only crawled 3 pages this week, where its usually about 3K. This is my robots which shouldnt affect http://www.theprinterdepo.com/robots.txt and this is my site http://www.theprinterdepo.com any idea?
Technical SEO | | levalencia10 -
SEOMoz Crawl Diagnostic indicates duplicate page content for home page?
My first SEOMoz Crawl Diagnostic report for my website indicates duplicate page content for my home page. It lists the home page URL Page Title and URL twice. How do I go about diagnosing this? Is the problem related to the following code that is in my .htaccess file? (The purpose of the code was to redirect any non "www" backlink referrals to the "www" version of the domain.) RewriteCond %{HTTP_HOST} ^whatever.com [NC]
Technical SEO | | Linesides
RewriteRule ^(.*)$ http://www.whatever.com/$1 [L,R=301] Should I get rid of the "http" reference in the second line? Related to this is a notice in the "Crawl Notices Found" -- "301 Permanent redirect" which shows my home page title as "http://whatever.com" and shows the redirect address as http://http://www.whatever.com/ I'm guessing this problem is again related to the redirect code I'm using. Also... The report indicates duplicate content for those links that have different parameters added to the URL i.e. http://www.whatever.com?marker=Blah Blah&markerzoom=13 If I set up a canonical reference for the page, will this fix this? Thank you.0