How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
Cheers,
Mike | Fresh Egg Australia -
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions to find orphaned pages are either hit-and-miss manual methods (search OSE, search your server files). Or you could use a method like Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS toolkit, which unless your installing on an external machine, isn't mac friendly
Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it nor can I vouch for it:http://www.webseo.com/
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. get a list of all the pages created by your CMS
2. get the list of all the pages found by Screaming Frog
3. add the two url lists into Excel and find the URLs in your CMS that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page redirected too many times
Hello, How can we solve the following error : This page isn't working ** redirected you too many times.** It's very frustrating. I have cleared the cookies. Still, the problem persists. Thanks
Technical SEO | | Johnroger0 -
What crawler do you recommend for finding orphaned pages on a website?
Is there a crawler that you guys recommend for finding all pages, including orphaned pages on a website? A data export is not feasible. I saw a question from back in 2013 and was wondering if anything has changed since then in regards to crawling orphaned pages. Do most enterprise systems already have this built into their crawler? Or is it best to get a crawler like Xenu or Screaming Frog or Deepcrawl?
Technical SEO | | DigitalMarketingSEO0 -
Thousands of 404-pages, duplicate content pages, temporary redirect
Hi, i take over the SEO of a quite large e-commerce-site. After checking crawl issues, there seems to be +3000 4xx client errors, +3000 duplicate content issues and +35000 temporary redirects. I'm quite desperate regarding these results. What would be the most effective way to handle that. It's a magento shop. I'm grateful for any kind of help! Thx,
Technical SEO | | posthumus
boris0 -
Getting high priority issue for our xxx.com and xxx.com/home as duplicate pages and duplicate page titles can't seem to find anything that needs to be corrected, what might I be missing?
I am getting high priority issue for our xxx.com and xxx.com/home as reporting both duplicate pages and duplicate page titles on crawl results, I can't seem to find anything that needs to be corrected, what am I be missing? Has anyone else had a similar issue, how was it corrected?
Technical SEO | | tgwebmaster0 -
Duplicate page errors from pages don't even exist
Hi, I am having this issue within SEOmoz's Crawl Diagnosis report. There are a lot of crawl errors happening with pages don't even exist. My website has around 40-50 pages but SEO report shows that 375 pages have been crawled. My guess is that the errors have something to do with my recent htaccess configuration. I recently configured my htaccess to add trailing slash at the end of URLs. There is no internal linking issue such as infinite loop when navigating the website but the looping is reported in the SEOmoz's report. Here is an example of a reported link: http://www.mywebsite.com/Door/Doors/GlassNow-Services/GlassNow-Services/Glass-Compliance-Audit/GlassNow-Services/GlassNow-Services/Glass-Compliance-Audit/ btw there is no issue such as crawl error in my Google webmaster tool. Any help appreciated
Technical SEO | | mmoezzi0 -
Pages extensions
Hi guys, We're in the process of moving one of our sites to a newer version of the CMS. The new version doesn't support page extensions (.aspx) but we'll keep them for all existing pages (about 8,000) to avoid redirects. The technical team is wondering about the new pages - does it make any difference if the new pages are without extensions, except for usability? Thanks!
Technical SEO | | lgrozeva0 -
Off-page SEO and on-page SEO improvements
I would like to know what off-page SEO and on-page SEO improvements can be made to one of our client websites http://www.nd-center.com Best regards,
Technical SEO | | fkdpl2420 -
Duplicate page error
SEO Moz gives me an duplicate page error as my homepage www.monteverdetours.com is the same as www.monteverdetours.com/index is this actually en error? And is google penalizing me for this?
Technical SEO | | Llanero0