How to identify orphan pages?
-
I've read that you can use Screaming Frog to identify orphan pages on your site, but I can't figure out how to do it. Can anyone help?
I know that Xenu Link Sleuth works but I'm on a Mac so that's not an option for me.
Or are there other ways to identify orphan pages?
-
DeepCrawl.co.uk is another great resource here. This tool gives a full list of URLs, including number of internal links to each page. Filter this list by "No. links in" = 0, and this will give you a good list of orphaned pages.
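If the tool lets you export that URL list as CSV, the same filter could be sketched in a few lines of Python. This is only an illustration: the column names `url` and `links_in` and the sample rows are assumptions, not the tool's actual export format.

```python
import csv
import io

# Hypothetical export: column names and rows are made up for illustration.
SAMPLE_EXPORT = """url,links_in
https://example.com/,12
https://example.com/about,3
https://example.com/old-landing-page,0
"""

def pages_without_inlinks(csv_text, url_col="url", links_col="links_in"):
    """Return the URLs whose internal inlink count is zero."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[url_col] for row in reader if int(row[links_col]) == 0]

orphan_candidates = pages_without_inlinks(SAMPLE_EXPORT)
print(orphan_candidates)
```

Bear in mind that a crawler can only report pages it already knows about (for example from a sitemap), so treat the result as a list of candidates.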
Cheers,
Mike | Fresh Egg Australia
-
Hi Marie!
Sadly, I don't use Xenu anymore either. Most of the solutions for finding orphaned pages are hit-and-miss manual methods (search OSE, search your server files), or you could use a method like the one Agents of Value describes here.
Couple of posts that may help:
1. Find Orphaned Pages From Your Sitemap.xml File with Excel and IIS Toolkit
Requires IIS Toolkit, which, unless you're installing it on an external machine, isn't Mac friendly.
2. Ian has some great tips here, including:
- Search the server log files for every unique URL loaded over a 6-month period. Compare that to all unique URLs found in a site crawl. People have a funny way of stumbling into pages you’ve accidentally blocked or orphaned. Chances are, blocked pages will show up in your log file, even if they’re blocked.
- Do a database export. If you’re using WordPress or another content management system, you can export a full list of every page/post on the site, as well as the URL generated. Then compare that to a site crawl.
- Run two crawls of your site using your favorite crawler. Do the first one with the default settings. Then do a second with the crawler set to ignore robots.txt and nofollow. If the second crawl has more URLs than the first, and you want 100% of your site indexed, then check your robots.txt and look for meta ROBOTS issues.
3. Supposedly, Webseo has an automated option to find orphaned files, but I haven't used it and can't vouch for it: http://www.webseo.com/
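The log-file tip above could be sketched roughly like this in Python. It is a minimal illustration, assuming the server writes Common or Combined Log Format; the sample log lines, the `urls_from_log` helper, and the crawled paths are all made up.

```python
import re

# Matches the request and status in a Common/Combined Log Format line,
# e.g. "GET /path HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def urls_from_log(lines):
    """Collect unique paths that were requested and answered with a 200."""
    paths = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            # Drop the query string so /page?x=1 and /page count as one URL.
            paths.add(match.group(1).split("?", 1)[0])
    return paths

# Made-up sample data for illustration.
log_lines = [
    '203.0.113.5 - - [10/Mar/2014:10:00:00 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Mar/2014:10:00:05 +0000] "GET /forgotten-page?ref=mail HTTP/1.1" 200 2048',
    '203.0.113.9 - - [10/Mar/2014:10:01:00 +0000] "GET /missing HTTP/1.1" 404 312',
]
crawled_paths = {"/", "/about"}

# Paths visitors actually loaded that the crawl never reached.
in_logs_not_in_crawl = sorted(urls_from_log(log_lines) - crawled_paths)
print(in_logs_not_in_crawl)
```

In practice you would read the lines from your real log files and take the crawled paths from your crawler's export.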
Hope this helps! Let us know what works.
-
Well, because they are 'orphans', you probably can't find them using a spider tool! I'd recommend the following process to find your orphan pages:
1. Get a list of all the pages created by your CMS.
2. Get the list of all the pages found by Screaming Frog.
3. Add the two URL lists to Excel and find the URLs in your CMS list that are not in the Screaming Frog list.
You could probably use an Excel trick like this one:
http://superuser.com/questions/289650/how-to-compare-two-columns-and-find-differences-in-excel
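If you'd rather not use Excel, the same comparison could be sketched in Python. This is a minimal example under assumptions: both lists hold absolute URLs, the `normalize` and `find_orphans` helpers are hypothetical names, and the sample URLs are made up. The normalization is deliberately simple and may need adjusting for your site.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce an absolute URL to host + path so the two lists compare cleanly.

    Lowercases the host and ignores the scheme, trailing slash, query string
    and fragment.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"

def find_orphans(cms_urls, crawled_urls):
    """Return the CMS URLs that the crawler never reached."""
    crawled = {normalize(u) for u in crawled_urls}
    return [u for u in cms_urls if normalize(u) not in crawled]

# Made-up sample data: step 1 (CMS export) and step 2 (crawl export).
cms_urls = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/old-promo",
]
crawled_urls = [
    "https://example.com/",
    "https://EXAMPLE.com/services",
]

orphans = find_orphans(cms_urls, crawled_urls)
print(orphans)
```

Normalizing before comparing avoids false positives from differences such as a trailing slash or host capitalization.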