Where does the crawler find the URLs?
-
The SEOmoz crawler has found a number of 500 error pages, 404s, etc., which is very useful.
However, some of the URLs are in weird or broken formats that we don't recognise and nobody remembers ever using. They are not weird enough to imply hacking, but they suggest something is broken in the CMS.
Is there any way to find out where the crawler found these URLs? I can patch up and redirect the end result as best I can, but I would prefer to plug the leak.
Thanks
-
If you export the crawl diagnostics to a CSV, we do have this information in the last column.
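For anyone wanting to work through that export in bulk, here is a minimal Python sketch that groups crawled URLs by the page they were found on. It relies only on the referrer being in the last column, as described above. The header names, the position of the URL column (assumed to be first), and the sample rows are hypothetical, so check them against your own export.

```python
import csv
import io
from collections import defaultdict


def group_by_referrer(csv_text, url_column=0):
    """Map each referring page to the crawled URLs that were found on it.

    Assumes the first row is a header, the crawled URL sits in
    `url_column` (an assumption), and the referrer is the last column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    sources = defaultdict(list)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        referrer = row[-1].strip()
        sources[referrer].append(row[url_column].strip())
    return dict(sources)


# Hypothetical sample data, not a real export.
sample = (
    "URL,Status,Referrer\n"
    "http://example.com/old-page.html,404,http://example.com/news\n"
    "http://example.com/img/%20,500,http://example.com/news\n"
    "http://example.com/missing,404,http://example.com/about\n"
)

for referrer, urls in group_by_referrer(sample).items():
    print(referrer, len(urls))
```

Sorting the referrers by how many broken URLs each one produces is usually the quickest way to spot the template or CMS module that is generating them.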
-
Thanks for the tips. It is a little frustrating that the information I need has passed through SEOmoz's system, but I guess they don't have the inclination or resources to show us the info.
Xenu reckons it can handle 1m URLs; we are in the position of not really knowing how many pages our site has!
-
You can pop the links into the free Xenu Link Sleuth* - after you've done a crawl just right-click on the URL you're interested in and click 'URL Properties' - you'll see any inlinks it finds listed there. Depending on the size of your site, it could take a while for the crawl to complete.
You could try the link: operator in Google first, though it won't be as thorough as Xenu.
*If you haven't seen it before, don't worry about how the Xenu website looks - the software is kosher - as recommended by many SEOmoz staff. Screaming Frog is a paid alternative (with a limited free version).
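If you would rather script the inlink check, the idea can be sketched in a few lines of Python. This is not how Xenu or Screaming Frog work internally; it is only an illustration that takes pages you have already downloaded (a hypothetical dict of page URL to HTML) and reports which of them link to a given broken URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from the <a href> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, value))


def pages_linking_to(target, pages):
    """Return the sorted URLs of pages whose HTML links to `target`."""
    found = []
    for page_url, html in pages.items():
        collector = LinkCollector(page_url)
        collector.feed(html)
        if target in collector.links:
            found.append(page_url)
    return sorted(found)


# Hypothetical pages, standing in for HTML fetched from your own site.
pages = {
    "http://example.com/": '<a href="/old-page.html">Old</a>',
    "http://example.com/news": '<a href="http://example.com/old-page.html">Old</a>',
    "http://example.com/about": '<a href="/contact">Contact</a>',
}

print(pages_linking_to("http://example.com/old-page.html", pages))
```

The comparison is an exact string match, so URLs that differ only by trailing slash or encoding will not match unless you normalise them first.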
Related Questions
-
What is the best way to treat URLs ending in /?s=
Hi community, I'm going through the list of crawl errors visible in my Moz dashboard and there are a few URLs ending in /?s= How should I treat these URLs? Redirects? Thanks for any help
Moz Pro | | Easigrass0 -
Magento: Moz finding URL and URL?p=1 as duplicate. Solution?
Good day Mozzers! The Moz bot is finding URLs in the catalogue pages with the format www.example.com/something and www.example.com/something?p=1 and flagging them as duplicates (since they are the same page). What's the best solution to implement here? Canonical? Any other? Cheers! MozAddict
Moz Pro | | MozAddict0 -
Good job! This URL received a grade A?
What does this mean? This page still ranks very badly on Google, so what does it mean that it receives a grade A? If this URL receives a grade A, it should be well optimized for the keyword, yet it is still on page 9 in Google, which is very low for this keyword. A lot of bad blogs, foreign pages, pages without the keyword in the heading, and pages without any good content rank better. Does this score have anything to do with ranking on Google? Something is clearly wrong with this page, but the on-page grade tool won't tell me what it is, and probably doesn't understand it either, since it gives an A. Is there anywhere I can check what is wrong with this page? http://www.butikksiden.no/archives/5-toffe-canada-goose-jakker and Canada Goose jakker Re-Grade Pa Good Job! This URL received an A grade
Moz Pro | | butikksiden0 -
Reset Crawler
Hello, does anyone know how to reset the crawler? We recently uploaded our new website and deleted the current campaign, but the crawler seems to be caching our old website's data rather than the new, so every time we try to create a new campaign with the same details, it just pulls everything from cache. Thanks
Moz Pro | | ForzaHost0 -
Finding the source of duplicate content URLs
We have a website that displays a number of products. Each product has variations (sizes) and unfortunately every size has its own URL (for now anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URLs for our site as soon as possible.) However, even though these duplicate URLs exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems that there is a flaw in our system somewhere, as these links are now found in our campaign here on SEOmoz. My question: is there any way to find the crawl path that led to the URLs that shouldn't have been found, so we can locate the problem?
Moz Pro | | DocdataCommerce0 -
Will SEOMoz offer URL data relating to Bot visits
Does SEOMoz plan to report on bot visits for each URL in the future, showing when they are spidered and when they appear in, for example, Google's index?
Moz Pro | | NeilTompkins0 -
SEO Web Crawler - Referrer Lists XML Sitemap URL
Hello! I recently ran the crawl tool on a client site. Opening up the file, I noticed that the referring URLs listed are my XML sitemaps and not (X)HTML pages. Any reason or thoughts behind why this is happening? Thanks!
Moz Pro | | MorpheusMedia0 -
Campaign 4XX error gives duplicate page URL
I ran the report for my site and had many more 4xx errors than I've had in the past month. I updated my .htaccess to include 301 statements based on Google Webmaster Tools Crawl Errors. Google has been reporting a positive downward trend in my errors, but my SEOmoz campaign has shown a dramatic increase in the 4xx pages. Here is an example of a 4xx URL: http://www.maximphotostudio.net/engagements/266/inniswood_park_engagements/http:%2F%2Fwww.maximphotostudio.net%2Fengagements%2F266%2Finniswood_park_engagements%2F This is strange because the URL http://www.maximphotostudio.net/engagements/266/inniswood_park_engagements/ is valid and works great, but then there is a duplicate entry in which the full URL is appended again, with %2F representing the forward slashes, so each link contains two http prefixes. What is the reason for this?
Moz Pro | | maximphotostudio1