Crawl Diagnostics 2261 Issues with Our Blog
-
I just recently signed up for MOZ, so much information. I've done the walk through and will continue learning how to us the tools. But I need your help.
Our first moz crawl indicated 2261 issues (447 404's, 803 duplicate content, 11 502's, etc). I've reviewed all of the crawls issues and they are linked to our Yahoo hosted WordPress blog. Our blog is over 9 years old. The only issue that I'm able to find is our categories are not set up correctly. I've searched for WordPress assistance on this topic and cant find any issues with our current category set up. Every category link that I click returns Nothing Found Apologies, but no results were found for the requested archive. Perhaps searching will help find a related post.
http://site.labellaflorachildrensboutique.com/blog/
Any assistance is greatly appreciated.
-
Go Dan!
-
While what Matt and CleverPHD (Hi Paul!) have said is correct - here's your specific issue:
Your categories are loading with "ugly" permalinks like this: http://site.labellaflorachildrensboutique.com/blog/?cat=175 (that loads fine)
But you are linking to them from the bottom of posts with the "clean" URLs --> http://screencast.com/t/RIOtqVCrs
The fix is that Catgory URLs need to load with "clean" URLs and the ugly one should redirect to the clean one.
Possible fixes:
- Try updating wordpress (I see you're on a slightly older version)
- See if you .htaccess file has been modified (ask a developer or your hosting for help with this perhaps)
Found another linking issue:
This link to Facebook in your left sidebar --> http://screencast.com/t/EqltiBpM it's just coded incorrectly. It adds the current page URL so you get a link like this http://site.labellaflorachildrensboutique.com/blog/category/unique-baby-girl-gifts/www.facebook.com/LaBellaFloraChildrensBoutique instead of your Facebook page: http://www.facebook.com/LaBellaFloraChildrensBoutique
You can fix that Facebook link probably in Appearance->Widgets.
That one issue is causes about 200 of your broken URLs
-
One other thing I forgot. This video by Matt Cutts
It explains why Google might show a link even though the page was blocked by robots.txt
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google really tries not to forget URLs and this video reminds us that Google uses links not just for ranking, but discovery so you really have to pay attention to how you link internally. This is especially important for large sites.
-
Awesome! Thanks for straightening it out.
-
Yes, the crawler will avoid the category pages if they are in robots.txt. It sounded like from the question that this person was going to remove or change the category organization and so you would have to do something with the old URLs (301 or noindex) and that is why I would not use robots.txt in this case so that those directives can be seen.
If these category pages had always been blocked using robots.txt, then this whole conversation is moo as the pages never got in the index. It is when unwanted pages get in the index that you potentially want to get rid of that things get a little tricky, but workable.
I have seen issues where there are pages on sites that got into the index and ranking but they were the wrong pages and so the person just blocked with robots.txt. Those URLs continued to rank and cause problems with the canonical pages that should be ranking. We had to unblock, let Google see the 301, rank the new pages then put the old URLs back into robots to prevent the old URLs from getting back into the index.
Cheers!
-
Oh yeah, that's a great point! I've found that the category pages rarely rank directly, but you'll definitely want to double-check before outright blocking crawlers.
Just to check my own understanding, CleverPhD, wouldn't crawlers avoid the category pages if they were disallowed by robots.txt (presuming they obey robots.txt), even if the links were still on the site?
-
One wrinkle. If the category pages are in Google and potentially ranking well - you may want to 301 them to consolidate them into a more appropriate page (if this makes sense) or if you want to get them out of the index, use a meta noindex robots tag on the page(s) to have them removed from the index, then block them in robots.txt.
Likewise, you have to remove the links on the site that are pointing to the category pages to prevent Google from recrawling and reindexing etc.
-
Category pages actually turn up as duplicate content in Crawl Diagnostics _really _often. It just means that those categories are linked somewhere on your site, and the resulting category pages look almost exactly like all the others.
Generally, I recommend you use robots.txt to block crawlers from accessing pages in the category directory. Once that's done and your campaign has re-crawled your site, then you can see how much of the problem was resolved by that one change, and consider what to do to take care of the rest.
Does that make sense?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Problem crawling a website with age verification page.
Hy every1, Need your help very urgent. I need to crawl a website that first has a page where you need to put your age for verification and after that you are redirected to the website. My problem is that SEOmoz, crawls only that first page, not the whole website. How can I crawl the whole website?, do you need me to upload a link to the website? Thank you very much Catalin
Moz Pro | | catalinmoraru0 -
Why does SEOMoz only crawl 1 page of my site?
My site is: www.thetravelingdutchman.com. It has quite a few pages, but for some reason SEOMoz only crawls one. Please advise. Thanks, Jasper
Moz Pro | | Japking0 -
Joined yesterday, today crawl errors (incorrectly) shows as zero...
Hi. We set up our SEOMoz account yesterday, and the initial crawl showed up a number of errors and warnings which we were in the process of looking at and resolving. I log into SEOMoz today and it's showing 0 errors, Pages Crawled: 0 | Limit: 10,000 Last Crawl Completed: Nov. 27th, 2012 Next Crawl Starts: Dec. 4th, 2012errors, warnings and notices show as 0, and the issues found yesterday show only in the change indicators.There's no way of getting to the results seen yesterday other than waiting a week?We were hoping to continue working through the found issues!
Moz Pro | | WorldText0 -
Why have I stopped receiving emails about crawl reports and rankings reports?
I used to receive emails weekly telling me a new crawl had completed and the reports were ready and also another email saying the new rankings and on page reports were ready to view - I am not getting these anymore. Does this mean these are still happening but I need to find them within the package or has something changed? Thanks
Moz Pro | | Fitsensesports0 -
Can you set-up a manual SEOmoz crawl?
I received a crawl report yesterday, made some site changes, and would like to see if those changes were done correctly. Rather than wait a week for my automatic crawl to be generated, is there anyway to initiate a manual crawl on a single subdomain as a PRO member? As a PRO member, you can schedule crawls for 2 subdomains every 24 hours, and you'll get up to 3,000 pages crawled per subdomain. When we've finished crawling, your reports will be sent to your PRO email address, which is currently From here... http://pro.seomoz.org/tools/crawl-test
Moz Pro | | ICM0 -
Sub-domain not crawled
One of our sites was recently re-designed. The home page is a landing page (www.labadieauto.com) and I moved the blog to this domain (labadieauto.com/blog/) and put a link is the bottom left of the home page. Since the change the SEOMOZ campaign overview is showing only 1 page crawled. This is not setup as a sub-domain so why isn't it showing in the crawl? Help!
Moz Pro | | LabadieAuto0 -
Is The Crawl Diagnostic tool working correctly?
The Crawl Diagnostic tool shows issues and displays a graph but they don't display the page specific results/suggestion like it used to. I get the "Congratulations, there are no pages affected by this issue!" message.
Moz Pro | | -PAUL-0 -
Initial Crawl Questions
Hello. I just joined and used the Crawl tool. I have many questions and hoping the community can offer some guidance. 1. I received an Excel file with 3k+ records. Is there a friendly online viewer for the Crawl report? Or is the Excel file the only output? 2. Assuming the Excel file is the only output, the Time Crawled is a number (i.e. 1305798581). I have tried changing the field to a date/time format but that did not work. How can I view the field as a normal date/time such as May 15, 2011 14:02? 3. I use the ™ symbol in my Title. This symbol appears in the output as a few ascii characters. Is that a concern? Should I remove the trademark symbol from my Title? 4. I am using XenForo forum software. All forum threads automatically receive a Title Tag and Meta Description as part of a template. The Crawl Test report shows my Title Tag and Meta Description as blank for many threads. I have looked at the source code of several pages and they all have clean Title tags and I don't understand why the Crawl Report doesn't show them. Any ideas? 5. In some cases the HTTP Status Code field shows a result of "3". Why does that mean? 6. For every URL in the Crawl Report there is an entry in the Referrer field. What exactly is the relationship between these fields? I thought the Crawl Tool would inspect every page on the site. If a page doesn't have a referring page is it missed? What if a page has multiple referring pages? How is that information displayed? 7. Under Google Webmaster Tools > Site Configurations > Settings > Parameter Handling I have the options set as either "Ignore" or "Let Google Decide" for various URL parameters. These are "pages" of my site which should mostly be ignored. For example a forum may have 7 headers, each on of which can be sorted in ascending or descending order. The only page that matters is the initial page. All the rest should be ignored by Google and the Crawl. Presently there are 11 records for many pages which really should only have one record due to these various sort parameters. Can I configure the crawl so it ignores parameter pages? I am anxious to get started on my site. I dove into the crawl results and it's just too messy in it's present state for me to pull out any actionable data. Any guidance would be appreciated.
Moz Pro | | RyanKent0