Crawl Diagnostics: 2,261 Issues with Our Blog
-
I just recently signed up for Moz, and there's so much information. I've done the walkthrough and will continue learning how to use the tools, but I need your help.
Our first Moz crawl indicated 2,261 issues (447 404s, 803 duplicate content, 11 502s, etc.). I've reviewed all of the crawl issues, and they are linked to our Yahoo-hosted WordPress blog. Our blog is over 9 years old. The only issue I'm able to find is that our categories are not set up correctly. I've searched for WordPress assistance on this topic and can't find any problems with our current category setup. Every category link that I click returns: "Nothing Found. Apologies, but no results were found for the requested archive. Perhaps searching will help find a related post."
http://site.labellaflorachildrensboutique.com/blog/
Any assistance is greatly appreciated.
-
Go Dan!
-
While what Matt and CleverPHD (Hi Paul!) have said is correct - here's your specific issue:
Your categories are loading with "ugly" permalinks like this: http://site.labellaflorachildrensboutique.com/blog/?cat=175 (that loads fine)
But you are linking to them from the bottom of posts with the "clean" URLs --> http://screencast.com/t/RIOtqVCrs
The fix is that category URLs need to load with "clean" URLs, and the ugly ones should redirect to the clean ones.
Possible fixes:
- Try updating WordPress (I see you're on a slightly older version)
- See if your .htaccess file has been modified (ask a developer or your hosting provider for help with this, perhaps)
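For reference, this is what the stock WordPress rewrite block in .htaccess looks like. If it's missing or has been altered, "clean" permalinks (including category URLs) will 404. This is a sketch; the `/blog/` subdirectory path is assumed from the URLs above, so adjust `RewriteBase` to match your actual install location:

```apache
# Standard WordPress rewrite rules for a site installed in /blog/
# If this block is missing or modified, "clean" permalinks will 404.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteRule ^index\.php$ - [L]
# Don't rewrite requests for real files or directories
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Everything else goes through WordPress's front controller
RewriteRule . /blog/index.php [L]
</IfModule>
```

WordPress normally writes this block itself when you save permalink settings, so re-saving Settings->Permalinks is often enough to regenerate it.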
Found another linking issue:
This link to Facebook in your left sidebar --> http://screencast.com/t/EqltiBpM is coded incorrectly. Because it's missing the "http://" prefix, the browser treats it as a relative path and appends it to the current page URL, so you get a link like this: http://site.labellaflorachildrensboutique.com/blog/category/unique-baby-girl-gifts/www.facebook.com/LaBellaFloraChildrensBoutique instead of your Facebook page: http://www.facebook.com/LaBellaFloraChildrensBoutique
You can probably fix that Facebook link in Appearance->Widgets.
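You can see exactly how the browser builds the broken URL with a quick sketch. Python's `urljoin` follows the same relative-resolution rules a browser uses for an `href`:

```python
from urllib.parse import urljoin

# The page the visitor is on when they click the sidebar link
page = "http://site.labellaflorachildrensboutique.com/blog/category/unique-baby-girl-gifts/"

# A scheme-less href is treated as a relative path and resolved
# against the current page URL, producing the broken link:
broken = urljoin(page, "www.facebook.com/LaBellaFloraChildrensBoutique")
print(broken)
# http://site.labellaflorachildrensboutique.com/blog/category/unique-baby-girl-gifts/www.facebook.com/LaBellaFloraChildrensBoutique

# With the scheme included, the absolute URL is left intact:
fixed = urljoin(page, "http://www.facebook.com/LaBellaFloraChildrensBoutique")
print(fixed)
# http://www.facebook.com/LaBellaFloraChildrensBoutique
```

This is why the broken URL changes depending on which page the widget is rendered on, and why one bad widget link fans out into hundreds of crawl errors.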
That one issue causes about 200 of your broken URLs.
-
One other thing I forgot: this video by Matt Cutts explains why Google might show a link even though the page was blocked by robots.txt.
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google really tries not to forget URLs, and this video reminds us that Google uses links not just for ranking but also for discovery, so you really have to pay attention to how you link internally. This is especially important for large sites.
-
Awesome! Thanks for straightening it out.
-
Yes, the crawler will avoid the category pages if they are in robots.txt. It sounded from the question like this person was going to remove or change the category organization, so you would have to do something with the old URLs (301 or noindex). That is why I would not use robots.txt in this case: crawlers have to be able to fetch the pages to see those directives.
If these category pages had always been blocked using robots.txt, then this whole conversation is moot, as the pages never got into the index. It's when unwanted pages get into the index, and you want to get rid of them, that things get a little tricky, but it's workable.
I have seen cases where pages got into the index and were ranking, but they were the wrong pages, so the site owner just blocked them with robots.txt. Those URLs continued to rank and caused problems for the canonical pages that should have been ranking. We had to unblock them, let Google see the 301s, let the new pages rank, and then put the old URLs back into robots.txt to prevent them from getting back into the index.
Cheers!
-
Oh yeah, that's a great point! I've found that the category pages rarely rank directly, but you'll definitely want to double-check before outright blocking crawlers.
Just to check my own understanding, CleverPhD, wouldn't crawlers avoid the category pages if they were disallowed by robots.txt (presuming they obey robots.txt), even if the links were still on the site?
-
One wrinkle: if the category pages are in Google and potentially ranking well, you may want to 301 them to consolidate them into a more appropriate page (if that makes sense for your site). Or, if you want to get them out of the index, use a meta robots noindex tag on the page(s) to have them removed from the index, and only then block them in robots.txt.
Likewise, you have to remove the links on the site that point to the category pages, to prevent Google from recrawling and reindexing them.
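The noindex-then-block order matters: if a URL is disallowed in robots.txt, Google can never fetch the page to see the noindex tag. As a sketch, the tag below would go in the `<head>` of the category archive template, and the pages would stay crawlable until they drop out of the index:

```html
<!-- Serve this on the category pages while they are still crawlable;
     only add the robots.txt Disallow after they drop from the index -->
<meta name="robots" content="noindex, follow">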
-
Category pages actually turn up as duplicate content in Crawl Diagnostics *really* often. It just means that those categories are linked somewhere on your site, and the resulting category pages look almost exactly like all the others.
Generally, I recommend you use robots.txt to block crawlers from accessing pages in the category directory. Once that's done and your campaign has re-crawled your site, then you can see how much of the problem was resolved by that one change, and consider what to do to take care of the rest.
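Assuming the categories live under `/blog/category/` as in the URLs earlier in this thread, the robots.txt rule would look something like this (adjust the path to match your actual category base):

```
User-agent: *
Disallow: /blog/category/
```

Just keep in mind the caveats raised above about pages that are already indexed: a robots.txt block stops crawling, not indexing.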
Does that make sense?