Crawl Diagnostic Errors
-
Hi there,
Seeing a large number of errors in the SEOMOZ Pro crawl results. The 404 errors are for pages that look like this:
http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
I know that t%2F represents the two slashes, but I'm not sure why these addresses are being crawled. The site is a wordpress site. Anyone seen anything like this?
-
Yep, i think you nailed it. I crawled another 2 sites I manage, one has sexy bookmarks, one doesn't. The one with had 404 errors. A quick search for sexy bookmarks causes 404 had some results as well.
You're right about the issue with the other plugin, commentluv. Will definitely take that suggestion to the developer.
And a hat trick, you're right about the block of latest from the blog on the footer. Been meaning to take that out for ages.
Very grateful for your attention and wisdom! Thank you!
-
Ross, it seems you have a plugin for comments which adds a link to the last post of the person who made the comment. This is an interesting plugin which i have not seen before. There are two problems I see with the plugin. First, it identifies links to your own site as external, when they should be tagged as internal. Secondly, it probably shouldn't be used to link to the current page. Debbi's comment is a link asking readers to view her latest article, which is the current page.
There is also a link to the current article under Recent Posts. It would be a great advancement for the plugin if it could identify the current URL and not include it in the list.
There is also a footer section "Latest from blog" which offers a link to the post. In my opinion offering the same links in the Recent Posts side bar and the "Latest from blog" footer is excessive, and since footer links aren't used very much I would recommend removing the footer block.
The fourth link to the article I located on the page is from a plugin which is referred to as "Shareaholic TopSharingBar SexyBookmarks". The link is contained within javascript.
All of the above 4 links are valid links and should not be the source of the 404 error.
And finally I believe I just now discovered the root cause of this issue. It seems to be your "Shareaholic" plugin. Try disabling it and then crawling your site again. The 404 error should disappear.
The URL you shared, in the exact format you shared it, is present in your site's HTML code in a line which begins with the following code:
-
will do and thank you for your insight!
-
I just started a SEOmoz crawl for your site. It will take some time to complete. Once the report is available I'll take a look.
Since you removed a plug in, the results may not be the same. You may have resolved the issue. Please refrain from making further changes until the crawl is complete.
-
Okay sure. Embarassingly enough, it's my own site at bayareaseo.net.
http://www.bayareaseo.net/2011/11/things-that-can-mess-up-your-google-places-rankings/
is referring to in SEOMOZ crawler
and in GWT the original url refers to
http://www.bayareaseo.net/2011/11/things-that-can-mess-up-your-google-places-rankings/<a< p=""></a<>
Just removed a "related posts" style plug in, not sure if that's the culprit.
-
It doesn't make sense to me that the referrer is the page itself. If you are willing to share your site's URL and the specific URL which is having an issue I can perform a crawl and offer more details.
-
The referrer is the page itself. Examined the code and I'm not seeing any links that match, with or without the funky markup, i.e. searching for
http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
as well as
http://www.example.com/2010/07/blogpost/http://www.example.com/2010/07/blogpost/
I'm thinking it's down to one of two WP plugins causing the error. Found similar results in GWT, with many 404s referring from themselves as
http://www.example.com/page<a< p=""></a<>
Will disable the plugins and report back after the next crawl
-
The crawler normally will start on your site's home page and move through all the html code on the home page, then crawl each and every link on the home page following it throughout your site. If you are seeing these errors on your crawl report then the links are on your site.
Examine your crawl report and look for the REFERRER field. This field indicates the page which contains the link. If you can't see the link on the page itself, right-click on the page and choose View Page Source, then do a search of the html code (CTRL+F) for the link.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Issues with Moz producing 404 Errors from sitemap.xml files recently.
My last campaign crawl produced over 4k 404 errors resulting from Moz not being able to read some of the URLs in our sitemap.xml file. This is the first time we've seen this error and we've been running campaigns for almost 2 months now -- no changes were made to the sitemap.xml file. The file isn't UTF-8 encoded, but rather Content-Type:text/xml; charset=iso-8859-1 (which is what Moveable Type uses). Just wondering if anyone has had a similar issue?
Moz Pro | | BriceSMG0 -
Have a Campaign, but only states 1 page has been crawled by SEOmoz bots. What needs to be done to have all the pages crawled?
We have a campaign running for a client in SEOmoz and only 1 page has been crawled per SEOmoz' data. There are many pages in the site and a new blog with more and more articles posted each month, yet Moz is not crawling anything, aside from maybe the Home page. The odd thing is, Moz is reporting more data on all the other inner pages though for errors, duplicate content, etc... What should we do so all the pages get crawled by Moz? I don't want to delete and start over as we followed all the steps properly when setting up. Thank you for any tips here.
Moz Pro | | WhiteboardCreations0 -
How do I exclude my blog subfolder from being crawled with my main domain (www.) folder?
I am trying to setup two separate campaigns for my blog and for my main site. While it is easy enough to do through the wizard the results I am getting for my main site still include pages that are in my blog sub folder. Please Advise!
Moz Pro | | sameufemia0 -
Warnings, Notices, and Errors- don't know how to correct these
I have been watching my Notices, Warnings and Errors increase since I added a blog to our WordPress site. Is this effecting our SEO? We now have the following: 2 4XX errors. 1 is for a page that we changed the title and nav for in mid March. And one for a page we removed. The nav on the site is working as far as I can see. This seems like a cache issue, but who knows? 20 warnings for “missing meta description tag”. These are all blog archive and author pages. Some have resulted from pagination and are “Part 2, Part 3, Part 4” etc. Others are the first page for authors. And there is one called “new page” that I can’t locate in our Pages admin and have no idea what it is. 5 warnings for “title element too long”. These are also archive pages that have the blog name and so are pages I can’t access through the admin to control page title plus “part 2’s and so on. 71 Notices for “Rel Cononical”. The rel cononicals are all being generated automatically and are for pages of all sorts. Some are for a content pages within the site, a bunch are blog posts, and archive pages for date, blog category and pagination archive pages 6 are 301’s. These are split between blog pagination, author and a couple of site content pages- contact and portfolio. Can’t imagine why these are here. 8 meta-robot nofollow. These are blog articles but only some of the posts. Don’t know why we are generating this for some and not all. And half of them are for the exact same page so there are really only 4 originals on this list. The others are dupes. 8 Blocked my meta-robots. And are also for the same 4 blog posts but duplicated twice each. We use All in One SEO. There is an option to use noindex for archives, categories that I do not have enabled. And also to autogenerate descriptions which I do not have enabled. I wasn’t concerned about these at first, but I read these (below) questions yesterday, and think I'd better do something as these are mounting up. I’m wondering if I should be asking our team for some code changes but not sure what exactly would be best. http://www.seomoz.org/q/pages-i-dont-want-customers-to-see http://www.robotstxt.org/meta.html Our site is http://www.fateyes.com Thanks so much for any assistance on this!
Moz Pro | | gfiedel0 -
Crawl Diagnostics: Next crawl date is in the past
Hi - I have quite a few crawl diagnostic errors and warnings. I have attempted to fix many of them but noticed this note at the bottom of the crawl diagnostics chart: "Last Crawl Completed: Mar. 22nd, 2013 Next Crawl Starts: Mar. 29th, 2013" It looks like SEOMoz thinks the next crawl date is Mar 29th, 2013, which is two weeks ago. Is there any way to "force" the crawl and get it back on regular schedule? This may have happened when my account was disabled because my credit card expired...Thoughts?
Moz Pro | | 6thirty0 -
SEOMOZ Crawl Test
Guys I really have an issue that i know have but cannot see if that makes sense. Basically 3 months ago i did a site wide 301 from economyleasinguk.co.uk to www.economy-car-leasing.co.uk Every thing looks good get all the correct header responses , all canonicals work perfectly , Google webmaster tools is updated fetch as google bot shows the old site is 301 I tried the seomoz crawl test today on the old domain and got this message Oh no! Looks like the page you were trying to access is temporarily down which at first thought ok because the site was not there it wont do it on an old 301 domain, however i tried it on a domain i know has just been 301'd and i got this message The URL http://www.site1.com/ redirects to http://site2.com/. Do you want to crawl http://site2.com/ instead?
Moz Pro | | kellymandingo
Would you like to:
Continue with www.site1.com
Continue with site2.com I really do not know what to do, its either the redirect script is missing something however its doing what it should or the server is a problem but again its doing what it should so why would SEOMOZ not be able to crawl the old URL like it example site above. Now the strange thing is Open Site Explorer does see the 301 and asks if i want to check the new URL instead Ps the redirect is done using PHP redirect which i am asking him to change to a htaccess as its now on a apache server and was wondering if this could be an issue, all pages go to correct pages as requested Thanks in Advance1 -
My crawl diagnostic is showing 2 duplicate content and titles.
First of all Hi - My name is Jason and I've just joined - How you all doing? My 1st question then: When I view where these errors are occurring it says www mydomain co uk and www mydomain co uk/index.html Isn't this the same page? I have looked into my root folder and only index.html exists.
Moz Pro | | JasonHegarty0 -
Can I change the crawl day ?
Hi All I hope there is a simple solution to this - we have a number of campaigns setup which are all crawled, and therefore updated, on different days of the week. We review these weekly and it would be much easier if they were all crawled on the same day. Is it possible to change the crawl day for some campaigns? Thanks Roy
Moz Pro | | bluelogic0