Crawl Diagnostic Errors
-
Hi there,
Seeing a large number of errors in the SEOMOZ Pro crawl results. The 404 errors are for pages that look like this:
http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
I know that %2F is the URL-encoded form of a slash, but I'm not sure why these addresses are being crawled. The site is a WordPress site. Has anyone seen anything like this?
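For reference, that encoded suffix decodes to the page's own URL appended to itself; a quick check with Python's standard library, using the example URL from above:

```python
from urllib.parse import unquote

# The encoded suffix that was appended to the page's own URL
encoded = "http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F"

# Each %2F decodes back to a slash
print(unquote(encoded))
# -> http://www.example.com/2010/07/blogpost/
```

So the 404 URL is literally the blog post's own address, percent-encoded and tacked onto the end of itself.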
-
Yep, I think you nailed it. I crawled two other sites I manage; one has SexyBookmarks, one doesn't. The one with it had the 404 errors. A quick search for "sexy bookmarks causes 404" turned up some results as well.
You're right about the issue with the other plugin, commentluv. Will definitely take that suggestion to the developer.
And a hat trick, you're right about the block of latest from the blog on the footer. Been meaning to take that out for ages.
Very grateful for your attention and wisdom! Thank you!
-
Ross, it seems you have a comments plugin which adds a link to the last post of the person who made the comment. This is an interesting plugin which I have not seen before. There are two problems I see with it. First, it identifies links to your own site as external, when they should be tagged as internal. Second, it probably shouldn't be used to link to the current page. Debbi's comment includes a link asking readers to view her latest article, which is the current page.
There is also a link to the current article under Recent Posts. It would be a nice improvement if the plugin could detect the current URL and exclude it from the list.
There is also a footer section, "Latest from blog", which offers a link to the post. In my opinion, offering the same links in both the Recent Posts sidebar and the "Latest from blog" footer is excessive, and since footer links aren't clicked very often I would recommend removing the footer block.
The fourth link to the article that I located on the page comes from a plugin referred to as "Shareaholic TopSharingBar SexyBookmarks". That link is contained within JavaScript.
All four of the above links are valid and should not be the source of the 404 error.
And finally, I believe I've just discovered the root cause of this issue: your Shareaholic plugin. Try disabling it and then crawling your site again; the 404 errors should disappear.
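As a sketch of how such a doubled URL can arise (this is a hypothetical reconstruction, not the plugin's actual code): if a plugin emits an href whose slashes are percent-encoded, a crawler that only recognizes absolute URLs by the literal `://` sequence will treat that href as a relative path and append it to the current page's URL:

```python
from urllib.parse import quote

page = "http://www.example.com/2010/07/blogpost/"

# Hypothetical plugin output: the page URL with its slashes percent-encoded
# quote() with safe="" encodes each '/' as %2F
href = "http:" + quote("//www.example.com/2010/07/blogpost/", safe="")

def naive_resolve(base, href):
    # A crawler that only treats hrefs containing '://' as absolute
    if "://" in href:
        return href
    return base + href  # otherwise appended as if relative to the page

print(naive_resolve(page, href))
# -> http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
```

The result is exactly the malformed URL from the question: the page linking to itself with the encoded copy appended.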
The URL you shared, in the exact format you shared it, is present in your site's HTML code in a line which begins with the following code:
-
Will do, and thank you for your insight!
-
I just started a SEOmoz crawl for your site. It will take some time to complete. Once the report is available I'll take a look.
Since you removed a plugin, the results may not be the same; you may have already resolved the issue. Please refrain from making further changes until the crawl is complete.
-
Okay, sure. Embarrassingly enough, it's my own site at bayareaseo.net.
http://www.bayareaseo.net/2011/11/things-that-can-mess-up-your-google-places-rankings/
is the referrer reported by the SEOmoz crawler, and in GWT the original URL refers to
http://www.bayareaseo.net/2011/11/things-that-can-mess-up-your-google-places-rankings/<a< p=""></a<>
Just removed a "related posts" style plugin, not sure if that's the culprit.
-
It doesn't make sense to me that the referrer is the page itself. If you are willing to share your site's URL and the specific URL that is having the issue, I can perform a crawl and offer more details.
-
The referrer is the page itself. I examined the code and I'm not seeing any links that match, with or without the funky markup, i.e., searching for
http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F
as well as
http://www.example.com/2010/07/blogpost/http://www.example.com/2010/07/blogpost/
I'm thinking it's down to one of two WP plugins causing the error. Found similar results in GWT, with many 404s referring from themselves as
http://www.example.com/page<a< p=""></a<>
Will disable the plugins and report back after the next crawl.
-
The crawler normally starts on your site's home page, parses all of the HTML on that page, and then follows each link it finds throughout your site. If you are seeing these errors in your crawl report, then the links exist somewhere on your site.
Examine your crawl report and look for the REFERRER field. This field indicates the page that contains the link. If you can't see the link on the rendered page, right-click on the page, choose View Page Source, and search the HTML (Ctrl+F) for the link.
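That manual check (view source, then Ctrl+F for the link) can also be automated. Here's a minimal sketch using only Python's standard library; the sample page and the `find_link` helper are made up for illustration:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every anchor href on a page, the way a crawler walks the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_link(html, target):
    # Mimics the manual check: parse the page source and look for the link
    parser = LinkCollector()
    parser.feed(html)
    return target in parser.links

sample = '<p><a href="http://www.example.com/2010/07/blogpost/">post</a></p>'
print(find_link(sample, "http://www.example.com/2010/07/blogpost/"))  # True
```

Running this against the page source of each REFERRER would show which page actually carries the bad link, including links hidden in markup you'd miss when scanning the rendered page.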