Welcome to the Q&A Forum

carinoverturf

Sounds great, Mike! Just send them over and I'll take a look!

Thanks,
Carin

carinoverturf

Hey Mike,

I'm sorry you're so frustrated with the issues in the index lately - I know it's an inconvenience, but I assure you, the team has been working all hours to work out these kinks!

In fact, after many nights and weekends sacrificed, we're looking at probably being early on our next release. The bugs will be much less evident in this next index as the stale crawl data is dropped from the index.

I know that doesn't help you out right now. Can you send me some details on the corruption you're seeing? A full OSE link with all the parameters would be perfect as well as a CSV, if you have one. If you don't feel comfortable posting in Q&A, please email me at [email protected].

This sounds like the same bug we saw emerge in this index, and have since fixed, but I want to make sure that is the case.

Again, I'm really sorry for the inconvenience and frustration this is causing - we are working hard at ironing out these final issues!

Thanks,
Carin

carinoverturf

This just came to our attention yesterday and our engineers have been investigating over the weekend. It appears to be fallout from the parsing bug that caused the initial delay of this index launch.

We're still investigating, but we do have another index in the works, with the parsing bug no longer present. We hope to have this ready in the next two weeks. In the meantime, we're looking into how we can remedy this current anchor text portion.

If you would like to read more about the parsing bug, Phil provided a great explanation in the forum article here.

Sorry for the inconvenience this will cause - we're looking into ways to remedy this as soon as we can!

Thanks,
Carin

carinoverturf

Yep, David is correct - that call is only available with a paid API plan. If you are interested in a paid plan, check out the different tiers on our Mozscape API page.

Thanks!
Carin

carinoverturf

No problem! I'm so sorry for the inconvenience!

I just pushed the remaining pending reports through, so I think you should be set, but if you continue to run into any problems, just let me know!

carinoverturf

Hey there!

Haha - we ran into a problem on Monday night with one of the machines falling over causing a huge backlog to pile up. We were able to get things back on track yesterday and churn through the backed up reports, but with the index launch yesterday, we're seeing a bit of a backlog again this morning.

We are getting a monster machine up right now to speed through this! Once things calm down you should see these come through. It looks like you have about 8 pending - I'll keep an eye on them to make sure they go through!

Thanks,
Carin

carinoverturf

Hey there!

The Top 500 list is compiled from our Mozscape (formerly Linkscape) link data, which is gathered by our crawlers. Unfortunately, we don't crawl Facebook since the pages are https.

Adding the ability to crawl https is on our road map, however!

Thanks,
Carin

carinoverturf

Hey Ravi,

Sorry for the delayed response - I wanted to follow up with the engineers to see if they had any suggestions for you.

They agreed the Limit parameter set to 1,000 might be too large to process. Have you tried adjusting that to 300 or even 500? Do you see better success at a lower limit?

Our system times out at about 60 seconds, so I'm not sure the hanging is on our end. If dropping the limit size doesn't help, you might want to think about ending the request after about a minute. Sometimes requests that run too long will time out but work fine on a retry, as some data will be cached from the previous request.
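To make that suggestion concrete, here is a rough client-side sketch of the idea: give up after about a minute, retry once in case the earlier attempt left cached data behind, and then step the limit down. The function name, the `fetch` callable, and the specific fallback limits are all placeholders I made up for illustration; this is not part of the Mozscape API itself.

```python
from typing import Callable, Sequence


def fetch_with_fallback(
    fetch: Callable[[int, float], list],
    limits: Sequence[int] = (1000, 500, 300),
    timeout_s: float = 60.0,
    attempts_per_limit: int = 2,
) -> list:
    """Try each limit in turn, retrying a timed-out request before lowering the limit.

    `fetch(limit, timeout_s)` is a hypothetical callable supplied by the caller.
    It should perform one API request and raise TimeoutError if no response
    arrives within `timeout_s` seconds.
    """
    for limit in limits:
        for _ in range(attempts_per_limit):
            try:
                return fetch(limit, timeout_s)
            except TimeoutError:
                # A retry may succeed if part of the work was cached server-side.
                continue
    raise TimeoutError(f"no response within {timeout_s}s at limits {list(limits)}")
```

If you use the `requests` library, `fetch` could wrap `requests.get(..., timeout=timeout_s)` and re-raise `requests.exceptions.Timeout` as `TimeoutError`.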

I hope this information is helpful, but let me know if you're still experiencing issues!

Thanks,
Carin

carinoverturf

Hey there!

Just want to make sure I'm understanding what you're trying to do - basically you're hoping to use jQuery to send requests to the API and then fetch the JSON results?

What type of queries are you sending the API? What would the API query look like?

Also, we do have the API Help Forums to post in or search as well - not sure if you've explored these pages, but there could be some helpful information for you there as well!

Thanks!
Carin

carinoverturf

Hey! This is an issue I haven't heard of before - would you be able to provide any more information, like an example query to the API and some of the pages you are seeing hang?

Thanks!

Carin

carinoverturf

Hey guys,

Yep, Keri is correct, unfortunately. We found a bug in our July index with our new crawlers - they were crawling binary files as if they were links and, since they are not normal links, the crawler couldn't handle them very well.

We have made some updates to our crawling so it will go deeper into sites. The odd inbound links from high-authority sites appear because the crawler is reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler is counting a binary file as a link, but the larger issue is that the crawler doesn’t really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is why you’re seeing inbound links to pages that don’t really exist.

There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles these file types. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly. However, this update will need a few more weeks to propagate. The fix for this issue probably won’t be seen for another update, meaning late September. Our improvements should catch most of the issues, but there still could be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!

The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are in the process of working through that issue right now. We’re doing everything we can to resolve this bug as we know it is alarming to see these phantom inbound links.

Thanks for your patience!

Carin

carinoverturf

Hey Zack, I saw the ticket you filed was answered by Aaron, but I just wanted to follow up with you as well. We have made some really exciting changes to the crawler, but, unfortunately, there is a pretty obvious bug as well...

The “questionable” links coming from the Internet Wild West appear because the crawler is reaching much deeper into sites, where there are more download (i.e. binary) links. The first issue is that the crawler is counting a binary file as a link, but the larger issue is that the crawler doesn’t really know how to handle these types of files. This bug is causing some links to be improperly associated with certain domains. This is probably what you're seeing with all the crazy links from China and Russia which don't actually link to the site you're researching.

There are two steps to addressing this issue: changing how the crawler sees these file types and then fixing how the crawler handles these file types. We have made improvements to our algorithm so that we will be able to handle the majority of these files correctly. However, this update will need about a month to propagate. The fix for this issue probably won’t be seen for two more updates, meaning late September. Our improvements should catch most of the issues, but there still could be a few cases we haven't addressed. If this happens, don't hesitate to let us know; we love feedback since it helps us improve and make our index even better!

The next step is to fix how our crawlers handle binary file links and prevent them from being improperly associated with certain domains. We are in the process of working through that issue right now. We’re doing everything we can to resolve this bug as we know it is alarming to see these “questionable” links associated with your sites.

I hope this helps and thanks so much for being patient :)

Thanks,
Carin

carinoverturf

Hey!

That sounds like odd behavior and I don't think I've heard of that happening before. I'd love to dig a bit deeper to see what's going on.

Would you be able to send me the pages you are searching? I assume you are experiencing this in Open Site Explorer?

If you would prefer not to post the URLs in this forum, feel free to email me directly at [email protected]!

Thanks,
Carin

carinoverturf

Hey Lee,

Sha just gave me the heads up about this thread so I wanted to jump in and see if I can clarify what's going on with these downloadable links.

We made some improvements to the Linkscape crawler to make it fresher, crawl deeper and crawl more diverse domains - however, the deeper part ended up bringing to light a bug we had in the crawler. Once we started crawling deeply into websites, we started encountering more downloadable files, which our crawler had no idea what to do with. It treated each file as a link and crawled it, but when it tried to associate the file with a domain, it didn't know how to handle it properly, and that ended up causing weird associations with domains the crawler had previously crawled.

We have been able to implement a few fixes, but, unfortunately, they take a bit of time to propagate through into the index - a full month to crawl and several weeks to process.

There were two solutions we found after investigating this problem. First, don't count binary files as links - this has been done and should be part of our next index scheduled to launch 10/18. This should address about 70% of the issue. Second, update the crawler to disregard download files if it does encounter them. This update was just recently deployed to our crawlers and still needs about a month to propagate and go through processing. The effects of this fix probably won't be seen for another two index updates.
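For anyone curious what "don't count binary files as links" can look like in practice, here is a minimal sketch that drops download-style URLs by file extension before links are counted. The extension list and function names are assumptions for illustration only; they are not the actual rules our crawler uses.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of download extensions, chosen only for this example.
BINARY_EXTENSIONS = {".zip", ".exe", ".msi", ".dmg", ".iso", ".tar", ".gz"}


def is_binary_download(url: str) -> bool:
    """Return True if the URL path ends in one of the assumed binary extensions."""
    path = urlsplit(url).path  # ignores the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in BINARY_EXTENSIONS


def countable_links(urls):
    """Keep only the URLs that should be counted as ordinary links."""
    return [url for url in urls if not is_binary_download(url)]
```

An extension check is only a first pass, since a download can also sit behind a URL with no extension at all. That is roughly why the second fix, having the crawler disregard download files when it does encounter them, is still needed.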

I hope this helps clear up some of the confusion going on here - most likely the weird "phantom" links you're seeing are a result of this bug we discovered in our updated crawler. If you're still seeing odd behavior after the next index update scheduled 10/18, please email the Customer Service team! We love the feedback as it helps make our crawler even better!

Thanks,

Carin

carinoverturf

Thanks Ryan for the great answer! We do have the new social features in Open Site Explorer that display the Facebook shares, collected from their FQL API.

We are also in development of a new tool in the PRO app offering Social Analytics metrics. Here is Rand's blog post about it!

Hope that helps, but let me know if you have any more questions!

Thanks,

Carin

carinoverturf

Hey guys!

Keri is right - we have done some updating with our crawler and this index represents the newest version - unfortunately with a few hiccups. People seem to be seeing two issues with this new index - link counts and domain authorities are going up or down considerably and there is an increase of "questionable" inbound links.

Both issues are due to the same root cause: our new crawler is built to be fresher, but it is going deeper into domains and, unfortunately, not visiting as many domains. Domains with a high MozRank are getting crawled deeper, but domains with middle to lower MozRanks are not getting crawled.

Our top priority now is to get the domain diversity back up to or better than that of our last update as was originally designed. It's fixable and we will be focusing all efforts on this.

Previous crawling worked by selecting a list of the top MozRank URLs (around 10B) and then crawling one page from each of them. Now we are crawling links as we discover them, and crawling high MozRank sites daily, weekly or monthly. The advantage of the new crawlers is we are crawling all the time and so we will have fresher data. As links are added, we are much more likely to discover these deeper links. The new crawl had 59B urls, a lot more than the previous 42B, however, more of these links are from the same domain.
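As a rough way to picture the trade-off, the sketch below computes a simple "domain diversity" ratio: the number of distinct hosts divided by the number of URLs crawled. The metric and the function name are my own illustration, not something we compute internally, and it groups by hostname rather than by registered root domain to keep things short.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_diversity(urls):
    """Return distinct hosts divided by total URLs (0.0 for an empty crawl)."""
    hosts = Counter(urlsplit(url).hostname for url in urls)
    total = sum(hosts.values())
    return len(hosts) / total if total else 0.0


# One page per site (old style) versus going deep into one site (new style).
old_style = ["http://a.example/", "http://b.example/", "http://c.example/"]
new_style = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
    "http://b.example/",
]
print(domain_diversity(old_style), domain_diversity(new_style))
```

A crawl can contain more URLs overall and still score lower on this ratio, which is the pattern described above with 59B URLs versus 42B.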

The "questionable" links appear because the crawler is reaching deeper into the domains, where there are more download links. We are currently looking into fixing this so these won't be counted as links. We'll let you know as soon as that issue is resolved!

We are really sorry for the inconvenience. Once we have this new crawler dialed in, it will provide much fresher and higher quality data!

Thanks,

Carin

carinoverturf

Hey guys,

The issue you are seeing is due to the new OSE update. We have done some updating with our crawler and this index represents the newest version - sadly, with a few bugs... We are looking into this issue and hope to have it resolved as soon as possible!

The newest version of our crawler is built to be fresher, but it is also going much deeper into high MozRank pages. This bug has probably always existed, but has never been obvious since we weren't crawling as deep into domains where there are more download links. We are currently looking into fixing this so these won't be counted as inbound links.

I'm so sorry for the inconvenience - once we get this new version of the crawler dialed in and smoothed out, it will be providing you guys a much fresher and higher quality index!

There is another thread regarding this topic, so check it out if you want more information on what is going on with this index.

Thanks,

Carin

carinoverturf

Hey! I just saw this post - not sure if you filed a ticket with the Customer Service team, but I wanted to see if I can help explain what's going on.

I wanted to make sure I had a full handle on what was going on, so I talked with one of the Open Site Explorer developers. My understanding is that Open Site Explorer will automatically redirect when Linkscape has identified the URL as a redirect, but if Open Site Explorer requests data from Linkscape and is returned a 301, then the message alert will show up and Open Site Explorer will show the redirected URL metrics.

There are two main reasons this could happen. First, the redirect wasn't in place when we crawled the page, so it wasn't recorded as a redirect. Second, when Open Site Explorer requests Linkscape metrics for a given URL, Linkscape does not explicitly say whether the URL is a redirect - it only tells us whether the searched form is canonical. If the searched form does not match the canonical, we assume it redirects to the canonical.
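In code terms, that second case boils down to a simple comparison. This is only a sketch of the rule as I understand it; the function name and the plain string comparison are assumptions, not the actual Open Site Explorer implementation.

```python
def resolve_display_url(searched: str, canonical: str):
    """Return (url_to_show, assumed_redirect) for a searched URL.

    The canonical form is all we have to go on, so any mismatch between
    the searched form and the canonical form is treated as a redirect.
    """
    assumed_redirect = searched != canonical
    url_to_show = canonical if assumed_redirect else searched
    return url_to_show, assumed_redirect
```

For example, searching `example.com` when the canonical form is `www.example.com` would show metrics for `www.example.com` and flag it as a redirect, even if no actual 301 was ever recorded.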

I hope that helps explain a bit, but let me know if you have more questions! The Customer Service team is awesome and they will be able to help you as well!

Thanks,

Carin

carinoverturf

For sure! I figured it's a question that has crossed people's minds before (I know it has for me!) so if you see anyone else wondering, I wanted you to have something to point them to.

Thanks!

Carin

Blog Posts

Another March Mozscape Index is Live!

Announcing the March Mozscape Index!

The Second February Mozscape Index is Live!

February Mozscape Index is Live

Another January Mozscape Index Has Been Released!

January Mozscape Index is Live!

December Mozscape Index is Live!

Another November Index is Live!

November Mozscape Index is Live!