Having some weird crawl issues in Google Webmaster Tools
-
I am seeing a large number of errors in the Not Found section that are linked to old URLs that haven't been used for 4 years. Some of the URLs being linked to are not even in the structure that we used to use. Nevertheless, Google is saying they are now 404ing, and there are hundreds of them. I know the best way to attack this is to 301 them, but I was wondering why all of these errors would be popping up. I can't find anything in the Google index when searching for the link in quotes (""), and in Webmaster Tools the source these are being linked from shows as unavailable.
Any help would be awesome!
-
There are website crawlers you can use to scan the site and hunt for specific parameters, such as Xenu's Link Sleuth: http://home.snafu.de/tilman/xenulink.html
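If it helps with the 301 side of things, here is a rough Python sketch of how you might triage a list of not-found URLs before writing redirects. It groups the URLs by their first path segment, so you can see which legacy structures the errors come from, and then builds an old-path to new-path map from a prefix table. Everything in it is an assumption for illustration: the example.com URLs, the `/old-products/` to `/products/` mapping, and the function names are made up, and it does not talk to Webmaster Tools or your server at all.

```python
from collections import Counter
from urllib.parse import urlsplit


def group_by_first_segment(urls):
    """Count not-found URLs by the first segment of their path."""
    counts = Counter()
    for url in urls:
        path = urlsplit(url).path or "/"
        segments = [s for s in path.split("/") if s]
        counts[segments[0] if segments else "/"] += 1
    return counts


def build_redirect_map(urls, prefix_map):
    """Map old paths to new paths using the longest matching old prefix.

    prefix_map is a hypothetical {old_prefix: new_prefix} table that you
    would fill in yourself. Returns (redirects, unmatched), where
    unmatched lists the paths that no prefix covered.
    """
    redirects = {}
    unmatched = []
    for url in urls:
        path = urlsplit(url).path or "/"
        candidates = [p for p in prefix_map if path.startswith(p)]
        if not candidates:
            unmatched.append(path)
            continue
        old_prefix = max(candidates, key=len)
        redirects[path] = prefix_map[old_prefix] + path[len(old_prefix):]
    return redirects, unmatched


# Made-up sample data standing in for an exported list of 404 URLs.
not_found = [
    "http://www.example.com/old-products/widget.html",
    "http://www.example.com/old-products/gadget.html",
    "http://www.example.com/blog2009/post-1",
]

groups = group_by_first_segment(not_found)
redirects, unmatched = build_redirect_map(
    not_found, {"/old-products/": "/products/"}
)

for old, new in redirects.items():
    print(f"301: {old} -> {new}")
print("No rule yet:", unmatched)
```

The paths left in `unmatched` are the ones worth a closer look, since those are the URLs that don't fit any structure you recognise. The resulting map would still need to be turned into actual redirect rules for whatever server you run.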