Struggling with Google Bot Blocks - Please help!
-
I own a site called www.wheretobuybeauty.com.au
After months and months we still have a serious issue with all pages having blocked URLs according to Google Webmaster Tools.
The 404 errors are returning a 200 header code, according to the email below. Do you agree that the 404.php code should be changed? Can you do that please?
The current state:
Google Webmaster Tools Index Status shows:
26,000 pages indexed
44,000 pages blocked by robots.
In late March, we implemented changes recommended by an SEO expert: he provided a new robots.txt file and advised that we should amend sitemap.xml, among other changes. We implemented those changes and then set up a re-index of the site by Google. The number of blocked URLs eventually dropped to 1,000 for a few days in May and June – but now the problem has rapidly returned.
A Google search on www.google.com.au for the query ‘site:wheretobuybeauty.com.au’ returns 37,000 pages.
The new site has been re-crawled over the last 4 weeks.
About the site
This is a Linux PHP site and has the following:
55,000 URLs in sitemap.xml submitted successfully to Webmaster Tools
The robots.txt file has been modified several times:
At first we had none.
Then we created one, but were advised that it needed to have this current content:
User-agent: *
Disallow:
-
No problem my friend. You are most welcome. Here at Moz you will not only be able to get almost all of your SEO-related queries addressed and solved, you will also learn a great deal about digital marketing. I highly recommend that every aspiring digital marketer be active on a community like Moz, and I bet they will save a great deal of time and money as well. Wish you all the very best.
Regards,
Devanur Rafi.
-
Thanks Devanur - trying out everything you have suggested.
-
Hi Alex,
Sorry if I was not clear in my previous post. I meant that, in general, pages with cleaner code will have an edge over similar pages with bad code when it comes to SEO.
Just an example: page A has cleaner code compared to page B, with all other SEO factors being equal. In a scenario like this, page B might not be favored by Google because of issues arising from bad code, such as poor page loading performance, poor rendering in browsers, etc.
The issue at hand might not be because your pages do not pass W3C validation, but it's not a bad idea to have cleaner code on your website.
Best regards,
Devanur Rafi.
-
Hi Devanur
My understanding is that Google does not have a problem with invalid XHTML or pages that are not W3C accessible. Please see a comment on this at SEOMOZ:
-
Hi Alex,
I did a code validation check for the following URL:
It gave 238 Errors and 538 Warnings!!
Search engines like Google favor pages with cleaner code, so I strongly recommend having the code cleaned up on the website.
Here you go for the validation check:
Best regards,
Devanur Rafi.
-
Hi Alex,
If the underscores constitute only 4% of the total URLs, then this can be safely set aside for the purposes of the current issue.
The same goes for the keyword repetition in the page titles and URLs. However, if it is possible for you to revisit your URL structure and have it like the following, you should go for it:
www.wheretobuybeauty.com.au/<brand name>/<product name>, e.g.
http://www.wheretobuybeauty.com.au/floris/royal-arms-diamond-edition-eau-de-parfum-spray-100ml-34oz
The same applies to the page titles.
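If you do decide to clean up the underscore URLs, here is a rough PHP sketch (not your actual code; the file it would live in and the hardcoded domain are assumptions) showing how the old underscore URLs could be 301-redirected to their hyphenated equivalents, so that only one form stays in Google's index:

```php
<?php
// Hypothetical snippet for the site's front controller (file name assumed).
// If the requested path still contains underscores, 301-redirect to the
// same path with hyphens so only one URL form remains in the index.
$uri  = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '/';
$path = parse_url($uri, PHP_URL_PATH);

if (is_string($path) && strpos($path, '_') !== false) {
    $clean = str_replace('_', '-', $path);
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.wheretobuybeauty.com.au' . $clean);
    exit;
}
```

The same idea can be expressed as an .htaccess rewrite if your developer prefers to keep it out of PHP.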
Now we are left with two things: page performance and URL canonicalization. Please have them fixed as early as possible.
Also, I checked your IP address and you have gone for shared hosting. This is not at all recommended if you are a serious online business owner. Your IP, 103.9.170.75, is being shared by at least 250 other domains, and they include some bad websites.
Though there are different views about bad IP neighborhoods and their impact on SEO, I have always been an advocate of a clean IP and have always recommended it to my clients. You can go in for a dedicated IP, which is very cheap these days, or better yet a VPS.
For more about this, please check out the "Oops, your IP is either dirty or virtual" section on the following page:
http://www.bruceclay.com/in/seo-tech-tips/techtips.htm
And also, this section, "A Strong Foundation for Your Site to Operate On" on the following page:
http://www.bruceclay.com/blog/2011/04/the-seo-bucket-list-3-things-to-do-before-your-site-dies/
Lastly, I checked your domain's DNS health and here you go for the results:
http://intodns.com/wheretobuybeauty.com.au
Though these might not be causing the current issue, it's good to sort everything out, as we should not leave any stone unturned in making the website a better one.
Best regards,
Devanur Rafi.
-
Hey Devanur
please see our responses below:
Hi Alex,
Thanks for the info. Here are a few issues that I observed with the website, and I am very confident that if you can address and fix these, you should come out of this issue with flying colors:
1. URL canonicalization issue: Both the www and non-www versions of your website URLs return an HTTP header status code 200. You should ideally have all the non-www URLs redirect to their respective www versions via a 301 permanent redirect immediately.
**Response: We are asking the developer to correct this.**
2. Inconsistent URL structure: Your website is still using underscores (_) in the URLs as word separators. There are underscores along with the recommended hyphens (-). This inconsistent usage can sometimes lead to issues, so please replace all the underscores with hyphens.
Response: This problem only occurs in a few pages where special characters have been replaced with underscores – probably in 4% of product pages. I can’t see that this has an impact on SEO?
3. Google PageSpeed check: When I ran the Google PageSpeed test on some of the URLs from your website, along with the ones that you gave, I found the score varying between 28 and 60. Please look at the recommendations that the PageSpeed tool gives and try to address the issues, especially ones like "Reduce blocking resources" (for more: https://developers.google.com/speed/docs/best-practices/rtt#PreferAsyncResources).
I suggest you run the Google PageSpeed check on some of the URLs yourself.
Note: The URLs from your website that are present in Google's index may also show similar issues when run through the PageSpeed test. That is no reason to leave these issues unaddressed.
Response: We will ask the developers to improve performance, focusing on the highest-value items showing up in the Google PageSpeed check.
4. Heavy pages leading to higher page loading times and response times:
Many of the pages that I checked are more than 1.3 MB in size, which is very large. This can be a really big problem: it not only hurts your website from the search engines' perspective but also leads to a bad user experience, which ultimately affects the SEO of your website. You can use tools like gtmetrix.com and fix the issues they report.
Response: We will ask the developers to improve performance, focusing on the highest-value items showing up in the gtmetrix.com suggestions.
5. Repetition of keywords or phrases in page titles and URLs:
This issue might look like an over-optimization effort and should be fixed as early as possible.
For example: www.wheretobuybeauty.com.au/acqua-di-parma/acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz
If you look at the above page, the phrase 'acqua-di-parma' is present twice in both the URL and the page title. This is something that you need to review seriously, as it looks like keyword repetition, which is not good from an SEO standpoint.
Response: This occurs with approximately 300 product pages out of 40,000, so a very small percentage. We will clean this up when we update our data. I can’t see that this has any impact on SEO considering the small number? Note, however, that every product page is constructed as follows:
http://www.wheretobuybeauty.com.au/floris/floris-royal-arms-diamond-edition-eau-de-parfum-spray-100ml_34oz
Is there some risk that this will look like over optimisation?
By the way, your robots.txt file is clean and it should not be causing these issues.
Please have the issues mentioned above fixed as soon as possible and you should be out of the woods soon after that.
I wish you good luck, Alex.
Best regards,
Devanur Rafi.
-
Hi Alex,
Thanks for the info. Here are a few issues that I observed with the website, and I am very confident that if you can address and fix these, you should come out of this issue with flying colors:
1. URL canonicalization issue: Both the www and non-www versions of your website URLs return an HTTP header status code 200. You should ideally have all the non-www URLs redirect to their respective www versions via a 301 permanent redirect immediately (see the sketch after this list).
2. Inconsistent URL structure: Your website is still using underscores (_) in the URLs as word separators. There are underscores along with the recommended hyphens (-). This inconsistent usage can sometimes lead to issues, so please replace all the underscores with hyphens.
3. Google PageSpeed check: When I ran the Google PageSpeed test on some of the URLs from your website, along with the ones that you gave, I found the score varying between 28 and 60. Please look at the recommendations that the PageSpeed tool gives and try to address the issues, especially ones like "Reduce blocking resources" (for more: https://developers.google.com/speed/docs/best-practices/rtt#PreferAsyncResources).
I suggest you run the Google PageSpeed check on some of the URLs yourself.
Note: The URLs from your website that are present in Google's index may also show similar issues when run through the PageSpeed test. That is no reason to leave these issues unaddressed.
4. Heavy pages leading to higher page loading times and response times:
Many of the pages that I checked are more than 1.3 MB in size, which is very large. This can be a really big problem: it not only hurts your website from the search engines' perspective but also leads to a bad user experience, which ultimately affects the SEO of your website. You can use tools like gtmetrix.com and fix the issues they report.
5. Repetition of keywords or phrases in page titles and URLs:
This issue might look like an over-optimization effort and should be fixed as early as possible.
For example: www.wheretobuybeauty.com.au/acqua-di-parma/acqua-di-parma-acqua-di-parma-collezione-barbiere-shaving-cream-75ml_25oz
It could have been like: www.wheretobuybeauty.com.au/acqua-di-parma/collezione-barbiere-shaving-cream-75ml-25oz
If you look at the above page, the phrase 'acqua-di-parma' is present twice in both the URL and the page title. This is something that you need to review seriously, as it looks like keyword repetition, which is not good from an SEO standpoint.
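For point 1, here is a minimal sketch of the 301 canonicalization redirect done in PHP (assuming every request passes through a single front controller; your developer may prefer to do the same thing with an .htaccess rewrite instead):

```php
<?php
// Minimal sketch: force the www hostname with a 301 permanent redirect.
// Must run early, before any output is sent to the browser.
$host = isset($_SERVER['HTTP_HOST']) ? strtolower($_SERVER['HTTP_HOST']) : '';
$uri  = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '/';

if ($host === 'wheretobuybeauty.com.au') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.wheretobuybeauty.com.au' . $uri);
    exit;
}
```

Once it is live, requesting a non-www URL with curl -I should show the 301 status and a Location header pointing at the www version.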
By the way, your robots.txt file is clean and it should not be causing these issues.
Please have the issues mentioned above fixed as soon as possible and you should be out of the woods soon after that.
I wish you good luck, Alex.
Best regards,
Devanur Rafi.
-
Thanks Devanur
I put this to my partner and he said he is addressing it, but the main issue still remains.
This is the critical issue: only a few pages are visible to Google search, as almost all are blocked for the Google bot. I am re-stating the problem in this email for you.
Can you please take a look at the whole problem and see if you can spot what is causing this?
Is robots.txt causing this? It is the only change that we have made, and at one point the problem was corrected but it has now returned. I have read everything that I can about robots.txt on the Google site and in forums.
Here are two examples (out of 44,000) that are blocked. It is easy to find other examples – simply test any of the product pages – only 200 out of 44,000 return any result.
Try searching using www.google.com.au and using the search query
Abercrombie & Fitch 1892 Cobalt Eau De Cologne Spray 50ml/1.7oz site:wheretobuybeauty.com.au
Second example:
Try searching using:
Acqua Di Parma Collezione Barbiere Shaving Cream 75ml/2.5oz site:wheretobuybeauty.com.au
The current state:
Google Webmaster Tools Index Status shows:
26,000 pages indexed
44,000 pages blocked by robots.
In late March, we implemented changes recommended by an SEO expert, Harmeen: he provided a new robots.txt file and advised that we should amend sitemap.xml, among other changes. We implemented those changes and then set up a re-index of the site by Google. The number of blocked URLs eventually dropped to 1,000 for a few days in May and June – but now the problem has rapidly returned.
The new site has been re-crawled over the last 4 weeks.
About the site
55,000 URLs in sitemap.xml submitted successfully to Webmaster Tools
The robots.txt file has been modified several times:
At first we had none; then we created one, but were advised that it needed to have this current content:
“User-agent: *
Disallow:
Sitemap: http://www.wheretobuybeauty.com.au/sitemap.xml”
I put this into robots.txt but was then advised yesterday that there should be no blank lines between these lines, so I removed them.
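In case it helps, here is a rough PHP spot-check (the sample URLs are only placeholders) that our developer could run to print exactly which HTTP status line each URL returns, which should show immediately whether pages are coming back as 200, 301 or 404:

```php
<?php
// Rough diagnostic sketch: print the HTTP status line for a few sample URLs.
// The URLs below are placeholders - swap in real product pages to test.
$urls = array(
    'http://www.wheretobuybeauty.com.au/',
    'http://wheretobuybeauty.com.au/',                 // should 301 to the www version
    'http://www.wheretobuybeauty.com.au/no-such-page', // should return 404, not 200
);

foreach ($urls as $url) {
    $headers = @get_headers($url);           // first element is the status line
    $status  = $headers ? $headers[0] : 'request failed';
    echo $url . ' => ' . $status . PHP_EOL;
}
```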
-
Hi Alex,
Without diving into the increased number of 404 errors being reported in your Webmaster Tools account, let us first look at the core issue: pages for non-existing resources (404 pages) that return an HTTP header status code 200. These are called 'soft 404 errors'. Ideally, all non-existing resources on the website should return an HTTP header status code 404 (or 410, as the situation requires) and not a 200, which is very confusing for search engines and a bad practice. This should be fixed immediately. Please have all such pages return a 404 and not a 200 as soon as possible.
Here you go for more about the soft 404 errors:
https://support.google.com/webmasters/answer/181708?hl=en
and here for more about when to return a 404 status code:
https://support.google.com/webmasters/answer/2409439?hl=en
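In practical terms, if the error template is a plain 404.php script (an assumption about how the site is built), the fix is usually a header() call at the very top of that script, before any HTML is output. A minimal sketch:

```php
<?php
// Sketch of the top of a 404 error template (e.g. 404.php - name assumed).
// Send a real 404 status before any HTML so search engines stop treating
// missing pages as live (soft 404) pages.
header('HTTP/1.1 404 Not Found');
?>
<!DOCTYPE html>
<html>
<head><title>Page not found</title></head>
<body>
  <h1>Sorry, we could not find that page.</h1>
</body>
</html>
```

You can then spot-check any deleted product URL with curl -I to confirm it now returns a 404 instead of a 200.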
Best regards,
Devanur Rafi.