Crawl and Indexation Error - Googlebot can't/doesn't access specific folders on microsites
-
Hi,
My first time posting here, I am just looking for some feedback on a indexation issue we have with a client and any feedback on possible next steps or items I may have overlooked.
To give some background, our client operates a website for the core band and a also a number of microsites based on specific business units, so you have corewebsite.com along with bu1.corewebsite.com, bu2.corewebsite.com.
The content structure isn't ideal, as each microsite follows a structure of bu1.corewebsite.com/bu1/home.aspx, bu2.corewebsite.com/bu2/home.aspx and so on.
In addition to this each microsite has duplicate folders from the other microsites so bu1.corewebsite.com has indexable folders bu1.corewebsite.com/bu1/home.aspx but also bu1.corewebsite.com/bu2/home.aspx the same with bu2.corewebsite.com has bu2.corewebsite.com/bu2/home.aspx but also bu2.corewebsite.com/bu1/home.aspx. Therre are 5 different business units so you have this duplicate content scenario for all microsites.
This situation is being addressed in the medium term development roadmap and will be rectified in the next iteration of the site but that is still a ways out.
The issue
About 6 weeks ago we noticed a drop off in search rankings for two of our microsites (bu1.corewebsite.com and bu2.corewebsite.com) over a period of 2-3 weeks pretty much all our terms dropped out of the rankings and search visibility dropped to essentially 0.I can see that pages from the websites are still indexed but oddly it is the duplicate content pages so (bu1.corewebsite.com/bu3/home.aspx or (bu1.corewebsite.com/bu4/home.aspx is still indexed, similiarly on the bu2.corewebsite microsite bu2.corewebsite.com/bu3/home.aspx and bu4.corewebsite.com/bu3/home.aspx are indexed but no pages from the BU1 or BU2 content directories seem to be indexed under their own microsites.
Logging into webmaster tools I can see there is a "Google couldn't crawl your site because we were unable to access your site's robots.txt file." This was a bit odd as there was no robots.txt in the root directory but I got some weird results when I checked the BU1/BU2 microsites in technicalseo.com robots text tool.
Also due to the fact that there is a redirect from bu1.corewebsite.com/ to bu1.corewebsite.com/bu4.aspx I thought maybe there could be something there so consequently we removed the redirect and added a basic robots to the root directory for both microsites.
After this we saw a small pickup in site visibility, a few terms pop into our Moz campaign rankings but drop out again pretty quickly. Also the error message in GSC persisted.
Steps taken so far after that
- In Google Search Console, I confirmed there are no manual actions against the microsites.
- Confirmed there is no instances of noindex on any of the pages for BU1/BU2
- A number of the main links from the root domain to microsite BU1/BU2 have a rel="noopener noreferrer" attribute but we looked into this and found it has no impact on indexation
- Looking into this issue we saw some people had similar issues when using Cloudflare but our client doesn't use this service
- Using a response redirect header tool checker, we noticed a timeout when trying to mimic googlebot accessing the site
- Following on from point 5 we got a hold of a week of server logs from the client and I can see Googlebot successfully pinging the site and not getting 500 response codes from the server...but couldn't see any instance of it trying to index microsite BU1/BU2 content
So it seems to me that the issue could be something server side but I'm at a bit of a loss of next steps to take.
Any advice at all is much appreciated!
-
Hello ImpericMedia,
If you can share the site with me (private message is OK) I'll look into it. If you don't want to do that, here are some things I would look at:
1. If you have verified that the Robots.txt file is not blocking the pages you want indexed, and the pages are still not indexed (or indexed with a message about the Robots.txt file) you should check for a Robots Noindex meta tag on the page. If the source code looks strange you may have to use the Chrome Inspect tool to see the fully rendered page.
2. If there are no blocking robots meta tags on the page you should check the HTTP response for an X-Robots header.
3. If there is no X-Robots header, it's probably because of the duplicate content and spammy(seeming) subdomain setup.
Sorry about the wait. If you include the site URL it will get other community member's curious enough to check it out next time.
I hope this helps. If not, feel free to message me.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
International SEO and Indexing in Country Specific Search Engines
Hey everyone! My company has recently migrated to a new domain (www.napoleon.com) which includes migrating many TLD separate domains to the new. We have structured the website to have multi-language and regions, including regional content, product offerings etc. Our structure is as follows just to give an example. napoleon.com/en/ca/
Intermediate & Advanced SEO | | Napoleon.com
napoleon.com/fr/ca/
napoleon.com/en/us/
napoleon.com/de/de Currently, specifically the homepage version of the USA website is indexing into Canadian Search Engines, and I can't figure out why. It has been roughly 6 weeks since launch. Any thoughts on this? Thank you Dustin0 -
Readd/Reindex a page that was 410'd
A script of ours had an error that caused some pages we didn't wish 410'd to be 410'd, we caught it in about 12 hours but for some pages it was too late. My question is, will those pages be reindexed again and how will that affect their page ranking will they eventually be back where they were? Would submitting a site map with them help, or what would be the best way to correct this error (submit the links to google indexer maybe?).
Intermediate & Advanced SEO | | Wana-Ryd0 -
Google Search Console Crawl Errors?
We are using Google Search Console to monitor Crawl Errors. It seems Google is listing errors that are not actual errors. For instance, it shows this as "Not found": https://tapgoods.com/products/tapgoods__8_ft_plastic_tables_11_available So the page does not exist, but we cannot find any pages linking to it. It has a tab that shows Linked From, but if I look at the source of those pages, the link is not there. In this case, it is showing the front page (listed twice, both for http and https). Also, one of the pages it shows as linking to the non-existant page above is a non-existant page. We marked all the errors as fixed last week and then this week they came up again. 2/3 are the same pages we marked as fixed last week. Is this an issue with Google Search Console? Are we getting penalized for a non existant issue?
Intermediate & Advanced SEO | | TapGoods0 -
Visibility for https://goo.gl/gJH7eh
Hi Mozzers, I am wondering if anyone can help me with the following. At the start of May this year we really lost visibility for the homepage of this site https://goo.gl/gJH7eh. This was particularly noticeable by tracking rankings for the term 'oak furniture'. We previously ranked on page 1 for the term 'oak furniture', but since May the homepage has struggled to make the top 100 positions for this term. We're confident that we have done everything within Google's guidelines, but it seems something is really holding the homepage back. The site ranks on page 1 for 'oak furniture' on Bing. The site had previously had a manual penalty for unnatural links (warning received several years ago). These links had a particular emphasis on using the anchor text 'oak furniture'. When we took over the site we did an extensive link clean up and disavow and managed to get the penalty removed at the end of October 2013. Any help would be greatly appreciated. Karen
Intermediate & Advanced SEO | | OFS0 -
Using a 302 re-direct from http://www to https://www to secure customer data
My website sends Customers from a http://www.mysite.com/features page to a https://www.mysite.com/register page which is an account sign-up form using a 302 re-direct. Any page that collects customer data has an authenticated SSL certificate to protect any data on the site. Is this 302 the most appropriate way of doing this as the weekly crawl picks it up as being bad practise? Is there a better alternative?
Intermediate & Advanced SEO | | Ubique0 -
Re-Direct Users But Don't Affect Googlebot
This is a fairly technical question... I have a site which has 4 subdomains, all targeting a specific language. The brand owners don't want German users to see the prices on the French sub domain and are forcing users into a re-direct to the relevant subddomain, based on their IP address. If a user comes from a different country, (ie the US) they are forced on the UK sub domain. The client is insistent on keeping control of who sees what (I know that's a debate in it's own right), but these re-directs we're implementing to make that happen, are really making it difficult to get all the subdomains indexed as I think googlebot is also getting re-directed and is failing to do it's job. Is there are a way of re-directing users, but not Googlebot?
Intermediate & Advanced SEO | | eventurerob0 -
Sitemap - % of URL's in Google Index?
What is the average % of links from a sitemap that are included in the Google index? Obviously want to aim for 100% of the sitemap urls to be indexed, is this realistic?
Intermediate & Advanced SEO | | stats440 -
What are the different tactics for getting ranked/ included in Google finance searches such as http://www.google.com/finance/company_news?q=NASDAQ:ADBE
I don't know what ranking factors they are using for this feed. The results vary greatly from a search done at google.com or google.com/news and google.com/finance I'm working with a website that regularly publishes finance-related news and currently gets traffic from google finance. I'm wondering what we can do to optimize our news articles to possibly show more prominently or more often. Thanks
Intermediate & Advanced SEO | | joemascaro0