Google tries to index nonexistent language URLs. Why?

TheHecksler

Hi,

I am working for a SaaS client who runs two language versions on two separate subdomains:
de.domain.com/company for German and en.domain.com for English. Many thousands of URLs have been indexed correctly.

However, Google Search Console reports that Google is trying to index URLs that have never existed and still don't exist, such as:

de.domain.com/en/company
en.domain.com/de/company

...and thousands more with /en/ or /de/ inserted into the path. We never use this pattern, and requesting these URLs correctly shows a 404 page (though with the wrong response code; we're fixing that). Yet Google keeps trying to index these URLs again and again, and I couldn't find any source for them: no website links to them, etc.
We do see in our log files that a Screaming Frog installation and moz.com with Open Site Explorer tried to access these URLs earlier.
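To trace which clients requested the phantom URLs first, and to confirm the response-code fix once it ships, the access logs can be filtered for paths beginning with /en/ or /de/. Below is a minimal Python sketch, assuming the Apache/Nginx "combined" log format; the function names are hypothetical. It counts phantom requests per user agent and status code and reports when each agent first requested one. If any requester sent a Referer header, that field may also point at the page containing the link.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes the Apache/Nginx "combined" log format:
# ip ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Path prefixes the site never uses (from the thread: /en/ and /de/).
PHANTOM_PREFIXES = ("/en/", "/de/")


def phantom_hits(lines):
    """Yield (time, path, status, referer, agent) for each phantom-URL request."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path").startswith(PHANTOM_PREFIXES):
            yield (m.group("time"), m.group("path"), int(m.group("status")),
                   m.group("referer"), m.group("agent"))


def summarize(lines):
    """Count phantom-URL requests per (user agent, status code)."""
    return Counter((agent, status) for _, _, status, _, agent in phantom_hits(lines))


def first_seen(lines):
    """Return the earliest phantom-URL request time for each user agent."""
    earliest = {}
    for stamp, _, _, _, agent in phantom_hits(lines):
        t = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        if agent not in earliest or t < earliest[agent]:
            earliest[agent] = t
    return earliest
```

Each function consumes its input once, so open the log file separately for each call, e.g. `summarize(open("access.log"))` (the file name is a placeholder). Once the fix is live, phantom hits should show status 404 in the summary.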

My question: how does Google come up with these URLs? Where did it get URLs that, to our knowledge, never existed?

Any ideas? Thanks

NickSamuel

Hi Hecksler,

Did you ever resolve this?

One quick idea from me: double-check ALL versions of your website in Google Search Console. You can now register the entire domain as a single property using DNS verification: https://searchengineland.com/how-to-set-up-google-search-console-domain-verification-for-site-wide-reporting-data-313256

For one of my sites, I found that Google was trying to crawl a very old HTTP sitemap from about five years ago, so I was able to delete it.
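The same kind of check can be run against any old sitemap Google may still be fetching. A minimal Python sketch, assuming the sitemap XML (either a `urlset` or a `sitemapindex` in the standard sitemaps.org namespace) has already been downloaded as text; `phantom_urls_in_sitemap` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Standard sitemap namespace (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Path prefixes the site never uses (from the thread: /en/ and /de/).
PHANTOM_PREFIXES = ("/en/", "/de/")


def phantom_urls_in_sitemap(xml_text):
    """Return every <loc> URL in the sitemap whose path starts with /en/ or /de/."""
    root = ET.fromstring(xml_text)
    # ".//sm:loc" matches <loc> in both <urlset> and <sitemapindex> files.
    urls = [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]
    return [u for u in urls if urlsplit(u).path.startswith(PHANTOM_PREFIXES)]
```

A non-empty result would suggest that sitemap as a source worth removing or correcting.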

Opinions within the search community are mixed on whether Googlebot really "guesses" URLs, so it's more likely that Google is getting the links from somewhere: https://stackoverflow.com/questions/20855082/googlebot-guesses-urls-how-to-avoid-handle-this-crawling
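One way such links could come from the site itself (an assumption to verify, not something established in this thread) is a relative link or hreflang annotation: on de.domain.com/company, an href of `en/company` or `/en/company` resolves to de.domain.com/en/company, which matches the reported pattern exactly. The sketch below collects `href` values from `<a>` and `<link>` tags in a page's HTML and resolves them the way a crawler would, using Python's standard `html.parser` and `urllib.parse`; the helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Path prefixes the site never uses (from the thread: /en/ and /de/).
PHANTOM_PREFIXES = ("/en/", "/de/")


class HrefCollector(HTMLParser):
    """Collect href values from <a> and <link> tags (including hreflang links)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def phantom_links_in_html(page_url, html):
    """Resolve each href against page_url and return those with a phantom prefix."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    resolved = (urljoin(page_url, href) for href in collector.hrefs)
    return [u for u in resolved if urlsplit(u).path.startswith(PHANTOM_PREFIXES)]
```

This only sees links present in the raw HTML; links injected by JavaScript would need the rendered DOM instead.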

Looking forward to hearing from you,

Nick
