After a hack and remediation, thousands of URLs still appear as 'Valid' in Google Search Console. How can I fix this?
-
I'm working on a site that was hacked in March 2019; in the process, nearly 900,000 spam links were generated and indexed. After the hack was remediated in April 2019, the spammy URLs began dropping out of the index. Last week, however, Search Console showed around 8,000 URLs as "Indexed, not submitted in sitemap" yet listed as "Valid" in the Coverage report. Many of them are still hack-related URLs, recorded as indexed in March 2019, even though clicking on them leads to a 404. As of this Saturday the number jumped to 18,000, but I have no way of finding out from the Search Console reports why the jump happened or which new URLs were added; the only sort mechanism is "last crawled," and they don't show up there.
How long can I expect it to take for these remaining URLs to be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) hasn't helped.
Is there any way to see in the new GSC view why or how the number of valid URLs in the index doubled over one weekend?
-
Google Search Console does have a URL removal tool built in. Unfortunately it's not really scalable (mostly one-at-a-time submissions), and on top of that the effect of using the tool is only temporary (the URLs come back again).
In your case I reckon changing the status code of the 'gone' URLs from 404 ("not found" — which Google may treat as possibly temporary and keep re-checking) to 410 ("gone permanently") might be a good idea. Google tends to digest that better, as it's a harder de-indexing signal and a very strong crawl directive ("go away, don't come back!").
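If the spam URLs share a recognizable pattern, the server can answer all of them with 410 in one rule. A minimal Apache sketch (the `/cheap-pills/` path is a hypothetical stand-in — substitute whatever pattern the hack actually generated):

```apache
# .htaccess or vhost config (mod_alias)
# Return 410 Gone for everything under the hypothetical spam directory
RedirectMatch gone "^/cheap-pills/.*$"
```

On nginx, a `location` block with `return 410;` does the same job.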
You could also serve a meta noindex directive on those URLs. Obviously you're unlikely to have access to the HTML of non-existent pages, but did you know noindex can also be fired through the X-Robots-Tag HTTP header? So it's not impossible:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
(Ctrl+F for "X-Robots-Tag HTTP header")
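If the site routes misses through an application layer, the two ideas can be combined: answer the known spam paths with a 410 status plus an `X-Robots-Tag: noindex` header, and pass everything else through untouched. A minimal WSGI sketch, assuming the spam URLs match a recognizable pattern (the pattern below is hypothetical):

```python
import re

# Hypothetical pattern for the hacked URLs -- adjust to the real spam paths.
SPAM_PATTERN = re.compile(r"^/cheap-pills/")

def gone_middleware(app):
    """Wrap a WSGI app: answer matching spam URLs with 410 Gone plus an
    X-Robots-Tag noindex header; pass all other requests through."""
    def wrapped(environ, start_response):
        if SPAM_PATTERN.match(environ.get("PATH_INFO", "")):
            start_response(
                "410 Gone",
                [("Content-Type", "text/plain"),
                 ("X-Robots-Tag", "noindex")],
            )
            return [b"Gone"]
        return app(environ, start_response)
    return wrapped
```

Any equivalent hook (Django middleware, an nginx `location` with `return 410;` and `add_header`, etc.) achieves the same thing; the parts that matter to Google are the status line and the header.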
Another option is this form to let Google know outdated content has been removed and isn't coming back:
https://www.google.com/webmasters/tools/removals
... but again, one URL at a time is going to be mega-slow. It does work pretty well, though (at least in my experience).
In any eventuality, I think you're looking at a week or two for Google to start noticing in a way you can see visually, and then maybe a month or two until it rights itself (caveat: it's different for every site and URL, so it's variable).