GSC is reporting a lot of chopped URLs
-
In the last two weeks I started seeing a lot of odd 404 errors in GSC for my site. On investigation, the URLs belong to fairly new articles and are chopped in various places, anywhere from one character to about 10 characters missing from the end of the URL. (A similar, older issue: GSC reports duplicate content on odd subdomains we've never used, like 'smtp' and 'ww1', or even random ones like 'bobo'.)
GSC doesn't report any 'linked from' for those odd URLs, and I know for sure these links aren't on the site itself. They're definitely not errors in the CMS.
The site is long established (started 1997-1998) and we've been subject to a lot of negative SEO. I recently had to disavow about 1,000 .ru domains linking to us, some of them with over a million links each.
Could these chopped links be a new tactic of negative SEO? How do I find these seemingly intentionally broken links to us?
-
Thanks for the question. It isn't uncommon to see strange 404 errors in Search Console with little or inaccurate information attached. Google is working hard to improve this, but I wouldn't treat everything you see there as set in stone.
This doesn't sound like a negative SEO tactic. I would just mark them all as fixed and see if they appear again in about a week. If they do, I'd make sure they are actually served with a 4xx status and not worry too much about it. If you want to do more digging...
Some places you could look further:
- Logs, logs, logs. This will be the ultimate truth: you will be able to see whether or not Googlebot is actually hitting those URLs.
- It could be a plugin of yours doing something odd and generating those URLs (particularly on WordPress).
- Perhaps you have a filtering system set up that generates these URLs?
- If you have a search function on the site, sometimes weird URLs can be generated through that.
- Do the URLs come up when you crawl the site at all?
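To make the log check concrete, here is a minimal sketch, assuming your server writes the common Apache/Nginx "combined" log format and that you can export the list of valid article paths (for example from your sitemap or CMS). The function name `find_truncated_hits` and the 15-character cut-off are illustrative choices, not a standard tool. It flags requests for paths that are not real pages but are a strict prefix of one, which is exactly the "chopped" pattern described above, and reports the status code you actually served plus any referer. The user-agent string can be spoofed, so treat the Googlebot filter as a first pass only; Google documents verifying Googlebot with a reverse DNS lookup.

```python
import re

# Assumed Apache/Nginx "combined" log format:
# IP ident user [time] "METHOD PATH PROTO" STATUS BYTES "REFERER" "USER-AGENT"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def find_truncated_hits(log_lines, known_paths, max_missing=15, googlebot_only=True):
    """Return (requested_path, status, full_path, referer) tuples for requests
    whose path is not a real page but is a strict prefix of one, missing at
    most `max_missing` trailing characters (i.e. a "chopped" URL)."""
    known = set(known_paths)
    ordered = sorted(known)
    hits = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not in combined format; skip rather than guess
        # Substring check on the user-agent only; it can be spoofed.
        if googlebot_only and "Googlebot" not in m.group("agent"):
            continue
        path = m.group("path")
        if path in known:
            continue  # a real page, not a chopped one
        for full in ordered:
            if full.startswith(path) and len(full) - len(path) <= max_missing:
                hits.append((path, int(m.group("status")), full, m.group("referer")))
                break  # report only the first (alphabetical) matching page
    return hits
```

To use it, pass an open log file (iterating a file yields its lines) and your list of article paths. Anything in the results that did not come back as a 4xx is worth a closer look. Googlebot's own requests usually carry no referer, so running it again with `googlebot_only=False` can surface ordinary visitors whose referer points at the page hosting a chopped link.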
Just a few ideas!
Related Questions
-
URL indexed but not submitted in sitemap, however the URL is in the sitemap
Dear Community, I have the following problem, and it would be super helpful if you could help.
Symptoms: In Search Console, Google says that some of our old URLs are indexed but not submitted in the sitemap. However, those URLs are in the sitemap, and the sitemap has been successfully submitted with no error message.
Potential explanation: We have an automatic cache-clearing process within the company that runs once a day, and we use it as the last-modification date in the sitemap. Imagine the URL www.example.com/hello was last modified in 2017. Because the cache is cleared daily, the sitemap will show last modified: yesterday, even though the content of the page has not changed since 2017. We also have a Z after the sitemap time; could it be that the bot does not understand the time format? Finally, the sitemap contains only http URLs, and our HTTPS URLs are not in it. What do you think? Cheers
Intermediate & Advanced SEO | | ZozoMe0 -
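On the two technical points in that question: the sitemaps.org protocol specifies W3C Datetime for `<lastmod>`, and a trailing `Z` (UTC) is a valid form of it, so the `Z` on its own should not be a parsing problem. The http-only `<loc>` entries versus HTTPS indexed URLs are the other mismatch the asker already spotted. A minimal sketch, assuming you can pull each page's real content-modification date from the CMS instead of the cache-clear time; `build_sitemap` is a hypothetical helper, not part of any CMS:

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: iterable of (https_url, content_modified) pairs, where
    content_modified is a timezone-aware datetime of the last real edit."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # list the canonical HTTPS URL
        # W3C Datetime in UTC; the trailing "Z" is the standard UTC designator.
        stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        ET.SubElement(url, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="unicode")
```

The design choice here is simply that `lastmod` comes from the content's own edit history, so a page untouched since 2017 keeps a 2017 date no matter how often the cache is cleared.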
Long product urls ecommerce store
Hi, we have a site in the men's fashion space with long product URLs like this: https://www.domain.com/catalog/product/view/id/13700/s/the-mate-tee-grey-marle-upm618g/category/120/ The site is on Magento. Are there any serious SEO negatives to having such a long product URL that includes irrelevant information like product/view/id/13700/s/ and /category/120/? Or are the benefits of changing them to friendlier product URLs like https://www.domain.com/the-mate-tee-grey-marle-upm/ minimal? Cheers.
Intermediate & Advanced SEO | | wozniak650 -
Google ranking 301 redirected vanity urls
We use vanity URLs for offline marketing. An example vanity URL would be www.clientsite.com/promotion, this URL 301 redirects to a page on the site with tracking parameter ex: www.clientsite.com/mainpage?utm_source=source&utm_medium=print&utm_campaign=xyz. We are running into issues with Google ignoring the 301 redirect and ranking these vanity URLs instead of the actual page on the website. Any suggestions on how to resolve?
Intermediate & Advanced SEO | | digitalhound0 -
Many New Urls at once
Hi, I have about 5,000 new URLs to publish. For SEO/Google, should I publish them gradually, or is all at once fine? *By the way, all these URLs were already indexed in the past but were then redirected. Cheers,
Intermediate & Advanced SEO | | viatrading10 -
What would cause these ⠃︲蝞韤諫䴴SPপ� emblems in my urls?
In Search Console I am getting errors under "Other". They show URLs in this format: https://www.site.com/Item/654321~SURE⠃︲蝞韤諫䴴SPপ�.htm When clicked, it shows 蝞韤諫䴴SPপ� instead of the % stuff. As you can see, this is an item page, and the normal item page pulls up fine with no issues. Search Console doesn't show it as linked from anywhere. Why would Google pull this URL? It doesn't exist anywhere on the site. It is a custom ASP.NET site. This started happening in mid-May, but we didn't make any changes then.
Intermediate & Advanced SEO | | EcommerceSite0 -
URL Injection Hack - What to do with spammy URLs that keep appearing in Google's index?
A website was hacked (URL injection), but the malicious code has been cleaned up and removed from all pages. However, whenever we run a site:domain.com search in Google, we keep finding more spammy URLs from the hack. They all lead to a 404 error page, since the hack was cleaned up in the code. We have been using the Google WMT Remove URLs tool to have these spammy URLs removed from Google's index, but new URLs keep appearing every day. We looked at the cache dates on these URLs; they vary, but none are recent, and most are from a month ago when the initial hack occurred. My question is: should we continue to check the index every day and keep submitting these URLs for removal manually? Or, since they all lead to a 404 page, will Google eventually remove these spammy URLs from the index automatically? Thanks in advance, Moz community, for your feedback.
Intermediate & Advanced SEO | | peteboyd0 -
One site two languages - what to do with urls?
Hi, we are working with a client who has a Spanish site in both English and Spanish. What is the best URL structure to go for: www.domain.es and en.domain.es, www.domain.es and www.domain.es/en, or none of the above?
Intermediate & Advanced SEO | | J_Sinclair0 -
Expired urls
For a large jobs site, what would be the best way to handle job adverts that are no longer available? Ideas that I have include:
- Keep the URL live with the original content and display current similar job vacancies below. This has the advantage of continually growing the number of indexed pages.
- 301 redirect old pages to parent categories. This has the advantage of concentrating any acquired link juice where it is most needed.
Your thoughts are much appreciated.
Intermediate & Advanced SEO | | cottamg0