Googlebot and other spiders are searching for odd links in our website trying to understand why, and what to do about it.
-
I recently began work on an existing Wordpress website that was revamped about 3 months ago. https://thedoctorwithin.com. I'm a bit new to Wordpress, so I thought I should reach out to some of the experts in the community.Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor the database, as near as I can tell. When checking the source of these irrelevant links, I notice they’re all generated from various pages in the site, as well as non-existing pages, allegedly in the site, even though these pages have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even linked from the sitemap. BTW - all the URLs in the sitemap are valid URLs.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google, (and other engines,) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders, as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/. I suppose I could do the same, in regards to other legitimate spiders / search engines. Perhaps it would be better to use Mod Rewrite to lead spiders to pages that actually do exist.
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories aren't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets, or whatnot'. Guess it's an entirely new topic of discussion... if limiting the spurious spider searching, (from good spiders,) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vjay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP super cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to:
-
provide access via directories that don't really exist:
-
categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, will be try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
-
I have the same issue, I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress and it looks like that there might be spurious entries in the DB, which might also not be getting deleted due to the WP super cache plugin. You may try to empty your cache and install 'all 404 redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Reverify website
Hello, I want to disavow some really dodgy links and although the website is verifed since we see a meta tag in the source code, we no longer have the login details for the email address used to verify it so can't access the tools console. So how do I upload a disavow file (the links i got were from Ahref and Moz analysis)? Should I reverify the website using a new email address and upload a new meta tag? Is there a problem with doing this? Sorry if this is obvious!
Technical SEO | | AL123al0 -
Sitelinks only show when the URL is searched- Why don't they show when our company name is searched?
Why is is that when I search "protonmail.ch", sitelinks show for our company. However when you search for "ProtonMail", no sitelinks show, even though our homepage is now on the top result. We've been trying different things to improve the navigational structure of the homepage, such as using the <nav>tag. If you have any thoughts on why sitelinks might not be showing up, we'd really appreciate it! Thank you </nav>
Technical SEO | | kevinzh0 -
Google Site Search
I'm considering to implement google site search bar into my site.
Technical SEO | | JonsonSwartz
I think I probably choose the version without the ads (I'll pay for it). does anyone use Google Site Search and can tell if it's a good thing? does it affects in any way on seo? thank you0 -
Link Anchor Text
When we have run a Open Site Explorer analysis on our own site, it says that for all our internal links the Link Anchor Text is 'Help with logging in' I am a bit confused as to why it shows that. That text does appear in the header of the page, but is not the first piece of text. Why is it happening on our site?
Technical SEO | | MattAshby
Why do I not see this on other sites?
What affect does this have on our ranking?
What's the best fix? Example page that we ran on Open Site Explorer: www.rightboat.com/search?manufacturer=Beneteau&model=Antares+9.800 -
Better to Remove Toxic/Low Quality Links Before Building New High Quality Links?
Recently an SEO audit from a reputable SEO firm identified almost 50% of the incoming links to my site as toxic, 40% suspicious and 5% of good quality. The SEO firm believes it imperative to remove links from the toxic domains. Should I remove toxic links before building new one? Or should we first work on building new links before removing the toxic ones? My site only has 442 subdomains with links pointing to it. I am concerned that there may be a drop in ranking if links from the toxic domains are removed before new quality ones are in place. For a bit of background my site has a MOZ Domain authority of 27, a Moz page authority of 38. It receives about 4,000 unique visitors per month through organic search. About 150 subdomains that link to my site have a Majestic SEO citation flow of zero and a Majestic SEO trust flow of zero. They are pretty low quality. However I don't know if I am better off removing them first or building new quality links before I disavow more than a third of the links to the site. Any ideas? Thanks,
Technical SEO | | Kingalan1
Alan0 -
How do I know which page a link is from
I've got an interesting situation. I hope you can help. I have a list of links but I'm not sure which pages of my site they are from. How do I know which page a specific link is from? Thanks in advance.
Technical SEO | | VinceWicks0 -
OSE Link Differential
I have the chrome toolbar installed. In the SERP a site I was looking at had 686 links from 12 domains linking to the root domain. When I checked this site in OSE with filters set to all pages in root domain it shows 65 links from 12 domains. Can anyone explain the difference?
Technical SEO | | waynekolenchuk0