Googlebot and other spiders are requesting odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor in the database, as near as I can tell. When checking the reported source of these irrelevant links, I notice they’re all attributed to various pages in the site, as well as to pages that are allegedly in the site but have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even reported as linked from the sitemap, although all the URLs in the sitemap are valid.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
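For anyone wanting to sanity-check a block like this before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The two-line rule set is an assumption about what such a robots.txt would contain (the actual file isn't reproduced in this post), and the "allowed" URL is only an illustrative flat page.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: the real robots.txt is not shown in this thread.
rules = """\
User-agent: *
Disallow: /search/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A spurious deep /search/ URL of the kind described above.
blocked = "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/"
# An illustrative flat URL outside /search/.
allowed = "https://thedoctorwithin.com/feedback-and-testimonials/"

search_is_crawlable = parser.can_fetch("Googlebot", blocked)
page_is_crawlable = parser.can_fetch("Googlebot", allowed)
print(search_is_crawlable, page_is_crawlable)
```

Note that this only tests how the rules are interpreted by the standard-library parser; individual crawlers may read edge cases differently.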
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same with the other legitimate spiders / search engines. Perhaps it would be better to use mod_rewrite to lead spiders to pages that actually do exist.
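To make the redirect idea concrete, here is a rough sketch of the decision logic only (not an actual rewrite rule): given a nested spurious path, take the last segment and redirect to the flat URL if that segment is a real page. `KNOWN_SLUGS` and `redirect_target` are hypothetical names, and the slugs listed are just taken from the examples above, assuming they correspond to real flat pages.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical list of real top-level slugs; in practice this would come
# from the sitemap or the database.
KNOWN_SLUGS = {
    "feedback-and-testimonials",
    "online-continuing-education",
    "consultations",
}

def redirect_target(url: str) -> Optional[str]:
    """Return the flat URL a nested spurious path could redirect to, else None."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # A single-segment path is already flat, so there is nothing to redirect.
    if len(segments) < 2:
        return None
    last = segments[-1]
    # Only redirect when the final segment matches a page that really exists.
    if last not in KNOWN_SLUGS:
        return None
    return f"{parts.scheme}://{parts.netloc}/{last}/"

example = redirect_target(
    "https://thedoctorwithin.com/category/seminars/newsletters/page/7/"
    "newsletters/page/3/feedback-and-testimonials/"
)
print(example)
```

Anything that returns `None` here would simply be left to 404, which seems safer than redirecting every unknown path to some arbitrary page.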
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
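On the "why": one mechanism that produces exactly this shape of URL is a relative link (an href with no leading slash) being resolved against an already-nested page. I have not confirmed that this is what is happening on this site, so treat it as a guess; the `relative_href` value below is hypothetical. The sketch uses Python's standard `urljoin` to show how the resolution works.

```python
from urllib.parse import urljoin

# The (non-existent) page that Search Console reports as the link source.
page = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/"

# Hypothetical hrefs: the first has no leading slash, the second is root-relative.
relative_href = "feedback-and-testimonials/"
root_relative_href = "/feedback-and-testimonials/"

nested = urljoin(page, relative_href)
flat = urljoin(page, root_relative_href)

print(nested)  # the href is appended to the current path
print(flat)    # the href is resolved from the site root
```

If something like this is in play, each spurious page a spider manages to load would yield further, deeper spurious links, which would match the ever-growing paths in the examples.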
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets' or whatnot. Guess it's an entirely new topic of discussion... whether limiting the spurious spider searching (from good spiders) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP Super Cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I'll try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted due to the WP Super Cache plugin. You may try emptying your cache and installing the 'All 404 Redirect' plugin and a 301 management plugin.
I hope this helps.
Regards,
Vijay