Google indexing thousands crazy search results with %25253
-
In GWT I started seeing very strange pages indexed a few weeks, and Google is no reporting over 21,000 of pages (blocked by robots.txt) with weird URLs like this:
The current robots.txt looks like this:
User-agent: *
Disallow: /wp-contentDisallow: /wp-admin
Disallow: /wp-includes
Disallow: /data
Disallow: /slideshows
Disallow: /page/*/?s=
Disallow: /?s=
Disallow: /searchThis website is running an up to date WP install with Yoast's Google Analytics and SEO plug-in. I can't point to anything specific that happened with the site when these URLs started appearing even after I modified the robots.txt.
What can be done to try and stop Google from creating and indexing these goofy URLs?
I see lots of sites having this issue when I search in Google, but no one seems to have a solution.
-
As it turns out the problem is with Yoast's Google Analytics plug-in per Yoast. However, he has not yet released a fix nor given a date for the fix as of yet. So one either needs to deal with it until fixed or switch plug-ins.
-
Hi Sha,
Well, that is a new possible lead, but unfortunately Pictage is basically worthless when it comes to any technological issues.
Hmm, is there some way I could add "noindex" tags to anything link that appears on the Proof page as they are dynamic in appearance?
Thanks,
Joe
-
Hi again Joe,
After a more detailed look at your site (which has no obvious search box available to users) I was curious as to why all of the things that you are doing on the site seem to have no effect upon the issues you are trying to resolve...and why your site is generating thousands of search queries without a search box!
This says to me "do you have control of all of the content?" ... and it appears that you are using an external service called Pictage to upload and display client portfolios.
So, are you pulling content into your site from Pictage? Is it some kind of white label add-on to your site?
If the pages from Pictage are being generated externally, then the yoast plugin cannot add the "noindex" tag to those pages...if this is the case then I would say you need to contact the Pictage help people and advise them that there is a problem they need to attend to.
Hope that helps,
Sha
-
Hi Egol,
Hmm, I have never heard of that possibility.
How can I change the resultant search URL with a Wordpress install?
Thanks.
-
Hi Sha,
I made the changes weeks ago, but more pages keep appearing which tells me Google is still trying to index them?
There is already an "s" parameter set in GWT, but I don't really see many options in this screen - are there some settings I'm missing?
There are also page URLs like this one, can they be blocked as well?
-
In addition to the suggestions already given... if this was my site I would change the URL of the search results page. Someone might have a robot that is tossing crap queries into your search box.
-
Hi Joe,
A couple of things:
- If you have made the change to noindex search results recently, it may take some time for the errors to disappear from GWT. If the number of pages continues to grow, then clearly the noindex is not implemented as you expect.
- You could try using the parameter handling feature in GWT to tell googlebot to ignore all pages with the parameter in question. In your search string, the ? says "here come some parameters" and the "s" is the parameter that you want to ignore.
Incidentally, there is definitely something funky happening with the generation of those search strings which should be investigated and resolved as well.
Hope that helps,
Sha
-
Yoast's WordPress SEO plug-in automatically does the following:
- RSS feeds are now always noindex, followed. No search engine should ever list an RSS feed as a result in the resultpages.
- Admin, login and registration pages are always noindexed now for the same reason.
- Search result pages are now always noindex, follow.
-
This is in your own website's search, right?
I've always heard that you should do on page robots that make it:
no-index, follow
So that all of the links on the page can be followed, but Google will not index it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page missing from Google index
Hi all, One of our most important pages seems to be missing from the Google index. A number of our collections pages (e.g., http://perfectlinens.com/collections/size-king) are thin, so we've included a canonical reference in all of them to the main collection page (http://perfectlinens.com/collections/all). However, I don't see the main collection page in any Google search result. When I search using "info:http://perfectlinens.com/collections/all", the page displayed is our homepage. Why is this happening? The main collection page has a rel=canonical reference to itself (auto-generated by Shopify so I can't control that). Thanks! WUKeBVB
Technical SEO | | leo920 -
How do I influence what page on my site google shows for specific search phrases?
Hi People, My client has a site www.activeadventures.com. They provide adventure tours of New Zealand, South America and the Himalayas. These destinations are split into 3 folders in the site (eg: activeadventures.com/new-zealand, activeadventures.com/south-america etc....). The actual root folder of the site is generic information for all of the destinations whilst the destination specific folders are specific in their information for the destination in question. The Problem: If you search for say "Active New Zealand" or "Adventure Tours South America" our result that comes up is the activeadventures.com homepage rather than the destination folder homepage (eg: We would want activeadventures.com/new-zealand to be the landing page for people searching for "active new zealand"). Are there any ways in influence google as to what page on our site it chooses to serve up? Many thanks in advance. Conrad
Technical SEO | | activenz0 -
Skip indexing the search pages
Hi, I want all such search pages skipped from indexing www.somesite.com/search/node/ So i have this in robots.txt (Disallow: /search/) Now any posts that start with search are being blocked and in Google i see this message A description for this result is not available because of this site's robots.txt – learn more. How can i handle this and also how can i find all URL's that Google is blocking from showing Thanks
Technical SEO | | mtthompsons0 -
Website Migration - Very Technical Google "Index" Question
This is my understanding of how Google's search works, and I am unsure about one thing in specifc: Google continuously crawls websites and stores each page it finds (let's call it "page directory") Google's "page directory" is a cache so it isn't the "live" version of the page Google has separate storage called "the index" which contains all the keywords searched. These keywords in "the index" point to the pages in the "page directory" that contain the same keywords. When someone searches a keyword, that keyword is accessed in the "index" and returns all relevant pages in the "page directory" These returned pages are given ranks based on the algorithm The one part I'm unsure of is how Google's "index" connects to the "page directory". I'm thinking each page has a url in the "page directory", and the entries in the "index" contain these urls. Since Google's "page directory" is a cache, would the urls be the same as the live website? For example if webpage is found at wwww.website.com/page1, would the "page directory" store this page under that url in Google's cache? The reason I ask is I am starting to work with a client who has a newly developed website. The old website domain and files were located on a GoDaddy account. The new websites files have completely changed location and are now hosted on a separate GoDaddy account, but the domain has remained in the same account. The client has setup domain forwarding/masking to access the files on the separate account. From what I've researched domain masking and SEO don't get along very well. Not only can you not link to specific pages, but if my above assumption is true wouldn't Google have a hard time crawling and storing each page in the cache?
Technical SEO | | reidsteven750 -
Example of Google Indexing my Feedburner Links
As you can see, there are 2 results for the same page. One is the correct page URL, the other has the Feedburner parameters at the end: http://www.thewebhostinghero.com/articles/improving-user-engagement-with-the-right-blog-commenting-system.html http://www.thewebhostinghero.com/articles/improving-user-engagement-with-the-right-blog-commenting-system.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+thewebhostinghero+(TheWebHostingHero.com) Can this cause duplicate content issues? Can I prevent Google from indexing my Feedburner links? My Feedburner settings are already set to noindex, what else can I do?!? 22cfThX.png
Technical SEO | | sbrault740 -
Should I nofollow search results pages
I have a customer site where you can search for products they sell url format is: domainname/search/keywords/ keywords being what the user has searched for. This means the number of pages can be limitless as the client has over 7500 products. or should I simply rel canonical the search page or simply no follow it?
Technical SEO | | spiralsites0 -
Does Google index has expiration?
Hi, I have this in mind and I think you can help me. Suppose that I have a pagin something like this: www.mysite.com/politics where I have a list of the current month news. Great, everytime the bot check this url, index the links that are there. What happens next month, all that link are not visible anymore by the user unless he search in a search box or google. Does google keep those links? The current month google check that those links are there, but next month are not, but they are alive. So, my question is, Does google keep this links for ever if they are alive but nowhere in the site (the bot not find them anymore but they work)? Thanks
Technical SEO | | informatica8100 -
Can JavaScrip affect Google's index/ranking?
We have changed our website template about a month ago and since then we experienced a huge drop in rankings, especially with our home page. We kept the same url structure on entire website, pretty much the same content and the same on-page seo. We kind of knew we will have a rank drop but not that huge. We used to rank with the homepage on the top of the second page, and now we lost about 20-25 positions. What we changed is that we made a new homepage structure, more user-friendly and with much more organized information, we also have a slider presenting our main services. 80% of our content on the homepage is included inside the slideshow and 3 tabs, but all these elements are JavaScript. The content is unique and is seo optimized but when I am disabling the JavaScript, it becomes completely unavailable. Could this be the reason for the huge rank drop? I used the Webmaster Tolls' Fetch as Googlebot tool and it looks like Google reads perfectly what's inside the JavaScrip slideshow so I did not worried until now when I found this on SEOMoz: "Try to avoid ... using javascript ... since the search engines will ... not indexed them ... " One more weird thing is that although we have no duplicate content and the entire website has been cached, for a few pages (including the homepage), the picture snipet is from the old website. All main urls are the same, we removed some old ones that we don't need anymore, so we kept all the inbound links. The 301 redirects are properly set. But still, we have a huge rank drop. Also, (not sure if this important or not), the robots.txt file is disallowing some folders like: images, modules, templates... (Joomla components). We still have some html errors and warnings but way less than we had with the old website. Any advice would be much appreciated, thank you!
Technical SEO | | echo10