Google indexing thousands crazy search results with %25253
-
In GWT I started seeing very strange pages indexed a few weeks, and Google is no reporting over 21,000 of pages (blocked by robots.txt) with weird URLs like this:
The current robots.txt looks like this:
User-agent: *
Disallow: /wp-contentDisallow: /wp-admin
Disallow: /wp-includes
Disallow: /data
Disallow: /slideshows
Disallow: /page/*/?s=
Disallow: /?s=
Disallow: /searchThis website is running an up to date WP install with Yoast's Google Analytics and SEO plug-in. I can't point to anything specific that happened with the site when these URLs started appearing even after I modified the robots.txt.
What can be done to try and stop Google from creating and indexing these goofy URLs?
I see lots of sites having this issue when I search in Google, but no one seems to have a solution.
-
As it turns out the problem is with Yoast's Google Analytics plug-in per Yoast. However, he has not yet released a fix nor given a date for the fix as of yet. So one either needs to deal with it until fixed or switch plug-ins.
-
Hi Sha,
Well, that is a new possible lead, but unfortunately Pictage is basically worthless when it comes to any technological issues.
Hmm, is there some way I could add "noindex" tags to anything link that appears on the Proof page as they are dynamic in appearance?
Thanks,
Joe
-
Hi again Joe,
After a more detailed look at your site (which has no obvious search box available to users) I was curious as to why all of the things that you are doing on the site seem to have no effect upon the issues you are trying to resolve...and why your site is generating thousands of search queries without a search box!
This says to me "do you have control of all of the content?" ... and it appears that you are using an external service called Pictage to upload and display client portfolios.
So, are you pulling content into your site from Pictage? Is it some kind of white label add-on to your site?
If the pages from Pictage are being generated externally, then the yoast plugin cannot add the "noindex" tag to those pages...if this is the case then I would say you need to contact the Pictage help people and advise them that there is a problem they need to attend to.
Hope that helps,
Sha
-
Hi Egol,
Hmm, I have never heard of that possibility.
How can I change the resultant search URL with a Wordpress install?
Thanks.
-
Hi Sha,
I made the changes weeks ago, but more pages keep appearing which tells me Google is still trying to index them?
There is already an "s" parameter set in GWT, but I don't really see many options in this screen - are there some settings I'm missing?
There are also page URLs like this one, can they be blocked as well?
-
In addition to the suggestions already given... if this was my site I would change the URL of the search results page. Someone might have a robot that is tossing crap queries into your search box.
-
Hi Joe,
A couple of things:
- If you have made the change to noindex search results recently, it may take some time for the errors to disappear from GWT. If the number of pages continues to grow, then clearly the noindex is not implemented as you expect.
- You could try using the parameter handling feature in GWT to tell googlebot to ignore all pages with the parameter in question. In your search string, the ? says "here come some parameters" and the "s" is the parameter that you want to ignore.
Incidentally, there is definitely something funky happening with the generation of those search strings which should be investigated and resolved as well.
Hope that helps,
Sha
-
Yoast's WordPress SEO plug-in automatically does the following:
- RSS feeds are now always noindex, followed. No search engine should ever list an RSS feed as a result in the resultpages.
- Admin, login and registration pages are always noindexed now for the same reason.
- Search result pages are now always noindex, follow.
-
This is in your own website's search, right?
I've always heard that you should do on page robots that make it:
no-index, follow
So that all of the links on the page can be followed, but Google will not index it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Canonical Url Structure Vs. Google Search View
I recently set up a new site and set the "preferred" domain in Google Webmasters to show URLs WITHOUT the WWW for google search purposes. In the confirmation email from google, this confused me: "This setting defines which host - www or not - should be considered the canonical host when indexing your site." In the website, we have cononical URLS at the top of every page in the header, but still have the WWW in those. Any issues with that?
Technical SEO | | vikasnwu0 -
Google will index us, but Bing won't. Why?
Bing is crawling our site, but not indexing it, and we cannot figure out why -- plus it's being indexed fine in Google. Any ideas on what the issue with Bing might be? Here's are some details to let you know what we've already checked/established: We have 4 301’s and the rest of our site checks out We’ve already established our Robots is ok, and that we are fixing our site map/it's in fine shape We do not see anything blocking bingbot access to the site There is no varnish or any load balancers, so nothing on that end that would be blocking the access We also don't see any rules in the apache or the .htaccess config that would be blocking the access
Technical SEO | | Alex_RevelInteractive0 -
Should I be concerned about Google indexing an old domain if the listings redirect to the new domain?
I noticed this about Moz's old domain SEOMoz.org. If the URLs from the old domain are redirecting, is there any reason to be concerned about an old domain still appearing to be indexed by Google? See here: https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site%3Aseomoz.org Links to seomoz.org are listed, but if you click them they redirect to moz.com. Is this anything to be concerned about or is everything operating as expected?
Technical SEO | | 352inc0 -
Why wont google Index this page?
A week ago i accidentally changed this page settings in my CMS to "disable & dont index" as i was going to replace this page with another, but this didnt happen, but i forgot to switch the settings back! http://www.over50choices.co.uk/funeral-planning/funeral-plans Anyhow in an effort to get it back up quickly i submitted in GWTs but its still not indexed. When i use several SEO on page checking tools it has the Meta Title data as "Form" and not the correct title. Any ideas please? Yours frustrated Ash
Technical SEO | | AshShep10 -
Mobile site is not ranking in the mobile search results
I posted last month about problems with a mobile site, which is served from a separate URL (m.mydomain.com) as currently responsive design is not an option. The problem was that the mobile site was being returned in the desktop index along with the desktop site, and the desktop site was being returned in the mobile index instead of the mobile site. I have therefore implemented rel=canonical and rel=alternate as is advised by Google, but this has stopped the desktop site from appearing in the mobile index, but hasn't caused the mobile site to rank instead. What should I do now? One idea I have is to remove the rel=canonical and rel=alternate links so that the desktop site ranks in the mobile index again. There is a redirect in place anyway so when people click on a desktop link from a mobile search, they will still be redirected to the mobile equivalent. I could then set the m.mydomain.com to noindex to stop it from being returned in the desktop results and potentially causing duplicate content issues? What do you think about this as a work around?
Technical SEO | | pugh0 -
Notice of DMCA removal from Google Search
Dear Mozer's Today I get from Google Webmaster tools a "Notice of DMCA removal" I'll paste here the note to get your opinions "Hello, Google has been notified, according to the terms of the Digital Millennium Copyright Act (DMCA), that some of your materials allegedly infringe upon the copyrights of others. The URLs of the allegedly infringing materials may be found at the end of this message. The affected URLs are listed below: http://www.freesharewaredepot.com/productpages/Ultimate_Spelling__038119.asp" So I perform these steps: 1. Remove the page from the site (now it gives 404). 2. Remove it from database (no listed on directory, sitemap.xml and RSS) 3. I fill the "Google Content Removed Notification form" detailing the removal of the page. My question is now I have to do any other task, such as fill a site reconsideration, or only I have to wait. Thank you for your help. Claudio
Technical SEO | | SharewarePros0 -
Will using http ping, lastmod increase our indexation with Google?
If Google knows about our sitemaps and they’re being crawled on a daily basis, why should we use the http ping and /or list the index files in our robots.txt? Is there a benefit (i.e. improving indexability) to using both ping and listing index files in robots? Is there any benefit to listing the index sitemaps in robots if we’re pinging? If we provide a decent <lastmod>date is there going to be any difference in indexing rates between ping and the normal crawl that they do today?</lastmod> Do we need to all to cover our bases? thanks Marika
Technical SEO | | marika-1786190 -
Www or no www in search results??
I am working with a client, and when I check on SERP placement, I never see the "www" in the SERP's only nameofcustomer.com not www.nameofcustomer.com. Of course there is a redirect going on...Question is...should this matter at all? I dont understand the relationship between this kind of redirect and SEO. Thank Mozzers
Technical SEO | | Giggy0