How should I handle URLs created by an internal search engine?
-
Hi,
I'm aware that internal search result URLs (www.example.co.uk/catalogsearch/result/?q=searchterm) should ideally be blocked using the robots.txt file. Unfortunately, the damage has already been done: a large number of internal search result URLs have been created and indexed by Google. I have double-checked, and these pages account for only about 1.5% of traffic per month.
Is there a way I can remove the internal search URLs that have already been indexed and then stop this from happening in the future? I presume the last part would be to disallow /catalogsearch/ in the robots.txt file.
Thanks
-
Basic cleanup
From a procedural standpoint, you want to first add the noindex meta tag to the search results pages. Google has to crawl a page and see that tag before it can act on it and remove the URL. You can also enter some of the URLs into the URL removal tool in Webmaster Tools.
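As a rough sketch of that first step, assuming the search results live under /catalogsearch/ as in the question: the hypothetical helper `robots_meta_for` below returns a noindex robots meta tag for internal search URLs and an empty string for everything else. The function name and the way it is wired into a page template are illustrative assumptions, not part of any particular platform.

```python
from urllib.parse import urlsplit

# Assumption: internal search results live under this path, as in the question.
SEARCH_PREFIX = "/catalogsearch/"


def robots_meta_for(url: str) -> str:
    """Return a noindex meta tag for internal search URLs, else an empty string."""
    path = urlsplit(url).path
    if path.startswith(SEARCH_PREFIX):
        return '<meta name="robots" content="noindex">'
    return ""


# A full URL (with scheme) is expected so that the path is parsed correctly.
print(robots_meta_for("https://www.example.co.uk/catalogsearch/result/?q=searchterm"))
```

The returned string would go inside the page's head element. Leaving other pages untouched keeps the change limited to the search template.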
Next, add /catalogsearch/ to robots.txt once you see that the pages have dropped out of the index. If you block the path sooner, Google can no longer crawl those pages and will never see the noindex tag.
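Before deploying the robots.txt rule, it may help to check that it blocks what you expect and nothing more. This is a minimal sketch using Python's standard `urllib.robotparser`. The rule text is the one proposed in the question; the `is_crawlable` helper and the /shoes/ URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule proposed in the question, applied to all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against the robots.txt rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


search_url = "https://www.example.co.uk/catalogsearch/result/?q=searchterm"
other_url = "https://www.example.co.uk/shoes/"  # hypothetical non-search page

print(is_crawlable(search_url))  # search results should be blocked
print(is_crawlable(other_url))   # other pages should stay crawlable
```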
Advanced cleanup
If any of these search result URLs are ranking and acting as landing pages from Google, you may want to consider 301 redirecting them to the most closely related category pages.
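One way to picture the redirect step is a small lookup from search terms to category pages. The sketch below is only an illustration: the mapping, the category paths, and the `redirect_for` helper are all hypothetical, and the actual redirect would be issued by your server or platform.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from search terms that rank in Google to category pages.
SEARCH_TO_CATEGORY = {
    "red shoes": "/shoes/red/",
    "garden chairs": "/garden/chairs/",
}


def redirect_for(url: str):
    """Return (301, target_path) for a mapped search URL, or None otherwise."""
    parts = urlsplit(url)
    if not parts.path.startswith("/catalogsearch/"):
        return None
    terms = parse_qs(parts.query).get("q", [])
    if not terms:
        return None
    # Normalise the term so "Red+Shoes" and "red shoes" match the same entry.
    target = SEARCH_TO_CATEGORY.get(terms[0].strip().lower())
    return (301, target) if target else None


print(redirect_for("https://www.example.co.uk/catalogsearch/result/?q=Red+Shoes"))
```

Search URLs without a mapped category return None here, so they would continue to be served with the noindex tag instead of being redirected.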
My 2 cents: I only use the GWT parameter handler on parameters that I have to show to the search engines. Otherwise, I try to hide all those URLs from Google to help with crawl efficiency.
Note that it is really important to do the work of finding which pages/URLs Google has indexed, so that you don't delete a page that is actually generating traffic for you. A landing page report from GA would help with this.
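If you export the landing page report as CSV, a short script can pull out the search result URLs that still receive visits. This is a sketch under assumptions: the column names `landing_page` and `sessions` and the sample rows are invented for illustration and will differ from a real GA export.

```python
import csv
import io


def search_landing_pages(report_csv: str, prefix: str = "/catalogsearch/"):
    """Return (landing_page, sessions) pairs for search URLs, busiest first."""
    rows = csv.DictReader(io.StringIO(report_csv))
    hits = [
        (row["landing_page"], int(row["sessions"]))
        for row in rows
        if row["landing_page"].startswith(prefix) and int(row["sessions"]) > 0
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up sample data standing in for an exported report.
sample = """landing_page,sessions
/catalogsearch/result/?q=red+shoes,120
/shoes/red/,900
/catalogsearch/result/?q=garden+chairs,35
/catalogsearch/result/?q=asdf,0
"""

for page, sessions in search_landing_pages(sample):
    print(page, sessions)
```

The URLs this returns are the candidates for a 301 redirect to a category page. The rest can simply be left to drop out of the index.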
Cheers!
-
On top of Lesley's recommendations, both Google and Bing have URL parameter exclusion options in their webmaster tools.
-
I am guessing that you are using a system that templates pages and maybe adds a query string after the search, something like search.php?caws+cars. I would set a noindex, nofollow robots meta tag in the header of every page that uses the search template. Once Google has had a chance to see that tag, I would also add the search path to robots.txt so that crawlers disregard the search pages. They should start dropping out of the results pages in about a week or so.