Best use of robots.txt for "garbage" links from Joomla!
-
I recently started out on SEOmoz and am trying to do some cleanup based on the campaign report I received.
One of my biggest gripes is the "Duplicate Page Content" issue.
Right now I have over 200 pages with duplicate page content.
Now.. this is triggered because SEOmoz has picked up auto-generated links from my site.
My site has a "send to friend" feature, and every time someone wants to send an article or a product to a friend via email, a pop-up appears.
It seems these pop-up pages have been picked up by the SEOmoz spider; however, these pages are something I would never want indexed in Google.
So I just want to get rid of them.
Now to my question:
I guess the best solution is to make a general rule via robots.txt, so that these pages are not indexed or considered by Google at all.
But how do I do this? What should my syntax be?
A lot of the links look like this, but have different ID numbers depending on the product being sent:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I guess I need a rule that grabs the following and makes Google ignore links that contain this:
view=send_friend
-
Hi Henrik,
It can take up to a week for SEOmoz crawlers to process your site, which may be an issue if you recently added the tag. Did you remember to include all user agents in your first line?
User-agent: *
Be sure to test your robots.txt file in Google Webmaster Tools to ensure everything is correct.
Couple of other things you can do:
1. Add a rel="nofollow" on your send to friend links.
2. Add a meta robots "noindex" to the head of the popup html.
3. And/or add a canonical tag to the popup. Since I don't have a working example, I don't know what to canonicalize it to (whatever content it is duplicating), but this is also an option.
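For reference, a rough sketch of what options 2 and 3 might look like in the popup template's head. The canonical URL below is a placeholder, not taken from your site — it should point at whatever product or article page the popup is duplicating:

```html
<!-- Inside the popup template's <head> -->

<!-- Option 2: keep the popup out of the index entirely -->
<meta name="robots" content="noindex, nofollow">

<!-- Option 3: point search engines at the page being duplicated -->
<!-- (hypothetical example URL — replace with the actual product page) -->
<link rel="canonical" href="http://mywebshop.dk/index.php?option=com_redshop&view=product&pid=39&Itemid=167">
```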
-
I just tried to add
Disallow: /view=send_friend
I removed the last /
However, a crawl gave me the duplicate content problem again.
Is my syntax wrong?
-
The second one, "Disallow: /*view=send_friend", will prevent Googlebot from crawling any URL with that string in it. So that should take care of your problem.
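Spelled out in the robots.txt file, that would look like this — the `*` wildcard (which Google supports as an extension to the original robots.txt standard) matches everything before the query-string parameter, so `index.php?option=com_redshop&view=send_friend&...` is covered:

```text
User-agent: *
Disallow: /*view=send_friend
```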
-
So my link example would look like this in robots.txt?
Disallow: /index.php?option=com_redshop&view=send_friend&pid=&tmpl=component&Itemid=
Or
Disallow: /view=send_friend/
-
You're right. I would disallow via robots.txt with a wildcard (*) wherever a unique item ID number could be generated.
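Putting the thread together, a minimal robots.txt that should handle your example URL (and every other product ID) would be:

```text
User-agent: *
# Blocks any URL containing "view=send_friend", regardless of product ID, e.g.
# /index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
Disallow: /*view=send_friend
```

After uploading, it's worth verifying the rule against a sample send-to-friend URL in Google Webmaster Tools' robots.txt tester, as suggested above.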