Index PDF files but redirect to site
-
Hi,
One of our clients has tons of PDFs (manuals, etc.) and frequently gets good rankings for the direct PDF links. While we're happy that the PDFs attract users' attention, we'd like to redirect those users to the page where the PDF link is published and prevent them from opening the PDF directly.
In short, we'd like the PDFs to be indexed, but show users the PDF link within a page on the site. How should we go about that?
Thanks,
GM
-
Thanks for the follow-up ... if it weren't for phrases like
- The page displayed to all users who visit from Google must be identical to the content that is shown to Googlebot.
I'd be quite comfortable with that ... in the meantime, however, I might try some pdf2html conversion tools to see if there is a viable way to present the PDF information on an HTML page and block the PDF links for robots.
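A minimal sketch of the "block the PDF link for robots" half of that idea, assuming an Apache server with mod_headers enabled (the thread does not say which server the client runs). It asks search engines not to index any PDF by sending an X-Robots-Tag response header:

```apache
# Assumption: Apache with mod_headers; placed in .htaccess or the vhost config.
<IfModule mod_headers.c>
  # Apply to every file whose name ends in .pdf
  <FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>
```

Crawlers only see this header if they are allowed to fetch the PDFs, so the files should not also be disallowed in robots.txt.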
Regards,
Gert
-
Hi Gert,
After further research, this might not be considered cloaking after all, since Google's First Click Free for Web Search system works the same way and checks the HTTP referer.
For more details, read the official Google Webmaster Central blog post about it here:
http://googlewebmastercentral.blogspot.com/2008/10/first-click-free-for-web-search.html
Best regards,
Guillaume Voyer.
-
Thanks for your detailed reply, Guillaume,
I guess the possible "cloaking troubles" with this strategy are probably too risky for our project. However, I like the "click here" idea; we'll check whether we can automate that somehow to draw users reading the PDFs back to our site.
-
Hi Gert,
Technically, this is not possible unless you use cloaking to display the PDF to the search engines and redirect the users to a different page.
What you could do to avoid cloaking is include a banner at the top of each PDF with something like "Click here to see all our related PDFs" that links to your website. That way, users might be interested in going to your website.
Otherwise, you could detect the referer with .htaccess and redirect the user to the page if they are coming from Google, but this might be considered cloaking. Here's an example:
RewriteEngine On
RewriteCond %{HTTP_REFERER} (.*)google\.(.*)
RewriteRule ^pdf/(.*)\.pdf$ /pdf-list [R=302,L]
If you are running an Apache server and put this in your .htaccess file, the first line activates mod_rewrite, the second line checks whether the referer matches anything, then "google.", then anything, and the third line redirects all .pdf files in the pdf folder to the /pdf-list page if the referer matches.
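If the referer check is used at all, a somewhat stricter variant might look like the sketch below. It is only an illustration under the same assumptions (Apache with mod_rewrite, PDFs under /pdf/, a landing page at /pdf-list): it requires "google." to appear at the start of the referer's host name or right after a dot, so a referer such as notgoogle.example.com no longer triggers the redirect.

```apache
RewriteEngine On
# Match referers whose host is google.<tld> or <something>.google.<tld>
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?google\.[a-z.]+(/|$) [NC]
# Temporarily redirect any PDF under /pdf/ to the listing page, then stop
RewriteRule ^pdf/.+\.pdf$ /pdf-list [R=302,L]
```

This is still a loose test (a host like google.example.com would match), the referer header can be missing or spoofed, and the cloaking concern mentioned above applies unchanged.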
Best regards,
Guillaume Voyer.