PDF for link building - avoiding duplicate content
-
Hello,
We've got an article that we're turning into a PDF. Both the article and the PDF will be on our site. This PDF is a good, thorough piece of content on how to choose a product.
We're going to strip out all of the links to our site from the article and create this PDF so that it will be good for people to reference and even print. Then we're going to do link building through outreach, since people will find the article and PDF useful.
My question is, how do I use rel="canonical" to make sure that the article and PDF aren't duplicate content?
Thanks.
-
Hey Bob
I think you should forget about any kind of perceived conventions and have whatever you think works best for your users and goals.
Again, look at Unbounce: that is a custom landing page with a homepage link (to share the love) but without the general site navigation.
They also have a footer to spread a bit more link love, but really, do what works for you.
Forget conventions - do what works!
Hope that helps
Marcus -
I see, thanks! I think it's important not to have the ecommerce navigation on the page promoting the pdf. What would you say is ideal as far as the graphical and navigation components of the page with the PDF on it - what kind of navigation and graphical header should I have on it?
-
Yep, check the HTTP headers with WebBug, or use one of the many browser plugins that let you see the headers for a document.
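To confirm the header is actually being sent once the .htaccess change is live, a short script can do the same job as WebBug or a browser plugin. This is a minimal sketch using only Python's standard library; the regex-based parser assumes a simple, well-formed Link header (it is not a full parser for the Link header syntax), and some servers answer HEAD requests differently from GET.

```python
import re
import urllib.request

# Matches one Link entry: <URL> followed by its ;-separated parameters
# (a simplified pattern, not a full Link header parser)
LINK_ENTRY = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
REL_CANONICAL = re.compile(r';\s*rel\s*=\s*"?canonical\b"?', re.IGNORECASE)


def canonical_from_link_header(link_header):
    """Return the URL of the first rel="canonical" entry in a Link header value, or None."""
    if not link_header:
        return None
    for match in LINK_ENTRY.finditer(link_header):
        url, params = match.group(1), match.group(2)
        if REL_CANONICAL.search(params):
            return url
    return None


def fetch_canonical(url, timeout=10):
    """Send a HEAD request to url and return the canonical URL from its Link header(s), if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A server may send several Link headers, so collect all of them
        link_headers = response.headers.get_all("Link") or []
    for value in link_headers:
        canonical = canonical_from_link_header(value)
        if canonical:
            return canonical
    return None
```

For example, fetch_canonical("http://www.domain.com/pdfs/white-papers.pdf") (a placeholder URL) should return the article's URL if the header is in place, and None if it is not.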
That said, I would push to drive the links to the page rather than the document itself: just create a nice page that houses the document and make that the link target.
You could even make the PDF available only by email once people have signed up, or some such. Canonical is only a hint, not a binding directive, and you would still be better off getting those links flooding into a real page on the site.
You could even offer up some HTML that links to your main page, to make it easier for folks to link to you. If you take a look at any savvy infographics etc., folks will try to draw a link into a page rather than the image itself for the very same reasons.
If you look at something like the Noob Guide to Online Marketing from Unbounce, you will see something like this as the suggested linking code:
[infographic image, linked to http://unbounce.com/noob-guide-to-online-marketing-infographic/]
Unbounce – The DIY Landing Page Platform
So, the image is there but the link they are pimping is a standard page:
http://unbounce.com/noob-guide-to-online-marketing-infographic/
They also cheekily add an extra homepage link with some keywords and the brand, so if folks don't remove it they still get that benefit.
Ultimately, it means that when links flood into the site they benefit the whole site rather than just promote one PDF.
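The embed-code idea can be sketched as a small helper that produces copy-and-paste HTML whose links point at the landing page and the homepage rather than at the PDF or image file. This is a hypothetical illustration: the function name and every URL and string are placeholders, not Unbounce's actual code.

```python
from html import escape


def build_embed_snippet(landing_page_url, image_url, alt_text, homepage_url, homepage_anchor):
    """Return embed HTML that links to pages (landing page + homepage), not to the asset file."""
    # escape() also escapes quotes by default, so values are safe inside attributes
    image_link = (
        f'<a href="{escape(landing_page_url)}">'
        f'<img src="{escape(image_url)}" alt="{escape(alt_text)}"></a>'
    )
    # The extra branded homepage link that stays in place if people paste the code unchanged
    homepage_link = f'<a href="{escape(homepage_url)}">{escape(homepage_anchor)}</a>'
    return f"{image_link}<br>\n{homepage_link}"
```

Calling it with your landing page, preview image and homepage URLs returns two lines of HTML you could show in a "copy this code" box next to the guide.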
Just my tuppence!
Marcus -
Thanks for the code Marcus.
Actually, the PDF is what people will be linking to. It's a guide for websites. I think the PDF will be much easier to promote than the article. I assume so, anyway.
Is there a way to make sure my canonical code in htaccess is working after I insert the code?
Thanks again,
Bob
-
Hey Bob
There is a much easier way to do this: simply put the PDFs you don't want indexed in a folder that you block in robots.txt. This way you can just drop PDFs into articles and link to them, knowing full well these files will not be crawled.
Assuming you had a PDF called article.pdf in a folder called pdfs/, the following would block crawling of the whole folder:
User-agent: *
Disallow: /pdfs/
Or to just block the file itself:
User-agent: *
Disallow: /pdfs/yourfile.pdf
Additionally, there is no reason not to add the canonical link as well: if you find people are linking directly to the PDF, having it would ensure that the equity associated with those links is correctly attributed to the parent page (always a good thing). In .htaccess, scoped so the header is only sent with the PDF:
<Files "article.pdf">
Header add Link '<http://www.url.co.uk/pdfs/article.html>; rel="canonical"'
</Files>
Generally, there are better ways to block indexation than robots.txt. In the case of PDFs, though, we really don't want these files indexed, as they make such poor landing pages (no navigation), and we certainly want to remove any competition or duplication between the page and the PDF. So in this case, it makes for a quick, painless and suitable solution.
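The robots.txt rules above can be checked offline with Python's standard urllib.robotparser module. The sketch assumes the folder-level rule and uses example.com URLs as placeholders. It also illustrates one thing worth keeping in mind: a crawler that obeys these rules never requests the blocked PDF, so it would not see a Link rel="canonical" header sent with that file either.

```python
from urllib.robotparser import RobotFileParser

# The folder-level rule from the post (placeholder domain in the URLs below)
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://www.example.com/pdfs/article.pdf",
            "https://www.example.com/articles/article.html"):
    print(url, "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```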
Hope that helps!
Marcus -
Thanks ThompsonPaul,
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.html
do I simply add this to my .htaccess file?
Header add Link '<http://www.domain.com/articles/article.html>; rel="canonical"'
-
You can insert the canonical header link using your site's .htaccess file, Bob. I'm sure HostGator provides access to the .htaccess file through FTP (sometimes you have to turn on "show hidden files") or through the file manager built into your cPanel.
Check tip #2 in this recent SEOMoz blog article for specifics:
seomoz.org/blog/htaccess-file-snippets-for-seos
Just remember, too: you will want to do the same kind of on-page optimization for the PDF as you do for regular pages.
- Give it a good, descriptive, keyword-appropriate, dash-separated file name. (essential for usability as well, since it will become the title of the icon when saved to someone's desktop)
- Fill out the metadata for the PDF, especially the Title and Description. In Acrobat it's under File -> Properties -> Description tab (to get the meta-description itself, you'll need to click on the Additional Metadata button)
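For the dash-separated file name, a small helper can turn a title into a file name. It is only a sketch: the function name is hypothetical, and it simply drops accents and apostrophes and collapses everything else that isn't a letter or digit into single dashes.

```python
import re
import unicodedata


def pdf_filename(title):
    """Turn a document title into a lowercase, dash-separated .pdf file name."""
    # Decompose accented characters (an accented e becomes e + accent mark), then drop the marks
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Remove apostrophes so "Buyer's" becomes "buyers" rather than "buyer-s"
    ascii_title = ascii_title.replace("'", "")
    # Collapse every run of non-alphanumeric characters into one dash
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return f"{slug}.pdf"


print(pdf_filename("How to Choose a Product: The Complete Guide"))
# how-to-choose-a-product-the-complete-guide.pdf
```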
I'd be tempted to build the links to the HTML page as much as possible, as those will directly help ranking, unlike the PDF's inbound links, which will have to pass their link juice through the canonical (assuming you're using it). Plus, the visitor will get a preview of the PDF's content and context from the rest of your site, which may increase trust and engender further engagement.
Your comment about links in the PDF got kind of muddled, but you'll definitely want to make certain there are good links and calls to action back to your website within the PDF, preferably on each page. Otherwise there's no clear "next step" to lead users from the PDF back to a purchase on your site. Make sure to put Analytics tracking tags on these links so you can assess the value of traffic generated from the PDF; otherwise that traffic will just appear as Direct in your Analytics.
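Tagging the links inside the PDF can be sketched as appending Google Analytics campaign parameters (utm_source, utm_medium, utm_campaign) to each destination URL before you put it in the document. The helper name and the parameter values are hypothetical examples; use whatever naming convention you use elsewhere.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url, source, medium, campaign):
    """Return url with utm_source/utm_medium/utm_campaign set, keeping its other query parameters."""
    parts = urlsplit(url)
    # Keep existing non-UTM parameters; drop any old utm_* values so they are not duplicated
    query = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if not key.startswith("utm_")]
    query += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_utm("https://www.example.com/products/example-product?color=red",
              source="product-guide-pdf", medium="pdf", campaign="buying-guide"))
```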
Hope that all helps,
Paul
-
Can I just use htaccess?
See here: http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers
We only have one pdf like this right now and we plan to have no more than five.
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.pdf
do I simply add this to my .htaccess file?
Header add Link '<http://www.domain.com/articles/article.pdf>; rel="canonical"'
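With only a handful of PDFs, each one can get its own canonical header in .htaccess. One thing to be careful of: a bare Header add Link line sends that header with every response in the directory it applies to, so it is usually wrapped in a <Files> block naming the PDF (this needs Apache's mod_headers enabled). The sketch below just writes those blocks out; the mapping is a hypothetical example using the URLs from this thread, and the output should be checked against your host's Apache setup before use.

```python
def canonical_htaccess_block(pdf_name, canonical_url):
    """Return an Apache <Files> block that sends a rel="canonical" Link header for one PDF."""
    # Single quotes around the header value let the double quotes in rel="canonical" stay literal
    return (
        f'<Files "{pdf_name}">\n'
        f"    Header add Link '<{canonical_url}>; rel=\"canonical\"'\n"
        "</Files>"
    )


# Hypothetical mapping of PDF file name -> the HTML page that should get the credit
CANONICALS = {
    "white-papers.pdf": "http://www.domain.com/articles/article.html",
}

print("\n\n".join(canonical_htaccess_block(name, url) for name, url in CANONICALS.items()))
```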
-
How do I know if I can set HTTP headers? I'm using shared hosting through HostGator.
-
PDFs don't seem to rank as well as normal web pages. They still rank, don't get me wrong: we have over 100 PDF pages that get traffic for us. Which version is the main one is really up to you: what do you want to show in the search results? I think it would be easier to rank a normal web page, though. If you use rel="canonical", it will pass most of the link juice, not all but most.
-
Thank you DoRM,
I assume the PDF is what I want as the main version, since that is what I'll be marketing, but I could be wrong. What if I get backlinks to both pages: will both sets of backlinks count?
-
Indicate the canonical version of a URL by responding with the Link rel="canonical" HTTP header. Adding rel="canonical" to the head section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with the Link rel="canonical" HTTP header, like this (note that to use this option, you'll need to be able to configure your server):
Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical"
Google currently supports these link header elements for Web Search only.
You can read more here: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
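The mechanism in Google's note is simply that the response for the PDF carries a Link header, whatever server produces it. A minimal sketch as a plain WSGI app, assuming a hypothetical in-memory mapping of paths to canonical URLs and placeholder PDF bytes rather than real files:

```python
# Hypothetical mapping: request path of each PDF -> its canonical URL
CANONICAL_URLS = {
    "/downloads/white-paper.pdf": "http://www.example.com/downloads/white-paper.pdf",
}


def pdf_app(environ, start_response):
    """WSGI app that serves placeholder PDF bytes and adds a rel="canonical" Link header."""
    path = environ.get("PATH_INFO", "")
    if path not in CANONICAL_URLS:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    headers = [
        ("Content-Type", "application/pdf"),
        # The same header value format as in Google's example
        ("Link", f'<{CANONICAL_URLS[path]}>; rel="canonical"'),
    ]
    start_response("200 OK", headers)
    # Placeholder body only; a real app would read the PDF file from disk
    return [b"%PDF-1.4\n% placeholder body\n"]
```

To try it locally you could run wsgiref.simple_server.make_server("", 8000, pdf_app).serve_forever() and inspect the response headers with a browser plugin.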