PDFs and webpages
-
If a website provides PDF versions of its pages as a download option, should the PDFs be noindexed, in your opinion?
We have to offer PDF versions of the webpages because our customers want them; they are a group who will download and print the PDFs. I thought of leaving the PDFs alone as they sit on a subdomain, but the more I think about it, the more I feel I should probably noindex them. My reasons:
- They sit on a subdomain, so if users have linked to them, my main domain isn't getting the link juice
- Duplication issues; they might be affecting the rank of the existing webpages
- I can't track the PDFs as they are on a subdomain, though I can see event clicks to them from the main site
On the flipside
- I could lose out on the traffic the PDFs bring when a user loads one from an organic search, and on any links in the PDFs
What are your experiences?
-
Cool. It's advisable to add canonical HTTP headers to the PDFs too, if you can.
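For anyone wondering what that looks like in practice, here is a minimal sketch for Apache's mod_headers (the filename and target URL are made-up placeholders; each PDF needs to point at its own HTML equivalent, so in practice you'd generate one rule per file or emit the header from application code):

```apache
# Send a canonical HTTP header alongside a specific PDF,
# pointing at the HTML version of the same content
<Files "whitepaper.pdf">
    Header add Link '<https://www.example.com/whitepaper.html>; rel="canonical"'
</Files>
```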
-
Thanks Alex,
I do have canonical tags on the webpages to ensure they are seen as the main one. I'll look into tracking subdomains.
-
Google now treats subdomains pretty much as part of your main domain: http://www.youtube.com/watch?v=_MswMYk05tk - so you will be getting some of that rank juice.
I'd think that the major search engines wouldn't have a problem knowing that an HTML version of a page is preferred over a PDF. However, you can use canonical HTTP headers to make sure there are no problems with duplicate content: http://moz.com/blog/how-to-advanced-relcanonical-http-headers
If you use Google Analytics, you will be able to track the subdomain. You can do it as part of your existing profile or by setting up a separate one: https://developers.google.com/analytics/devguides/collection/gajs/gaTrackingSite (check that this is the version of Analytics you have installed).
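With the classic ga.js snippet that page documents, the key call for subdomain tracking is _setDomainName; a sketch, with a placeholder property ID:

```javascript
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-Y']);      // placeholder property ID
// Scope the tracking cookie to the top-level domain so that
// www.example.com and pdfs.example.com share one visitor session
_gaq.push(['_setDomainName', 'example.com']);
_gaq.push(['_trackPageview']);
```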
There's a short guide here on getting more data about PDFs through Google Analytics: http://moz.com/ugc/how-to-track-pdf-traffic-links-in-google-analytics-open-site-explorer
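Since Analytics JavaScript can't run inside the PDF itself, the usual approach along the lines of that guide is to record the click on the PDF link as an event. A sketch with classic ga.js (the "PDF"/"Download" category and action labels are arbitrary names, not anything Analytics requires):

```javascript
var _gaq = _gaq || [];

// Record a PDF download as a Google Analytics event and
// return the pushed command; non-PDF links are ignored
function trackPdfClick(href) {
  if (!/\.pdf$/i.test(href)) {
    return null;
  }
  var command = ['_trackEvent', 'PDF', 'Download', href];
  _gaq.push(command);
  return command;
}

// In the browser, wire the helper up to clicks on PDF links
if (typeof document !== 'undefined') {
  document.addEventListener('click', function (e) {
    var link = e.target.closest && e.target.closest('a[href$=".pdf"]');
    if (link) {
      trackPdfClick(link.getAttribute('href'));
    }
  });
}
```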
Related Questions
-
Webpage is not ranking
https://www.nextheadphone.com/oneodio-a10-hybrid-review/ I just published a new article. The keyword is "OneOdio A10 Hybrid Review". Before writing the article, I researched that this is an easy keyword, so I should rank as soon as I publish. After publishing the article, my current rank is #32. Why isn't it ranking? Any ideas?
Intermediate & Advanced SEO | NextHeadphone
-
I have the same paragraph appearing on two webpages of my site!
Is it going to affect rankings? If so, what should be done? Thanks.
Intermediate & Advanced SEO | Sam09schulz
-
Creating Redirect Maps: To include PDFs or not to include PDFs?
When creating a redirect map for a site rebuild or domain change, is it necessary to include PDFs or any other non-HTML URLs? Do PDFs even carry "SEO juice" over? When switching CMS, does it even matter to include them? Thanks!
Intermediate & Advanced SEO | emilydavidson
-
Client has moved to secure HTTPS webpages but non-secure HTTP pages are still being indexed in Google. Is this an issue?
We are currently working with a client that relaunched their website two months ago to serve secure (HTTPS) pages across their entire site architecture. The problem is that their non-secure (HTTP) pages are still accessible and being indexed in Google. Here are our concerns:
1. Are co-existing non-secure and secure webpages (HTTP and HTTPS) considered duplicate content?
2. If these pages are duplicate content, should we use 301 redirects or rel canonicals?
3. If we go with rel canonicals, is it okay for a non-secure page to have a rel canonical to the secure version?
Thanks for the advice.
Intermediate & Advanced SEO | VanguardCommunications
-
How to make AJAX content crawlable from a specific section of a webpage?
Content in a specific section of the webpage is being loaded via AJAX.
Intermediate & Advanced SEO | zpm2014
-
JavaScript to fetch the page title for every webpage: is it a good idea?
We have a Zend Framework site that is complex to program, if you ask me, and we have 20k+ pages that need proper titles and meta descriptions (the previous programming team had not set page titles at all). The course of action we can easily implement is to fetch the page title with JavaScript from the h1 tag used on all pages. But doesn't this make it difficult for engines to actually read the page title, since it's being set by JavaScript code we have put in? Has anyone been in a similar situation before? If yes, I need some help! Update: I tried the JavaScript way and here is what it looks like: http://islamicencyclopedia.org/public/index/hadith/id/1/book_id/106. I know Google won't read JavaScript the way we have used it on the website, but I need help on how we can work around this issue, knowing we don't have other options.
Intermediate & Advanced SEO | SmartStartMediacom
-
Google isn't seeing the content but it is still indexing the webpage
When I fetch my website page using GWT, this is what I receive:
HTTP/1.1 301 Moved Permanently
X-Pantheon-Styx-Hostname: styx1560bba9.chios.panth.io
server: nginx
content-type: text/html
location: https://www.inscopix.com/
x-pantheon-endpoint: 4ac0249e-9a7a-4fd6-81fc-a7170812c4d6
Cache-Control: public, max-age=86400
Content-Length: 0
Accept-Ranges: bytes
Date: Fri, 14 Mar 2014 16:29:38 GMT
X-Varnish: 2640682369 2640432361
Age: 326
Via: 1.1 varnish
Connection: keep-alive
What I used to get is this:
HTTP/1.1 200 OK
Date: Thu, 11 Apr 2013 16:00:24 GMT
Server: Apache/2.2.23 (Amazon)
X-Powered-By: PHP/5.3.18
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Last-Modified: Thu, 11 Apr 2013 16:00:24 +0000
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
ETag: "1365696024"
Content-Language: en
Link: ; rel="canonical",; rel="shortlink"
X-Generator: Drupal 7 (http://drupal.org)
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
<title>Inscopix | In vivo rodent brain imaging</title>
Intermediate & Advanced SEO | jacobfy
-
Schema.org for LocalBusiness v. Webpage
I am adding schema.org markup for some clients and I am running into an issue. Specifically, the site I am working on uses a child theme for Genesis 2.0, which adds schema.org markup by default to all pages. If I want to change the markup on the home page to LocalBusiness, should I remove ALL other schema.org markup there (marking nav as nav, header as header, etc.)? Or can I leave the markup as WebPage and just add the LocalBusiness markup as well? When I look at it in the testing tool in GWMT, it just shows the WebPage markup, not the LocalBusiness markup. You can take a look here: http://blueskyrestoration.com. Thanks in advance for your help!
Intermediate & Advanced SEO | farlandlee