Google and PDF indexing
-
It was recently brought to my attention that one of the PDFs on our site wasn't showing up when looking for a particular phrase within the document. The user was trying to search only within our site. Once I removed the site restriction - I noticed that there was another site using the exact same PDF. It appears Google is indexing that PDF but not ours. The name, title, and content are the same. Is there any way to get around this?
I find it interesting as we use GSA and within GSA it shows up for the phrase. I have to imagine Google is saying that it already has the PDF and therefore is ignoring our PDF. Any tricks to get around this?
BTW - both sites rightfully should have the PDF. One is a client site and they are allowed to host the PDFs created for them. However, I'd like Mathematica to also be listed.
Query: no site restriction (notice: Teach for america comes up #1 and Mathematica is not listed). https://www.google.com/search?as_q=&as_epq=HSAC_final_rpt_9_2013.pdf&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=&gws_rd=ssl#q=HSAC_final_rpt_9_2013.pdf+"Teach+charlotte"+filetype:pdf&as_qdr=all&filter=0
Query: site restriction (notice that it doesn't find the phrase and redirects to any of the words) https://www.google.com/search?as_q=&as_epq=HSAC_final_rpt_9_2013.pdf&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=&gws_rd=ssl#as_qdr=all&q="Teach+charlotte"+site:www.mathematica-mpr.com+filetype:pdf
-
Hi Rose, Jeff provided a great response. Did it answer your question? If so, please mark it "good answer", thanks!
Christy
-
It sounds like it could be a few different issues, but my initial look at the site makes it seem to me that it's duplicate content, and that Google is only returning the results for their PDF and not for yours.
Google does hate duplicate content, and this is probably why they are only showing their PDF.
It may be that they posted theirs first, or they linked to it first.
As you probably know, not all PDFs are created equally. Some are more difficult for a search engine to read (i.e. might have text tied up in graphics). And PDFs are really not a great end user experience.
If you really want to rank for this, I might suggest creating this page as an HTML page instead of a PDF, as it might be able to be indexed more easily, and you might rank for it better.
I hope this helps?
Thanks,
-- Jeff
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Removing indexed internal search pages from Google when it's driving lots of traffic?
Hi I'm working on an E-Commerce site and the internal Search results page is our 3rd most popular landing page. I've also seen Google has often used this page as a "Google-selected canonical" on Search Console on a few pages, and it has thousands of these Search pages indexed. Hoping you can help with the below: To remove these results, is it as simple as adding "noindex/follow" to Search pages? Should I do it incrementally? There are parameters (brand, colour, size, etc.) in the indexed results and maybe I should block each one of them over time. Will there be an initial negative impact on results I should warn others about? Thanks!
Intermediate & Advanced SEO | | Frankie-BTDublin0 -
Can Google Crawl & Index my Schema in CSR JavaScript
We currently only have one option for implementing our Schema. It is populated in the JSON which is rendered by JavaScript on the CLIENT side. I've heard tons of mixed reviews about if this will work or not. So, does anyone know for sure if this will or will not work. Also, how can I build a test to see if it does or does not work?
Intermediate & Advanced SEO | | MJTrevens0 -
Why Is this page de-indexed?
I have dropped out for all my first page KWDs for this page https://www.key.co.uk/en/key/dollies-load-movers-door-skates Can anyone see an issue? I am trying to find one.... We did just migrate to HTTPS but other areas have no problem
Intermediate & Advanced SEO | | BeckyKey0 -
"No Index, No Follow" or No Index, Follow" for URLs with Thin Content?
Greetings MOZ community: If I have a site with about 200 thin content pages that I want Google to remove from their index, should I set them to "No Index, No Follow" or to "No Index, Follow"? My SEO firm has advised me to set them to "No Index, Follow" but on a recent MOZ help forum post someone suggested "No Index, No Follow". The MOZ poster said that telling Google the content was should not be indexed but the links should be followed was inconstant and could get me into trouble. This make a lot of sense. What is proper form? As background, I think I have recently been hit with a Panda 4.0 penalty for thin content. I have several hundred URLs with less than 50 words and want them de-indexed. My site is a commercial real estate site and the listings apparently have too little content. Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
Google places Ad
Like adwords is their any good way of advertising a business place at Google map? If the answer is yes.Can you please take me through the process and give me rough idea about cost?
Intermediate & Advanced SEO | | csfarnsworth0 -
URL with a # but no ! being indexed
Given that it contains a #, how come Google is able to index this URL?: http://www.rtl.nl/xl/#/home It was my understanding that Google can't handle # properly unless it's paired with a ! (hash fragment / bang). site:http://www.rtl.nl/xl/#/home returns nothing, but: site:http://www.rtl.nl/xl returns http://www.rtl.nl/xl/#/home in the result set
Intermediate & Advanced SEO | | EdelmanDigital0 -
Do you bother cleaning duplicate content from Googles Index?
Hi, I'm in the process of instructing developers to stop producing duplicate content, however a lot of duplicate content is already in Google's Index and I'm wondering if I should bother getting it removed... I'd appreciate it if you could let me know what you'd do... For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'. Do you think this is right? Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled? Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time? Thanks FashionLux
Intermediate & Advanced SEO | | FashionLux0 -
How to position in local Google
Hello, It's been easy for me to jump in .com - English only results. But, regional google is making me problems. How to position there? What are the top 3 key elements for ranking locally that don't matter much internationally? Thanks!
Intermediate & Advanced SEO | | DaBomb110