Latest posts made by dailynaukri
-
RE: Duplicate without user-selected canonical excluded
Sorry, I mean PDF files only.
-
RE: Duplicate without user-selected canonical excluded
Hello Daniel,
The PDFs are duplicates from another site.
The thing is, we have already disallowed the PDFs in the robots.txt file.
Here is what happened: we have a set of pages (let's call them content pages) that we had disallowed in robots.txt because they had thin content. Those pages link to their respective third-party PDFs, and the links are marked nofollow. The PDFs are also disallowed in robots.txt.
A few days ago, we improved our content pages and removed them from robots.txt so that they can be indexed. The PDFs are still disallowed. Despite that, Search Console is now reporting the PDF pages as "Duplicate without user-selected canonical."
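For context, the setup described above corresponds to robots.txt rules along these lines (the paths are hypothetical placeholders, not the site's actual paths):

```text
# robots.txt — hypothetical paths illustrating the setup described above
User-agent: *
# The content pages were disallowed earlier and have since been
# removed from this file so they can be crawled and indexed again.
# Disallow: /content-pages/
# The third-party PDF copies remain blocked:
Disallow: /wp-content/uploads/pdfs/
```

Worth noting: a Disallow rule blocks crawling, not indexing, so Google can still discover and report blocked URLs it finds through links, which is consistent with the Search Console status described here.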
I hope that makes it clear. Any insights now, please?
-
Duplicate without user-selected canonical excluded
We have PDF files uploaded to the WordPress media library and used on our website. As these PDFs are duplicate content from the original publishers, we have marked links to these PDF URLs as nofollow. The pages are also disallowed in robots.txt.
Now, Google Search Console has marked these pages Excluded as "Duplicate without user-selected canonical."
As it turns out, we cannot use a canonical tag with PDF pages to point to the original PDF source.
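One aside on that point: while a PDF has no HTML markup to carry a `<link rel="canonical">` tag, Google does accept the equivalent `rel="canonical"` HTTP header for non-HTML files. A minimal Apache .htaccess sketch (the filename and publisher URL are made-up placeholders, and `mod_headers` must be enabled):

```apache
# Send a canonical hint in the HTTP response for a specific PDF
<Files "sample-notification.pdf">
  Header add Link '<https://publisher.example.com/sample-notification.pdf>; rel="canonical"'
</Files>
```

Note that Google can only see this header if the PDF is crawlable, so it would not take effect while the PDFs stay disallowed in robots.txt.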
If we embed a PDF viewer on our website and fetch the PDFs by passing the URLs of the original publisher, would the PDFs still be read as text by Google and create a duplicate-content issue again? Another concern: when a PDF expires and is removed, that would lead to a 404 error.
If we send our users to the third-party website instead, it would add to our bounce rate.
What is the appropriate way to handle duplicate PDFs?
Thanks
-
How to manage user profiles on your website?
We have a real estate website on which agents and builders can create their profiles. My question is: should we use h1 or h2 tags on business profile pages, or build them according to Web 2.0 standards? If header tags are used and two agents have the same name, each with an h2 tag, the search results page will end up with two identical h2s. Can someone please tell me the right way to manage business profiles on a website?
Thanks
-
Job Posting Page and Structured Data Issue
We have a website where we post jobs. We add the data to our website manually.
The job postings are also covered by various other websites, including the original recruiting organisations, and the details of each posting remain the same, for instance the eligibility criteria, exam pattern, syllabus, etc.
We create pages that list the jobs, and we keep the detailed pages, which carry the duplicate data, disallowed in robots.txt.
Lately, we have been thinking of getting these pages indexed as well, since the number of non-indexed pages is very high. Some of our competitors have these pages indexed, but we are not sure whether doing this is the right move, or whether there is a safe way to deal with it. Additionally, some job posts have very little data (fees, age limit, salary, etc.), which is thin content and might contribute to a poor-quality issue.
Secondly, we want to use enriched results for our job postings, but Google doesn't want the structured data on listing pages:
"Put structured data on the most detailed leaf page possible. Don't add structured data to pages intended to present a list of jobs (for example, search result pages). Instead, apply structured data to the most specific page describing a single job with its relevant details."
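Per that guidance, the markup would go on each job-detail page rather than the listing. A minimal schema.org JobPosting sketch in JSON-LD (all values are invented placeholders, not real postings):

```json
{
  "@context": "https://schema.org/",
  "@type": "JobPosting",
  "title": "Junior Clerk",
  "description": "<p>Eligibility criteria, exam pattern, and syllabus details go here.</p>",
  "datePosted": "2020-01-15",
  "validThrough": "2020-02-15T23:59",
  "hiringOrganization": {
    "@type": "Organization",
    "name": "Example Recruiting Board"
  },
  "jobLocation": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "New Delhi",
      "addressCountry": "IN"
    }
  }
}
```

This would sit in a `<script type="application/ld+json">` block on the detail page; for Google to read it, that page must be crawlable, i.e. not disallowed in robots.txt.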
Now, how do we handle this situation? Is it safe to allow, in robots.txt, the detailed pages that contain duplicate and sometimes not-so-high-quality job data?