
How Unique Does Content Need to Be to Perform Well in Search Engines?

Rand Fishkin

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.


We all know that content needs to be unique to rank highly in the SERPs, but how "unique" are we talking? From a content creation perspective, it's imperative to know what duplicate content really means and to understand the implications it can have on SEO.

In this week's Whiteboard Friday, Rand discusses what makes content unique in the eyes of the crawlers, and the bane of duplicate content.


Video Transcription

"Howdy SEOmoz fans, and welcome to another edition of Whiteboard Friday. This week I want to take some time to talk about content duplication and content uniqueness, which is very important from an SEO perspective. It can also be important from a content marketing perspective.
For SEO purposes, search engines like to filter out what they view as duplicative content, things that are exactly the same. They never want to show you a set of results where results two, three, four, and five are all exactly the same article, or are essentially the same three paragraphs repeated with the same photos embedded in them. It could be that content gets licensed among different parties; news vendors do this a lot. It could be that someone has committed plagiarism and actually stolen a piece. It could just be that someone is posting the same article in several different places on the web that accept content submissions. In any case, the engines are trying to filter this type of behavior out. They don't want to show that content because they know how users think: "If I didn't like this result on this website, chances are I'm not going to like it at result number three on a different website." So they try and filter this stuff out.
From an SEO perspective and for content creators, it's therefore very important to understand, "What does that really mean? What is meant by duplicate content, and how unique do I really need to be?"
The first thing I always like to point out in any discussion of content uniqueness is that when we talk about the content the engines are considering, we're referring only to the unique material on a page. That excludes navigation, ads, footers, sidebars, etc.
I've got a page mockup over here, and you would exclude all this stuff: the logo, the navigation, the sidebars. Maybe this person is running some ads in the sidebar. Maybe they've got a little piece about themselves and a bunch of text down the right-hand side. Then they think, "Boy, I only have a couple of lines of text on this page, and a photo, and maybe a couple of bullet points. Is this unique from these other pages that look exactly the same except for the different content in the content section?" This is the content. If you're worried that, "Oh no, I think my pages might be kind of heavy and my content kind of light," I wouldn't worry too much about that so long as you're doing everything else right. We'll talk about some of those things.

Number two, uniqueness applies to both internal and external sources. Copying either one can be trouble. It could be that these are other pages on your site, or other pages somewhere else on the web where this content exists, and you're taking from those and putting those pieces on your site. That can be a problem in either case. With internal duplication, the engines will usually try to ignore it if it's small and subtle and just happens here and there. It's like, "Oh, there are four different versions of this page because they've got a print version and a mobile version. Okay. We'll try and canonicalize and figure that out."
You would be wise in these situations to use something like a rel=canonical. Or if you're consolidating pages after a big site move or a re-architecture, a 301 redirect is the proper tool. But you should also be aware that this can happen with external content.
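To make that concrete, the rel=canonical mentioned here is a single tag in the duplicate page's head. A minimal sketch, assuming a hypothetical print version that should consolidate to its primary URL:

```html
<!-- Placed in the <head> of the print/mobile duplicate. The URL is a
     hypothetical example of the primary version you want engines to index. -->
<link rel="canonical" href="https://www.example.com/blog/original-article" />
```

The 301, by contrast, is a server-side redirect: rather than hinting at the preferred version, it permanently sends visitors and crawlers, along with most of the link equity, to the new URL, which is why it fits consolidation after a site move.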
However, when I say that, I don't mean, and I know a lot of people in the SEO world get confused about this, that you can't take a paragraph from Wikipedia and put it in a bigger article you're writing, or cite a blogger and include a couple of phrases they use, or clip and caption a piece from New York Magazine, the Wall Street Journal, Wired, or wherever you like, or embed a video that already exists on YouTube. That's not duplicative so long as you are adding unique value.
Number three, uniqueness alone is not enough. Some people get lost in the minutiae of the rules around SEO and search engines and think, "Well, this content exists nowhere else on the web. I just took someone else's and changed all the words." You have technically provided unique content, but you have not provided unique value, which is a very different thing. What I mean by "unique value," and what the search engines would like you to do and are building algorithms around, is providing value that no other sources, no other sites on the web, are specifically providing. That could mean you look at the visitor's intent, the searcher's intent, or your customer's intent and say, "Hey, I'm going to answer each of the things this person is trying to achieve."
If somebody searches for hotels in Cape Town, South Africa, well, they're probably looking for a listing of hotels, but they probably have other intents as well. They might be interested in other things related to traveling there. They could want to know about the weather. They could want to know about the neighborhoods where these hotels are located. Providing unique value, as opposed to just, "Hey, I'm going to take the content from Expedia's website and rewrite the paragraph about each hotel," is what matters; the rewrite alone is not going to help you. But if you were to do something like what Oyster Hotels does, where they actually send a reporter with a camera, a journalist essentially, to the location, take tons of their own unique photos, and write about the weather and the neighborhood and the hotel cleanliness, investigating all these sorts of things and providing true unique value as well as unique content, now you're achieving the uniqueness that search engines mean when they distinguish unique from duplicate.
Number four, there's this idea that exists in the minds of folks in the SEO field, and has for a long time, that there must be some mythical percentage. As if over here it's 100% duplicate and 0% unique, over here it's 100% unique, and this is the 50/50 mark, and there must be some imaginary, magical point: "If I just get to right here at 41%, that's the number. Therefore I'm going to create a huge website, and all my pages just have to hit that mark." That is dead wrong. Just totally wrong. Nothing like this exists.
The algorithms are so much more sophisticated than an exact percentage of what is and isn't duplicate, even when it comes to just studying the content on the page. That specific percentage doesn't exist. The engines use a vast array of inputs. I'll give you some examples.
You can see, for example, that when an article published on many different news sites moves out of Google News and into Google's main index, sometimes duplicates will appear, and often the duplicates that rank are the ones that are most linked to, that have lots of comments, that have been shared socially quite a bit, or where Google has seen usage data on those sites suggesting that each one provides some sort of unique value, even if the content is exactly the same.
Bloomberg and BusinessWeek, for example, are constantly publishing the same articles. Business Insider will republish articles from all over the place. Huffington Post will take articles that writers submit, and the same piece will be published in different places. People will publish on one site and then republish on their own personal blog. Sometimes Google will list both; sometimes it won't. It's not about a percentage. It's about the unique value that's provided, and about a very sophisticated algorithm that considers lots of other features.
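To see why a single number is so crude, here's a sketch of what the mythical "percent duplicate" check would look like if it existed: a Jaccard overlap over word shingles. This is a hypothetical illustration of the naive model being debunked here, using made-up example text; it is not how any engine actually scores duplication.

```python
def shingles(text, k=3):
    """Split text into a set of overlapping k-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def naive_duplicate_percentage(doc_a, doc_b, k=3):
    """Jaccard similarity of two documents' shingle sets, as a percentage."""
    a, b = shingles(doc_a, k), shingles(doc_b, k)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)

# Two near-identical snippets score high on raw overlap, yet a real engine
# would also weigh links, engagement, and which source adds unique value.
print(naive_duplicate_percentage(
    "the hotel is near the beach and very clean",
    "the hotel is near the beach and fairly clean"))
```

A one-dimensional score like this says nothing about which copy deserves to rank, which is exactly the point: the engines decide with many signals, not a threshold.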
If you are in a space where you're competing with other people who are posting the same content, think about unique value, and think about earning the usage signals, the branding, the social shares, and the links. All of those things will be taken into consideration when it comes to, "Are we going to rank your site, or this other site that's licensing your content or from whom you are licensing content?" Domain authority can play a big role there.
The last thing I want to mention is duplicate and low-value content. Because of Google's Panda update from 2011, low-quality, duplicative content that exists on one part of your site can actually harm your site overall. I'd be very cautious if you're thinking, "Hey, let's produce an article section on our site that's just these 5,000 articles we licensed from this other place, or that we're copying from someone's blog. We might not get much SEO value from it, but we will get a little bit of extra search engine traffic." In fact, that can hurt you, because as the Panda algorithm runs its course and sees, "Boy, this site looks like it copied some stuff," Google might hurt your rankings in other places.
Google's been very specific about this: duplicate, low-quality content in one area can harm you across your entire site. Be mindful of that. If you're nervous about it, you can block that stuff in robots.txt so engines don't crawl it. You can rel=canonical it back up to a category page. You could even keep it out of search engines entirely with a meta robots noindex tag, or disallow crawling of those pages inside your Google Webmaster Tools. These are all options for that kind of stuff.
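For reference, the two blocking mechanisms named here look like this. A minimal sketch, assuming a hypothetical /licensed-articles/ directory holds the duplicated section:

```text
# robots.txt at the site root: asks crawlers not to fetch the section at all
User-agent: *
Disallow: /licensed-articles/
```

The meta robots alternative goes on each duplicated page instead; it lets engines crawl the page but asks them not to index it:

```html
<!-- In the <head> of each page you want kept out of the index -->
<meta name="robots" content="noindex">
```

The trade-off: robots.txt prevents crawling (so engines may never see a canonical or noindex on those pages), while meta noindex requires crawling but reliably drops the pages from the index.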
All right everyone. Hope you've enjoyed this edition of Whiteboard Friday and you'll go out there and create some unique and uniquely valuable content, and we'll see you again next week. Take care."

Video transcription by Speechpad.com



