Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?
-
Hi,
A client's site has a significant amount of original content that has been blatantly copied and pasted onto various competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend taking when the new, updated site is launched to ensure Google clearly attributes the content to the client's site first?
One thing I will be doing is submitting new XML and HTML sitemaps.
Thank you
-
There are no "best practices" established for the tag's usage at this point. On the one hand, it could technically be used on every page; on the other, it arguably should only be used when the page is an article, blog post, or other piece of an individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add it to EVERY page of the client's website, including the homepage? So even the usual about us/contact etc. pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions for helping them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served a specific individual purpose.
So with Schema, the theory is there's a new, unified framework that can grow and evolve, without having to come up with individual solutions.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find the tag and either strip it out or replace its contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted until at least early next year, Google is encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
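To make the point about scrapers stripping these tags concrete, here is a rough sketch (not anything Google provides) that checks whether a given page still carries an "original-source" meta tag or any rel="author" links. The class and function names are made up for illustration, and it only uses Python's standard html.parser module. It assumes the tag takes the form of a meta element with name="original-source" and a content attribute.

```python
from html.parser import HTMLParser


class AttributionAudit(HTMLParser):
    """Collects attribution markers found in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.original_source = []  # content values of <meta name="original-source">
        self.author_links = []     # href values of <a>/<link> tags with rel="author"

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Assumed tag shape: <meta name="original-source" content="URL">
        if tag == "meta" and (attr.get("name") or "").lower() == "original-source":
            self.original_source.append(attr.get("content"))
        # rel can hold several space-separated values, so split before checking
        rel_values = (attr.get("rel") or "").lower().split()
        if tag in ("a", "link") and "author" in rel_values:
            self.author_links.append(attr.get("href"))


def audit(html):
    """Return the attribution markers present in an HTML string."""
    parser = AttributionAudit()
    parser.feed(html)
    return {
        "original_source": parser.original_source,
        "author_links": parser.author_links,
    }
```

Running audit() against your own page and against a suspected copy shows at a glance whether the copy kept, removed, or rewrote the markers.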
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post, there is a section added at the end recommending rel="canonical".
-
Always have a sitemap.xml file that includes all the URLs you want indexed. Right after publishing, submit the sitemap.xml file (or files, if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the meta "original-source" tag in the head section of each page.
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
This does not guarantee with 100% certainty that you'll get proper attribution; however, these are the best steps you can take in that regard.
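For the sitemap step, a minimal sketch of generating the file (or files) might look like the following. It assumes you already have a plain list of the URLs you want indexed; the function name is hypothetical, and the 50,000-URL split reflects the per-file limit in the sitemaps.org protocol. Submitting the result through the webmaster tools is still a manual step.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file URL limit in the sitemaps.org protocol


def build_sitemaps(urls, max_per_file=MAX_URLS_PER_FILE):
    """Return a list of sitemap XML strings, one per chunk of URLs."""
    documents = []
    for start in range(0, len(urls), max_per_file):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + max_per_file]:
            entry = ET.SubElement(urlset, "url")
            # ElementTree escapes characters such as & in the URL text
            ET.SubElement(entry, "loc").text = url
        body = ET.tostring(urlset, encoding="unicode")
        documents.append('<?xml version="1.0" encoding="UTF-8"?>\n' + body)
    return documents
```

Each string in the returned list can be written to its own file (for example sitemap-1.xml, sitemap-2.xml) before submission.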