Help finding website content scraping
-
Hi,
I need a tool to help me review sites that are plagiarising / directly copying content from my site. But the tools I'm aware of, such as Copyscape, appear to work with individual URLs and not a root domain. That's great if you have a particular post or page you want to check. But in this case, some sites are scraping thousands of product pages, so I need to submit the root domain rather than an individual URL.
In some cases, other sites are being listed in SERPs above, or even instead of, our site for product search terms. So far I have only stumbled across these cases, rather than proactively researching offending sites.
So I want to enter my root domain and have the tool review all of my internal pages, then report the other domains where an individual page has a certain amount of duplicated copy. It would work in the same way that Moz crawls a site for internal duplicate pages, but externally: I need a list of duplicate content by domain & URL, so that I can contact the offending sites to request that they remove the content and, if they don't, send the list to Google as evidence.
Any help would be greatly appreciated.
Terry
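The comparison step described in the question (check each of your own pages against pages on other domains and flag the ones sharing a certain amount of copy) can be sketched in Python. This is only a minimal sketch, not an existing tool: it assumes you have already fetched and extracted the text of your own pages and of candidate external pages, and it uses word shingling with a containment score, which is one common near-duplicate technique chosen here for illustration. The function names, the 5-word shingle size, and the 0.6 threshold are all hypothetical choices.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(own, other):
    """Fraction of our page's shingles that also appear on the other page."""
    if not own:
        return 0.0
    return len(own & other) / len(own)


def find_duplicates(own_pages, candidate_pages, threshold=0.6, k=5):
    """Group suspected copies by external domain.

    own_pages and candidate_pages map URL -> already-extracted page text.
    The threshold is an assumed cut-off, not a recommended value.
    """
    own_sets = {url: shingles(text, k) for url, text in own_pages.items()}
    report = defaultdict(list)
    for their_url, their_text in candidate_pages.items():
        their_set = shingles(their_text, k)
        domain = urlparse(their_url).netloc
        for own_url, own_set in own_sets.items():
            score = containment(own_set, their_set)
            if score >= threshold:
                report[domain].append((own_url, their_url, round(score, 2)))
    return dict(report)
```

Containment is used instead of a symmetric similarity score because a scraper page often wraps the copied text in its own navigation and ads; what matters is how much of your page shows up on theirs. The result is a dictionary keyed by external domain, each entry listing your URL, their URL, and the score, which matches the "by domain & URL" list asked for. Comparing every pair of pages will not scale to thousands of product pages against the open web, so in practice the candidate pages would need to be narrowed down first.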
Related Questions
-
My Website is getting too many DMCA Hits
My website has been getting too many DMCA hits since last December, and then my rankings dropped. I would like to know whether getting a new domain would be advisable, and whether it would be good to redirect the website that is getting DMCA hits to the new domain. Is it advisable to build links for the new domain, or would the redirect pass link juice to it (it has some spammy links, though)?
White Hat / Black Hat SEO | | emmycircle0 -
Third-party http links in the page source: Social engineering content warning from Google
Hi, We have received a "Social engineering content" warning from Google, and one of our important pages and its internal pages have been flagged as "Deceptive site ahead". We wonder what the reason behind this is, as Google didn't point to the specific part of the page that made it look deceptive. We don't employ any such content on the page, and the content has been the same for many months. As our site is hosted on WordPress, we used a WordPress plugin for this page's layout, which injected 2 http (non-https) links into our page code. We suspect this may be the reason. Any ideas? Thanks
White Hat / Black Hat SEO | | vtmoz1 -
How long before our website bounces back after a Google penalty?
One of our client websites recently got hacked. In a span of 4 days, it received random backlinks from random websites with random anchor texts. We were already in good standing for some of the keywords we are tracking, but the attack got us a penalty from Google and we lost our rankings, moving out of the top 500. We have already disavowed these dirty backlinks, though we never really diagnosed where they came from. How long do you think it will take our client's website to bounce back from the penalty?
White Hat / Black Hat SEO | | SirAdri110 -
Internal Links & Possible Duplicate Content
Hello, I have a website which has kept losing positions since February 6. I have not received any manual actions in the Search Console. However, I read the following article a few weeks ago and it looks a lot like my case: https://www.seroundtable.com/google-cut-down-on-similar-content-pages-25223.html I noticed that Google has removed 44 of the 182 pages of my website from its index. The pages that have been removed can be considered similar to those of the website mentioned in the article above. The problem is that there are about 100 more pages that are similar to these. They are pages that describe the cabins of various cruise ships, each containing one picture and one sentence of at most 10 words. So, to a human this is not duplicate content, but what about the search engine, bearing in mind that sometimes that little sentence can be the same? And let's say that I remove all these pages and present the cabin details dynamically on one page, instead of 15 for example, which reduces the size of the website from 180 pages to 50 or so: how will this affect SEO with regard to internal links? Thank you for your help.
White Hat / Black Hat SEO | | Tz_Seo0 -
Duplicate content - multiple sites hosted on same server with same IP address
We have three sites hosted on the same server with the same IP address. For SEO reasons (to avoid duplicate content) we need to redirect the IP address to the site, but there are three different sites. If we use the "rel canonical" code on the websites, these codes will be duplicated too, as the versions of the sites served at the IP address are mirrors of the websites, e.g. www.domainname.com/product-page and 23.34.45.99/product-page. What's the best way to solve these duplicate content issues in this case? Many thanks!
White Hat / Black Hat SEO | | Jade0 -
Need help determining how toxic this backlinking is
Okay, so my company has an SEO company already. However, we're trying to get people internally cross-trained on SEO, so I've been selected to kind of do a crash-course in SEO and look at our site from a new perspective. We are in the process of getting our old site ported over to a new one we've also created on Wordpress. I've been doing a LOT of online research, but this is definitely a very new field for me. Here's our current site: www.cedrsolutions.com So, here's my question: While doing some SEO-optimizing automatic tests on our site, I came across some weird backlinks to one of our pages: http://www.cedrsolutions.com/dental-office-manual/ http://en.calameo.com/read/003415063525a885728e7 Here's the thing: We didn't make this. It looks HORRIBLE, the copy is gibberish, and it looks weird. Doing some more searching, I started finding stuff like this https://lessons.engrade.com/dentalofficemanual/1 http://pumosust.over-blog.com/2014/09/how-to-get-customized-dental-office-manuals-online.html https://www.youtube.com/watch?v=egMonqa5eRo (???? I don't even understand how someone did this, the photo in the book is just the photo from our page) http://www.tuugo.in/Companies/cedr-hr-solutions/0150008267958#! http://www.webjam.com/dental_office_manual/$my_blog/2014/09/12/how_to_get_customized_dental_office_manuals_online Conservatively, I'd say there's at least 100 of these types of pages out there linking to us, maybe more Then I started finding comments on blogs http://blog.kenexa.com/hr-focus-on-increasing-revenue-not-just-managing-costs/ http://geekologie.com/2012/05/bad-ideas-boyfriend-visits-dentist-ex-da.php (some NSFW language on that one) So, my first thought is obviously "Okay, these are gibberish, over-optimized, and ALL of them are trying to bump our relevancy for something along the lines "Dental office manual" EDIT: I should also mention these links ALL just appeared out of thin air. A whole bunch in early July, and more in mid-September. 
They didn't just slowly accumulate. So (finally) here's my questions: 1. Did our current SEO company probably do this? The only thing they've mentioned before is that they were going to create some backlinks for us, with an assurance they'd be genuine links that would build Pagerank without getting us slapped by Google. 2. Am I correct in my opinion that these are toxic links that could get manual action taken against us by Google? I'm not sure how LIKELY it is (as again, there's only about 100 or so) but they seem to be violating multiple Google principles. With how often Google pushes out algorithm updates I feel like we could still get busted for this even if the links are like 6-7 months old and not sending us much traffic. I'm asking because I've been told to set up a conference call with the account manager at our current SEO place, and I want to know what I'm getting into. I might be wildly over-reacting about nothing, I might be kind of right but it's not that bad, or I might be 100% right and what they are doing is not cool at all, and could kill our SEO if we get busted by Google. I'm not sure which it is. Checking Google webmaster tools and analytics, I don't see any drops in organic traffic between July '14 and now, so I don't think we've been smacked by Google algorithm-wise. And there's no notice from Google of manual action being taken, or anything being wrong with our backlinks, so I'm fairly confident these links haven't hurt us at least as of today. I'm just worried going forward (especially when we finish the new site and submit it to Google to get crawled, the URLs will be the same) Sorry this was so long. I'm kind of nervous, honestly. On the one hand, these backlinks seem SUPER sketchy to me, but on the other hand, I don't KNOW any of this stuff. It sounds kind of ridiculous for me, someone with maybe 3 weeks of intense Google-education in SEO, to be questioning something a real, established SEO company is doing. 
I mean, I kind of have to assume they know better, right?
White Hat / Black Hat SEO | | CEDRSolutions1 -
We seem to have been hit by the Penguin update, can someone please help?
Hi, Our website www.wholesaleclearance.co.uk has been hit by the Penguin update. I'm not an SEO expert, and when I first started with SEO I got caught up buying blog links. That was about 2 years ago, and since then I have worked really hard to get good manual links. Does anyone know of a way to dig out any bad links so I can get them removed? Is there any software that will give me a list, or would any of you guys want to take a look for me? I'm willing to pay for the work. Kind regards, Karl.
White Hat / Black Hat SEO | | wcuk0 -
New website: 301 redirection of an established domain
Hello, I am launching a new website which will host user-generated content. Based on my brand name, I have purchased a new domain. In order to improve SEO rankings, I was considering purchasing a good-quality domain (one with great backlinks) and then 301-redirecting it to the new brandname.co.in domain. Does this work? Is there any harm in doing this? Does the link juice pass naturally? Warm regards
White Hat / Black Hat SEO | | ShoutOut0