How to download an entire Website (HTML only), ready to rehost
-
Hi all,
I work for a large retail brand and we have lots of counterfeit sites ranking for our products. Our legal team seizes the websites from the owners, who then set up more counterfeit sites, and so forth.
As soon as we seize control of a website, the site content is deleted and subsequently it falls out of the SERPs to be immediately replaced by the next lot of counterfeit sites.
I need to be able to download a copy of the site before it is seized, so that once I have control of it I can put the content back and hopefully quickly regain the SERPs (with an additional 'counterfeit site' notice superimposed on that page in JS).
Does anyone know of, or can anyone recommend, good software to download an entire website so that it can be easily rehosted?
Thanks
FashionLux
(Edited title to reflect only wanting to download html, CSS and images of site. I don't want the sites to actually be functional - only appear the same to Google)
-
Thanks for the detailed explanation.
If you know of any software or techniques to crawl and download multiple (html) pages and images of a site then please let me know.
There are many programs designed to crawl websites and grab the html code. Legitimate sites are often duplicated in this manner. You can try searching a couple relevant terms or searching black hat seo sites.
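If you would rather script it than hunt for a tool, here is a minimal Python sketch of that kind of crawler, using only the standard library. Treat it as an illustration under assumptions rather than a finished product: the names `crawl`, `local_path` and `fetch_bytes`, the `max_pages` cap and the file-naming scheme are all made up for this example. It only follows `<a href>` links on the same host, so images and CSS would need an extra pass.

```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free URLs linked from one page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, h)).url for h in parser.hrefs]


def local_path(url, out_dir):
    """Map a URL to a file under out_dir (hypothetical naming scheme).

    Paths with no file extension are treated as directories and saved
    as index.html. Query strings are ignored in this sketch.
    """
    path = urlparse(url).path
    if path.endswith("/") or not Path(path).suffix:
        path = path.rstrip("/") + "/index.html"
    return Path(out_dir) / path.lstrip("/")


def fetch_bytes(url):
    """Download one URL and return the raw response body."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def crawl(start_url, out_dir, fetch=fetch_bytes, max_pages=500):
    """Breadth-first crawl of same-host pages, saving each one to disk."""
    host = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    saved = []
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            body = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        target = local_path(url, out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
        saved.append(url)
        text = body.decode("utf-8", errors="replace")
        for link in extract_links(text, url):
            # Stay on the original host and never queue a URL twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

Calling something like `crawl("http://example.com/", "site_copy")` would write the pages under `site_copy/`, mirroring the URL paths. Pages that differ only by query string would overwrite each other here, so a database-driven shop may need a smarter naming rule.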
-
"If it is a very basic pure html/css site, you can pretty much achieve your goal." - Yes this is exactly what I need, I don't want the site to be functional and allow users to place orders (which could happen for non-JS users who don't see the notice that fills the entire screen). I don't want to do anything apart from rehost the site and put a big message up that says "THIS SITE WAS A SCAM - BEWARE OF OTHER SCAM SITES" and cannot be closed down.
"Do you obtain control over just the domain?" - Yes only the domain, not the hosting. We go through legal proceedings to prove the site is illegally selling counterfeit goods and obtain the blank domain.
"I understand your intentions are good, but the method is not compliant with Google's Guidelines." Fair point, but Google shouldn't rank these sites in the first place - they have no genuine links and should be banned already. Google aren't spotting this, so I have to fix Google's **** up. If the site gets banned I couldn't care less. Whilst they rank they serve a genuine purpose of (a) showing users that there are counterfeit sites and they need to be wary and (b) new sites have to better the SEO ability of the old ones in order to rank on page 1.
"Your goal is purely to manipulate search engine results which makes these activities black hat and subject to penalty." Yes but it doesn't matter if the domain is banned, it's not my genuine website and has no links back to my genuine site. I'm not going to host the sites on the same server as our genuine site so no risk to the company. Really I couldn't care less if it gets banned - the counterfeit sites are ranking due to black hat techniques - it's in my interest for Google to eventually work it out and fix their algo as it will stop the hundreds of other counterfeit sites from ranking too.
"You can use every social media page, etc. If you put in the time and effort, these pages will rank very well in SERPs."
Yes I could, by building links to social media pages for the hundreds of search terms currently dominated by counterfeiters but this is not a good idea for two reasons:
1. Trying to rank social media sites for irrelevant terms isn't a good thing - you wouldn't do it for users if this situation wasn't happening. As you said already, this is a form of trying to manipulate SERPs and I wouldn't want to risk these genuine SocMed pages getting banned because of this.
2. There are hundreds of search terms to optimise for, and 8 remaining slots on Google to fill for many of these. These sites are also powerful in their SEO strength - 17 counterfeit sites made it into Majestic's top 1 million sites by links - these sites have literally tons of scummy, comment box spammed links pointing at them and they are ranking (shame on Google). Competing against these isn't possible via white hat methods and I'm not a black hat kind of guy.
My thought process is - Why try and compete against these sites (and waste A LOT of time and effort) trying to bump them down the rankings when they've already done the hard work of optimisation and link building for these terms? I could simply 're-use' them for a genuine purpose (making our customers beware of ordering from unofficial websites).
The previous owner won't sue us for re-using their content - that involves making themselves known to authorities and they'd get arrested in turn for their illegal activities.
I'm happy to debate it more as it's an interesting subject and I don't want to waste time going down the wrong route, but I think re-using the sites is the best option - I just need to get copies of them so they LOOK the same to Google and hopefully keep their SERPs.
If you know of any software or techniques to crawl and download multiple (html) pages and images of a site then please let me know.
Thanks for all of the responses
FashionLux
-
Thanks for the response.
"you can download the html but not the files themselves" - the html is all I need. I don't want the site to actually work so having only the html files is perfect.
I can go to the homepage and manually save it, and go through 100+ pages and manually download them - I just wanted to ask if there was any software that would do this for me and save some leg work.
Thanks again
-
Most sites are database driven. The public does not have direct access to the database. Accordingly you cannot download the full functioning website in the manner you desire.
If it is a very basic pure html/css site, you can pretty much achieve your goal.
Do you obtain control over just the domain? Or do you have access to their hosting account? If you gain access to the hosting account, you can request the host restore the site from a backup.
Even if you gain access to the full site, you really need to be careful. Your goal is purely to manipulate search engine results which makes these activities black hat and subject to penalty. I understand your intentions are good, but the method is not compliant with Google's Guidelines.
If you own the brand, and you have a trademark, you can build quality sites promoting the brand. You can use every social media page, etc. If you put in the time and effort, these pages will rank very well in SERPs.
Some great legal victories are being won in the US to help with these types of issues. Coach recently won a similar case. It's great to hear the good guys are gaining some ground.
-
Dude, you won't be able to do that, the files are stored on the server behind a password-locked folder.
Like Ryan said you can download the html but not the files themselves.
As long as you get the content that should be enough, put it into a word doc and paste it back up once you have the domain, doesn't even need a template.
You need to stop them from re-using the content on another site.
-
Hi Ryan,
Thanks for the reply. To clarify, the site is deleted prior to me gaining control of it - by the time it comes into my hands it's completely blank, so FTP'ing isn't an option.
The site owners are essentially scamming members of the public by charging hundreds of dollars for goods that are never delivered. We've seized hundreds of sites through legal proceedings, but more keep popping up the moment we get hold of them.
These sites rank for hundreds of popular search terms (some have hundreds/thousands of spammy inbound links), so bumping them off page 1 for all SERPs isn't achievable.
By seizing the sites, keeping the content, but making the site non-functioning (imagine a popup image that fills the screen and can't be escaped) it will hopefully mean we own these SERPs and new counterfeit sites have to try and outrank them.
In turn we'll seize those sites, so the next wave of counterfeit sites have to do even more link building - eventually (maybe years) they'll realise it's not worth it and give up.
Manually downloading individual webpages isn't an option, so I'm wondering if there are any programs that can download all the HTML files for a website, so I can then just upload them via FTP once the site has been seized and add my JavaScript image.
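For the notice step, a rough Python sketch of stamping a full-screen warning into every saved page before upload. Note that it swaps the JavaScript popup for a plain HTML/CSS overlay, which non-JS visitors would see too. The function names, the inline styling and the `*.html` file pattern are assumptions made for illustration only.

```python
import re
from pathlib import Path

# Hypothetical overlay markup; the wording is the message quoted in this thread.
NOTICE = (
    '<div style="position:fixed;top:0;left:0;width:100%;height:100%;'
    'z-index:99999;background:#fff;color:#c00;font-size:2em;'
    'text-align:center;padding-top:20%">'
    "THIS SITE WAS A SCAM - BEWARE OF OTHER SCAM SITES</div>"
)

BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def add_notice(html, notice=NOTICE):
    """Insert the notice just before the last </body>, or append it."""
    matches = list(BODY_CLOSE.finditer(html))
    if not matches:
        return html + notice
    cut = matches[-1].start()
    return html[:cut] + notice + html[cut:]


def add_notice_to_site(site_dir):
    """Add the notice to every .html file under site_dir; return the count changed."""
    changed = 0
    for page in Path(site_dir).rglob("*.html"):
        text = page.read_text(encoding="utf-8", errors="replace")
        if NOTICE in text:
            continue  # already stamped, so re-running is harmless
        page.write_text(add_notice(text), encoding="utf-8")
        changed += 1
    return changed
```

Running `add_notice_to_site("site_copy")` on a local copy leaves the original markup in place for crawlers and only adds the overlay on top.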
Thanks for all of the responses
FashionLux
-
Based on your question I am not clear if the site is deleted prior to your gaining control over the site.
If you are trying to copy a site before you have control over it, all you can do is download the HTML of the various web pages. If you spend a bit more time, you may be able to figure out file names on the server and download them, but that is moving down a path of internet security and hacking.
If you are trying to copy a site after you have control over it, the easiest method to capture everything would be a cPanel backup. cPanel is the most popular software used to administer Apache web servers. That is the most likely hosting environment for counterfeit sites. A single cPanel backup will capture everything.
Otherwise you can go through and copy the public_html folder (or whatever the main folder is called, it will vary based on server setup) along with the database and other settings you wish to retain such as e-mail.
Understand the old site owner will still have all the passwords and an understanding of the code. While it is unlikely, they could leave themselves backdoors into the site as well. This is one reason why maintaining their site is not likely to be a good idea.
Once you begin running these sites from your server, what is the plan? You would place a "counterfeit" notice and then ??? that's it? Or would you redirect them to your site? If you redirect them to your site and maintain these sites up on an ongoing basis, it can be seen as a network of doorway sites.
I understand what you are doing and why. The issue is you are taking actions purely based on search engine rankings. To do such for a short period such as 30-60 days is likely fine. To do it on a more permanent basis will likely lead you to a penalty.
-
Hi Dean,
Heather is right! You should access the websites through FTP. Also, if there are databases then you should be able to export the data from the software that is managing them.
Istvan
-
Hi Dean
Could you not just use your FTP client (like Filezilla or Dreamweaver) to pull the entire site content down, save it locally, ready to upload later? Or do you not have FTP details of the sites you're taking over?
Sorry if I've misunderstood the question
Heather