Way to spider Wordpress site
-
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/ the URL I get out of the spider is /page/index.html And those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTack Website Copier and PageNest
Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
-
Hi Dan
Hmm that's a little strange. Two things;
- is WordPress updated? Do you get the normal URLs when viewing in your browser?
- have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages Although it won't get the actual HTML on the pages, it could solve the URL issue perhaps.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not so experienced in migrating a WP to non -wp but I understand that the issue you're having is that the spider is returning index.htmlfiles for urls like domain/page/.
IT's normal, any spider you will use you'll always have and index.html file. Every directory has it's index.html which is the default file to show if you're not establishing something different with rewrite rules.
If you write /page/ the browser will read the index.html file. What you have to be sure is that you'll set up a 301 redirect to avoid any index.html url to show and have it redirected to the main / page (with wildcards is a one line rule) and that your internal links are pointing all to / pages and not to index.html version of it. You can jsut find and replace the /index.html" string into the html code with the /" text (dreamweaver or any html editor will do that in bulk.
Only one commentary on you idea is that you may consider useful to build a php driven site, using includes for header, footer and nav/sidebar, jsut because thinking ahead if you're willing to make changes to a portion of the page repeating throughout the site you'll have to make changes in all pages and uplaod them all which is quite huge to do and also let space for many human/machine errors.
Hope that helped you out!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Our client's site was owned by former employee who took over the site. What should be done? Is there a way to preserve all the SEO work?
A client had a member of the team leave on bad terms. This wasn't something that was conveyed to us at all, but recently it came up when the distraught former employee took control of the domain and locked everyone out. At first, this was assumed to be a hack, but eventually it was revealed that one of the company starters who unhappily left the team owned the domain all along and is now holding it hostage. Here's the breakdown: -Every page aside from the homepage is now gone and serving a 404 response code -The site is out of our control -The former employee is asking for a $1 million ransom to sell the domain back -The homepage is a "countdown clock" that isn't actively counting down, but claims that something exciting is happening in 3 days and lists a contact email. The question is how we can save the client's traffic through all this turmoil. Whether buying a similar domain and starting from square one and hoping we can later redirect the old site's pages after getting it back. Or maybe we have a legal claim here that we do not see even though the individual is now the owner of the site. Perhaps there's a way to redirect the now defunct pages to a new site somehow? Any ideas are greatly appreciated.
Technical SEO | | FPD_NYC0 -
Site hacked in Jan. Redeveloped new site. Still not ranking. Should we change domain?
Our top ranking site in the UK was hacked at the end of 2014. http://www.ultimatefloorsanding.co.uk/ The site was the subject of a manual spam action from Google. After several unsuccessful attempts to clean it up, using Securi.net and reinstating old versions of the site, changing passwords etc. we took the decision to redevelop the site. We also changed hosting provider as we had received absolutely no support from them whatsoever in resolving the issue. So far we have: Removed the old website files off the server Developed a new website having implemented 301's for all the old URL's (except the spam ones) Submitted a reconsideration request for the manual spam action, which was accepted. Disavowed all the spammy inbound links through Webmaster Tools Implemented custom URL parameters through Google to not index the SPAM URLs ( which were using parameters) Our organic traffic is down by 63% compared to last year, and we are not ranking for most of our target keywords any longer. Is there anything that I am missing in the actions I have taken so far? We were advised that at this stage changing domain and starting again might be the way to go. However the current domain has been used by us since 2007, so it would be a big call. Any advice is appreciated, thanks. Sue - http://www.ultimatefloorsanding.co.uk/
Technical SEO | | galwaygirl0 -
Tracing Redirects to a Site
I wonder if anyone has used any tools where you can trace the redirects pointing to a site? I know there are a number of tools out there that can be used to check where a URL redirects to, but I was wondering if anyone has used a tool where I could trace all redirects with the final URL? I am using this for competitor research so I don't have access to Analytics or Webmaster Tools.
Technical SEO | | BeattieGroup0 -
Site Launching, not SEO Ready
Hi, So, we have a site going up on Monday, that in many ways hasn't been gotten ready for search. The focus has been on functionality and UX rather than search, which is fair enough. As a result, I have a big list of things for the developer to complete after launch (like sorting out duplicate pages and adding titles that aren't "undefined" etc.). So, my question is whether it would be better to noindex the site until all the main things are sorted before essentially presenting search engines with the best version we can, or to have the site be indexed (duplicate pages and all) and sort these issues "live", as it were? Would either method be advisable over the other, or are there any other solutions? I just want to ensure we start ranking as well as possible as quickly as possible and don't know which way to go. Thanks so much!
Technical SEO | | LeahHutcheon0 -
Cache Not Working on Our Site
We redesigned our site (www.motivators.com) back in April. Ever since then, we can't view the cache. It loads as a blank, white page but the cache text is at the top saying: "This is Google's cache of http://www.motivators.com/. It is a snapshot of the page as it appeared on Jul 22, 2013 15:50:40 GMT. The current page could have changed in the meantime. Learn more. Tip: To quickly find your search term on this page, press Ctrl+F or ⌘-F (Mac) and use the find bar." Has anyone else ever seen this happen? Any ideas as to why it's happening? Could it be hurting us? Advice, tips, suggestions would be very much appreciated!
Technical SEO | | Motivators0 -
Wordpress Page vs. Posts
My campaigns are telling me I have some duplicate content. I know the reason but not sure how to correct it. Example site here: Bikers Blog is a "static page" referencing each actual "blog post" I write. This site is somewhat orphaned and about to be reconstituted. I have a number of other sites with a similar problem. I'm not sure how to structure the "page" so it only shows a summary of the blog post on the page not the whole post. Permalinks is set as "/%postname%/" I've posted on Wordpress.org with no answer. Since this is an SEO issue I thought maybe someone with WP experience could chime in. Thanks, Don
Technical SEO | | NicheGuy0 -
Want to Target Mobile site for Google Mobile Version and Desktop Site for Google Desktop Version
I have ecommerce site with both mobile version and desktop version. Mobile version starts with m.example.com and full version starts with www.example.com I am using same content through out both site and using 301 redirection by detecting user agent vice-versa. My both sites are accessible to crawl by any google spider. I have submitted both sites's sitemap to GWT and mobile site having mobile sitemap xml, so google can easily recognize my mobile site. Is it going to help to rank my both sites as per my expectation? I need to rank for mobile site in Google mobile and ranking for desktop site in Google desktop version. Some of pages of my mobile site are started to appearing in Google desktop version. So how I can stop them to appear in Google desktop? Your comments are highly welcome.
Technical SEO | | Hexpress0 -
See your sites Architecture
Does anybody know a problem where you can see how your internal linkings look to the search engines?
Technical SEO | | ScottBaxterWW0