Way to spider Wordpress site
-
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/ the URL I get out of the spider is /page/index.html And those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTack Website Copier and PageNest
Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
-
Hi Dan
Hmm that's a little strange. Two things;
- is WordPress updated? Do you get the normal URLs when viewing in your browser?
- have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages Although it won't get the actual HTML on the pages, it could solve the URL issue perhaps.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not so experienced in migrating a WP to non -wp but I understand that the issue you're having is that the spider is returning index.htmlfiles for urls like domain/page/.
IT's normal, any spider you will use you'll always have and index.html file. Every directory has it's index.html which is the default file to show if you're not establishing something different with rewrite rules.
If you write /page/ the browser will read the index.html file. What you have to be sure is that you'll set up a 301 redirect to avoid any index.html url to show and have it redirected to the main / page (with wildcards is a one line rule) and that your internal links are pointing all to / pages and not to index.html version of it. You can jsut find and replace the /index.html" string into the html code with the /" text (dreamweaver or any html editor will do that in bulk.
Only one commentary on you idea is that you may consider useful to build a php driven site, using includes for header, footer and nav/sidebar, jsut because thinking ahead if you're willing to make changes to a portion of the page repeating throughout the site you'll have to make changes in all pages and uplaod them all which is quite huge to do and also let space for many human/machine errors.
Hope that helped you out!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Partner Sites
Hi All, Within our company we have a media group that publishes magazines and videos, the sites have footers that link to our shopping site, one of them has 118,459 links to one URL, domain authority 23, and the other 17,726 to seven URLs, domain authority 52, (there are some articles which link organically). My question is are these links because they're from identifiable companies with the same ownership worth keeping or are they detrimental? The site being linked to has a DA of 39 Cheers Stew
Technical SEO | | StewMcG0 -
Switching site from http to https. Should I do entire site?
Good morning, As many of you have read, Google seems to have confirmed that they will give a small boost to sites with SSL certificates this morning. So my question is, does that mean we have to switch our entire site to https? Even simple information pages and blog posts? Or will we get credit for the https boost as long as the sensitive parts of our site have it? Anybody know? Thanks in advance.
Technical SEO | | rayvensoft1 -
301 redirecting old content from one site to updated content on a different site
I have a client with two websites. Here are some details, sorry I can't be more specific! Their older site -- specific to one product -- has a very high DA and about 75K visits per month, 80% of which comes from search engines. Their newer site -- focused generally on the brand -- is their top priority. The content here is much better. The vast majority of visits are from referrals (mainly social channels and an email newsletter) and direct traffic. Search traffic is relatively low though. I really want to boost search traffic to site #2. And I'd like to piggy back off some of the search traffic from site #1. Here's my question: If a particular article on site #1 (that ranks very well) needs to be updated, what's the risk/reward of updating the content on site #2 instead and 301 redirecting the original post to the newer post on site #2? Part 2: There are dozens of posts on site #1 that can be improved and updated. Is there an extra risk (or diminishing returns) associated with doing this across many posts? Hope this makes sense. Thanks for your help!
Technical SEO | | djreich0 -
Site Crawl
I was wondering if there was a way to use SEOmoz's tool to quickly and easily find all the URLs on you site and not just the ones with errors. The site that I am working on does not have a site map. What I am trying to do is find all the URLs along with their titles and description tags. Thank you very much for your help
Technical SEO | | pakevin0 -
Site not indexing correctly
I am trying to figure out what is going on with my site listings. Google is only displaying my title and url - no description. You can see it when you search for Franchises for Sale. The site is www.franchisesolutions.com. Why could this happen? Also I saw a big drop off in a handful of keyword rankings today. Could this be related?
Technical SEO | | franchisesolutions0 -
Mini site links?
Can anyone point me to information about the "mini" site links on the Google search results or tell me how to get them set up? These aren't the full site links that show 3 by 3 under the first listing but small text links that appear for certain results. (See attached image for reference.) Are these something that can controlled/requested? NAj6E.png
Technical SEO | | DVanSchepen0 -
Traffic has dropped from my site.
Hello, I never had amazing traffic, but during the last week my site seems to have almost dropped of search engines. Nothing drastic has changed during this time that I can see would have caused this. The site is http://www.comparebestodds.com Does any one have any ideas that can help? Thanks
Technical SEO | | jwdesign0 -
How to setup tumblr blog.site.com to give juice to site.com
Is it possible to get a subdomain blog.site.com that is on tumblr to count toward site.com. I hoped I could point it in webmaster tools like we do www but alas no. Any help would be greatly appreciated.
Technical SEO | | oznappies0