Way to spider a WordPress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (too many hacks). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML on the pages, it could perhaps solve the URL issue.
This BlackHatWorld thread has a few options too.
-Dan
-
Hi Dan, I'm not that experienced in migrating a WordPress site to a non-WordPress one, but I understand the issue you're having is that the spider returns index.html files for URLs like domain/page/.
That's normal: whichever spider you use, you'll always end up with an index.html file. Every directory has its own index.html, which is the default file served unless you set up something different with rewrite rules.
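To make that mapping concrete, here is a minimal Python sketch of how a static web server typically picks a file for a requested path. The function name `file_for_url` and the default `index.html` are assumptions for illustration only; real servers make the index file name configurable (Apache's DirectoryIndex directive, for example).

```python
def file_for_url(url_path: str, index_name: str = "index.html") -> str:
    """Return the relative file a static server would typically serve.

    Hypothetical helper: a path ending in "/" is treated as a directory,
    so the directory's index file is served instead.
    """
    trimmed = url_path.lstrip("/")
    if url_path.endswith("/") or trimmed == "":
        # Directory-style URL: serve the index file inside that directory.
        return trimmed + index_name
    # Anything else is assumed to name a file directly.
    return trimmed


print(file_for_url("/page/"))       # directory URL
print(file_for_url("/about.html"))  # plain file URL
```

So the spider saving /page/index.html does not by itself change the public URL: as long as the new server serves index.html for directory requests, www.domain.com/page/ keeps working.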
If you request /page/, the server will serve that directory's index.html file. What you need to make sure of is that you set up a 301 redirect so that no index.html URL is shown and each one redirects to its / version (with wildcards it's a one-line rule), and that all your internal links point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
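If you'd rather script the bulk find-and-replace than use an editor, something like the following Python sketch could work. It is only an illustration under a few assumptions: the function names and the folder name in the usage note are made up, the spidered copy is assumed to use double-quoted link attributes, and only files ending in .html are touched. Run it on a backup copy first.

```python
from pathlib import Path

# The exact strings from the find-and-replace described above.
OLD = b'/index.html"'
NEW = b'/"'


def strip_index_links(html: bytes) -> bytes:
    """Point links that end in /index.html" at the directory URL instead."""
    return html.replace(OLD, NEW)


def rewrite_site(root: str) -> int:
    """Rewrite every .html file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_bytes()  # bytes, so the file encoding is untouched
        updated = strip_index_links(original)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

Calling `rewrite_site("mirror")` (where `mirror` is a hypothetical folder holding the spidered copy) returns the number of files it modified. Links written with single quotes, or a bare `index.html` with no leading slash, are not matched by this sketch and would need their own pattern.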
One comment on your idea: you may find it useful to build a PHP-driven site, using includes for the header, footer and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout the site, with static files you'll have to make the change in every page and upload them all, which is a big job and leaves room for many human/machine errors.
Hope that helped you out!