A New Way to Migrate WordPress Content Into Drupal

by Bevan Rudge

The Donald W. Reynolds Journalism Institute (RJI) is an organization that seeks out and tests innovations in journalism to find the best solutions for use in the real world.

Their new Palantir-developed Drupal website replaces a custom PHP website and two WordPress.com blogs. Part of our assignment involved migrating content from RJI's WordPress blogs into Drupal.

Initially, we intended to do this using the WordPress Import module. However, WordPress Import is a stand-alone module that does not integrate with CCK fields, meaning that you cannot import WordPress post categories or authors as CCK text or node-reference fields. It also has limited options for importing files attached to WordPress posts.

To solve this problem, we created WordPress XML for Feeds, a module that allows Drupal's Feeds module to parse the WordPress export file (WXR). It uses a map to create Drupal nodes (or other entities) in the same way that Feeds uses a map to create Drupal nodes from an RSS feed. This allows site developers to define an arbitrary map that tells the Feeds module where and how to store each WordPress post's data in Drupal (e.g., as a CCK field, as a property on the Drupal node, or as some other entity).
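To make the parse-then-map idea concrete, here is a minimal sketch in Python (not the module's own PHP, and not its actual code). It reads the items from a tiny hand-written WXR sample and applies a developer-defined map to each one. The target names (field_author, field_teaser, field_body, field_topics) and the helper functions are hypothetical, and the wp namespace URI assumes a WXR 1.0 export; other export versions use a different version number in that URI.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in WXR files (wp URI assumes export version 1.0).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.0/",
}
MORE_TAG = "<!--more-->"  # WordPress marker separating teaser from the rest

# Hand-written sample, for illustration only.
SAMPLE_WXR = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <item>
      <title>Hello world</title>
      <dc:creator>alice</dc:creator>
      <category domain="category" nicename="news"><![CDATA[News]]></category>
      <category domain="tag" nicename="drupal"><![CDATA[Drupal]]></category>
      <content:encoded><![CDATA[Intro.<!--more-->The rest.]]></content:encoded>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>"""

# Hypothetical map: parsed source name -> target field on the new node.
# Categories and tags deliberately share one target.
FIELD_MAP = {
    "title": "title",
    "author": "field_author",
    "teaser": "field_teaser",
    "body": "field_body",
    "categories": "field_topics",
    "tags": "field_topics",
}


def terms(item, domain):
    """Return the names of an item's terms for one domain (category or tag)."""
    return [c.text for c in item.findall("category") if c.get("domain") == domain]


def parse_items(wxr_text):
    """Yield one dict of source values per <item> in the export."""
    root = ET.fromstring(wxr_text)
    for item in root.iter("item"):
        content = item.findtext("content:encoded", default="", namespaces=NS)
        teaser, sep, rest = content.partition(MORE_TAG)
        if not sep:  # no marker: everything is body, no teaser
            teaser, rest = "", content
        yield {
            "title": item.findtext("title", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "teaser": teaser,
            "body": rest,
            "categories": terms(item, "category"),
            "tags": terms(item, "tag"),
        }


def map_item(source, field_map):
    """Copy source values onto target fields; list values merge per target."""
    node = {}
    for src, target in field_map.items():
        value = source[src]
        if isinstance(value, list):
            node.setdefault(target, []).extend(value)
        else:
            node[target] = value
    return node


nodes = [map_item(src, FIELD_MAP) for src in parse_items(SAMPLE_WXR)]
```

Changing only FIELD_MAP is enough to send the same parsed values to different targets, which is the kind of flexibility the map gives a site developer.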

Here are some things you can do with the WordPress XML for Feeds module that you can't do with the WordPress Import module:

  1. Import the WordPress post's content body to a CCK text field
  2. Import the WordPress post's content teaser and body (excluding the teaser) to two different CCK text fields
  3. Import WordPress author name as a user-reference, node-reference and/or CCK text field
  4. Import WordPress categories or tags as a node-reference and/or CCK text field
  5. Import WordPress categories and/or tags into the same vocabulary
  6. Only import categories and/or tags that are used in the posts (instead of all tags)
  7. Not import categories and/or tags at all
  8. Import enclosures (podcasts) as a FileField
  9. Import attachments as a FileField on the WordPress post's node
  10. Import attachments as their own nodes, different to the WordPress post's node
  11. Import the hierarchy of a comment thread

(Please offer corrections in the issue queue if I have missed anything.)
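As an illustration of item 11, the sketch below (again in Python, and only an assumption about how such a thread could be rebuilt, not the module's implementation) uses the wp:comment_id and wp:comment_parent elements that a WXR file records for each comment. A parent of 0 marks a top-level comment. The sample XML and the build_thread helper are hypothetical, and the namespace URI assumes a WXR 1.0 export.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the wp namespace (assumes export version 1.0).
WP = "{http://wordpress.org/export/1.0/}"

# Hand-written sample item with three comments, for illustration only.
SAMPLE_ITEM = """<item xmlns:wp="http://wordpress.org/export/1.0/">
  <wp:comment>
    <wp:comment_id>1</wp:comment_id>
    <wp:comment_parent>0</wp:comment_parent>
    <wp:comment_content>First</wp:comment_content>
  </wp:comment>
  <wp:comment>
    <wp:comment_id>2</wp:comment_id>
    <wp:comment_parent>1</wp:comment_parent>
    <wp:comment_content>Reply to first</wp:comment_content>
  </wp:comment>
  <wp:comment>
    <wp:comment_id>3</wp:comment_id>
    <wp:comment_parent>0</wp:comment_parent>
    <wp:comment_content>Second</wp:comment_content>
  </wp:comment>
</item>"""


def build_thread(item):
    """Return top-level comments, each with nested replies attached."""
    comments = {}
    for c in item.findall(WP + "comment"):
        cid = int(c.findtext(WP + "comment_id"))
        comments[cid] = {
            "id": cid,
            "parent": int(c.findtext(WP + "comment_parent") or 0),
            "text": c.findtext(WP + "comment_content") or "",
            "replies": [],
        }
    roots = []
    for comment in comments.values():
        parent = comments.get(comment["parent"])
        if parent is None:
            # Parent 0 (or a parent missing from the export) means top level.
            roots.append(comment)
        else:
            parent["replies"].append(comment)
    return roots


thread = build_thread(ET.fromstring(SAMPLE_ITEM))
```

Indexing every comment by id before linking means a reply is attached correctly even if it appears in the file before its parent.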

Because WordPress XML for Feeds leverages the Feeds module, the configuration of the importer and its map can be saved to the database and re-used, exported to code with the Features module, or adapted to meet the needs of a wider range of use cases.

Conveniently, the Feeds module handles all of the issues relating to data processing and content import, keeping WordPress XML for Feeds lean and simple.

Architecturally, WordPress XML for Feeds contains two modules:

  1. WordPress XML for Feeds (wp_feeds) extends Feeds' classes to expose WordPress-specific data contained in the WordPress eXtended RSS (WXR) files.
  2. WordPress Importer (wp_feeds_wxr_importer) is simply a feature package with a configuration similar to WordPress Import's import configuration, but using WordPress XML for Feeds and the Feeds module's API instead.

The WordPress XML for Feeds module works fine without the WordPress Importer (not to be confused with the "WordPress Import" module). However, building a useful configuration without having seen an example can be intimidating. We therefore recommend that you first try out the configuration in the WordPress Importer module (probably on a new, empty instance of Drupal), then tweak it with the Feeds UI module to meet your needs. When you are ready, you can export it to code with the Features module and manage the configuration in a version-control system.

WordPress XML for Feeds has not yet been tested widely and probably still has a few bugs. However, because it's been released on drupal.org, you can try it out and help to iron out the bugs by reporting them or contributing patches to fix them.

Note that the dependency list for WordPress XML for Feeds is large. Pay special attention to the instructions on the project page, especially when it comes to SimplePie, and enabling the Feeds module (feeds) before enabling WordPress XML for Feeds module (wp_feeds) or WordPress Importer module (wp_feeds_wxr_importer).

The WordPress XML for Feeds module allowed us to define, export to code, and version-control a custom configuration for the import of RJI's WordPress blogs to Drupal. We were able to attach different import configurations to different sections of the website. The client was also able to continuously integrate content from their WordPress blogs throughout site development and testing, right up to launch, simply by uploading a fresh WordPress export file. All imported content, including threaded discussions and comments, is available on the new site at the same URLs at which it appeared on the old site.

Comments

Likely because the wordpress_migrate module did not yet exist at the time of the project, or there was no Drupal 6 version available (still true). Bevan can likely provide more details as to why. :)

WP Migrate did not exist.

Though I did consider integration with Migrate module. It would have required an additional manual step to import the WXR file to a local instance of WordPress to get the data into a database, so that Migrate can access it. (RJI's blogs were hosted on WordPress.com—similar to Acquia Gardens—which does not provide raw database dumps or database access.)

It was useful to reduce the manual steps as RJI had a lengthy QA and training period after the primary development phase but before site launch. Reduced manual process allowed continuous integration of content without much manual effort.

An extra step would also have introduced more potential points of failure; i.e., I cannot be confident (without much testing) that all the data exported in a WXR file gets re-imported into a local instance of WordPress. And I did not want to have to dive into WordPress code.

In summary, Feeds integration was easier than integration with Migrate for the WordPress.com blogs. With a self-hosted WordPress blog, a target of Drupal 7 and/or with WP Migrate module, the best solution may well be different.

A lot of big Drupal shops still build on D6. It's still as good as it was before, and contrib is very stable and reliable to estimate jobs on.

At NodeOne we started our first D7 sites only recently, and we still have some D6 projects starting off even now.

The bulk of our work on the project was done in 2010, long before Drupal 7 was released. The vast majority of the work that we have been doing over the last year has been in Drupal 7, but for some clients, Drupal 6 is still the right choice.

Full Fact (itself an innovation in journalism) would have made good use of this last year, so the RJI has hit its target here—thanks.