A New Way to Migrate WordPress Content Into Drupal
by Bevan Rudge
The Donald W. Reynolds Journalism Institute (RJI) is an organization that seeks out and tests innovations in journalism to find the best solutions for use in the real world.
Their new Palantir-developed Drupal website replaces a custom PHP website and two WordPress.com blogs. Part of our assignment involved migrating content from RJI's WordPress blogs into Drupal.
Initially, we intended to do this using the WordPress Import module. However, WordPress Import is a stand-alone module that does not integrate with CCK fields, so you cannot import WordPress post categories or authors as CCK text or node-reference fields. It also has limited options for importing files attached to WordPress posts.
To solve this problem, we created WordPress XML for Feeds, a module that allows Drupal's Feeds module to parse the WordPress export file (WXR). It uses a map to create Drupal nodes (or other entities) in the same way that Feeds uses a map to create Drupal nodes from an RSS feed. This allows site developers to create an arbitrary map that tells Feeds module where and how to store the WordPress post's data in Drupal (e.g., as a CCK field, as a property on the Drupal node, or as some other entity).
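To make the idea of a map concrete, here is a minimal Python sketch of the concept. It is not the module's code (the module is written in PHP on top of Feeds); it only illustrates parsing a WXR-like item into named sources and routing each source to a target. The sample XML, the `MAPPING` dictionary, and target names such as `field_teaser` and `field_topics` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by this sample; real WXR exports declare more (assumption).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Tiny hand-written WXR-like sample, not taken from a real export.
SAMPLE_WXR = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Hello world</title>
      <dc:creator>alice</dc:creator>
      <category domain="category" nicename="news"><![CDATA[News]]></category>
      <category domain="tag" nicename="drupal"><![CDATA[Drupal]]></category>
      <content:encoded><![CDATA[Intro.<!--more-->Rest of post.]]></content:encoded>
    </item>
  </channel>
</rss>"""

# Hypothetical map: source name -> target field. Names are illustrative only.
# Tags and categories deliberately share one target.
MAPPING = {
    "title": "node.title",
    "author": "field_author",
    "teaser": "field_teaser",
    "body": "field_body",
    "tags": "field_topics",
    "categories": "field_topics",
}


def parse_items(xml_text):
    """Yield one dict of named sources per <item> in the export."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        content = item.findtext("content:encoded", default="", namespaces=NS)
        # WordPress marks the end of the teaser with <!--more-->.
        teaser, separator, rest = content.partition("<!--more-->")
        if not separator:
            # No marker: there is no teaser, everything is body.
            teaser, rest = "", content
        terms = item.findall("category")
        yield {
            "title": item.findtext("title", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "teaser": teaser,
            "body": rest,
            "categories": [t.text for t in terms if t.get("domain") == "category"],
            "tags": [t.text for t in terms if t.get("domain") == "tag"],
        }


def apply_mapping(source, mapping):
    """Copy each mapped source value to its target; lists sharing a target are merged."""
    target = {}
    for source_name, target_name in mapping.items():
        value = source.get(source_name)
        if isinstance(value, list):
            target.setdefault(target_name, []).extend(value)
        else:
            target[target_name] = value
    return target
```

Changing where a piece of WordPress data ends up then means editing only the map, not the parser, which is the separation the Feeds-based approach relies on.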
Here are some things you can do with WordPress XML for Feeds module that you can't do with WordPress Import module:
- Import the WordPress post's content body to a CCK text field
- Import the WordPress post's content teaser and body (excluding the teaser) to two different CCK text fields
- Import WordPress author name as a user-reference, node-reference and/or CCK text field
- Import WordPress categories or tags as a node-reference and/or CCK text field
- Import WordPress categories and/or tags into the same vocabulary
- Only import categories and/or tags that are used in the posts (instead of all tags)
- Not import categories and/or tags at all
- Import enclosures (podcasts) as a FileField
- Import attachments as a FileField on the WordPress post's node
- Import attachments as their own nodes, different to the WordPress post's node
- Import the hierarchy of a comment thread
(Please offer corrections in the issue queue if I have missed anything.)
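As an illustration of the last item in the list, the sketch below shows one way a flat list of comments can be nested into a thread. It assumes each comment carries its own ID and its parent's ID (a WXR export stores these as `wp:comment_id` and `wp:comment_parent`, with a parent of 0 for top-level comments). The dictionary keys and the `build_thread` function are hypothetical, and this is not how the module or Feeds implements it.

```python
def build_thread(comments):
    """Nest flat comments under their parents.

    Each input comment is a dict with 'id', 'parent' and 'text' keys
    (hypothetical names). A parent ID that matches no comment, such as 0,
    marks the comment as top-level.
    """
    # First pass: create one node per comment, indexed by ID.
    nodes = {
        c["id"]: {"id": c["id"], "text": c["text"], "replies": []}
        for c in comments
    }
    roots = []
    # Second pass: attach each node to its parent, or to the top level.
    for c in comments:
        node = nodes[c["id"]]
        parent = nodes.get(c["parent"])
        if parent is None or parent is node:
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots


# Sample data, invented for this example.
SAMPLE_COMMENTS = [
    {"id": 1, "parent": 0, "text": "First!"},
    {"id": 2, "parent": 1, "text": "Reply to first"},
    {"id": 3, "parent": 2, "text": "Reply to the reply"},
    {"id": 4, "parent": 0, "text": "Another top-level comment"},
]
```

Building the index in a first pass means replies are attached correctly even when they appear before their parent in the input.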
Because WordPress XML for Feeds leverages the Feeds module, the configuration of the importer and its map can be saved to the database and re-used, exported to code with the Features module, or adapted to meet the needs of a wider range of use cases.
Conveniently, the Feeds module handles all of the issues relating to data processing and content import, keeping WordPress XML for Feeds lean and simple.
Architecturally, WordPress XML for Feeds contains two modules:
- WordPress XML for Feeds (wp_feeds) extends Feeds' classes to expose WordPress-specific data contained in the WordPress eXtended RSS (WXR) export file.
- WordPress Importer (wp_feeds_wxr_importer) is simply a feature package with a configuration similar to WordPress Import's import configuration, but utilizing WordPress XML for Feeds and the Feeds module's API instead.
WordPress XML for Feeds module works fine without the WordPress Importer (not to be confused with the "WordPress Import" module), but building a useful configuration without having seen an example can be intimidating. We recommend that you try out the configuration in the WordPress Importer module first (preferably on a new, empty instance of Drupal), then tweak it with the Feeds UI module to meet your needs. When you are ready, you can export it to code with the Features module and manage the configuration in a version-control system.
WordPress XML for Feeds has not yet been tested widely and probably still has a few bugs. However, because it's been released on drupal.org, you can try it out and help to iron out the bugs by reporting them or contributing patches to fix them.
Note that the dependency list for WordPress XML for Feeds is large. Pay special attention to the instructions on the project page, especially regarding SimplePie and enabling the Feeds module (feeds) before enabling the WordPress XML for Feeds module (wp_feeds) or the WordPress Importer module (wp_feeds_wxr_importer).
WordPress XML for Feeds module allowed us to define, export to code, and version-control a custom configuration for the import of RJI's WordPress blogs to Drupal. We were able to attach different import configurations to different sections of the website. The client was also able to continuously integrate content from their WordPress blogs throughout site development and testing, right up to launch, simply by uploading a fresh WordPress export file. All imported content, including threaded discussions and comments, is available on the new site at the same URLs it had on the old site.