
D8FTW: Storing Data in Drupal 8

An overview of different methods of storing data in Drupal 8.

Newsflash: Storing and retrieving data is rather the point of a Content Management System like Drupal. Of course, not all content is created equal. Some needs a robust structure and curatorial controls built around it, while other data isn't really content at all but administrator-defined configuration. The way those need to work can vary widely.

In Drupal 7, developers had three not-all-that-great ways of storing data: Entities (usually nodes), the Variables table, and "here's an SQL connection, enjoy!" That doesn't cut it for modern sites, unfortunately. What's more, everything was stored in a single SQL database, which is part of what made configuration staging so difficult in Drupal 7: we had to build complex systems to extract configuration from arbitrary SQL tables, serialize it, and put it back in.

Not surprisingly, Drupal 8 has largely fixed that problem by separating out the different types of data that may need to be stored, each with its own dedicated API. Of course, moving from one big blob of data (a.k.a. arbitrary SQL) to structured APIs requires changing the way you think about arbitrary data. So let's review the different ways to store stuff in Drupal 8, and where each of them is useful.

State

The simplest option is the Drupal State API. The State API is a simple key/value pool for values that are, by design, specific to a single Drupal install. Good examples here include the timestamp of the last cron run, generated optimization lookup tables (which should not get cleared as often as the cache does), the currently active theme, and so on. These are all values that are not user-provided configuration, and would make no sense to deploy from staging to production or vice versa.

State can store values of any type, as they will be serialized and unserialized automatically. However, not all object types can be serialized. In particular, any object that has a dependency on a service should never be serialized. Only serialize value objects.

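Note that the first read of each state value in a request is a separate hit against the underlying database (the State class caches values it has already loaded). If you're loading multiple values out of state for some reason, use the getMultiple() method to fetch them together.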

The State API uses a single namespace, so be sure to prefix your state keys with your module name, like "mymodule.last_person_hugged".
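To make the cost model concrete, here is a rough Python model of a State-like store. It is not Drupal's PHP API: `FakeDatabase`, `StateSketch`, and `get_multiple` are hypothetical stand-ins, `pickle` plays the role of PHP's automatic serialization, and the per-request caching that the real State class adds is left out. The point is simply that batching namespaced keys into one call costs one round trip instead of several.

```python
import pickle
from typing import Any, Dict, Iterable


class FakeDatabase:
    """Stand-in for the SQL table behind the state collection; counts round trips."""

    def __init__(self) -> None:
        self.rows: Dict[str, bytes] = {}
        self.queries = 0

    def select(self, keys: Iterable[str]) -> Dict[str, bytes]:
        self.queries += 1  # one call = one round trip, however many keys
        return {k: self.rows[k] for k in keys if k in self.rows}

    def upsert(self, key: str, blob: bytes) -> None:
        self.queries += 1
        self.rows[key] = blob


class StateSketch:
    """Hypothetical State-like store: values serialized on write, unserialized on read."""

    def __init__(self, db: FakeDatabase) -> None:
        self.db = db

    def set(self, key: str, value: Any) -> None:
        # Like PHP's serialize(), pickle is meant here for plain value objects only.
        self.db.upsert(key, pickle.dumps(value))

    def get(self, key: str, default: Any = None) -> Any:
        rows = self.db.select([key])
        return pickle.loads(rows[key]) if key in rows else default

    def get_multiple(self, keys: Iterable[str]) -> Dict[str, Any]:
        return {k: pickle.loads(blob) for k, blob in self.db.select(keys).items()}


db = FakeDatabase()
state = StateSketch(db)
state.set("mymodule.last_person_hugged", {"name": "Alice", "when": 1450000000})
state.set("mymodule.hug_count", 42)

db.queries = 0
values = state.get_multiple(["mymodule.last_person_hugged", "mymodule.hug_count"])
print(db.queries)  # 1: both keys fetched in a single round trip
```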

Key/Value

The State API is itself just an abstraction layer on top of the Key/Value API. The Key/Value API allows storing any serializable value, with keys namespaced by "collection". "state" is simply one collection.
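The collection idea can be sketched in a few lines of Python. This is a hypothetical model, not the Key/Value API itself: a factory hands out one store per collection name, so the same key can live independently in "state" and in a custom collection such as the made-up `mymodule.hugs`.

```python
from typing import Any, Dict


class KeyValueStoreSketch:
    """One named collection: a flat key -> value map."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self._data: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get_all(self) -> Dict[str, Any]:
        return dict(self._data)


class KeyValueFactorySketch:
    """Hands out one store per collection name, creating it on first use."""

    def __init__(self) -> None:
        self._stores: Dict[str, KeyValueStoreSketch] = {}

    def get(self, collection: str) -> KeyValueStoreSketch:
        if collection not in self._stores:
            self._stores[collection] = KeyValueStoreSketch(collection)
        return self._stores[collection]


factory = KeyValueFactorySketch()
state = factory.get("state")          # "state" is just one collection
hugs = factory.get("mymodule.hugs")   # a hypothetical custom collection

state.set("last_run", 1)
hugs.set("last_run", 2)               # same key, different collection: no clash
print(state.get("last_run"), hugs.get("last_run"))  # 1 2
```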

It's also possible to use your own collection by accessing the Key/Value API directly. If you're going to do that, though, it's helpful to define your own class that composes the Key/Value factory service, much as the State API does. At the moment there aren't many tools to replicate that functionality quickly, but the process is straightforward and the State class itself is readily copy-pasteable. Most of its non-trivial code simply caches loaded values so that the Key/Value store is not hit multiple times per request for the same value.
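Here is a sketch of that wrapper pattern, again in Python rather than Drupal's PHP. `CountingFactory`, `CountingStore`, and the `HugLog` class with its `mymodule.hug_log` collection are all hypothetical; the wrapper only illustrates composing a factory and caching loaded values so that a second read of the same key in a request does not touch the store.

```python
from typing import Any, Dict, Iterable, List


class CountingStore:
    """Minimal key/value collection that counts how often it is read."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.reads = 0

    def get_multiple(self, keys: Iterable[str]) -> Dict[str, Any]:
        self.reads += 1
        return {k: self.data[k] for k in keys if k in self.data}

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class CountingFactory:
    """Returns the same store object for a given collection name."""

    def __init__(self) -> None:
        self.stores: Dict[str, CountingStore] = {}

    def get(self, collection: str) -> CountingStore:
        return self.stores.setdefault(collection, CountingStore())


class HugLog:
    """Hypothetical module wrapper: composes the factory, caches loaded values."""

    COLLECTION = "mymodule.hug_log"

    def __init__(self, factory: CountingFactory) -> None:
        self.store = factory.get(self.COLLECTION)
        self.cache: Dict[str, Any] = {}  # lives as long as this object (one "request")

    def get_multiple(self, keys: List[str]) -> Dict[str, Any]:
        missing = [k for k in keys if k not in self.cache]
        if missing:
            self.cache.update(self.store.get_multiple(missing))
        return {k: self.cache[k] for k in keys if k in self.cache}

    def get(self, key: str, default: Any = None) -> Any:
        return self.get_multiple([key]).get(key, default)

    def set(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.store.set(key, value)


factory = CountingFactory()
HugLog(factory).set("last", "Alice")  # request 1 writes the value
log = HugLog(factory)                 # request 2 starts with an empty cache
log.get("last")
log.get("last")
print(factory.get(HugLog.COLLECTION).reads)  # 1
```

Creating a fresh `HugLog` stands in for a new request. One simplification worth knowing: keys that are absent from the store are not remembered as misses, so asking for them again re-queries.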

Content

Content in Drupal 8 means the Entity API. Drupal Entities are much more rigidly structured than the objects in an ORM like Doctrine. Conceptually, there are three layers to the Entity API:

  1. Entity Types define different business logic for different objects. What that logic is varies by Entity Type. Creating a new Entity Type generally involves writing a new PHP class. Examples include Nodes, Users, Taxonomy Terms, Comments, and Custom Blocks.
  2. Entity Bundles are different configurations of the same Entity Type, each with its own Field configuration. Creating one involves setting up configuration, which in (nearly) all cases means an administrator pushing buttons. "Page" nodes, "article" nodes, and "event" nodes are examples of different bundles of the "node" Entity Type.
  3. Fields are the smallest basic unit of Drupal content. A field holds a single rich value: rather than a "string" or an "int", it is a value like an "email address", "formatted text", or a "telephone number". It can also be a reference to another entity. Every entity object can be viewed as a collection of Fields, each of which may be single- or multi-value. (As far as the code is concerned, Fields are always multi-value, but they may be configured to store only one value.)
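The three layers can be modeled loosely in Python. This is an illustrative sketch, not the Entity API: the entity type is reduced to a string (in Drupal it would be a PHP class carrying business logic), a bundle is just a set of field definitions, and every field stores a list of values, with `cardinality` capping its length. The class names, field names, and the use of `-1` for "unlimited" are assumptions of the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FieldDefinition:
    name: str
    field_type: str       # a rich type such as "email" or "telephone", not "string"
    cardinality: int = 1  # maximum number of stored values; -1 means unlimited


@dataclass
class Bundle:
    entity_type: str                   # e.g. "node"; the business-logic class is omitted
    name: str                          # e.g. "event"
    fields: Dict[str, FieldDefinition]


@dataclass
class Entity:
    bundle: Bundle
    values: Dict[str, List[Any]] = field(default_factory=dict)

    def set(self, field_name: str, items: List[Any]) -> None:
        definition = self.bundle.fields[field_name]  # KeyError if the bundle lacks the field
        limit = definition.cardinality
        if limit != -1 and len(items) > limit:
            raise ValueError(f"{field_name} accepts at most {limit} value(s)")
        self.values[field_name] = list(items)

    def get(self, field_name: str) -> List[Any]:
        # Every field reads back as a list, even one configured for a single value.
        return self.values.get(field_name, [])


event = Bundle("node", "event", {
    "field_contact": FieldDefinition("field_contact", "email"),
    "field_phones": FieldDefinition("field_phones", "telephone", cardinality=-1),
})
node = Entity(event)
node.set("field_contact", ["info@example.com"])
node.set("field_phones", ["555-0100", "555-0101"])
print(node.get("field_contact"))  # ['info@example.com']
```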

The key aspect of Content is that it is generally user-generated and of potentially infinite cardinality. (That is, there's no limit on how many entity records a user can create.) Unlike in previous Drupal versions, however, the Entity API is robust enough that it is reasonable to use for nearly all user-provided data rather than just a select few types.

If you want users to be able to enter data into the system, and there's no hard-coded low limit to how many entries they can make, Entities are your tool of choice. Building a custom entity is also much more viable than in past versions, so don't be afraid to define your own Entity Types. There's no need to just piggy-back on nodes anymore.

Content Entities are also translatable into different languages. The capability is baked into all Field types, making multilingual content a breeze.

Configuration

The most important new data system in Drupal 8, though, is the Configuration system. It replaces the variables table, the Features module, half of the CTools module suite, and the myriad custom tables that various modules defined in previous versions with a single, coherent, robust way to store, manage, and deploy administrator-provided configuration.

That last part is key. The Configuration system is your go-to tool if:

  1. Users on the production site should not be changing these values. If they should be changing values on production, you probably meant for it to be Content.
  2. If you have a staging site, you will typically edit these values there and then deploy them to production en masse.
  3. The values affect the business rules of the module or site.

For Drupal 7 users, pretty much anything for which you ever thought "this should really be in a Feature module" now belongs in Configuration. The configuration system is modeled as a namespaced key-value store (although it does not use the Key/Value system internally, as the implementations are quite different). The keys are dot-delimited strings, and the values are specifically "Configuration objects". Config objects have get() and set() methods to manage properties on the object, among other features we won't go into here.
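A minimal Python model of a config object, assuming nested data addressed by dot-delimited keys. `ConfigSketch` and the `mymodule.settings` name are hypothetical; the sketch mirrors the get()/set() shape described above rather than Drupal's actual Config class.

```python
from typing import Any, Dict, Optional


class ConfigSketch:
    """Hypothetical config object: a dotted name plus nested data read by dotted keys."""

    def __init__(self, name: str, data: Optional[Dict[str, Any]] = None) -> None:
        self.name = name  # e.g. "mymodule.settings"
        self.data: Dict[str, Any] = data if data is not None else {}

    def get(self, key: str = "", default: Any = None) -> Any:
        """Return all data for an empty key, else walk "a.b.c" into nested dicts."""
        if not key:
            return self.data
        node: Any = self.data
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node

    def set(self, key: str, value: Any) -> "ConfigSketch":
        """Set a nested value, creating intermediate dicts; returns self for chaining."""
        *parents, last = key.split(".")
        node = self.data
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = value
        return self


config = ConfigSketch("mymodule.settings")
config.set("hugs.max_per_day", 5).set("hugs.enabled", True)
print(config.get("hugs.max_per_day"))  # 5
print(config.get("hugs"))              # {'max_per_day': 5, 'enabled': True}
```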

Most importantly, config objects can be safely serialized to YAML and unserialized from YAML. That's what differentiates the Configuration system from the other data storage systems: its canonical form is not SQL but YAML, which can be loaded into the system or exported from it. Modules can provide default configuration files in YAML, which get imported into the site when the module is installed. A site can also export some or all of its configuration to files in a known directory on disk. That could be hundreds of files, but that's fine. Once in that directory, the files can easily be checked into Git, checked out on another server, and imported back into config objects in Drupal. Configuration deployment: Solved!
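The export/import round trip can be sketched with Python's standard library. Two assumptions to flag: Drupal writes YAML files, but this sketch uses JSON because Python's standard library has no YAML module, and the `sync` directory name is hypothetical. Each config object becomes one file named after its dotted config name, and importing reads the directory back into the same mapping.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def export_config(configs: Dict[str, Dict[str, Any]], directory: Path) -> None:
    """Write one file per config object, named after its dotted config name."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, data in configs.items():
        target = directory / f"{name}.json"
        target.write_text(json.dumps(data, indent=2, sort_keys=True))


def import_config(directory: Path) -> Dict[str, Dict[str, Any]]:
    """Read every exported file back into a name -> data mapping."""
    # Path.stem drops only the final suffix, so "system.site.json" -> "system.site".
    return {path.stem: json.loads(path.read_text()) for path in sorted(directory.glob("*.json"))}


# Export on "staging"; in real life the directory would then go through Git.
staging = {
    "system.site": {"name": "Example"},
    "mymodule.settings": {"hugs": {"enabled": True}},
}
with tempfile.TemporaryDirectory() as tmp:
    sync_dir = Path(tmp) / "sync"  # hypothetical export directory
    export_config(staging, sync_dir)
    production = import_config(sync_dir)

print(production == staging)  # True
```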

You will also run across something called "Configuration Entities". This seemingly mixed concept is a way of providing CRUD behavior using the basic Entity API but backed by the Configuration API. Configuration Entities do not have Fields, but otherwise use essentially the same API. Configuration Entities are useful for cases where a user or administrator may be making multiple instances of a given configuration object. They're also the storage mechanism underlying most Plugins, in practice.
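One way to picture a Configuration Entity is as a family of config objects sharing a name prefix, one object per instance. The Python sketch below is hypothetical (`ConfigEntityStorageSketch` is not a Drupal class); the `views.view` prefix echoes how Views names its config objects, for example `views.view.frontpage`.

```python
from typing import Any, Dict, List, Optional


class ConfigEntityStorageSketch:
    """Hypothetical CRUD layer: each entity is one config object named '<prefix>.<id>'."""

    def __init__(self, config_store: Dict[str, Dict[str, Any]], prefix: str) -> None:
        self.config_store = config_store  # config name -> data, shared with other config
        self.prefix = prefix              # e.g. "views.view"

    def _name(self, entity_id: str) -> str:
        return f"{self.prefix}.{entity_id}"

    def save(self, entity_id: str, data: Dict[str, Any]) -> None:
        self.config_store[self._name(entity_id)] = dict(data, id=entity_id)

    def load(self, entity_id: str) -> Optional[Dict[str, Any]]:
        return self.config_store.get(self._name(entity_id))

    def load_all(self) -> List[Dict[str, Any]]:
        # Any number of instances: list every config object under the prefix.
        start = self.prefix + "."
        return [data for name, data in sorted(self.config_store.items()) if name.startswith(start)]

    def delete(self, entity_id: str) -> None:
        self.config_store.pop(self._name(entity_id), None)


store = {"system.site": {"name": "Example"}}
views = ConfigEntityStorageSketch(store, "views.view")
views.save("frontpage", {"label": "Frontpage"})
views.save("archive", {"label": "Archive"})
print(sorted(store))                        # ['system.site', 'views.view.archive', 'views.view.frontpage']
print([v["id"] for v in views.load_all()])  # ['archive', 'frontpage']
```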

Configuration objects are also translatable, which allows sites to make string value configuration available in the language of their users.

Tempstore

Tempstore is a bit of an odd duck in Drupal 8's data storage world. It's provided not by a core subsystem but by the User module, and there's actually not one but two different tempstores: one private, one shared.

A tempstore is for data that needs to be persisted between requests without being saved back to the canonical storage (such as an entity or configuration object). If that sounds like PHP's native session handling, it should; the use case is very similar. The main difference is the shared tempstore is, as the name implies, shared between users, whereas sessions are, by design, not.

The quintessential (and original) example of that behavior is Views. A View is stored as a configuration entity. You don't want the View to be incrementally updated every time a single field is changed, though; you want to make a series of changes and then save them all at once. Instead, a temporary copy of the View config entity is saved to the shared tempstore every time a setting is changed. That allows changes to survive a browser restart, or a lunch break, without affecting the live copy of the View. It can even be picked up by another user if the first user gets delayed or goes on vacation and forgets to hit save. When the View is saved, the temporary copy is written back to the configuration system and the temporary version is cleared.
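The Views workflow can be modeled in a few lines of Python. Everything here is hypothetical, including `SharedTempStoreSketch`, `edit_view()`, and `save_view()`; it only shows the pattern of editing a draft in a shared store, letting a second user pick it up, and writing it back to the canonical copy on save.

```python
import copy
from typing import Any, Dict, Optional


class SharedTempStoreSketch:
    """Hypothetical shared tempstore: one temporary copy per key, visible to all users."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def set(self, key: str, value: Any, owner: str) -> None:
        self._items[key] = {"owner": owner, "data": value}

    def get(self, key: str) -> Any:
        item = self._items.get(key)
        return None if item is None else item["data"]

    def get_owner(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        return None if item is None else item["owner"]

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


# The live, canonical config and a tempstore holding in-progress edits.
live_config: Dict[str, Dict[str, Any]] = {"views.view.frontpage": {"label": "Frontpage", "pager": 10}}
tempstore = SharedTempStoreSketch()


def edit_view(name: str, setting: str, value: Any, user: str) -> None:
    """Change one setting on the draft copy; the live config is never touched."""
    draft = tempstore.get(name)
    if draft is None:
        draft = copy.deepcopy(live_config[name])  # start a new draft from the live copy
    draft[setting] = value
    tempstore.set(name, draft, owner=user)


def save_view(name: str) -> None:
    """Write the draft back to the canonical config and clear the temporary copy."""
    draft = tempstore.get(name)
    if draft is not None:
        live_config[name] = draft
        tempstore.delete(name)


edit_view("views.view.frontpage", "pager", 20, user="alice")
edit_view("views.view.frontpage", "label", "Home", user="bob")  # bob continues alice's draft
print(live_config["views.view.frontpage"]["pager"])  # 10: live copy untouched
save_view("views.view.frontpage")
print(live_config["views.view.frontpage"])  # {'label': 'Home', 'pager': 20}
```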

The private tempstore works the same way, but its values are not shared between users. That makes it more appropriate for wizard-type interfaces or multi-step forms.

Both tempstores are backed internally by the Key/Value API, specifically by its "expirable" variant, in which values get cleared out eventually (say, if a View is left mid-edit for several days). In practice, unless you're building complex multi-step UIs, you won't run into tempstore very often.
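A rough Python model of both ideas, assuming an injectable clock so that expiry can be demonstrated without waiting: `ExpirableStoreSketch` drops entries once their expiry time passes, and `PrivateTempStoreSketch` scopes keys to a user so one person's wizard state is invisible to another. The class names, the `user:key` key format, and the one-week default lifetime are assumptions of this sketch, not Drupal's implementation.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ExpirableStoreSketch:
    """Key/value store whose entries vanish once their expiry time passes."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set_with_expire(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, self.clock() + ttl)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.clock():
            self._data.pop(key, None)  # lazily purge an expired entry
            return default
        return entry[0]


class PrivateTempStoreSketch:
    """Hypothetical private tempstore: every key is scoped to one user."""

    def __init__(self, store: ExpirableStoreSketch, user: str, ttl: float = 7 * 24 * 3600) -> None:
        self.store, self.user, self.ttl = store, user, ttl

    def set(self, key: str, value: Any) -> None:
        self.store.set_with_expire(f"{self.user}:{key}", value, self.ttl)

    def get(self, key: str) -> Any:
        return self.store.get(f"{self.user}:{key}")


now = [0.0]  # fake clock, in seconds
store = ExpirableStoreSketch(clock=lambda: now[0])
alice = PrivateTempStoreSketch(store, "alice", ttl=3600)
bob = PrivateTempStoreSketch(store, "bob", ttl=3600)
alice.set("wizard_step", 2)
print(alice.get("wizard_step"), bob.get("wizard_step"))  # 2 None
now[0] = 7200.0  # two hours later
print(alice.get("wizard_step"))  # None: the entry expired
```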

Cache

And finally, we have the cache system. Drupal 8's cache system is actually far more robust than its predecessors, and is heavily leveraged by the rendering system. That's a topic for another time, though. For now, we're just looking at cases where you'll use it directly.

The general rule for caching something is "is it more expensive to compute this value than to look up an old version from the database?" Database calls are not cheap. (Even if you're not using SQL, you're still making some kind of I/O call, which is typically the most expensive thing a program does.) Don't put something in the cache system until and unless you know it's going to help. Good candidates are usually other I/O-intensive operations, like a web service call or a complex set of queries.

Another important rule for caching is that it should be for performance only. If the cache is wiped clean of all data, your code should still run. It may run slower, and the site may even run too slowly to be usable, but no irreplaceable data has been lost. Never, ever store data in the cache that you cannot recreate on demand if needed. Similarly, don't store generated, regenerable data elsewhere. That belongs in the cache.
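Both rules fit the classic cache-aside pattern, sketched here in Python. `MemoryCache`, `get_or_compute()`, and the `mymodule:weather` cache ID are hypothetical, not Drupal's cache API: on a miss the value is recomputed and stored, and wiping the cache only means the expensive function runs again.

```python
from typing import Any, Callable, Dict

_MISS = object()  # sentinel so that cached falsy values (0, None, "") still count as hits


class MemoryCache:
    """Tiny in-memory cache bin; anything in it may disappear at any time."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def get(self, cid: str) -> Any:
        return self._items.get(cid, _MISS)

    def set(self, cid: str, data: Any) -> None:
        self._items[cid] = data

    def delete_all(self) -> None:
        self._items.clear()


def get_or_compute(cache: MemoryCache, cid: str, compute: Callable[[], Any]) -> Any:
    """Return the cached value, or recompute and store it on a miss.

    Correctness never depends on the cache: a wiped cache only costs time.
    """
    cached = cache.get(cid)
    if cached is not _MISS:
        return cached
    value = compute()
    cache.set(cid, value)
    return value


calls = []


def expensive_lookup() -> Dict[str, int]:
    calls.append(1)  # stands in for a slow web service call
    return {"temperature": 21}


cache = MemoryCache()
get_or_compute(cache, "mymodule:weather", expensive_lookup)
get_or_compute(cache, "mymodule:weather", expensive_lookup)  # served from cache
cache.delete_all()  # e.g. a full cache clear
result = get_or_compute(cache, "mymodule:weather", expensive_lookup)
print(result, len(calls))  # {'temperature': 21} 2
```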

What to choose?

With so many options, how do you know where to put your data? While the lines are not always crystal clear, the following pointers should cover most cases: 

  • Is it purely a performance optimization, and the data can be regenerated if needed? If yes, Cache.
  • Should it be configured on staging and pushed to production? If yes, use the Configuration system. If there will be an arbitrary number of them, use Config Entities.
  • Was it in the variables table before, but not something to push from staging to production? If so, it likely belongs in State.
  • Is it user-generated content on the live site? Most likely it should be a Content Entity.
  • Is it a large amount of unstructured data that will only need to be looked up by ID, and is not something that will be deployed from staging to production? Consider a custom Key/Value collection.
  • Is it a session-like place to temporarily store one of the other values above? That's what tempstore is for.

What, no database?

You'll probably note the lack of "an SQL table" in the above list. That's because in Drupal 8 you will rarely, if ever, interact with the database directly. In fact, a given site may not even be using an SQL database! These more robust, abstracted tools should cover the vast majority of data storage cases. If you need to use SQL for performance reasons, though, there is a supported way to make even that flexible. More on that in our next installment of D8FTW.