MediaWiki REL1_39
ExternalStore

ExternalStore is an optional feature that enables persistent object storage outside the main database, primarily for revision text (also known as a "blob").

The main public interface for interacting with ExternalStore is ExternalStoreAccess. Note, however, that higher-level concepts such as MediaWiki\Revision\RevisionRecord and text blobs have their own dedicated interfaces: MediaWiki\Revision\RevisionStore and MediaWiki\Storage\BlobStore, respectively.

Concepts

URL

Objects in external stores are internally identified by a special URL. The URL is of the form <store protocol>://<location>/<object name>.
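As an illustration of the URL form above, the following is a minimal Python sketch that splits such a URL into its three parts. It is not MediaWiki's own code (which is PHP); the names StoreUrl and parse_store_url and the sample URL are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class StoreUrl(NamedTuple):
    """The three parts of an external store URL (hypothetical helper type)."""
    protocol: str
    location: str
    object_name: str


def parse_store_url(url: str) -> StoreUrl:
    """Split '<store protocol>://<location>/<object name>' into its parts."""
    protocol, sep, rest = url.partition("://")
    if not sep or not protocol:
        raise ValueError(f"missing protocol in {url!r}")
    # Split on the first slash only, so the object name may contain slashes.
    location, sep, object_name = rest.partition("/")
    if not sep or not location or not object_name:
        raise ValueError(f"missing location or object name in {url!r}")
    return StoreUrl(protocol, location, object_name)


# Sample value assumed for illustration only.
print(parse_store_url("DB://cluster1/12345"))
```

The sketch splits the string by hand rather than using a generic URL parser, because generic parsers may normalize the case of the protocol, and the protocol is later used to select a class name.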

Protocol

The protocol represents which ExternalStoreMedium class is used. The following protocols are supported by default:

Multiple protocols may be enabled at the same time, for example to support reading older data while using a different protocol for new data.

Protocols are configured via $wgExternalStores. The name of the ExternalStoreMedium class is derived by appending the value from $wgExternalStores to the string ExternalStore, with a ucfirst transformation applied as needed.

A custom protocol called "foobar" could be configured by implementing ExternalStoreMedium in a subclass called ExternalStoreFoobar.
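The naming rule described above can be sketched in Python as follows. This is an illustration of the rule only, not the actual lookup code; the function name medium_class_name is hypothetical, and the "DB" input is assumed to be the protocol value that maps to the ExternalStoreDB class.

```python
def medium_class_name(protocol: str) -> str:
    """Derive a medium class name from a protocol value (illustrative only)."""
    # Imitate PHP's ucfirst(): upper-case the first character and leave the
    # rest untouched. str.capitalize() would also lower-case the remainder,
    # which would turn "DB" into "Db".
    return "ExternalStore" + protocol[:1].upper() + protocol[1:]


print(medium_class_name("foobar"))
print(medium_class_name("DB"))
```

For ASCII protocol names this matches the behavior the text describes; non-ASCII names are outside the scope of this sketch.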

Location

The location identifies a particular instance of a given store protocol.

In the case of ExternalStoreDB, the location represents a database cluster (one or more database servers that hold the same data).

When using the default of LBFactorySimple, these clusters can be configured via $wgExternalServers. Otherwise, external clusters must be configured via $wgLBFactoryConf.

New insertions

The destination of newly stored text blobs is configured via $wgDefaultExternalStore. To enable use of ExternalStore for new blobs, this must be set to a non-empty array. It can be disabled to store new blobs in the main database instead; doing so does not affect how existing blobs are read.

Each destination uses a partial URL of the form <store protocol>://<location>. When a blob is inserted, we randomly pick an available protocol/location pair from this list. Insertions will fail over to another default destination if the chosen one is unavailable.
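The selection and fail-over behavior described above can be approximated by the following Python sketch. It is a simplified model under stated assumptions, not MediaWiki's implementation: the function insert_with_failover and the try_insert callback are hypothetical, and the callback is assumed to return the full URL of the stored blob, or None when the destination is unavailable.

```python
import random
from typing import Callable, Optional, Sequence


def insert_with_failover(
    data: bytes,
    destinations: Sequence[str],
    try_insert: Callable[[str, bytes], Optional[str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Try the default destinations in random order; return the blob's URL."""
    if not destinations:
        raise ValueError("no default destinations configured")
    if rng is None:
        rng = random.Random()
    # Shuffle a copy so the caller's configuration list is left unchanged.
    candidates = list(destinations)
    rng.shuffle(candidates)
    for destination in candidates:
        url = try_insert(destination, data)
        if url is not None:
            return url
        # This destination was unavailable; fall through to the next one.
    raise RuntimeError("all default destinations are unavailable")


def demo_insert(destination: str, data: bytes) -> Optional[str]:
    """Hypothetical medium in which cluster1 is currently unavailable."""
    if destination == "DB://cluster1":
        return None
    return destination + "/1"


print(insert_with_failover(b"text", ["DB://cluster1", "DB://cluster2"], demo_insert))
```

Shuffling the whole list once gives both the random initial pick and a defined order for fail-over, which keeps the sketch short; a real implementation may choose differently.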

Append-only

ExternalStore is designed as an append-only system, to persist data in a way that is highly reliable and immutable. As such, the interface is restricted to fetch and insert operations, and specifically does not permit modification or deletion once data is stored.

This design benefits MediaWiki in a number of ways:

  • The limited interface provides flexibility to each protocol implementation.
  • Caching is trivial and safe.
  • Stable references to external store can be kept outside of it, in the core database and anywhere else in caching or other storage layers, without needing to track or propagate changes.
  • Historical data can be stored with high reliability guarantees and operational safety:
    • External database clusters may be operated in read-only mode, directly through MySQL.
    • Each replica within the cluster may operate as an independent static backup.
    • Database replication between hosts may be turned off.
    • Even command-line access from outside MediaWiki can't accidentally affect historical data.
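To make the append-only restriction and the caching benefit listed above more concrete, here is a small in-memory Python model. It is a sketch under assumptions, not the real interface: the classes AppendOnlyStore and CachingReader, their method names, and the way object names are assigned are all hypothetical.

```python
from typing import Dict


class AppendOnlyStore:
    """In-memory stand-in for one location; exposes insert and fetch only."""

    def __init__(self, protocol: str, location: str) -> None:
        self._prefix = f"{protocol}://{location}"
        self._blobs: Dict[str, bytes] = {}

    def insert(self, data: bytes) -> str:
        # Every insert creates a new object; existing objects never change.
        url = f"{self._prefix}/{len(self._blobs)}"
        self._blobs[url] = data
        return url

    def fetch(self, url: str) -> bytes:
        # Raises KeyError for an unknown URL.
        return self._blobs[url]


class CachingReader:
    """Caches fetched blobs forever; safe because stored data is immutable."""

    def __init__(self, store: AppendOnlyStore) -> None:
        self._store = store
        self._cache: Dict[str, bytes] = {}
        self.misses = 0

    def fetch(self, url: str) -> bytes:
        if url not in self._cache:
            self.misses += 1
            self._cache[url] = self._store.fetch(url)
        return self._cache[url]


store = AppendOnlyStore("DB", "cluster1")
reader = CachingReader(store)
url = store.insert(b"revision text")
reader.fetch(url)
reader.fetch(url)
print(url, reader.misses)
```

Because there is no update or delete operation, the cache in this model never needs invalidation, which is the property the list above refers to as trivial and safe caching.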

During maintenance tasks such as recompression, we generally iterate through known blobs, write new blobs as needed, and gracefully update pointers accordingly. Once an entire cluster has been copied or recompressed to a new location, it can be taken out of rotation, with any storage space freed at that time. Note that multiple locations may be physically colocated on the same hardware, e.g. by running multiple instances of MySQL. That said, it may be simpler to free space by doing recompression during other routine maintenance, such as when migrating data from old to new hardware.
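The recompression pattern described above (write new blobs, then update pointers, never modify old blobs) might look roughly like the following Python sketch. It does not reflect MediaWiki's maintenance scripts or storage formats: the function recompress, the dictionaries standing in for the pointer table and the two locations, and the use of zlib are all assumptions made for illustration.

```python
import zlib
from typing import Dict


def recompress(
    pointers: Dict[int, str],
    old_blobs: Dict[str, bytes],
    new_blobs: Dict[str, bytes],
    new_prefix: str,
) -> int:
    """Copy blobs to a new location in compressed form; return the count moved."""
    moved = 0
    for key, old_url in list(pointers.items()):
        if old_url not in old_blobs:
            # The pointer refers to some other location; leave it alone.
            continue
        # Write a new blob first, so the pointer is valid at every step.
        new_url = f"{new_prefix}/{len(new_blobs)}"
        new_blobs[new_url] = zlib.compress(old_blobs[old_url], 9)
        pointers[key] = new_url
        moved += 1
    # old_blobs is never modified; it can be retired once no pointer uses it.
    return moved


old = {"DB://cluster1/0": b"aaaa" * 100, "DB://cluster1/1": b"bbbb" * 100}
new: Dict[str, bytes] = {}
pointers = {101: "DB://cluster1/0", 102: "DB://cluster1/1"}
print(recompress(pointers, old, new, "DB://cluster2"), pointers)
```

After the run, every pointer refers to the new location and the old location still holds its original data unchanged, so it can be taken out of rotation as a whole.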