Wikibase
MediaWiki Wikibase extension
Secondary storage for Item and Property terms in SQL is needed for efficient and atomic lookup and query of the terms of multiple entities in multiple languages.
For example, when rendering an Item page, the labels of all other entities it refers to need to be known. The alternative to secondary storage would be loading each of the full entities in order to look up the terms needed.
The code for the storage lives in the Wikibase\Lib\Store\Sql\Terms namespace.
Writing to the secondary storage happens through a deferred update after each edit on entities. This makes saving edits faster and more atomic, which in turn reduces the failure rate of saving edits. As a result, secondary storage might not always be completely in sync with the actual terms stored in the primary storage.
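The effect of deferring the write can be illustrated with a small sketch. It is not MediaWiki's or Wikibase's actual deferred update mechanism; the `Store` class and its methods are hypothetical names used only to show why the secondary storage can briefly lag behind the primary storage.

```python
from collections import deque


class Store:
    """Toy model: `primary` stands in for the saved entity,
    `secondary` for the term tables (both hypothetical)."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.deferred = deque()  # updates queued to run after the save

    def save_edit(self, entity_id, labels):
        # The edit itself only touches primary storage, so it stays fast.
        self.primary[entity_id] = dict(labels)
        snapshot = dict(labels)
        # The secondary write is queued instead of being done inline.
        self.deferred.append(
            lambda: self.secondary.__setitem__(entity_id, snapshot)
        )

    def run_deferred(self):
        # Runs later, outside the critical path of saving the edit.
        while self.deferred:
            self.deferred.popleft()()
```

Between `save_edit` and `run_deferred`, a reader of `secondary` sees stale or missing terms, which is the out-of-sync window described above.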
Briefly in code:
data-model-services vendor component

The storage system is currently decided using the tmpItemTermsMigrationStages and tmpPropertyTermsMigrationStage repo settings.
This is currently the default storage mechanism when using Wikibase.
In the past (pre-2020), terms were stored in a single large database table called wb_terms. This table lacked a clear design and eventually became too big to touch for wikidata.org. Between 2019 and 2020 a migration process was carried out (and is still being carried out), migrating the terms to a new schema (see below).
The "Epic" task for this was https://phabricator.wikimedia.org/T208425 - [EPIC] Kill the wb_terms table
The storage is made up of multiple normalized tables, all prefixed with "wbt_". The tables were created by AddNormalizedTermsTablesDDL.sql, which includes some documentation.
The relations are shown below:
The normalization results in more complex query and update patterns. See the sections below for more details on how reading and updating work.
Lookup terms of an entity
Lookup of the terms of an entity can be achieved by starting with the wbt_item_terms or wbt_property_terms tables where you will find integer representations of Item and Property identifiers.
The query below selects all terms in the tables for item Q123 and can be used as a starting point for data exploration:
For properties you can do something like:
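A minimal sketch of both lookups, run against an in-memory SQLite database so that it is self-contained. The column names (wbit_item_id, wbtl_type_id, and so on), the helper names, and the sample rows are assumptions that do not appear in the text above; check them against AddNormalizedTermsTablesDDL.sql before running similar queries on a real database.

```python
import sqlite3

# Assumed miniature of the wbt_ tables; column names are illustrative.
SCHEMA = """
CREATE TABLE wbt_type (wby_id INTEGER PRIMARY KEY, wby_name TEXT);
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
CREATE TABLE wbt_text_in_lang (
    wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT, wbxl_text_id INTEGER);
CREATE TABLE wbt_term_in_lang (
    wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER,
    wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_item_terms (
    wbit_id INTEGER PRIMARY KEY, wbit_item_id INTEGER,
    wbit_term_in_lang_id INTEGER);
CREATE TABLE wbt_property_terms (
    wbpt_id INTEGER PRIMARY KEY, wbpt_property_id INTEGER,
    wbpt_term_in_lang_id INTEGER);
"""


def terms_query(entity_table, id_column, link_column):
    # Identifiers are interpolated from the fixed constants below only,
    # never from user input; the entity id is bound as a parameter.
    return f"""
        SELECT wby_name, wbxl_language, wbx_text
        FROM {entity_table}
        JOIN wbt_term_in_lang ON {link_column} = wbtl_id
        JOIN wbt_type ON wbtl_type_id = wby_id
        JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
        JOIN wbt_text ON wbxl_text_id = wbx_id
        WHERE {id_column} = ?
        ORDER BY wby_name, wbxl_language
    """


ITEM_TERMS = terms_query(
    "wbt_item_terms", "wbit_item_id", "wbit_term_in_lang_id")
PROPERTY_TERMS = terms_query(
    "wbt_property_terms", "wbpt_property_id", "wbpt_term_in_lang_id")


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Made-up sample data: item Q123 and property P31.
    db.executescript("""
        INSERT INTO wbt_type VALUES (1, 'label'), (2, 'description');
        INSERT INTO wbt_text VALUES
            (1, 'example'), (2, 'an example item'), (3, 'sample property');
        INSERT INTO wbt_text_in_lang VALUES
            (1, 'en', 1), (2, 'en', 2), (3, 'en', 3);
        INSERT INTO wbt_term_in_lang VALUES (1, 1, 1), (2, 2, 2), (3, 1, 3);
        INSERT INTO wbt_item_terms VALUES (1, 123, 1), (2, 123, 2);
        INSERT INTO wbt_property_terms VALUES (1, 31, 3);
    """)
    item_terms = db.execute(ITEM_TERMS, (123,)).fetchall()
    property_terms = db.execute(PROPERTY_TERMS, (31,)).fetchall()
    db.close()
    return item_terms, property_terms
```

Note that the entity tables store the numeric part of the identifier only, so Q123 is looked up as 123.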
Lookup all entities that use a certain term
Lookup of entities from a term string can be achieved by starting with the wbt_text table, which contains the text for all terms of all types for both Items and Properties.
For properties you can do something like:
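The reverse lookup can be sketched the same way, starting from wbt_text and joining outwards to the entity tables. As before, the column names, the `entities_using` helper, and the sample rows are assumptions for illustration and should be verified against the real schema.

```python
import sqlite3

# Assumed miniature of the wbt_ tables; column names are illustrative.
SCHEMA = """
CREATE TABLE wbt_type (wby_id INTEGER PRIMARY KEY, wby_name TEXT);
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
CREATE TABLE wbt_text_in_lang (
    wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT, wbxl_text_id INTEGER);
CREATE TABLE wbt_term_in_lang (
    wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER,
    wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_item_terms (
    wbit_id INTEGER PRIMARY KEY, wbit_item_id INTEGER,
    wbit_term_in_lang_id INTEGER);
CREATE TABLE wbt_property_terms (
    wbpt_id INTEGER PRIMARY KEY, wbpt_property_id INTEGER,
    wbpt_term_in_lang_id INTEGER);
"""

# Items and Properties share the text/term rows, so one text can match both.
ENTITIES_FOR_TEXT = """
    SELECT 'Q' || wbit_item_id AS entity, wby_name, wbxl_language
    FROM wbt_text
    JOIN wbt_text_in_lang ON wbxl_text_id = wbx_id
    JOIN wbt_term_in_lang ON wbtl_text_in_lang_id = wbxl_id
    JOIN wbt_type ON wbtl_type_id = wby_id
    JOIN wbt_item_terms ON wbit_term_in_lang_id = wbtl_id
    WHERE wbx_text = :text
    UNION ALL
    SELECT 'P' || wbpt_property_id, wby_name, wbxl_language
    FROM wbt_text
    JOIN wbt_text_in_lang ON wbxl_text_id = wbx_id
    JOIN wbt_term_in_lang ON wbtl_text_in_lang_id = wbxl_id
    JOIN wbt_type ON wbtl_type_id = wby_id
    JOIN wbt_property_terms ON wbpt_term_in_lang_id = wbtl_id
    WHERE wbx_text = :text
    ORDER BY 1, 2, 3
"""


def entities_using(text):
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    # Made-up sample data: 'example' is a label of Q123 and P31
    # and an alias of Q456; 'other' is a label of Q789.
    db.executescript("""
        INSERT INTO wbt_type VALUES (1, 'label'), (2, 'alias');
        INSERT INTO wbt_text VALUES (1, 'example'), (2, 'other');
        INSERT INTO wbt_text_in_lang VALUES (1, 'en', 1), (2, 'en', 2);
        INSERT INTO wbt_term_in_lang VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
        INSERT INTO wbt_item_terms VALUES (1, 123, 1), (2, 456, 2), (3, 789, 3);
        INSERT INTO wbt_property_terms VALUES (1, 31, 1);
    """)
    rows = db.execute(ENTITIES_FOR_TEXT, {"text": text}).fetchall()
    db.close()
    return rows
```

In this sample data the item Q123 and the property P31 point at the same wbt_term_in_lang row, which is the kind of sharing the normalized layout allows.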
Process outline
Writing to the wbt_item_terms and wbt_property_terms tables is done in DatabaseItemTermStoreWriter and DatabasePropertyTermStoreWriter.

Keeping the store clean
The tables in the store are cleaned up so that data that is completely removed from entities is also completely removed from the store. This is important for sites such as Wikidata, which has publicly accessible database replicas of this information.
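The idea can be sketched as a cascade that removes rows once nothing references them any more. This is a simplified illustration, not the Wikibase implementation: the column names, the `make_db`, `remove_item_term`, and `clean_orphans` helpers, and the sample rows are all assumptions, and the sketch ignores concurrency and replication concerns that a production cleaner has to handle.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # Assumed subset of the wbt_ tables (wbt_type omitted for brevity).
    # NOT NULL keeps the NOT IN subqueries below free of NULL surprises.
    db.executescript("""
        CREATE TABLE wbt_text (
            wbx_id INTEGER PRIMARY KEY, wbx_text TEXT NOT NULL);
        CREATE TABLE wbt_text_in_lang (
            wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT NOT NULL,
            wbxl_text_id INTEGER NOT NULL);
        CREATE TABLE wbt_term_in_lang (
            wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER NOT NULL,
            wbtl_text_in_lang_id INTEGER NOT NULL);
        CREATE TABLE wbt_item_terms (
            wbit_id INTEGER PRIMARY KEY, wbit_item_id INTEGER NOT NULL,
            wbit_term_in_lang_id INTEGER NOT NULL);
        CREATE TABLE wbt_property_terms (
            wbpt_id INTEGER PRIMARY KEY, wbpt_property_id INTEGER NOT NULL,
            wbpt_term_in_lang_id INTEGER NOT NULL);
        INSERT INTO wbt_text VALUES (1, 'shared'), (2, 'unique');
        INSERT INTO wbt_text_in_lang VALUES (1, 'en', 1), (2, 'en', 2);
        INSERT INTO wbt_term_in_lang VALUES (1, 1, 1), (2, 1, 2);
        INSERT INTO wbt_item_terms VALUES (1, 123, 1), (2, 123, 2);
        INSERT INTO wbt_property_terms VALUES (1, 31, 1);
    """)
    return db


def clean_orphans(db):
    # Work from the entity-facing tables inwards, so each step only
    # removes rows that lost their last reference in the previous step.
    db.execute("""
        DELETE FROM wbt_term_in_lang
        WHERE wbtl_id NOT IN (SELECT wbit_term_in_lang_id FROM wbt_item_terms)
          AND wbtl_id NOT IN (SELECT wbpt_term_in_lang_id FROM wbt_property_terms)
    """)
    db.execute("""
        DELETE FROM wbt_text_in_lang
        WHERE wbxl_id NOT IN (SELECT wbtl_text_in_lang_id FROM wbt_term_in_lang)
    """)
    db.execute("""
        DELETE FROM wbt_text
        WHERE wbx_id NOT IN (SELECT wbxl_text_id FROM wbt_text_in_lang)
    """)


def remove_item_term(db, item_id, term_in_lang_id):
    # Drop the item's link to the term, then clean up whatever is orphaned.
    db.execute(
        "DELETE FROM wbt_item_terms "
        "WHERE wbit_item_id = ? AND wbit_term_in_lang_id = ?",
        (item_id, term_in_lang_id),
    )
    clean_orphans(db)
```

A term that is still used by another entity (here, the text shared with property 31) survives the cleanup, while a term used by no entity disappears from every table.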