MediaWiki REL1_40
|
Provide a given client with protection against visible database lag. More...
Inherits LoggerAwareInterface.
Public Member Functions | |
__construct (BagOStuff $store, array $client, ?int $clientPosIndex, string $secret='') | |
getClientId () | |
getTouched (ILoadBalancer $lb) | |
Get the UNIX timestamp when the client last touched the DB, if they did so recently. | |
persistSessionReplicationPositions (&$clientPosIndex=null) | |
Persist any staged client "session consistency" replication positions. | |
setEnabled ( $enabled) | |
setLogger (LoggerInterface $logger) | |
setMockTime (&$time) | |
setWaitEnabled ( $enabled) | |
stageSessionPrimaryPos (ILoadBalancer $lb) | |
Update client "session consistency" replication position for an end-of-life ILoadBalancer. | |
yieldSessionPrimaryPos (ILoadBalancer $lb) | |
Yield client "session consistency" replication position for a new ILoadBalancer. | |
Public Attributes | |
const | POSITION_COOKIE_TTL = 10 |
Seconds to store position write index cookies (safely less than POSITION_STORE_TTL) | |
Protected Member Functions | |
getCurrentTime () | |
getStartupSessionPositions () | |
getStartupSessionTimestamps () | |
lazyStartup () | |
Load the stored replication positions and touch timestamps for the client. | |
mergePositions ( $storedValue, array $shutdownPositions, array $shutdownTimestamps, ?int &$clientPosIndex=null) | |
Merge the new replication positions with the currently stored ones (highest wins) | |
Protected Attributes | |
string | $clientId |
Hash of client parameters. | |
string[] | $clientLogInfo |
Map of client information fields for logging. | |
bool | $enabled = true |
Whether reading/writing session consistency replication positions is enabled. | |
string | $key |
Storage key name. | |
LoggerInterface | $logger |
bool | $positionWaitsEnabled = true |
Whether waiting on DB servers to reach replication positions is enabled. | |
array< string, DBPrimaryPos > | $shutdownPositionsByPrimary = [] |
Map of (primary server name => position) | |
array< string, float > | $shutdownTimestampsByCluster = [] |
Map of (DB cluster name => UNIX timestamp) | |
array< string, DBPrimaryPos > | $startupPositionsByPrimary = [] |
Map of (primary server name => position) | |
float null | $startupTimestamp |
UNIX timestamp when the client data was loaded. | |
array< string, float > | $startupTimestampsByCluster = [] |
Map of (DB cluster name => UNIX timestamp) | |
BagOStuff | $store |
int null | $waitForPosIndex |
Expected minimum index of the last write to the position store. | |
Provide a given client with protection against visible database lag.
This class tries to hide visible effects of database lag. It does this by temporarily remembering the database positions after a client makes a write, and on their next web request we will prefer non-lagged database replicas. When replica connections are established, we wait up to a few seconds for sufficient replication to have occurred, if they were not yet caught up to that same point.
This ensures a consistent ordering of events as seen by a client. Kind of like Hawking's Chronology Protection Agency.
For performance and scalability reasons, almost all data is queried from replica databases. Only queries relating to writing data, are sent to a primary database. When rendering a web page with content or activity feeds on it, the very latest information may thus not yet be there. That's okay in general, but if, for example, a client recently changed their preferences or submitted new data, we do our best to make sure their next web response does reflect at least their own recent changes.
To explain how it works, we will look at an example lifecycle for a client.
A client is browsing the site. Their web requests are generally read-only and display data from database replicas, which may be a few seconds out of date if a client elsewhere in the world recently modified that same data. If the application is run from multiple data centers, then these web requests may be served from the nearest secondary DC.
A client performs a POST request, perhaps to publish an edit or change their preferences. This request is routed to the primary DC (this is the responsibility of infrastructure outside the web app). There, the data is saved to the primary database, after which the database host will asynchronously replicate this to its replicas in the same and any other DCs.
Toward the end of the response to this POST request, the application takes note of the primary database's current "position", and save this under a "clientId" key in the ChronologyProtector store. The web response will also set two cookies that are similarly short-lived (about ten seconds): UseDC=master
and cpPosIndex=<posIndex>@<write time>#<clientId>
.
The ten seconds window is meant to account for the time needed for the database writes to have replicated across all active database replicas, including the cross-dc latency for those further away in any secondary DCs. The "clientId" is placed in the cookie to handle the case where the client IP addresses frequently changes between web requests.
Future web requests from the client should fall in one of two categories:
The store used by ChronologyProtector, as configured via $wgChronologyProtectorStash
, should meet the following requirements:
These are the expectations a site administrator must meet for chronology protection:
Web requests that use the POST verb, or carry a UseDC=master
cookie, must be routed to the primary DC only.
An exception is requests carrying the Promise-Non-Write-API-Action: true
header, which use the POST verb for large read queries, but don't actually require the primary DC.
If you have legacy extensions deployed that perform queries on the primary database during GET requests, then you will have to identify a way to route any of its relevant URLs to the primary DC as well, or to accept that their reads do not enjoy chronology protection, and that writes may be slower (due to cross-dc latency). See T91820 for Wikimedia Foundation's routing.
Definition at line 132 of file ChronologyProtector.php.
Wikimedia\Rdbms\ChronologyProtector::__construct | ( | BagOStuff | $store, |
array | $client, | ||
?int | $clientPosIndex, | ||
string | $secret = '' ) |
BagOStuff | $store | |
array | $client | Map of (ip: <IP>, agent: <user-agent> [, clientId: <hash>] ) |
int | null | $clientPosIndex | Write counter index of replication positions for this client |
string | $secret | Secret string for HMAC hashing [optional] |
Definition at line 207 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$store, and BagOStuff\makeGlobalKey().
Wikimedia\Rdbms\ChronologyProtector::getClientId | ( | ) |
Definition at line 243 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$clientId.
Referenced by Wikimedia\Rdbms\LBFactory\shutdown().
|
protected |
Definition at line 558 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\lazyStartup(), and Wikimedia\Rdbms\ChronologyProtector\stageSessionPrimaryPos().
|
protected |
Definition at line 425 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$startupPositionsByPrimary, and Wikimedia\Rdbms\ChronologyProtector\lazyStartup().
Referenced by Wikimedia\Rdbms\ChronologyProtector\yieldSessionPrimaryPos().
|
protected |
Definition at line 434 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$startupTimestampsByCluster, and Wikimedia\Rdbms\ChronologyProtector\lazyStartup().
Referenced by Wikimedia\Rdbms\ChronologyProtector\getTouched().
Wikimedia\Rdbms\ChronologyProtector::getTouched | ( | ILoadBalancer | $lb | ) |
Get the UNIX timestamp when the client last touched the DB, if they did so recently.
ILoadBalancer | $lb |
Definition at line 396 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ILoadBalancer\getClusterName(), and Wikimedia\Rdbms\ChronologyProtector\getStartupSessionTimestamps().
|
protected |
Load the stored replication positions and touch timestamps for the client.
Definition at line 445 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\getCurrentTime().
Referenced by Wikimedia\Rdbms\ChronologyProtector\getStartupSessionPositions(), and Wikimedia\Rdbms\ChronologyProtector\getStartupSessionTimestamps().
|
protected |
Merge the new replication positions with the currently stored ones (highest wins)
array<string,mixed>|false | $storedValue Current replication position data | |
array<string,DBPrimaryPos> | $shutdownPositions New replication positions | |
array<string,float> | $shutdownTimestamps New DB post-commit shutdown timestamps | |
int | null | &$clientPosIndex | New position write index |
Definition at line 513 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\persistSessionReplicationPositions().
Wikimedia\Rdbms\ChronologyProtector::persistSessionReplicationPositions | ( | & | $clientPosIndex = null | ) |
Persist any staged client "session consistency" replication positions.
int | null | &$clientPosIndex | DB position key write counter; incremented on update |
Definition at line 337 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$shutdownPositionsByPrimary, and Wikimedia\Rdbms\ChronologyProtector\mergePositions().
Referenced by Wikimedia\Rdbms\LBFactory\shutdownChronologyProtector().
Wikimedia\Rdbms\ChronologyProtector::setEnabled | ( | $enabled | ) |
bool | $enabled | Whether reading/writing session replication positions is enabled |
Definition at line 251 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$enabled.
Wikimedia\Rdbms\ChronologyProtector::setLogger | ( | LoggerInterface | $logger | ) |
Definition at line 235 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$logger.
Referenced by Wikimedia\Rdbms\LBFactory\getChronologyProtector().
Wikimedia\Rdbms\ChronologyProtector::setMockTime | ( | & | $time | ) |
float | null | &$time | Mock UNIX timestamp |
Definition at line 574 of file ChronologyProtector.php.
Wikimedia\Rdbms\ChronologyProtector::setWaitEnabled | ( | $enabled | ) |
bool | $enabled | Whether session replication position wait barriers are enable |
Definition at line 259 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ChronologyProtector\$enabled.
Wikimedia\Rdbms\ChronologyProtector::stageSessionPrimaryPos | ( | ILoadBalancer | $lb | ) |
Update client "session consistency" replication position for an end-of-life ILoadBalancer.
This remarks the replication position of the primary DB if this request made writes to it using the provided ILoadBalancer instance.
ILoadBalancer | $lb |
Definition at line 305 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ILoadBalancer\getClusterName(), Wikimedia\Rdbms\ChronologyProtector\getCurrentTime(), Wikimedia\Rdbms\ILoadBalancer\getReplicaResumePos(), Wikimedia\Rdbms\ILoadBalancer\getServerName(), Wikimedia\Rdbms\ILoadBalancer\getWriterIndex(), Wikimedia\Rdbms\ILoadBalancer\hasOrMadeRecentPrimaryChanges(), and Wikimedia\Rdbms\ILoadBalancer\hasStreamingReplicaServers().
Referenced by Wikimedia\Rdbms\LBFactory\shutdownChronologyProtector().
Wikimedia\Rdbms\ChronologyProtector::yieldSessionPrimaryPos | ( | ILoadBalancer | $lb | ) |
Yield client "session consistency" replication position for a new ILoadBalancer.
If the stash has a previous primary position recorded, this will try to make sure that the next query to a replica server of that primary will see changes up to that position by delaying execution. The delay may timeout and allow stale data if no non-lagged replica servers are available.
ILoadBalancer | $lb |
Definition at line 276 of file ChronologyProtector.php.
References Wikimedia\Rdbms\ILoadBalancer\getClusterName(), Wikimedia\Rdbms\ILoadBalancer\getServerName(), Wikimedia\Rdbms\ChronologyProtector\getStartupSessionPositions(), and Wikimedia\Rdbms\ILoadBalancer\getWriterIndex().
|
protected |
Hash of client parameters.
Definition at line 141 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\getClientId().
|
protected |
Map of client information fields for logging.
Definition at line 143 of file ChronologyProtector.php.
|
protected |
Whether reading/writing session consistency replication positions is enabled.
Definition at line 148 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\setEnabled(), and Wikimedia\Rdbms\ChronologyProtector\setWaitEnabled().
|
protected |
Storage key name.
Definition at line 139 of file ChronologyProtector.php.
|
protected |
Definition at line 136 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\setLogger().
|
protected |
Whether waiting on DB servers to reach replication positions is enabled.
Definition at line 150 of file ChronologyProtector.php.
|
protected |
Map of (primary server name => position)
Definition at line 157 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\persistSessionReplicationPositions().
|
protected |
Map of (DB cluster name => UNIX timestamp)
Definition at line 161 of file ChronologyProtector.php.
|
protected |
Map of (primary server name => position)
Definition at line 155 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\getStartupSessionPositions().
|
protected |
UNIX timestamp when the client data was loaded.
Definition at line 152 of file ChronologyProtector.php.
|
protected |
Map of (DB cluster name => UNIX timestamp)
Definition at line 159 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\getStartupSessionTimestamps().
|
protected |
Definition at line 134 of file ChronologyProtector.php.
Referenced by Wikimedia\Rdbms\ChronologyProtector\__construct().
|
protected |
Expected minimum index of the last write to the position store.
Definition at line 145 of file ChronologyProtector.php.
const Wikimedia\Rdbms\ChronologyProtector::POSITION_COOKIE_TTL = 10 |
Seconds to store position write index cookies (safely less than POSITION_STORE_TTL)
Definition at line 187 of file ChronologyProtector.php.