CirrusSearch
Elasticsearch-powered search for MediaWiki
CirrusSearch\Updater Class Reference

Performs updates and deletes on the Elasticsearch index.


Public Member Functions

 __construct (Connection $readConnection, $writeToClusterName=null)
 
 updateFromTitle ( $title, ?string $updateKind, ?int $rootEventTime)
 Update a single page.
 
 traceRedirects ( $title)
 Trace redirects from the title to the destination.
 
 updatePages ( $pages, $flags, string $updateKind=null, int $rootEventTime=null)
 Update pages in Elasticsearch.
 
 updateWeightedTags (ProperPageIdentity $page, string $tagField, string $tagPrefix, $tagNames=null, $tagWeights=null)
 
 resetWeightedTags (ProperPageIdentity $page, string $tagField, string $tagPrefix)
 
 deletePages ( $titles, $docIds, $indexSuffix=null, array $writeJobParams=[])
 Delete pages from the Elasticsearch index.
 
 archivePages ( $archived)
 Add documents to archive index.
 
 updateLinkedArticles ( $titles)
 Update the search index for newly linked or unlinked articles.
 
Public Member Functions inherited from CirrusSearch\ElasticsearchIntermediary
 start (RequestLog $log)
 Mark the start of a request to Elasticsearch.
 
 success ( $result=null, Connection $connection=null)
 Log a successful request and return the provided result in a good Status.
 
 successViaCache (RequestLog $log)
 Log a successful request when the response comes from a cache outside Elasticsearch.
 
 failure (ExceptionInterface $exception=null, Connection $connection=null)
 Log a failure and return an appropriate status.
 
 getSearchMetrics ()
 Get the search metrics we have.
 

Static Public Member Functions

static build (SearchConfig $config, $cluster)
 
Static Public Member Functions inherited from CirrusSearch\ElasticsearchIntermediary
static setResultPages (array $matches)
 This is set externally because we don't have complete control, from the SearchEngine interface, of what is actually sent to the user.
 
static getQueryTypesUsed ()
 Report the types of queries that were issued within the current request.
 
static hasQueryLogs ()
 
static appendLastLogPayload ( $key, $value)
 
static isMSearchResultSetOK (MultiResultSet $multiResultSet)
 Check the validity of the multisearch response.
 

Protected Member Functions

 pushElasticaWriteJobs (string $updateGroup, array $items, $factory, int $batchSize=10)
 
 newLog ( $description, $queryType, array $extra=[])
 
Protected Member Functions inherited from CirrusSearch\ElasticsearchIntermediary
 __construct (Connection $connection, UserIdentity $user=null, $slowSeconds=null, $extraBackendLatency=0)
 
 startNewLog ( $description, $queryType, array $extra=[])
 
 getTimeout ( $searchType='default')
 
 getClientTimeout ( $searchType='default')
 
 appendMetrics (SearchMetricsProvider $provider)
 
 runMSearch (Search $search, RequestLog $log, Connection $connection=null, callable $resultsTransformer=null)
 

Protected Attributes

 $writeToClusterName
 
Protected Attributes inherited from CirrusSearch\ElasticsearchIntermediary
 $connection
 
 $user
 
 $currentRequestLog = null
 

Additional Inherited Members

Static Protected Attributes inherited from CirrusSearch\ElasticsearchIntermediary
static $requestLogger
 

Detailed Description

Performs updates and deletes on the Elasticsearch index.

Called by CirrusSearch.php (our SearchEngine implementation), forceSearchIndex (for bulk updates), and CirrusSearch's jobs.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. http://www.gnu.org/copyleft/gpl.html

Constructor & Destructor Documentation

◆ __construct()

CirrusSearch\Updater::__construct ( Connection $readConnection,
$writeToClusterName = null )
Parameters
Connection $readConnection: connection used to pull data out of Elasticsearch
string|null $writeToClusterName

Member Function Documentation

◆ archivePages()

CirrusSearch\Updater::archivePages ( $archived)

Add documents to archive index.

Parameters
array $archived
Returns
bool

◆ build()

static CirrusSearch\Updater::build ( SearchConfig $config,
$cluster )
static
Parameters
SearchConfig $config
string|null $cluster: cluster to read from and write to; null to read from the default cluster and write to all clusters
Returns
Updater
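The $cluster semantics documented for build() amount to a small resolution step: one cluster for reads, a list of clusters for writes. The sketch below is hypothetical and is not CirrusSearch's implementation; the function resolveClusters(), its $defaultCluster and $allClusters inputs, and the cluster names dc1/dc2 are illustrative assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of CirrusSearch): resolve the cluster to read
 * from and the clusters to write to, following the documented meaning of
 * build()'s $cluster argument.
 *
 * @param string|null $cluster        Requested cluster, or null for the defaults
 * @param string      $defaultCluster Cluster used for reads when $cluster is null
 * @param string[]    $allClusters    Every configured cluster
 * @return array{read: string, write: string[]}
 */
function resolveClusters(?string $cluster, string $defaultCluster, array $allClusters): array {
    if ($cluster === null) {
        // null: read from the default cluster, write to every cluster
        return ['read' => $defaultCluster, 'write' => $allClusters];
    }
    if (!in_array($cluster, $allClusters, true)) {
        throw new InvalidArgumentException("Unknown cluster: $cluster");
    }
    // A named cluster is used for both reads and writes
    return ['read' => $cluster, 'write' => [$cluster]];
}

$targets = resolveClusters(null, 'dc1', ['dc1', 'dc2']);
echo $targets['read'], ' -> ', implode(',', $targets['write']), PHP_EOL; // dc1 -> dc1,dc2
```

Rejecting an unknown cluster name early is a design choice of this sketch; the documentation above does not say how build() reacts to one.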

◆ deletePages()

CirrusSearch\Updater::deletePages ( $titles,
$docIds,
$indexSuffix = null,
array $writeJobParams = [] )

Delete pages from the Elasticsearch index.

$titles and $docIds must refer to the same pages and should list them in the same order.

Parameters
Title[] $titles: list of titles to delete. If empty, other index maintenance is skipped.
int[]|string[] $docIds: list of Elasticsearch document ids to delete
string|null $indexSuffix: index from which to delete; null means all indices
array $writeJobParams: parameters passed on to ElasticaWriteJob
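Because $titles and $docIds are matched by position, a caller can check them before calling deletePages(). The following is a hypothetical pre-flight sketch, not part of CirrusSearch: planDeletes() is an invented name, and titles are plain strings standing in for MediaWiki Title objects.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-flight check for a deletePages()-style call (not part of
 * CirrusSearch). Titles are plain strings standing in for Title objects.
 *
 * @param string[]       $titles Titles to delete; may be empty
 * @param int[]|string[] $docIds Document ids, in the same order as $titles
 * @return array<int, array{title: ?string, docId: int|string}>
 */
function planDeletes(array $titles, array $docIds): array {
    $titleList = array_values($titles);
    // An empty $titles list is allowed: only the documents are removed and the
    // title-based index maintenance is skipped.
    if ($titleList !== [] && count($titleList) !== count($docIds)) {
        throw new InvalidArgumentException('$titles and $docIds must describe the same pages');
    }
    $plan = [];
    foreach (array_values($docIds) as $i => $docId) {
        // Entries are paired by position, which is why both lists must share an order
        $plan[] = ['title' => $titleList[$i] ?? null, 'docId' => $docId];
    }
    return $plan;
}

echo json_encode(planDeletes(['Foo', 'Bar'], [12, 34])), PHP_EOL;
// [{"title":"Foo","docId":12},{"title":"Bar","docId":34}]
```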

◆ newLog()

CirrusSearch\Updater::newLog ( $description,
$queryType,
array $extra = [] )
protected
Parameters
string $description
string $queryType
string[] $extra
Returns
SearchRequestLog

Reimplemented from CirrusSearch\ElasticsearchIntermediary.

◆ pushElasticaWriteJobs()

CirrusSearch\Updater::pushElasticaWriteJobs ( string $updateGroup,
array $items,
$factory,
int $batchSize = 10 )
protected
Parameters
string $updateGroup: one of the UpdateGroup::* constants
mixed[] $items
callable $factory
int $batchSize
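The parameters of pushElasticaWriteJobs() suggest that $items are grouped into batches of at most $batchSize and that $factory builds one job per batch. That reading is an assumption; the sketch below only illustrates it, and buildBatchedJobs() is a hypothetical name that returns job descriptions instead of pushing real jobs.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch (not CirrusSearch code): split items into batches and
 * build one job per batch with a caller-supplied factory.
 *
 * @param mixed[]  $items     Work items, e.g. documents to write
 * @param callable $factory   Receives one batch, returns a job description
 * @param int      $batchSize Maximum number of items per job
 * @return mixed[] One entry per batch, as returned by $factory
 */
function buildBatchedJobs(array $items, callable $factory, int $batchSize = 10): array {
    if ($batchSize < 1) {
        throw new InvalidArgumentException('$batchSize must be at least 1');
    }
    $jobs = [];
    foreach (array_chunk($items, $batchSize) as $batch) {
        $jobs[] = $factory($batch);
    }
    return $jobs;
}

// 25 items with the default batch size of 10 yield batches of 10, 10 and 5 items
$jobs = buildBatchedJobs(range(1, 25), fn(array $batch) => count($batch));
echo implode(',', $jobs), PHP_EOL; // 10,10,5
```

Passing a factory rather than a job class keeps the batching logic independent of what each job actually does.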

◆ resetWeightedTags()

CirrusSearch\Updater::resetWeightedTags ( ProperPageIdentity $page,
string $tagField,
string $tagPrefix )
Parameters
ProperPageIdentity $page
string $tagField
string $tagPrefix

◆ traceRedirects()

CirrusSearch\Updater::traceRedirects ( $title)

Trace redirects from the title to the destination.

Also records the title among the titles already updated and detects special pages.

Parameters
Title $title: title to trace
Returns
array with the keys target and redirects:
  • target (WikiPage|null): the WikiPage if $title either isn't a redirect or resolves to an updatable page that hasn't been updated yet; null if the page has already been updated, is a special page, or the redirects form a loop.
  • redirects (WikiPage[]): one WikiPage per redirect in the chain; an empty array if $title isn't a redirect.
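The return contract of traceRedirects() can be modelled without MediaWiki. In this hypothetical sketch, pages are plain title strings, a redirect map stands in for the wiki's redirect table, and special pages are detected by a "Special:" prefix; none of these details come from CirrusSearch. Because every visited title is remembered, a redirect loop revisits a remembered title and yields a null target, just as an already-updated page does.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of redirect tracing (not CirrusSearch code).
 */
final class RedirectTracer {
    /** @var array<string, true> Titles already visited by this tracer */
    private $updated = [];

    /** @var array<string, string> Redirect title => target title */
    private $redirectMap;

    /** @param array<string, string> $redirectMap */
    public function __construct(array $redirectMap) {
        $this->redirectMap = $redirectMap;
    }

    /** @return array{target: ?string, redirects: string[]} */
    public function trace(string $title): array {
        $redirects = [];
        $current = $title;
        while (true) {
            // Already visited (covers both earlier updates and loops) or a special page
            if (isset($this->updated[$current]) || strpos($current, 'Special:') === 0) {
                return ['target' => null, 'redirects' => $redirects];
            }
            // Remember the title so later calls skip it
            $this->updated[$current] = true;
            if (!isset($this->redirectMap[$current])) {
                // Not a redirect: this is the page to update
                return ['target' => $current, 'redirects' => $redirects];
            }
            $redirects[] = $current;
            $current = $this->redirectMap[$current];
        }
    }
}

$tracer = new RedirectTracer(['Old name' => 'New name']);
echo json_encode($tracer->trace('Old name')), PHP_EOL;
// {"target":"New name","redirects":["Old name"]}
```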

◆ updateFromTitle()

CirrusSearch\Updater::updateFromTitle ( $title,
?string $updateKind,
?int $rootEventTime )

Update a single page.

Parameters
Title $title
string|null $updateKind: kind of update to perform (used for monitoring)
int|null $rootEventTime: time of the MediaWiki event that caused this update (used for monitoring)

◆ updateLinkedArticles()

CirrusSearch\Updater::updateLinkedArticles ( $titles)

Update the search index for newly linked or unlinked articles.

Parameters
Title[] $titles: titles to update

◆ updatePages()

CirrusSearch\Updater::updatePages ( $pages,
$flags,
string $updateKind = null,
int $rootEventTime = null )

Update pages in Elasticsearch.

The $flags bit field can combine:
  • INDEX_EVERYTHING: Cirrus parses the page, counts the links, and sends the document to Elasticsearch as an index request, so the document is created if it doesn't exist.
  • SKIP_PARSE: Cirrus skips parsing the page when building the document. This makes sense when you know the page hasn't changed, for example when it is newly linked from another page.
  • SKIP_LINKS: Cirrus skips collecting link information. This makes sense when you know the link counts aren't available yet, for example during the first phase of the two-phase index build.
  • INDEX_ON_SKIP: when SKIP_PARSE or SKIP_LINKS is set, Cirrus sends an index request rather than an update. Indexing with any portion of the document skipped is dangerous because it can put half-created pages in the index, so this is only a good idea during the first half of the two-phase index build.

Parameters
WikiPage[] $pages: pages to update
int $flags: bit field containing instructions about how the document should be built and sent to Elasticsearch
string|null $updateKind: kind of update to perform (used for monitoring)
int|null $rootEventTime: time of the MediaWiki event that caused this update (used for monitoring)
Returns
int Number of documents updated
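The $flags options of updatePages() combine with bitwise OR. The sketch below decodes such a bit field into which parts of the document are built and which request is sent. The numeric flag values are hypothetical (CirrusSearch defines its own constants), and INDEX_ON_SKIP is read, as its name and the warning about half-created pages suggest, as turning a partial write from an update into an index.

```php
<?php
declare(strict_types=1);

// Hypothetical flag values for illustration only; the real constants and
// their numeric values are defined by CirrusSearch and may differ.
const INDEX_EVERYTHING = 0;
const SKIP_PARSE = 1;
const SKIP_LINKS = 2;
const INDEX_ON_SKIP = 4;

/**
 * Decode a $flags bit field into the parts of the document to build and the
 * kind of request to send.
 *
 * @return array{parse: bool, links: bool, op: string}
 */
function describeFlags(int $flags): array {
    $skipParse = ($flags & SKIP_PARSE) !== 0;
    $skipLinks = ($flags & SKIP_LINKS) !== 0;
    $skipping = $skipParse || $skipLinks;
    // A partial document is sent as an update unless INDEX_ON_SKIP forces an index
    $op = (!$skipping || ($flags & INDEX_ON_SKIP) !== 0) ? 'index' : 'update';
    return ['parse' => !$skipParse, 'links' => !$skipLinks, 'op' => $op];
}

// Newly linked page: content unchanged, so skip parsing and send an update
echo json_encode(describeFlags(SKIP_PARSE)), PHP_EOL;
// {"parse":false,"links":true,"op":"update"}

// First phase of a two-phase build: skip links but still index
echo json_encode(describeFlags(SKIP_LINKS | INDEX_ON_SKIP)), PHP_EOL;
// {"parse":true,"links":false,"op":"index"}
```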

◆ updateWeightedTags()

CirrusSearch\Updater::updateWeightedTags ( ProperPageIdentity $page,
string $tagField,
string $tagPrefix,
$tagNames = null,
$tagWeights = null )
Parameters
ProperPageIdentity $page
string $tagField
string $tagPrefix
string|string[]|null $tagNames
int|int[]|null $tagWeights
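Since $tagNames and $tagWeights each accept a scalar, an array, or null, a caller may want to normalise them into one name-to-weight map first. This is a hypothetical sketch: normalizeWeightedTags() is an invented name, and pairing names with weights by position and treating a null $tagNames as "no tags" are assumptions, not documented behaviour of updateWeightedTags().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical normalisation of updateWeightedTags()-style arguments (not
 * CirrusSearch code). A single name or weight is widened to a one-element
 * list and names are paired with weights by position.
 *
 * @param string|string[]|null $tagNames
 * @param int|int[]|null       $tagWeights
 * @return array<string, ?int> Tag name => weight (null when no weight is given)
 */
function normalizeWeightedTags($tagNames, $tagWeights = null): array {
    // Assumption: a null $tagNames simply yields no tags in this sketch
    $names = $tagNames === null ? [] : array_values((array)$tagNames);
    $weights = $tagWeights === null ? [] : array_values((array)$tagWeights);
    if ($weights !== [] && count($weights) !== count($names)) {
        throw new InvalidArgumentException('$tagWeights must match $tagNames in length');
    }
    $result = [];
    foreach ($names as $i => $name) {
        $result[$name] = $weights[$i] ?? null;
    }
    return $result;
}

echo json_encode(normalizeWeightedTags(['a', 'b'], [10, 20])), PHP_EOL; // {"a":10,"b":20}
```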

The documentation for this class was generated from the following file: