CirrusSearch
Elasticsearch-powered search for MediaWiki
CirrusSearch\BuildDocument\BuildDocument Class Reference

Orchestrate the process of building an Elasticsearch document out of a WikiPage.

Public Member Functions

 __construct (Connection $connection, IDatabase $db, ParserCache $parserCache, RevisionStore $revStore, CirrusSearchHookRunner $cirrusSearchHookRunner, BacklinkCacheFactory $backlinkCacheFactory)
 
 initialize (array $pages, int $flags)
 
 finalize (Document $doc)
 Finalize building a page document.
 

Public Attributes

const INDEX_EVERYTHING = 0
 
const INDEX_ON_SKIP = 1
 
const SKIP_PARSE = 2
 
const SKIP_LINKS = 4
 
const FORCE_PARSE = 8
 

Protected Member Functions

 createBuilders (int $flags)
 Construct PagePropertyBuilder instances suitable for provided flags.
 

Detailed Description

Orchestrate the process of building an Elasticsearch document out of a WikiPage.

Document building is performed in two stages, and all properties are provided by PagePropertyBuilder instances chosen by a set of provided flags.

The first stage, called initialize, sets up the basic document properties. This stage is executed once per update, and its results are shared between all retry attempts and all clusters to be written to. The results of the initialize stage may be written to the job queue, so we try to keep the size of these documents reasonably small. The initialize stage supports batched initialization by the PagePropertyBuilder instances.

The second stage of document building, finalize, is called on each attempt to send a document to an Elasticsearch cluster. This stage loads the bulk content, potentially megabytes, from the MediaWiki ParserOutput into the documents.
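
The split between the two stages can be illustrated with a toy model. This is only a sketch of the idea, not the CirrusSearch implementation: initializeDocs, finalizeDoc, the array-based documents and the cluster names are all hypothetical stand-ins, and no MediaWiki classes are used.

```php
<?php
// Hypothetical stand-ins: none of these names come from CirrusSearch.

/** Stage 1: cheap per-page properties, computed once per update. */
function initializeDocs(array $pages): array {
    $docs = [];
    foreach ($pages as $page) {
        // Kept small on purpose: the result may be serialized into a job queue.
        $docs[$page['id']] = ['title' => $page['title'], 'version' => $page['rev']];
    }
    return $docs; // indexed by page id
}

/** Stage 2: attach the bulk content; runs on every write attempt. */
function finalizeDoc(array $doc, callable $loadText): array {
    $doc['text'] = $loadText($doc['version']);
    return $doc;
}

$pages = [['id' => 7, 'title' => 'Main Page', 'rev' => 42]];
$docs = initializeDocs($pages); // once per update

$sent = [];
foreach (['cluster-a', 'cluster-b'] as $cluster) {
    foreach ($docs as $id => $doc) {
        // Once per cluster (and, in a real system, once per retry).
        $sent[$cluster][$id] = finalizeDoc(
            $doc,
            fn (int $rev) => "rendered text of revision $rev"
        );
    }
}
```

The small documents in $docs never carry the bulk text; it is only attached to the per-attempt copies in $sent, which mirrors why the expensive loading is deferred to the second stage.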

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. http://www.gnu.org/copyleft/gpl.html

Constructor & Destructor Documentation

◆ __construct()

CirrusSearch\BuildDocument\BuildDocument::__construct ( Connection $connection, IDatabase $db, ParserCache $parserCache, RevisionStore $revStore, CirrusSearchHookRunner $cirrusSearchHookRunner, BacklinkCacheFactory $backlinkCacheFactory )
Parameters
Connection $connection: Cirrus connection to read page properties from
IDatabase $db: Wiki database connection to read page properties from
ParserCache $parserCache: Cache to read parser output from
RevisionStore $revStore: Store for retrieving revisions by id
CirrusSearchHookRunner $cirrusSearchHookRunner
BacklinkCacheFactory $backlinkCacheFactory

Member Function Documentation

◆ createBuilders()

CirrusSearch\BuildDocument\BuildDocument::createBuilders ( int $flags)
protected

Construct PagePropertyBuilder instances suitable for provided flags.

Visible for testing. Should be private.

Parameters
int $flags: Bitfield of class constants
Returns
PagePropertyBuilder[]
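
Because $flags is a bitfield, several constants can be combined with bitwise OR and tested with bitwise AND. The sketch below copies the constant values from the Public Attributes list so that it runs on its own; in a MediaWiki environment one would reference the class constants (for example BuildDocument::SKIP_PARSE) instead. The hasFlag helper is hypothetical and not part of CirrusSearch, and the sketch makes no claim about how createBuilders interprets each flag.

```php
<?php
// Values copied from the class constants documented on this page.
const INDEX_EVERYTHING = 0;
const INDEX_ON_SKIP = 1;
const SKIP_PARSE = 2;
const SKIP_LINKS = 4;
const FORCE_PARSE = 8;

/** Hypothetical helper: true when the bit(s) of $flag are set in $flags. */
function hasFlag(int $flags, int $flag): bool {
    return ($flags & $flag) !== 0;
}

// Combine two flags into one bitfield.
$flags = SKIP_PARSE | SKIP_LINKS;

$skipsParse = hasFlag($flags, SKIP_PARSE);
$forcesParse = hasFlag($flags, FORCE_PARSE);

// INDEX_EVERYTHING is 0, i.e. the bitfield with no flags set.
$nothingSet = (INDEX_EVERYTHING === 0);
```

Each non-zero constant is a distinct power of two, so any combination of them can be packed into a single int without the flags interfering with each other.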

◆ finalize()

CirrusSearch\BuildDocument\BuildDocument::finalize ( Document $doc)

Finalize building a page document.

Called on every attempt to write the document to Elasticsearch, meaning once for every cluster and every retry. Any bulk data that needs to be loaded should be loaded here.

Parameters
Document $doc
Returns
bool True when the document update can proceed
Exceptions
BuildDocumentException
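
A caller therefore has three outcomes to handle: a true return, a false return, and an exception. The following sketch shows one way to structure that handling. Everything in it is an assumption made for illustration: stubFinalize and StubBuildDocumentException merely imitate the documented contract, the 'stale' marker is invented, and treating a false return as "skip this write" is a guess based on the return description above.

```php
<?php
// Stand-in for BuildDocumentException; the real class lives in CirrusSearch.
class StubBuildDocumentException extends Exception {}

/**
 * Hypothetical stub imitating the documented contract of finalize():
 * returns a bool and may throw.
 */
function stubFinalize(array $doc): bool {
    if (!isset($doc['id'])) {
        throw new StubBuildDocumentException('document has no id');
    }
    // Invented rule: documents marked as stale must not be written.
    return empty($doc['stale']);
}

/** Classify one write attempt as 'send', 'skip' or 'error'. */
function attemptWrite(array $doc): string {
    try {
        return stubFinalize($doc) ? 'send' : 'skip';
    } catch (StubBuildDocumentException $e) {
        return 'error';
    }
}

$outcomes = array_map('attemptWrite', [
    ['id' => 1],
    ['id' => 2, 'stale' => true],
    ['title' => 'broken'],
]);
```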

◆ initialize()

CirrusSearch\BuildDocument\BuildDocument::initialize ( array $pages, int $flags )
Parameters
\WikiPage[] $pages: List of pages to build documents for. These pages must represent concrete pages with content. It is expected that redirects and non-existent pages have been resolved.
int $flags: Bitfield of class constants
Returns
\Elastica\Document[] List of created documents indexed by page id.

The documentation for this class was generated from the following file: