CirrusSearch
Elasticsearch-powered search for MediaWiki
Orchestrate the process of building an Elasticsearch document out of a WikiPage.
Public Member Functions
__construct (Connection $connection, IReadableDatabase $db, RevisionStore $revStore, BacklinkCacheFactory $backlinkCacheFactory, DocumentSizeLimiter $docSizeLimiter, TitleFormatter $titleFormatter, WikiPageFactory $wikiPageFactory)
initialize (array $pagesOrRevs, int $flags)
finalize (Document $doc, bool $enforceLatest=true, ?RevisionRecord $revision=null)
    Finalize building a page document.
Public Attributes
const INDEX_EVERYTHING = 0
const INDEX_ON_SKIP = 1
const SKIP_PARSE = 2
const SKIP_LINKS = 4
Protected Member Functions
createBuilders (int $flags)
    Construct PagePropertyBuilder instances suitable for provided flags.
Orchestrate the process of building an Elasticsearch document out of a WikiPage.
Document building is performed in two stages, and all properties are provided by PagePropertyBuilder instances chosen by a set of provided flags.
The first stage, initialize, sets up the basic document properties. It runs once per update, and its results are shared across all retry attempts and all clusters being written to. Because those results may be written to the job queue, the documents are kept reasonably small at this stage. The initialize stage also lets PagePropertyBuilder instances batch their initialization work.
The second stage, finalize, is called on each attempt to send a document to an Elasticsearch cluster. This stage loads the bulk content, potentially megabytes, from the MediaWiki ParserOutput into the documents.
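The split between the two stages can be illustrated with a small standalone sketch. It does not use CirrusSearch or MediaWiki classes: TwoStageSketch, its counters, the array-based documents, and the cluster names are all hypothetical stand-ins, and it assumes PHP 8.0 or later for constructor property promotion. It only shows the shape of the idea: one batched initialize call producing small documents, then one finalize call per write attempt that attaches the bulk text.

```php
<?php
// Hypothetical stand-in for a two-stage document builder. None of these
// names come from CirrusSearch itself.
final class TwoStageSketch {
    public int $initializeCalls = 0;
    public int $finalizeCalls = 0;

    /** @param array<int,string> $bulkTextByPageId stand-in for parser output */
    public function __construct(private array $bulkTextByPageId) {}

    /**
     * Stage 1: build small documents for a whole batch in one pass.
     * @param array<int,string> $titlesByPageId
     * @return array<int,array<string,mixed>>
     */
    public function initialize(array $titlesByPageId): array {
        $this->initializeCalls++;
        $docs = [];
        foreach ($titlesByPageId as $pageId => $title) {
            $docs[$pageId] = ['page_id' => $pageId, 'title' => $title];
        }
        return $docs;
    }

    /**
     * Stage 2: attach the bulk content to one document.
     * PHP arrays are passed by value, so the small document held by the
     * caller is left unchanged.
     */
    public function finalize(array $doc): array {
        $this->finalizeCalls++;
        $doc['text'] = $this->bulkTextByPageId[$doc['page_id']] ?? '';
        return $doc;
    }
}

$builder = new TwoStageSketch([7 => str_repeat('lorem ipsum ', 1000)]);

// Runs once per update; the result is small enough to queue.
$smallDocs = $builder->initialize([7 => 'Example page']);

// Runs once per cluster (and would run again on every retry).
$sent = [];
foreach (['cluster-a', 'cluster-b'] as $cluster) {
    foreach ($smallDocs as $doc) {
        $sent[$cluster][] = $builder->finalize($doc);
    }
}
```

In this sketch the queued documents never carry the bulk text; only the per-attempt copies do, which is the reason given above for deferring the heavy load to finalize.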
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. http://www.gnu.org/copyleft/gpl.html
CirrusSearch\BuildDocument\BuildDocument::__construct ( Connection $connection, IReadableDatabase $db, RevisionStore $revStore, BacklinkCacheFactory $backlinkCacheFactory, DocumentSizeLimiter $docSizeLimiter, TitleFormatter $titleFormatter, WikiPageFactory $wikiPageFactory )
Connection | $connection | Cirrus connection to read page properties from |
IReadableDatabase | $db | Wiki database connection to read page properties from |
RevisionStore | $revStore | Store for retrieving revisions by id |
BacklinkCacheFactory | $backlinkCacheFactory | |
DocumentSizeLimiter | $docSizeLimiter | |
TitleFormatter | $titleFormatter | |
WikiPageFactory | $wikiPageFactory |
CirrusSearch\BuildDocument\BuildDocument::createBuilders ( int $flags ) [protected]
Construct PagePropertyBuilder instances suitable for provided flags.
Visible for testing. Should be private.
int | $flags | Bitfield of class constants |
CirrusSearch\BuildDocument\BuildDocument::finalize ( Document $doc, bool $enforceLatest = true, ?RevisionRecord $revision = null )
Finalize building a page document.
Called on every attempt to write the document to Elasticsearch, meaning once per cluster and once per retry. Any bulk data that needs to be loaded should be loaded here.
Document | $doc | |
bool | $enforceLatest | |
RevisionRecord|null | $revision |
Exceptions: BuildDocumentException
CirrusSearch\BuildDocument\BuildDocument::initialize ( array $pagesOrRevs, int $flags )
\WikiPage[]|RevisionRecord[] | $pagesOrRevs | List of pages to build documents for. These pages must represent concrete pages with content. It is expected that redirects and non-existent pages have been resolved. |
int | $flags | Bitfield of class constants |
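Since $flags is a bitfield of the class constants, a short standalone sketch of how such values combine may help. The constant values below are copied from the list for this class; the global constants and the hasFlag() helper are hypothetical and exist only so the snippet runs on its own. Against the real class the equivalent expression would presumably be BuildDocument::SKIP_PARSE | BuildDocument::SKIP_LINKS. What each flag actually does is not described on this page, so the sketch only covers the bit arithmetic.

```php
<?php
// Values mirror the class constants listed above. Declaring them as global
// constants is only for this standalone illustration.
const INDEX_EVERYTHING = 0;
const INDEX_ON_SKIP = 1;
const SKIP_PARSE = 2;
const SKIP_LINKS = 4;

// Hypothetical helper: true when every bit of $flag is set in $flags.
// A zero flag has no bits to test, so it is never reported as "set".
function hasFlag(int $flags, int $flag): bool {
    return $flag !== 0 && ($flags & $flag) === $flag;
}

// Combine flags with bitwise OR.
$flags = SKIP_PARSE | SKIP_LINKS;

// OR-ing with INDEX_EVERYTHING (0) leaves the other bits unchanged.
$same = INDEX_EVERYTHING | SKIP_LINKS;
```

Because INDEX_EVERYTHING is 0, it acts as the "no flags set" value rather than a bit that can be tested with a mask.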