CirrusSearch
Elasticsearch-powered search for MediaWiki
CirrusSearch\BuildDocument\ParserOutputPageProperties Class Reference

Extract searchable properties from the MediaWiki ParserOutput.


Public Member Functions

 __construct (SearchConfig $config)
 
 initialize (Document $doc, WikiPage $page, RevisionRecord $revision)
 Perform initial building of a page document. Called once per page when starting an update; the resulting document is shared between all clusters written to. The document may be written to the job queue multiple times, so it should not contain any large values (in number of bytes).
Parameters
Document $doc: The document to be populated
WikiPage $page: The page to scope operation to
RevisionRecord $revision: The page revision to use

 
 finishInitializeBatch ()
 Called after a batch of pages has been passed to self::initialize. Allows implementations to batch calls to external services necessary for collecting page properties. Implementations must update the Document instances previously provided. The builder will be disposed of after finishing a batch.
 
 finalize (Document $doc, Title $title, RevisionRecord $revision)
 Finalize document building before sending to the cluster. Called on every write attempt for every cluster to perform any final document building. Intended for bulk loading of content from wiki databases that would only serve to bloat the job queue.
Parameters
Document $doc
Title $title
RevisionRecord $revision
Exceptions
BuildDocumentException

 
 finalizeReal (Document $doc, WikiPage $page, CirrusSearch $engine, RevisionRecord $revision)
 Visible for testing.
 

Static Public Member Functions

static fixAndFlagInvalidUTF8InSource (array $fieldDefinitions, int $pageId)
 Find invalid UTF-8 sequences in the source text.
 
static truncateFileTextContent (int $maxLen, array $fieldContent)
 Visible for testing only.
 

Detailed Description

Extract searchable properties from the MediaWiki ParserOutput.

Constructor & Destructor Documentation

◆ __construct()

CirrusSearch\BuildDocument\ParserOutputPageProperties::__construct ( SearchConfig $config)
Parameters
SearchConfig $config

Member Function Documentation

◆ finalize()

CirrusSearch\BuildDocument\ParserOutputPageProperties::finalize ( Document $doc,
Title $title,
RevisionRecord $revision )

Finalize document building before sending to the cluster. Called on every write attempt for every cluster to perform any final document building. Intended for bulk loading of content from wiki databases that would only serve to bloat the job queue.

Parameters
Document $doc
Title $title
RevisionRecord $revision
Exceptions
BuildDocumentException

Implements CirrusSearch\BuildDocument\PagePropertyBuilder.
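
The division of labour between initialize() and finalize() can be pictured with a small sketch. It is only an illustration under stated assumptions, not the CirrusSearch implementation: `initializeSketch`, `finalizeSketch`, `SketchBuildException`, the `$loadText` callable and the array keys are all hypothetical names, and plain arrays stand in for Document objects.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for BuildDocumentException. */
final class SketchBuildException extends RuntimeException
{
}

/** Builds the small document that would travel through the job queue. */
function initializeSketch(int $pageId, int $revisionId): array
{
    // Only cheap, compact values belong here.
    return ['page_id' => $pageId, 'version' => $revisionId];
}

/**
 * Adds the bulky fields just before a write attempt.
 *
 * $loadText is a hypothetical callable that returns the page text,
 * or null when the revision content cannot be loaded.
 */
function finalizeSketch(array $doc, callable $loadText): array
{
    $text = $loadText($doc['page_id'], $doc['version']);
    if ($text === null) {
        throw new SketchBuildException(
            "Revision {$doc['version']} could not be loaded"
        );
    }
    // Arrays are copied by value, so the queued document stays small.
    $doc['text'] = $text;
    return $doc;
}
```

In this sketch the queued array never gains the `text` key; each write attempt calls `finalizeSketch()` again and pays the loading cost at that point instead of storing the content in the queue.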

◆ finalizeReal()

CirrusSearch\BuildDocument\ParserOutputPageProperties::finalizeReal ( Document $doc,
WikiPage $page,
CirrusSearch $engine,
RevisionRecord $revision )

Visible for testing.

Much simpler to test with all objects resolved.

Parameters
Document $doc: Document to finalize
WikiPage $page: WikiPage to scope operation to
CirrusSearch $engine: SearchEngine implementation
RevisionRecord $revision: The page revision to use
Exceptions
BuildDocumentException

◆ finishInitializeBatch()

CirrusSearch\BuildDocument\ParserOutputPageProperties::finishInitializeBatch ( )

Called after a batch of pages has been passed to self::initialize. Allows implementations to batch calls to external services necessary for collecting page properties. Implementations must update the Document instances previously provided. The builder will be disposed of after finishing a batch.

Implements CirrusSearch\BuildDocument\PagePropertyBuilder.
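
The batch contract described above can be sketched as follows. This is a hedged illustration rather than code from CirrusSearch: `SketchDocument` and `BatchedPropertySketch` are hypothetical classes, the `$fetchMany` callable stands in for an external service, `external_property` is an invented field name, and the simplified `initialize()` takes a page id instead of WikiPage and RevisionRecord objects.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for the Document class; it only holds fields. */
final class SketchDocument
{
    /** @var array<string, mixed> */
    public array $fields = [];
}

/** Hypothetical builder illustrating the batch contract. */
final class BatchedPropertySketch
{
    /** @var array<int, SketchDocument> documents awaiting the lookup, keyed by page id */
    private array $pending = [];

    /** @var callable one call resolves many page ids to values */
    private $fetchMany;

    public function __construct(callable $fetchMany)
    {
        $this->fetchMany = $fetchMany;
    }

    public function initialize(SketchDocument $doc, int $pageId): void
    {
        // Only remember the document; the external lookup is deferred.
        $this->pending[$pageId] = $doc;
    }

    public function finishInitializeBatch(): void
    {
        if ($this->pending === []) {
            return;
        }
        // A single round trip covers every page seen in this batch.
        $values = ($this->fetchMany)(array_keys($this->pending));
        foreach ($this->pending as $pageId => $doc) {
            if (array_key_exists($pageId, $values)) {
                // Update the instances that were handed to initialize().
                $doc->fields['external_property'] = $values[$pageId];
            }
        }
        // The builder is discarded after a batch, so drop the references.
        $this->pending = [];
    }
}
```

Because objects are passed by handle in PHP, the caller sees the updated fields on the same `SketchDocument` instances it supplied earlier, which is the behaviour the contract asks of implementations.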

◆ fixAndFlagInvalidUTF8InSource()

static CirrusSearch\BuildDocument\ParserOutputPageProperties::fixAndFlagInvalidUTF8InSource ( array $fieldDefinitions,
int $pageId )

Find invalid UTF-8 sequences in the source text.

Fixes them and flags the document with the CirrusSearchInvalidUTF8 template.

Temporary solution to help investigate and fix T225200.

Visible for testing only.

Parameters
array $fieldDefinitions
int $pageId
Returns
array
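
A minimal sketch of the find, fix and flag steps, assuming the mbstring extension is available. The function name `fixAndFlagInvalidUtf8Sketch`, the `source_text` and `template` array keys, and the `Template:` prefix on the flag are assumptions made for illustration; this page does not show how the real method stores or logs these values.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in, not the CirrusSearch implementation.
 * Assumes the raw text lives under 'source_text' and that template
 * flags are collected in a list under 'template'.
 */
function fixAndFlagInvalidUtf8Sketch(array $fields, int $pageId): array
{
    if (!isset($fields['source_text']) || !is_string($fields['source_text'])) {
        return $fields;
    }
    $original = $fields['source_text'];
    if (mb_check_encoding($original, 'UTF-8')) {
        // Nothing to fix, so nothing to flag.
        return $fields;
    }
    // Converting UTF-8 to UTF-8 substitutes every invalid byte sequence.
    $fields['source_text'] = mb_convert_encoding($original, 'UTF-8', 'UTF-8');
    // Flag the document so affected pages can be found later.
    $fields['template'][] = 'Template:CirrusSearchInvalidUTF8';
    // The page id is only used to make the log entry traceable.
    error_log("Fixed invalid UTF-8 in source text of page $pageId");
    return $fields;
}
```

Valid input is returned untouched, so in this sketch the flag marks only documents whose text actually had to be altered.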

◆ initialize()

CirrusSearch\BuildDocument\ParserOutputPageProperties::initialize ( Document $doc,
WikiPage $page,
RevisionRecord $revision )

Perform initial building of a page document. Called once per page when starting an update; the resulting document is shared between all clusters written to. The document may be written to the job queue multiple times, so it should not contain any large values (in number of bytes).

Parameters
Document $doc: The document to be populated
WikiPage $page: The page to scope operation to
RevisionRecord $revision: The page revision to use

Implements CirrusSearch\BuildDocument\PagePropertyBuilder.

◆ truncateFileTextContent()

static CirrusSearch\BuildDocument\ParserOutputPageProperties::truncateFileTextContent ( int $maxLen,
array $fieldContent )

Visible for testing only.

Parameters
int $maxLen
array $fieldContent
Returns
array
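
This page gives no description of the truncation itself, so the following is only a sketch of one plausible behaviour, assuming the mbstring extension is available. The function name `truncateFileTextSketch`, the `file_text` key, the byte-based limit and the treatment of a negative limit as "no limit" are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in, not the CirrusSearch implementation.
 * Caps the assumed 'file_text' field at $maxLen bytes.
 */
function truncateFileTextSketch(int $maxLen, array $fieldContent): array
{
    // Assumption: a negative limit disables truncation.
    if ($maxLen < 0 || !isset($fieldContent['file_text'])) {
        return $fieldContent;
    }
    if (strlen($fieldContent['file_text']) > $maxLen) {
        // mb_strcut() counts bytes but never splits a multi-byte character.
        $fieldContent['file_text'] = mb_strcut(
            $fieldContent['file_text'],
            0,
            $maxLen,
            'UTF-8'
        );
    }
    return $fieldContent;
}
```

Cutting with `mb_strcut()` rather than `substr()` keeps the result valid UTF-8 even when the byte limit falls in the middle of a character.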

The documentation for this class was generated from the following file: