MediaWiki  REL1_31
WikiTextStructure Class Reference

Class for exploring the structure of parsed wikitext.

Public Member Functions

 __construct (ParserOutput $parserOutput)
 
 getAuxiliaryText ()
 Get auxiliary text.
 
 getDefaultSort ()
 Get the defaultsort property.
 
 getMainText ()
 Get main text.
 
 getOpeningText ()
 Get opening text.
 
 headings ()
 Get headings on the page.
 

Static Public Member Functions

static parseSettingsInMessage ( $message)
 Parse message content into an array.
 

Private Member Functions

 extractHeadingBeforeFirstHeading ( $text)
 Get the text before the first heading.
 
 extractWikitextParts ()
 Extract the parts of the text: opening, main, and auxiliary.
 
 getIgnoredHeadings ()
 Get the list of headings to ignore.
 

Private Attributes

string $allText
 
string[] $auxiliaryElementSelectors
 Selectors for elements that are considered auxiliary to the article text for search.
 
string[] $auxText = []
 
string[] $excludedElementSelectors
 Selectors for elements that are excluded entirely from search.
 
string $openingText
 
ParserOutput $parserOutput
 

Detailed Description

Class for exploring the structure of parsed wikitext.

Definition at line 8 of file WikiTextStructure.php.

Constructor & Destructor Documentation

◆ __construct()

WikiTextStructure::__construct ( ParserOutput  $parserOutput)
Parameters
ParserOutput $parserOutput

Definition at line 66 of file WikiTextStructure.php.

References $parserOutput.

Member Function Documentation

◆ extractHeadingBeforeFirstHeading()

WikiTextStructure::extractHeadingBeforeFirstHeading (   $text)
private

Get the text before the first heading.

Parameters
string $text
Returns
string|null

Definition at line 191 of file WikiTextStructure.php.

References $matches and Sanitizer::stripAllTags().

Referenced by extractWikitextParts().
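
The idea of taking only the text that precedes the first heading can be sketched as a standalone function. This is a minimal illustration, not the MediaWiki implementation: the function name textBeforeFirstHeadingSketch is hypothetical, strip_tags() stands in for Sanitizer::stripAllTags(), and the exact conditions under which null is returned are assumptions.

```php
<?php
/**
 * Hypothetical helper: return the plain text that precedes the first
 * <h1>..<h6> tag, or null if there is no heading or no text before it.
 * The null conditions are assumptions made for this sketch.
 *
 * @param string $html
 * @return string|null
 */
function textBeforeFirstHeadingSketch( $html ) {
	$matches = [];
	// PREG_OFFSET_CAPTURE records the byte offset of the match.
	if ( !preg_match( '/<h[1-6][\s>]/i', $html, $matches, PREG_OFFSET_CAPTURE ) ) {
		return null;
	}
	// Keep only the markup located before the first heading tag.
	$before = substr( $html, 0, $matches[0][1] );
	// strip_tags() stands in for Sanitizer::stripAllTags() here.
	$text = trim( strip_tags( $before ) );
	return $text === '' ? null : $text;
}

$opening = textBeforeFirstHeadingSketch( '<p>Intro <b>text</b>.</p><h2>History</h2><p>More</p>' );
$noHeading = textBeforeFirstHeadingSketch( '<p>No headings here</p>' );
$headingFirst = textBeforeFirstHeadingSketch( '<h2>Starts with heading</h2>' );
```

In this sketch, $opening holds the introductory text, while the two other calls yield null because there is either no heading or nothing in front of it.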

◆ extractWikitextParts()

WikiTextStructure::extractWikitextParts ( )
private

Extract the parts of the text: opening, main, and auxiliary.

Definition at line 147 of file WikiTextStructure.php.

References extractHeadingBeforeFirstHeading() and Sanitizer::stripAllTags().

Referenced by getAuxiliaryText(), getMainText(), and getOpeningText().

◆ getAuxiliaryText()

WikiTextStructure::getAuxiliaryText ( )

Get auxiliary text.

Returns
string[]

Definition at line 242 of file WikiTextStructure.php.

References $auxText and extractWikitextParts().

◆ getDefaultSort()

WikiTextStructure::getDefaultSort ( )

Get the defaultsort property.

Returns
string|null

Definition at line 251 of file WikiTextStructure.php.

◆ getIgnoredHeadings()

WikiTextStructure::getIgnoredHeadings ( )
private

Get the list of headings to ignore.

Returns
string[]

Definition at line 127 of file WikiTextStructure.php.

References $lines, $source, parseSettingsInMessage(), and wfMessage().

Referenced by headings().

◆ getMainText()

WikiTextStructure::getMainText ( )

Get main text.

Returns
string

Definition at line 233 of file WikiTextStructure.php.

References $allText and extractWikitextParts().

◆ getOpeningText()

WikiTextStructure::getOpeningText ( )

Get opening text.

Returns
string

Definition at line 224 of file WikiTextStructure.php.

References $openingText and extractWikitextParts().

◆ headings()

WikiTextStructure::headings ( )

Get headings on the page.

Returns
string[]

Implementation note: the method first strips out things that look like references. HTML filtering cannot be used because the references come back as <sup> tags without a class. To avoid breaking headings such as ==Applicability of the strict mass–energy equivalence formula, ''E'' = ''mc''<sup>2</sup>==, the whole <sup> tag is not removed. Stripping the <sup> tag and then removing everything that looks like [2] is also avoided, because a legitimate title could contain such text (for example, a band named Word [2] Foo). Only <sup> tags wrapping a reference are therefore stripped. Because the data looks like "Reference in heading <sup>[1]</sup><sup>[2]</sup>", HtmlFormatter cannot be used either: there is no suitable selector.

Definition at line 83 of file WikiTextStructure.php.

References getIgnoredHeadings() and Sanitizer::stripAllTags().
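
The reference-stripping behaviour described for headings() can be illustrated with a small standalone sketch. It is not the MediaWiki code: the function name cleanHeadingSketch and the regular expression are assumptions chosen to match the description (remove only <sup> tags that wrap a bracketed number), and strip_tags() stands in for Sanitizer::stripAllTags().

```php
<?php
/**
 * Hypothetical helper: remove <sup> tags that wrap a bracketed reference
 * number, then strip any remaining tags. The regular expression is an
 * assumption for illustration, not taken from this reference page.
 *
 * @param string $heading Heading HTML
 * @return string Plain-text heading
 */
function cleanHeadingSketch( $heading ) {
	// Remove only <sup> elements whose entire content is "[number]".
	$heading = preg_replace( '/<sup>\s*\[\s*\d+\s*\]\s*<\/sup>/i', '', $heading );
	// Strip remaining markup; strip_tags() stands in for Sanitizer::stripAllTags().
	return trim( strip_tags( $heading ) );
}

$withRefs = cleanHeadingSketch( 'Reference in heading <sup>[1]</sup><sup>[2]</sup>' );
$formula = cleanHeadingSketch( '<i>E</i> = <i>mc</i><sup>2</sup>' );
$band = cleanHeadingSketch( 'Word [2] Foo' );
```

The three calls cover the cases mentioned in the description: reference markers are dropped, a superscript that is not a reference keeps its text, and a bare [2] in the heading is left alone.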

◆ parseSettingsInMessage()

static WikiTextStructure::parseSettingsInMessage (   $message)
static

Parse message content into an array.

This function is generally used to parse settings stored as i18n messages (see search-ignored-headings).

Parameters
string $message
Returns
string[]

Definition at line 115 of file WikiTextStructure.php.

References $lines.

Referenced by getIgnoredHeadings().
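
As a rough illustration of how a settings message such as search-ignored-headings might be turned into an array, here is a minimal standalone sketch. It is not the MediaWiki implementation: the function name parseSettingsSketch, the one-entry-per-line layout, and the '#' comment syntax are all assumptions made for this example.

```php
<?php
/**
 * Hypothetical stand-in for WikiTextStructure::parseSettingsInMessage().
 * Assumes one setting per line and '#' starting a comment; both are
 * assumptions for illustration, not confirmed by this reference page.
 *
 * @param string $message
 * @return string[]
 */
function parseSettingsSketch( $message ) {
	// Split the message into individual lines.
	$lines = explode( "\n", $message );
	// Drop everything from '#' to the end of each line (assumed comment syntax).
	$lines = preg_replace( '/#.*$/', '', $lines );
	// Trim surrounding whitespace.
	$lines = array_map( 'trim', $lines );
	// Discard empty lines and reindex the result.
	return array_values( array_filter( $lines, 'strlen' ) );
}

$settings = parseSettingsSketch(
	"# Headings to ignore\nReferences\n  See also  # trailing note\n\nExternal links"
);
```

Under these assumptions, comment-only and blank lines disappear and each remaining line becomes one array entry.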

Member Data Documentation

◆ $allText

string WikiTextStructure::$allText
private

Definition at line 16 of file WikiTextStructure.php.

Referenced by getMainText().

◆ $auxiliaryElementSelectors

string[] WikiTextStructure::$auxiliaryElementSelectors
private
Initial value:
= [
'.thumbcaption',
'table',
'.rellink',
'.dablink',
'.searchaux',
]

Selectors for elements that are considered auxiliary to the article text for search.

Definition at line 50 of file WikiTextStructure.php.

◆ $auxText

string[] WikiTextStructure::$auxText = []
private

Definition at line 20 of file WikiTextStructure.php.

Referenced by getAuxiliaryText().

◆ $excludedElementSelectors

string[] WikiTextStructure::$excludedElementSelectors
private
Initial value:
= [
'audio', 'video',
'style',
'sup.reference',
'.mw-cite-backlink',
'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
'.autocollapse',
'.navigation-not-searchable'
]

Selectors for elements that are excluded entirely from search.

Definition at line 29 of file WikiTextStructure.php.

◆ $openingText

string WikiTextStructure::$openingText
private

Definition at line 12 of file WikiTextStructure.php.

Referenced by getOpeningText().

◆ $parserOutput

ParserOutput WikiTextStructure::$parserOutput
private

Definition at line 24 of file WikiTextStructure.php.

Referenced by __construct().


The documentation for this class was generated from the file WikiTextStructure.php.