MediaWiki 1.29.2
An implementation of the tree building portion of the HTML5 parsing spec.
Public Member Functions
__construct (array $config=[])
Create a new Balancer.
balance ($text, $processingCallback=null, $processingArgs=[])
Return a balanced HTML string for the HTML fragment given by $text, subject to the caveats listed in the class description.
Public Attributes
const VALID_COMMENT_REGEX
Valid HTML5 comments.
Private Member Functions
advance ()
Grab the next "token" from $bitsIterator.
endCaption ()
endCell ()
endRow ()
endSection ()
inBodyMode ($token, $value, $attribs=null, $selfClose=false)
inCaptionMode ($token, $value, $attribs=null, $selfClose=false)
inCellMode ($token, $value, $attribs=null, $selfClose=false)
inColumnGroupMode ($token, $value, $attribs=null, $selfClose=false)
inHeadMode ($token, $value, $attribs=null, $selfClose=false)
inRowMode ($token, $value, $attribs=null, $selfClose=false)
inSelectInTableMode ($token, $value, $attribs=null, $selfClose=false)
inSelectMode ($token, $value, $attribs=null, $selfClose=false)
insertForeignToken ($token, $value, $attribs=null, $selfClose=false)
insertToken ($token, $value, $attribs=null, $selfClose=false)
Pass a token to the tree builder.
inTableBodyMode ($token, $value, $attribs=null, $selfClose=false)
inTableMode ($token, $value, $attribs=null, $selfClose=false)
inTableTextMode ($token, $value, $attribs=null, $selfClose=false)
inTemplateMode ($token, $value, $attribs=null, $selfClose=false)
inTextMode ($token, $value, $attribs=null, $selfClose=false)
parseRawText ($value, $attribs=null)
resetInsertionMode ()
stopParsing ()
switchMode ($mode)
switchModeAndReprocess ($mode, $token, $value, $attribs, $selfClose)
An implementation of the tree building portion of the HTML5 parsing spec.
This is used to balance and tidy output so that the result can always be cleanly serialized/deserialized by an HTML5 parser. It does not guarantee "conforming" output – the HTML5 spec contains a number of constraints which are not enforced by the HTML5 parsing process. But the result will be free of gross errors (misnested or unclosed tags, for example) and will be unchanged by spec-compliant parsing followed by serialization.
The tree building stage is structured as a state machine. When comparing the implementation to https://www.w3.org/TR/html5/syntax.html#tree-construction note that each state is implemented as a function with a name ending in Mode (because the HTML spec refers to them as insertion modes). The current insertion mode is held by the $parseMode property.
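The naming scheme described above can be illustrated with a small standalone sketch. ToyTreeBuilder and the rules inside its two modes are hypothetical and greatly simplified; only the method names (insertToken, switchMode, switchModeAndReprocess, inBodyMode, inTableMode), the $parseMode property, and the token strings are taken from this page. How the real class dispatches internally is an assumption here.

```php
<?php
// Toy illustration only: ToyTreeBuilder is hypothetical, not part of MediaWiki.
class ToyTreeBuilder {
	/** @var string Name of the method implementing the current insertion mode */
	private $parseMode = 'inBodyMode';
	/** @var string[] Record of what each mode did, for demonstration */
	public $log = [];

	/** Pass a token to whichever insertion mode is current. */
	public function insertToken( $token, $value, $attribs = null, $selfClose = false ) {
		return $this->{$this->parseMode}( $token, $value, $attribs, $selfClose );
	}

	/** Change the insertion mode and return the previous one. */
	private function switchMode( $mode ) {
		if ( substr( $mode, -4 ) !== 'Mode' ) {
			throw new InvalidArgumentException( "Mode names should end in 'Mode'" );
		}
		$old = $this->parseMode;
		$this->parseMode = $mode;
		return $old;
	}

	/** Change the insertion mode, then hand the same token to the new mode. */
	private function switchModeAndReprocess( $mode, $token, $value, $attribs, $selfClose ) {
		$this->switchMode( $mode );
		return $this->insertToken( $token, $value, $attribs, $selfClose );
	}

	private function inBodyMode( $token, $value, $attribs = null, $selfClose = false ) {
		if ( $token === 'tag' && $value === 'table' ) {
			$this->log[] = 'body: open table';
			$this->switchMode( 'inTableMode' );
			return true;
		}
		$this->log[] = "body: $token $value";
		return true;
	}

	private function inTableMode( $token, $value, $attribs = null, $selfClose = false ) {
		if ( $token === 'endtag' && $value === 'table' ) {
			$this->log[] = 'table: close table';
			$this->switchMode( 'inBodyMode' );
			return true;
		}
		if ( $token === 'tag' && $value === 'table' ) {
			// Toy rule: a nested <table> implicitly closes the open one,
			// then the same token is reprocessed by the body mode.
			$this->log[] = 'table: implied close table';
			return $this->switchModeAndReprocess( 'inBodyMode', $token, $value, $attribs, $selfClose );
		}
		$this->log[] = "table: $token $value";
		return true;
	}
}

$b = new ToyTreeBuilder();
$b->insertToken( 'text', 'hello' );
$b->insertToken( 'tag', 'table' );
$b->insertToken( 'tag', 'tr' );
$b->insertToken( 'tag', 'table' );
$b->insertToken( 'endtag', 'table' );
print implode( "\n", $b->log ) . "\n";
```

Storing the mode as a method name keeps each insertion mode next to the corresponding section of the spec, at the cost of a dynamic call per token.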
The following simplifications have been made:
- We handle body content only (i.e., we start "in body").
- The following elements are disallowed: <html>, <head>, <body>, <frameset>, <frame>, <plaintext>, <xmp>, <iframe>, <noembed>, <noscript>, <script>, <title>. As a result, further simplifications can be made:
  - "frameset-ok" is not tracked.
  - "head element pointer" is not tracked (but presumed non-null).
We generally mark places where we omit cases from the spec due to disallowed elements with a comment: // OMITTED: <element-name>.
The HTML spec keeps a flag during the parsing process to track whether or not a "parse error" has been encountered. We don't bother to track that flag; we just implement the error-handling process as specified.
Definition at line 1795 of file Balancer.php.
MediaWiki\Tidy\Balancer::__construct ( array $config = [] )
Create a new Balancer.
array $config: Balancer configuration. Includes:
- 'strict': boolean, defaults to false. When true, enforces syntactic constraints on input: all non-tag '<' must be escaped, all attributes must be separated by a single space and double-quoted. This is consistent with the output of the Sanitizer.
- 'allowedHtmlElements': array, defaults to null. When present, the keys of this associative array give the acceptable HTML tag names. When not present, no tag sanitization is done.
- 'tidyCompat': boolean, defaults to false. When true, the serialization algorithm is tweaked to provide historical compatibility with the old "tidy" program: <p>-wrapping is done to the children of <body> and <blockquote> elements, and empty elements are removed. The <pre>/<listing>/<textarea> serialization is also tweaked to allow lossless round trips. (See: https://github.com/whatwg/html/issues/944)
- 'allowComments': boolean, defaults to true. When true, allows HTML comments in the input. The Sanitizer generally strips all comments, so if you are running on sanitized output you can set this to false to get a bit more performance.
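As a rough sketch of how such a configuration array might be assembled: because only the keys of 'allowedHtmlElements' matter, a plain list of tag names can be flipped into the expected shape. The tag list and the option values below are arbitrary illustrations, not MediaWiki defaults, and the constructor call is guarded so the snippet also runs outside a MediaWiki installation.

```php
<?php
// Arbitrary example list; MediaWiki's real set of permitted tags is not shown here.
$allowedTags = [ 'b', 'i', 'p', 'div', 'span' ];

$config = [
	// Accept input that is not in the Sanitizer's strict output format.
	'strict' => false,
	// array_flip() turns the list into an array keyed by tag name.
	'allowedHtmlElements' => array_flip( $allowedTags ),
	// Emulate the serialization quirks of the old "tidy" program.
	'tidyCompat' => true,
	// Assume comments were already stripped from the input.
	'allowComments' => false,
];

// Only construct the Balancer when the class is actually available,
// i.e. when this runs inside MediaWiki.
if ( class_exists( 'MediaWiki\Tidy\Balancer' ) ) {
	$balancer = new \MediaWiki\Tidy\Balancer( $config );
}
```

Omitting a key falls back to the default listed above, so passing an empty array (or nothing) yields a non-strict balancer with no tag sanitization.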
Definition at line 1879 of file Balancer.php.
References MediaWiki\Tidy\Balancer\$config, MediaWiki\Tidy\BalanceSets\$unsupportedSet, captcha-old\count, and MediaWiki\Tidy\BalanceSets\HTML_NAMESPACE.
MediaWiki\Tidy\Balancer::advance ( )
private
Grab the next "token" from $bitsIterator.
This is either an open/close tag or text or a comment, depending on whether the Sanitizer approves.
Definition at line 2150 of file Balancer.php.
References $attribs, $t, MediaWiki\Tidy\Balancer\insertToken(), list, and MediaWiki\Tidy\Balancer\VALID_COMMENT_REGEX.
Referenced by MediaWiki\Tidy\Balancer\balance().
MediaWiki\Tidy\Balancer::balance ( $text, $processingCallback = null, $processingArgs = [] )
Return a balanced HTML string for the HTML fragment given by $text, subject to the caveats listed in the class description.
The result will typically be idempotent – that is, rebalancing the output would result in no change.
string $text: The markup to be balanced.
callable $processingCallback: Callback to do any variable or parameter replacements in HTML attribute values.
array|bool $processingArgs: Arguments for the processing callback.
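The "typically idempotent" property can be checked with a small helper along these lines. isStableUnderRebalancing is a hypothetical name introduced for illustration; it accepts any callable that maps an HTML string to an HTML string. The sketch also assumes that one Balancer instance can be reused for several balance() calls, which this page does not state.

```php
<?php
/**
 * Hypothetical helper: true when applying $balance a second time changes nothing.
 *
 * @param callable $balance Maps an HTML string to an HTML string
 * @param string $html Input fragment
 * @return bool
 */
function isStableUnderRebalancing( callable $balance, $html ) {
	$once = $balance( $html );
	$twice = $balance( $once );
	return $once === $twice;
}

if ( class_exists( 'MediaWiki\Tidy\Balancer' ) ) {
	// Inside MediaWiki: check a misnested fragment with default configuration.
	$balancer = new \MediaWiki\Tidy\Balancer();
	$stable = isStableUnderRebalancing( [ $balancer, 'balance' ], '<b><i>text</b></i>' );
} else {
	// Outside MediaWiki: trim() stands in for balance() just to exercise the helper.
	$stable = isStableUnderRebalancing( 'trim', '  <p>text</p>  ' );
}
var_dump( $stable );
```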
Definition at line 1923 of file Balancer.php.
References $e, MediaWiki\Tidy\Balancer\$processingArgs, MediaWiki\Tidy\Balancer\$processingCallback, MediaWiki\Tidy\Balancer\advance(), MediaWiki\Tidy\BalanceSets\HTML_NAMESPACE, MediaWiki\Tidy\Balancer\insertToken(), and MediaWiki\Tidy\Balancer\resetInsertionMode().
MediaWiki\Tidy\Balancer::endCaption ( )
private
Definition at line 3100 of file Balancer.php.
References MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inCaptionMode().
MediaWiki\Tidy\Balancer::endCell ( )
private
Definition at line 3343 of file Balancer.php.
References MediaWiki\Tidy\Balancer\inCellMode().
Referenced by MediaWiki\Tidy\Balancer\inCellMode().
MediaWiki\Tidy\Balancer::endRow ( )
private
Definition at line 3277 of file Balancer.php.
References MediaWiki\Tidy\BalanceSets\$tableRowContextSet, and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inRowMode().
MediaWiki\Tidy\Balancer::endSection ( )
private
Definition at line 3210 of file Balancer.php.
References MediaWiki\Tidy\BalanceSets\$tableBodyContextSet, and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inTableBodyMode().
MediaWiki\Tidy\Balancer::inBodyMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 2434 of file Balancer.php.
References MediaWiki\Tidy\BalanceSets\$addressDivPSet, $attribs, MediaWiki\Tidy\Balancer\$formElementPointer, MediaWiki\Tidy\BalanceSets\$headingSet, MediaWiki\Tidy\BalanceSets\$specialSet, $value, as, MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\inTemplateMode(), MediaWiki\Tidy\BalanceSets\MATHML_NAMESPACE, MediaWiki\Tidy\Balancer\stopParsing(), MediaWiki\Tidy\BalanceSets\SVG_NAMESPACE, and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inCaptionMode(), MediaWiki\Tidy\Balancer\inCellMode(), MediaWiki\Tidy\Balancer\inColumnGroupMode(), MediaWiki\Tidy\Balancer\inSelectMode(), MediaWiki\Tidy\Balancer\inTableMode(), MediaWiki\Tidy\Balancer\inTableTextMode(), and MediaWiki\Tidy\Balancer\inTemplateMode().
MediaWiki\Tidy\Balancer::inCaptionMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3111 of file Balancer.php.
References $attribs, $value, MediaWiki\Tidy\Balancer\endCaption(), MediaWiki\Tidy\Balancer\inBodyMode(), and MediaWiki\Tidy\Balancer\insertToken().
MediaWiki\Tidy\Balancer::inCellMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3354 of file Balancer.php.
References $attribs, MediaWiki\Tidy\BalanceSets\$tableCellSet, $value, MediaWiki\Tidy\Balancer\endCell(), MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\insertToken(), and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\endCell().
MediaWiki\Tidy\Balancer::inColumnGroupMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3158 of file Balancer.php.
References $attribs, $matches, $value, MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\insertToken(), and MediaWiki\Tidy\Balancer\switchMode().
MediaWiki\Tidy\Balancer::inHeadMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 2364 of file Balancer.php.
References $attribs, $matches, MediaWiki\Tidy\Balancer\$parseMode, $value, MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\parseRawText(), MediaWiki\Tidy\Balancer\resetInsertionMode(), and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inColumnGroupMode(), MediaWiki\Tidy\Balancer\inSelectMode(), MediaWiki\Tidy\Balancer\inTableMode(), and MediaWiki\Tidy\Balancer\inTemplateMode().
MediaWiki\Tidy\Balancer::inRowMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3286 of file Balancer.php.
References $attribs, MediaWiki\Tidy\BalanceSets\$tableRowContextSet, $value, MediaWiki\Tidy\Balancer\endRow(), MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\inTableMode(), and MediaWiki\Tidy\Balancer\switchMode().
MediaWiki\Tidy\Balancer::inSelectInTableMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3484 of file Balancer.php.
References $attribs, $value, MediaWiki\Tidy\Balancer\inSelectMode(), and MediaWiki\Tidy\Balancer\insertToken().
MediaWiki\Tidy\Balancer::inSelectMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3408 of file Balancer.php.
References $attribs, $value, MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\insertToken(), and MediaWiki\Tidy\Balancer\resetInsertionMode().
Referenced by MediaWiki\Tidy\Balancer\inSelectInTableMode().
MediaWiki\Tidy\Balancer::insertForeignToken ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 2041 of file Balancer.php.
References $attribs, MediaWiki\Tidy\Balancer\$parseMode, $value, as, and MediaWiki\Tidy\Balancer\insertToken().
Referenced by MediaWiki\Tidy\Balancer\insertToken().
MediaWiki\Tidy\Balancer::insertToken ( $token, $value, $attribs = null, $selfClose = false )
private
Pass a token to the tree builder.
The $token will be one of the strings "tag", "endtag", or "text".
Definition at line 1972 of file Balancer.php.
References $attribs, MediaWiki\Tidy\Balancer\$parseMode, MediaWiki\Tidy\BalanceSets\$unsupportedSet, $value, MediaWiki\Tidy\BalanceSets\HTML_NAMESPACE, MediaWiki\Tidy\Balancer\insertForeignToken(), and MediaWiki\Tidy\BalanceSets\MATHML_NAMESPACE.
Referenced by MediaWiki\Tidy\Balancer\advance(), MediaWiki\Tidy\Balancer\balance(), MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inCaptionMode(), MediaWiki\Tidy\Balancer\inCellMode(), MediaWiki\Tidy\Balancer\inColumnGroupMode(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\inRowMode(), MediaWiki\Tidy\Balancer\inSelectInTableMode(), MediaWiki\Tidy\Balancer\inSelectMode(), MediaWiki\Tidy\Balancer\insertForeignToken(), MediaWiki\Tidy\Balancer\inTableBodyMode(), MediaWiki\Tidy\Balancer\inTableMode(), MediaWiki\Tidy\Balancer\inTemplateMode(), and MediaWiki\Tidy\Balancer\switchModeAndReprocess().
MediaWiki\Tidy\Balancer::inTableBodyMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3223 of file Balancer.php.
References $attribs, MediaWiki\Tidy\BalanceSets\$tableBodyContextSet, $value, MediaWiki\Tidy\Balancer\endSection(), MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\inTableMode(), and MediaWiki\Tidy\Balancer\switchMode().
MediaWiki\Tidy\Balancer::inTableMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 2967 of file Balancer.php.
References $attribs, MediaWiki\Tidy\Balancer\$parseMode, MediaWiki\Tidy\BalanceSets\$tableContextSet, MediaWiki\Tidy\BalanceSets\$tableSectionRowSet, $value, MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\resetInsertionMode(), MediaWiki\Tidy\Balancer\stopParsing(), MediaWiki\Tidy\Balancer\switchMode(), and MediaWiki\Tidy\Balancer\switchModeAndReprocess().
Referenced by MediaWiki\Tidy\Balancer\inRowMode(), and MediaWiki\Tidy\Balancer\inTableBodyMode().
MediaWiki\Tidy\Balancer::inTableTextMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3077 of file Balancer.php.
References $attribs, MediaWiki\Tidy\Balancer\$pendingTableText, $value, MediaWiki\Tidy\Balancer\inBodyMode(), and MediaWiki\Tidy\Balancer\switchModeAndReprocess().
MediaWiki\Tidy\Balancer::inTemplateMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 3509 of file Balancer.php.
References $attribs, $value, MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\resetInsertionMode(), MediaWiki\Tidy\Balancer\stopParsing(), and MediaWiki\Tidy\Balancer\switchModeAndReprocess().
Referenced by MediaWiki\Tidy\Balancer\inBodyMode().
MediaWiki\Tidy\Balancer::inTextMode ( $token, $value, $attribs = null, $selfClose = false )
private
Definition at line 2347 of file Balancer.php.
References $attribs, $value, MediaWiki\Tidy\Balancer\switchMode(), and MediaWiki\Tidy\Balancer\switchModeAndReprocess().
MediaWiki\Tidy\Balancer::parseRawText ( $value, $attribs = null )
private
Definition at line 2340 of file Balancer.php.
References $attribs, $value, and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inHeadMode().
MediaWiki\Tidy\Balancer::resetInsertionMode ( )
private
Definition at line 2257 of file Balancer.php.
References MediaWiki\Tidy\Balancer\$fragmentContext, $last, MediaWiki\Tidy\BalanceSets\$tableCellSet, as, and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\balance(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\inSelectMode(), MediaWiki\Tidy\Balancer\inTableMode(), and MediaWiki\Tidy\Balancer\inTemplateMode().
MediaWiki\Tidy\Balancer::stopParsing ( )
private
Definition at line 2327 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inTableMode(), and MediaWiki\Tidy\Balancer\inTemplateMode().
MediaWiki\Tidy\Balancer::switchMode ( $mode )
private
Definition at line 2243 of file Balancer.php.
References MediaWiki\Tidy\Balancer\$parseMode.
Referenced by MediaWiki\Tidy\Balancer\endCaption(), MediaWiki\Tidy\Balancer\endRow(), MediaWiki\Tidy\Balancer\endSection(), MediaWiki\Tidy\Balancer\inBodyMode(), MediaWiki\Tidy\Balancer\inCellMode(), MediaWiki\Tidy\Balancer\inColumnGroupMode(), MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\inRowMode(), MediaWiki\Tidy\Balancer\inTableBodyMode(), MediaWiki\Tidy\Balancer\inTableMode(), MediaWiki\Tidy\Balancer\inTextMode(), MediaWiki\Tidy\Balancer\parseRawText(), MediaWiki\Tidy\Balancer\resetInsertionMode(), and MediaWiki\Tidy\Balancer\switchModeAndReprocess().
MediaWiki\Tidy\Balancer::switchModeAndReprocess ( $mode, $token, $value, $attribs, $selfClose )
private
Definition at line 2252 of file Balancer.php.
References $attribs, $value, MediaWiki\Tidy\Balancer\insertToken(), and MediaWiki\Tidy\Balancer\switchMode().
Referenced by MediaWiki\Tidy\Balancer\inTableMode(), MediaWiki\Tidy\Balancer\inTableTextMode(), MediaWiki\Tidy\Balancer\inTemplateMode(), and MediaWiki\Tidy\Balancer\inTextMode().
private
Definition at line 1801 of file Balancer.php.
private
Definition at line 1805 of file Balancer.php.
private
Definition at line 1799 of file Balancer.php.
private
Definition at line 1798 of file Balancer.php.
MediaWiki\Tidy\Balancer::$config
private
Definition at line 1806 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\__construct().
MediaWiki\Tidy\Balancer::$formElementPointer
private
Definition at line 1812 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\inBodyMode().
MediaWiki\Tidy\Balancer::$fragmentContext
private
Definition at line 1811 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\resetInsertionMode().
private
Definition at line 1813 of file Balancer.php.
private
Definition at line 1815 of file Balancer.php.
private
Definition at line 1814 of file Balancer.php.
private
Definition at line 1810 of file Balancer.php.
MediaWiki\Tidy\Balancer::$parseMode
private
Definition at line 1796 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\inHeadMode(), MediaWiki\Tidy\Balancer\insertForeignToken(), MediaWiki\Tidy\Balancer\insertToken(), MediaWiki\Tidy\Balancer\inTableMode(), and MediaWiki\Tidy\Balancer\switchMode().
MediaWiki\Tidy\Balancer::$pendingTableText
private
Definition at line 1809 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\inTableTextMode().
MediaWiki\Tidy\Balancer::$processingArgs
private
Definition at line 1820 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\balance().
MediaWiki\Tidy\Balancer::$processingCallback
private
Definition at line 1818 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\balance().
private
Definition at line 1803 of file Balancer.php.
private
Definition at line 1804 of file Balancer.php.
private
Definition at line 1808 of file Balancer.php.
const MediaWiki\Tidy\Balancer::VALID_COMMENT_REGEX
Valid HTML5 comments.
Regex borrowed from Tim Starling's "remex-html" project.
Definition at line 1826 of file Balancer.php.
Referenced by MediaWiki\Tidy\Balancer\advance().