Parsoid
A bidirectional parser between wikitext and HTML5
|
Static Public Member Functions | |
static | getTokenType ( $token) |
Gets a string type value for a token. | |
static | isWikitextBlockTag (string $name) |
static | tagOpensBlockScope (string $name) |
In the legacy parser, these block tags open block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php). | |
static | tagClosesBlockScope (string $name) |
In the legacy parser, these block tags close block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php). | |
static | isTemplateToken ( $token) |
Is this a template token? | |
static | isHTMLTag ( $token) |
Determine whether the current token was an HTML tag in wikitext. | |
static | hasDOMFragmentType (Token $token) |
Is the token a DOMFragment type value? | |
static | isTableTag ( $token) |
Is the token a table tag? | |
static | isSolTransparentLinkTag ( $token) |
Determine if token is a transparent link tag. | |
static | isBehaviorSwitch (Env $env, $token) |
Does this token represent a behavior switch? | |
static | isSolTransparent (Env $env, $token) |
This should come close to matching WTUtils::emitsSolTransparentSingleLineWT, without the single line caveat. | |
static | isTranslationUnitMarker (Env $env, CommentTk $token) |
HACK: Returns true if $token looks like a translation unit (TU) marker and if we could be in a translate-annotated page. | |
static | isEmptyLineMetaToken ( $token) |
Is the token an empty-line meta token? | |
static | matchTypeOf (Token $t, string $typeRe) |
Determine whether the token's typeof attribute value matches the given regular expression. | |
static | hasTypeOf (Token $t, string $type) |
Determine whether the token matches the given typeof attribute value. | |
static | shiftTokenTSR (array $tokens, $offset) |
Shift TSR of a token. | |
static | stripEOFTkFromTokens (array &$tokens) |
Strip EOFTk token from token chunk. | |
static | convertOffsets (string $s, string $from, string $to, array $offsets) |
Convert string offsets. | |
static | convertTokenOffsets (string $s, string $from, string $to, array $tokens) |
Convert offsets in a token array. | |
static | isEntitySpanToken ( $token) |
Tests whether token represents an HTML entity. | |
static | newlinesToNlTks (string $str) |
Transform "\n" and "\r\n" in the input string to NlTk tokens. | |
static | tokensToString ( $tokens, bool $strict=false, array $opts=[]) |
Flatten/convert a token array into a string. | |
static | kvToHash (array $kvs) |
Convert an array of key-value pairs into a hash of keys to values. | |
static | tokenTrim ( $tokens) |
Trim space and newlines from leading and trailing text tokens. | |
static | isAnnotationStartToken (Token $t) |
Checks whether the provided meta tag token is an annotation start token. | |
static | isAnnotationEndToken (Token $t) |
Checks whether the provided meta tag token is an annotation end token. | |
Public Attributes | |
const | SOL_TRANSPARENT_LINK_REGEX |
|
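static | convertOffsets (string $s, string $from, string $to, array $offsets) |
Convert string offsets.
Offset types are:
'byte': bytes (UTF-8 encoding), as counted by PHP substr() or strlen().
'char': Unicode code points, as counted by PHP mb_substr() or mb_strlen().
'ucs2': 16-bit code units (UTF-16 encoding), as counted by JavaScript .substring() or .length.
Offsets that are mid-Unicode character are "rounded" up to the next full character, i.e. the output offset will always point to the start of a Unicode code point (or just past the end of the string). Offsets outside the string are "rounded" to 0 or just-past-the-end.
The offsets are passed by reference, so populate the array given to this method with references, e.g. $offsets[] = &$var;
string | $s | Unicode string the offsets are offsets into, UTF-8 encoded. |
('byte'|'ucs2'|'char') | $from | Offset type to convert from. |
('byte'|'ucs2'|'char') | $to | Offset type to convert to. |
int[] | $offsets | References to the offsets to convert. |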
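The three offset types and the round-up rule can be illustrated with a small standalone sketch. This is not the Parsoid implementation: convertOffsetSketch() is a hypothetical helper that converts a single offset, and it assumes $s is valid UTF-8. The second half shows the $offsets[] = &$var; pattern for building an array of references.

```php
<?php
// Hypothetical helper (not Parsoid code): convert one offset between
// 'byte', 'char' and 'ucs2' units. Assumes $s is valid UTF-8.
function convertOffsetSketch(string $s, string $from, string $to, int $offset): int {
    $pos = ['byte' => 0, 'char' => 0, 'ucs2' => 0];
    $len = strlen($s);
    // Advance one code point at a time until the source-unit position
    // reaches or passes the requested offset. Stopping only on code point
    // boundaries is what rounds mid-character offsets up.
    while ($pos['byte'] < $len && $pos[$from] < $offset) {
        $lead = ord($s[$pos['byte']]);
        $bytes = $lead < 0x80 ? 1 : ($lead < 0xE0 ? 2 : ($lead < 0xF0 ? 3 : 4));
        $pos['byte'] += $bytes;
        $pos['char'] += 1;
        // Code points above U+FFFF (4 bytes in UTF-8) need a surrogate pair.
        $pos['ucs2'] += ($bytes === 4) ? 2 : 1;
    }
    return $pos[$to];
}

// "a" (1 byte), U+00E9 (2 bytes), U+1F600 (4 bytes), "b" (1 byte)
$s = "a\u{00E9}\u{1F600}b";
$start = 1;
$end = 7;

// Build the array of references, then convert in place.
$offsets = [];
$offsets[] = &$start;
$offsets[] = &$end;
foreach ($offsets as &$o) {
    $o = convertOffsetSketch($s, 'byte', 'ucs2', $o);
}
unset($o);
```

Because the array holds references, $start and $end themselves are updated by the loop, which is the reason the documentation asks callers to populate $offsets with references.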
|
static | convertTokenOffsets (string $s, string $from, string $to, array $tokens) |
Convert offsets in a token array.
string | $s | The offset reference string |
('byte'|'ucs2'|'char') | $from | Offset type to convert from |
('byte'|'ucs2'|'char') | $to | Offset type to convert to |
array<Token|string|array> | $tokens |
|
static | getTokenType ( $token) |
Gets a string type value for a token.
Token|string | $token |
|
static | hasDOMFragmentType (Token $token) |
Is the token a DOMFragment type value?
Token | $token |
|
static | hasTypeOf (Token $t, string $type) |
Determine whether the token matches the given typeof attribute value.
Token | $t | |
string | $type | Expected value of "typeof" attribute, as a literal string. |
|
static | isAnnotationEndToken (Token $t) |
Checks whether the provided meta tag token is an annotation end token.
Token | $t |
|
static | isAnnotationStartToken (Token $t) |
Checks whether the provided meta tag token is an annotation start token.
Token | $t |
|
static | isBehaviorSwitch (Env $env, $token) |
Does this token represent a behavior switch?
Env | $env | |
Token|string | $token |
|
static | isEmptyLineMetaToken ( $token) |
Is the token an empty-line meta token?
Token|string | $token |
|
static | isEntitySpanToken ( $token) |
Tests whether token represents an HTML entity.
Think <span typeof="mw:Entity">.
Token|string|null | $token |
|
static | isHTMLTag ( $token) |
Determine whether the current token was an HTML tag in wikitext.
Token|string|null | $token |
|
static | isSolTransparent (Env $env, $token) |
This should come close to matching WTUtils::emitsSolTransparentSingleLineWT, without the single line caveat.
Env | $env | |
Token|string | $token |
|
static | isSolTransparentLinkTag ( $token) |
Determine if token is a transparent link tag.
Token|string | $token |
|
static | isTableTag ( $token) |
Is the token a table tag?
Token|string | $token |
|
static | isTemplateToken ( $token) |
Is this a template token?
Token|string|null | $token |
|
static | isTranslationUnitMarker (Env $env, CommentTk $token) |
HACK: Returns true if $token looks like a translation unit (TU) marker and if we could be in a translate-annotated page.
Env | $env | |
CommentTk | $token |
|
static | isWikitextBlockTag (string $name) |
string | $name |
|
static | kvToHash (array $kvs) |
Convert an array of key-value pairs into a hash of keys to values.
For duplicate keys, the last entry wins.
array<KV> | $kvs |
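The "last entry wins" rule can be shown with a minimal sketch. KVSketch is a hypothetical stand-in for Parsoid's KV class with plain string keys and values, and kvToHashSketch() is an illustrative function, not the library's code; any key normalization the real method may perform is left out.

```php
<?php
// Hypothetical stand-in for a key-value pair (not Parsoid's KV class).
final class KVSketch {
    public $k;
    public $v;

    public function __construct(string $k, string $v) {
        $this->k = $k;
        $this->v = $v;
    }
}

// Illustrative only: fold a list of pairs into a key => value array.
function kvToHashSketch(array $kvs): array {
    $hash = [];
    foreach ($kvs as $kv) {
        // A later pair with the same key overwrites the earlier value.
        $hash[$kv->k] = $kv->v;
    }
    return $hash;
}

$hash = kvToHashSketch([
    new KVSketch('class', 'old'),
    new KVSketch('id', 'x'),
    new KVSketch('class', 'new'),
]);
```

Here $hash['class'] ends up as 'new' because the third pair replaces the first.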
|
static | matchTypeOf (Token $t, string $typeRe) |
Determine whether the token matches the given typeof attribute value.
Token | $t | The token to test |
string | $typeRe | Regular expression matching the expected value of the typeof attribute. |
Returns the matching typeof value, or null if there is no match.
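The difference between matchTypeOf (regular expression, returns the matching value or null) and hasTypeOf (literal string, returns a boolean) can be sketched on a raw attribute string. Both functions below are hypothetical and take the typeof value directly instead of a Token; they also assume the attribute may hold several whitespace-separated types.

```php
<?php
// Illustrative only: return the first type in $typeofAttr that matches
// the regular expression $typeRe, or null if none does.
function matchTypeOfSketch(?string $typeofAttr, string $typeRe): ?string {
    if ($typeofAttr === null) {
        return null;
    }
    foreach (preg_split('/\s+/', $typeofAttr, -1, PREG_SPLIT_NO_EMPTY) as $type) {
        if (preg_match($typeRe, $type) === 1) {
            return $type;
        }
    }
    return null;
}

// Illustrative only: a literal comparison expressed through the regex
// variant, by quoting the literal and anchoring it on both ends.
function hasTypeOfSketch(?string $typeofAttr, string $type): bool {
    $re = '/^' . preg_quote($type, '/') . '$/D';
    return matchTypeOfSketch($typeofAttr, $re) !== null;
}

$attr = 'mw:Transclusion mw:Entity';
$matched = matchTypeOfSketch($attr, '#^mw:Ent#');
$hasEntity = hasTypeOfSketch($attr, 'mw:Entity');
$hasPrefixOnly = hasTypeOfSketch($attr, 'mw:Ent');
```

The regex call matches on a prefix and reports which type matched, while the literal call only succeeds on an exact type.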
|
static | newlinesToNlTks (string $str) |
Transform "\n" and "\r\n" in the input string to NlTk tokens.
string | $str |
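A rough sketch of the transformation: split the input on "\n" or "\r\n" and place a newline token between the pieces. NlTkSketch is a hypothetical stand-in for NlTk, and dropping empty string segments is an assumption of this sketch, not something the documentation states.

```php
<?php
// Hypothetical stand-in for a newline token (not Parsoid's NlTk class).
final class NlTkSketch {
}

// Illustrative only: turn a string into a list of text segments and
// newline tokens.
function newlinesToNlTksSketch(string $str): array {
    $out = [];
    $parts = preg_split('/\r?\n/', $str);
    $last = count($parts) - 1;
    foreach ($parts as $i => $part) {
        // Assumption: empty segments are not emitted as tokens.
        if ($part !== '') {
            $out[] = $part;
        }
        // Every segment except the last was followed by a newline.
        if ($i < $last) {
            $out[] = new NlTkSketch();
        }
    }
    return $out;
}

$tokens = newlinesToNlTksSketch("a\r\nb\n");
```

For the input above the result holds four entries: 'a', a newline token, 'b', and a second newline token.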
|
static | shiftTokenTSR (array $tokens, $offset) |
Shift TSR of a token.
PORT-FIXME: In JS this was sometimes called with $offset=undefined, which meant do nothing by default, except if there was a third parameter set to true, in which case it meant the same thing as $offset = null. We can't pass in undefined in PHP, so this should usually be handled with isset() in the caller. But isset() also returns false if the variable is null, so it cannot tell null apart from unset; let's use false instead of null for whatever the previous code meant by a null offset.
array<Token|string> | $tokens | |
int|false | $offset |
|
static | stripEOFTkFromTokens (array &$tokens) |
Strip EOFTk token from token chunk.
The EOFTk is expected to be the last token of the chunk.
array | &$tokens |
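Since $tokens is taken by reference, the caller's array is modified in place. A minimal sketch of that pattern, assuming a zero-indexed list and using EOFTkSketch as a hypothetical stand-in for EOFTk:

```php
<?php
// Hypothetical stand-in for the end-of-input token (not Parsoid's EOFTk).
final class EOFTkSketch {
}

// Illustrative only: drop a trailing end-of-input token, if present.
// Assumes $tokens is a zero-indexed list.
function stripEOFTkSketch(array &$tokens): void {
    $n = count($tokens);
    if ($n > 0 && $tokens[$n - 1] instanceof EOFTkSketch) {
        array_pop($tokens);
    }
}

$chunk = ['foo', 'bar', new EOFTkSketch()];
stripEOFTkSketch($chunk);

$untouched = ['foo'];
stripEOFTkSketch($untouched);
```

After the calls, $chunk holds only the two strings and $untouched is unchanged.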
|
static | tagClosesBlockScope (string $name) |
In the legacy parser, these block tags close block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).
string | $name |
|
static | tagOpensBlockScope (string $name) |
In the legacy parser, these block tags open block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).
string | $name |
|
static | tokensToString ( $tokens, bool $strict=false, array $opts=[]) |
Flatten/convert a token array into a string.
string|Token|array<Token|string> | $tokens | |
bool | $strict | Whether to abort as soon as we find a token we can't stringify. |
array<string,bool|Env> | $opts |
Unsure why phan is whining about $opts array accesses. So for now, I am simply suppressing those warnings.
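The effect of $strict can be sketched on a flat list of strings and opaque tokens. Everything here is hypothetical: OpaqueTokenSketch stands in for any token that has no string form, $opts is omitted, and returning the text collected so far on a strict abort is an assumption of this sketch, not documented behavior.

```php
<?php
// Hypothetical stand-in for a token that cannot be stringified.
final class OpaqueTokenSketch {
}

// Illustrative only: concatenate string tokens. In strict mode, stop at
// the first token that is not a string; otherwise skip such tokens.
function tokensToStringSketch($tokens, bool $strict = false): string {
    if (!is_array($tokens)) {
        $tokens = [$tokens];
    }
    $out = '';
    foreach ($tokens as $tok) {
        if (is_string($tok)) {
            $out .= $tok;
        } elseif ($strict) {
            // Assumption: abort and return what was collected so far.
            break;
        }
    }
    return $out;
}

$mixed = ['a', new OpaqueTokenSketch(), 'b'];
$lenient = tokensToStringSketch($mixed);
$strictResult = tokensToStringSketch($mixed, true);
```

In lenient mode the opaque token is skipped; in strict mode processing stops in front of it.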
|
static | tokenTrim ( $tokens) |
Trim space and newlines from leading and trailing text tokens.
string|Token|(Token|string)[] | $tokens |
const Wikimedia\Parsoid\Utils\TokenUtils::SOL_TRANSPARENT_LINK_REGEX |