Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Utils\TokenUtils Class Reference

Static Public Member Functions

static getTokenType ( $token)
 Gets a string type value for a token.
 
static isWikitextBlockTag (string $name)
 
static tagOpensBlockScope (string $name)
 In the legacy parser, these block tags open block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).
 
static tagClosesBlockScope (string $name)
 In the legacy parser, these block tags close block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).
 
static isTemplateToken ( $token)
 Is this a template token?
 
static isHTMLTag ( $token)
 Determine whether the current token was an HTML tag in wikitext.
 
static hasDOMFragmentType (Token $token)
 Is the token a DOMFragment type value?
 
static isTableTag ( $token)
 Is the token a table tag?
 
static isSolTransparentLinkTag ( $token)
 Determine if token is a transparent link tag.
 
static isBehaviorSwitch (Env $env, $token)
 Does this token represent a behavior switch?
 
static isSolTransparent (Env $env, $token)
 This should come close to matching WTUtils::emitsSolTransparentSingleLineWT, without the single line caveat.
 
static isTranslationUnitMarker (Env $env, CommentTk $token)
 HACK: Returns true if $token looks like a TU marker (<!--T:XXX-->) and if we could be in a translate-annotated page.
 
static isEmptyLineMetaToken ( $token)
 Is the token an empty-line meta token?
 
static matchTypeOf (Token $t, string $typeRe)
 Determine whether the token's typeof attribute matches the given regular expression.
 
static hasTypeOf (Token $t, string $type)
 Determine whether the token matches the given typeof attribute value.
 
static shiftTokenTSR (array $tokens, $offset)
 Shift TSR of a token.
 
static stripEOFTkFromTokens (array &$tokens)
 Strip EOFTk token from token chunk.
 
static convertOffsets (string $s, string $from, string $to, array $offsets)
 Convert string offsets.
 
static convertTokenOffsets (string $s, string $from, string $to, array $tokens)
 Convert offsets in a token array.
 
static isEntitySpanToken ( $token)
 Tests whether token represents an HTML entity.
 
static newlinesToNlTks (string $str)
 Transform "\n" and "\r\n" in the input string to NlTk tokens.
 
static tokensToString ( $tokens, bool $strict=false, array $opts=[])
 Flatten/convert a token array into a string.
 
static kvToHash (array $kvs)
 Convert an array of key-value pairs into a hash of keys to values.
 
static tokenTrim ( $tokens)
 Trim space and newlines from leading and trailing text tokens.
 
static isAnnotationStartToken (Token $t)
 Checks whether the provided meta tag token is an annotation start token.
 
static isAnnotationEndToken (Token $t)
 Checks whether the provided meta tag token is an annotation end token.
 

Public Attributes

const SOL_TRANSPARENT_LINK_REGEX
 

Member Function Documentation

◆ convertOffsets()

static Wikimedia\Parsoid\Utils\TokenUtils::convertOffsets ( string $s,
string $from,
string $to,
array $offsets )
static

Convert string offsets.

Offset types are:

  • 'byte': Bytes (UTF-8 encoding), e.g. PHP substr() or strlen().
  • 'char': Unicode code points (encoding irrelevant), e.g. PHP mb_substr() or mb_strlen().
  • 'ucs2': 16-bit code units (UTF-16 encoding), e.g. JavaScript .substring() or .length.

Offsets that fall in the middle of a Unicode character are "rounded" up to the next full character, i.e. the output offset will always point to the start of a Unicode code point (or just past the end of the string). Offsets outside the string are "rounded" to 0 or to just past the end.

Note
When constructing the array of offsets to pass to this method, populate it with references as $offsets[] = &$var;.
Parameters
string $s  Unicode string the offsets are offsets into, UTF-8 encoded.
('byte'|'ucs2'|'char') $from  Offset type to convert from.
('byte'|'ucs2'|'char') $to  Offset type to convert to.
int[] $offsets  References to the offsets to convert.
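
The offset types, the rounding rule, and the reference pattern from the note can be illustrated with a standalone sketch. This is not Parsoid's implementation: `sketchConvertByteOffsets()` is a hypothetical helper that only converts from 'byte' offsets, and it assumes the mbstring extension is available. The array is passed by value, as in the signature above, but its reference elements still write through to the caller's variables.

```php
<?php
// Hypothetical helper, NOT part of Parsoid: converts UTF-8 byte offsets in
// place to 'char' (code point) or 'ucs2' (UTF-16 code unit) offsets.
function sketchConvertByteOffsets( string $s, string $to, array $offsets ): void {
	$len = strlen( $s );
	foreach ( array_keys( $offsets ) as $k ) {
		// Offsets outside the string are clamped to 0 or just-past-the-end.
		$o = max( 0, min( $len, $offsets[$k] ) );
		// A mid-character offset is rounded up: skip UTF-8 continuation
		// bytes (bit pattern 10xxxxxx) until a character boundary.
		while ( $o < $len && ( ord( $s[$o] ) & 0xC0 ) === 0x80 ) {
			$o++;
		}
		$prefix = substr( $s, 0, $o );
		if ( $to === 'char' ) {
			$o = mb_strlen( $prefix, 'UTF-8' );
		} elseif ( $to === 'ucs2' ) {
			// Each UTF-16 code unit occupies two bytes.
			$o = intdiv( strlen( mb_convert_encoding( $prefix, 'UTF-16LE', 'UTF-8' ) ), 2 );
		}
		// The slot is a reference, so this writes to the caller's variable.
		$offsets[$k] = $o;
	}
}

// 'a' (1 byte), 'é' (2 bytes), an emoji outside the BMP (4 bytes), 'b' (1 byte).
$s = "a\u{00E9}\u{1F600}b";
$start = 2; // falls in the middle of 'é'
$end = 7;   // start of 'b'
$offsets = [];
$offsets[] = &$start;
$offsets[] = &$end;
sketchConvertByteOffsets( $s, 'ucs2', $offsets );
```

In this sketch `$start` is first rounded up to byte 3 and becomes 2 in UTF-16 code units, while `$end` becomes 4 because the emoji takes two code units.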

◆ convertTokenOffsets()

static Wikimedia\Parsoid\Utils\TokenUtils::convertTokenOffsets ( string $s,
string $from,
string $to,
array $tokens )
static

Convert offsets in a token array.

See also
TokenUtils::convertOffsets()
Parameters
string $s  The offset reference string
('byte'|'ucs2'|'char') $from  Offset type to convert from
('byte'|'ucs2'|'char') $to  Offset type to convert to
array<Token|string|array> $tokens

◆ getTokenType()

static Wikimedia\Parsoid\Utils\TokenUtils::getTokenType ( $token)
static

Gets a string type value for a token.

Parameters
Token | string $token
Returns
string

◆ hasDOMFragmentType()

static Wikimedia\Parsoid\Utils\TokenUtils::hasDOMFragmentType ( Token $token)
static

Is the token a DOMFragment type value?

Parameters
Token $token
Returns
bool

◆ hasTypeOf()

static Wikimedia\Parsoid\Utils\TokenUtils::hasTypeOf ( Token $t,
string $type )
static

Determine whether the token matches the given typeof attribute value.

Parameters
Token $t
string $type  Expected value of "typeof" attribute, as a literal string.
Returns
bool True if the token matches.

◆ isAnnotationEndToken()

static Wikimedia\Parsoid\Utils\TokenUtils::isAnnotationEndToken ( Token $t)
static

Checks whether the provided meta tag token is an annotation end token.

Parameters
Token $t
Returns
bool

◆ isAnnotationStartToken()

static Wikimedia\Parsoid\Utils\TokenUtils::isAnnotationStartToken ( Token $t)
static

Checks whether the provided meta tag token is an annotation start token.

Parameters
Token $t
Returns
bool

◆ isBehaviorSwitch()

static Wikimedia\Parsoid\Utils\TokenUtils::isBehaviorSwitch ( Env $env,
$token )
static

Does this token represent a behavior switch?

Parameters
Env $env
Token | string $token
Returns
bool

◆ isEmptyLineMetaToken()

static Wikimedia\Parsoid\Utils\TokenUtils::isEmptyLineMetaToken ( $token)
static

Is the token an empty-line meta token?

Parameters
Token | string $token
Returns
bool

◆ isEntitySpanToken()

static Wikimedia\Parsoid\Utils\TokenUtils::isEntitySpanToken ( $token)
static

Tests whether token represents an HTML entity.

Think <span typeof="mw:Entity">.

Parameters
Token | string | null $token
Returns
bool

◆ isHTMLTag()

static Wikimedia\Parsoid\Utils\TokenUtils::isHTMLTag ( $token)
static

Determine whether the current token was an HTML tag in wikitext.

Parameters
Token | string | null $token
Returns
bool

◆ isSolTransparent()

static Wikimedia\Parsoid\Utils\TokenUtils::isSolTransparent ( Env $env,
$token )
static

This should come close to matching WTUtils::emitsSolTransparentSingleLineWT, without the single line caveat.

Parameters
Env $env
Token | string $token
Returns
bool

◆ isSolTransparentLinkTag()

static Wikimedia\Parsoid\Utils\TokenUtils::isSolTransparentLinkTag ( $token)
static

Determine if token is a transparent link tag.

Parameters
Token | string $token
Returns
bool

◆ isTableTag()

static Wikimedia\Parsoid\Utils\TokenUtils::isTableTag ( $token)
static

Is the token a table tag?

Parameters
Token | string $token
Returns
bool

◆ isTemplateToken()

static Wikimedia\Parsoid\Utils\TokenUtils::isTemplateToken ( $token)
static

Is this a template token?

Parameters
Token | string | null $token
Returns
bool

◆ isTranslationUnitMarker()

static Wikimedia\Parsoid\Utils\TokenUtils::isTranslationUnitMarker ( Env $env,
CommentTk $token )
static

HACK: Returns true if $token looks like a TU marker (<!--T:XXX-->) and if we could be in a translate-annotated page.

Parameters
Env $env
CommentTk $token
Returns
bool

◆ isWikitextBlockTag()

static Wikimedia\Parsoid\Utils\TokenUtils::isWikitextBlockTag ( string $name)
static
Parameters
string $name
Returns
bool

◆ kvToHash()

static Wikimedia\Parsoid\Utils\TokenUtils::kvToHash ( array $kvs)
static

Convert an array of key-value pairs into a hash of keys to values.

For duplicate keys, the last entry wins.

Parameters
array<KV> $kvs
Returns
array<string,array<Token|string>>|array<string,string>
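
The "last entry wins" rule can be shown with a minimal standalone sketch. It is not Parsoid's code: plain `['k' => ..., 'v' => ...]` arrays stand in for KV objects, `sketchKvToHash()` is a hypothetical name, and any key normalization the real method may perform is left out.

```php
<?php
// Hypothetical stand-in for kvToHash(): each pair is a plain array with
// 'k' and 'v' entries instead of a Parsoid KV object.
function sketchKvToHash( array $kvs ): array {
	$hash = [];
	foreach ( $kvs as $kv ) {
		// A later pair with the same key overwrites the earlier value.
		$hash[$kv['k']] = $kv['v'];
	}
	return $hash;
}

$hash = sketchKvToHash( [
	[ 'k' => 'class', 'v' => 'first' ],
	[ 'k' => 'id', 'v' => 'x' ],
	[ 'k' => 'class', 'v' => 'second' ],
] );
```

Here `$hash['class']` holds the value of the last `class` pair, and the key keeps the position of its first occurrence in the result array.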

◆ matchTypeOf()

static Wikimedia\Parsoid\Utils\TokenUtils::matchTypeOf ( Token $t,
string $typeRe )
static

Determine whether the token's typeof attribute matches the given regular expression.

Parameters
Token $t  The token to test
string $typeRe  Regular expression matching the expected value of the typeof attribute.
Returns
?string The matching typeof value, or null if there is no match.
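
The difference between a regular-expression match (matchTypeOf) and a literal match (hasTypeOf) can be sketched on a plain attribute string. The sketch assumes that a typeof attribute is a whitespace-separated list of values; both function names are hypothetical, and they take the attribute string directly rather than a Token.

```php
<?php
// Hypothetical sketch: returns the first typeof value matching $typeRe,
// or null if none matches.
function sketchMatchTypeOf( string $typeofAttr, string $typeRe ): ?string {
	// Assumption: typeof holds whitespace-separated values.
	foreach ( preg_split( '/\s+/', $typeofAttr, -1, PREG_SPLIT_NO_EMPTY ) as $type ) {
		if ( preg_match( $typeRe, $type ) === 1 ) {
			return $type;
		}
	}
	return null;
}

// Hypothetical sketch: literal comparison, built on the regex variant by
// quoting the expected value and anchoring it at both ends.
function sketchHasTypeOf( string $typeofAttr, string $type ): bool {
	$re = '/^' . preg_quote( $type, '/' ) . '$/D';
	return sketchMatchTypeOf( $typeofAttr, $re ) !== null;
}

$attr = 'mw:Transclusion mw:Entity';
$matched = sketchMatchTypeOf( $attr, '#^mw:Ent#' );
$literal = sketchHasTypeOf( $attr, 'mw:Ent' );
```

With these inputs the regex variant returns the full matching value, while the literal variant rejects the partial name `mw:Ent`.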

◆ newlinesToNlTks()

static Wikimedia\Parsoid\Utils\TokenUtils::newlinesToNlTks ( string $str)
static

Transform "\n" and "\r\n" in the input string to NlTk tokens.

Parameters
string $str
Returns
array (interspersed string and NlTk tokens)
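
A standalone sketch of the interspersed result follows. `SketchNlTk` is a hypothetical stand-in for the NlTk class, and dropping empty strings between consecutive newlines is an assumption of the sketch rather than documented behavior.

```php
<?php
// Hypothetical stand-in for Parsoid's NlTk token class.
class SketchNlTk {
}

// Hypothetical sketch: split on "\n" and "\r\n" and put a newline token
// between the pieces.
function sketchNewlinesToNlTks( string $str ): array {
	$out = [];
	$parts = preg_split( '/\r?\n/', $str );
	$last = count( $parts ) - 1;
	foreach ( $parts as $i => $part ) {
		// Assumption: empty pieces are not emitted as string tokens.
		if ( $part !== '' ) {
			$out[] = $part;
		}
		// Every piece except the last was followed by a newline.
		if ( $i < $last ) {
			$out[] = new SketchNlTk();
		}
	}
	return $out;
}

$result = sketchNewlinesToNlTks( "a\nb\r\n" );
```

For the input above the sketch yields four entries: the string `a`, a newline token, the string `b`, and a second newline token.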

◆ shiftTokenTSR()

static Wikimedia\Parsoid\Utils\TokenUtils::shiftTokenTSR ( array $tokens,
$offset )
static

Shift TSR of a token.

PORT-FIXME: In JS this was sometimes called with $offset=undefined, which meant do nothing by default, except if there was a third parameter set to true, in which case it meant the same thing as $offset = null. We can't pass in undefined in PHP, so this should usually be handled with isset() in the caller. But isset() returns false if the variable is null, so it cannot tell null apart from undefined; let's use false instead of null for whatever the previous code meant by a null offset.

Parameters
array<Token|string> $tokens
int | false $offset

◆ stripEOFTkFromTokens()

static Wikimedia\Parsoid\Utils\TokenUtils::stripEOFTkFromTokens ( array & $tokens)
static

Strip EOFTk token from token chunk.

The EOFTk is expected to be the last token of the chunk.

Parameters
array &$tokens
Returns
array The modified token array, returned so that this call can be chained.
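
The by-reference parameter combined with a return value can be sketched as follows. `SketchEOFTk` and `sketchStripEOFTk()` are hypothetical stand-ins, not Parsoid's classes or code.

```php
<?php
// Hypothetical stand-in for Parsoid's EOFTk token class.
class SketchEOFTk {
}

// Hypothetical sketch: removes a trailing end-of-file token in place and
// also returns the array so the call can be used inside an expression.
function sketchStripEOFTk( array &$tokens ): array {
	$n = count( $tokens );
	if ( $n > 0 && $tokens[$n - 1] instanceof SketchEOFTk ) {
		array_pop( $tokens );
	}
	return $tokens;
}

$chunk = [ 'foo', 'bar', new SketchEOFTk() ];
// The caller's $chunk is modified, and the result feeds straight into count().
$remaining = count( sketchStripEOFTk( $chunk ) );
```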

◆ tagClosesBlockScope()

static Wikimedia\Parsoid\Utils\TokenUtils::tagClosesBlockScope ( string $name)
static

In the legacy parser, these block tags close block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).

Parameters
string $name
Returns
bool

◆ tagOpensBlockScope()

static Wikimedia\Parsoid\Utils\TokenUtils::tagOpensBlockScope ( string $name)
static

In the legacy parser, these block tags open block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).

Parameters
string $name
Returns
bool

◆ tokensToString()

static Wikimedia\Parsoid\Utils\TokenUtils::tokensToString ( $tokens,
bool $strict = false,
array $opts = [] )
static

Flatten/convert a token array into a string.

Parameters
string|Token|array<Token|string> $tokens
bool $strict  Whether to abort as soon as we find a token we can't stringify.
array<string,bool|Env> $opts
Returns
string|array{0:string,1:Array<Token|string>} The stringified tokens. If $strict is true, returns a two-element array containing string prefix and the remainder of the tokens as soon as we encounter something we can't stringify.

Unsure why phan is whining about $opts array accesses. So for now, I am simply suppressing those warnings.
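
The two return shapes described above can be illustrated with a simplified sketch. It is not Parsoid's implementation: only plain strings count as stringifiable here, the `$opts` parameter is omitted, and skipping other tokens in non-strict mode is an assumption of the sketch. `sketchTokensToString()` is a hypothetical name.

```php
<?php
// Hypothetical sketch of strict versus non-strict flattening.
// Returns a string, or in strict mode [ prefix, remaining tokens ] once a
// token that cannot be stringified is found.
function sketchTokensToString( array $tokens, bool $strict = false ) {
	$tokens = array_values( $tokens );
	$out = '';
	foreach ( $tokens as $i => $tok ) {
		if ( is_string( $tok ) ) {
			$out .= $tok;
			continue;
		}
		if ( $strict ) {
			// Abort: hand back what was collected plus the unprocessed rest.
			return [ $out, array_slice( $tokens, $i ) ];
		}
		// Assumption: in non-strict mode the token is simply skipped.
	}
	return $out;
}

$tag = new stdClass(); // stands in for a non-text token
$loose = sketchTokensToString( [ 'a', $tag, 'b' ] );
$strictResult = sketchTokensToString( [ 'a', $tag, 'b' ], true );
```

Callers of the strict form therefore need to check whether they received a string or a two-element array before using the result.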

◆ tokenTrim()

static Wikimedia\Parsoid\Utils\TokenUtils::tokenTrim ( $tokens)
static

Trim space and newlines from leading and trailing text tokens.

Parameters
string|Token|(Token|string)[] $tokens
Returns
string|Token|(Token|string)[]
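
Trimming only the leading and trailing text tokens can be sketched like this. The sketch handles the array case only, treats any non-string entry as a boundary that stops trimming, and leaves emptied strings in place; these choices and the name `sketchTokenTrim()` are assumptions, not Parsoid's behavior.

```php
<?php
// Hypothetical sketch: strip spaces, tabs and newlines from text at both
// ends of a token array, stopping at the first non-text token or the first
// string that still has content after trimming.
function sketchTokenTrim( array $tokens ): array {
	$tokens = array_values( $tokens );
	$n = count( $tokens );
	for ( $i = 0; $i < $n; $i++ ) {
		if ( !is_string( $tokens[$i] ) ) {
			break;
		}
		$tokens[$i] = ltrim( $tokens[$i], " \t\r\n" );
		if ( $tokens[$i] !== '' ) {
			break;
		}
	}
	for ( $i = $n - 1; $i >= 0; $i-- ) {
		if ( !is_string( $tokens[$i] ) ) {
			break;
		}
		$tokens[$i] = rtrim( $tokens[$i], " \t\r\n" );
		if ( $tokens[$i] !== '' ) {
			break;
		}
	}
	return $tokens;
}

$tag = new stdClass(); // stands in for a non-text token
$trimmed = sketchTokenTrim( [ " \n", "  a ", $tag, " b \n", "\n" ] );
```

Whitespace next to the inner token is kept, since only the outer edges of the sequence are trimmed.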

Member Data Documentation

◆ SOL_TRANSPARENT_LINK_REGEX

const Wikimedia\Parsoid\Utils\TokenUtils::SOL_TRANSPARENT_LINK_REGEX
Initial value:
=
'/(?:^|\s)mw:PageProp\/(?:Category|redirect|Language)(?=$|\s)/D'
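
To show what this pattern accepts, the following sketch applies the initial value above to a few sample strings. The wrapper function is hypothetical, and the samples are made up for illustration; which attribute Parsoid tests with this constant is not stated here.

```php
<?php
// The pattern is copied from the initial value documented above.
const SKETCH_SOL_TRANSPARENT_LINK_REGEX =
	'/(?:^|\s)mw:PageProp\/(?:Category|redirect|Language)(?=$|\s)/D';

// Hypothetical wrapper: true if one whitespace-delimited value in $value is
// exactly mw:PageProp/Category, mw:PageProp/redirect or mw:PageProp/Language.
function sketchMatchesSolTransparentLink( string $value ): bool {
	return preg_match( SKETCH_SOL_TRANSPARENT_LINK_REGEX, $value ) === 1;
}

$exact = sketchMatchesSolTransparentLink( 'mw:PageProp/Category' );
$inList = sketchMatchesSolTransparentLink( 'foo mw:PageProp/redirect bar' );
$longer = sketchMatchesSolTransparentLink( 'mw:PageProp/Categories' );
$prefixed = sketchMatchesSolTransparentLink( 'xmw:PageProp/Language' );
```

The leading `(?:^|\s)` and the trailing lookahead `(?=$|\s)` require the value to stand alone, so the first two samples match and the last two do not. The match is also case-sensitive.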

The documentation for this class was generated from the following file: