Parsoid
A bidirectional parser between wikitext and HTML5
Parsoid\Utils\TokenUtils Class Reference

Static Public Member Functions

static getTokenType ( $token)
 Gets a string type value for a token.

static isBlockTag (string $name)
 Determine if a tag is block-level or not.

static tagOpensBlockScope (string $name)
 In the PHP parser, these block tags open block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).

static tagClosesBlockScope (string $name)
 In the PHP parser, these block tags close block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).

static isTemplateToken ( $token)
 Is this a template token?

static isHTMLTag ( $token)
 Determine whether the current token was an HTML tag in wikitext.

static isDOMFragmentType (string $typeOf)
 Is the typeof a DOMFragment type value?

static isTableTag ( $token)
 Is the token a table tag?

static isSolTransparentLinkTag ( $token)
 Determine if token is a transparent link tag.

static isBehaviorSwitch (Env $env, $token)
 Does this token represent a behavior switch?

static isSolTransparent (Env $env, $token)
 This should come close to matching DOMUtils.emitsSolTransparentSingleLineWT, without the single line caveat.

static isEmptyLineMetaToken ( $token)
 Is the token an empty-line meta token?

static shiftTokenTSR (array $tokens, $offset)
 Shift TSR of a token.

static stripEOFTkFromTokens (array &$tokens)
 Strip EOFTk token from token chunk.

static placeholder (?string $content, stdClass $openTagAttribs, stdClass $closeTagAttribs)
 Create placeholder tokens for some content.

static convertOffsets (string $s, string $from, string $to, array $offsets)
 Convert string offsets.

static convertTokenOffsets (string $s, string $from, string $to, array $tokens)
 Convert offsets in a token array.

static isEntitySpanToken ( $token)
 Tests whether token represents an HTML entity.

static newlinesToNlTks (string $str)
 Transform "\n" and "\r\n" in the input string to NlTk tokens.

static tokensToString ( $tokens, bool $strict=false, array $opts=[])
 Flatten/convert a token array into a string.

static kvToHash (array $kvs, bool $convertValuesToString=false, bool $useSrc=false)
 Convert an array of key-value pairs into a hash of keys to values.

static tokenTrim ( $tokens)
 Trim space and newlines from leading and trailing text tokens.
 

Public Attributes

const SOL_TRANSPARENT_LINK_REGEX
 

Member Function Documentation

◆ convertOffsets()

static Parsoid\Utils\TokenUtils::convertOffsets ( string  $s,
string  $from,
string  $to,
array  $offsets 
)
static

Convert string offsets.

Offset types are:

  • 'byte': Bytes (UTF-8 encoding), e.g. PHP substr() or strlen().
  • 'char': Unicode code points (encoding irrelevant), e.g. PHP mb_substr() or mb_strlen().
  • 'ucs2': 16-bit code units (UTF-16 encoding), e.g. JavaScript .substring() or .length.

Offsets that fall in the middle of a Unicode character are "rounded" up to the next full character, i.e. the output offset will always point to the start of a Unicode code point (or just past the end of the string). Offsets outside the string are "rounded" to 0 or just-past-the-end.

Note
When constructing the array of offsets to pass to this method, populate it with references, i.e. assign each entry by reference ($offsets[] = & followed by the variable or property that holds the offset).
Parameters
string $s: Unicode string the offsets are offsets into, UTF-8 encoded.
string $from: Offset type to convert from.
string $to: Offset type to convert to.
int[] $offsets: References to the offsets to convert.
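
The three offset types can be compared on a short string. The following is a minimal sketch, not the Parsoid implementation: `sketchByteToCharOffset` is a hypothetical helper that converts a single 'byte' offset to a 'char' offset, and it assumes the mbstring extension is available. It illustrates the clamping and round-up behavior described above.

```php
<?php
// Hypothetical helper (not part of Parsoid): convert one 'byte' offset
// into a 'char' (code point) offset for a UTF-8 string.
function sketchByteToCharOffset( string $s, int $byteOffset ): int {
    $len = strlen( $s );
    // Offsets outside the string are clamped to 0 or just past the end.
    $byteOffset = max( 0, min( $byteOffset, $len ) );
    // A mid-character offset lands on a UTF-8 continuation byte (10xxxxxx);
    // advance to the start of the next code point.
    while ( $byteOffset < $len && ( ord( $s[$byteOffset] ) & 0xC0 ) === 0x80 ) {
        $byteOffset++;
    }
    return mb_strlen( substr( $s, 0, $byteOffset ), 'UTF-8' );
}

// "h", e-acute (2 bytes), "llo", space, one emoji outside the BMP (4 bytes).
$s = "h\u{E9}llo \u{1F600}";

$byteLength = strlen( $s );             // length in 'byte' units
$charLength = mb_strlen( $s, 'UTF-8' ); // length in 'char' units
// Length in 'ucs2' units: UTF-16 code units are two bytes each.
$ucs2Length = intdiv( strlen( mb_convert_encoding( $s, 'UTF-16LE', 'UTF-8' ) ), 2 );

$midChar = sketchByteToCharOffset( $s, 2 );   // byte 2 is inside the e-acute
$pastEnd = sketchByteToCharOffset( $s, 100 ); // beyond the end of the string
```

The same string has three different lengths depending on the unit, which is why every offset must be tagged with its type before it is converted.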

◆ convertTokenOffsets()

static Parsoid\Utils\TokenUtils::convertTokenOffsets ( string  $s,
string  $from,
string  $to,
array  $tokens 
)
static

Convert offsets in a token array.

See also
TokenUtils::convertOffsets()
Parameters
string $s: The offset reference string
string $from: Offset type to convert from
string $to: Offset type to convert to
array<Token|string|array> $tokens

◆ getTokenType()

static Parsoid\Utils\TokenUtils::getTokenType (   $token)
static

Gets a string type value for a token.

Parameters
Token|string $token
Returns
string

◆ isBehaviorSwitch()

static Parsoid\Utils\TokenUtils::isBehaviorSwitch ( Env  $env,
  $token 
)
static

Does this token represent a behavior switch?

Parameters
Env $env
Token|string $token
Returns
bool

◆ isBlockTag()

static Parsoid\Utils\TokenUtils::isBlockTag ( string  $name)
static

Determine if a tag is block-level or not.

<video> is removed from block tags, since it can be phrasing content. This is necessary for it to render inline.

Parameters
string $name
Returns
bool

◆ isDOMFragmentType()

static Parsoid\Utils\TokenUtils::isDOMFragmentType ( string  $typeOf)
static

Is the typeof a DOMFragment type value?

Parameters
string $typeOf
Returns
bool

◆ isEmptyLineMetaToken()

static Parsoid\Utils\TokenUtils::isEmptyLineMetaToken (   $token)
static

Is the token an empty-line meta token?

Parameters
Token|string $token
Returns
bool

◆ isEntitySpanToken()

static Parsoid\Utils\TokenUtils::isEntitySpanToken (   $token)
static

Tests whether token represents an HTML entity.

Think <span typeof="mw:Entity">.

Parameters
Token|string|null $token
Returns
bool

◆ isHTMLTag()

static Parsoid\Utils\TokenUtils::isHTMLTag (   $token)
static

Determine whether the current token was an HTML tag in wikitext.

Parameters
Token|string|null $token
Returns
bool

◆ isSolTransparent()

static Parsoid\Utils\TokenUtils::isSolTransparent ( Env  $env,
  $token 
)
static

This should come close to matching DOMUtils.emitsSolTransparentSingleLineWT, without the single line caveat.

Parameters
Env $env
Token|string $token
Returns
bool

◆ isSolTransparentLinkTag()

static Parsoid\Utils\TokenUtils::isSolTransparentLinkTag (   $token)
static

Determine if token is a transparent link tag.

Parameters
Token|string $token
Returns
bool

◆ isTableTag()

static Parsoid\Utils\TokenUtils::isTableTag (   $token)
static

Is the token a table tag?

Parameters
Token|string $token
Returns
bool

◆ isTemplateToken()

static Parsoid\Utils\TokenUtils::isTemplateToken (   $token)
static

Is this a template token?

Parameters
Token|string|null $token
Returns
bool

◆ kvToHash()

static Parsoid\Utils\TokenUtils::kvToHash ( array  $kvs,
bool  $convertValuesToString = false,
bool  $useSrc = false 
)
static

Convert an array of key-value pairs into a hash of keys to values.

For duplicate keys, the last entry wins.

Parameters
array<KV> $kvs
bool $convertValuesToString
bool $useSrc
Returns
array<string,Token[]>|array<string,string>
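
The "last entry wins" rule can be illustrated with a minimal sketch. `SketchKV` is a stand-in for Parsoid's KV class, assumed here to expose a key `k` and a value `v`; `sketchKvToHash` is a hypothetical function that ignores the `$convertValuesToString` and `$useSrc` options and any key normalization the real method may perform.

```php
<?php
// Stand-in for Parsoid's KV class (assumed shape: key in $k, value in $v).
class SketchKV {
    public $k;
    public $v;

    public function __construct( string $k, string $v ) {
        $this->k = $k;
        $this->v = $v;
    }
}

// Hypothetical sketch: fold a list of key-value pairs into an associative array.
function sketchKvToHash( array $kvs ): array {
    $hash = [];
    foreach ( $kvs as $kv ) {
        // A later pair with the same key overwrites the earlier value.
        $hash[$kv->k] = $kv->v;
    }
    return $hash;
}

$hash = sketchKvToHash( [
    new SketchKV( 'class', 'a' ),
    new SketchKV( 'id', 'x' ),
    new SketchKV( 'class', 'b' ),
] );
```

Here `$hash['class']` ends up holding the value from the third pair, not the first.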

◆ newlinesToNlTks()

static Parsoid\Utils\TokenUtils::newlinesToNlTks ( string  $str)
static

Transform "\n" and "\r\n" in the input string to NlTk tokens.

Parameters
string $str
Returns
array (interspersed string and NlTk tokens)
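
A minimal sketch of the transformation follows. `SketchNlTk` is a stand-in for Parsoid's NlTk class, and `sketchNewlinesToNlTks` is a hypothetical function; dropping empty string pieces is an assumption of this sketch, not a documented behavior of the real method.

```php
<?php
// Stand-in for Parsoid's NlTk (newline token) class.
class SketchNlTk {
}

// Hypothetical sketch: split on "\n" or "\r\n" and put a newline token
// between the pieces.
function sketchNewlinesToNlTks( string $str ): array {
    $pieces = preg_split( '/\r?\n/', $str );
    $out = [];
    foreach ( $pieces as $i => $piece ) {
        if ( $i > 0 ) {
            // One newline token per line break in the input.
            $out[] = new SketchNlTk();
        }
        if ( $piece !== '' ) {
            // Assumption: empty strings are not emitted.
            $out[] = $piece;
        }
    }
    return $out;
}

$tokens = sketchNewlinesToNlTks( "a\nb\r\nc" );
```

The result interleaves the three text pieces with two newline tokens, matching the "interspersed string and NlTk tokens" return description.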

◆ placeholder()

static Parsoid\Utils\TokenUtils::placeholder ( ?string  $content,
stdClass  $openTagAttribs,
stdClass  $closeTagAttribs 
)
static

Create placeholder tokens for some content.

This is just an escape hatch for scenarios where we don't have a good representation for this content and just want to render it without providing any editing support for it. Content with mw:Placeholder typeof attribute will be ignored by editing clients and they are expected to not modify it either.

Parameters
?string $content
stdClass $openTagAttribs
stdClass $closeTagAttribs
Returns
array

◆ shiftTokenTSR()

static Parsoid\Utils\TokenUtils::shiftTokenTSR ( array  $tokens,
  $offset 
)
static

Shift TSR of a token.

Port warning: in JS this was sometimes called with $offset=undefined, which meant do nothing by default, except if there was a third parameter set to true, in which case it meant the same thing as $offset = null. We can't pass in undefined in PHP, so this should usually be handled with isset() in the caller. But isset() also returns false when the variable is null, so an explicit null could not be told apart from an unset value; false is therefore used instead of null for whatever the previous code meant by a null offset.

Parameters
Token[] $tokens
int|false $offset
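
The isset() behavior that motivates using false rather than null can be checked directly. This short sketch uses plain PHP only; the array keys are illustrative and are not part of the Parsoid API.

```php
<?php
// A caller-side check with isset() cannot tell "missing" from "null".
$missing = [];
$explicitNull = [ 'offset' => null ];
$explicitFalse = [ 'offset' => false ];

$missingIsSet = isset( $missing['offset'] );     // key absent
$nullIsSet = isset( $explicitNull['offset'] );   // key present, value null
$falseIsSet = isset( $explicitFalse['offset'] ); // key present, value false
```

Only the false value is reported as set, so false can carry a meaning distinct from "no offset given".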

◆ stripEOFTkFromTokens()

static Parsoid\Utils\TokenUtils::stripEOFTkFromTokens ( array &  $tokens)
static

Strip EOFTk token from token chunk.

The EOFTk is expected to be the last token of the chunk.

Parameters
array& $tokens
Returns
array: The modified token array, returned so that this call can be chained.
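
A minimal sketch of this behavior, assuming a stand-in class `SketchEOFTk` in place of Parsoid's EOFTk and a hypothetical function `sketchStripEOF`:

```php
<?php
// Stand-in for Parsoid's EOFTk (end-of-input token) class.
class SketchEOFTk {
}

// Hypothetical sketch: drop a trailing end-of-input token. The array is
// modified in place (passed by reference) and also returned for chaining.
function sketchStripEOF( array &$tokens ): array {
    $n = count( $tokens );
    if ( $n > 0 && $tokens[$n - 1] instanceof SketchEOFTk ) {
        array_pop( $tokens );
    }
    return $tokens;
}

$chunk = [ 'foo', 'bar', new SketchEOFTk() ];
$result = sketchStripEOF( $chunk );
```

Because the parameter is taken by reference, both `$chunk` and `$result` hold the stripped array afterwards.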

◆ tagClosesBlockScope()

static Parsoid\Utils\TokenUtils::tagClosesBlockScope ( string  $name)
static

In the PHP parser, these block tags close block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).

Parameters
string $name
Returns
bool

◆ tagOpensBlockScope()

static Parsoid\Utils\TokenUtils::tagOpensBlockScope ( string  $name)
static

In the PHP parser, these block tags open block-tag scope. See doBlockLevels in the PHP parser (includes/parser/Parser.php).

Parameters
string $name
Returns
bool

◆ tokensToString()

static Parsoid\Utils\TokenUtils::tokensToString (   $tokens,
bool  $strict = false,
array  $opts = [] 
)
static

Flatten/convert a token array into a string.

Parameters
string|Token|array<Token|string> $tokens
bool $strict: Whether to abort as soon as we find a token we can't stringify.
array<string,bool|Env> $opts
Returns
string|array{0:string,1:Array<Token|string>}: The stringified tokens. If $strict is true, returns a two-element array containing string prefix and the remainder of the tokens as soon as we encounter something we can't stringify.

Unsure why phan is whining about $opts array accesses. So for now, I am simply suppressing those warnings.
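
The difference between the two return shapes can be sketched as follows. `sketchTokensToString` is a hypothetical simplification: it treats only PHP strings as stringifiable, ignores `$opts`, and in non-strict mode simply skips anything else, which is an assumption about behavior the description above does not spell out.

```php
<?php
// Hypothetical sketch of strict vs. non-strict flattening.
function sketchTokensToString( array $tokens, bool $strict = false ) {
    $out = '';
    foreach ( $tokens as $i => $token ) {
        if ( is_string( $token ) ) {
            $out .= $token;
            continue;
        }
        if ( $strict ) {
            // Abort: return the prefix built so far plus the unprocessed rest.
            return [ $out, array_slice( $tokens, $i ) ];
        }
        // Non-strict (assumption of this sketch): skip the token.
    }
    return $out;
}

$opaque = new stdClass(); // stands in for a token that cannot be stringified
$tokens = [ 'foo', 'bar', $opaque, 'baz' ];

$loose = sketchTokensToString( $tokens );
$strictResult = sketchTokensToString( $tokens, true );
```

In strict mode the caller receives the string prefix and the remaining tokens, starting with the one that stopped the conversion, and can decide how to handle them.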

◆ tokenTrim()

static Parsoid\Utils\TokenUtils::tokenTrim (   $tokens)
static

Trim space and newlines from leading and trailing text tokens.

Parameters
string|Token|(Token|string)[] $tokens
Returns
string|Token|(Token|string)[]
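
A minimal sketch of trimming at both ends of a token array follows. `sketchTokenTrim` is a hypothetical function that handles only the array case, treats any non-string entry as a token that stops the trimming, and leaves emptied strings in place; these choices are assumptions of the sketch.

```php
<?php
// Hypothetical sketch: trim whitespace from the leading and trailing
// string entries of a token array, stopping at the first non-string
// token or the first string that still has content after trimming.
function sketchTokenTrim( array $tokens ): array {
    $n = count( $tokens );

    // Leading side: left-trim strings until one keeps some content.
    for ( $i = 0; $i < $n; $i++ ) {
        if ( !is_string( $tokens[$i] ) ) {
            break;
        }
        $tokens[$i] = ltrim( $tokens[$i] );
        if ( $tokens[$i] !== '' ) {
            break;
        }
    }

    // Trailing side: right-trim strings until one keeps some content.
    for ( $i = $n - 1; $i >= 0; $i-- ) {
        if ( !is_string( $tokens[$i] ) ) {
            break;
        }
        $tokens[$i] = rtrim( $tokens[$i] );
        if ( $tokens[$i] !== '' ) {
            break;
        }
    }

    return $tokens;
}

$tag = new stdClass(); // stands in for a non-text token
$trimmed = sketchTokenTrim( [ "  ", "\n foo ", $tag, " bar \n", " " ] );
```

Whitespace next to the inner token is preserved: only the outer edges of the sequence are trimmed.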

Member Data Documentation

◆ SOL_TRANSPARENT_LINK_REGEX

const Parsoid\Utils\TokenUtils::SOL_TRANSPARENT_LINK_REGEX
Initial value:
= '/(?:^|\s)mw:PageProp\/(?:Category|redirect|Language)(?=$|\s)/D'
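
The pattern can be exercised with preg_match() to see which values it accepts. The sample strings below are illustrative only; which attribute Parsoid tests the pattern against is not stated here, so the sketch treats them as generic space-separated attribute values.

```php
<?php
// The pattern copied from the initial value above.
$regex = '/(?:^|\s)mw:PageProp\/(?:Category|redirect|Language)(?=$|\s)/D';

$samples = [
    'mw:PageProp/Category',          // exact match
    'nofollow mw:PageProp/redirect', // preceded by another value
    'mw:PageProp/Language extra',    // followed by another value
    'mw:PageProp/Categoryx',         // trailing characters, no boundary
    'xmw:PageProp/Category',         // leading characters, no boundary
    'mw:PageProp/category',          // wrong case
];

$results = [];
foreach ( $samples as $value ) {
    $results[$value] = preg_match( $regex, $value ) === 1;
}
```

The first three samples match and the last three do not: the pattern requires whitespace or a string boundary on both sides and is case-sensitive.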

The documentation for this class was generated from the following file: