css-sanitizer
Classes to parse and sanitize CSS
Wikimedia\CSS\Parser\DataSourceTokenizer Class Reference

Parse CSS into tokens.

Public Member Functions

 __construct (DataSource $source, array $options=[])
 
 getParseErrors ()
 Return the parse errors recorded so far.
 
 clearParseErrors ()
 Clear the recorded parse errors.
 
 consumeToken ()
 Read a token from the data source.
 

Protected Member Functions

 nextChar ()
 Read a character from the data source.
 
 consumeCharacter ()
 Update the current and next character fields.
 
 reconsumeCharacter ()
 Reconsume the next character.
 
 lookAhead ()
 Look ahead at the next three characters.
 
 parseError ( $tag, array $position=null, array $data=[])
 Record a parse error.
 
 consumeNumericToken (array $data)
 Consume a numeric token.
 
 consumeIdentLikeToken (array $data)
 Consume an ident-like token.
 
 consumeStringToken ( $endChar, array $data)
 Consume a string token.
 
 consumeUrlToken (array $data)
 Consume a URL token.
 
 consumeBadUrlRemnants ()
 Clean up after finding an error in a URL.
 
 consumeEscape ()
 Consume a valid escape.
 
 consumeName ()
 Consume a name.
 
 consumeNumber ()
 Consume a number.
 

Static Protected Member Functions

static isWhitespace ( $char)
 Indicate if a character is whitespace.
 
static isNameStartCharacter ( $char)
 Indicate if a character is a name-start code point.
 
static isNameCharacter ( $char)
 Indicate if a character is a name code point.
 
static isNonPrintable ( $char)
 Indicate if a character is non-printable.
 
static isDigit ( $char)
 Indicate if a character is a digit.
 
static isHexDigit ( $char)
 Indicate if a character is a hex digit.
 
static isValidEscape ( $char1, $char2)
 Determine if two characters constitute a valid escape.
 
static wouldStartIdentifier ( $char1, $char2, $char3)
 Determine if three characters would start an identifier.
 
static wouldStartNumber ( $char1, $char2, $char3)
 Determine if three characters would start a number.
 

Protected Attributes

 $source
 
 $line = 1
 
 $pos = 0
 
 $currentCharacter = null
 
 $nextCharacter = null
 
 $parseErrors = []
 

Detailed Description

Parse CSS into tokens.

This implements the tokenizer from the CSS Syntax Module Level 3 candidate recommendation.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\CSS\Parser\DataSourceTokenizer::__construct ( DataSource $source,
array $options = [] )
Parameters
DataSource $source
array $options  Configuration options (none currently defined).

Member Function Documentation

◆ clearParseErrors()

Wikimedia\CSS\Parser\DataSourceTokenizer::clearParseErrors ( )

Clear the recorded parse errors.

Implements Wikimedia\CSS\Parser\Tokenizer.

◆ consumeBadUrlRemnants()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeBadUrlRemnants ( )
protected

Clean up after finding an error in a URL.

◆ consumeEscape()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeEscape ( )
protected

Consume a valid escape.

This assumes the leading backslash has already been consumed.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-escaped-code-point
Returns
string Escaped character

◆ consumeIdentLikeToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeIdentLikeToken ( array $data)
protected

Consume an ident-like token.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-ident-like-token
Parameters
array $data  Data for the new token (typically contains just 'position')
Returns
Token

◆ consumeName()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeName ( )
protected

Consume a name.

Note that this does not validate the input stream. If necessary, call self::wouldStartIdentifier() or a similar check before calling this method.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-name
Returns
string Name

◆ consumeNumber()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeNumber ( )
protected

Consume a number.

Note that this does not validate the input stream. If necessary, call self::wouldStartNumber() before calling this method.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-number
Returns
array [ string $value, int|float $number, string $type ('integer' or 'number') ]

◆ consumeNumericToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeNumericToken ( array $data)
protected

Consume a numeric token.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-numeric-token
Parameters
array $data  Data for the new token (typically contains just 'position')
Returns
Token

◆ consumeStringToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeStringToken ( $endChar,
array $data )
protected

Consume a string token.

This assumes the leading quote or apostrophe has already been consumed.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-string-token
Parameters
string $endChar  Ending character of the string
array $data  Data for the new token (typically contains just 'position')
Returns
Token

◆ consumeToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeToken ( )

Read a token from the data source.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-token
Returns
Token

Implements Wikimedia\CSS\Parser\Tokenizer.

◆ consumeUrlToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeUrlToken ( array $data)
protected

Consume a URL token.

This assumes the leading "url(" has already been consumed.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#consume-url-token
Parameters
array $data  Data for the new token (typically contains just 'position')
Returns
Token

◆ getParseErrors()

Wikimedia\CSS\Parser\DataSourceTokenizer::getParseErrors ( )

Return the parse errors recorded so far.

Implements Wikimedia\CSS\Parser\Tokenizer.

◆ isDigit()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isDigit ( $char)
static protected

Indicate if a character is a digit.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#digit
Parameters
string $char  A single UTF-8 character
Returns
bool

◆ isHexDigit()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isHexDigit ( $char)
static protected

Indicate if a character is a hex digit.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#hex-digit
Parameters
string $char  A single UTF-8 character
Returns
bool

◆ isNameCharacter()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isNameCharacter ( $char)
static protected

Indicate if a character is a name code point.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#name-code-point
Parameters
string $char  A single UTF-8 character
Returns
bool

◆ isNameStartCharacter()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isNameStartCharacter ( $char)
static protected

Indicate if a character is a name-start code point.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#name-start-code-point
Parameters
string $char  A single UTF-8 character
Returns
bool
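
As an illustration of the check described at the linked specification section, here is a minimal standalone sketch. The function name sketchIsNameStartCharacter is hypothetical, and the body follows the specification's definition (a letter, an underscore, or any code point at or above U+0080); it is not copied from this class, whose method is protected and may be implemented differently.

```php
<?php
// Hypothetical standalone helper, not the library's protected method.
// A name-start code point is an ASCII letter, an underscore, or any
// non-ASCII code point. An empty string (EOF) never matches.
function sketchIsNameStartCharacter(string $char): bool {
    return preg_match('/^(?:[A-Za-z_]|[^\x00-\x7F])\z/u', $char) === 1;
}

var_dump(sketchIsNameStartCharacter('_')); // bool(true)
var_dump(sketchIsNameStartCharacter('1')); // bool(false)
```

A name code point, as checked by isNameCharacter(), additionally allows digits and the hyphen-minus.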

◆ isNonPrintable()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isNonPrintable ( $char)
static protected

Indicate if a character is non-printable.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#non-printable-code-point
Parameters
string $char  A single UTF-8 character
Returns
bool
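
The linked specification section defines non-printable code points as U+0000 to U+0008, U+000B, U+000E to U+001F, and U+007F. A minimal standalone sketch of that definition follows; the function name sketchIsNonPrintable is hypothetical and the code is not taken from this class.

```php
<?php
// Hypothetical standalone helper, not the library's protected method.
// Tab (U+0009), line feed (U+000A) and form feed (U+000C) are
// deliberately outside the ranges below.
function sketchIsNonPrintable(string $char): bool {
    return preg_match('/^[\x00-\x08\x0B\x0E-\x1F\x7F]\z/', $char) === 1;
}

var_dump(sketchIsNonPrintable("\x7F")); // bool(true)
var_dump(sketchIsNonPrintable("\t"));   // bool(false)
```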

◆ isValidEscape()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isValidEscape ( $char1,
$char2 )
static protected

Determine if two characters constitute a valid escape.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#starts-with-a-valid-escape
Parameters
string $char1
string $char2
Returns
bool
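
Per the linked specification section, two code points form a valid escape when the first is a backslash and the second is not a newline. A minimal standalone sketch of that rule, using the hypothetical name sketchIsValidEscape (not this class's method), might look like this:

```php
<?php
// Hypothetical standalone helper, not the library's protected method.
// Assumes input preprocessing has already normalized newlines to "\n".
function sketchIsValidEscape(string $char1, string $char2): bool {
    return $char1 === '\\' && $char2 !== "\n";
}

var_dump(sketchIsValidEscape('\\', 'n'));  // bool(true)
var_dump(sketchIsValidEscape('\\', "\n")); // bool(false)
```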

◆ isWhitespace()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isWhitespace ( $char)
static protected

Indicate if a character is whitespace.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#whitespace
Parameters
string $char  A single UTF-8 character
Returns
bool

◆ lookAhead()

Wikimedia\CSS\Parser\DataSourceTokenizer::lookAhead ( )
protected

Look ahead at the next three characters.

Returns
string[] Three characters

◆ nextChar()

Wikimedia\CSS\Parser\DataSourceTokenizer::nextChar ( )
protected

Read a character from the data source.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#input-preprocessing
Returns
string One UTF-8 character, or empty string on EOF
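
The linked input-preprocessing section of the specification replaces CR LF pairs, lone CR, and form feed characters with a single line feed, and replaces NUL with U+FFFD. The sketch below applies those replacements to a whole string at once, which is a simplification: this class reads one character at a time, and its actual implementation is not shown here. The function name sketchPreprocessCss is hypothetical. Surrogate code points, which the specification also replaces, cannot occur in valid UTF-8 and are not handled.

```php
<?php
// Hypothetical whole-string illustration of the specification's
// input preprocessing step; not the library's nextChar() method.
function sketchPreprocessCss(string $css): string {
    // str_replace applies the searches in order, so "\r\n" collapses
    // to one "\n" before lone "\r" characters are considered.
    $css = str_replace(["\r\n", "\r", "\f"], "\n", $css);
    // NUL becomes U+FFFD REPLACEMENT CHARACTER.
    return str_replace("\0", "\u{FFFD}", $css);
}

var_dump(sketchPreprocessCss("a\r\nb") === "a\nb"); // bool(true)
```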

◆ parseError()

Wikimedia\CSS\Parser\DataSourceTokenizer::parseError ( $tag,
array $position = null,
array $data = [] )
protected

Record a parse error.

Parameters
string $tag  Error tag
array|null $position  Report the error as starting at this position instead of at the current position.
array $data  Extra data about the error.

◆ reconsumeCharacter()

Wikimedia\CSS\Parser\DataSourceTokenizer::reconsumeCharacter ( )
protected

Reconsume the next character.

In plainer terms, this pushes a character back onto the data source so that the next call to self::consumeCharacter() reads it again.

◆ wouldStartIdentifier()

static Wikimedia\CSS\Parser\DataSourceTokenizer::wouldStartIdentifier ( $char1,
$char2,
$char3 )
static protected

Determine if three characters would start an identifier.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#would-start-an-identifier
Parameters
string $char1
string $char2
string $char3
Returns
bool
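
The three-character check in the linked specification section can be sketched as a standalone function. This is an illustration written from the specification text, not the code of this class; the name sketchWouldStartIdentifier and its local closures are hypothetical.

```php
<?php
// Hypothetical standalone helper, not the library's protected method.
function sketchWouldStartIdentifier(string $c1, string $c2, string $c3): bool {
    // Letter, underscore, or non-ASCII code point.
    $isNameStart = static function (string $c): bool {
        return preg_match('/^(?:[A-Za-z_]|[^\x00-\x7F])\z/u', $c) === 1;
    };
    // Backslash followed by anything other than a newline.
    $isValidEscape = static function (string $a, string $b): bool {
        return $a === '\\' && $b !== "\n";
    };

    if ($c1 === '-') {
        // "-x", "--" and "-\..." all start an identifier.
        return $isNameStart($c2) || $c2 === '-' || $isValidEscape($c2, $c3);
    }
    if ($isNameStart($c1)) {
        return true;
    }
    if ($c1 === '\\') {
        return $isValidEscape($c1, $c2);
    }
    return false;
}

var_dump(sketchWouldStartIdentifier('-', '-', 'x')); // bool(true)
var_dump(sketchWouldStartIdentifier('-', '1', '0')); // bool(false)
```

The second argument pair matters only when the first character is a hyphen-minus or a backslash, which is why callers pass three characters of lookahead.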

◆ wouldStartNumber()

static Wikimedia\CSS\Parser\DataSourceTokenizer::wouldStartNumber ( $char1,
$char2,
$char3 )
static protected

Determine if three characters would start a number.

See also
https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/#starts-with-a-number
Parameters
string $char1
string $char2
string $char3
Returns
bool
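
A minimal standalone sketch of the check in the linked specification section follows. It is written from the specification text rather than from this class's source, and the name sketchWouldStartNumber is hypothetical.

```php
<?php
// Hypothetical standalone helper, not the library's protected method.
function sketchWouldStartNumber(string $c1, string $c2, string $c3): bool {
    $isDigit = static function (string $c): bool {
        return preg_match('/^[0-9]\z/', $c) === 1;
    };

    if ($c1 === '+' || $c1 === '-') {
        // A sign must be followed by a digit, or by "." and a digit.
        return $isDigit($c2) || ($c2 === '.' && $isDigit($c3));
    }
    if ($c1 === '.') {
        return $isDigit($c2);
    }
    return $isDigit($c1);
}

var_dump(sketchWouldStartNumber('-', '.', '5')); // bool(true)
var_dump(sketchWouldStartNumber('+', '.', 'a')); // bool(false)
```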

The documentation for this class was generated from the following file: