css-sanitizer
Classes to parse and sanitize CSS
Wikimedia\CSS\Parser\DataSourceTokenizer Class Reference

Parse CSS into tokens.

Public Member Functions

 __construct (DataSource $source, array $options=[])
 
 getParseErrors ()
 Documentation inherited from the parent declaration.
 
 clearParseErrors ()
 Documentation inherited from the parent declaration.
 
 consumeToken ()
 Read a token from the data source.
 

Public Attributes

 $pos = 0
 

Protected Member Functions

 nextChar ()
 Read a character from the data source.
 
 consumeCharacter ()
 Update the current and next character fields.
 
 reconsumeCharacter ()
 Reconsume the next character.
 
 lookAhead ()
 Look ahead at the next three characters.
 
 parseError ( $tag, array $position=null, array $data=[])
 Record a parse error.
 
 consumeNumericToken (array $data)
 Consume a numeric token.
 
 consumeIdentLikeToken (array $data)
 Consume an ident-like token.
 
 consumeStringToken ( $endChar, array $data)
 Consume a string token.
 
 consumeUrlToken (array $data)
 Consume a URL token.
 
 consumeBadUrlRemnants ()
 Clean up after finding an error in a URL.
 
 consumeUnicodeRangeToken (array $data)
 Consume a unicode-range token.
 
 consumeEscape ()
 Consume a valid escape.
 
 consumeName ()
 Consume a name.
 
 consumeNumber ()
 Consume a number.
 

Static Protected Member Functions

static isWhitespace ( $char)
 Indicate if a character is whitespace.
 
static isNameStartCharacter ( $char)
 Indicate if a character is a name-start code point.
 
static isNameCharacter ( $char)
 Indicate if a character is a name code point.
 
static isNonPrintable ( $char)
 Indicate if a character is non-printable.
 
static isDigit ( $char)
 Indicate if a character is a digit.
 
static isHexDigit ( $char)
 Indicate if a character is a hex digit.
 
static isValidEscape ( $char1, $char2)
 Determine if two characters constitute a valid escape.
 
static wouldStartIdentifier ( $char1, $char2, $char3)
 Determine if three characters would start an identifier.
 
static wouldStartNumber ( $char1, $char2, $char3)
 Determine if three characters would start a number.
 

Protected Attributes

 $source
 
 $line = 1
 
 $currentCharacter = null
 
 $nextCharacter = null
 
 $parseErrors = []
 

Detailed Description

Parse CSS into tokens.

This implements the tokenizer from the CSS Syntax Module Level 3 candidate recommendation.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\CSS\Parser\DataSourceTokenizer::__construct ( DataSource $source, array $options = [] )
Parameters
DataSource $source
array $options: Configuration options. (none currently defined)

Member Function Documentation

◆ consumeBadUrlRemnants()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeBadUrlRemnants ( )
protected

Clean up after finding an error in a URL.

◆ consumeEscape()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeEscape ( )
protected

Consume a valid escape.

This assumes the leading backslash has already been consumed.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-an-escaped-code-point
Returns
string Escaped character
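
The escape rules in the linked section can be sketched as follows. This is an illustrative Python sketch written from the specification text, not the class's PHP code; the function name, the `(text, pos)` interface, and the returned tuple are assumptions made for the example. It assumes the input has already been preprocessed, so a newline is a single line feed.

```python
REPLACEMENT = "\ufffd"
HEX_DIGITS = "0123456789abcdefABCDEF"
WHITESPACE = "\n\t "


def consume_escape(text, pos):
    """Consume an escaped code point (hypothetical helper).

    `pos` points just past the backslash. Returns (character, new_position).
    """
    if pos >= len(text):
        # EOF right after the backslash yields the replacement character.
        return REPLACEMENT, pos
    ch = text[pos]
    if ch not in HEX_DIGITS:
        # Any other character stands for itself.
        return ch, pos + 1
    # One hex digit is known; take up to five more (six in total).
    end = pos + 1
    while end < len(text) and end - pos < 6 and text[end] in HEX_DIGITS:
        end += 1
    value = int(text[pos:end], 16)
    # A single whitespace character after the digits belongs to the escape.
    if end < len(text) and text[end] in WHITESPACE:
        end += 1
    # Zero, surrogates, and out-of-range values map to U+FFFD.
    if value == 0 or 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
        return REPLACEMENT, end
    return chr(value), end
```

For example, `consume_escape("41 b", 0)` consumes the two hex digits and the following space.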

◆ consumeIdentLikeToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeIdentLikeToken ( array  $data)
protected

Consume an ident-like token.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-an-ident-like-token
Note
Per the draft as of January 2017, quoted URLs are parsed as functions named 'url'. This is needed in order to implement the <url> type in the Values specification.
Parameters
array $data: Data for the new token (typically contains just 'position')
Returns
Token

◆ consumeName()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeName ( )
protected

Consume a name.

Note that this does not validate the input stream. Call self::wouldStartIdentifier() or a similar check before calling this method if necessary.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-name
Returns
string Name

◆ consumeNumber()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeNumber ( )
protected

Consume a number.

Note that this does not validate the input stream. Call self::wouldStartNumber() before calling this method if necessary.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-number
Returns
array [ string $value, int|float $number, string $type ('integer' or 'number') ]
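
A rough Python sketch of the number grammar from the linked section is shown below. It is written from the specification, not from the PHP source; the function name and the regular expression are assumptions for the example. Like the method above, it expects the caller to have checked that a number really starts at `pos`.

```python
import re

# Optional sign, digits, optional fraction, optional exponent.
# A "." or "e" is only consumed when digits follow it.
_NUMBER = re.compile(r"[+-]?[0-9]*(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def consume_number(text, pos=0):
    """Return (value, number, type) for the number starting at `pos`.

    Hypothetical helper; raises ValueError if no number starts at `pos`.
    """
    value = _NUMBER.match(text, pos).group(0)
    # A fraction or exponent makes the type 'number' instead of 'integer'.
    is_integer = not any(c in value for c in ".eE")
    number = int(value) if is_integer else float(value)
    return value, number, "integer" if is_integer else "number"
```

With this sketch, `consume_number("1.e3")` stops after `1`, because the dot is not followed by a digit.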

◆ consumeNumericToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeNumericToken ( array  $data)
protected

Consume a numeric token.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-numeric-token
Parameters
array $data: Data for the new token (typically contains just 'position')
Returns
Token

◆ consumeStringToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeStringToken ( $endChar, array $data )
protected

Consume a string token.

This assumes the leading quote or apostrophe has already been consumed.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-string-token
Parameters
string $endChar: Ending character of the string
array $data: Data for the new token (typically contains just 'position')
Returns
Token

◆ consumeToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeToken ( )

Read a token from the data source.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-token
Returns
Token

Implements Wikimedia\CSS\Parser\Tokenizer.

◆ consumeUnicodeRangeToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeUnicodeRangeToken ( array  $data)
protected

Consume a unicode-range token.

This assumes the initial "u" has been consumed (currentCharacter is the '+'), and the next code point is verified to be a hex digit or "?".

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-unicode-range-token
Parameters
array $data: Data for the new token (typically contains just 'position')
Returns
Token
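
As an illustration of the algorithm in the linked section, the Python sketch below computes the start and end of a range. It is an assumption-laden example, not the class's implementation: the function name and the `(start, end, new_position)` return value are invented here, and `pos` is assumed to point at the first hex digit or "?" after the "+".

```python
HEX = "0123456789abcdefABCDEF"


def consume_unicode_range(text, pos):
    """Return (start, end, new_position) for a unicode-range (hypothetical helper)."""
    # Up to six hex digits.
    end = pos
    while end < len(text) and end - pos < 6 and text[end] in HEX:
        end += 1
    digits_end = end
    # Fill the remainder, up to six characters in total, with "?" wildcards.
    while end < len(text) and end - pos < 6 and text[end] == "?":
        end += 1
    raw = text[pos:end]
    if end > digits_end:
        # Wildcards: "?" becomes 0 for the start and F for the end.
        return int(raw.replace("?", "0"), 16), int(raw.replace("?", "F"), 16), end
    start = int(raw, 16)
    # An explicit end needs "-" followed by at least one hex digit.
    if end + 1 < len(text) and text[end] == "-" and text[end + 1] in HEX:
        stop = end + 1
        while stop < len(text) and stop - (end + 1) < 6 and text[stop] in HEX:
            stop += 1
        return start, int(text[end + 1:stop], 16), stop
    # A single code point is a range of length one.
    return start, start, end
```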

◆ consumeUrlToken()

Wikimedia\CSS\Parser\DataSourceTokenizer::consumeUrlToken ( array  $data)
protected

Consume a URL token.

This assumes the leading "url(" has already been consumed.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#consume-a-url-token
Note
Per the draft as of January 2017, this does not handle quoted URL tokens.
Parameters
array $data: Data for the new token (typically contains just 'position')
Returns
Token

◆ isDigit()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isDigit ( $char )
static protected

Indicate if a character is a digit.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#digit
Parameters
string $char: A single UTF-8 character
Returns
bool

◆ isHexDigit()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isHexDigit ( $char )
static protected

Indicate if a character is a hex digit.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#hex-digit
Parameters
string $char: A single UTF-8 character
Returns
bool

◆ isNameCharacter()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isNameCharacter ( $char )
static protected

Indicate if a character is a name code point.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#name-code-point
Parameters
string $char: A single UTF-8 character
Returns
bool
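
The definition in the linked section builds on the name-start code point: a name code point is a name-start code point, a digit, or a hyphen. The Python sketch below illustrates that relationship; the function names are hypothetical, and an empty string stands in for EOF, following the convention this page documents for nextChar().

```python
def is_name_start(ch):
    """Letter, underscore, or any non-ASCII code point (U+0080 and above)."""
    return len(ch) == 1 and (
        "a" <= ch <= "z" or "A" <= ch <= "Z" or ch == "_" or ord(ch) >= 0x80
    )


def is_name_char(ch):
    """A name-start code point, an ASCII digit, or a hyphen."""
    return is_name_start(ch) or (len(ch) == 1 and ("0" <= ch <= "9" or ch == "-"))
```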

◆ isNameStartCharacter()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isNameStartCharacter ( $char )
static protected

Indicate if a character is a name-start code point.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#name-start-code-point
Parameters
string $char: A single UTF-8 character
Returns
bool

◆ isNonPrintable()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isNonPrintable ( $char )
static protected

Indicate if a character is non-printable.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#non-printable-code-point
Parameters
string $char: A single UTF-8 character
Returns
bool
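
The linked definition covers U+0000 to U+0008, U+000B, U+000E to U+001F, and U+007F. A minimal Python sketch of that check, with a hypothetical function name, might look like this:

```python
def is_non_printable(ch):
    """True for the code points the specification calls non-printable."""
    if len(ch) != 1:
        return False  # empty string (EOF) is not a code point
    cp = ord(ch)
    return cp <= 0x08 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F
```

Tab, line feed, and form feed fall in the gaps of these ranges, so they are not reported as non-printable.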

◆ isValidEscape()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isValidEscape ( $char1, $char2 )
static protected

Determine if two characters constitute a valid escape.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#starts-with-a-valid-escape
Parameters
string $char1
string $char2
Returns
bool
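
Per the linked section, two code points form a valid escape when the first is a backslash and the second is not a newline. The Python sketch below shows that rule; the function name is hypothetical and the input is assumed to be preprocessed so that a newline is always a line feed.

```python
def is_valid_escape(c1, c2):
    """Backslash followed by anything other than a newline."""
    return c1 == "\\" and c2 != "\n"
```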

◆ isWhitespace()

static Wikimedia\CSS\Parser\DataSourceTokenizer::isWhitespace ( $char )
static protected

Indicate if a character is whitespace.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#whitespace
Parameters
string $char: A single UTF-8 character
Returns
bool
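
The linked definition counts a newline, a tab, and a space as whitespace. Assuming input preprocessing has already folded carriage returns and form feeds into line feeds, a Python sketch (hypothetical function name) reduces to a three-way membership test:

```python
def is_whitespace(ch):
    """Newline, tab, or space; other characters, and EOF (""), are not."""
    return ch in ("\n", "\t", " ")
```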

◆ lookAhead()

Wikimedia\CSS\Parser\DataSourceTokenizer::lookAhead ( )
protected

Look ahead at the next three characters.

Returns
string[] Three characters

◆ nextChar()

Wikimedia\CSS\Parser\DataSourceTokenizer::nextChar ( )
protected

Read a character from the data source.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#input-preprocessing
Returns
string One UTF-8 character, or empty string on EOF
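
The input-preprocessing step in the linked section can be illustrated on a whole string. The Python sketch below is an assumption for illustration only: the method above reads one character at a time, and how it applies these replacements internally is not described on this page.

```python
def preprocess(text):
    """Apply the specification's input preprocessing to a whole string."""
    # CR LF pairs, lone CR, and FF all become a single LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n")
    # NUL becomes U+FFFD REPLACEMENT CHARACTER.
    return text.replace("\x00", "\ufffd")
```

The CR LF pair is replaced first so that it yields one line feed instead of two.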

◆ parseError()

Wikimedia\CSS\Parser\DataSourceTokenizer::parseError ( $tag, array $position = null, array $data = [] )
protected

Record a parse error.

Parameters
string $tag: Error tag
array|null $position: Report the error as starting at this position instead of at the current position.
array $data: Extra data about the error.

◆ reconsumeCharacter()

Wikimedia\CSS\Parser\DataSourceTokenizer::reconsumeCharacter ( )
protected

Reconsume the next character.

In plainer terms, this pushes a character back onto the data source so that the next call to self::consumeCharacter() reads it again.
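
The push-back idea can be sketched with a small reader that keeps a stack of returned characters. The class and method names below are hypothetical and written in Python for illustration; they are not the DataSource interface used by this class.

```python
class PushbackReader:
    """Character reader with push-back (illustrative only)."""

    def __init__(self, text):
        self._chars = iter(text)
        self._pushed = []  # characters waiting to be read again

    def read(self):
        # Pushed-back characters are served before fresh input.
        if self._pushed:
            return self._pushed.pop()
        # Empty string signals EOF.
        return next(self._chars, "")

    def unread(self, ch):
        self._pushed.append(ch)
```

After `unread("a")`, the next `read()` returns `"a"` again before continuing with the rest of the input.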

◆ wouldStartIdentifier()

static Wikimedia\CSS\Parser\DataSourceTokenizer::wouldStartIdentifier ( $char1, $char2, $char3 )
static protected

Determine if three characters would start an identifier.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#would-start-an-identifier
Parameters
string $char1
string $char2
string $char3
Returns
bool
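
The three-character check in the linked 2014 candidate recommendation can be sketched in Python as below. The function names are hypothetical and the helpers are simplified restatements of the specification's definitions, not the class's PHP code. Later drafts of the specification changed the rule for a leading hyphen; this sketch follows the 2014 text cited above.

```python
def is_name_start(ch):
    """Letter, underscore, or non-ASCII code point."""
    return len(ch) == 1 and (
        "a" <= ch <= "z" or "A" <= ch <= "Z" or ch == "_" or ord(ch) >= 0x80
    )


def is_valid_escape(c1, c2):
    """Backslash followed by anything other than a newline."""
    return c1 == "\\" and c2 != "\n"


def would_start_identifier(c1, c2, c3):
    if c1 == "-":
        # A hyphen needs a name-start code point or an escape after it.
        return is_name_start(c2) or is_valid_escape(c2, c3)
    if is_name_start(c1):
        return True
    if c1 == "\\":
        return is_valid_escape(c1, c2)
    return False
```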

◆ wouldStartNumber()

static Wikimedia\CSS\Parser\DataSourceTokenizer::wouldStartNumber ( $char1, $char2, $char3 )
static protected

Determine if three characters would start a number.

See also
https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/#starts-with-a-number
Parameters
string $char1
string $char2
string $char3
Returns
bool
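
The check in the linked section looks at up to three characters: an optional sign, an optional dot, and a digit. A Python sketch of it, with hypothetical function names and an empty string standing in for EOF, could read:

```python
def is_digit(ch):
    """ASCII digit 0-9."""
    return len(ch) == 1 and "0" <= ch <= "9"


def would_start_number(c1, c2, c3):
    if c1 in ("+", "-"):
        # Sign followed by a digit, or by "." and a digit.
        return is_digit(c2) or (c2 == "." and is_digit(c3))
    if c1 == ".":
        return is_digit(c2)
    return is_digit(c1)
```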

The documentation for this class was generated from the following file: