Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Core\SectionMetadata Class Reference

Section metadata for generating the table of contents (TOC).

Public Member Functions

 __construct (int $tocLevel=0, int $hLevel=-1, string $line='', string $number='', string $index='', ?string $fromTitle=null, ?int $codepointOffset=null, string $anchor='', string $linkAnchor='', ?array $extensionData=null)
 
 setExtensionData (string $key, $value)
 Attaches arbitrary data to this SectionMetadata object.
 
 appendExtensionData (string $key, $value)
 Appends arbitrary data to this SectionMetadata.
 
 getExtensionData ( $key)
 Gets extension data previously attached to this SectionMetadata.
 
 toArray ()
 Alias for ::toLegacy(), kept for backward compatibility only.
 
 toLegacy ()
 Return as an associative array, in the format returned by the action API (including the order of fields and the value types).
 
 jsonSerialize ()
 
 toJsonArray ()
 
 prettyPrint (int $indent=0)
 For use in parser tests and wherever else humans might appreciate some formatting in the JSON encoded output.
 

Static Public Member Functions

static fromArray (array $data)
 Alias for ::fromLegacy(), kept for backward compatibility only.
 
static fromLegacy (array $data)
 Create a new SectionMetadata object from an array in the legacy format returned by the action API.
 
static newFromJsonArray (array $json)
 

Public Attributes

int $hLevel
 The heading tag level: a 1 here means an <h1> tag was used, a 2 means an <h2> tag was used, etc.
 
int $tocLevel
 This is a one-indexed TOC level, which is also the nesting level.
 
string $line
 HTML heading of the section.
 
string $number
 TOC number string (3.1.3, 4.5.2, etc.)
 
string $index
 Section ID (an integer assigned in depth-first traversal order). Template-generated sections get a "T-" prefix.
 
string $fromTitle
 The title of the page that generated this heading.
 
int $codepointOffset
 Codepoint offset where the section shows up in wikitext; this is null if this section comes from a template, if it comes from a literal HTML <h_> tag, or otherwise doesn't correspond to a "preprocessor section".
 
string $anchor
 Anchor attribute.
 
string $linkAnchor
 Anchor URL fragment.
 

Detailed Description

Section metadata for generating TOC.

This is not the complete data for the article section, just the information needed to generate the table of contents.

For now, this schema matches whatever is generated by Parser.php, and Parsoid will attempt to match that output.

Parser.php::finalizeHeadings() is the authoritative source for how some of these properties are computed right now, especially for the $line, $anchor, and $linkAnchor properties below.

Linker.php::tocLine() and ::makeHeadline() demonstrate how these properties are used to create headings and table of contents lines.

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\Parsoid\Core\SectionMetadata::__construct ( int $tocLevel = 0,
int $hLevel = -1,
string $line = '',
string $number = '',
string $index = '',
?string $fromTitle = null,
?int $codepointOffset = null,
string $anchor = '',
string $linkAnchor = '',
?array $extensionData = null )
Parameters
int $tocLevel: One-indexed TOC level, which is also the nesting level
int $hLevel: The heading tag level
string $line: Stripped headline text
string $number: TOC number string (3.1.3, 4.5.2, etc.)
string $index: Section ID
?string $fromTitle: The title of the page or template that generated this heading, or null.
?int $codepointOffset: Codepoint offset (number of characters) where the section shows up in wikitext, or null if this doesn't correspond to a "preprocessor section". (Be careful if using JavaScript, as JavaScript string "characters" are UTF-16 code units and don't correspond directly to code points.)
string $anchor: "True" value of the ID attribute
string $linkAnchor: URL-escaped value of the anchor, for use in constructing a URL fragment link
?array $extensionData: Extension data passed in as an associative array

Member Function Documentation

◆ appendExtensionData()

Wikimedia\Parsoid\Core\SectionMetadata::appendExtensionData ( string $key,
$value )

Appends arbitrary data to this SectionMetadata.

This can be used to store some information about the section in the ParserOutput object for later use during page output.

See ::setExtensionData() for more details on rationale and use.

Parameters
string $key: The key for accessing the data. Extensions should take care to avoid conflicts in naming keys. It is suggested to use the extension's name as a prefix.
int|string $value: The value to append to the list.
Returns
never: This method is not yet implemented.

◆ fromArray()

static Wikimedia\Parsoid\Core\SectionMetadata::fromArray ( array $data)

Alias for ::fromLegacy(), kept for backward compatibility only.

Deprecated
Parameters
array $data
Returns
SectionMetadata

◆ fromLegacy()

static Wikimedia\Parsoid\Core\SectionMetadata::fromLegacy ( array $data)

Create a new SectionMetadata object from an array in the legacy format returned by the action API.

This is useful for backward compatibility, but is expected to be replaced by conversion to/from JSON in the future.

Parameters
array $data: Associative array with section metadata
Returns
SectionMetadata

◆ getExtensionData()

Wikimedia\Parsoid\Core\SectionMetadata::getExtensionData ( $key)

Gets extension data previously attached to this SectionMetadata.

Parameters
string $key: The key to look up
Returns
mixed|null The value(s) previously set for the given key using ::setExtensionData() or ::appendExtensionData(), or null if no value was set for this key.

◆ prettyPrint()

Wikimedia\Parsoid\Core\SectionMetadata::prettyPrint ( int $indent = 0)

For use in parser tests and wherever else humans might appreciate some formatting in the JSON encoded output.

For now, nothing special.

Parameters
int $indent: Additional indentation to apply (defaults to zero)
Returns
string

◆ setExtensionData()

Wikimedia\Parsoid\Core\SectionMetadata::setExtensionData ( string $key,
$value )

Attaches arbitrary data to this SectionMetadata object.

This can be used to store some information about this section in the ParserOutput object for later use during page output. The data will be cached along with the ParserOutput object.

This method is provided to overcome the unsafe practice of attaching extra information to a section by directly assigning member variables.

See ParserOutput::setExtensionData() in core for further information about typical usage in hooks.

Setting conflicting values for the same key is not allowed. If you call ::setExtensionData() multiple times with the same key on a SectionMetadata, it is expected that the value will be identical each time. If you want to collect multiple pieces of data under a single key, use ::appendExtensionData().

Note
Only scalar values (numbers and strings) and arrays of such values are supported. (A future revision will allow anything that core's JsonCodec can handle.) Attempts to set other types as extension data values will break ParserCache for the page.
Todo
When more complex values than scalar values are supported, TOCData::__clone should be updated to take that into account.
Parameters
string $key: The key for accessing the data. Extensions should take care to avoid conflicts in naming keys. It is suggested to use the extension's name as a prefix. The prefix mw: is reserved for core.
mixed $value: The value to set. Setting a value to null is equivalent to removing the value.
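The rules above (identical values on repeated ::setExtensionData() calls, null meaning removal, and ::appendExtensionData() for collecting several values under one key) can be modelled with a small stand-in. This is a sketch under assumptions, not Parsoid's implementation: the class ExtensionDataStore, its method names, and the myext- key prefix are all hypothetical. Note also that ::appendExtensionData() is documented above as not yet implemented on SectionMetadata.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in (not Parsoid's implementation) that models the
 * documented extension-data rules: repeated set() calls must agree,
 * null removes a key, and append() collects several values under one key.
 */
final class ExtensionDataStore
{
    /** @var array<string, mixed> */
    private array $data = [];

    public function set(string $key, mixed $value): void
    {
        if ($value === null) {
            unset($this->data[$key]); // setting null is equivalent to removal
            return;
        }
        if (array_key_exists($key, $this->data) && $this->data[$key] !== $value) {
            throw new LogicException("Conflicting value for extension data key '$key'");
        }
        $this->data[$key] = $value;
    }

    public function append(string $key, int|string $value): void
    {
        $this->data[$key][] = $value; // builds a list under a single key
    }

    public function get(string $key): mixed
    {
        return $this->data[$key] ?? null; // null when nothing was set
    }
}

// "myext-" is a hypothetical extension-name prefix, as the docs recommend.
$store = new ExtensionDataStore();
$store->set('myext-color', 'blue');
$store->set('myext-color', 'blue'); // identical value again: allowed
$store->append('myext-tags', 'a');
$store->append('myext-tags', 'b');
```

Throwing on a conflicting value makes the "not allowed" rule explicit; a real implementation may report the conflict differently.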

◆ toArray()

Wikimedia\Parsoid\Core\SectionMetadata::toArray ( )

Alias for ::toLegacy(), kept for backward compatibility only.

Deprecated
Returns
array

◆ toLegacy()

Wikimedia\Parsoid\Core\SectionMetadata::toLegacy ( )

Return as an associative array, in the format returned by the action API (including the order of fields and the value types).

This is helpful as backward-compatibility support while we transition to objects.

Returns
array
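A sketch of what "the format returned by the action API" implies for field order and value types. The key names, their order, and the conversions below (level as a string, fromtitle as false when there is no title, and the legacy byteoffset key holding the codepoint offset) are assumptions based on action=parse&prop=sections output; toLegacyArray() is a hypothetical helper, not the real ::toLegacy().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical converter from SectionMetadata-style property values to an
 * action-API-style legacy array. Key names, order and value types are
 * assumptions based on action=parse&prop=sections output.
 *
 * @param array<string, mixed> $s keys named after the public properties
 * @return array<string, mixed>
 */
function toLegacyArray(array $s): array
{
    return [
        'toclevel' => $s['tocLevel'],
        'level' => (string)$s['hLevel'],         // heading level as a string
        'line' => $s['line'],
        'number' => $s['number'],
        'index' => $s['index'],
        'fromtitle' => $s['fromTitle'] ?? false, // false rather than null
        'byteoffset' => $s['codepointOffset'],   // legacy key name; holds the codepoint offset
        'anchor' => $s['anchor'],
        'linkAnchor' => $s['linkAnchor'],
    ];
}

$legacy = toLegacyArray([
    'tocLevel' => 1, 'hLevel' => 2, 'line' => 'History', 'number' => '1',
    'index' => '1', 'fromTitle' => 'Main_Page', 'codepointOffset' => 42,
    'anchor' => 'History', 'linkAnchor' => 'History',
]);
echo json_encode($legacy), "\n";
```

Building the array literally, in a fixed order, is what keeps the JSON key order stable for API consumers.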

Member Data Documentation

◆ $anchor

string Wikimedia\Parsoid\Core\SectionMetadata::$anchor

Anchor attribute.

This property is the "true" value of the ID attribute, and should be used when looking up a heading or setting an attribute, for example using Document.getElementById() or Element.setAttribute('id',...).

This value is not HTML-entity escaped; if you are writing HTML as a literal string, you should still entity-escape ampersands and single/double quotes as appropriate.

This value is not URL-escaped either; instead use the linkAnchor property if you are constructing a URL to target this section.

The anchor attribute is based on the $line property, but undergoes extra processing to turn it into a valid attribute value. The processing:

  • strips all HTML tags
  • normalizes the section name
  • normalizes whitespace in the section name
  • decodes character references
  • makes it a valid HTML id attribute value (HTML5 or HTML4, depending on the $wgFragmentMode setting)
  • dedupes identical anchors (case-insensitively) by adding "_$n" suffixes
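The last step, case-insensitive deduplication, might look like the following sketch. It assumes the other normalization steps have already run, that the first occurrence keeps its anchor, and that later collisions take the smallest unused suffix starting at _2; dedupeAnchors() is a hypothetical helper, not Parser.php's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: make anchors unique, comparing case-insensitively.
 * The first occurrence keeps its anchor; later collisions get the
 * smallest "_$n" suffix (n >= 2) that is not already in use.
 *
 * @param string[] $anchors
 * @return string[]
 */
function dedupeAnchors(array $anchors): array
{
    $seen = [];   // lower-cased anchors already in use
    $result = [];
    foreach ($anchors as $anchor) {
        $key = strtolower($anchor);
        if (isset($seen[$key])) {
            // Skip suffixes that are taken, including by literal headings like "Foo_2".
            $n = 2;
            while (isset($seen["{$key}_{$n}"])) {
                $n++;
            }
            $anchor .= "_{$n}";
            $key .= "_{$n}";
        }
        $seen[$key] = true;
        $result[] = $anchor;
    }
    return $result;
}

echo implode(' ', dedupeAnchors(['Notes', 'notes', 'Notes'])), "\n";
```

The case of the original heading is kept in the output; only the comparison is case-insensitive.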

◆ $codepointOffset

int Wikimedia\Parsoid\Core\SectionMetadata::$codepointOffset

Codepoint offset where the section shows up in wikitext; this is null if this section comes from a template, if it comes from a literal HTML <h_> tag, or otherwise doesn't correspond to a "preprocessor section".

Note
This is measured in codepoints, not bytes; you should use multibyte-aware string functions such as mb_substr() rather than substr(). Similarly, in JavaScript, be careful not to confuse JavaScript string "characters" (UTF-16 code units) with codepoints.
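A minimal sketch of using $codepointOffset to find where a section starts in the wikitext, assuming the wikitext is UTF-8. The helper sliceFromCodepoint() is hypothetical; mb_substr($wikitext, $offset, null, 'UTF-8') is the usual alternative when the mbstring extension is available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the wikitext that starts at a section's
 * codepoint offset, or null when the section has no offset (for example
 * when it comes from a template). preg_split() with the /u modifier
 * splits a UTF-8 string into codepoints, so mbstring is not required.
 */
function sliceFromCodepoint(string $wikitext, ?int $codepointOffset): ?string
{
    if ($codepointOffset === null) {
        return null;
    }
    $codepoints = preg_split('//u', $wikitext, -1, PREG_SPLIT_NO_EMPTY);
    if ($codepoints === false) {
        throw new InvalidArgumentException('Wikitext is not valid UTF-8');
    }
    return implode('', array_slice($codepoints, $codepointOffset));
}

// "Über\n" is 5 codepoints but 6 bytes ("Ü" takes two bytes in UTF-8),
// so the heading starts at codepoint offset 5 but at byte offset 6.
$wikitext = "Über\n== Zürich ==\nText";
echo sliceFromCodepoint($wikitext, 5), "\n";
```

Calling substr($wikitext, 5) here would start one codepoint too early, at the newline before the heading.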

◆ $fromTitle

string Wikimedia\Parsoid\Core\SectionMetadata::$fromTitle

The title of the page that generated this heading.

For template-generated sections, this will be the template title. This string is in "prefixed DB key" format.

◆ $hLevel

int Wikimedia\Parsoid\Core\SectionMetadata::$hLevel

The heading tag level: a 1 here means an <h1> tag was used, a 2 means an <h2> tag was used, etc.

◆ $line

string Wikimedia\Parsoid\Core\SectionMetadata::$line

HTML heading of the section.

Only a narrow set of HTML tags is allowed here.

This starts with the parsed headline seen in wikitext and:

  • replaces links with link text
  • processes extension strip markers
  • removes <style> and <script> tags
  • strips all HTML tags except the following (from Parser.php):
      • <sup> and <sub> (T10393)
      • <i> (T28375)
      • <b> (r105284)
      • <bdi> (T74884)
      • <span dir="rtl"> and <span dir="ltr"> (T37167)
      • <s> and <strike> (T35715)
      • <q> (T251672)

We strip any attributes from the accepted tags, except dir="rtl|ltr" on <span>, to allow setting directionality in TOC items.

Note
This should be converted into the proper HTML variant.
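The tag-stripping steps for $line can be sketched with PHP's strip_tags() and a couple of regular expressions. This is not Parser.php's code: sanitizeTocLine() is hypothetical, and its allow-list (<sup>, <sub>, <i>, <b>, <bdi>, <span>, <s>, <strike>, <q>) is assumed to match the one in Parser.php. Link replacement and strip-marker processing are not modelled, and regex-based tag handling is only adequate for a sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the tag-stripping steps for $line
 * (not Parser.php's code; the allow-list is an assumption).
 */
function sanitizeTocLine(string $html): string
{
    // Remove <style> and <script> elements together with their contents.
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1\s*>#is', '', $html);

    // Remove every tag not on the assumed allow-list, keeping its text.
    $html = strip_tags($html, '<sup><sub><i><b><bdi><span><s><strike><q>');

    // Drop all attributes from the remaining opening tags, except
    // dir="rtl" or dir="ltr" on <span>. Closing tags are left untouched.
    return preg_replace_callback(
        '#<([a-z]+)\b([^>]*)>#i',
        static function (array $m): string {
            if (strtolower($m[1]) === 'span'
                && preg_match('#\bdir\s*=\s*(["\'])(rtl|ltr)\1#i', $m[2], $dir) === 1) {
                return '<' . $m[1] . ' dir="' . strtolower($dir[2]) . '">';
            }
            return '<' . $m[1] . '>';
        },
        $html
    );
}

echo sanitizeTocLine('<b class="x">Bold</b> <span dir="rtl" style="color:red">text</span>'), "\n";
```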

◆ $linkAnchor

string Wikimedia\Parsoid\Core\SectionMetadata::$linkAnchor

Anchor URL fragment.

This is very similar to the $anchor property, but is URL-escaped so that it can be used to construct a URL fragment link. You will almost always need to prepend a # symbol to linkAnchor when using it. You are still responsible for HTML-escaping the resulting URL if you are emitting it as an HTML attribute.
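The following sketch shows where $anchor and $linkAnchor go when emitting a heading and its TOC link as literal HTML strings, including the HTML escaping described above. The helpers headingHtml() and tocLinkHtml() are hypothetical, and the sample values (an anchor and linkAnchor of Tom_&_Jerry) are made up for illustration rather than taken from MediaWiki output.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: emit a heading whose id is the raw $anchor value.
 * $anchor is not entity-escaped, so it must be escaped for the attribute;
 * $lineHtml (the $line property) is already restricted HTML and is used as-is.
 */
function headingHtml(int $hLevel, string $anchor, string $lineHtml): string
{
    return sprintf('<h%d id="%s">%s</h%d>',
        $hLevel, htmlspecialchars($anchor, ENT_QUOTES), $lineHtml, $hLevel);
}

/**
 * Hypothetical helper: emit a TOC link. $linkAnchor is already URL-escaped;
 * prepend "#" to form the fragment, then HTML-escape the whole URL.
 */
function tocLinkHtml(string $linkAnchor, string $lineHtml): string
{
    return '<a href="' . htmlspecialchars('#' . $linkAnchor, ENT_QUOTES) . '">'
        . $lineHtml . '</a>';
}

echo headingHtml(2, 'Tom_&_Jerry', 'Tom &amp; Jerry'), "\n";
echo tocLinkHtml('Tom_&_Jerry', 'Tom &amp; Jerry'), "\n";
```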

◆ $number

string Wikimedia\Parsoid\Core\SectionMetadata::$number

TOC number string (3.1.3, 4.5.2, etc.)

Note
This should be localized into the parser target language.
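Number strings like 3.1.3 follow from the sequence of TOC levels. A sketch, assuming one counter per nesting level that resets whenever a shallower section appears; computeTocNumbers() is hypothetical and applies no localization (plain ASCII digits and "." separators).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive TOC number strings ("1", "1.1", "2", ...)
 * from a sequence of one-indexed TOC levels. No localization is applied.
 *
 * @param int[] $tocLevels
 * @return string[]
 */
function computeTocNumbers(array $tocLevels): array
{
    $counters = []; // $counters[$i] is the running count at TOC level $i + 1
    $numbers = [];
    foreach ($tocLevels as $level) {
        // Forget counters for levels deeper than the current one.
        $counters = array_slice($counters, 0, $level);
        // Pad with zeros if a level was skipped on the way down.
        while (count($counters) < $level) {
            $counters[] = 0;
        }
        $counters[$level - 1]++;
        $numbers[] = implode('.', $counters);
    }
    return $numbers;
}

echo implode(' ', computeTocNumbers([1, 2, 3, 2, 1, 2])), "\n";
```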

◆ $tocLevel

int Wikimedia\Parsoid\Core\SectionMetadata::$tocLevel

This is a one-indexed TOC level, which is also the nesting level.

For example, if a page has nested H2, H4, and H6 headings, heading levels 2, 4, and 6 correspond to TOC levels 1, 2, and 3.
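One way to derive TOC levels from heading levels is to keep a stack of the currently open heading levels, as sketched below. This is an illustration under that assumption, not Parser.php's code; computeTocLevels() is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map heading levels (hLevel) to one-indexed TOC
 * nesting levels (tocLevel). The stack holds the heading levels of the
 * currently open ancestor sections; a deeper heading nests one level
 * further, while a same-level or shallower heading closes open sections first.
 *
 * @param int[] $hLevels
 * @return int[]
 */
function computeTocLevels(array $hLevels): array
{
    $stack = [];
    $tocLevels = [];
    foreach ($hLevels as $hLevel) {
        // Close open sections whose heading is at the same level or deeper.
        while ($stack !== [] && end($stack) >= $hLevel) {
            array_pop($stack);
        }
        $stack[] = $hLevel;
        $tocLevels[] = count($stack);
    }
    return $tocLevels;
}

// The H2-H4-H6 example above.
echo implode(',', computeTocLevels([2, 4, 6])), "\n"; // prints "1,2,3"
```

Because the TOC level is the stack depth rather than the heading number, skipped heading levels (H2 straight to H4) still produce consecutive TOC levels.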

