Parsoid
A bidirectional parser between wikitext and HTML5
Wikimedia\Parsoid\Config\Env Class Reference

Environment/Envelope class for Parsoid.


Public Member Functions

 __construct (SiteConfig $siteConfig, PageConfig $pageConfig, DataAccess $dataAccess, ContentMetadataCollector $metadata, ?array $options=null)
 
 profiling ()
 Is profiling enabled?
 
 getCurrentProfile ()
 Get the profile at the top of the stack.
 
 pushNewProfile ()
 New pipeline started.
 
 popProfile ()
 Pipeline ended.
 
 hasTraceFlags ()
 
 hasTraceFlag (string $flag)
 Test which trace information to log.
 
 hasDumpFlags ()
 
 hasDumpFlag (string $flag)
 Test which state to dump.
 
 writeDump (string $str)
 Write out a string (because it was requested by dumpFlags).
 
 getSiteConfig ()
 Get the site config.
 
 getPageConfig ()
 Get the page config.
 
 getDataAccess ()
 Get the data access object.
 
 getMetadata ()
 Return the ContentMetadataCollector.
 
 getTOCData ()
 Return the Table of Contents information for the article.
 
 nativeTemplateExpansionEnabled ()
 
 getUID ()
 Get the current uid counter value.
 
 getFID ()
 Get the current fragment id counter value.
 
 getWrapSections ()
 Whether <section> wrappers should be added.
 
 getPipelineFactory ()
 Get the pipeline factory.
 
 getRequestOffsetType ()
 Return the external format of character offsets in source ranges.
 
 getCurrentOffsetType ()
 Return the current format of character offsets in source ranges.
 
 setCurrentOffsetType (string $offsetType)
 Update the current offset type.
 
 getContextTitle ()
 Return the title from the PageConfig, as a Parsoid title.
 
 resolveTitle (string $str, bool $resolveOnly=false)
 Resolve strings that are page-fragments or subpage references with respect to the current page name.
 
 normalizedTitleKey (string $str, bool $noExceptions=false, bool $ignoreFragment=false)
 Get normalized title key for a title string.
 
 makeTitleFromText (string $str, ?int $defaultNs=null, bool $noExceptions=false)
 Create a Title object from URL-encoded text.
 
 makeTitleFromURLDecodedStr (string $str, ?int $defaultNs=null, bool $noExceptions=false)
 Create a Title object from URL-decoded text.
 
 makeLink (Title $title)
 Make a link to a Title.
 
 isValidLinkTarget ( $href)
 Test if an href attribute value could be a valid link target.
 
 generateUID ()
 Generate a new uid.
 
 newObjectId ()
 Generate a new object id.
 
 generateAnnotationUID ()
 Generate a new annotation uid.
 
 newAnnotationId ()
 Generate a new annotation id.
 
 newAboutId ()
 Generate a new about id.
 
 setDOMDiff ( $doc)
 Store reference to DOM diff document.
 
 getDOMDiff ()
 Return reference to DOM diff document.
 
 newFragmentId ()
 Generate a new fragment id.
 
 setupTopLevelDoc (?Document $topLevelDoc=null)
 When an environment is constructed, we initialize a document (and RemexPipeline) to be used throughout the parse.
 
 fetchRemexPipeline (bool $atTopLevel)
 
 setVariable (string $variable, $state)
 BehaviorSwitchHandler support function that adds a property named by $variable and sets it to $state.
 
 setBehaviorSwitch (string $switch, $state)
 Record a behavior switch.
 
 getBehaviorSwitch (string $switch, $default=null)
 Fetch the state of a previously-recorded behavior switch.
 
 getDOMFragmentMap ()
 
 getDOMFragment (string $id)
 
 setDOMFragment (string $id, DocumentFragment $forest)
 
 removeDOMFragment (string $id)
 
 recordLint (string $type, array $lintData)
 Record a lint.
 
 getLints ()
 Retrieve recorded lints.
 
 setLints (array $lints)
 Init lints to the passed array.
 
 log (string $prefix,... $args)
 
 bumpWt2HtmlResourceUse (string $resource, int $count=1)
 Bump usage of some limited parser resource (e.g., tokens, number of transclusions, number of list items).
 
 compareWt2HtmlLimit (string $resource, int $n)
 
 bumpHtml2WtResourceUse (string $resource, int $count=1)
 Bump usage of some limited serializer resource (e.g., HTML size).
 
 getContentHandler (?string &$contentmodel=null)
 Get an appropriate content handler, given a contentmodel.
 
 langConverterEnabled ()
 Is the language converter enabled on this page?
 
 getInputContentVersion ()
 The HTML content version of the input document (for html2wt and html2html conversions).
 
 getOutputContentVersion ()
 The HTML content version of the output document (for wt2html and html2html conversions).
 
 getHtmlVariantLanguageBcp47 ()
 If non-null, the language variant used for Parsoid HTML: we convert to this variant in wt2html mode, or from it in html2wt mode.
 
 getWtVariantLanguageBcp47 ()
 If non-null, the language variant to be used for wikitext.
 
 getSkipLanguageConversionPass ()
 
 htmlVary ()
 Determine appropriate vary headers for the HTML form of this page.
 
 htmlContentLanguageBcp47 ()
 Determine an appropriate content-language for the HTML form of this page.
 
 getExternalLinkAttribs (string $url)
 Get an array of attributes to apply to an anchor linking to $url.
 
 getLinterConfig ()
 
 linting (?string $type=null)
 Whether the linter backend is enabled.
 

Public Attributes

 $topFrame
 
 $logLinterData = false
 
 $styleTagKeys = []
 
 $pageBundle
 
 $discardDataParsoid = false
 
 $hasAnnotations
 
 $pageCache = []
 
 $transclusionCache = []
 
 $mediaCache = []
 
 $extensionCache = []
 
 $topLevelDoc
 

Detailed Description

Environment/Envelope class for Parsoid.

Carries around the SiteConfig and PageConfig during an operation and provides certain other services.

Constructor & Destructor Documentation

◆ __construct()

Wikimedia\Parsoid\Config\Env::__construct ( SiteConfig $siteConfig,
PageConfig $pageConfig,
DataAccess $dataAccess,
ContentMetadataCollector $metadata,
?array $options = null )
Parameters
SiteConfig $siteConfig
PageConfig $pageConfig
DataAccess $dataAccess
ContentMetadataCollector $metadata
?array $options
  • wrapSections: (bool) Whether <section> wrappers should be added.
  • pageBundle: (bool) Set ids on nodes and store data-* attributes in a JSON blob.
  • traceFlags: (array) Flags indicating which components need to be traced.
  • dumpFlags: (bool[]) Flags indicating which state to dump.
  • debugFlags: (bool[]) Debug flags.
  • nativeTemplateExpansion: (bool) See nativeTemplateExpansionEnabled().
  • discardDataParsoid: (bool)
  • offsetType: (string) 'byte' (default), 'ucs2', or 'char'. See Parsoid\Wt2Html\PP\Processors\ConvertOffsets.
  • logLinterData: (bool) Whether to log linter data when linting is enabled.
  • linterOverrides: (array) Overrides for the site linting configuration.
  • skipLanguageConversionPass: (bool) Whether to skip the language conversion pass. Defaults to false.
  • htmlVariantLanguage: (Bcp47Code|null) If non-null, the language variant used for Parsoid HTML, as a BCP 47 object. We convert to this variant in wt2html mode, or from it in html2wt mode.
  • wtVariantLanguage: (Bcp47Code|null) If non-null, the language variant to be used for wikitext, as a BCP 47 object. If null, heuristics are used to identify the original wikitext variant in wt2html mode, and in html2wt mode new or edited HTML is left unconverted.
  • logLevels: (string[]) Levels to log.
  • topLevelDoc: (Document) Set explicitly when serializing; otherwise, it is initialized for parsing.
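A sketch of an $options array built only from the keys documented above. The values are arbitrary, and validateOffsetType() is a hypothetical helper (not part of Parsoid) that applies the documented 'byte' default. Constructing a real Env additionally requires SiteConfig, PageConfig, DataAccess and ContentMetadataCollector instances, which are omitted here.

```php
<?php
declare(strict_types=1);

// Example options for Env::__construct(). Only documented keys are used;
// the chosen values are arbitrary.
$options = [
    'wrapSections' => true,
    'pageBundle' => false,
    'offsetType' => 'ucs2',
    'logLinterData' => false,
    'skipLanguageConversionPass' => false,
    'htmlVariantLanguage' => null, // no HTML variant conversion
    'wtVariantLanguage' => null,   // let heuristics identify the wikitext variant
];

/**
 * Hypothetical helper (not part of Parsoid): return the requested offset
 * type, applying the documented 'byte' default and rejecting unknown values.
 */
function validateOffsetType( array $options ): string {
    $type = $options['offsetType'] ?? 'byte';
    if ( !in_array( $type, [ 'byte', 'ucs2', 'char' ], true ) ) {
        throw new InvalidArgumentException( "Unknown offsetType: $type" );
    }
    return $type;
}

// With real config objects in hand, the call would look like:
// $env = new Env( $siteConfig, $pageConfig, $dataAccess, $metadata, $options );
```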

Member Function Documentation

◆ bumpHtml2WtResourceUse()

Wikimedia\Parsoid\Config\Env::bumpHtml2WtResourceUse ( string $resource,
int $count = 1 )

Bump usage of some limited serializer resource (e.g., HTML size).

Parameters
string $resource
int $count: How much of the resource is used? (defaults to 1)
Exceptions
ResourceLimitExceededException
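The documented contract of bumpHtml2WtResourceUse() (accumulate usage per resource, throw once a limit is exceeded) can be sketched with a hypothetical tracker. Html2WtUsageTracker, its limit array, the 'htmlSize' resource name and the local ResourceLimitExceededException stand-in are all assumptions; how Parsoid actually stores and checks its limits is not shown here.

```php
<?php
declare(strict_types=1);

// Local stand-in for Parsoid's ResourceLimitExceededException (assumption).
class ResourceLimitExceededException extends RuntimeException {
}

// Hypothetical tracker: usage accumulates per resource, and an exception
// is thrown as soon as a configured limit is exceeded.
final class Html2WtUsageTracker {
    /** @var array<string,int> resource name => maximum allowed */
    private array $limits;
    /** @var array<string,int> resource name => usage so far */
    private array $usage = [];

    public function __construct( array $limits ) {
        $this->limits = $limits;
    }

    public function bump( string $resource, int $count = 1 ): void {
        $n = ( $this->usage[$resource] ?? 0 ) + $count;
        $this->usage[$resource] = $n;
        // Resources without a configured limit are never capped.
        if ( isset( $this->limits[$resource] ) && $n > $this->limits[$resource] ) {
            throw new ResourceLimitExceededException(
                "html2wt: $resource limit exceeded ($n > {$this->limits[$resource]})"
            );
        }
    }

    public function usage( string $resource ): int {
        return $this->usage[$resource] ?? 0;
    }
}
```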

◆ bumpWt2HtmlResourceUse()

Wikimedia\Parsoid\Config\Env::bumpWt2HtmlResourceUse ( string $resource,
int $count = 1 )

Bump usage of some limited parser resource (e.g., tokens, number of transclusions, number of list items).

Parameters
string $resource
int $count: How much of the resource is used? (defaults to 1)
Returns
?bool null if the limit was already reached before this call, false if this call exceeds the limit, and true otherwise.

◆ compareWt2HtmlLimit()

Wikimedia\Parsoid\Config\Env::compareWt2HtmlLimit ( string $resource,
int $n )
Parameters
string $resource
int $n: The usage count to check.
Returns
bool false if $n exceeds the limit for $resource, true otherwise.
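One way to satisfy the ?bool contract of bumpWt2HtmlResourceUse(), together with compareWt2HtmlLimit(), is sketched below. Wt2HtmlUsageTracker and the 'listItem' resource name are hypothetical, and the sketch reads "limit already reached" as "usage already exceeded the limit before this call", which is an assumption about the boundary.

```php
<?php
declare(strict_types=1);

// Hypothetical tracker mirroring the documented return values of
// bumpWt2HtmlResourceUse() and compareWt2HtmlLimit().
final class Wt2HtmlUsageTracker {
    /** @var array<string,int> resource name => maximum allowed */
    private array $limits;
    /** @var array<string,int> resource name => usage so far */
    private array $usage = [];

    public function __construct( array $limits ) {
        $this->limits = $limits;
    }

    // false when $n exceeds the limit for $resource; unlimited resources always pass.
    public function compareLimit( string $resource, int $n ): bool {
        return !( isset( $this->limits[$resource] ) && $n > $this->limits[$resource] );
    }

    // null:  the limit was already exceeded before this call (usage is not bumped);
    // false: this call pushed usage past the limit;
    // true:  usage is still within the limit.
    public function bump( string $resource, int $count = 1 ): ?bool {
        $n = $this->usage[$resource] ?? 0;
        if ( !$this->compareLimit( $resource, $n ) ) {
            return null;
        }
        $n += $count;
        $this->usage[$resource] = $n;
        return $this->compareLimit( $resource, $n );
    }
}
```

Returning null separately from false lets a caller report the overflow once (on false) and then silently skip further work (on null).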

◆ generateAnnotationUID()

Wikimedia\Parsoid\Config\Env::generateAnnotationUID ( )

Generate a new annotation uid.

Returns
int

◆ generateUID()

Wikimedia\Parsoid\Config\Env::generateUID ( )

Generate a new uid.

Returns
int

◆ getBehaviorSwitch()

Wikimedia\Parsoid\Config\Env::getBehaviorSwitch ( string $switch,
$default = null )

Fetch the state of a previously-recorded behavior switch.

Todo
Does this belong here, or on some equivalent to MediaWiki's ParserOutput?
Parameters
string $switch: Switch name.
mixed $default: Default value if the switch was never set.
Returns
mixed State data that was previously passed to setBehaviorSwitch(), or $default

◆ getContentHandler()

Wikimedia\Parsoid\Config\Env::getContentHandler ( ?string & $contentmodel = null)

Get an appropriate content handler, given a contentmodel.

Parameters
?string &$contentmodel: An optional content model that overrides whatever the source specifies. On return, it is set to the content model of the handler that was used.
Returns
ContentModelHandler An appropriate content handler
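The in/out behavior of the by-reference $contentmodel parameter can be illustrated with a small, hypothetical function. pickContentModel(), the model-to-handler map and the exception for unknown models are assumptions made for the sketch; they do not describe how Parsoid actually looks up content handlers.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of an in/out content-model parameter:
 * a caller-supplied model overrides the source's model, and on return
 * the variable holds the model whose handler was chosen.
 *
 * @param ?string &$contentmodel Override on input; model used on output
 * @param string $sourceModel Model declared by the source (assumed)
 * @param array<string,string> $handlers Model => handler name (assumed)
 */
function pickContentModel( ?string &$contentmodel, string $sourceModel, array $handlers ): string {
    $model = $contentmodel ?? $sourceModel;
    if ( !isset( $handlers[$model] ) ) {
        throw new InvalidArgumentException( "No handler for content model: $model" );
    }
    $contentmodel = $model; // report back which model was used
    return $handlers[$model];
}
```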

◆ getContextTitle()

Wikimedia\Parsoid\Config\Env::getContextTitle ( )

Return the title from the PageConfig, as a Parsoid title.

Returns
Title

◆ getCurrentOffsetType()

Wikimedia\Parsoid\Config\Env::getCurrentOffsetType ( )

Return the current format of character offsets in source ranges.

This tracks whether the internal byte offsets have yet been converted to the external format returned by getRequestOffsetType().

See also
Parsoid\Wt2Html\PP\Processors\ConvertOffsets
Returns
string 'byte', 'ucs2', or 'char'

◆ getCurrentProfile()

Wikimedia\Parsoid\Config\Env::getCurrentProfile ( )

Get the profile at the top of the stack.

FIXME: This implicitly assumes sequential, in-order processing. This wouldn't have worked in Parsoid/JS and may not work in the future, depending on how (or whether) we restructure the pipeline for concurrency.

Returns
Profile
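The stack discipline behind pushNewProfile(), popProfile() and getCurrentProfile() can be sketched as follows. ProfileStack and DemoProfile are hypothetical stand-ins for Parsoid's classes; the sketch shows why the FIXME matters: the "current" profile is only meaningful if pipelines start and end in strictly nested order.

```php
<?php
declare(strict_types=1);

// Minimal stand-in for a profile object (hypothetical).
final class DemoProfile {
    public string $label;

    public function __construct( string $label ) {
        $this->label = $label;
    }
}

// Hypothetical push/pop/current stack. Correct only under sequential,
// properly nested pipeline execution.
final class ProfileStack {
    /** @var DemoProfile[] */
    private array $stack = [];

    // Pipeline started: push a fresh profile.
    public function push( string $label ): DemoProfile {
        $profile = new DemoProfile( $label );
        $this->stack[] = $profile;
        return $profile;
    }

    // Pipeline ended: pop its profile.
    public function pop(): DemoProfile {
        if ( !$this->stack ) {
            throw new LogicException( 'pop without matching push' );
        }
        return array_pop( $this->stack );
    }

    // The profile at the top of the stack.
    public function current(): DemoProfile {
        $top = end( $this->stack );
        if ( $top === false ) {
            throw new LogicException( 'No active profile' );
        }
        return $top;
    }
}
```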

◆ getDataAccess()

Wikimedia\Parsoid\Config\Env::getDataAccess ( )

Get the data access object.

Returns
DataAccess

◆ getDOMDiff()

Wikimedia\Parsoid\Config\Env::getDOMDiff ( )

Return reference to DOM diff document.

Returns
Document|null

◆ getDOMFragment()

Wikimedia\Parsoid\Config\Env::getDOMFragment ( string $id)
Parameters
string $id: Fragment id.
Returns
DocumentFragment

◆ getDOMFragmentMap()

Wikimedia\Parsoid\Config\Env::getDOMFragmentMap ( )
Returns
array<string,DocumentFragment>

◆ getFID()

Wikimedia\Parsoid\Config\Env::getFID ( )

Get the current fragment id counter value.

Returns
int

◆ getHtmlVariantLanguageBcp47()

Wikimedia\Parsoid\Config\Env::getHtmlVariantLanguageBcp47 ( )

If non-null, the language variant used for Parsoid HTML: we convert to this variant in wt2html mode, or from it in html2wt mode.

Returns
?Bcp47Code a BCP-47 language code

◆ getInputContentVersion()

Wikimedia\Parsoid\Config\Env::getInputContentVersion ( )

The HTML content version of the input document (for html2wt and html2html conversions).

See also
https://www.mediawiki.org/wiki/Parsoid/API#Content_Negotiation
https://www.mediawiki.org/wiki/Specs/HTML#Versioning
Returns
string A semver version number

◆ getLinterConfig()

Wikimedia\Parsoid\Config\Env::getLinterConfig ( )
Returns
array

◆ getLints()

Wikimedia\Parsoid\Config\Env::getLints ( )

Retrieve recorded lints.

Returns
array[]

◆ getMetadata()

Wikimedia\Parsoid\Config\Env::getMetadata ( )

Return the ContentMetadataCollector.

Returns
ContentMetadataCollector

◆ getOutputContentVersion()

Wikimedia\Parsoid\Config\Env::getOutputContentVersion ( )

The HTML content version of the output document (for wt2html and html2html conversions).

See also
https://www.mediawiki.org/wiki/Parsoid/API#Content_Negotiation
https://www.mediawiki.org/wiki/Specs/HTML#Versioning
Returns
string A semver version number

◆ getPageConfig()

Wikimedia\Parsoid\Config\Env::getPageConfig ( )

Get the page config.

Returns
PageConfig

◆ getPipelineFactory()

Wikimedia\Parsoid\Config\Env::getPipelineFactory ( )

Get the pipeline factory.

Returns
ParserPipelineFactory

◆ getRequestOffsetType()

Wikimedia\Parsoid\Config\Env::getRequestOffsetType ( )

Return the external format of character offsets in source ranges.

Internally, DomSourceRange and SourceRange information is always kept as UTF-8 byte offsets for efficiency, since that matches the native string representation. For external use, these offsets can be converted to other formats when producing wt2html output or consuming html2wt input.

See also
Parsoid\Wt2Html\PP\Processors\ConvertOffsets
Returns
string 'byte', 'ucs2', or 'char'
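To make the three offset formats concrete, here is a minimal sketch of converting one internal UTF-8 byte offset to 'char' (Unicode code points) and 'ucs2' (UTF-16 code units). convertByteOffset() is a hypothetical helper, not the ConvertOffsets implementation, and it assumes valid UTF-8 and offsets that fall on character boundaries.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: convert a UTF-8 byte offset into $text to the
 * given offset type. Assumes valid UTF-8 and a character-boundary offset.
 */
function convertByteOffset( string $text, int $byteOffset, string $offsetType ): int {
    if ( $byteOffset < 0 || $byteOffset > strlen( $text ) ) {
        throw new InvalidArgumentException( "Offset out of range: $byteOffset" );
    }
    if ( $offsetType === 'byte' ) {
        return $byteOffset;
    }
    if ( $offsetType !== 'ucs2' && $offsetType !== 'char' ) {
        throw new InvalidArgumentException( "Unknown offset type: $offsetType" );
    }
    $result = 0;
    for ( $i = 0; $i < $byteOffset; $i++ ) {
        $byte = ord( $text[$i] );
        if ( ( $byte & 0xC0 ) === 0x80 ) {
            continue; // continuation byte: part of the preceding character
        }
        // A lead byte >= 0xF0 starts a 4-byte sequence (a character outside
        // the BMP), which UTF-16 encodes as a surrogate pair: two units.
        $result += ( $offsetType === 'ucs2' && $byte >= 0xF0 ) ? 2 : 1;
    }
    return $result;
}
```

For the string "aé😀b" (8 bytes), byte offset 7 (just before "b") becomes 3 in 'char' units and 4 in 'ucs2' units, because the emoji needs a surrogate pair in UTF-16.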

◆ getSiteConfig()

Wikimedia\Parsoid\Config\Env::getSiteConfig ( )

Get the site config.

Returns
SiteConfig

Reimplemented in Wikimedia\Parsoid\Config\Api\Env.

◆ getTOCData()

Wikimedia\Parsoid\Config\Env::getTOCData ( )

Return the Table of Contents information for the article.

Returns
TOCData

◆ getUID()

Wikimedia\Parsoid\Config\Env::getUID ( )

Get the current uid counter value.

Returns
int

◆ getWrapSections()

Wikimedia\Parsoid\Config\Env::getWrapSections ( )

Whether <section> wrappers should be added.

Todo
Does this actually belong here? Should it be a behavior switch?
Returns
bool

◆ getWtVariantLanguageBcp47()

Wikimedia\Parsoid\Config\Env::getWtVariantLanguageBcp47 ( )

If non-null, the language variant to be used for wikitext.

If null, heuristics will be used to identify the original wikitext variant in wt2html mode, and in html2wt mode new or edited HTML will be left unconverted.

Returns
?Bcp47Code a BCP-47 language code

◆ hasDumpFlag()

Wikimedia\Parsoid\Config\Env::hasDumpFlag ( string $flag)

Test which state to dump.

Parameters
string $flag: Flag name.
Returns
bool

◆ hasTraceFlag()

Wikimedia\Parsoid\Config\Env::hasTraceFlag ( string $flag)

Test which trace information to log.

Parameters
string $flag: Flag name.
Returns
bool

◆ htmlContentLanguageBcp47()

Wikimedia\Parsoid\Config\Env::htmlContentLanguageBcp47 ( )

Determine an appropriate content-language for the HTML form of this page.

Returns
Bcp47Code a BCP-47 language code.

◆ htmlVary()

Wikimedia\Parsoid\Config\Env::htmlVary ( )

Determine appropriate vary headers for the HTML form of this page.

Returns
string

◆ isValidLinkTarget()

Wikimedia\Parsoid\Config\Env::isValidLinkTarget ( $href)

Test if an href attribute value could be a valid link target.

Parameters
string|(Token|string)[] $href
Returns
bool

◆ langConverterEnabled()

Wikimedia\Parsoid\Config\Env::langConverterEnabled ( )

Is the language converter enabled on this page?

Returns
bool

◆ linting()

Wikimedia\Parsoid\Config\Env::linting ( ?string $type = null)

Whether the linter backend is enabled.

Consults the allow list and block list from ::getLinterConfig().

Parameters
?string $type: A lint type to check, or null.
Returns
bool If $type is null or omitted, returns true if any linting type is enabled; otherwise returns true only if the specified linting type is enabled.

◆ log()

Wikimedia\Parsoid\Config\Env::log ( string $prefix,
...$args )
Parameters
string $prefix
mixed ...$args

◆ makeLink()

Wikimedia\Parsoid\Config\Env::makeLink ( Title $title)

Make a link to a Title.

Parameters
Title $title
Returns
string

◆ makeTitleFromText()

Wikimedia\Parsoid\Config\Env::makeTitleFromText ( string $str,
?int $defaultNs = null,
bool $noExceptions = false )

Create a Title object from URL-encoded text.

See also
Title::newFromURL in MediaWiki
Parameters
string $str: URL-encoded text.
?int $defaultNs
bool $noExceptions: Return null instead of throwing exceptions.
Returns
Title|null

◆ makeTitleFromURLDecodedStr()

Wikimedia\Parsoid\Config\Env::makeTitleFromURLDecodedStr ( string $str,
?int $defaultNs = null,
bool $noExceptions = false )

Create a Title object from URL-decoded text.

See also
Title::newFromText in MediaWiki
Parameters
string $str: URL-decoded text.
?int $defaultNs
bool $noExceptions: Return null instead of throwing exceptions.
Returns
Title|null

◆ newAboutId()

Wikimedia\Parsoid\Config\Env::newAboutId ( )

Generate a new about id.

Returns
string

◆ newAnnotationId()

Wikimedia\Parsoid\Config\Env::newAnnotationId ( )

Generate a new annotation id.

Returns
string

◆ newFragmentId()

Wikimedia\Parsoid\Config\Env::newFragmentId ( )

Generate a new fragment id.

Returns
string

◆ newObjectId()

Wikimedia\Parsoid\Config\Env::newObjectId ( )

Generate a new object id.

Returns
string
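The difference between reading a counter (getUID(), getFID()) and consuming it (generateUID(), newObjectId(), newAboutId(), newFragmentId()) can be sketched with a hypothetical generator. The starting value of 1, the post-increment behavior and the 'mwt', '#mwt' and 'mwf' prefixes are assumptions chosen for illustration only.

```php
<?php
declare(strict_types=1);

// Hypothetical id generator. Counter start values and id prefixes are
// illustrative assumptions, not Parsoid's documented formats.
final class IdGenerator {
    private int $uid = 1;
    private int $fid = 1;

    // Read the counters without advancing them.
    public function getUID(): int {
        return $this->uid;
    }

    public function getFID(): int {
        return $this->fid;
    }

    // Return the current uid, then advance the counter.
    public function generateUID(): int {
        return $this->uid++;
    }

    // Ids derived from the shared uid counter.
    public function newObjectId(): string {
        return 'mwt' . $this->generateUID();
    }

    public function newAboutId(): string {
        return '#' . $this->newObjectId();
    }

    // Fragment ids use their own counter.
    public function newFragmentId(): string {
        return 'mwf' . $this->fid++;
    }
}
```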

◆ normalizedTitleKey()

Wikimedia\Parsoid\Config\Env::normalizedTitleKey ( string $str,
bool $noExceptions = false,
bool $ignoreFragment = false )

Get normalized title key for a title string.

Parameters
string $str: The title string, in URL-decoded format.
bool $noExceptions: Return null instead of throwing exceptions.
bool $ignoreFragment: Ignore the fragment, if any.
Returns
string|null Normalized title key for a title string (or null for invalid titles).

◆ popProfile()

Wikimedia\Parsoid\Config\Env::popProfile ( )

Pipeline ended.

Pop profile.

Returns
Profile

◆ profiling()

Wikimedia\Parsoid\Config\Env::profiling ( )

Is profiling enabled?

Returns
bool

◆ pushNewProfile()

Wikimedia\Parsoid\Config\Env::pushNewProfile ( )

New pipeline started.

Push profile.

Returns
Profile

◆ recordLint()

Wikimedia\Parsoid\Config\Env::recordLint ( string $type,
array $lintData )

Record a lint.

Parameters
string $type: Lint type key.
array $lintData: Data for the lint:
  • dsr: (SourceRange)
  • params: (array)
  • templateInfo: (array|null)

◆ resolveTitle()

Wikimedia\Parsoid\Config\Env::resolveTitle ( string $str,
bool $resolveOnly = false )

Resolve strings that are page-fragments or subpage references with respect to the current page name.

Parameters
string $str: Page fragment or subpage reference (not URL-encoded).
bool $resolveOnly: If true, only trim the string and prepend the current title to lone fragments. TODO: This parameter seems poorly named.
Returns
string Resolved title
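A simplified sketch of the three relative forms resolveTitle() deals with: lone fragments ("#sec"), child subpages ("/Sub") and parent-relative references ("../Sibling"). resolveRelativeTitle() is hypothetical; it does not check whether the current namespace supports subpages and performs no title normalization.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical resolver for fragment and subpage references relative to
 * $currentTitle. Simplified: no namespace or normalization handling.
 */
function resolveRelativeTitle( string $currentTitle, string $str ): string {
    $str = trim( $str );
    if ( $str !== '' && $str[0] === '#' ) {
        return $currentTitle . $str; // lone fragment
    }
    if ( preg_match( '!^((?:\.\./)+)(.*)$!', $str, $m ) ) {
        $levels = intdiv( strlen( $m[1] ), 3 ); // number of "../" prefixes
        $segments = explode( '/', $currentTitle );
        if ( $levels >= count( $segments ) ) {
            return $str; // would climb above the root page: leave unresolved
        }
        $base = implode( '/', array_slice( $segments, 0, count( $segments ) - $levels ) );
        return $m[2] === '' ? $base : $base . '/' . $m[2];
    }
    if ( $str !== '' && $str[0] === '/' ) {
        return $currentTitle . $str; // child subpage
    }
    return $str; // not relative: returned as-is
}
```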

◆ setBehaviorSwitch()

Wikimedia\Parsoid\Config\Env::setBehaviorSwitch ( string $switch,
$state )

Record a behavior switch.

Todo
Does this belong here, or on some equivalent to MediaWiki's ParserOutput?
Parameters
string $switch: Switch name.
mixed $state: Relevant state data to record.
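The setBehaviorSwitch()/getBehaviorSwitch() contract (record arbitrary state per switch name; fall back to $default for switches never recorded) can be sketched with a hypothetical store. BehaviorSwitches is an assumption, as is the choice of array_key_exists(), which keeps an explicitly recorded null distinct from "never set"; the real class may treat that case differently.

```php
<?php
declare(strict_types=1);

// Hypothetical behavior-switch store.
final class BehaviorSwitches {
    /** @var array<string,mixed> switch name => recorded state */
    private array $switches = [];

    public function set( string $switch, $state ): void {
        $this->switches[$switch] = $state;
    }

    public function get( string $switch, $default = null ) {
        // array_key_exists (rather than ??) returns a recorded null as-is.
        return array_key_exists( $switch, $this->switches )
            ? $this->switches[$switch]
            : $default;
    }
}
```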

◆ setCurrentOffsetType()

Wikimedia\Parsoid\Config\Env::setCurrentOffsetType ( string $offsetType)

Update the current offset type.

Only Parsoid\Wt2Html\PP\Processors\ConvertOffsets should be doing this.

Parameters
string $offsetType: 'byte', 'ucs2', or 'char'.

◆ setDOMDiff()

Wikimedia\Parsoid\Config\Env::setDOMDiff ( $doc)

Store reference to DOM diff document.

Parameters
Document $doc

◆ setDOMFragment()

Wikimedia\Parsoid\Config\Env::setDOMFragment ( string $id,
DocumentFragment $forest )
Parameters
string $id: Fragment id.
DocumentFragment $forest: DOM forest to store against the fragment id.

◆ setLints()

Wikimedia\Parsoid\Config\Env::setLints ( array $lints)

Init lints to the passed array.

FIXME: This is currently needed only to reset lints after DSR offsets have been converted, because of the ordering of DOM passes. In practice, there should be no use case for calling it from anywhere other than that single call site.

Parameters
array $lints
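Given the FIXME above, the intended interplay of recordLint(), getLints() and setLints() can be sketched as: record lints with internal offsets, then rewrite every dsr once offsets are converted and replace the whole list in one call. LintStore, convertLintOffsets() and the plain [start, end] pair standing in for a SourceRange are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical lint store mirroring recordLint()/getLints()/setLints().
// A plain [start, end] array stands in for the SourceRange 'dsr' value.
final class LintStore {
    /** @var array[] */
    private array $lints = [];

    public function recordLint( string $type, array $lintData ): void {
        $this->lints[] = [ 'type' => $type ] + $lintData;
    }

    public function getLints(): array {
        return $this->lints;
    }

    public function setLints( array $lints ): void {
        $this->lints = $lints;
    }
}

// The single intended call site: after offsets are converted, map every
// lint's dsr through $convert and reset the list in one go.
function convertLintOffsets( LintStore $store, callable $convert ): void {
    $converted = array_map(
        static function ( array $lint ) use ( $convert ): array {
            $lint['dsr'] = array_map( $convert, $lint['dsr'] );
            return $lint;
        },
        $store->getLints()
    );
    $store->setLints( $converted );
}
```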

◆ setupTopLevelDoc()

Wikimedia\Parsoid\Config\Env::setupTopLevelDoc ( ?Document $topLevelDoc = null)

When an environment is constructed, we initialize a document (and RemexPipeline) to be used throughout the parse.

Parameters
?Document $topLevelDoc

◆ setVariable()

Wikimedia\Parsoid\Config\Env::setVariable ( string $variable,
$state )

BehaviorSwitchHandler support function that adds a property named by $variable and sets it to $state.

Deprecated
Use setBehaviorSwitch() instead.
Parameters
string $variable
mixed $state

◆ writeDump()

Wikimedia\Parsoid\Config\Env::writeDump ( string $str)

Write out a string (because it was requested by dumpFlags).

Parameters
string $str
