Parsoid
A bidirectional parser between wikitext and HTML5
Environment/Envelope class for Parsoid.
Public Member Functions | |
__construct (SiteConfig $siteConfig, PageConfig $pageConfig, DataAccess $dataAccess, ContentMetadataCollector $metadata, ?array $options=null) | |
profiling () | |
Is profiling enabled? | |
getCurrentProfile () | |
Get the profile at the top of the stack. | |
pushNewProfile () | |
New pipeline started. | |
popProfile () | |
Pipeline ended. | |
hasTraceFlags () | |
hasTraceFlag (string $flag) | |
Test which trace information to log. | |
hasDumpFlags () | |
hasDumpFlag (string $flag) | |
Test which state to dump. | |
writeDump (string $str) | |
Write out a string (because it was requested by dumpFlags) | |
getSiteConfig () | |
Get the site config. | |
getPageConfig () | |
Get the page config. | |
getDataAccess () | |
Get the data access object. | |
getMetadata () | |
Return the ContentMetadataCollector. | |
getTOCData () | |
Return the Table of Contents information for the article. | |
nativeTemplateExpansionEnabled () | |
getUID () | |
Get the current uid counter value. | |
getFID () | |
Get the current fragment id counter value. | |
getWrapSections () | |
Whether <section> wrappers should be added. | |
getPipelineFactory () | |
Get the pipeline factory. | |
getRequestOffsetType () | |
Return the external format of character offsets in source ranges. | |
getCurrentOffsetType () | |
Return the current format of character offsets in source ranges. | |
setCurrentOffsetType (string $offsetType) | |
Update the current offset type. | |
getContextTitle () | |
Return the title from the PageConfig, as a Parsoid title. | |
resolveTitle (string $str, bool $resolveOnly=false) | |
Resolve strings that are page-fragments or subpage references with respect to the current page name. | |
normalizedTitleKey (string $str, bool $noExceptions=false, bool $ignoreFragment=false) | |
Get normalized title key for a title string. | |
makeTitleFromText (string $str, ?int $defaultNs=null, bool $noExceptions=false) | |
Create a Title object. | |
makeTitleFromURLDecodedStr (string $str, ?int $defaultNs=null, bool $noExceptions=false) | |
Create a Title object. | |
makeLink (Title $title) | |
Make a link to a Title. | |
isValidLinkTarget ($href) | |
Test if an href attribute value could be a valid link target. | |
generateUID () | |
Generate a new uid. | |
newObjectId () | |
Generate a new object id. | |
generateAnnotationUID () | |
Generate a new annotation uid. | |
newAnnotationId () | |
Generate a new annotation id. | |
newAboutId () | |
Generate a new about id. | |
setDOMDiff ($doc) | |
Store reference to DOM diff document. | |
getDOMDiff () | |
Return reference to DOM diff document. | |
newFragmentId () | |
Generate a new fragment id. | |
setupTopLevelDoc (?Document $topLevelDoc=null) | |
When an environment is constructed, we initialize a document (and RemexPipeline) to be used throughout the parse. | |
fetchRemexPipeline (bool $atTopLevel) | |
setVariable (string $variable, $state) | |
BehaviorSwitchHandler support function that adds a property named by $variable and sets it to $state. | |
setBehaviorSwitch (string $switch, $state) | |
Record a behavior switch. | |
getBehaviorSwitch (string $switch, $default=null) | |
Fetch the state of a previously-recorded behavior switch. | |
getDOMFragmentMap () | |
getDOMFragment (string $id) | |
setDOMFragment (string $id, DocumentFragment $forest) | |
removeDOMFragment (string $id) | |
recordLint (string $type, array $lintData) | |
Record a lint. | |
getLints () | |
Retrieve recorded lints. | |
setLints (array $lints) | |
Init lints to the passed array. | |
log (string $prefix, ...$args) | |
bumpWt2HtmlResourceUse (string $resource, int $count=1) | |
Bump usage of some limited parser resource (ex: tokens, # transclusions, # list items, etc.) | |
compareWt2HtmlLimit (string $resource, int $n) | |
bumpHtml2WtResourceUse (string $resource, int $count=1) | |
Bump usage of some limited serializer resource (ex: html size) | |
getContentHandler (?string &$contentmodel=null) | |
Get an appropriate content handler, given a contentmodel. | |
langConverterEnabled () | |
Is the language converter enabled on this page? | |
getInputContentVersion () | |
The HTML content version of the input document (for html2wt and html2html conversions). | |
getOutputContentVersion () | |
The HTML content version of the output document (for wt2html and html2html conversions). | |
getHtmlVariantLanguageBcp47 () | |
If non-null, the language variant used for Parsoid HTML: we convert to this variant in wt2html mode, or from it in html2wt mode. | |
getWtVariantLanguageBcp47 () | |
If non-null, the language variant to be used for wikitext. | |
getSkipLanguageConversionPass () | |
htmlVary () | |
Determine appropriate vary headers for the HTML form of this page. | |
htmlContentLanguageBcp47 () | |
Determine an appropriate content-language for the HTML form of this page. | |
getExternalLinkAttribs (string $url) | |
Get an array of attributes to apply to an anchor linking to $url. | |
Environment/Envelope class for Parsoid.
Carries around the SiteConfig and PageConfig during an operation and provides certain other services.
Wikimedia\Parsoid\Config\Env::__construct | ( | SiteConfig | $siteConfig, |
PageConfig | $pageConfig, | ||
DataAccess | $dataAccess, | ||
ContentMetadataCollector | $metadata, | ||
?array | $options = null ) |
SiteConfig | $siteConfig | |
PageConfig | $pageConfig | |
DataAccess | $dataAccess | |
ContentMetadataCollector | $metadata | |
?array | $options |
Wikimedia\Parsoid\Config\Env::bumpHtml2WtResourceUse | ( | string | $resource, |
int | $count = 1 ) |
Bump usage of some limited serializer resource (ex: html size)
string | $resource | |
int | $count | How much of the resource is used? (defaults to 1) |
Throws ResourceLimitExceededException.
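A minimal sketch of the serializer-side limit check described above. The class `Html2WtUsageSketch`, the exception `SketchLimitExceeded` (standing in for ResourceLimitExceededException), and the resource name `htmlSize` with its limit of 10 are all hypothetical; this is not the actual Env implementation, only one way a counter that throws on overflow could look.

```php
<?php
declare( strict_types = 1 );

// Hypothetical stand-in for Parsoid's ResourceLimitExceededException.
class SketchLimitExceeded extends RuntimeException {
}

// Hypothetical tracker; not the actual Env code.
class Html2WtUsageSketch {
	/** @var array<string,int> Configured ceilings, keyed by resource name */
	private array $limits;
	/** @var array<string,int> Running totals, keyed by resource name */
	private array $usage = [];

	public function __construct( array $limits ) {
		$this->limits = $limits;
	}

	public function bump( string $resource, int $count = 1 ): void {
		$n = ( $this->usage[$resource] ?? 0 ) + $count;
		$this->usage[$resource] = $n;
		// Resources without a configured limit are never rejected.
		if ( isset( $this->limits[$resource] ) && $n > $this->limits[$resource] ) {
			throw new SketchLimitExceeded( "html2wt: $resource limit exceeded: $n" );
		}
	}
}

$tracker = new Html2WtUsageSketch( [ 'htmlSize' => 10 ] );
$tracker->bump( 'htmlSize', 8 ); // total 8, within the limit
try {
	$tracker->bump( 'htmlSize', 5 ); // total 13 > 10, so this throws
	$threw = false;
} catch ( SketchLimitExceeded $e ) {
	$threw = true;
}
```

Throwing, rather than returning a status, suits a serializer that has no sensible partial result to fall back on; that reading is an assumption, not something this page states.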
Wikimedia\Parsoid\Config\Env::bumpWt2HtmlResourceUse | ( | string | $resource, |
int | $count = 1 ) |
Bump usage of some limited parser resource (ex: tokens, # transclusions, # list items, etc.)
string | $resource | |
int | $count | How much of the resource is used? |
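The return convention documented for bumpWt2HtmlResourceUse (null if the limit was already reached, false when exceeded) can be sketched as follows. The class `Wt2HtmlUsageSketch`, the resource name `transclusion`, and the limit of 2 are hypothetical. Returning true while usage is still within the limit is an assumption; this page documents only the null and false cases.

```php
<?php
declare( strict_types = 1 );

// Hypothetical tracker mirroring the documented return convention.
class Wt2HtmlUsageSketch {
	/** @var array<string,int> */
	private array $limits;
	/** @var array<string,int> */
	private array $usage = [];

	public function __construct( array $limits ) {
		$this->limits = $limits;
	}

	// false when $n exceeds the configured limit, true otherwise.
	public function compare( string $resource, int $n ): bool {
		return !( isset( $this->limits[$resource] ) && $n > $this->limits[$resource] );
	}

	// null if usage was already over the limit before this call,
	// false if this call pushes it over, true otherwise (assumed).
	public function bump( string $resource, int $count = 1 ): ?bool {
		$n = $this->usage[$resource] ?? 0;
		if ( !$this->compare( $resource, $n ) ) {
			return null;
		}
		$n += $count;
		$this->usage[$resource] = $n;
		return $this->compare( $resource, $n );
	}
}

$t = new Wt2HtmlUsageSketch( [ 'transclusion' => 2 ] );
$results = [
	$t->bump( 'transclusion' ), // usage 1: true
	$t->bump( 'transclusion' ), // usage 2: true
	$t->bump( 'transclusion' ), // usage 3 > 2: false
	$t->bump( 'transclusion' ), // already over: null
];
```

A three-valued result lets a caller report the overflow once (on false) and stay silent on later calls (on null).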
Returns null if the limit was already reached, false when exceeded.
Wikimedia\Parsoid\Config\Env::compareWt2HtmlLimit | ( | string | $resource,
int | $n ) |
string | $resource | |
int | $n |
Returns false when exceeded.
Wikimedia\Parsoid\Config\Env::generateAnnotationUID | ( | ) |
Generate a new annotation uid.
Wikimedia\Parsoid\Config\Env::generateUID | ( | ) |
Generate a new uid.
Wikimedia\Parsoid\Config\Env::getBehaviorSwitch | ( | string | $switch, |
$default = null ) |
Fetch the state of a previously-recorded behavior switch.
string | $switch | Switch name |
mixed | $default | Default value if the switch was never set |
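As an illustration of how setBehaviorSwitch and getBehaviorSwitch pair up, here is a small sketch. The class `BehaviorSwitchSketch` and the switch names `notoc` and `noeditsection` are hypothetical choices for the example, and the storage shown (a plain array) is an assumption rather than the actual Env internals.

```php
<?php
declare( strict_types = 1 );

// Hypothetical store; the real Env may keep this state differently.
class BehaviorSwitchSketch {
	/** @var array<string,mixed> */
	private array $switches = [];

	public function set( string $switch, $state ): void {
		$this->switches[$switch] = $state;
	}

	public function get( string $switch, $default = null ) {
		// array_key_exists (not ??) so that a recorded null state is
		// still distinguished from a switch that was never set.
		return array_key_exists( $switch, $this->switches )
			? $this->switches[$switch]
			: $default;
	}
}

$switches = new BehaviorSwitchSketch();
$switches->set( 'notoc', true );

$recorded = $switches->get( 'notoc' );                // true
$fallback = $switches->get( 'noeditsection', false ); // false: never set
```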
Wikimedia\Parsoid\Config\Env::getContentHandler | ( | ?string & | $contentmodel = null | ) |
Get an appropriate content handler, given a contentmodel.
?string | &$contentmodel | An optional content model which will override whatever the source specifies. It gets set to the handler which is used. |
Wikimedia\Parsoid\Config\Env::getContextTitle | ( | ) |
Return the title from the PageConfig, as a Parsoid title.
Wikimedia\Parsoid\Config\Env::getCurrentOffsetType | ( | ) |
Return the current format of character offsets in source ranges.
This allows us to track whether the internal byte offsets have been converted to the external format (as returned by getRequestOffsetType) yet.
Wikimedia\Parsoid\Config\Env::getCurrentProfile | ( | ) |
Get the profile at the top of the stack.
FIXME: This implicitly assumes sequential in-order processing. This wouldn't have worked in Parsoid/JS and may not work in the future, depending on how (or if) we restructure the pipeline for concurrency, etc.
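The sequential assumption in the FIXME is easier to see with a stack in hand. The sketch below uses a hypothetical `ProfileStackSketch` class whose method names follow this page. The `stdClass` profile objects and their `depth` field are placeholders, not the real profile type.

```php
<?php
declare( strict_types = 1 );

// Hypothetical stand-in; real profile objects would carry timing data.
class ProfileStackSketch {
	/** @var stdClass[] */
	private array $stack = [];

	// New pipeline started: push a fresh profile.
	public function pushNewProfile(): stdClass {
		$profile = new stdClass;
		$profile->depth = count( $this->stack );
		$this->stack[] = $profile;
		return $profile;
	}

	// Profile at the top of the stack, or null when no pipeline is active.
	public function getCurrentProfile(): ?stdClass {
		return $this->stack[count( $this->stack ) - 1] ?? null;
	}

	// Pipeline ended: pop its profile.
	public function popProfile(): ?stdClass {
		return array_pop( $this->stack );
	}
}

$env = new ProfileStackSketch();
$outer = $env->pushNewProfile();             // top-level pipeline
$inner = $env->pushNewProfile();             // nested pipeline
$topWhileNested = $env->getCurrentProfile(); // $inner
$env->popProfile();
$topAfterPop = $env->getCurrentProfile();    // $outer
```

"Current" is only meaningful here because pipelines start and end in strict last-in, first-out order. If two pipelines interleaved, the top of the stack would no longer identify the pipeline doing the work.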
Wikimedia\Parsoid\Config\Env::getDataAccess | ( | ) |
Get the data access object.
Wikimedia\Parsoid\Config\Env::getDOMDiff | ( | ) |
Return reference to DOM diff document.
Wikimedia\Parsoid\Config\Env::getDOMFragment | ( | string | $id | ) |
string | $id | Fragment id |
Wikimedia\Parsoid\Config\Env::getDOMFragmentMap | ( | ) |
Wikimedia\Parsoid\Config\Env::getFID | ( | ) |
Get the current fragment id counter value.
Wikimedia\Parsoid\Config\Env::getHtmlVariantLanguageBcp47 | ( | ) |
If non-null, the language variant used for Parsoid HTML: we convert to this variant in wt2html mode, or from it in html2wt mode.
Wikimedia\Parsoid\Config\Env::getInputContentVersion | ( | ) |
The HTML content version of the input document (for html2wt and html2html conversions).
Wikimedia\Parsoid\Config\Env::getLints | ( | ) |
Retrieve recorded lints.
Wikimedia\Parsoid\Config\Env::getMetadata | ( | ) |
Return the ContentMetadataCollector.
Wikimedia\Parsoid\Config\Env::getOutputContentVersion | ( | ) |
The HTML content version of the output document (for wt2html and html2html conversions).
Wikimedia\Parsoid\Config\Env::getPageConfig | ( | ) |
Get the page config.
Wikimedia\Parsoid\Config\Env::getPipelineFactory | ( | ) |
Get the pipeline factory.
Wikimedia\Parsoid\Config\Env::getRequestOffsetType | ( | ) |
Return the external format of character offsets in source ranges.
Internally we always keep DomSourceRange and SourceRange information as UTF-8 byte offsets for efficiency (matches the native string representation), but for external use we can convert these to other formats when we output wt2html or input for html2wt.
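To make the difference between the offset formats concrete, the sketch below converts a UTF-8 byte offset into the two other formats named on this page. It assumes that 'char' means Unicode code points and 'ucs2' means UTF-16 code units. The function `convertByteOffsetSketch` is hypothetical and is not the conversion Parsoid itself performs.

```php
<?php
declare( strict_types = 1 );

/**
 * Convert a UTF-8 byte offset into $s to a 'char' or 'ucs2' offset.
 * Assumes $s is valid UTF-8 and $byteOffset lies on a character boundary.
 */
function convertByteOffsetSketch( string $s, int $byteOffset, string $to ): int {
	if ( !in_array( $to, [ 'byte', 'ucs2', 'char' ], true ) ) {
		throw new InvalidArgumentException( "Unknown offset type: $to" );
	}
	if ( $to === 'byte' ) {
		return $byteOffset;
	}
	$units = 0;
	for ( $i = 0; $i < $byteOffset; $i++ ) {
		$b = ord( $s[$i] );
		if ( ( $b & 0xC0 ) === 0x80 ) {
			continue; // continuation byte: part of the previous character
		}
		// A 4-byte sequence (lead byte >= 0xF0) encodes a code point
		// outside the BMP, which needs two UTF-16 code units.
		$units += ( $to === 'ucs2' && $b >= 0xF0 ) ? 2 : 1;
	}
	return $units;
}

// 'a' (1 byte), U+00E9 (2 bytes), U+1F600 (4 bytes), then 'b' at byte 7.
$s = "a\u{00E9}\u{1F600}b";
$asByte = convertByteOffsetSketch( $s, 7, 'byte' ); // 7
$asChar = convertByteOffsetSketch( $s, 7, 'char' ); // 3
$asUcs2 = convertByteOffsetSketch( $s, 7, 'ucs2' ); // 4
```

The three results differ for the same position, which is why the current offset type has to be tracked alongside the offsets themselves.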
Wikimedia\Parsoid\Config\Env::getSiteConfig | ( | ) |
Get the site config.
Wikimedia\Parsoid\Config\Env::getTOCData | ( | ) |
Return the Table of Contents information for the article.
Wikimedia\Parsoid\Config\Env::getUID | ( | ) |
Get the current uid counter value.
Wikimedia\Parsoid\Config\Env::getWrapSections | ( | ) |
Whether <section> wrappers should be added.
Wikimedia\Parsoid\Config\Env::getWtVariantLanguageBcp47 | ( | ) |
If non-null, the language variant to be used for wikitext.
If null, heuristics will be used to identify the original wikitext variant in wt2html mode, and in html2wt mode new or edited HTML will be left unconverted.
Wikimedia\Parsoid\Config\Env::hasDumpFlag | ( | string | $flag | ) |
Test which state to dump.
string | $flag | Flag name. |
Wikimedia\Parsoid\Config\Env::hasTraceFlag | ( | string | $flag | ) |
Test which trace information to log.
string | $flag | Flag name. |
Wikimedia\Parsoid\Config\Env::htmlContentLanguageBcp47 | ( | ) |
Determine an appropriate content-language for the HTML form of this page.
Wikimedia\Parsoid\Config\Env::htmlVary | ( | ) |
Determine appropriate vary headers for the HTML form of this page.
Wikimedia\Parsoid\Config\Env::isValidLinkTarget | ( | $href | ) |
Test if an href attribute value could be a valid link target.
string|(Token|string)[] | $href |
Wikimedia\Parsoid\Config\Env::langConverterEnabled | ( | ) |
Is the language converter enabled on this page?
Wikimedia\Parsoid\Config\Env::log | ( | string | $prefix, |
$args ) |
string | $prefix | |
mixed | ...$args |
Wikimedia\Parsoid\Config\Env::makeLink | ( | Title | $title | ) |
Make a link to a Title.
Title | $title |
Wikimedia\Parsoid\Config\Env::makeTitleFromText | ( | string | $str, |
?int | $defaultNs = null, | ||
bool | $noExceptions = false ) |
Create a Title object.
string | $str | URL-encoded text |
?int | $defaultNs | |
bool | $noExceptions |
Wikimedia\Parsoid\Config\Env::makeTitleFromURLDecodedStr | ( | string | $str, |
?int | $defaultNs = null, | ||
bool | $noExceptions = false ) |
Create a Title object.
string | $str | URL-decoded text |
?int | $defaultNs | |
bool | $noExceptions |
Wikimedia\Parsoid\Config\Env::newAboutId | ( | ) |
Generate a new about id.
Wikimedia\Parsoid\Config\Env::newAnnotationId | ( | ) |
Generate a new annotation id.
Wikimedia\Parsoid\Config\Env::newFragmentId | ( | ) |
Generate a new fragment id.
Wikimedia\Parsoid\Config\Env::newObjectId | ( | ) |
Generate a new object id.
Wikimedia\Parsoid\Config\Env::normalizedTitleKey | ( | string | $str, |
bool | $noExceptions = false, | ||
bool | $ignoreFragment = false ) |
Get normalized title key for a title string.
string | $str | Should be in url-decoded format. |
bool | $noExceptions | Return null instead of throwing exceptions. |
bool | $ignoreFragment | Ignore the fragment, if any. |
Wikimedia\Parsoid\Config\Env::popProfile | ( | ) |
Pipeline ended.
Pop profile.
Wikimedia\Parsoid\Config\Env::profiling | ( | ) |
Is profiling enabled?
Wikimedia\Parsoid\Config\Env::pushNewProfile | ( | ) |
New pipeline started.
Push profile.
Wikimedia\Parsoid\Config\Env::recordLint | ( | string | $type, |
array | $lintData ) |
Record a lint.
string | $type | Lint type key |
array | $lintData | Data for the lint.
Wikimedia\Parsoid\Config\Env::resolveTitle | ( | string | $str, |
bool | $resolveOnly = false ) |
Resolve strings that are page-fragments or subpage references with respect to the current page name.
string | $str | Page fragment or subpage reference. Not URL encoded. |
bool | $resolveOnly | If true, only trim and add the current title to lone fragments. TODO: This parameter seems poorly named. |
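A simplified sketch of the kind of resolution described here, for illustration only. The function `resolveTitleSketch`, its extra `$currentTitle` parameter, and the exact handling of `/Sub` and `../Sibling` forms are assumptions modeled on common wiki subpage syntax. The real method reads the current page from the PageConfig and applies namespace checks that are not shown.

```php
<?php
declare( strict_types = 1 );

// Hypothetical, simplified resolver; not the actual Env::resolveTitle.
function resolveTitleSketch(
	string $str, string $currentTitle, bool $resolveOnly = false
): string {
	$str = trim( $str );
	// A lone fragment refers to the current page.
	if ( $str !== '' && $str[0] === '#' ) {
		return $currentTitle . $str;
	}
	if ( $resolveOnly ) {
		return $str; // only trimming and lone fragments are handled
	}
	// '../' climbs one subpage level per occurrence.
	if ( preg_match( '!^(?:\.\./)+!', $str, $m ) ) {
		$levels = intdiv( strlen( $m[0] ), 3 );
		$parts = explode( '/', $currentTitle );
		if ( $levels >= count( $parts ) ) {
			return $str; // cannot climb above the root page; leave as-is
		}
		$base = implode( '/', array_slice( $parts, 0, count( $parts ) - $levels ) );
		$rest = (string)substr( $str, strlen( $m[0] ) );
		return $rest === '' ? $base : $base . '/' . $rest;
	}
	// A leading '/' names a subpage of the current page.
	if ( $str !== '' && $str[0] === '/' ) {
		return $currentTitle . $str;
	}
	return $str;
}

$page = 'Parent/Child';
$fragment = resolveTitleSketch( ' #History ', $page );      // 'Parent/Child#History'
$subpage = resolveTitleSketch( '/Notes', $page );           // 'Parent/Child/Notes'
$sibling = resolveTitleSketch( '../Sibling', $page );       // 'Parent/Sibling'
$untouched = resolveTitleSketch( '/Notes', $page, true );   // '/Notes'
```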
Wikimedia\Parsoid\Config\Env::setBehaviorSwitch | ( | string | $switch, |
$state ) |
Record a behavior switch.
string | $switch | Switch name |
mixed | $state | Relevant state data to record |
Wikimedia\Parsoid\Config\Env::setCurrentOffsetType | ( | string | $offsetType | ) |
Update the current offset type.
Only Parsoid\Wt2Html\PP\Processors\ConvertOffsets should be doing this.
string | $offsetType | 'byte', 'ucs2', or 'char' |
Wikimedia\Parsoid\Config\Env::setDOMDiff | ( | $doc | ) |
Store reference to DOM diff document.
Document | $doc |
Wikimedia\Parsoid\Config\Env::setDOMFragment | ( | string | $id, |
DocumentFragment | $forest ) |
string | $id | Fragment id |
DocumentFragment | $forest | DOM forest to store against the fragment id |
Wikimedia\Parsoid\Config\Env::setLints | ( | array | $lints | ) |
Init lints to the passed array.
FIXME: This is currently needed to reset lints after converting DSR offsets because of ordering of DOM passes. So, in reality, there should be no real use case for setting this anywhere else but from that single callsite.
array | $lints |
Wikimedia\Parsoid\Config\Env::setupTopLevelDoc | ( | ?Document | $topLevelDoc = null | ) |
When an environment is constructed, we initialize a document (and RemexPipeline) to be used throughout the parse.
?Document | $topLevelDoc |
Wikimedia\Parsoid\Config\Env::setVariable | ( | string | $variable, |
$state ) |
BehaviorSwitchHandler support function that adds a property named by $variable and sets it to $state.
string | $variable | |
mixed | $state |
Wikimedia\Parsoid\Config\Env::writeDump | ( | string | $str | ) |
Write out a string (because it was requested by dumpFlags)
string | $str |