Parsoid
A bidirectional parser between wikitext and HTML5
Public Member Functions
__construct (array $opts)
getLinterSiteConfig ()
    Return the desired linter configuration.
allowedExternalImagePrefixes ()
    Allowed external image URL prefixes.
baseURI ()
    Site base URI.
exportMetadataToHeadBcp47 (Document $document, ContentMetadataCollector $metadata, string $defaultTitle, Bcp47Code $lang)
    Export content metadata via meta tags (and via a stylesheet for now to aid some clients).
redirectRegexp ()
    A regexp matching the localized 'REDIRECT' marker for this wiki.
categoryRegexp ()
    A regexp matching the localized 'Category' prefix for this wiki.
bswRegexp ()
    A regexp matching localized behavior switches for this wiki.
canonicalNamespaceId (string $name)
    Map a canonical namespace name to its index.
namespaceId (string $name)
    Map a namespace name to its index.
namespaceName (int $ns)
    Map a namespace index to its preferred name (with spaces, not underscores).
namespaceHasSubpages (int $ns)
    Test if a namespace has subpages.
namespaceCase (int $ns)
    Return namespace case setting.
specialPageLocalName (string $alias)
    Get the default local name for a special page.
setInterwikiMagic (bool $val)
interwikiMagic ()
    Treat language links as magic connectors, not inline links.
interwikiMap ()
    Interwiki link data.
iwp ()
    Wiki identifier, for cache keys.
legalTitleChars ()
    Legal title characters.
linkPrefixRegex ()
    Link prefix regular expression.
linkTrailRegex ()
    Link trail regular expression.
langBcp47 ()
    Wiki language code.
mainPageLinkTarget ()
    Main page title, as LinkTarget.
getMWConfigValue (string $key)
    Lookup config.
rtl ()
    Whether the wiki language is right-to-left.
langConverterEnabledBcp47 (Bcp47Code $lang)
    Whether language converter is enabled for the specified language.
script ()
    The URL path to index.php.
scriptpath ()
    FIXME: This is only used to compute the modules path below and maybe shouldn't be exposed.
server ()
    The base URL of the server.
timezoneOffset ()
    The wiki's time zone offset.
variantsFor (Bcp47Code $lang)
    Language variant information for the given language (or null if unknown).
widthOption ()
    Default thumbnail width.
getMagicWordMatcher (string $id)
    Get a regexp matching a localized magic word, given its id. FIXME: misleading function name
getParameterizedAliasMatcher (array $words)
    Get a matcher function for fetching values out of interpolated magic words, i.e. those with $1 in their aliases. The matcher takes a string and returns null if it doesn't match any of the words, or an associative array if it did match.
getMaxTemplateDepth ()
    Get the maximum template depth.
fakeTimestamp ()
    Fake timestamp, for unit tests.
setFakeTimestamp (?int $ts)
    Set the fake timestamp for testing.
setTimezoneOffset (int $offset)
    Set the timezone offset for testing.
scrubBidiChars ()
    If enabled, bidi chars adjacent to category links will be stripped in the html -> wt serialization pass.
getNoFollowConfig ()
getExternalLinkTarget ()
metrics ()
    Statistics aggregator, for counting and timing.
incrementCounter (string $name, array $labels, float $amount=1)
    Increment a counter metric.
observeTiming (string $name, float $value, array $labels)
    Record a timing metric.
Public Member Functions inherited from Wikimedia\Parsoid\Config\SiteConfig
registerExtensionModule ($configOrSpec)
    Register a Parsoid extension module.
unregisterExtensionModule (int $extId)
    Unregister a Parsoid extension module.
getExtensionModules ()
    Return the set of Parsoid extension modules associated with this SiteConfig.
__construct ()
    Base constructor.
getObjectFactory ()
    Return an object factory to use when instantiating extensions.
tagNeedsNowikiStrippedInTagPF (string $lowerTagName)
getContentModelHandler (string $contentmodel)
    Return a ContentModelHandler for the specified $contentmodel, if one is registered.
getAnnotationStrippers ()
    Returns all the annotationStrippers that are defined as annotation configuration.
isExtensionTag (string $name)
    Determine whether a given name, which must have already been converted to lower case, is a valid extension tag name.
isAnnotationTag (string $tagName)
getAnnotationTags ()
    Get an array of defined annotation tags in lower case.
getExtensionTagNameMap ()
    Get an array of defined extension tags, with the lower case name in the key, and the value being arbitrary.
getExtTagConfig (string $tagName)
getExtTagImpl (string $tagName)
getExtDOMProcessors ()
    Return an array mapping extension name to an array of object factory specs for Ext\DOMProcessor objects.
getWt2HtmlLimits ()
getHtml2WtLimits ()
getLogger ()
    General log channel.
setLogger (?LoggerInterface $logger)
    Set the log channel, for debugging.
galleryOptions ()
    Default gallery options for this wiki.
addHTMLTemplateParameters ()
    When processing template parameters, parse them to HTML and add it to the template parameters data.
relativeLinkPrefix ()
    Prefix for relative links.
bswPagePropRegexp ()
    Regex matching all double-underscore magic words.
namespaceIsTalk (int $ns)
    Test if a namespace is a talk namespace.
ucfirst (string $str)
    Uppercasing method for titles.
magicLinkEnabled (string $which)
    Return true if the specified magic link syntax is enabled on this wiki.
interwikiMapNoNamespaces ()
    Interwiki link data, after removing items that conflict with namespace names.
interwikiMatcher (string $href)
    Match interwiki URLs.
solTransparentWikitextRegexp ()
    A regex matching a line containing just whitespace, comments, and sol transparent links and behavior switches.
solTransparentWikitextNoWsRegexp (bool $addIncludes=false)
    A regex matching a line containing just comments and sol transparent links and behavior switches.
mwAliases ()
    List all magic words by canonical name.
getMagicWordForFunctionHook (string $str)
    Return canonical magic word for a function hook.
getMagicWordForVariable (string $str)
    Return canonical magic word for a variable.
getMagicWordForMediaOption (string $word)
    Return canonical magic word for a media option.
getMagicWordForBehaviorSwitch (string $word)
    Return canonical magic word for a behavior switch.
isBehaviorSwitch (string $word)
    Check if a string is a recognized behavior switch.
getMagicWordWT (string $word, string $suggest)
    Convert the internal canonical magic word name to the wikitext alias.
getMediaPrefixParameterizedAliasMatcher ()
    Get a matcher function for fetching values out of interpolated magic words which are media prefix options.
getExtResourceURLPatternMatcher ()
    Matcher for ISBN/RFC/PMID URL patterns, returning the type and number.
linterEnabled ()
makeExtResourceURL (array $match, string $href, string $content)
    Serialize ISBN/RFC/PMID URL patterns.
getProtocolsRegex (bool $excludeProtRel=false)
    Get a regex fragment matching URL protocols, quoted for an exclamation mark delimiter.
hasValidProtocol (string $potentialLink)
    Matcher for valid protocols, must be anchored at start of string.
findValidProtocol (string $potentialLink)
    Matcher for valid protocols, may occur at any point within string.
Protected Member Functions
linkTrail ()
    Return raw link trail regexp from config.
getVariableIDs ()
haveComputedFunctionSynonyms ()
    Does the SiteConfig provide precomputed function synonyms? If not, the SiteConfig is expected to provide an implementation for updateFunctionSynonym.
updateFunctionSynonym (string $func, string $magicword, bool $caseSensitive)
getMagicWords ()
getNonNativeExtensionTags ()
    Get an array of defined extension tags, with the lower case name in the key, the value arbitrary. This is the set of extension tags that are configured in MediaWiki core. $coreExtModules may already be part of it, but eventually this distinction will disappear since all extension tags have to be defined against Parsoid's extension API.
getSpecialPageAliases (string $specialPage)
    Return Special Page aliases for a special page name.
getSpecialNSAliases ()
    Return namespace aliases for the NS_SPECIAL namespace.
getProtocols ()
    Get the list of valid protocols.
Protected Member Functions inherited from Wikimedia\Parsoid\Config\SiteConfig
processExtensionModule (ExtensionModule $ext)
    Register a Parsoid-compatible extension.
getExtConfig ()
exportMetadataHelper (Document $document, string $modulesLoadURI, array $modules, array $moduleStyles, array $jsConfigVars, string $htmlTitle, Bcp47Code $lang)
    Helper function to create <head> elements from metadata.
getFunctionSynonyms ()
    Get a list of precomputed function synonyms.
Protected Attributes
$namespacesWithSubpages = []
$interwikiMap = []
Protected Attributes inherited from Wikimedia\Parsoid\Config\SiteConfig
$mwAliases
$functionSynonyms
$interwikiMapNoNamespaces
$linkTrailRegex = false
$logger = null
$iwMatcherBatchSize = 4096
$iwMatcher = null
$addHTMLTemplateParameters = false
$scrubBidiChars = false
$linterEnabled = false
$extConfig = null
$wt2htmlLimits
$html2wtLimits
Additional Inherited Members
Static Public Member Functions inherited from Wikimedia\Parsoid\Config\SiteConfig
static createLogger (?string $filePath=null)
Static Protected Member Functions inherited from Wikimedia\Parsoid\Config\SiteConfig
static quoteTitleRe (string $s, string $delimiter='/')
    Quote a title regex.

Wikimedia\Parsoid\Mocks\MockSiteConfig::allowedExternalImagePrefixes ()
Allowed external image URL prefixes.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::baseURI ()
Site base URI.
This would be the URI found in <base href="..." />.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::bswRegexp ()
A regexp matching localized behavior switches for this wiki.
The regexp should be delimited, but should not have boundary anchors or capture groups.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.
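
The "delimited, no boundary anchors, no capture groups" contract is what lets a caller embed the pattern in a larger one. The following is a minimal sketch of that idea, not Parsoid code: the pattern value and the helper stripRegexDelimiters() are hypothetical, and the helper assumes a one-character delimiter with no trailing modifiers.

```php
<?php
// Hypothetical stand-in for a value that bswRegexp() might return.
$bswRegexp = '/(?:NOTOC|FORCETOC|NOEDITSECTION)/';

/**
 * Hypothetical helper: drop the leading and trailing delimiter.
 * Assumes a one-character delimiter and no trailing modifiers.
 */
function stripRegexDelimiters( string $re ): string {
	return substr( $re, 1, -1 );
}

// Because the pattern brings no anchors or capture groups of its own,
// the caller can add both without any group renumbering.
$anchored = '/^__(' . stripRegexDelimiters( $bswRegexp ) . ')__$/';

$isSwitch = preg_match( $anchored, '__NOTOC__', $m ) === 1;
// On a match, $m[1] holds the switch name captured by the caller's group.
```

The same reasoning applies to categoryRegexp() and redirectRegexp(), which carry the same contract.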

Wikimedia\Parsoid\Mocks\MockSiteConfig::canonicalNamespaceId (string $name)
Map a canonical namespace name to its index.
Parameters:
    string $name: all-lowercase and with underscores rather than spaces.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::categoryRegexp ()
A regexp matching the localized 'Category' prefix for this wiki.
The regexp should be delimited, but should not have boundary anchors or capture groups.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::exportMetadataToHeadBcp47 (Document $document, ContentMetadataCollector $metadata, string $defaultTitle, Bcp47Code $lang)
Export content metadata via meta tags (and via a stylesheet for now to aid some clients).
Parameters:
    Document $document
    ContentMetadataCollector $metadata
    string $defaultTitle: The default title to display, as an unescaped string
    Bcp47Code $lang: a BCP-47 language code
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::fakeTimestamp ()
Fake timestamp, for unit tests.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getExternalLinkTarget ()
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getLinterSiteConfig ()
Return the desired linter configuration.
These are heuristic values which have hardcoded defaults but could be overridden on a per-wiki basis.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getMagicWordMatcher (string $id)
Get a regexp matching a localized magic word, given its id. FIXME: misleading function name
Parameters:
    string $id
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getMagicWords () [protected]
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getMaxTemplateDepth ()
Get the maximum template depth.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getMWConfigValue (string $key)
Lookup config.
Parameters:
    string $key
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getNoFollowConfig ()
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getNonNativeExtensionTags () [protected]
Get an array of defined extension tags, with the lower case name in the key, the value arbitrary. This is the set of extension tags that are configured in MediaWiki core. $coreExtModules may already be part of it, but eventually this distinction will disappear since all extension tags have to be defined against Parsoid's extension API.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getParameterizedAliasMatcher (array $words)
Get a matcher function for fetching values out of interpolated magic words, i.e. those with $1 in their aliases. The matcher takes a string and returns null if it doesn't match any of the words, or an associative array if it did match.
Parameters:
    string[] $words: Magic words to match
In the matcher, $name is the canonical magic word name and $re has patterns for matching aliases.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.
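
To make the matcher contract concrete, here is a minimal sketch of how such a matcher could be built. It is not Parsoid's implementation: makeParameterizedAliasMatcher(), the sample aliases, and the result keys 'k' and 'v' are assumptions chosen for illustration. Only the behavior described above (null on no match, an associative array on a match) is taken from the documentation.

```php
<?php
/**
 * Hypothetical sketch, not Parsoid's implementation.
 * $aliases maps a canonical magic word name to one alias containing "$1".
 */
function makeParameterizedAliasMatcher( array $aliases ): callable {
	$patterns = [];
	foreach ( $aliases as $name => $alias ) {
		// Quote the alias, then turn the quoted "$1" placeholder into a capture group.
		$quoted = preg_quote( $alias, '/' );
		$patterns[$name] = '/^' . str_replace( '\\$1', '(.*?)', $quoted ) . '$/';
	}
	return static function ( string $text ) use ( $patterns ): ?array {
		// $name is the canonical magic word name; $re matches its alias.
		foreach ( $patterns as $name => $re ) {
			if ( preg_match( $re, $text, $m ) ) {
				// The key names 'k' and 'v' are an assumption for illustration.
				return [ 'k' => $name, 'v' => $m[1] ];
			}
		}
		return null;
	};
}

// Sample aliases, invented for this sketch.
$matcher = makeParameterizedAliasMatcher( [
	'img_width' => '$1px',
	'img_lang' => 'lang=$1',
] );
$hit = $matcher( '200px' );  // associative array describing the match
$miss = $matcher( 'thumb' ); // null: no alias matches
```

Building the patterns once and closing over them keeps the per-call cost to a loop of preg_match() calls, which is one plausible reason to return a callable rather than match directly.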

Wikimedia\Parsoid\Mocks\MockSiteConfig::getProtocols () [protected]
Get the list of valid protocols.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getSpecialNSAliases () [protected]
Return namespace aliases for the NS_SPECIAL namespace.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getSpecialPageAliases (string $specialPage) [protected]
Return Special Page aliases for a special page name.
Parameters:
    string $specialPage
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::getVariableIDs () [protected]
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::haveComputedFunctionSynonyms () [protected]
Does the SiteConfig provide precomputed function synonyms? If not, the SiteConfig is expected to provide an implementation for updateFunctionSynonym.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::incrementCounter (string $name, array $labels, float $amount = 1)
Increment a counter metric.
Parameters:
    string $name
    array $labels
    float $amount
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::interwikiMagic ()
Treat language links as magic connectors, not inline links.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::interwikiMap ()
Interwiki link data.
Note that the order of the keys in this array is significant: if more than one prefix matches a given URL during html2wt conversion, the first match is used. If you want wikitech to be used instead of labsconsole, for example, the 'wikitech' => [....] key needs to be enumerated first.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.
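
The ordering rule can be illustrated with a small first-match lookup. This is a sketch under stated assumptions, not Parsoid's matcher: firstInterwikiPrefix(), the 'url' key, and the example.org URLs are all hypothetical.

```php
<?php
// Hypothetical interwiki data; both prefixes point at the same URL pattern.
$interwikiMap = [
	'wikitech' => [ 'url' => 'https://wikitech.example.org/wiki/$1' ],
	'labsconsole' => [ 'url' => 'https://wikitech.example.org/wiki/$1' ],
];

/** Hypothetical helper: return the first prefix whose URL pattern matches $href. */
function firstInterwikiPrefix( array $map, string $href ): ?string {
	foreach ( $map as $prefix => $info ) {
		// PHP arrays preserve insertion order, so earlier keys win.
		$base = explode( '$1', $info['url'] )[0];
		if ( strncmp( $href, $base, strlen( $base ) ) === 0 ) {
			return $prefix;
		}
	}
	return null;
}

$prefix = firstInterwikiPrefix( $interwikiMap, 'https://wikitech.example.org/wiki/Help' );
// 'wikitech' is chosen only because it is enumerated before 'labsconsole'.
```

Reversing the array would make 'labsconsole' win for the same URL, which is why the key order has to be treated as part of the data.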

Wikimedia\Parsoid\Mocks\MockSiteConfig::iwp ()
Wiki identifier, for cache keys.
Should match a key in mwApiMap()?
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::langBcp47 ()
Wiki language code.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::langConverterEnabledBcp47 (Bcp47Code $lang)
Whether language converter is enabled for the specified language.
Parameters:
    Bcp47Code $lang
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::legalTitleChars ()
Legal title characters.
Regex is intended to match bytes, not Unicode characters.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::linkPrefixRegex ()
Link prefix regular expression.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::linkTrail () [protected]
Return raw link trail regexp from config.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::linkTrailRegex ()
Link trail regular expression.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::mainPageLinkTarget ()
Main page title, as LinkTarget.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::metrics ()
Statistics aggregator, for counting and timing.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::namespaceCase (int $ns)
Return namespace case setting.
Parameters:
    int $ns
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::namespaceHasSubpages (int $ns)
Test if a namespace has subpages.
Parameters:
    int $ns
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::namespaceId (string $name)
Map a namespace name to its index.
Parameters:
    string $name: all-lowercase and with underscores rather than spaces.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::namespaceName (int $ns)
Map a namespace index to its preferred name (with spaces, not underscores).
Parameters:
    int $ns
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.
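
The two directions use different spellings: lookups by name expect the lowercase, underscore form, while lookups by index return the display form with spaces. A minimal sketch of that round trip follows, assuming a hypothetical namespace table and a hypothetical helper normalizeNamespaceKey(); neither is part of Parsoid.

```php
<?php
// Hypothetical namespace tables; a real wiki defines many more entries.
$namespaceIds = [ '' => 0, 'talk' => 1, 'user' => 2, 'user_talk' => 3 ];
$namespaceNames = [ 0 => '', 1 => 'Talk', 2 => 'User', 3 => 'User talk' ];

/**
 * Hypothetical helper: convert a display name to the lookup form
 * (lowercase, underscores instead of spaces).
 * strtolower() only folds ASCII; localized names would need more care.
 */
function normalizeNamespaceKey( string $name ): string {
	return strtolower( str_replace( ' ', '_', $name ) );
}

// Name to index: normalize first, then look up.
$id = $namespaceIds[ normalizeNamespaceKey( 'User talk' ) ] ?? null;

// Index to name: the preferred name comes back with spaces.
$name = $namespaceNames[ $id ] ?? null;
```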

Wikimedia\Parsoid\Mocks\MockSiteConfig::observeTiming (string $name, float $value, array $labels)
Record a timing metric.
Parameters:
    string $name
    float $value
    array $labels
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::redirectRegexp ()
A regexp matching the localized 'REDIRECT' marker for this wiki.
The regexp should be delimited, but should not have boundary anchors or capture groups.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::rtl ()
Whether the wiki language is right-to-left.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::script ()
The URL path to index.php.

Wikimedia\Parsoid\Mocks\MockSiteConfig::scriptpath ()
FIXME: This is only used to compute the modules path below and maybe shouldn't be exposed.
The base wiki path.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::scrubBidiChars ()
If enabled, bidi chars adjacent to category links will be stripped in the html -> wt serialization pass.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::server ()
The base URL of the server.

Wikimedia\Parsoid\Mocks\MockSiteConfig::setFakeTimestamp (?int $ts)
Set the fake timestamp for testing.
Parameters:
    ?int $ts: Unix timestamp

Wikimedia\Parsoid\Mocks\MockSiteConfig::setTimezoneOffset (int $offset)
Set the timezone offset for testing.
Parameters:
    int $offset: Offset from UTC

Wikimedia\Parsoid\Mocks\MockSiteConfig::specialPageLocalName (string $alias)
Get the default local name for a special page.
Parameters:
    string $alias: Special page alias
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::timezoneOffset ()
The wiki's time zone offset.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.
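
The testing setters (setFakeTimestamp, setTimezoneOffset) and their getters follow a common test-double pattern: pin a value so that time-dependent output is reproducible. The sketch below illustrates that pattern with a hypothetical class. FakeClockConfig is not Parsoid code, its method names merely mirror the ones documented here, and now() is an invented helper. The unit of the offset is not stated on this page, so the sketch only stores it.

```php
<?php
/**
 * Hypothetical test double illustrating the setter/getter pattern.
 * This class is not part of Parsoid.
 */
class FakeClockConfig {
	private ?int $fakeTimestamp = null;
	private int $timezoneOffset = 0;

	public function setFakeTimestamp( ?int $ts ): void {
		$this->fakeTimestamp = $ts;
	}

	public function fakeTimestamp(): ?int {
		return $this->fakeTimestamp;
	}

	public function setTimezoneOffset( int $offset ): void {
		$this->timezoneOffset = $offset;
	}

	public function timezoneOffset(): int {
		return $this->timezoneOffset;
	}

	/** Invented helper: the fake time when one is set, else the real clock. */
	public function now(): int {
		return $this->fakeTimestamp ?? time();
	}
}

$config = new FakeClockConfig();
$config->setFakeTimestamp( 86400 ); // Unix timestamp, one day after the epoch
$pinned = $config->now();           // the pinned value, independent of the real clock
$config->setFakeTimestamp( null );  // passing null returns to the real clock
```

Accepting null in the setter, as the ?int parameter type above suggests, gives tests a way to undo the override without constructing a new object.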

Wikimedia\Parsoid\Mocks\MockSiteConfig::updateFunctionSynonym (string $func, string $magicword, bool $caseSensitive) [protected]
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::variantsFor (Bcp47Code $lang)
Language variant information for the given language (or null if unknown).
Parameters:
    Bcp47Code $lang: The language for which you want variant information
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.

Wikimedia\Parsoid\Mocks\MockSiteConfig::widthOption ()
Default thumbnail width.
Reimplemented from Wikimedia\Parsoid\Config\SiteConfig.