MediaWiki master
ParsoidParser.php
<?php
declare( strict_types = 1 );

use MediaWiki\Languages\LanguageConverterFactory;
use Wikimedia\Assert\Assert;
use Wikimedia\Parsoid\Config\PageConfig;
use Wikimedia\Parsoid\Parsoid;

class ParsoidParser /* eventually this will extend \Parser */ {
	public const PARSOID_TITLE_KEY = "parsoid:title-dbkey";

	public function __construct(
		private Parsoid $parsoid,
		private readonly PageConfigFactory $pageConfigFactory,
		private readonly LanguageConverterFactory $languageConverterFactory,
		private readonly DataAccess $dataAccess,
	) {
	}

	private function genParserOutput(
		PageConfig $pageConfig, ParserOptions $options, ?ParserOutput $previousOutput
	): ParserOutput {
		$parserOutput = new ParserOutput();

		// Parsoid itself does not vary output by parser options right now.
		// But, ensure that any option use by extensions, parser functions,
		// recursive parses, or (in the unlikely future scenario) Parsoid itself
		// are recorded as used.
		$options->registerWatcher( $parserOutput->recordOption( ... ) );

		// The enable/disable logic here matches that in Parser::internalParseHalfParsed(),
		// although __NOCONTENTCONVERT__ is handled internal to Parsoid.
		//
		// T349137: It might be preferable to handle __NOCONTENTCONVERT__ here rather than
		// by inspecting the DOM inside Parsoid. That will come in a separate patch.
		$htmlVariantLanguage = null;
		if ( !( $options->getDisableContentConversion() || $options->getInterfaceMessage() ) ) {
			// NOTES (some of these are TODOs for read views integration)
			// 1. This html variant conversion is a pre-cache transform. HtmlOutputRendererHelper
			//    has another variant conversion that is a post-cache transform based on the
			//    'Accept-Language' header. If that header is set, there is really no reason to
			//    do this conversion here. So, eventually, we are likely to either not pass in
			//    the htmlVariantLanguage option below OR disable language conversion from the
			//    wt2html path in Parsoid and this and the Accept-Language variant conversion
			//    both would have to be handled as post-cache transforms.
			//
			// 2. Parser.php calls convert() which computes a preferred variant from the
			//    target language. But, we cannot do that unconditionally here because REST API
			//    requests specify the exact variant via the 'Content-Language' header.
			//
			//    For Parsoid page views, either the callers will have to compute the
			//    preferred variant and set it in ParserOptions OR the REST API will have
			//    to set some other flag indicating that the preferred variant should not
			//    be computed. For now, I am adding a temporary hack, but this should be
			//    replaced with something more sensible (T267067).
			//
			// 3. Additionally, Parsoid's callers will have to set targetLanguage in ParserOptions
			//    to mimic the logic in Parser.php (missing right now).
			$langCode = $pageConfig->getPageLanguageBcp47();
			// TEMPORARY HACK
			if ( $options->getRenderReason() === 'page_view' || $options->getRenderReason() === 'page_view_old' ) {
				$langFactory = MediaWikiServices::getInstance()->getLanguageFactory();
				$lang = $langFactory->getLanguage( $langCode );
				$langConv = $this->languageConverterFactory->getLanguageConverter( $lang );
				$htmlVariantLanguage = $langFactory->getLanguage( $langConv->getPreferredVariant() );
			} else {
				$htmlVariantLanguage = $langCode;
			}
		}
		$oldPageConfig = null;
		$oldPageBundle = null;

		// T371713: Temporary statistics collection code to determine
		// feasibility of Parsoid selective update
		$sampleRate = MediaWikiServices::getInstance()->getMainConfig()->get(
			MainConfigNames::ParsoidSelectiveUpdateSampleRate
		);
		$doSample = ( $sampleRate && mt_rand( 1, $sampleRate ) === 1 );
		if ( $doSample && $previousOutput !== null && $previousOutput->getCacheRevisionId() ) {
			// Allow fetching the old wikitext corresponding to the
			// $previousOutput
			$oldPageConfig = $this->pageConfigFactory->createFromParserOptions(
				$options,
				Title::newFromLinkTarget( $pageConfig->getLinkTarget() ),
				$previousOutput->getCacheRevisionId(),
				$previousOutput->getLanguage()
			);
			$oldPageBundle =
				PageBundleParserOutputConverter::pageBundleFromParserOutput(
					$previousOutput
				);
		}

		$defaultOptions = [
			'pageBundle' => true,
			'wrapSections' => true,
			'logLinterData' => true,
			'body_only' => false,
			'htmlVariantLanguage' => $htmlVariantLanguage,
			'offsetType' => 'byte',
			'outputContentVersion' => Parsoid::defaultHTMLVersion(),
			'previousOutput' => $oldPageBundle,
			'previousInput' => $oldPageConfig,
			// The following are passed for metrics & labelling
			'sampleStats' => $doSample,
			'renderReason' => $options->getRenderReason(),
			'userAgent' => RequestContext::getMain()->getRequest()->getHeader( 'User-Agent' ),
		];

		$parserOutput->resetParseStartTime();

		// This can throw ClientError or ResourceLimitExceededException.
		// Callers are responsible for figuring out how to handle them.
		$pageBundle = $this->parsoid->wikitext2html(
			$pageConfig,
			$defaultOptions,
			$headers,
			$parserOutput );

		$parserOutput = PageBundleParserOutputConverter::parserOutputFromPageBundle( $pageBundle, $parserOutput );

		// Record the page title in dbkey form so that post-cache transforms
		// have access to the title.
		$parserOutput->setTitle( $pageConfig->getLinkTarget() );
		// Backward-compatibility w/ MW < 1.46
		$parserOutput->setExtensionData(
			self::PARSOID_TITLE_KEY,
			Title::newFromLinkTarget( $pageConfig->getLinkTarget() )->getPrefixedDBkey()
		);

		// Register a watcher again because the $parserOutput arg
		// and $parserOutput return value above are different objects!
		$options->registerWatcher( $parserOutput->recordOption( ... ) );

		$parserOutput->setFromParserOptions( $options );

		$parserOutput->recordTimeProfile();
		$this->dataAccess->makeLimitReport( $pageConfig, $options, $parserOutput );

		// T371713: Collect statistics on parsing time -vs- presence of
		// $previousOutput
		$stats = MediaWikiServices::getInstance()->getStatsFactory();
		$labels = [
			'type' => $previousOutput === null ? 'full' : 'selective',
			'wiki' => WikiMap::getCurrentWikiId(),
			'reason' => $options->getRenderReason() ?: 'unknown',
			'has_async_content' =>
				$parserOutput->getOutputFlag( ParserOutputFlags::HAS_ASYNC_CONTENT )
					? 'true' : 'false',
			'async_not_ready' =>
				$parserOutput->getOutputFlag( ParserOutputFlags::ASYNC_NOT_READY )
					? 'true' : 'false',
		];
		$stats
			->getCounter( 'Parsoid_parse_cpu_seconds' )
			->setLabels( $labels )
			->incrementBy( $parserOutput->getTimeProfile( 'cpu' ) );
		$stats
			->getCounter( 'Parsoid_parse_total' )
			->setLabels( $labels )
			->increment();

		// Add Parsoid skinning module
		$parserOutput->addModuleStyles( [ 'mediawiki.skinning.content.parsoid' ] );

		// Record Parsoid version in extension data; this allows
		// us to use the onRejectParserCacheValue hook to selectively
		// expire "bad" generated content in the event of a rollback.
		$parserOutput->setExtensionData(
			'core:parsoid-version', Parsoid::version()
		);
		$parserOutput->setExtensionData(
			'core:html-version', Parsoid::defaultHTMLVersion()
		);
		// Export Parsoid HTML version to client gadgets as well
		$parserOutput->setJsConfigVar(
			'wgParsoidHtmlVersion', Parsoid::defaultHTMLVersion()
		);

		return $parserOutput;
	}

	public function parse(
		$text, PageReference $page, ParserOptions $options,
		bool $linestart = true, bool $clearState = true, ?int $revId = null,
		?ParserOutput $previousOutput = null
	): ParserOutput {
		Assert::invariant( $linestart, '$linestart=false is not yet supported' );
		Assert::invariant( $clearState, '$clearState=false is not yet supported' );
		$title = Title::newFromPageReference( $page );
		$lang = $options->getTargetLanguage();
		if ( $lang === null && $options->getInterfaceMessage() ) {
			$lang = $options->getUserLangObj();
		}
		$pageConfig = $revId === null || $revId === 0 ? null : $this->pageConfigFactory->createFromParserOptions(
			$options, // T392113: transfers current revision record callback
			$title,
			$revId,
			$lang // defaults to title page language if null
		);
		$content = null;
		if ( $text instanceof TextContent ) {
			$content = $text;
			$text = $content->getText();
		}
		if ( !( $pageConfig && $pageConfig->getPageMainContent() === $text ) ) {
			// This is a bit awkward! But we really need to parse $text, which
			// may or may not correspond to the $revId provided!
			// T332928 suggests one solution: splitting the "have revid"
			// callers from the "bare text, no associated revision" callers.
			$revisionRecord = new MutableRevisionRecord( $title );
			if ( $revId !== null ) {
				$revisionRecord->setId( $revId );
			}
			$revisionRecord->setSlot(
				SlotRecord::newUnsaved(
					SlotRecord::MAIN,
					$content ?? new WikitextContent( $text )
				)
			);
			$pageConfig = $this->pageConfigFactory->createFromParserOptions(
				$options,
				$title,
				$revisionRecord,
				$lang // defaults to title page language if null
			);
		}

		return $this->genParserOutput( $pageConfig, $options, $previousOutput );
	}

	public function parseFakeRevision(
		RevisionRecord $fakeRev, PageReference $page, ParserOptions $options
	): ParserOutput {
		wfDeprecated( __METHOD__, '1.43' );
		$title = Title::newFromPageReference( $page );
		$lang = $options->getTargetLanguage();
		if ( $lang === null && $options->getInterfaceMessage() ) {
			$lang = $options->getUserLangObj();
		}
		$pageConfig = $this->pageConfigFactory->createFromParserOptions(
			$options,
			$title,
			$fakeRev,
			$lang // defaults to title page language if null
		);

		return $this->genParserOutput( $pageConfig, $options, null );
	}
}
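
The variant-selection branch in genParserOutput() can be read in isolation as a small decision function. The sketch below is a simplified illustration, not MediaWiki code: `chooseHtmlVariant()` is a hypothetical helper, plain strings stand in for the language objects used in the listing, and the `$preferredVariant` callable stands in for the language converter lookup.

```php
<?php

/**
 * Hypothetical helper mirroring the branch structure in genParserOutput().
 *
 * @param bool $disableConversion Stand-in for getDisableContentConversion()
 * @param bool $interfaceMessage Stand-in for getInterfaceMessage()
 * @param ?string $renderReason Stand-in for getRenderReason()
 * @param string $pageLangCode Stand-in for the page language code
 * @param callable $preferredVariant callable(string): string, an assumed
 *   replacement for the language converter's preferred-variant lookup
 * @return ?string Variant to convert to, or null when conversion is skipped
 */
function chooseHtmlVariant(
	bool $disableConversion,
	bool $interfaceMessage,
	?string $renderReason,
	string $pageLangCode,
	callable $preferredVariant
): ?string {
	// Conversion is skipped entirely for these two option states.
	if ( $disableConversion || $interfaceMessage ) {
		return null;
	}
	// Page views compute a preferred variant (the "temporary hack" branch).
	if ( $renderReason === 'page_view' || $renderReason === 'page_view_old' ) {
		return $preferredVariant( $pageLangCode );
	}
	// Every other caller gets the page language unchanged.
	return $pageLangCode;
}

// Illustrative mapping only; real variant resolution depends on the wiki.
$prefer = static fn ( string $code ): string => $code === 'zh' ? 'zh-hans' : $code;
var_dump( chooseHtmlVariant( false, false, 'page_view', 'zh', $prefer ) );
var_dump( chooseHtmlVariant( false, false, 'edit', 'zh', $prefer ) );
var_dump( chooseHtmlVariant( true, false, 'page_view', 'zh', $prefer ) );
```

Keeping the decision in a pure function like this would make the three outcomes (no conversion, preferred variant, page language) easy to test without a running wiki.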
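
The selective-update statistics in genParserOutput() are gated by a one-in-N sampling check built on `mt_rand()`. A minimal sketch of that gate follows, assuming an integer sample rate where 0 disables sampling. `shouldSample()` and its injectable `$rng` parameter are hypothetical names introduced here for illustration, and the guard against negative rates is an addition that the listing does not have.

```php
<?php

/**
 * Hypothetical helper: true for roughly one call in $sampleRate.
 *
 * @param int $sampleRate 0 (or less) disables sampling; 1 samples every call
 * @param ?callable $rng callable(int $min, int $max): int, injectable so
 *   the gate can be tested deterministically; defaults to mt_rand()
 */
function shouldSample( int $sampleRate, ?callable $rng = null ): bool {
	if ( $sampleRate <= 0 ) {
		return false;
	}
	$rng ??= static fn ( int $min, int $max ): int => mt_rand( $min, $max );
	// Exactly one value in [1, $sampleRate] selects the request.
	return $rng( 1, $sampleRate ) === 1;
}

// With a rate of 1 the range collapses to [1, 1], so every call is sampled.
var_dump( shouldSample( 1 ) );
// With a rate of 0 sampling is off.
var_dump( shouldSample( 0 ) );
```

Injecting the random source is a design choice for testability; the listing calls `mt_rand()` directly.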