MediaWiki master
ParsoidParser.php
<?php
declare( strict_types = 1 );

use MediaWiki\Languages\LanguageConverterFactory;
use Wikimedia\Assert\Assert;
use Wikimedia\Parsoid\Config\PageConfig;
use Wikimedia\Parsoid\Parsoid;

/**
 * Parser implementation which uses Parsoid.
 */
class ParsoidParser /* eventually this will extend \Parser */ {
	public const PARSOID_TITLE_KEY = "parsoid:title-dbkey";
	private Parsoid $parsoid;
	private PageConfigFactory $pageConfigFactory;
	private LanguageConverterFactory $languageConverterFactory;
	private DataAccess $dataAccess;

	public function __construct(
		Parsoid $parsoid,
		PageConfigFactory $pageConfigFactory,
		LanguageConverterFactory $languageConverterFactory,
		DataAccess $dataAccess
	) {
		$this->parsoid = $parsoid;
		$this->pageConfigFactory = $pageConfigFactory;
		$this->languageConverterFactory = $languageConverterFactory;
		$this->dataAccess = $dataAccess;
	}

	private function genParserOutput(
		PageConfig $pageConfig, ParserOptions $options, ?ParserOutput $previousOutput
	): ParserOutput {
		$parserOutput = new ParserOutput();

		// Parsoid itself does not vary output by parser options right now.
		// But ensure that any options used by extensions, parser functions,
		// recursive parses, or (in the unlikely future scenario) Parsoid itself
		// are recorded as used.
		$options->registerWatcher( [ $parserOutput, 'recordOption' ] );

		// The enable/disable logic here matches that in Parser::internalParseHalfParsed(),
		// although __NOCONTENTCONVERT__ is handled internally by Parsoid.
		//
		// T349137: It might be preferable to handle __NOCONTENTCONVERT__ here rather than
		// by inspecting the DOM inside Parsoid. That will come in a separate patch.
		$htmlVariantLanguage = null;
		if ( !( $options->getDisableContentConversion() || $options->getInterfaceMessage() ) ) {
			// NOTES (some of these are TODOs for read views integration)
			// 1. This html variant conversion is a pre-cache transform. HtmlOutputRendererHelper
			//    has another variant conversion that is a post-cache transform based on the
			//    'Accept-Language' header. If that header is set, there is really no reason to
			//    do this conversion here. So, eventually, we are likely to either not pass in
			//    the htmlVariantLanguage option below OR disable language conversion from the
			//    wt2html path in Parsoid and this and the Accept-Language variant conversion
			//    both would have to be handled as post-cache transforms.
			//
			// 2. Parser.php calls convert() which computes a preferred variant from the
			//    target language. But, we cannot do that unconditionally here because REST API
			//    requests specify the exact variant via the 'Content-Language' header.
			//
			//    For Parsoid page views, either the callers will have to compute the
			//    preferred variant and set it in ParserOptions OR the REST API will have
			//    to set some other flag indicating that the preferred variant should not
			//    be computed. For now, I am adding a temporary hack, but this should be
			//    replaced with something more sensible (T267067).
			//
			// 3. Additionally, Parsoid's callers will have to set targetLanguage in ParserOptions
			//    to mimic the logic in Parser.php (missing right now).
			$langCode = $pageConfig->getPageLanguageBcp47();
			if ( $options->getRenderReason() === 'page-view' ) { // TEMPORARY HACK
				$langFactory = MediaWikiServices::getInstance()->getLanguageFactory();
				$lang = $langFactory->getLanguage( $langCode );
				$langConv = $this->languageConverterFactory->getLanguageConverter( $lang );
				$htmlVariantLanguage = $langFactory->getLanguage( $langConv->getPreferredVariant() );
			} else {
				$htmlVariantLanguage = $langCode;
			}
		}
		$oldPageConfig = null;
		$oldPageBundle = null;

		// T371713: Temporary statistics collection code to determine
		// feasibility of Parsoid selective update
		$sampleRate = MediaWikiServices::getInstance()->getMainConfig()->get(
			MainConfigNames::ParsoidSelectiveUpdateSampleRate
		);
		$doSample = ( $sampleRate && mt_rand( 1, $sampleRate ) === 1 );
		if ( $doSample && $previousOutput !== null && $previousOutput->getCacheRevisionId() ) {
			// Allow fetching the old wikitext corresponding to the
			// $previousOutput
			$oldPageConfig = $this->pageConfigFactory->createFromParserOptions(
				$options,
				Title::newFromLinkTarget( $pageConfig->getLinkTarget() ),
				$previousOutput->getCacheRevisionId(),
				$previousOutput->getLanguage()
			);
			$oldPageBundle =
				PageBundleParserOutputConverter::pageBundleFromParserOutput(
					$previousOutput
				);
		}

		$defaultOptions = [
			'pageBundle' => true,
			'wrapSections' => true,
			'logLinterData' => true,
			'body_only' => false,
			'htmlVariantLanguage' => $htmlVariantLanguage,
			'offsetType' => 'byte',
			'outputContentVersion' => Parsoid::defaultHTMLVersion(),
			'previousOutput' => $oldPageBundle,
			'previousInput' => $oldPageConfig,
			// The following are passed for metrics & labelling
			'sampleStats' => $doSample,
			'renderReason' => $options->getRenderReason(),
			'userAgent' => RequestContext::getMain()->getRequest()->getHeader( 'User-Agent' ),
		];

		$parserOutput->resetParseStartTime();

		// This can throw ClientError or ResourceLimitExceededException.
		// Callers are responsible for figuring out how to handle them.
		$pageBundle = $this->parsoid->wikitext2html(
			$pageConfig,
			$defaultOptions,
			$headers,
			$parserOutput );

		$parserOutput = PageBundleParserOutputConverter::parserOutputFromPageBundle( $pageBundle, $parserOutput );

		// Record the page title in dbkey form so that post-cache transforms
		// have access to the title.
		$parserOutput->setExtensionData(
			self::PARSOID_TITLE_KEY,
			Title::newFromLinkTarget( $pageConfig->getLinkTarget() )->getPrefixedDBkey()
		);

		// Register a watcher again because the $parserOutput arg
		// and $parserOutput return value above are different objects!
		$options->registerWatcher( [ $parserOutput, 'recordOption' ] );

		$parserOutput->setFromParserOptions( $options );

		$parserOutput->recordTimeProfile();
		$this->dataAccess->makeLimitReport( $pageConfig, $options, $parserOutput );

		// T371713: Collect statistics on parsing time -vs- presence of
		// $previousOutput
		$stats = MediaWikiServices::getInstance()->getStatsFactory();
		$labels = [
			'type' => $previousOutput === null ? 'full' : 'selective',
			'wiki' => WikiMap::getCurrentWikiId(),
			'reason' => $options->getRenderReason() ?: 'unknown',
			'has_async_content' =>
				$parserOutput->getOutputFlag( ParserOutputFlags::HAS_ASYNC_CONTENT )
				? 'true' : 'false',
			'async_not_ready' =>
				$parserOutput->getOutputFlag( ParserOutputFlags::ASYNC_NOT_READY )
				? 'true' : 'false',
		];
		$stats
			->getCounter( 'Parsoid_parse_cpu_seconds' )
			->setLabels( $labels )
			->incrementBy( $parserOutput->getTimeProfile( 'cpu' ) );
		$stats
			->getCounter( 'Parsoid_parse_total' )
			->setLabels( $labels )
			->increment();

		// Add Parsoid skinning module
		$parserOutput->addModuleStyles( [ 'mediawiki.skinning.content.parsoid' ] );

		// Record Parsoid version in extension data; this allows
		// us to use the onRejectParserCacheValue hook to selectively
		// expire "bad" generated content in the event of a rollback.
		$parserOutput->setExtensionData(
			'core:parsoid-version', Parsoid::version()
		);
		$parserOutput->setExtensionData(
			'core:html-version', Parsoid::defaultHTMLVersion()
		);

		return $parserOutput;
	}

	/**
	 * Convert wikitext to HTML.
	 * Do not call this function recursively.
	 */
	public function parse(
		$text, PageReference $page, ParserOptions $options,
		bool $linestart = true, bool $clearState = true, ?int $revId = null,
		?ParserOutput $previousOutput = null
	): ParserOutput {
		Assert::invariant( $linestart, '$linestart=false is not yet supported' );
		Assert::invariant( $clearState, '$clearState=false is not yet supported' );
		$title = Title::newFromPageReference( $page );
		$lang = $options->getTargetLanguage();
		if ( $lang === null && $options->getInterfaceMessage() ) {
			$lang = $options->getUserLangObj();
		}
		$pageConfig = $revId === null || $revId === 0 ? null : $this->pageConfigFactory->createFromParserOptions(
			$options, // T392113: transfers current revision record callback
			$title,
			$revId,
			$lang // defaults to title page language if null
		);
		$content = null;
		if ( $text instanceof TextContent ) {
			$content = $text;
			$text = $content->getText();
		}
		if ( !( $pageConfig && $pageConfig->getPageMainContent() === $text ) ) {
			// This is a bit awkward! But we really need to parse $text, which
			// may or may not correspond to the $revId provided!
			// T332928 suggests one solution: splitting the "have revid"
			// callers from the "bare text, no associated revision" callers.
			$revisionRecord = new MutableRevisionRecord( $title );
			if ( $revId !== null ) {
				$revisionRecord->setId( $revId );
			}
			$revisionRecord->setSlot(
				SlotRecord::newUnsaved(
					SlotRecord::MAIN,
					$content ?? new WikitextContent( $text )
				)
			);
			$pageConfig = $this->pageConfigFactory->createFromParserOptions(
				$options,
				$title,
				$revisionRecord,
				$lang // defaults to title page language if null
			);
		}

		return $this->genParserOutput( $pageConfig, $options, $previousOutput );
	}

	public function parseFakeRevision(
		RevisionRecord $fakeRev, PageReference $page, ParserOptions $options
	): ParserOutput {
		wfDeprecated( __METHOD__, '1.43' );
		$title = Title::newFromPageReference( $page );
		$lang = $options->getTargetLanguage();
		if ( $lang === null && $options->getInterfaceMessage() ) {
			$lang = $options->getUserLangObj();
		}
		$pageConfig = $this->pageConfigFactory->createFromParserOptions(
			$options,
			$title,
			$fakeRev,
			$lang // defaults to title page language if null
		);

		return $this->genParserOutput( $pageConfig, $options, null );
	}
}
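
The sampling check in genParserOutput() reads the configured rate as "one in N": a falsy rate disables sampling and a rate of 1 samples every parse. Below is a minimal standalone sketch of that check, not MediaWiki code. The helper name shouldSample() is hypothetical, and it tests `$sampleRate > 0` rather than plain truthiness, which is an assumption added here so that negative values are also rejected.

```php
<?php
declare( strict_types = 1 );

/**
 * Hypothetical helper (not part of MediaWiki) mirroring the
 * "$sampleRate && mt_rand( 1, $sampleRate ) === 1" check.
 *
 * @param int $sampleRate Sample roughly one call in every $sampleRate; 0 disables sampling.
 * @return bool Whether this call should be sampled.
 */
function shouldSample( int $sampleRate ): bool {
	// The left operand short-circuits, so mt_rand() is never called
	// with a maximum smaller than its minimum.
	return $sampleRate > 0 && mt_rand( 1, $sampleRate ) === 1;
}

// Rough empirical check: with a rate of 100, about 1% of calls are sampled.
$hits = 0;
for ( $i = 0; $i < 100000; $i++ ) {
	if ( shouldSample( 100 ) ) {
		$hits++;
	}
}
printf( "sampled %d of 100000 calls\n", $hits );
```

In the listing above, the result of this check also gates the construction of `$oldPageConfig` and `$oldPageBundle`, so the extra work of loading the previous revision is only paid on sampled parses.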