Code Coverage
| | Lines | Lines (covered / total) | Functions and Methods | Functions and Methods (covered / total) | CRAP | Classes and Traits | Classes and Traits (covered / total) |
| Total | 0.00% | 0 / 12 | 0.00% | 0 / 2 | | 0.00% | 0 / 1 |
| RunExtensionProcessors | 0.00% | 0 / 12 | 0.00% | 0 / 2 | 30 | 0.00% | 0 / 1 |
| initialize | 0.00% | 0 / 9 | 0.00% | 0 / 1 | 12 | | |
| run | 0.00% | 0 / 3 | 0.00% | 0 / 1 | 6 | | |
| 1 | <?php |
| 2 | declare( strict_types = 1 ); |
| 3 | |
| 4 | namespace Wikimedia\Parsoid\Wt2Html\DOM\Processors; |
| 5 | |
| 6 | use Wikimedia\Parsoid\Config\Env; |
| 7 | use Wikimedia\Parsoid\DOM\Node; |
| 8 | use Wikimedia\Parsoid\Ext\DOMProcessor as ExtDOMProcessor; |
| 9 | use Wikimedia\Parsoid\Wt2Html\Wt2HtmlDOMProcessor; |
| 10 | |
| 11 | /** |
| 12 | * A wrapper to call extension-specific DOM processors. |
| 13 | * |
| 14 | * FIXME: There are two potential ordering problems here. |
| 15 | * |
| 16 | * 1. unpackDOMFragment should always run immediately |
| 17 | * before these extensionPostProcessors, which we do currently. |
| 18 | * This ensures packed content gets processed correctly by extensions |
| 19 | * before additional transformations are run on the DOM. |
| 20 | * |
| 21 | * This ordering issue is handled through documentation. |
| 22 | * |
| 23 | * 2. This problem has existed all along (in the PHP parser as well as in |
| 24 | * Parsoid, which is probably how the ref-in-ref hack works: because of how |
| 25 | * parser functions and extension tags are processed, #tag:ref doesn't |
| 26 | * see a nested ref anymore), and this patch only exposes that problem |
| 27 | * more clearly with the unpackOutput property. |
| 28 | * |
| 29 | * * Consider the set of extensions that |
| 30 | * (a) process wikitext |
| 31 | * (b) provide an extensionPostProcessor |
| 32 | * (c) run the extensionPostProcessor only on the top-level |
| 33 | * As of today, there is exactly one extension (Cite) that has all |
| 34 | * these properties, so the problem below is a speculative problem |
| 35 | * for today. But, this could potentially be a problem in the future. |
| 36 | * |
| 37 | * * Let us say there are at least two of them, E1 and E2 that |
| 38 | * support extension tags <e1> and <e2> respectively. |
| 39 | * |
| 40 | * * Let us say in an instance of <e1> on the page, <e2> is present |
| 41 | * and in another instance of <e2> on the page, <e1> is present. |
| 42 | * |
| 43 | * * In what order should E1's and E2's extensionPostProcessors be |
| 44 | * run on the top-level? Depending on what these handlers do, you |
| 45 | * could get potentially different results. You can see this quite |
| 46 | * starkly with the unpackOutput flag. |
| 47 | * |
| 48 | * * The ideal solution to this problem is to require that every extension's |
| 49 | * extensionPostProcessor be idempotent which lets us run these |
| 50 | * post processors repeatedly till the DOM stabilizes. But, this |
| 51 | * still doesn't necessarily guarantee that ordering doesn't matter. |
| 52 | * It just guarantees that, with the unpackOutput flag set to false for |
| 53 | * multiple extensions, all sealed fragments get fully processed. |
| 54 | * So, we still need to worry about that problem. |
| 55 | * |
| 56 | * But, idempotence *could* potentially be a sufficient property in most cases. |
| 57 | * To see this, consider that there is a Footnotes extension which is similar |
| 58 | * to the Cite extension in that they both extract inline content in the |
| 59 | * page source to a separate section of output and leave behind pointers to |
| 60 | * the global section in the output DOM. Given this, the Cite and Footnotes |
| 61 | * extension post processors would essentially walk the DOM and |
| 62 | * move any existing inline content into that global section till it is |
| 63 | * done. So, even if a <footnote> has a <ref> and a <ref> has a <footnote>, |
| 64 | * we ultimately end up with all footnote content in the footnotes section |
| 65 | * and all ref content in the references section and the DOM stabilizes. |
| 66 | * Ordering is irrelevant here. |
| 67 | * |
| 68 | * So, perhaps one way of catching these problems would be in code review, |
| 69 | * by analyzing what the DOM postprocessor does and checking whether it |
| 70 | * introduces potential ordering issues. |
| 71 | */ |
| 72 | class RunExtensionProcessors implements Wt2HtmlDOMProcessor { |
| 73 | private ?array $extProcessors = null; |
| 74 | |
| 75 | /** |
| 76 | * FIXME: We've lost the ability to dump the DOM before and after individual |
| 77 | * extension processors. Need to fix RunExtensionProcessors to |
| 78 | * reintroduce that granularity. |
| 79 | */ |
| 80 | private function initialize( Env $env ): array { |
| 81 | $extProcessors = []; |
| 82 | foreach ( $env->getSiteConfig()->getExtDOMProcessors() as $extName => $domProcs ) { |
| 83 | foreach ( $domProcs as $i => $classNameOrSpec ) { |
| 84 | // Extension post processor, object factory spec given |
| 85 | $objectFactory = $env->getSiteConfig()->getObjectFactory(); |
| 86 | $extProcessors[] = $objectFactory->createObject( $classNameOrSpec, [ |
| 87 | 'allowClassName' => true, |
| 88 | 'assertClass' => ExtDOMProcessor::class, |
| 89 | ] ); |
| 90 | } |
| 91 | } |
| 92 | |
| 93 | return $extProcessors; |
| 94 | } |
| 95 | |
| 96 | /** |
| 97 | * @inheritDoc |
| 98 | */ |
| 99 | public function run( |
| 100 | Env $env, Node $root, array $options = [], bool $atTopLevel = false |
| 101 | ): void { |
| 102 | $this->extProcessors ??= $this->initialize( $env ); |
| 103 | foreach ( $this->extProcessors as $ep ) { |
| 104 | $ep->wtPostprocess( $options['extApi'], $root, $options ); |
| 105 | } |
| 106 | } |
| 107 | } |
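
The class comment suggests that idempotent post processors could be run repeatedly until the DOM stabilizes, using a Cite-like and a Footnotes-like extension as the example. The following is a minimal sketch of that idea and not Parsoid's implementation: `makeExtractor`, `runUntilStable`, `countNodes`, `buildSample`, the `<footnote>`, `<references/>` and `<footnotes/>` elements, and the `sup` pointer markup are all hypothetical names chosen for illustration, and it uses PHP's built-in DOM extension rather than Parsoid's `Node` and `ExtDOMProcessor` types.

```php
<?php
declare( strict_types = 1 );

/**
 * Hypothetical stand-in for an extension post processor: moves every <$tag>
 * element that is not yet a direct child of <$section> into <$section>
 * and leaves an empty <sup class="$tag-pointer"/> in its place.
 * Running it again on an already-processed DOM changes nothing.
 */
function makeExtractor( string $tag, string $section ): callable {
	return static function ( DOMDocument $doc ) use ( $tag, $section ): void {
		$xpath = new DOMXPath( $doc );
		$container = $xpath->query( "//{$section}" )->item( 0 );
		$pending = $xpath->query( "//{$tag}[not(parent::{$section})]" );
		// Copy to an array first so DOM mutations cannot disturb iteration.
		foreach ( iterator_to_array( $pending ) as $node ) {
			$pointer = $doc->createElement( 'sup' );
			$pointer->setAttribute( 'class', "{$tag}-pointer" );
			$node->parentNode->replaceChild( $pointer, $node );
			$container->appendChild( $node );
		}
	};
}

/**
 * Run all processors repeatedly until a full pass leaves the serialized
 * DOM unchanged. Returns the number of passes executed.
 */
function runUntilStable( DOMDocument $doc, array $processors, int $maxPasses = 10 ): int {
	for ( $pass = 1; $pass <= $maxPasses; $pass++ ) {
		$before = $doc->saveXML();
		foreach ( $processors as $process ) {
			$process( $doc );
		}
		if ( $doc->saveXML() === $before ) {
			return $pass;
		}
	}
	throw new RuntimeException( "DOM did not stabilize in $maxPasses passes" );
}

/** Count the nodes matching an XPath query. */
function countNodes( DOMDocument $doc, string $query ): int {
	return ( new DOMXPath( $doc ) )->query( $query )->length;
}

/** A <ref> containing a <footnote>, and a <footnote> containing a <ref>. */
function buildSample(): DOMDocument {
	$doc = new DOMDocument();
	$doc->loadXML(
		'<body>' .
		'<p>x<ref>a<footnote>b</footnote></ref></p>' .
		'<p>y<footnote>c<ref>d</ref></footnote></p>' .
		'<references/><footnotes/>' .
		'</body>'
	);
	return $doc;
}

$cite = makeExtractor( 'ref', 'references' );
$footnotes = makeExtractor( 'footnote', 'footnotes' );

// Try both processor orders on a fresh copy of the sample.
foreach ( [ [ $cite, $footnotes ], [ $footnotes, $cite ] ] as $order ) {
	$doc = buildSample();
	runUntilStable( $doc, $order );
	echo $doc->saveXML( $doc->documentElement ), "\n";
}
```

In this toy, both orders end with every `<ref>` under `<references>` and every `<footnote>` under `<footnotes>`. The entries inside a section can still come out in a different sequence depending on which processor runs first, which is in line with the comment's caveat that idempotence alone does not guarantee that ordering is irrelevant.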