`proofreadpage` — ProofreadPage Extension#

Objects used with the ProofreadPage extension.

This module includes objects:

ProofreadPage(Page)
FullHeader
IndexPage(Page)

OCR support of page scans via:

- https://phetools.toolforge.org/hocr_cgi.py
- https://phetools.toolforge.org/ocr.php
- inspired by https://en.wikisource.org/wiki/MediaWiki:Gadget-ocr.js

Wikimedia OCR
see: https://www.mediawiki.org/wiki/Help:Extension:Wikisource/Wikimedia_OCR
https://ocr.wmcloud.org/
inspired by https://wikisource.org/wiki/MediaWiki:GoogleOCR.js
see also: https://wikisource.org/wiki/Wikisource:Google_OCR

class proofreadpage.FullHeader(text=None)

Bases: object

Header of a ProofreadPage object.

Parameters: text (Optional[str]) –

TEMPLATE_V1 = '<pagequality level="{0.ql}" user="{0.user}" /><div class="pagetext">{0.header}\n\n\n'

TEMPLATE_V2 = '<pagequality level="{0.ql}" user="{0.user}" />{0.header}'

p_header = re.compile('<pagequality level="(?P<ql>\\d)" user="(?P<user>.*?)" />(?P<has_div><div class="pagetext">)?(?P<header>.*)', re.DOTALL)
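As an illustration of how these three attributes fit together, the following minimal sketch reuses the pattern and templates exactly as listed and shows a header being split into its fields and formatted again. The helpers `parse_header` and `format_header` are hypothetical names introduced for this example; they are not part of the module, and the real `FullHeader` class may handle defaults differently.

```python
import re
from types import SimpleNamespace

# Copied from the FullHeader attributes listed above.
TEMPLATE_V1 = ('<pagequality level="{0.ql}" user="{0.user}" />'
               '<div class="pagetext">{0.header}\n\n\n')
TEMPLATE_V2 = '<pagequality level="{0.ql}" user="{0.user}" />{0.header}'
P_HEADER = re.compile(
    r'<pagequality level="(?P<ql>\d)" user="(?P<user>.*?)" />'
    r'(?P<has_div><div class="pagetext">)?(?P<header>.*)',
    re.DOTALL)


def parse_header(text):
    """Hypothetical helper: split header text into its fields, or None."""
    m = P_HEADER.search(text)
    if m is None:
        return None
    return SimpleNamespace(
        ql=int(m.group('ql')),                    # quality level, one digit
        user=m.group('user'),                     # last proofreader
        has_div=m.group('has_div') is not None,   # True for the V1 layout
        header=m.group('header'),                 # everything after the tags
    )


def format_header(fields):
    """Hypothetical helper: pick the template matching the parsed layout."""
    template = TEMPLATE_V1 if fields.has_div else TEMPLATE_V2
    return template.format(fields)


raw = '<pagequality level="3" user="Alice" />{{rh|12|TITLE|}}'
fields = parse_header(raw)
print(fields.ql, fields.user, fields.has_div)
print(format_header(fields) == raw)
```

The optional `has_div` group is what distinguishes the two layouts: when the `<div class="pagetext">` tag is present, the V1 template applies; otherwise the V2 template does.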

class proofreadpage.IndexPage(source, title='')

Bases: Page

Index page used in the MediaWiki ProofreadPage extension.

Instantiate an IndexPage object.

In this class:

page number is the number in the page title in the Page namespace, if the Wikisource site adopts this convention (e.g. page_number is 12 for Page:Popular Science Monthly Volume 1.djvu/12), or the sequential number of the pages linked from the index section of the Index page, if the index is built via transclusion of a list of pages (e.g. on de.wikisource).
page label is the label associated with a page in the Index page.
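The two page-number conventions described here can be sketched as follows. Both functions are hypothetical helpers written for illustration only; they are not methods of `IndexPage`, and the titles in the usage lines are made up apart from the Popular Science Monthly example given in the text.

```python
def page_number_from_title(title):
    """Hypothetical helper: return the number after the last '/' in a
    Page-namespace title, or None if the title has no numeric suffix."""
    _base, sep, suffix = title.rpartition('/')
    if sep and suffix.isdecimal():
        return int(suffix)
    return None


def sequential_numbers(linked_titles):
    """Hypothetical helper: number pages 1..n in the order in which they
    are linked from the Index page (the transclusion-based convention)."""
    return {title: n for n, title in enumerate(linked_titles, start=1)}


# Convention 1: the number is part of the page title.
print(page_number_from_title('Page:Popular Science Monthly Volume 1.djvu/12'))

# Convention 2: the number is the position in the list of linked pages.
print(sequential_numbers(['Seite:Example A.jpg', 'Seite:Example B.jpg']))
```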

This class provides methods to get the pages contained in an Index page, along with the corresponding page numbers and labels, by means of several helper functions.

It also provides a generator over the pages contained in an Index page, with the option to restrict it to a range and to filter by quality level and page existence.
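To make the range, quality-level and existence filters concrete, here is a self-contained sketch that applies the same three kinds of filter to plain tuples. It is not the class's generator: the function name `iter_pages`, its parameter names and the sample data are all assumptions made for this illustration, and no wiki is contacted.

```python
def iter_pages(pages, start=1, end=None, filter_ql=None, only_existing=False):
    """Hypothetical stand-in for a filtered page generator.

    `pages` is a list of (number, title, quality_level, exists) tuples.
    """
    for number, title, ql, exists in pages:
        # Range filter: keep start <= number <= end (end=None means no cap).
        if number < start or (end is not None and number > end):
            continue
        # Quality filter: keep only the listed quality levels, if given.
        if filter_ql is not None and ql not in filter_ql:
            continue
        # Existence filter: optionally skip pages that do not exist yet.
        if only_existing and not exists:
            continue
        yield title


# Made-up sample data for the sketch.
sample = [
    (1, 'Page:Book.djvu/1', 0, True),
    (2, 'Page:Book.djvu/2', 3, True),
    (3, 'Page:Book.djvu/3', 4, False),
    (4, 'Page:Book.djvu/4', 3, True),
]

for title in iter_pages(sample, start=2, filter_ql=[3, 4], only_existing=True):
    print(title)
```

Because the function is a generator, pages are produced one at a time, so a caller can stop early without processing the whole index.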

Raises:

UnknownExtensionError – source Site has no ProofreadPage Extension.
ImportError – bs4 is not installed.

Parameters:

source (Union[BaseLink, BaseSite, Page]) –
title (str) –

INDEX_TEMPLATE = ':MediaWiki:Proofreadpage_index_template'

_get_prp_index_pagelist()

Get all pages in an IndexPage page list.

Note

This method is called by the initializer and should not be called directly.
