`xmlreader` — XML Reader#

XML reading module.

Each XmlEntry object represents a page, as read from an XML source.

The XmlDump class reads a pages_current XML dump (like the ones offered on https://dumps.wikimedia.org/backup-index.html) and offers a generator over XmlEntry objects which can be used by other bots.

Changed in version 7.7: defusedxml is used in favour of xml.etree, if present, to protect against XML-based attacks. defusedxml 0.7.1 or higher is recommended.
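The optional-dependency behaviour described above can be sketched as a guarded import. This is a minimal illustration of the pattern, not necessarily how the module itself is written; the `titles` helper and the sample XML are made up for the example.

```python
import io

try:
    # Prefer the hardened parser when the optional package is installed.
    from defusedxml.ElementTree import iterparse
except ImportError:
    # Otherwise fall back to the standard library parser.
    from xml.etree.ElementTree import iterparse


def titles(stream):
    """Yield the text of every <title> element found in the stream."""
    for _event, elem in iterparse(stream, events=('end',)):
        if elem.tag == 'title':
            yield elem.text


sample = io.BytesIO(b'<pages><page><title>Pear</title></page></pages>')
print(list(titles(sample)))
```

Both modules expose `iterparse` under the same name, so the rest of the code does not need to know which one was imported.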

class xmlreader.Headers(title, ns, pageid, isredirect, edit_restriction, move_restriction)

Bases: NamedTuple

Represent the common info of a page.

Added in version 9.0.

Create new instance of Headers(title, ns, pageid, isredirect, edit_restriction, move_restriction)

Parameters:

title (str)
ns (str)
pageid (str)
isredirect (bool)
edit_restriction (str)
move_restriction (str)

edit_restriction: str (alias for field number 4)

isredirect: bool (alias for field number 3)

move_restriction: str (alias for field number 5)

ns: str (alias for field number 1)

pageid: str (alias for field number 2)

title: str (alias for field number 0)
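To make the field numbering concrete, here is a small sketch that uses a local stand-in with the same field order as documented above. The class is redefined here only so the example runs without pywikibot, and the field values are invented.

```python
from typing import NamedTuple


class Headers(NamedTuple):
    """Local stand-in mirroring the documented field order."""
    title: str
    ns: str
    pageid: str
    isredirect: bool
    edit_restriction: str
    move_restriction: str


# Hypothetical values, for illustration only.
h = Headers('Pear', '0', '1234', False, 'sysop', 'sysop')

# Each attribute is an alias for a positional field.
assert h.title == h[0]
assert h.move_restriction == h[5]

# Being a tuple, it also supports unpacking.
title, ns, *_rest = h
print(title, ns)
```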

class xmlreader.RawRev(headers, revision, revid)

Bases: NamedTuple

Represent a raw revision.

Added in version 9.0.

Create new instance of RawRev(headers, revision, revid)

Parameters:

headers (Headers)
revision (Element)
revid (int)

headers: Headers (alias for field number 0)

revid: int (alias for field number 2)

revision: Element (alias for field number 1)

class xmlreader.XmlDump(filename, *, allrevisions=None, revisions='first_found', on_error=None)

Bases: object

Represents an XML dump file.

Reads the local file at initialization, parses it, and offers access to the resulting XmlEntries via a generator.

Added in version 7.2: the on_error parameter.

Changed in version 7.2: the allrevisions parameter must be given as a keyword argument.

Changed in version 9.0: the allrevisions parameter is deprecated due to T340804; the revisions parameter was introduced as its replacement. The root attribute was removed.

Usage example:

>>> from pywikibot import xmlreader
>>> name = 'tests/data/xml/article-pear.xml'
>>> dump = xmlreader.XmlDump(name, revisions='all')
>>> for elem in dump.parse():
...     print(elem.title, elem.revisionid)
...
Pear 185185
Pear 185241
Pear 185408
Pear 188924
>>>

Parameters:

allrevisions (bool | str | None) – deprecated since version 9.0; use revisions instead. If True, parse all revisions instead of only the latest one. Default: None.
on_error (Callable[[ParseError], None] | None) – a callable invoked within the parse() method when a ParseError occurs; the exception is passed to it. If None, the exception is raised.
revisions (str) – which of four methods to use to parse the dump. Default: first_found.

* first_found (whichever revision is the first element)
* latest (most recent revision, by largest revisionid)
* earliest (first revision, by smallest revisionid)
* all (all revisions for each page)
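The four selection methods can be illustrated with plain Python. This is a sketch of the selection semantics as described in the list, not the library's implementation; `select_revisions` and the list-of-dicts data shape are hypothetical.

```python
def select_revisions(revisions, method='first_found'):
    """Return the revisions of one page chosen by the given method.

    `revisions` is a list of dicts with an integer 'revisionid' key,
    in the order they appear in the dump (assumed data shape).
    """
    if not revisions:
        return []
    if method == 'all':
        return list(revisions)
    if method == 'first_found':
        return [revisions[0]]
    if method == 'latest':
        return [max(revisions, key=lambda r: r['revisionid'])]
    if method == 'earliest':
        return [min(revisions, key=lambda r: r['revisionid'])]
    raise ValueError(f'unknown method: {method!r}')


# Revisions deliberately out of order to show the difference.
page = [{'revisionid': 185241}, {'revisionid': 185185}, {'revisionid': 188924}]

for method in ('first_found', 'latest', 'earliest', 'all'):
    print(method, [r['revisionid'] for r in select_revisions(page, method)])
```

Note that first_found depends only on element order in the file, while latest and earliest compare revision IDs.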

parse()

Generator that uses the ElementTree iterparse function.

Changed in version 7.2: if a ParseError occurs, it can be handled by the callable given with the on_error parameter of this instance.

Return type: Iterator[XmlEntry]
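The interaction between an iterparse-based generator and an on_error callable can be sketched with the standard library alone. This is an assumption-laden illustration of the pattern, not the actual body of parse(); `iter_titles` and the sample data are hypothetical.

```python
import io
from xml.etree.ElementTree import ParseError, iterparse


def iter_titles(stream, on_error=None):
    """Yield <title> texts; pass a ParseError to on_error if one is given."""
    try:
        for _event, elem in iterparse(stream, events=('end',)):
            if elem.tag == 'title':
                yield elem.text
            # Free the element's content once it has been handled.
            elem.clear()
    except ParseError as e:
        if on_error is None:
            raise
        on_error(e)


# Malformed input: <title> is closed by </page>.
bad = b'<pages><page><title>Pear</title></page><page><title>Apple</page></pages>'

errors = []
found = list(iter_titles(io.BytesIO(bad), on_error=errors.append))
print(found, len(errors))
```

With a callable supplied, the generator ends quietly after reporting the error; without one, the ParseError propagates to the caller.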

static parse_restrictions(restrictions)

Parse the characters within a restrictions tag.

Returns strings representing user groups allowed to edit and to move a page, where None means there are no restrictions.

Added in version 9.0: replaces deprecated parseRestrictions function.

Parameters: restrictions (str)
Return type: tuple[str | None, str | None]
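As a rough sketch of what such parsing might look like, the function below assumes the restrictions text consists of `action=group` pairs separated by colons, for example `edit=sysop:move=sysop`. That input format and the name `parse_restrictions_sketch` are assumptions made for illustration; the real method may handle additional forms.

```python
import re


def parse_restrictions_sketch(restrictions):
    """Return (edit_group, move_group) from text like 'edit=sysop:move=sysop'.

    None in either position means no restriction was found for that action.
    """
    if not restrictions:
        return None, None
    edit = re.search(r'edit=([^:]*)', restrictions)
    move = re.search(r'move=([^:]*)', restrictions)
    return (edit[1] if edit else None, move[1] if move else None)


print(parse_restrictions_sketch('edit=autoconfirmed:move=sysop'))
print(parse_restrictions_sketch('move=sysop'))
print(parse_restrictions_sketch(''))
```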

class xmlreader.XmlEntry(title, ns, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, isredirect)

Bases: object

Represent a page.

Parameters:

title (str)
ns (str)
id (str)
text (str)
username (str)
ipedit (bool)
timestamp (str)
editRestriction (str)
moveRestriction (str)
revisionid (str)
comment (str)
isredirect (bool)

comment: str

editRestriction: str

id: str

ipedit: bool

isredirect: bool

moveRestriction: str

ns: str

revisionid: str

text: str

timestamp: str

title: str

username: str
