xmlreader — XML Reader

XML reading module.

Each XmlEntry object represents a page, as read from an XML source.

The XmlDump class reads a pages_current XML dump (like the ones offered on https://dumps.wikimedia.org/backup-index.html) and offers a generator over XmlEntry objects which can be used by other bots.

Changed in version 7.7: defusedxml is used in favour of xml.etree, if present, to guard against XML-based attacks. defusedxml 0.7.1 or higher is recommended.
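The fallback described above is commonly written as a guarded import. The following is a minimal sketch of that pattern, not the module's actual code. It assumes defusedxml.ElementTree exposes iterparse under the same name as xml.etree.ElementTree, and the sample XML is made up for illustration.

```python
import io

try:
    # Prefer the hardened parser when the optional package is installed.
    from defusedxml.ElementTree import iterparse
except ImportError:
    # Otherwise fall back to the standard library parser.
    from xml.etree.ElementTree import iterparse

# Hypothetical sample document, only for illustration.
sample = io.BytesIO(b'<pages><page><title>Pear</title></page></pages>')

# iterparse yields (event, element) pairs; by default only 'end' events.
titles = [elem.text for _event, elem in iterparse(sample) if elem.tag == 'title']
print(titles)
```

Either branch gives the calling code the same iterparse name, so the rest of the code does not need to know which parser is in use.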

class xmlreader.Headers(title, ns, pageid, isredirect, edit_restriction, move_restriction)

Bases: NamedTuple

Represent the common info of a page.

Added in version 9.0.

Create new instance of Headers(title, ns, pageid, isredirect, edit_restriction, move_restriction)

Parameters:
  • title (str)

  • ns (str)

  • pageid (str)

  • isredirect (bool)

  • edit_restriction (str)

  • move_restriction (str)

edit_restriction: str

Alias for field number 4

isredirect: bool

Alias for field number 3

move_restriction: str

Alias for field number 5

ns: str

Alias for field number 1

pageid: str

Alias for field number 2

title: str

Alias for field number 0
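Because Headers is a NamedTuple, each field can be read by name or by the field number listed above. The sketch below uses a hypothetical stand-in class, PageHeaders, that only mirrors the documented fields; it is not the library class, and all values are invented.

```python
from typing import NamedTuple


class PageHeaders(NamedTuple):
    """Hypothetical stand-in mirroring the documented Headers fields."""

    title: str
    ns: str
    pageid: str
    isredirect: bool
    edit_restriction: str
    move_restriction: str


# Invented values, only to show the access patterns.
hdr = PageHeaders('Pear', '0', '1234', False, 'sysop', 'autoconfirmed')

# Access by attribute name and by field number is equivalent.
print(hdr.title, hdr[0])
print(hdr.move_restriction, hdr[5])

# Tuple unpacking works as for any tuple.
title, ns, *_rest = hdr
```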

class xmlreader.RawRev(headers, revision, revid)

Bases: NamedTuple

Represent a raw revision.

Added in version 9.0.

Create new instance of RawRev(headers, revision, revid)

Parameters:
  • headers (Headers)

  • revision (Element)

  • revid (int)

headers: Headers

Alias for field number 0

revid: int

Alias for field number 2

revision: Element

Alias for field number 1

class xmlreader.XmlDump(filename, *, allrevisions=None, revisions='first_found', on_error=None)

Bases: object

Represents an XML dump file.

Reads the local file at initialization, parses it, and offers access to the resulting XmlEntries via a generator.

Added in version 7.2: the on_error parameter

Changed in version 7.2: allrevisions parameter must be given as keyword parameter

Changed in version 9.0: allrevisions parameter is deprecated due to T340804, revisions parameter was introduced as replacement. root attribute was removed.

Usage example:

>>> from pywikibot import xmlreader
>>> name = 'tests/data/xml/article-pear.xml'
>>> dump = xmlreader.XmlDump(name, revisions='all')
>>> for elem in dump.parse():
...     print(elem.title, elem.revisionid)
...
Pear 185185
Pear 185241
Pear 185408
Pear 188924

Parameters:
  • allrevisions (bool | str | None) – deprecated since version 9.0 in favour of revisions. If True, parse all revisions instead of only the latest one. Default: False.

  • on_error (Callable[[ParseError], None] | None) – a callable which is invoked within parse() method when a ParseError occurs. The exception is passed to this callable. Otherwise the exception is raised.

  • revisions (str) – which of four methods to use to parse the dump. Default: first_found.

      ◦ first_found (whichever revision is the first element)

      ◦ latest (most recent revision, by largest revisionid)

      ◦ earliest (first revision, by smallest revisionid)

      ◦ all (all revisions for each page)
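The four revisions modes can be illustrated with a small selection function. This is a sketch under stated assumptions, not the library's implementation: Rev and select_revisions are hypothetical names, and a revision is reduced to a (revid, text) pair with an integer revid.

```python
from __future__ import annotations

from typing import NamedTuple


class Rev(NamedTuple):
    """Hypothetical (revid, text) pair standing in for a parsed revision."""

    revid: int
    text: str


def select_revisions(revs: list[Rev], mode: str = 'first_found') -> list[Rev]:
    """Return the revisions of one page that the given mode would keep."""
    if not revs:
        return []
    if mode == 'all':
        return list(revs)
    if mode == 'first_found':
        # Whatever revision appears first, regardless of its id.
        return revs[:1]
    if mode == 'latest':
        return [max(revs, key=lambda r: r.revid)]
    if mode == 'earliest':
        return [min(revs, key=lambda r: r.revid)]
    raise ValueError(f'unknown revisions mode: {mode!r}')


# Revision ids taken from the usage example; the order and texts are invented.
revs = [Rev(185241, 'b'), Rev(185185, 'a'), Rev(188924, 'd'), Rev(185408, 'c')]
for mode in ('first_found', 'latest', 'earliest', 'all'):
    print(mode, [r.revid for r in select_revisions(revs, mode)])
```

Note that first_found depends on the order of revisions in the file, while latest and earliest depend only on the revision ids.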

parse()

Generator using ElementTree iterparse function.

Changed in version 7.2: if a ParseError occurs it can be handled by the callable given with on_error parameter of this instance.

Return type:

Iterator[XmlEntry]
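The on_error behaviour described for parse() can be sketched with the standard library alone. The generator below is a hypothetical stand-in (iter_titles is not part of xmlreader) that yields only title texts: when a ParseError occurs, it is passed to the callable if one was given and re-raised otherwise.

```python
import io
from typing import Callable, Iterator, Optional
from xml.etree.ElementTree import ParseError, iterparse


def iter_titles(
    source,
    on_error: Optional[Callable[[ParseError], None]] = None,
) -> Iterator[str]:
    """Yield <title> texts; hand a ParseError to on_error when one is given."""
    try:
        for _event, elem in iterparse(source):
            if elem.tag == 'title':
                yield elem.text
    except ParseError as exc:
        if on_error is None:
            raise
        on_error(exc)


# Hypothetical truncated document: the root element is never closed.
broken = io.BytesIO(b'<pages><page><title>Pear</title></page>')

errors = []
titles = list(iter_titles(broken, on_error=errors.append))
print(titles, len(errors))
```

Collecting errors in a list, as here, lets a bot log the problem and keep whatever was parsed before the failure instead of aborting.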

static parse_restrictions(restrictions)

Parse the characters within a restrictions tag.

Returns strings representing user groups allowed to edit and to move a page, where None means there are no restrictions.

Added in version 9.0: replaces deprecated parseRestrictions function.

Parameters:

restrictions (str)

Return type:

tuple[str | None, str | None]
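A minimal sketch of this kind of parsing follows. It assumes the restrictions text uses a colon-separated form such as edit=sysop:move=sysop; that format, and the name parse_restrictions_sketch, are assumptions made for illustration and may not match the actual method in every case.

```python
from __future__ import annotations

import re


def parse_restrictions_sketch(restrictions: str) -> tuple[str | None, str | None]:
    """Return (edit group, move group); None means no restriction found."""
    if not restrictions:
        return None, None
    # Assumed format: 'edit=<group>:move=<group>', either part optional.
    edit = re.search(r'edit=([^:]*)', restrictions)
    move = re.search(r'move=([^:]*)', restrictions)
    return (
        edit.group(1) if edit else None,
        move.group(1) if move else None,
    )


print(parse_restrictions_sketch('edit=sysop:move=autoconfirmed'))
print(parse_restrictions_sketch('move=sysop'))
print(parse_restrictions_sketch(''))
```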

class xmlreader.XmlEntry(title, ns, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, isredirect)

Bases: object

Represent a page.

Parameters:
  • title (str)

  • ns (str)

  • id (str)

  • text (str)

  • username (str)

  • ipedit (bool)

  • timestamp (str)

  • editRestriction (str)

  • moveRestriction (str)

  • revisionid (str)

  • comment (str)

  • isredirect (bool)

comment: str
editRestriction: str
id: str
ipedit: bool
isredirect: bool
moveRestriction: str
ns: str
revisionid: str
text: str
timestamp: str
title: str
username: str
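Bots that consume these entries typically filter on the attributes above. The sketch below uses SimpleNamespace objects as hypothetical stand-ins for XmlEntry, and it assumes that ns holds the namespace number as a string, with '0' for the main namespace.

```python
from types import SimpleNamespace


def iter_articles(entries):
    """Yield entries in the main namespace that are not redirects."""
    for entry in entries:
        # Assumption: ns is the namespace number as a string.
        if entry.ns == '0' and not entry.isredirect:
            yield entry


# Invented entries carrying only the attributes the filter needs.
entries = [
    SimpleNamespace(title='Pear', ns='0', isredirect=False),
    SimpleNamespace(title='Pears', ns='0', isredirect=True),
    SimpleNamespace(title='Talk:Pear', ns='1', isredirect=False),
]

kept = [entry.title for entry in iter_articles(entries)]
print(kept)
```

The same generator could wrap the iterator returned by XmlDump.parse(), since it relies only on the ns and isredirect attributes.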