xmlreader — XML Reader

XML reading module.

Each XmlEntry object represents a page, as read from an XML source.

The XmlDump class reads a pages_current XML dump (like the ones offered on https://dumps.wikimedia.org/backup-index.html) and offers a generator over XmlEntry objects which can be used by other bots.

Changed in version 7.7: defusedxml is used in favour of xml.etree if it is installed, to guard against XML-based attacks. defusedxml 0.7.1 or higher is recommended.
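A fallback of this kind is commonly written as a guarded import. The following is a minimal sketch of that pattern under stated assumptions, not necessarily the module's own code; the count_pages helper and the sample document are hypothetical and exist only for illustration.

```python
import io

# Prefer the hardened parser when defusedxml is installed; otherwise fall
# back to the standard library. Both modules provide an iterparse function.
try:
    from defusedxml.ElementTree import iterparse
except ImportError:
    from xml.etree.ElementTree import iterparse


def count_pages(source):
    """Count <page> elements in a file-like XML source (hypothetical helper)."""
    # By default iterparse reports an element once its end tag has been read.
    return sum(1 for _event, elem in iterparse(source) if elem.tag == 'page')


sample = b'<mediawiki><page><title>Pear</title></page></mediawiki>'
page_count = count_pages(io.BytesIO(sample))
print(page_count)
```

Code written against the name iterparse then works unchanged whichever of the two packages supplied it.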

class xmlreader.XmlDump(filename, *, allrevisions=False, on_error=None)

Bases: object

Represents an XML dump file.

Reads the local file at initialization, parses it, and offers access to the resulting XmlEntries via a generator.

New in version 7.2: the on_error parameter

Changed in version 7.2: the allrevisions parameter must be given as a keyword argument

Usage example:

>>> from pywikibot import xmlreader
>>> name = 'tests/data/xml/article-pear.xml'
>>> dump = xmlreader.XmlDump(name, allrevisions=True)
>>> for elem in dump.parse():
...     print(elem.title, elem.revisionid)
Pear 185185
Pear 185241
Pear 185408
Pear 188924

Parameters:

  • allrevisions (bool) – If True, parse all revisions instead of only the latest one. Default: False.

  • on_error (Callable[[type[BaseException]], None] | None) – A callable invoked by the parse() method when a ParseError occurs; the exception is passed to it. If on_error is None, the exception is raised instead.


parse()

Generator using the ElementTree iterparse function.

Changed in version 7.2: if a ParseError occurs it can be handled by the callable given with on_error parameter of this instance.
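To illustrate how an iterparse-based generator and an error callback can fit together, here is a simplified sketch that uses only the standard library. It is not pywikibot's implementation: the iter_titles helper and the sample input are hypothetical, and the sketch ignores the XML namespace that real dump files attach to their tags.

```python
import io
from xml.etree.ElementTree import ParseError, iterparse


def iter_titles(source, on_error=None):
    """Yield page titles from a file-like XML source (hypothetical helper).

    A ParseError is passed to on_error if a callable was given;
    otherwise it is re-raised.
    """
    try:
        for _event, elem in iterparse(source, events=('end',)):
            if elem.tag == 'page':
                yield elem.findtext('title')
                elem.clear()  # drop the finished subtree to limit memory use
    except ParseError as exc:
        if on_error is None:
            raise
        on_error(exc)


errors = []
# The second <page> is never closed, so the document is not well-formed.
broken = b'<mediawiki><page><title>Pear</title></page><page>'
titles = list(iter_titles(io.BytesIO(broken), on_error=errors.append))
print(titles, len(errors))
```

Because the generator is lazy, pages that were completely read before the malformed part are still yielded; the callback only sees the error that ends the iteration.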

class xmlreader.XmlEntry(title, ns, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, redirect)

Bases: object

Represents a page.


Parse the characters within a restrictions tag.

Returns strings representing user groups allowed to edit and to move a page, where None means there are no restrictions.
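The return convention described above can be sketched as follows. This is an illustration only, assuming the restrictions text looks like 'edit=sysop:move=sysop'; both that input format and the split_restrictions name are assumptions and not taken from the module.

```python
import re


def split_restrictions(restrictions):
    """Return (edit_group, move_group); None means no restriction.

    Hypothetical helper; assumes text such as 'edit=sysop:move=sysop'.
    """
    if not restrictions:
        return None, None
    # Each group name runs up to the next ':' separator or the end of text.
    edit = re.search(r'edit=([^:]*)', restrictions)
    move = re.search(r'move=([^:]*)', restrictions)
    return (edit.group(1) if edit else None,
            move.group(1) if move else None)


print(split_restrictions('edit=autoconfirmed:move=sysop'))
print(split_restrictions('move=sysop'))
print(split_restrictions(''))
```

Returning a two-item tuple lets a caller unpack the edit and move groups in a single assignment.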