xmlreader — XML Reader
XML reading module.
Each XmlEntry object represents a page, as read from an XML source.
The XmlDump class reads a pages_current XML dump (like the ones offered on https://dumps.wikimedia.org/backup-index.html) and offers a generator over XmlEntry objects which can be used by other bots.
Changed in version 7.7: defusedxml is used in favour of xml.etree, if it is installed, to protect against XML-based attacks. defusedxml 0.7.1 or higher is recommended.
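The fallback described in this note can be sketched as a guarded import. This is only an illustration of the general pattern, not the module's actual source; the `sample` data and the `tags` variable are made up for the example.

```python
import io

try:
    # Preferred: the hardened parser, if the optional package is installed.
    from defusedxml.ElementTree import iterparse
except ImportError:
    # Fallback: the standard library parser.
    from xml.etree.ElementTree import iterparse

# Illustrative input; a real dump would be read from a file instead.
sample = io.BytesIO(b'<page><title>Pear</title></page>')

# Either import offers the same iterparse() call for this simple case.
tags = [elem.tag for _event, elem in iterparse(sample, events=('end',))]
```

Both branches bind the same name, so the rest of the code does not need to know which parser was picked.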
- class xmlreader.XmlDump(filename, *, allrevisions=False, on_error=None)
Bases: object
Represents an XML dump file.
Reads the local file at initialization, parses it, and offers access to the resulting XmlEntries via a generator.
New in version 7.2: the on_error parameter.
Changed in version 7.2: the allrevisions parameter must be given as a keyword parameter.
Usage example:
>>> from pywikibot import xmlreader
>>> name = 'tests/data/xml/article-pear.xml'
>>> dump = xmlreader.XmlDump(name, allrevisions=True)
>>> for elem in dump.parse():
...     print(elem.title, elem.revisionid)
...
Pear 185185
Pear 185241
Pear 185408
Pear 188924
- Parameters:
allrevisions (bool) – If True, parse all revisions instead of only the latest one. Default: False.
on_error (Optional[Callable[[Type[BaseException]], None]]) – a callable which is invoked within the parse() method when a ParseError occurs. The exception is passed to this callable. If no callable is given, the exception is raised.
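The on_error contract can be illustrated with a small stand-alone generator built on the standard library. It is a sketch of the callback pattern only, not pywikibot's implementation: `iter_titles` is a hypothetical helper, and both passing the exception instance to the callback and stopping after the first error are assumptions.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(source, on_error=None):
    """Yield the text of each <title> element in an XML stream.

    Hypothetical helper, not part of pywikibot. If parsing fails and
    on_error is given, the ParseError is passed to it and iteration
    stops; without on_error the ParseError propagates to the caller.
    """
    try:
        for _event, elem in ET.iterparse(source, events=('end',)):
            if elem.tag == 'title':
                yield elem.text
            elem.clear()  # release memory held by finished elements
    except ET.ParseError as exc:
        if on_error is None:
            raise
        on_error(exc)


# Usage: collect errors instead of letting them abort the caller.
errors = []
truncated = io.BytesIO(
    b'<pages><page><title>Pear</title></page><page><title>Apple</title>')
titles = list(iter_titles(truncated, on_error=errors.append))
```

Collecting errors in a list keeps the loop body simple; a logging function would serve equally well as the callback.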