xmlreader — XML Reader
XML reading module.
Each XmlEntry object represents a page, as read from an XML source.
The XmlDump class reads a pages_current XML dump (like the ones offered on https://dumps.wikimedia.org/backup-index.html) and offers a generator over XmlEntry objects which can be used by other bots.
Changed in version 7.7: defusedxml is used in favour of xml.etree if present, to guard against XML-based attacks. defusedxml 0.7.1 or higher is recommended.
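The preference for defusedxml over xml.etree is the kind of behaviour usually written as a guarded import. The following is a minimal sketch of that pattern, not the module's actual code; the in-memory sample document is invented for illustration.

```python
import io

try:
    # Prefer the hardened parser when the optional package is installed.
    from defusedxml.ElementTree import ParseError, iterparse
except ImportError:
    # Fall back to the standard library parser.
    from xml.etree.ElementTree import ParseError, iterparse

# Hypothetical in-memory document, used here instead of a real dump file.
sample = io.BytesIO(b'<pages><page><title>Pear</title></page></pages>')

# iterparse reports each element once its closing tag has been read.
titles = [elem.text for _event, elem in iterparse(sample) if elem.tag == 'title']
print(titles)
```

Either branch binds the same two names, so the rest of the code does not need to know which parser was selected.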
- class xmlreader.Headers(title, ns, pageid, isredirect, edit_restriction, move_restriction)
Bases: NamedTuple
Represent the common info of a page.
Added in version 9.0.
Create new instance of Headers(title, ns, pageid, isredirect, edit_restriction, move_restriction)
- Parameters:
title (str)
ns (str)
pageid (str)
isredirect (bool)
edit_restriction (str)
move_restriction (str)
- edit_restriction: str
Alias for field number 4
- isredirect: bool
Alias for field number 3
- move_restriction: str
Alias for field number 5
- ns: str
Alias for field number 1
- pageid: str
Alias for field number 2
- title: str
Alias for field number 0
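Because Headers is a NamedTuple, each field can be read by name or by the field number listed above. The sketch below uses a local stand-in class with the same field order rather than importing pywikibot, and the example values are invented.

```python
from typing import NamedTuple


class Headers(NamedTuple):
    """Stand-in with the same fields as xmlreader.Headers."""

    title: str
    ns: str
    pageid: str
    isredirect: bool
    edit_restriction: str
    move_restriction: str


# Example values are invented for illustration.
headers = Headers('Pear', '0', '42', False, 'sysop', 'sysop')

# Fields are reachable by name or by position (title is field 0).
assert headers.title == headers[0]
assert headers.move_restriction == headers[5]

# A NamedTuple unpacks like a plain tuple.
title, ns, pageid, isredirect, edit_lock, move_lock = headers
print(title, ns, pageid, isredirect)
```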
- class xmlreader.RawRev(headers, revision, revid)
Bases: NamedTuple
Represent a raw revision.
Added in version 9.0.
Create new instance of RawRev(headers, revision, revid)
- Parameters:
headers (Headers)
revision (Element)
revid (int)
- revid: int
Alias for field number 2
- revision: Element
Alias for field number 1
- class xmlreader.XmlDump(filename, *, allrevisions=None, revisions='first_found', on_error=None)
Bases: object
Represents an XML dump file.
Reads the local file at initialization, parses it, and offers access to the resulting XmlEntries via a generator.
Added in version 7.2: the on_error parameter.
Changed in version 7.2: the allrevisions parameter must be given as a keyword parameter.
Changed in version 9.0: the allrevisions parameter is deprecated due to T340804; the revisions parameter was introduced as its replacement. The root attribute was removed.
Usage example:
>>> from pywikibot import xmlreader
>>> name = 'tests/data/xml/article-pear.xml'
>>> dump = xmlreader.XmlDump(name, revisions='all')
>>> for elem in dump.parse():
...     print(elem.title, elem.revisionid)
...
Pear 185185
Pear 185241
Pear 185408
Pear 188924
- Parameters:
allrevisions (bool | str | None) – If True, parse all revisions instead of only the latest one. Default: False. Deprecated since version 9.0; use revisions instead.
on_error (Callable[[ParseError], None] | None) – a callable which is invoked within the parse() method when a ParseError occurs. The exception is passed to this callable. If no callable is given, the exception is raised.
revisions (str) – which of four methods to use to parse the dump:
* first_found (whichever revision is the first element)
* latest (most recent revision, by largest revisionid)
* earliest (first revision, by smallest revisionid)
* all (all revisions for each page)
Default: first_found
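The four revisions modes can be pictured as different ways of picking from one page's list of revision ids. This is only a sketch of the selection rule as described in the parameter list: select_revisions is a hypothetical helper, not part of pywikibot, and the order of the sample ids is invented.

```python
from typing import Iterator, List


def select_revisions(revids: List[int], mode: str = 'first_found') -> Iterator[int]:
    """Yield revision ids according to mode (hypothetical helper)."""
    if not revids:
        return
    if mode == 'first_found':
        # Whichever revision appears first in the dump.
        yield revids[0]
    elif mode == 'latest':
        # Most recent revision, by largest revision id.
        yield max(revids)
    elif mode == 'earliest':
        # First revision, by smallest revision id.
        yield min(revids)
    elif mode == 'all':
        yield from revids
    else:
        raise ValueError(f'unknown revisions mode: {mode!r}')


# Revision ids of one page, in an invented dump order.
page_revids = [185241, 185185, 188924, 185408]
for mode in ('first_found', 'latest', 'earliest', 'all'):
    print(mode, list(select_revisions(page_revids, mode)))
```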
- parse()
Generator that yields XmlEntry objects using the ElementTree iterparse function.
Changed in version 7.2: if a ParseError occurs, it can be handled by the callable given with the on_error parameter of this instance.
- Return type:
Iterator[XmlEntry]
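The on_error contract (call the handler with the exception if one was given, otherwise let the exception propagate) can be sketched with the standard library alone. iter_titles is a hypothetical helper that yields plain title strings instead of XmlEntry objects, and the truncated input is invented to trigger the error.

```python
import io
from typing import Callable, Iterator, Optional
from xml.etree.ElementTree import ParseError, iterparse


def iter_titles(
    source,
    on_error: Optional[Callable[[ParseError], None]] = None,
) -> Iterator[str]:
    """Yield <title> texts; hypothetical helper mirroring on_error."""
    try:
        for _event, elem in iterparse(source):
            if elem.tag == 'title':
                yield elem.text
    except ParseError as exc:
        if on_error is None:
            # No handler given: the exception is raised to the caller.
            raise
        on_error(exc)


errors = []
# Truncated document: the closing tags are missing on purpose.
broken = io.BytesIO(b'<pages><page><title>Pear</title>')
titles = list(iter_titles(broken, on_error=errors.append))
print(titles, len(errors))
```

Collecting the exceptions in a list, as here, lets a bot keep the entries read before the damaged part of a file and report the problem afterwards.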
- static parse_restrictions(restrictions)
Parse the characters within a restrictions tag.
Returns strings representing user groups allowed to edit and to move a page, where None means there are no restrictions.
Added in version 9.0: replaces the deprecated parseRestrictions function.
- Parameters:
restrictions (str)
- Return type:
tuple[str | None, str | None]
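The return shape can be illustrated with a small stand-in. Note the assumption: this page does not specify the text format of the restrictions tag, so the sketch assumes content such as edit=sysop:move=sysop. parse_restrictions_sketch is a hypothetical function and may not match the library's behaviour for other inputs.

```python
import re
from typing import Optional, Tuple


def parse_restrictions_sketch(
    restrictions: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical stand-in; the 'edit=...:move=...' format is assumed."""
    if not restrictions:
        # Empty tag content: neither editing nor moving is restricted.
        return None, None
    edit = re.search(r'edit=([^:]*)', restrictions)
    move = re.search(r'move=([^:]*)', restrictions)
    return (
        edit.group(1) if edit else None,
        move.group(1) if move else None,
    )


print(parse_restrictions_sketch('edit=sysop:move=autoconfirmed'))
print(parse_restrictions_sketch(''))
```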
- class xmlreader.XmlEntry(title, ns, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, isredirect)
Bases: object
Represent a page.
- Parameters:
title (str)
ns (str)
id (str)
text (str)
username (str)
ipedit (bool)
timestamp (str)
editRestriction (str)
moveRestriction (str)
revisionid (str)
comment (str)
isredirect (bool)
- comment: str
- editRestriction: str
- id: str
- ipedit: bool
- isredirect: bool
- moveRestriction: str
- ns: str
- revisionid: str
- text: str
- timestamp: str
- title: str
- username: str
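A bot consuming the generator typically filters entries on these attributes. The sketch below uses a reduced stand-in class carrying only three of the attributes listed above, so that it runs without pywikibot. The helper main_namespace_titles is hypothetical, and it relies on the MediaWiki convention that the main namespace is number 0, stored here as the string '0' because ns is typed as str.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class EntryStandIn:
    """Reduced stand-in for XmlEntry with only the fields used below."""

    title: str
    ns: str
    isredirect: bool


def main_namespace_titles(entries: Iterable[EntryStandIn]) -> Iterator[str]:
    """Yield titles of non-redirect pages in the main namespace."""
    for entry in entries:
        if entry.ns == '0' and not entry.isredirect:
            yield entry.title


# Invented sample entries.
entries = [
    EntryStandIn('Pear', '0', False),
    EntryStandIn('Pears', '0', True),
    EntryStandIn('Talk:Pear', '1', False),
]
print(list(main_namespace_titles(entries)))
```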