tools package

Miscellaneous helper functions (not wiki-dependent).


Bases: KeyError, IndexError

An error that gets caught by both KeyError and IndexError.
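A minimal sketch of how such an exception behaves. The class name `CombinedErrorSketch` and the helper `lookup` are hypothetical names chosen for illustration, not names taken from this page:

```python
class CombinedErrorSketch(KeyError, IndexError):
    """Hypothetical exception deriving from both KeyError and IndexError."""


def lookup(container, key):
    """Return container[key], raising the combined error on any failed lookup."""
    try:
        return container[key]
    except (KeyError, IndexError):
        raise CombinedErrorSketch(key) from None


# The same failure can be handled by either kind of handler.
for handler in (KeyError, IndexError):
    try:
        lookup([], 0)
    except handler as error:
        print(type(error).__name__, 'caught as', handler.__name__)
```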


Bases: object

Mixin class to allow comparing to other objects which are comparable.


Bases:, collections.deque

A generator that allows items to be added during generating.
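One way to picture this is a deque that hands out its leftmost item on each step, so anything appended during iteration is still visited. The class name `GrowingQueueSketch` is a hypothetical stand-in and the code is only a sketch of the idea:

```python
import collections


class GrowingQueueSketch(collections.deque):
    """Hypothetical deque that is consumed while it is iterated."""

    def __iter__(self):
        # The object is its own iterator, so appended items are seen too.
        return self

    def __next__(self):
        if self:
            return self.popleft()
        raise StopIteration


queue = GrowingQueueSketch([1, 2])
seen = []
for item in queue:
    seen.append(item)
    if item == 1:
        queue.append(3)  # added while the loop is running
print(seen)
```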


Bases: str,

A default for a not existing siteinfo property.

It should be chosen if no better default is known. It acts like an empty collection, so it can be iterated over safely when treated as a list, tuple, set or dictionary. It is also basically an empty string.

Accessing a value via __getitem__ will result in a combined KeyError and IndexError.

Initialise the default as an empty string.

class str)[source]

Bases: object

Version object to allow comparing ‘wmf’ versions with normal ones.

The version mainly consists of digits separated by periods. After that comes a suffix, which may only be ‘wmf<number>’, ‘alpha’, ‘beta<number>’ or ‘-rc.<number>’ (the - and . are optional). They are considered from old to new in that order, and a version number without a suffix is considered the newest. This secondary difference is stored in an internal _dev_version attribute.

Two versions are equal if their normal version and dev version are equal. A version is greater if the normal version or dev version is greater. For example:

1.34 < 1.34.1 < 1.35wmf1 < 1.35alpha < 1.35beta1 < 1.35beta2 < 1.35-rc-1 < 1.35-rc.2 < 1.35

Any other suffixes are considered invalid.


version_str – version to parse

MEDIAWIKI_VERSION = re.compile('(\\d+(?:\\.\\d+)+)(-?wmf\\.?(\\d+)|alpha|beta(\\d+)|-?rc\\.?(\\d+)|.*)?$')
static from_generator(generator: str)[source]

Create instance from a site’s generator attribute.
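The ordering rules described for this class can be sketched with the documented MEDIAWIKI_VERSION pattern and a sortable key. The helper `version_key` and the numeric ranks are assumptions made for illustration, not the class's actual implementation:

```python
import re

# Same pattern as the MEDIAWIKI_VERSION attribute shown above.
PATTERN = re.compile(
    r'(\d+(?:\.\d+)+)(-?wmf\.?(\d+)|alpha|beta(\d+)|-?rc\.?(\d+)|.*)?$')


def version_key(version_str):
    """Return a tuple that sorts versions from old to new (hypothetical)."""
    match = PATTERN.match(version_str)
    if not match:
        raise ValueError(f'invalid version: {version_str!r}')
    numbers = tuple(int(part) for part in match.group(1).split('.'))
    suffix = match.group(2)
    if match.group(3) is not None:    # wmf<number>
        dev = (0, int(match.group(3)))
    elif suffix == 'alpha':
        dev = (1,)
    elif match.group(4) is not None:  # beta<number>
        dev = (2, int(match.group(4)))
    elif match.group(5) is not None:  # rc<number>
        dev = (3, int(match.group(5)))
    elif not suffix:                  # no suffix: newest of its number
        dev = (4,)
    else:
        raise ValueError(f'invalid suffix: {suffix!r}')
    return numbers, dev


versions = ['1.35', '1.35beta2', '1.34.1', '1.35wmf1', '1.35-rc.2']
print(sorted(versions, key=version_key))
```

With the pattern as written, a form such as `1.35-rc-1` falls through to the catch-all branch, so this sketch rejects it as an invalid suffix.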

class*args, **kwargs)[source]

Bases: object

Context manager which implements extended reentrant lock objects.

This RLock is implicitly derived from threading.RLock but provides a locked() method like in threading.Lock and a count attribute which gives the active recursion level of locks.


>>> from import RLock
>>> lock = RLock()
>>> lock.acquire()
>>> with lock: print(lock.count)  # nested lock
>>> lock.locked()
>>> lock.release()
>>> lock.locked()

New in version 6.2.

property count

Return number of acquired locks.


Return true if the lock is acquired.
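A rough sketch of the idea, wrapping threading.RLock and tracking the recursion level by hand. The class name `CountingRLockSketch` is hypothetical, and the sketch assumes release() is only called by the thread that holds the lock:

```python
import threading


class CountingRLockSketch:
    """Hypothetical reentrant lock exposing count and locked()."""

    def __init__(self):
        self._lock = threading.RLock()
        self._count = 0

    def acquire(self, blocking=True, timeout=-1):
        acquired = self._lock.acquire(blocking, timeout)
        if acquired:
            self._count += 1  # safe: we hold the lock here
        return acquired

    def release(self):
        # Decrement while the lock is still held, then release one level.
        self._count -= 1
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()

    @property
    def count(self):
        """Return the current recursion level."""
        return self._count

    def locked(self):
        """Return True if the lock is currently acquired."""
        return self._count > 0


lock = CountingRLockSketch()
lock.acquire()
with lock:
    print(lock.count)  # recursion level inside the nested block
lock.release()
print(lock.locked())
```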


Bases:, dict

Dict with SelfCallMixin.


Bases: object

Return self when called.

When ‘_own_desc’ is defined it’ll also issue a deprecation warning using issue_deprecation_warning(‘Calling ‘ + _own_desc, ‘it directly’).
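The behaviour can be sketched with a small mixin whose `__call__` returns the object itself. The class names are hypothetical, and `warnings.warn` stands in for the deprecation helper mentioned above:

```python
import warnings


class SelfCallSketch:
    """Hypothetical mixin: calling the object returns the object."""

    def __call__(self):
        description = getattr(self, '_own_desc', None)
        if description is not None:
            # Stand-in for the library's own deprecation helper.
            warnings.warn(
                f'Calling {description} is deprecated; use it directly.',
                DeprecationWarning, stacklevel=2)
        return self


class SelfCallStrSketch(SelfCallSketch, str):
    """Hypothetical string that can also be called like a method."""


name = SelfCallStrSketch('Main Page')
print(name() == name, name().lower())
```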


Bases:, str

String with SelfCallMixin.

class str)[source]


Structure to hold values where the key is given by the value itself.

A structure like a defaultdict, but the key is given by the value itself and cannot be assigned directly. It returns the number of all items with len() but not the number of keys.


>>> from import SizedKeyCollection
>>> data = SizedKeyCollection('title')
>>> data.append('foo')
>>> data.append('bar')
>>> data.append('Foo')
>>> list(data)
['foo', 'Foo', 'bar']
>>> len(data)
>>> 'Foo' in data
>>> 'foo' in data
>>> data['Foo']
['foo', 'Foo']
>>> list(data.keys())
['Foo', 'Bar']
>>> data.remove_key('Foo')
>>> list(data)
>>> data.clear()
>>> list(data)

New in version 6.1.


keyattr – an attribute or method of the values to be held in this collection which will be used as the key.


Add a value to the collection.


Remove all elements from SizedKeyCollection.


Iterate over items for a given key.


Yield key, len(values) pairs.


Remove a value from the container.


Remove all values for a given key.
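The structure can be approximated with a defaultdict of lists plus a running item count. This is a simplified sketch with a hypothetical class name, covering only a few of the methods listed above:

```python
from collections import defaultdict


class KeyedCollectionSketch:
    """Hypothetical collection whose keys are derived from its values."""

    def __init__(self, keyattr):
        self.keyattr = keyattr
        self.data = defaultdict(list)
        self.size = 0

    def _key_of(self, value):
        # keyattr may name an attribute or a method of the value.
        key = getattr(value, self.keyattr)
        return key() if callable(key) else key

    def append(self, value):
        self.data[self._key_of(value)].append(value)
        self.size += 1

    def remove_key(self, key):
        self.size -= len(self.data.pop(key, []))

    def __contains__(self, key):
        return key in self.data

    def __getitem__(self, key):
        return list(self.data.get(key, ()))

    def __iter__(self):
        for values in self.data.values():
            yield from values

    def __len__(self):
        return self.size  # number of items, not number of keys


data = KeyedCollectionSketch('title')
for word in ('foo', 'bar', 'Foo'):
    data.append(word)
print(list(data), len(data), list(data.data))
```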

class, wait_time=2, *args)[source]

Bases: list

A simple threadpool class to limit the number of simultaneous threads.

Any threading.Thread object can be added to the pool using the append() method. If the maximum number of simultaneous threads has not been reached, the Thread object will be started immediately; if not, the append() call will block until the thread is able to start.

>>> pool = ThreadList(limit=10)
>>> def work():
...     time.sleep(1)
>>> for x in range(20):
...     pool.append(threading.Thread(target=work))

  • limit (int) – the number of simultaneous threads

  • wait_time (int or float) – how long to wait if the number of active threads exceeds the limit


Return the number of alive threads and delete all non-alive ones.


Add a thread to the pool and start it.


Stop all threads in the pool.
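The blocking append() behaviour can be sketched as a list subclass that waits until a slot is free. The class name and the short default wait time are assumptions for illustration only:

```python
import threading
import time


class LimitedThreadListSketch(list):
    """Hypothetical pool that starts at most `limit` threads at once."""

    def __init__(self, limit=4, wait_time=0.01):
        super().__init__()
        self.limit = limit
        self.wait_time = wait_time

    def active_count(self):
        """Drop finished threads and return how many are still alive."""
        self[:] = [thread for thread in self if thread.is_alive()]
        return len(self)

    def append(self, thread):
        """Block until a slot is free, then start and store the thread."""
        while self.active_count() >= self.limit:
            time.sleep(self.wait_time)
        thread.start()
        super().append(thread)


pool = LimitedThreadListSketch(limit=2)
for _ in range(5):
    pool.append(threading.Thread(target=time.sleep, args=(0.02,)))
for thread in list(pool):
    thread.join()
print(pool.active_count())
```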

class, target=None, name='GeneratorThread', args=(), kwargs=None, qsize=65536)[source]

Bases: threading.Thread

Look-ahead generator class.

Runs a generator in a separate thread and queues the results; can be called like a regular generator.

Subclasses should override self.generator, not

Important: the generator thread will stop itself if the generator’s internal queue is exhausted; but, if the calling program does not use all the generated values, it must call the generator’s stop() method to stop the background thread. Example usage:

>>> gen = ThreadedGenerator(target=range, args=(20,))
>>> try:
...     data = list(gen)
... finally:
...     gen.stop()
>>> data
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Initializer. Takes same keyword arguments as threading.Thread.

target must be a generator function (or other callable that returns an iterable object).


qsize (int) – The size of the lookahead queue. The larger the qsize, the more values will be computed in advance of use (which can eat up memory and processor time).


Run the generator and store the results on the queue.


Stop the background thread.
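A simplified sketch of the look-ahead idea: a thread fills a bounded queue and the consumer reads from it. The class name, the sentinel and the polling interval are assumptions, not the class's real internals:

```python
import queue
import threading


class LookAheadSketch(threading.Thread):
    """Hypothetical thread that pre-computes items of an iterable."""

    _DONE = object()  # sentinel marking the end of the data

    def __init__(self, target, args=(), kwargs=None, qsize=16):
        super().__init__(daemon=True)
        self._source = target
        self._source_args = args
        self._source_kwargs = kwargs or {}
        self.queue = queue.Queue(qsize)
        self._halt = threading.Event()

    def _put(self, item):
        """Queue an item, giving up as soon as stop() was called."""
        while not self._halt.is_set():
            try:
                self.queue.put(item, timeout=0.05)
                return True
            except queue.Full:
                continue
        return False

    def run(self):
        try:
            for item in self._source(*self._source_args,
                                     **self._source_kwargs):
                if not self._put(item):
                    return
        finally:
            self._put(self._DONE)

    def stop(self):
        self._halt.set()

    def __iter__(self):
        if self.ident is None:  # thread not started yet
            self.start()
        while True:
            item = self.queue.get()
            if item is self._DONE:
                return
            yield item


gen = LookAheadSketch(target=range, args=(5,))
try:
    print(list(gen))
finally:
    gen.stop()
```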


Bases: pkg_resources.extern.packaging.version.Version

Version from pkg_resources vendor package.

This Version provides properties of vendor package 20.4 shipped with setuptools 49.4.0.

Add additional properties not provided by the base class.


Bases: object

Descriptor class to access a class method as a property.

This class may be used as a decorator:

class Foo:

    _bar = 'baz'  # a class property

    def bar(cls):  # a class property method
        return cls._bar

Foo.bar gives ‘baz’.
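A descriptor of this kind can be sketched in a few lines. The name `classproperty_sketch` is hypothetical and the code only illustrates the descriptor protocol, not the library's own class:

```python
class classproperty_sketch:
    """Hypothetical descriptor exposing a class-level method as a property."""

    def __init__(self, cls_method):
        self.method = cls_method  # hold the decorated function
        self.__doc__ = cls_method.__doc__

    def __get__(self, instance, owner):
        # Always call the function with the class, never the instance.
        return self.method(owner)


class Foo:

    _bar = 'baz'  # a class attribute

    @classproperty_sketch
    def bar(cls):
        """Return the class-level value."""
        return cls._bar


print(Foo.bar, Foo().bar)
```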

Hold the class method.

str, sha='sha1', bytes_to_read=None)[source]

Compute file hash.

Result is expressed as hexdigest().
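A minimal sketch of such a hash computation using hashlib. The function name `file_hexdigest` and the chunk size are assumptions made for illustration:

```python
import hashlib


def file_hexdigest(filename, sha='sha1', bytes_to_read=None):
    """Hash a file, optionally limited to its first bytes (hypothetical)."""
    hasher = hashlib.new(sha)
    remaining = bytes_to_read
    with open(filename, 'rb') as handle:
        while remaining is None or remaining > 0:
            size = 65536 if remaining is None else min(65536, remaining)
            chunk = handle.read(size)
            if not chunk:  # end of file reached
                break
            hasher.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return hasher.hexdigest()
```

Reading in chunks keeps memory use flat for large files; a file shorter than bytes_to_read is simply hashed in full.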

  • filename – filename path

  • sha (str) – hashing function among the following in hashlib: md5(), sha1(), sha224(), sha256(), sha384(), and sha512(). The function name shall be passed as a string, e.g. ‘sha1’.

  • bytes_to_read (None or int) – only the first bytes_to_read will be considered; if file size is smaller, the whole file will be considered.

str, mode=384, quiet=False, create=False)[source]

Check file mode and update it, if needed.

  • filename – filename path

  • mode (int) – requested file mode

  • quiet (bool) – warn about file mode change if False.

  • create (bool) – create the file if it does not exist already


IOError – The file does not exist and create is False.

, container=None, key=None, add=None)[source]

Yield unique items from an iterable, omitting duplicates.

By default, to provide uniqueness, it puts the generated items into a set created as a local variable. It only yields items which are not already present in the local set.

For large collections, this is not memory efficient, as a strong reference to every item is kept in a local set which cannot be cleared.

Also, the local set can’t be re-used when chaining unique operations on multiple generators.

To avoid these issues, it is advisable for the caller to provide their own container and set the key parameter to be the function hash, or use a weakref as the key.

The container can be any object that supports __contains__. If the container is a set or dict, the method add or __setitem__ will be used automatically. Any other method may be provided explicitly using the add parameter.

Beware that key=id is only useful for cases where id() is not unique.

Note: This is not thread safe.
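The container, key and add parameters can be sketched as follows. The function name `unique_sketch` is hypothetical, and the sketch only mirrors the behaviour described above rather than reproducing the library's code:

```python
def unique_sketch(iterable, container=None, key=None, add=None):
    """Yield items whose key has not been seen before (hypothetical)."""
    if container is None:
        container = set()  # local set, kept alive while iterating
    if add is None:
        if hasattr(container, 'add'):
            add = container.add
        else:
            # Assume a dict-like container: store the key with a dummy value.
            def add(item_key):
                container[item_key] = True
    for item in iterable:
        item_key = key(item) if key else item
        if item_key not in container:
            add(item_key)
            yield item


# Sharing one container lets several generators be de-duplicated together.
seen = set()
first = list(unique_sketch(['a', 'b', 'a'], container=seen))
second = list(unique_sketch(['b', 'c'], container=seen))
print(first, second)
```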

  • iterable – the source iterable

  • container (type) – storage of seen items

  • key (callable) – function to convert the item to a key

  • add (callable) – function to add an item to the container

str)str[source]

Return a string with the first character uncapitalized.

Empty strings are supported. The original string is not changed.

str)str[source]

Return a string with the first character capitalized.

Empty strings are supported. The original string is not changed.


MediaWiki doesn’t capitalize some characters the same way as Python. This function tries to be close to MediaWiki’s capitalize function in title.php. See T179115 and T200357.

, version=None)[source]

Check if a module can be imported.

New in version 3.0.

, allow_duplicates=False)[source]

Intersect generators listed in genlist.

Yield items only if they are yielded by all generators in genlist. Threads (via ThreadedGenerator) are used in order to run generators in parallel, so that items can be yielded before generators are exhausted.

Threads are stopped when they are either exhausted or Ctrl-C is pressed. Quitting before all generators are finished is attempted if there is no more chance of finding an item in all queues.

  • genlist (list) – list of page generators

  • allow_duplicates (bool) – allow duplicates if present in all generators

str)bool[source]

Check if a value is a valid IPv4 or IPv6 address.
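A check of this kind can be sketched with the standard ipaddress module. The function name `looks_like_ip` is hypothetical, and the library may validate addresses differently:

```python
import ipaddress


def looks_like_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address (sketch)."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


for candidate in ('127.0.0.1', '::1', 'example.org'):
    print(candidate, looks_like_ip(candidate))
```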


value – value to check

, *args, marker='…')[source]

Generator which yields the first n elements of the iterable.

If more elements are available and marker is set, it yields an extra string marker as a continuation mark.

The function takes the same positional arguments as itertools.islice and the additional keyword marker.
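The marker behaviour can be sketched on top of itertools.islice. The function name `islice_marker_sketch` is hypothetical and the code is an illustration under that assumption:

```python
import itertools


def islice_marker_sketch(iterable, *args, marker='…'):
    """Yield a slice of iterable, then marker if items remain (sketch)."""
    iterator = iter(iterable)
    yield from itertools.islice(iterator, *args)
    if marker:
        try:
            next(iterator)  # probe for at least one more element
        except StopIteration:
            return
        yield marker


print(list(islice_marker_sketch(range(10), 3)))
print(list(islice_marker_sketch(range(3), 3)))
```

Sharing one iterator between islice and the probe is what lets the sketch tell whether the source still has elements after the slice.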

  • iterable (iterable) – the iterable to work on

  • args – same args as: - itertools.islice(iterable, stop) - itertools.islice(iterable, start, stop[, step])

  • marker (str) – element to yield if iterable still contains elements after showing the required number. Default value: ‘…’

, size: int)[source]

Make an iterator that returns lists of (up to) size items from iterable.


>>> i = itergroup(range(25), 10)
>>> print(next(i))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print(next(i))
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> print(next(i))
[20, 21, 22, 23, 24]
>>> print(next(i))
Traceback (most recent call last):
StopIteration

*args, **kwargs)[source]

Return a merged dict and make sure that the original dicts’ keys are unique.

The positional arguments are the dictionaries to be merged. It is also possible to define an additional dict using the keyword arguments.

[str][source]

Normalize the username.

, mode='rb', use_extension=True)[source]

Open a file and uncompress it if needed.

This function supports bzip2, gzip, 7zip, lzma, and xz as compression containers. It uses the packages available in the standard library for bzip2, gzip, lzma, and xz so they are always available. 7zip is only available when a 7za program is available and only supports reading from it.

The compression is either selected via the magic number or file ending.
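Selection by magic number can be sketched for the three standard-library formats. The function name `open_by_magic` and the table of signatures are assumptions for illustration; 7zip handling and extension-based selection are left out:

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each container format.
MAGIC_OPENERS = (
    (b'BZh', bz2.open),
    (b'\x1f\x8b', gzip.open),
    (b'\xfd7zXZ\x00', lzma.open),
)


def open_by_magic(filename, mode='rb'):
    """Open filename for reading, decompressing if a magic number matches."""
    with open(filename, 'rb') as handle:
        head = handle.read(8)
    for magic, opener in MAGIC_OPENERS:
        if head.startswith(magic):
            return opener(filename, mode)
    return open(filename, mode)  # not compressed: plain binary file
```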

  • filename (str) – The filename.

  • use_extension (bool) – Use the file extension instead of the magic number to determine the type of compression (default True). Must be True when writing or appending.

  • mode (str) – The mode in which the file should be opened. It may either be ‘r’, ‘rb’, ‘a’, ‘ab’, ‘w’ or ‘wb’. All modes open the file in binary mode. It defaults to ‘rb’.

  • ValueError – When 7za is not available or the opening mode is unknown or it tries to write a 7z archive.

  • FileNotFoundError – When the filename doesn’t exist and it tries to read from it or it tries to determine the compression algorithm.

  • OSError – When it’s not a 7z archive but the file extension is 7z. It is also raised by bz2 when its content is invalid. gzip does not immediately raise that error but only on reading it.

  • lzma.LZMAError – When error occurs during compression or decompression or when initializing the state with lzma or xz.

  • ImportError – When file is compressed with bz2 but neither bz2 nor bz2file is importable, or when file is compressed with lzma or xz but lzma is not importable.


A file-like object returning the uncompressed data in binary mode.

Return type

file-like object

*iterables)[source]

Yield simultaneously from each iterable.

Sample:

>>> tuple(roundrobin_generators('ABC', range(5)))
('A', 0, 'B', 1, 'C', 2, 3, 4)

New in version 3.0.
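One way to get this interleaving is itertools.zip_longest with a private fill value that is filtered out again. The function name `roundrobin_sketch` is hypothetical and this is only one possible approach:

```python
import itertools

_MISSING = object()  # fill value that can never appear in user data


def roundrobin_sketch(*iterables):
    """Yield one item from each iterable in turn until all are exhausted."""
    for group in itertools.zip_longest(*iterables, fillvalue=_MISSING):
        for item in group:
            if item is not _MISSING:
                yield item


print(tuple(roundrobin_sketch('ABC', range(5))))
```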


iterables (iterable) – any iterable to combine in roundrobin way


the combined generator of iterables

Return type


class'', category=<class 'Warning'>, filename='')[source]

Bases: warnings.catch_warnings

A decorator/context manager that temporarily suppresses warnings.

Those suppressed warnings that do not match the parameters will be shown upon exit.

New in version 3.0.

Initialize the object.

The parameter semantics are similar to those of warnings.filterwarnings.

  • message (str) – A string containing a regular expression that the start of the warning message must match. (case-insensitive)

  • category (type) – A class (a subclass of Warning) of which the warning category must be a subclass in order to match.

  • filename (str) – A string containing a regular expression that the start of the path to the warning module must match. (case-sensitive)
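The matching rules for message and category can be tried out with the standard warnings module. The helper `call_quietly` is a hypothetical name, and unlike the class described above this sketch lets non-matching warnings through immediately instead of replaying them on exit:

```python
import warnings


def call_quietly(func, message='', category=Warning):
    """Call func while ignoring warnings that match the filter (sketch)."""
    with warnings.catch_warnings():
        # message is a regex matched case-insensitively against the start
        # of the warning text; category must be a Warning subclass.
        warnings.filterwarnings('ignore', message=message,
                                category=category)
        return func()


def noisy():
    warnings.warn('old api is deprecated', DeprecationWarning)
    warnings.warn('disk almost full', UserWarning)
    return 'done'


with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter('always')
    result = call_quietly(noisy, 'Old API', DeprecationWarning)
print(result, [str(entry.message) for entry in shown])
```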

tools.chars module

Character based helper functions (not wiki-dependent).

[source]

Return True if the text contains any of the invisible characters.

[source]

Replace invisible characters by ‘<codepoint>’.

str, encoding: str)str[source]

Convert unicode string to requested HTML encoding.

Attempt to encode the string into the desired format; if that works, return it unchanged. Otherwise encode the non-ASCII characters into HTML &#; entities.
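The fallback described here can be sketched with Python's built-in `xmlcharrefreplace` error handler. The function name `to_html_safe` is hypothetical and the library may build the entities differently:

```python
def to_html_safe(string, encoding):
    """Return string unchanged if it fits encoding, else use entities."""
    try:
        string.encode(encoding)
    except UnicodeError:
        # Replace every non-ASCII character by a numeric &#...; entity.
        return string.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    return string


print(to_html_safe('café', 'utf-8'))
print(to_html_safe('café', 'ascii'))
```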

  • string – String to update

  • encoding – Encoding to use

str)str[source]

Convert unicode chars of str to HTML entities if chars are not ASCII.

str, encodings: Union[str, List[str], Tuple[str, ...]] = 'utf-8')str[source]

Convert URL-encoded text to unicode using several encodings.

Uses the first encoding that doesn’t cause an error.
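Trying encodings in order can be sketched with urllib.parse.unquote_to_bytes. The function name `url_to_text` is hypothetical and the sketch only illustrates the "first encoding that works" rule:

```python
from urllib.parse import unquote_to_bytes


def url_to_text(title, encodings='utf-8'):
    """Decode percent-encoded text with the first encoding that works."""
    if isinstance(encodings, str):
        encodings = [encodings]
    raw = unquote_to_bytes(title)
    first_error = None
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeError as error:
            first_error = first_error or error
    # No encoding succeeded: re-raise the first failure.
    raise first_error or UnicodeError('no encoding given')


print(url_to_text('caf%C3%A9'))
print(url_to_text('caf%E9', ['utf-8', 'latin-1']))
```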

  • title – URL-encoded character data to convert

  • encodings – Encodings to attempt to use during conversion.


UnicodeError – Could not convert using any encoding.

tools.djvu module

Wrapper around djvulibre to access djvu files properties and content.

class str, file_djvu='[deprecated name of file]')[source]

Bases: object

Wrapper around djvulibre to access djvu files properties and content.

Perform file existence checks.

Control characters in djvu text-layer are converted for convenience (see for control chars details).


file – filename (including path) to djvu file

static check_cache(fn)[source]

Decorator to check if cache shall be cleared.

static check_page_number(fn)[source]

Decorator to check if page number is valid.

Raises ValueError.

delete_page(*args, **kwargs)[source]

Return most common size and dpi for pages in djvu file.

get_page(*args, **kwargs)[source]
has_text(*args, **kwargs)[source]
number_of_images(*args, **kwargs)[source]
page_info(*args, **kwargs)[source]
whiten_page(*args, **kwargs)[source]

tools.formatter module

Module containing various formatting related utilities.


Bases: object

A class formatting a list of items.

It is possible to customize the appearance by changing format_string which is used by str.format with index, width and item. Each line is joined by the separator and the complete text is surrounded by the prefix and the suffix. All three are by default a new line. The index starts at 1 and for the width it’s using the width of the sequence’s length written as a decimal number. So a length of 100 will result in a width of 3 and a length of 99 in a width of 2.

It is iterating over self.sequence to generate the text. That sequence can be any iterator but the result is better when it has an order.
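The formatting rules above can be sketched as a single function. The function name `format_sequence` is hypothetical; the default values are the ones documented for this class:

```python
def format_sequence(sequence, format_string='  {index:>{width}} - {item}',
                    separator='\n', prefix='\n', suffix='\n'):
    """Return the items as numbered lines (hypothetical helper)."""
    items = list(sequence)
    # Width of the largest index written as a decimal number.
    width = len(str(len(items)))
    lines = [format_string.format(index=index, width=width, item=item)
             for index, item in enumerate(items, start=1)]
    return prefix + separator.join(lines) + suffix


print(format_sequence(['Main Page', 'Help', 'Sandbox']))
```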

Create a new instance with a reference to the sequence.

format_string = '  {index:>{width}} - {item}'
property out

Create the text with one item on each line.


Output the text of the current sequence.

prefix = '\n'
separator = '\n'
suffix = '\n'

str, *args, **kwargs)str[source]

Do str.format without having to worry about colors.

It automatically adds \03 in front of color fields, so it’s unnecessary to add them manually. Any other \03 in the text is disallowed.

You may use a variant {color} by assigning a valid color to a named parameter color.


text – The format template string


The formatted string