pagegenerators — Page Generators

This module offers a wide variety of page generators.

A page generator is an object that is iterable (see PEP 255) and that yields page objects on which other scripts can then work.

Most of these functions just wrap a Site or Page method that returns a generator. For testing purposes, listpages.py can be used to print page titles to standard output.

The following parameters are supported to specify which page titles are used:

GENERATOR OPTIONS

-cat

Work on all pages which are in a specific category. Argument can also be given as “-cat:categoryname” or as “-cat:categoryname|fromtitle” (using # instead of | is also allowed for this option and the following ones).
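The category argument format can be illustrated with a small standalone sketch. The helper split_category_arg is hypothetical and not part of Pywikibot; it only shows how a value such as “Physics|M” or “Physics#M” splits into a category name and an optional start title, assuming the split happens at the first | or #.

```python
from __future__ import annotations

import re


def split_category_arg(value: str) -> tuple[str, str | None]:
    """Split 'categoryname|fromtitle' (or 'categoryname#fromtitle').

    Hypothetical helper for illustration only. Returns the category
    name and the optional start title (None if absent).
    """
    # Split only at the first '|' or '#', whichever comes first.
    parts = re.split(r'[|#]', value, maxsplit=1)
    name = parts[0]
    start = parts[1] if len(parts) == 2 and parts[1] else None
    return name, start
```

Because only the first separator is used, any further | or # characters stay part of the start title.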

-catr

Like -cat, but also recursively includes pages in subcategories, sub-subcategories etc. of the given category. Argument can also be given as “-catr:categoryname” or as “-catr:categoryname|fromtitle”.

-subcats

Work on all subcategories of a specific category. Argument can also be given as “-subcats:categoryname” or as “-subcats:categoryname|fromtitle”.

-subcatsr

Like -subcats, but also includes sub-subcategories etc. of the given category. Argument can also be given as “-subcatsr:categoryname” or as “-subcatsr:categoryname|fromtitle”.

-uncat

Work on all pages which are not categorised.

-uncatcat

Work on all categories which are not categorised.

-uncatfiles

Work on all files which are not categorised.

-file

Read a list of pages to treat from the named text file. Page titles in the file may be either enclosed in [[brackets]] or separated by newlines. Argument can also be given as “-file:filename”.
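The file format can be sketched as follows. This is a standalone illustration, not Pywikibot's implementation: titles_from_text and its simplified link pattern are assumptions, and the sketch assumes that if any [[...]] link is present only bracketed titles are used, otherwise every non-blank line is treated as one title.

```python
from __future__ import annotations

import re

# Simplified pattern for [[Title]] or [[Title|label]]; an assumption,
# not necessarily the exact pattern Pywikibot uses.
LINK_RE = re.compile(r'\[\[([^\]|\[<>{}]*)(?:\|[^\]]*)?\]\]')


def titles_from_text(content: str) -> list[str]:
    """Return page titles from text in the -file format (sketch)."""
    bracketed = [m.group(1).strip() for m in LINK_RE.finditer(content)]
    if bracketed:
        # At least one [[...]] link: use only the bracketed titles.
        return bracketed
    # Otherwise treat every non-blank line as one title.
    return [line.strip() for line in content.splitlines() if line.strip()]
```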

-filelinks

Work on all pages that use a certain image/media file. Argument can also be given as “-filelinks:filename”.

-search

Work on all pages that are found in a MediaWiki search across all namespaces.

-logevents

Work on pages that appear in a specified Special:Log. The value may be a comma-separated list of these values:

logevent,username,start,end

Deprecated since version 9.2: the backward-compatible total argument, as in logevent,username,total; use the -limit filter option instead (see below).

To use the default value, use an empty string.

Note

‘start’ is the most recent date and log events are iterated from present to past. If ‘start’ is not provided, it means ‘now’; if ‘end’ is not provided, it means ‘since the beginning’.

See also

letype of API:Logevents for the supported types of log events.

Examples:

-logevents:move gives pages from move log (usually redirects)

-logevents:delete -limit:20 gives 20 pages from deletion log

-logevents:protect,Usr gives pages from protect log by user Usr

-logevents:patrol,Usr -limit:20 gives 20 patrolled pages by Usr

-logevents:upload,,20121231,20100101 gives upload pages from 1 Jan 2010 to 31 Dec 2012

-logevents:review,,20121231 gives review pages from the beginning up to 31 Dec 2012

-logevents:review,Usr,20121231 gives review pages by user Usr from the beginning up to 31 Dec 2012

In some cases the value must be quoted, as in -logevents:"move,Usr,20"
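The comma-separated -logevents value can be illustrated with a standalone sketch. parse_logevents is hypothetical and not part of Pywikibot; it handles only the current four-field form (the deprecated logevent,username,total form is not recognized) and assumes dates are written as YYYYMMDD.

```python
from __future__ import annotations

from datetime import date, datetime


def parse_logevents(value: str):
    """Split 'logevent,username,start,end' into its four fields.

    Sketch only: empty fields mean "use the default" and become None,
    and dates are assumed to be written as YYYYMMDD.
    """
    fields = value.split(',')
    fields += [''] * (4 - len(fields))  # pad missing trailing fields
    logtype, user, start, end = (f or None for f in fields[:4])

    def to_date(text: str | None) -> date | None:
        return datetime.strptime(text, '%Y%m%d').date() if text else None

    # 'start' is the most recent date; events run from start back to end.
    return logtype, user, to_date(start), to_date(end)
```

Note the ordering: in “upload,,20121231,20100101” the later date comes first because iteration runs from present to past.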

-interwiki

Work on the given page and all equivalent pages in other languages. This can, for example, be used to fight multi-site spamming. Attention: this will cause the bot to modify pages on several wiki sites. This is not well tested, so check your edits!

-links

Work on all pages that are linked from a certain page. Argument can also be given as “-links:linkingpagetitle”.

-liverecentchanges

Work on pages from the live recent changes feed. If used as -liverecentchanges:x, work on x recent changes.

-imagesused

Work on all images that are contained on a certain page. Argument can also be given as “-imagesused:linkingpagetitle”.

-newimages

Work on the most recent new images. If given as -newimages:x, will work on x newest images.

-newpages

Work on the most recent new pages. If given as -newpages:x, will work on x newest pages.

-recentchanges

Work on the pages with the most recent changes. If given as -recentchanges:x, will work on the x most recently changed pages. If given as -recentchanges:offset,duration, it will work on pages changed from ‘offset’ minutes ago, with a timespan of ‘duration’ minutes. rctags are supported too; the rctag must be the very first part of the parameter.

Examples:

-recentchanges:20 gives the 20 most recently changed pages

-recentchanges:120,70 gives pages changed within a 70-minute timespan at an offset of 120 minutes

-recentchanges:visualeditor,10 gives the 10 most recently changed pages marked with ‘visualeditor’

-recentchanges:"mobile edit,60,35" will retrieve pages marked with ‘mobile edit’ for the given offset and timespan
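The structure of the -recentchanges value (optional leading rctag, then a count or an offset,duration pair) can be sketched as a small parser. parse_recentchanges is hypothetical; it assumes a first part that is not all digits is the rctag, and it does not compute timestamps from the offset and duration.

```python
from __future__ import annotations


def parse_recentchanges(value: str) -> dict:
    """Split a -recentchanges value into its parts (sketch).

    Assumes an optional leading rctag, followed by either a single
    count or an 'offset,duration' pair in minutes.
    """
    parts = [p.strip() for p in value.split(',')] if value else []
    result = {'tag': None, 'total': None, 'offset': None, 'duration': None}
    if parts and not parts[0].isdigit():
        result['tag'] = parts.pop(0)  # the rctag must come first
    if len(parts) == 1:
        result['total'] = int(parts[0])
    elif len(parts) == 2:
        result['offset'], result['duration'] = map(int, parts)
    elif parts:
        raise ValueError(f'unexpected value: {value!r}')
    return result
```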

-unconnectedpages

Work on the most recent pages that are not connected to the Wikibase repository. If given as -unconnectedpages:x, will work on the x most recent unconnected pages.

-ref

Work on all pages that link to a certain page. Argument can also be given as “-ref:referredpagetitle”.

-start

Specifies that the robot should go alphabetically through all pages on the home wiki, starting at the named page. Argument can also be given as “-start:pagetitle”.

You can also include a namespace. For example, “-start:Template:!” will make the bot work on all pages in the template namespace.

The default value is -start:!

-prefixindex

Work on pages commencing with a common prefix.

-transcludes

Work on all pages that use a certain template. Argument can also be given as “-transcludes:Title”.

-unusedfiles

Work on all description pages of images/media files that are not used anywhere. Argument can be given as “-unusedfiles:n” where n is the maximum number of articles to work on.

-lonelypages

Work on all articles that are not linked from any other article. Argument can be given as “-lonelypages:n” where n is the maximum number of articles to work on.

-unwatched

Work on all articles that are not watched by anyone. Argument can be given as “-unwatched:n” where n is the maximum number of articles to work on.

-property

Work on all pages with a given property name from Special:PagesWithProp. Usage:

-property:name

-usercontribs

Work on all articles that were edited by a certain user. (Example: -usercontribs:DumZiBoT)

-weblink

Work on all articles that contain an external link to a given URL; may be given as “-weblink:url”.

-withoutinterwiki

Work on all pages that don’t have interlanguage links. Argument can be given as “-withoutinterwiki:n” where n is the total to fetch.

-mysqlquery

Takes a MySQL query string like “SELECT page_namespace, page_title FROM page WHERE page_namespace = 0” and treats the resulting pages. See MySQL for more details.

-supersetquery

Takes a SQL query string like “SELECT page_namespace, page_title FROM page WHERE page_namespace = 0”, runs it on https://superset.wmcloud.org/ and treats the resulting pages.

-sparql

Takes a SPARQL SELECT query string including ?item and works on the resulting pages.

-sparqlendpoint

Specify SPARQL endpoint URL (optional). (Example: -sparqlendpoint:http://myserver.com/sparql)

-searchitem

Takes a search string and works on Wikibase pages that contain it. Argument can be given as “-searchitem:text”, where text is the string to look for, or “-searchitem:lang:text”, where lang is the language to search items in.

-wantedpages

Work on pages that are linked, but do not exist; may be given as “-wantedpages:n” where n is the maximum number of articles to work on.

-wantedcategories

Work on categories that are used, but do not exist; may be given as “-wantedcategories:n” where n is the maximum number of categories to work on.

-wantedfiles

Work on files that are used, but do not exist; may be given as “-wantedfiles:n” where n is the maximum number of files to work on.

-wantedtemplates

Work on templates that are used, but do not exist; may be given as “-wantedtemplates:n” where n is the maximum number of templates to work on.

-random

Work on random pages returned by [[Special:Random]]. Can also be given as “-random:n” where n is the number of pages to be returned.

-randomredirect

Work on random redirect pages returned by [[Special:RandomRedirect]]. Can also be given as “-randomredirect:n” where n is the number of pages to be returned.

-google

Work on all pages that are found in a Google search. You need a Google Web API license key. Note that Google doesn’t give out license keys anymore. See google_key in config.py for instructions. Argument can also be given as “-google:searchstring”.

-page

Work on a single page. Argument can also be given as “-page:pagetitle”, and supplied multiple times for multiple pages.

-pageid

Work on a single pageid. Argument can also be given as “-pageid:pageid1,pageid2,.” or “-pageid:’pageid1|pageid2|..’” and supplied multiple times for multiple pages.

-pagepile

Work on a PagePile. Argument is the pile id (an integer).

-linter

Work on pages that contain lint errors. Extension Linter must be available on the site. -linter selects all categories. -linter:high, -linter:medium or -linter:low selects all categories for that priority. Single categories can be selected with commas, as in -linter:cat1,cat2,cat3.

Appending ‘/int’ specifies the Lint ID to start querying from, e.g. -linter:high/10000.

-linter:show just shows available categories.

-querypage

Work on pages provided by a QueryPage-based special page. Usage:

-querypage:name

-querypage without an argument shows the available special pages.

See also

API:Querypage

-url

Read a list of pages to treat from the provided URL. The URL must return text in the same format as expected for the -file argument, e.g. page titles separated by newlines or enclosed in brackets.

Tip

Use the -limit:n filter option to fetch only n pages.

FILTER OPTIONS

-catfilter

Filter the page generator to only yield pages in the specified category. See -cat generator for argument format.

-grep

A regular expression that must match the page content; otherwise the page won’t be returned. Multiple -grep:regexpr options can be provided, and the page will be returned if its content matches any of them. Case-insensitive regular expressions are used, and dot matches any character, including a newline.

-grepnot

Like -grep, but return the page only if the regular expression does not match.
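The matching rules of -grep and -grepnot can be sketched with Python's re module, using IGNORECASE and DOTALL for the documented case-insensitivity and newline-matching dot. grep_filter is hypothetical and works on (title, text) pairs instead of Page objects; for -grepnot with several patterns it assumes a page is kept only when none of them match.

```python
import re


def grep_filter(pages, patterns, negate=False):
    """Yield (title, text) pairs whose text matches any pattern (sketch).

    Case-insensitive, and '.' also matches a newline. With negate=True
    it behaves like -grepnot: keep pages that match no pattern.
    """
    regexes = [re.compile(p, re.IGNORECASE | re.DOTALL) for p in patterns]
    for title, text in pages:
        matched = any(r.search(text) for r in regexes)
        if matched != negate:
            yield title, text
```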

-intersect

Work on the intersection of all the provided generators.

-limit

When used with any other argument that specifies a set of pages, -limit:n works on no more than n pages in total. If used with multiple generators, pages are yielded in a round-robin way.
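The round-robin behaviour of -limit with several generators can be sketched with plain iterables. roundrobin and limited are hypothetical helpers that only mirror the described behaviour: one item is taken from each generator in turn, and at most n items are taken in total.

```python
from itertools import islice


def roundrobin(*iterables):
    """Yield one item from each iterable in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        for it in list(iterators):  # copy: exhausted iterators are removed
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)


def limited(n, *generators):
    """Take at most n items in total, round-robin across generators."""
    return list(islice(roundrobin(*generators), n))
```

Interleaving keeps one large generator from consuming the whole limit before the others yield anything.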

-namespaces, -namespace, -ns

Filter the page generator to only yield pages in the specified namespaces. Separate multiple namespace numbers or names with commas.

Examples:

-ns:0,2,4
-ns:Help,MediaWiki

You may prefix the value with “not” to exclude the given namespaces.

Examples:

-ns:not:2,3
-ns:not:Help,File

If used with the -newpages/-random/-randomredirect/-linter generators, -namespace/ns must be provided before -newpages/-random/-randomredirect/-linter. If used with the -recentchanges generator, efficiency is improved if -namespace is provided before -recentchanges.

If used with the -start generator, -namespace/ns must contain only one value.
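The shape of a -ns value, including the “not:” prefix, can be sketched as follows. parse_ns_arg is hypothetical and only splits the string; resolving names such as Help to namespace numbers requires site data and is deliberately left out.

```python
from __future__ import annotations


def parse_ns_arg(value: str) -> tuple[bool, list[str]]:
    """Split a -ns value into (exclude, identifiers).

    Hypothetical helper: a leading 'not:' switches to exclusion.
    Mapping names like 'Help' to namespace numbers is not done here.
    """
    exclude = value.startswith('not:')
    if exclude:
        value = value[len('not:'):]
    identifiers = [part.strip() for part in value.split(',') if part.strip()]
    return exclude, identifiers
```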

-onlyif

A claim that the item page must contain; otherwise the item won’t be returned. The format is property=value,qualifier=value. Multiple (or no) qualifiers can be passed, separated by commas.

Examples:

P1=Q2 (property P1 must contain value Q2),
P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and
qualifiers: P5 with value Q6 and P6 with value Q7).

The value can be a page ID, a coordinate in the format latitude,longitude[,precision] (all values in decimal degrees), a year, or a plain string.

The argument can be provided multiple times, and the item page will be returned only if all claims are present. Argument can also be given as “-onlyif:expression”.
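The property=value,qualifier=value format can be illustrated by a small parser. parse_claim is hypothetical and handles only plain values; coordinate values, which themselves contain commas, are not supported by this sketch.

```python
def parse_claim(expr: str):
    """Split 'P3=Q4,P5=Q6,P6=Q7' into (prop, value, qualifiers) (sketch).

    The first pair is the main claim; any further pairs are qualifiers.
    Coordinate values (which contain commas) are not handled.
    """
    first, *rest = expr.split(',')
    prop, value = first.split('=', 1)
    qualifiers = dict(q.split('=', 1) for q in rest)
    return prop, value, qualifiers
```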

-onlyifnot

A claim that the item page must not contain; otherwise the item won’t be returned. For usage and examples, see -onlyif above.

-ql

Filter pages based on page quality. This is only applicable if the content model is ‘proofread-page’; otherwise it has no effect. Valid values are in the range 0-4. Multiple values can be comma-separated.

-redirect

Filter pages based on whether they are redirects. To return only pages that are not redirects, use -redirect:false

-subpage

-subpage:n filters pages to only those with a subpage depth of at most n, i.e. a depth of 0 filters out all pages that are subpages, and a depth of 1 filters out all pages that are subpages of subpages.
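The depth rule can be sketched on plain title strings. This assumes depth is simply the number of “/” separators in the title; it ignores namespace prefixes and the fact that subpages are only enabled in some namespaces, so subpage_depth and subpage_filter are hypothetical simplifications.

```python
def subpage_depth(title: str) -> int:
    """Count '/' separators in a title (sketch; namespaces are ignored)."""
    return title.count('/')


def subpage_filter(titles, max_depth: int) -> list:
    """Keep titles whose subpage depth is at most max_depth."""
    return [t for t in titles if subpage_depth(t) <= max_depth]
```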

-titleregex

A regular expression that must match the page title; otherwise the page won’t be returned. Multiple -titleregex:regexpr options can be provided, and the page will be returned if its title matches any of them. Case-insensitive regular expressions are used, and dot matches any character.

-titleregexnot

Like -titleregex, but return the page only if the regular expression does not match.

pagegenerators.AllpagesPageGenerator(start='!', namespace=0, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)

Iterate Page objects for all titles in a single namespace.

Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.

Parameters:
  • start (str) – if provided, only generate pages >= this title lexically

  • namespace (SingleNamespaceType) – Namespace to retrieve pages from

  • includeredirects (Literal['only'] | bool) – If False, redirects are not included. If equals the string ‘only’, only redirects are added. Otherwise redirects will be included. This parameter is deprecated; use filterredir instead.

  • site (BaseSite | None) – Site for generator results.

  • total (int | None) – Maximum number of pages to retrieve in total

  • content (bool) – If True, load current version of each page (default False)

  • filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).

Returns:

a generator that yields Page objects

Raises:

ValueError – both filterredir and includeredirects parameters were given. Use filterredir only.

Return type:

Iterable[pywikibot.page.Page]
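The relationship between the deprecated includeredirects values and filterredir, as described in the parameter list above, can be summarised in a standalone sketch. resolve_filterredir and the _UNSET sentinel are hypothetical; they only encode the documented mapping (False means non-redirects only, 'only' means redirects only, True means both) and the ValueError when both parameters are passed.

```python
_UNSET = object()  # sentinel: includeredirects was not passed explicitly


def resolve_filterredir(includeredirects=_UNSET, filterredir=None):
    """Map the deprecated includeredirects value to filterredir (sketch)."""
    if includeredirects is _UNSET:
        return filterredir
    if filterredir is not None:
        raise ValueError('filterredir as well as includeredirects given; '
                         'use filterredir only')
    if includeredirects == 'only':
        return True   # only redirects
    if includeredirects is False:
        return False  # only non-redirects
    return None       # both redirects and non-redirects
```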

pagegenerators.AncientPagesPageGenerator(total=100, site=None)

Ancient page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.CategorizedPageGenerator(category, recurse=False, start=None, total=None, content=False, namespaces=None)

Yield all pages in a specific category.

Parameters:
  • recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)

  • start (str | None) – if provided, only generate pages >= this title lexically

  • total (int | None) – iterate no more than this number of pages in total (at all levels)

  • content (bool) – if True, retrieve the content of the current version of each page (default False)

  • category (pywikibot.page.Category)

  • namespaces (NamespaceArgType)

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.CategoryFilterPageGenerator(generator, category_list)

Wrap a generator to filter pages by categories specified.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • category_list (Sequence[Category]) – categories used to filter generated pages

Return type:

Generator[BasePage, None, None]

pagegenerators.DayPageGenerator(start_month=1, end_month=12, site=None, year=2000)

Day page generator.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • year (int) – year used to take leap years into account (determines whether February 29 is included).

  • start_month (int)

  • end_month (int)

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.DeadendPagesPageGenerator(total=100, site=None)

Dead-end page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.DequePreloadingGenerator(generator, groupsize=50, quiet=False)

Preload generator of type DequeGenerator.

Parameters:
  • generator (DequeGenerator) – pages to iterate over

  • groupsize (int) – how many pages to preload at once

  • quiet (bool) – If False (default), show the “Retrieving pages” message

Return type:

Generator[Page, None, None]

pagegenerators.EdittimeFilterPageGenerator(generator, last_edit_start=None, last_edit_end=None, first_edit_start=None, first_edit_end=None, show_filtered=False)

Wrap a generator to filter pages outside last or first edit range.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • last_edit_start (datetime | None) – Only yield pages last edited after this time

  • last_edit_end (datetime | None) – Only yield pages last edited before this time

  • first_edit_start (datetime | None) – Only yield pages first edited after this time

  • first_edit_end (datetime | None) – Only yield pages first edited before this time

  • show_filtered (bool) – Output a message for each page not yielded

Return type:

Generator[BasePage, None, None]
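The edit-time filtering described above can be sketched on plain tuples instead of Page objects. edittime_filter is hypothetical; it assumes each item is (title, first_edit, last_edit), that a bound of None leaves that side of the range open, and that boundary timestamps are kept. Whether the real generator treats the bounds as inclusive is not stated here.

```python
from __future__ import annotations

from datetime import datetime


def edittime_filter(pages, last_edit_start=None, last_edit_end=None,
                    first_edit_start=None, first_edit_end=None):
    """Yield (title, first_edit, last_edit) tuples inside the given ranges.

    Sketch using tuples instead of Page objects; a bound of None means
    that side of the range is open.
    """
    for title, first_edit, last_edit in pages:
        if last_edit_start is not None and last_edit < last_edit_start:
            continue
        if last_edit_end is not None and last_edit > last_edit_end:
            continue
        if first_edit_start is not None and first_edit < first_edit_start:
            continue
        if first_edit_end is not None and first_edit > first_edit_end:
            continue
        yield title, first_edit, last_edit
```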

pagegenerators.FileLinksGenerator(referredFilePage, total=None, content=False)

Yield pages on which the file referredFilePage is displayed.

Parameters:
  • referredFilePage (FilePage)

  • total (int | None)

  • content (bool)

Return type:

Iterable[Page]

class pagegenerators.GeneratorFactory(site=None, positional_arg_name=None, enabled_options=None, disabled_options=None)

Bases: object

Process command line arguments and return appropriate page generator.

This factory is responsible for processing command line arguments that are used by many scripts and that determine which pages to work on.

Note

GeneratorFactory must be instantiated after global arguments are parsed, unless the site parameter is given.

Parameters:
  • site (BaseSite | None) – Site for generator results

  • positional_arg_name (str | None) – generator to use for positional args, which do not begin with a hyphen

  • enabled_options (Iterable[str] | None) – only enable options given by this Iterable. This takes priority over disabled_options

  • disabled_options (Iterable[str] | None) – disable these given options and let them be handled by scripts options handler

getCategory(category)

Return Category and start as defined by category.

Parameters:

category (str) – category name with start parameter

Return type:

tuple[Category, str | None]

getCategoryGen(category, recurse=False, content=False, gen_func=None)

Return generator based on Category defined by category and gen_func.

Parameters:
  • category (str) – category name with start parameter

  • recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)

  • content (bool) – if True, retrieve the content of the current version of each page (default False)

  • gen_func (Callable | None)

Return type:

Any

getCombinedGenerator(gen=None, preload=False)

Return the combination of all accumulated generators.

Only call this after all arguments have been parsed.

Changed in version 7.3: set the instance variable is_preloading to True or False.

Changed in version 8.0: if the limit option is set and multiple generators are given, pages are yielded in a round-robin way.

Parameters:
  • gen (OPT_GENERATOR_TYPE) – Another generator to be combined with

  • preload (bool) – preload pages using PreloadingGenerator unless self.nopreload is True

Return type:

OPT_GENERATOR_TYPE

handle_arg(arg)

Parse one argument at a time.

If it is recognized as an argument that specifies a generator, a generator is created and added to the accumulation list, and the function returns True. Otherwise, it returns False, so that the caller can try parsing the argument. Call getCombinedGenerator() after all arguments have been parsed to get the final output generator.

Added in version 6.0: renamed from handleArg

Parameters:

arg (str) – Pywikibot argument consisting of -name:value

Returns:

True if the argument supplied was recognised by the factory

Return type:

bool

handle_args(args)

Handle command line arguments and return the rest as a list.

Added in version 6.0.

Changed in version 7.3: Prioritize -namespaces options to solve problems with several generators like -newpages/-random/-randomredirect/-linter

Parameters:

args (Iterable[str])

Return type:

list[str]

is_preloading: bool | None

Return whether Page objects are preloaded. You may use this instance variable after getCombinedGenerator() has been called, e.g.:

from pywikibot.pagegenerators import GeneratorFactory

gen_factory = GeneratorFactory()
print(gen_factory.is_preloading)  # None
gen = gen_factory.getCombinedGenerator()
print(gen_factory.is_preloading)  # True or False

Before that call, the value is undefined and is None.

Added in version 7.3.

property namespaces: frozenset[Namespace]

List of Namespace parameters.

Converts int or string namespaces to Namespace objects and changes the storage to immutable once it has been accessed.

The resolving and validation of namespace command line arguments is performed in this method, as it depends on the site property which is lazy loaded to avoid being cached before the global arguments are handled.

Returns:

namespaces selected using arguments

Raises:
  • KeyError – a namespace identifier was not resolved

  • TypeError – a namespace identifier has an inappropriate type such as NoneType or bool

property site: BaseSite

Generator site.

The generator site should not be accessed until after the global arguments have been handled, otherwise the default Site may be changed by global arguments, which will cause this cached value to be stale.

Returns:

Site given to initializer, otherwise the default Site at the time this property is first accessed.

class pagegenerators.GoogleSearchPageGenerator(query=None, site=None)

Bases: GeneratorWrapper

Page generator using Google search results.

To use this generator, you need to install the package ‘google’:

https://pypi.org/project/google

This package has been available since 2010, hosted on GitHub since 2012, and provided by PyPI since 2013.

As there are concerns about Google’s Terms of Service, this generator prints a warning for each query.

Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • query (str | None)

property generator: Generator[Page, None, None]

Yield results from queryGoogle() query.

Google results contain links in the format: https://de.wikipedia.org/wiki/en:Foobar

Changed in version 7.6: changed from iterator method to generator property

static queryGoogle(query)

Perform a query using python package ‘google’.

The terms of service as of June 2014 give two conditions that may apply to use of search:

  1. Don’t access [Google Services] using a method other than the interface and the instructions that [they] provide.

  2. Don’t remove, obscure, or alter any legal notices displayed in or along with [Google] Services.

Both of those issues should be managed by the package ‘google’; however, Pywikibot will at least ensure the user sees the TOS in order to comply with the second condition.

Parameters:

query (str)

Return type:

Generator[str, None, None]

pagegenerators.ImagesPageGenerator(pageWithImages, total=None, content=False)

Yield FilePages displayed on pageWithImages.

Parameters:
  • pageWithImages (Page)

  • total (int | None)

  • content (bool)

Return type:

Iterable[Page]

pagegenerators.InterwikiPageGenerator(page)

Iterate over all interwiki (non-language) links on a page.

Parameters:

page (Page)

Return type:

Generator[Page, None, None]

pagegenerators.ItemClaimFilterPageGenerator(generator, prop, claim, qualifiers=None, negate=False)

Yield all ItemPages which contain a certain claim in a property.

Parameters:
  • prop (str) – property id to check

  • claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).

  • qualifiers (dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant

  • negate (bool) – True if pages that do not contain the specified claim should be yielded, False otherwise

  • generator (Iterable[WikibasePage])

Return type:

Generator[WikibasePage, None, None]

pagegenerators.LanguageLinksPageGenerator(page, total=None)

Iterate over all interwiki language links on a page.

Parameters:
  • page (Page)

  • total (int | None)

Return type:

Generator[Page, None, None]

pagegenerators.LinkedPageGenerator(linkingPage, total=None, content=False)

Yield all pages linked from a specific page.

See page.BasePage.linkedPages for details.

Parameters:
  • linkingPage (Page) – the page that links to the pages we want

  • total (int | None) – the total number of pages to iterate

  • content (bool) – if True, retrieve the current content of each linked page

Returns:

a generator that yields Page objects of pages linked from linkingPage

Return type:

Iterable[BasePage]

pagegenerators.LinksearchPageGenerator(url, namespaces=None, total=None, site=None, protocol=None)

Yield all pages that link to a certain URL.

Parameters:
  • url (str) – The URL to search for (with or without the protocol prefix); this may include a ‘*’ as a wildcard, only at the start of the hostname

  • namespaces (NamespaceArgType) – list of namespace numbers to search in

  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results

  • protocol (str | None) – Protocol to search for, typically http or https (default: http). The full list is shown on the Special:LinkSearch wiki page.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.LiveRCPageGenerator(site=None, total=None)

Yield pages from the recent changes (RC) event stream.

Generates pages based on the EventStreams Server-Sent-Event (SSE) recent changes stream. The Page objects will have an extra property ._rcinfo containing the literal rc data. This can be used, e.g., to filter only new pages. See pywikibot.comms.eventstreams.rc_listener for details on the ._rcinfo format.

Parameters:
  • site (BaseSite | None) – site to return recent changes for

  • total (int | None) – the maximum number of changes to return

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.LogeventsPageGenerator(logtype=None, user=None, site=None, namespace=None, total=None, start=None, end=None, reverse=False)

Generate Pages for specified modes of logevents.

Parameters:
  • logtype (str | None) – Mode of logs to retrieve

  • user (str | None) – User of logs retrieved

  • site (BaseSite | None) – Site for generator results

  • namespace (SingleNamespaceType | None) – Namespace to retrieve logs from

  • total (int | None) – Maximum number of pages to retrieve in total

  • start (Timestamp | None) – Timestamp to start listing from

  • end (Timestamp | None) – Timestamp to end listing at

  • reverse (bool) – if True, start with oldest changes (default: newest)

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.LonelyPagesPageGenerator(total=None, site=None)

Lonely page generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.LongPagesPageGenerator(total=100, site=None)

Long page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.MySQLPageGenerator(query, site=None, verbose=None)

Yield a list of pages based on a MySQL query.

The query should return two columns, page namespace and page title, from some table. An example query that yields all ns0 pages might look like:

SELECT
 page_namespace,
 page_title
FROM page
WHERE page_namespace = 0;

See also

MySQL

Parameters:
  • query (str) – MySQL query to execute

  • site (BaseSite | None) – Site object

  • verbose (bool | None) – if True, print query to be executed; if None, config.verbose_output will be used.

Returns:

generator which yields pywikibot.Page

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.NamespaceFilterPageGenerator(generator, namespaces, site=None)

A generator yielding pages from another generator in given namespaces.

If a site is provided, the namespaces are validated using the namespaces of that site, otherwise the namespaces are validated using the default site.

Note

API-based generators that have a “namespaces” parameter perform namespace filtering more efficiently than this generator.

Parameters:
  • namespaces (frozenset[Namespace] | str | Namespace | Sequence[str | Namespace]) – list of namespace identifiers to limit results

  • site (BaseSite | None) – Site for generator results; mandatory if namespaces contains namespace names. Defaults to the default site.

  • generator (Iterable[pywikibot.page.BasePage])

Raises:
  • KeyError – a namespace identifier was not resolved

  • TypeError – a namespace identifier has an inappropriate type such as NoneType or bool, or more than one namespace if the API module does not support multiple namespaces

Return type:

Generator[pywikibot.page.BasePage, None, None]

pagegenerators.NewimagesPageGenerator(total=None, site=None)

New file generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.NewpagesPageGenerator(site=None, namespaces=(0,), total=None)

Iterate Page objects for all new titles in a single namespace.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • namespaces (NamespaceArgType) – namespace to retrieve pages from

  • total (int | None) – Maximum number of pages to retrieve in total

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.PageClassGenerator(generator)

Yield pages from another generator as Page subclass objects.

The page class type depends on the page namespace. Objects may be Category, FilePage, User or Page.

Parameters:

generator (Iterable[Page])

Return type:

Generator[Page, None, None]

class pagegenerators.PagePilePageGenerator(id)

Bases: GeneratorWrapper

Queries PagePile to generate pages.

Added in version 9.0.

Parameters:

id (int) – The PagePile id to query

buildQuery(id)

Get the querystring options to query PagePile.

Parameters:

id (int) – the PagePile id

Returns:

Dictionary of querystring parameters to use in the query

property generator: Generator[Page, None, None]

Yield results from query().

query()

Query PagePile.

Raises:
  • ServerError – Either ReadTimeout or server status error

  • APIError – error response from PagePile

Return type:

Generator[str, None, None]

pagegenerators.PageTitleFilterPageGenerator(generator, ignore_list)

Yield only those pages that are not listed in the ignore list.

Parameters:
  • ignore_list (dict[str, dict[str, str]]) – family names are mapped to dictionaries in which language codes are mapped to lists of page titles. Each title must be a valid regex as they are compared using re.search.

  • generator (Iterable[BasePage])

Return type:

Generator[BasePage, None, None]
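The nested ignore_list structure (family, then language code, then title patterns tested with re.search) can be sketched on (family, lang, title) tuples. title_filter is hypothetical and stands in for Page objects; it only mirrors the lookup described above.

```python
import re


def title_filter(pages, ignore_list):
    """Yield (family, lang, title) tuples not matched by ignore_list.

    Sketch: ignore_list maps family name -> language code -> list of
    regular expressions, each tested against the title with re.search.
    """
    for family, lang, title in pages:
        patterns = ignore_list.get(family, {}).get(lang, [])
        if not any(re.search(p, title) for p in patterns):
            yield family, lang, title
```

Since re.search is used, an unanchored pattern such as Test also ignores titles that merely contain it; anchor with ^ and $ for exact matches.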

pagegenerators.PageWithTalkPageGenerator(generator, return_talk_only=False)

Yield pages and associated talk pages from another generator.

Only yields talk pages if the original generator yields a non-talk page, and does not check if the talk page in fact exists.

Parameters:
  • generator (Iterable[BasePage])

  • return_talk_only (bool)

Return type:

Generator[BasePage, None, None]

pagegenerators.PagesFromPageidGenerator(pageids, site=None)[source]#

Return a page generator from pageids.

Pages are iterated in the same order as in the underlying pageids. Duplicate pageids are filtered out, so each page is returned only once.

Parameters:
  • pageids (Iterable[str]) – an iterable that yields pageids, or a comma-separated string of pageids (e.g. ‘945097,1483753,956608’)

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]
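The input handling described above, accepting either an iterable or a comma-separated string and dropping duplicates while keeping the original order, can be sketched as follows. `unique_pageids` is a hypothetical helper; the real generator additionally loads the pages from the site.

```python
from __future__ import annotations

from collections.abc import Iterable


def unique_pageids(pageids: str | Iterable[str]) -> list[str]:
    """Hypothetical helper: normalise pageids and drop duplicates."""
    if isinstance(pageids, str):
        # A comma-separated string such as '945097,1483753,956608'
        pageids = pageids.split(',')
    seen: set[str] = set()
    result: list[str] = []
    for pageid in pageids:
        pageid = str(pageid).strip()
        if pageid and pageid not in seen:  # keep the first occurrence only
            seen.add(pageid)
            result.append(pageid)
    return result


print(unique_pageids('945097,1483753,945097'))  # ['945097', '1483753']
```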

pagegenerators.PagesFromTitlesGenerator(iterable, site=None)[source]#

Generate pages from the titles (strings) yielded by iterable.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • iterable (Iterable[str])

Return type:

Generator[pywikibot.page.Page, None, None]

class pagegenerators.PetScanPageGenerator(categories, subset_combination=True, namespaces=None, site=None, extra_options=None)[source]#

Bases: GeneratorWrapper

Queries PetScan to generate pages.

Added in version 3.0.

Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper

Parameters:
  • categories (Sequence[str]) – List of category names to retrieve pages from

  • subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories

  • namespaces (Iterable[int | pywikibot.site.Namespace] | None) – List of namespaces to search in (default is None, meaning all namespaces)

  • site (BaseSite | None) – Site to operate on (default is the default site from the user config)

  • extra_options (dict[Any, Any] | None) – Dictionary of extra options to use (optional)

buildQuery(categories, subset_combination, namespaces, extra_options)[source]#

Get the querystring options to query PetScan.

Parameters:
  • categories (Sequence[str]) – List of categories (as strings)

  • subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories

  • namespaces (Iterable[int | Namespace] | None) – List of namespaces to search in

  • extra_options (dict[Any, Any] | None) – Dictionary of extra options to use

Returns:

Dictionary of querystring parameters to use in the query

Return type:

dict[str, Any]

property generator: Generator[Page, None, None]#

Yield results from query().

Changed in version 7.6: changed from iterator method to generator property

query()[source]#

Query PetScan.

Changed in version 7.4: raises APIError if query returns an error message.

Raises:
  • ServerError – Either ReadTimeout or server status error

  • APIError – error response from petscan

Return type:

Generator[dict[str, Any], None, None]
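The effect of subset_combination can be pictured with plain Python sets. In this sketch, `combine_results` is a hypothetical helper applied to made-up per-category title sets; in the generator itself the combination mode is sent to PetScan as part of the query built by buildQuery().

```python
from __future__ import annotations


def combine_results(results: list[set[str]],
                    subset_combination: bool = True) -> set[str]:
    """Hypothetical helper: intersect or unite per-category title sets."""
    if not results:
        return set()
    if subset_combination:
        # Keep only titles that appear in every category.
        return set.intersection(*results)
    # Keep titles that appear in at least one category.
    return set.union(*results)


categories = [{'A', 'B'}, {'B', 'C'}]
print(sorted(combine_results(categories)))         # ['B']
print(sorted(combine_results(categories, False)))  # ['A', 'B', 'C']
```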

pagegenerators.PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#

Prefixed Page generator.

Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.

Parameters:
  • prefix (str) – The prefix of the pages.

  • namespace (SingleNamespaceType | None) – Namespace to retrieve pages from

  • includeredirects (Literal['only'] | bool) – If False, redirects are not included. If it equals the string ‘only’, only redirects are added. Otherwise, redirects are included. This parameter is deprecated; use filterredir instead.

  • site (BaseSite | None) – Site for generator results.

  • total (int | None) – Maximum number of pages to retrieve in total

  • content (bool) – If True, load current version of each page (default False)

  • filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).

Returns:

a generator that yields Page objects

Raises:

ValueError – filterredir and includeredirects parameters were both given. Use filterredir only.

Return type:

Iterable[pywikibot.page.Page]
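The tri-state filterredir parameter can be modelled as follows. `filter_by_redirect` is a hypothetical helper that works on (title, is_redirect) pairs instead of Page objects; it only illustrates the None/True/False semantics documented above.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def filter_by_redirect(pages: Iterable[tuple[str, bool]],
                       filterredir: bool | None = None) -> Iterator[str]:
    """Hypothetical helper: yield titles from (title, is_redirect) pairs.

    None keeps everything, True keeps only redirects and
    False keeps only non-redirects.
    """
    for title, is_redirect in pages:
        if filterredir is None or is_redirect == filterredir:
            yield title


pages = [('Alpha', False), ('Beta', True)]
print(list(filter_by_redirect(pages, True)))  # ['Beta']
```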

pagegenerators.PreloadingEntityGenerator(generator, groupsize=50)[source]#

Yield preloaded pages taken from another generator.

This function is essentially PreloadingGenerator adapted for Wikibase entities.

Parameters:
  • generator (Iterable[WikibaseEntity]) – pages to iterate over

  • groupsize (int) – how many pages to preload at once

Return type:

Generator[WikibaseEntity, None, None]

pagegenerators.PreloadingGenerator(generator, groupsize=50, quiet=False)[source]#

Yield preloaded pages taken from another generator.

Parameters:
  • generator (Iterable[Page]) – pages to iterate over

  • groupsize (int) – how many pages to preload at once

  • quiet (bool) – If False (default), show the “Retrieving pages” message

Return type:

Generator[Page, None, None]
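The effect of groupsize can be sketched with itertools.islice: the source is consumed in groups, and each group is loaded with a single call before its items are yielded. This is a simplified model, not the library's implementation; `preload_in_groups` and the `load_batch` callback (standing in for the batched API request) are hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar('T')


def preload_in_groups(items: Iterable[T],
                      load_batch: Callable[[list[T]], None],
                      groupsize: int = 50) -> Iterator[T]:
    """Hypothetical helper: call load_batch once per group, then yield."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, groupsize))
        if not batch:
            return  # source exhausted
        load_batch(batch)  # e.g. one API request for the whole group
        yield from batch
```

Batching trades a little latency on the first item for far fewer round trips when iterating many pages.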

pagegenerators.QualityFilterPageGenerator(generator, quality)[source]#

Wrap a generator to filter pages according to quality levels.

This is possible only for pages with content_model ‘proofread-page’. In all other cases, no filter is applied.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • quality (list[int]) – proofread-page quality levels (valid range 0-4)

Return type:

Generator[BasePage, None, None]

pagegenerators.RandomPageGenerator(total=None, site=None, namespaces=None)[source]#

Random page generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

  • namespaces (NamespaceArgType)

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.RandomRedirectPageGenerator(total=None, site=None, namespaces=None)[source]#

Random redirect generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

  • namespaces (NamespaceArgType)

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.RecentChangesPageGenerator(site=None, _filter_unique=None, **kwargs)[source]#

Generate recent changes pages, including duplicates.

For keyword parameters, refer to APISite.recentchanges().

Changed in version 8.2: The YieldType depends on namespace. It can be pywikibot.Page, pywikibot.User, pywikibot.FilePage or pywikibot.Category.

Changed in version 9.4: Ignore pywikibot.FilePage if it raises a ValueError during upcast e.g. due to an invalid file extension.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • _filter_unique (None | Callable[[Iterable[Page]], Iterable[Page]])

  • kwargs (Any)

Return type:

Generator[Page, None, None]

pagegenerators.RedirectFilterPageGenerator(generator, no_redirects=True, show_filtered=False)[source]#

Yield pages from another generator that are redirects or not.

Parameters:
  • no_redirects (bool) – Exclude redirects if True, else only include redirects.

  • show_filtered (bool) – Output a message for each page not yielded

  • generator (Iterable[BasePage])

Return type:

Generator[BasePage, None, None]

pagegenerators.RegexBodyFilterPageGenerator(generator, regex, quantifier='any')#

Yield pages from another generator whose body matches regex.

Uses regex option re.IGNORECASE depending on the quantifier parameter.

For parameters, see RegexFilterPageGenerator().

Parameters:
  • generator (Iterable[pywikibot.page.BasePage])

  • regex (PATTERN_STR_OR_SEQ_TYPE)

  • quantifier (str)

Return type:

Generator[pywikibot.page.BasePage, None, None]

pagegenerators.RegexFilterPageGenerator(generator, regex, quantifier='any', ignore_namespace=True)#

Yield pages from another generator whose title matches regex.

Uses regex option re.IGNORECASE depending on the quantifier parameter.

If ignore_namespace is False, the whole page title is compared.

Note

To check for a match at the beginning of the title, start the regex with “^”.

Parameters:
  • generator (Iterable[pywikibot.page.BasePage]) – another generator

  • regex (PATTERN_STR_OR_SEQ_TYPE) – a regex which should match the page title

  • quantifier (str) – must be one of the following values: ‘all’ yields the page if the title is matched by all regexes; ‘any’ yields the page if the title is matched by any regex; ‘none’ yields the page if the title is NOT matched by any regex

  • ignore_namespace (bool) – ignore the namespace when matching the title

Returns:

pages selected according to the matching parameters

Return type:

Generator[pywikibot.page.BasePage, None, None]
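The three quantifier modes can be modelled in plain Python. This sketch filters plain title strings instead of Page objects, compiles every pattern with re.IGNORECASE, and uses re.search, which matches anywhere in the string; that is why a leading “^” is needed to anchor a match at the start of the title. `title_filter` is a hypothetical helper.

```python
import re
from collections.abc import Iterable, Iterator


def title_filter(titles: Iterable[str], regexes: list[str],
                 quantifier: str = 'any') -> Iterator[str]:
    """Hypothetical helper: yield titles selected by the quantifier."""
    compiled = [re.compile(pattern, re.IGNORECASE) for pattern in regexes]
    for title in titles:
        # re.search matches anywhere; anchor with '^' for prefixes.
        hits = [bool(rx.search(title)) for rx in compiled]
        if quantifier == 'all':
            keep = all(hits)
        elif quantifier == 'any':
            keep = any(hits)
        elif quantifier == 'none':
            keep = not any(hits)
        else:
            raise ValueError(f'unknown quantifier: {quantifier!r}')
        if keep:
            yield title


titles = ['Apple pie', 'Banana bread', 'apple tart']
print(list(title_filter(titles, ['^apple'])))  # ['Apple pie', 'apple tart']
```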

pagegenerators.RepeatingGenerator(generator, key_func=<function <lambda>>, sleep_duration=60, total=None, **kwargs)[source]#

Yield items in real time.

The provided generator must support the parameters ‘start’, ‘end’, ‘reverse’ and ‘total’, as site.recentchanges() and site.logevents() do.

To fetch revisions in recentchanges in real time:

gen = RepeatingGenerator(site.recentchanges, lambda x: x['revid'])

To fetch new pages in real time:

gen = RepeatingGenerator(site.newpages, lambda x: x[0])

Note that other parameters not listed below are passed to the generator function. The parameters ‘reverse’, ‘start’ and ‘end’ are always discarded to prevent the generator from yielding items in the wrong order.

Parameters:
  • generator (Callable[[...], Iterable[BasePage]]) – a function returning a generator that will be queried

  • key_func (Callable[[BasePage], Any]) – a function returning key that will be used to detect duplicate entry

  • sleep_duration (int) – time to wait between queries, in seconds

  • total (int | None) – if it is a positive number, iterate no more than this number of items in total. Otherwise, iterate forever

  • kwargs (Any)

Returns:

a generator yielding items in ascending order by time

Return type:

Generator[Page, None, None]
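The polling loop can be sketched as follows. This is a simplified model, not pywikibot's code: `repeating` and `fetch` are hypothetical, with `fetch` standing for a zero-argument callable that returns the latest items newest first (start, end and reverse having been removed, as noted above). `key_func` detects items that were already seen, and each batch of new items is reversed so that items come out in ascending time order.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable, Iterable, Iterator
from typing import Any


def repeating(fetch: Callable[[], Iterable[Any]],
              key_func: Callable[[Any], Hashable],
              sleep_duration: float = 60,
              total: int | None = None) -> Iterator[Any]:
    """Hypothetical helper: poll fetch() and yield unseen items."""
    seen: set[Hashable] = set()
    yielded = 0
    while total is None or yielded < total:
        fresh = []
        for item in fetch():  # assumed to return newest items first
            key = key_func(item)
            if key in seen:
                break  # everything older has been handled already
            seen.add(key)
            fresh.append(item)
        for item in reversed(fresh):  # ascending order by time
            yield item
            yielded += 1
            if total is not None and yielded >= total:
                return
        time.sleep(sleep_duration)
```

With total=None the loop never ends, mirroring the "iterate forever" behavior described for the total parameter.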

pagegenerators.SearchPageGenerator(query, total=None, namespaces=None, site=None)[source]#

Yield pages from the MediaWiki internal search engine.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

  • query (str)

  • namespaces (NamespaceArgType)

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.ShortPagesPageGenerator(total=100, site=None)[source]#

Short page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.SubCategoriesPageGenerator(category, recurse=False, start=None, total=None, content=False)[source]#

Yield all subcategories in a specific category.

Parameters:
  • recurse (int | bool) – if not False or 0, also iterate subcategories of subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate direct subcategories and first-level sub-subcategories, but no deeper.)

  • start (str | None) – if provided, only generate pages >= this title lexically

  • total (int | None) – iterate no more than this number of pages in total (at all levels)

  • content (bool) – if True, retrieve the content of the current version of each page (default False)

  • category (Category)

Return type:

Generator[Page, None, None]

pagegenerators.SubpageFilterGenerator(generator, max_depth=0, show_filtered=False)[source]#

Generator which filters out subpages based on depth.

It looks at the namespace of each page and checks whether that namespace has subpages enabled. If so, each forward slash (‘/’) in the title counts as one subpage level, and pages nested deeper than max_depth are excluded.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • max_depth (int) – Max depth of subpages to yield, at least zero

  • show_filtered (bool) – Output a message for each page not yielded

Return type:

Generator[BasePage, None, None]
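A minimal sketch of the depth rule, assuming that each forward slash adds one subpage level when the namespace supports subpages and that depth is always zero otherwise. `subpage_depth` and `filter_subpages` are hypothetical helpers working on plain titles; the real generator uses the page objects instead.

```python
from collections.abc import Iterable, Iterator


def subpage_depth(title: str, subpages_enabled: bool = True) -> int:
    """Hypothetical helper: count subpage levels in a title."""
    # Without subpage support, '/' is an ordinary title character.
    return title.count('/') if subpages_enabled else 0


def filter_subpages(titles: Iterable[str], max_depth: int = 0,
                    subpages_enabled: bool = True) -> Iterator[str]:
    """Hypothetical helper: yield titles not deeper than max_depth."""
    if max_depth < 0:
        raise ValueError('max_depth must be at least zero')
    for title in titles:
        if subpage_depth(title, subpages_enabled) <= max_depth:
            yield title


titles = ['Main', 'Main/Sub', 'Main/Sub/Deep']
print(list(filter_subpages(titles, max_depth=1)))  # ['Main', 'Main/Sub']
```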

pagegenerators.SupersetPageGenerator(query, site=None, schema_name=None, database_id=None)[source]#

Generate pages that result from the given SQL query.

The site of each generated page is determined in the following order of precedence:

  1. site retrieved using page_wikidb column in SQL result

  2. site as parameter

  3. site retrieved using schema_name

The following SQL columns are used:

  • page_id

  • page_namespace + page_title

  • page_wikidb

Example SQL queries:

SELECT
    gil_wiki AS page_wikidb,
    gil_page AS page_id
FROM globalimagelinks
GROUP BY gil_wiki
LIMIT 10

OR

SELECT
    page_id
FROM page
LIMIT 10

OR

SELECT
    page_namespace,
    page_title
FROM page
LIMIT 10

Added in version 9.2.

Parameters:
  • query (str) – the SQL query string.

  • site (BaseSite | None) – Site for generator results.

  • schema_name (str | None) – target superset schema name

  • database_id (int | None) – target superset database id

Return type:

Iterator[pywikibot.page.Page]

pagegenerators.TextIOPageGenerator(source=None, site=None)[source]#

Iterate pages from a list in a text file or on a webpage.

The text source must contain page links enclosed in double square brackets or, alternatively, page titles separated by newlines. The generator yields the corresponding Page object for each one.

Parameters:
  • source (str | None) – the file path or URL that should be read. If no name is given, the generator prompts the user.

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]
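The two accepted input formats can be illustrated with a small parser. This is a sketch under stated assumptions, not the generator's own code: it treats the text as bracketed links if it finds any, and otherwise as one title per non-empty line; the link pattern and the handling of piped labels are simplifications. `titles_from_text` is a hypothetical helper.

```python
import re

# Simplified pattern: [[Title]] or [[Title|label]], no nested brackets.
LINK_RE = re.compile(r'\[\[([^\[\]]+)\]\]')


def titles_from_text(text: str) -> list[str]:
    """Hypothetical helper: extract titles from links or from lines."""
    links = LINK_RE.findall(text)
    if links:
        # Drop any piped label and surrounding whitespace.
        return [link.split('|', 1)[0].strip() for link in links]
    # No links found: fall back to one title per non-empty line.
    return [line.strip() for line in text.splitlines() if line.strip()]


print(titles_from_text('See [[Foo]] and [[Bar|the bar]].'))  # ['Foo', 'Bar']
```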

pagegenerators.UnCategorizedCategoryGenerator(total=100, site=None)[source]#

Uncategorized category generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[Category]

pagegenerators.UnCategorizedImageGenerator(total=100, site=None)[source]#

Uncategorized file generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.FilePage]

pagegenerators.UnCategorizedPageGenerator(total=100, site=None)[source]#

Uncategorized page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.UnCategorizedTemplateGenerator(total=100, site=None)[source]#

Uncategorized template generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.UnconnectedPageGenerator(site=None, total=None)[source]#

Iterate Page objects for all pages that are not connected to a Wikibase repository.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.UnusedFilesGenerator(total=None, site=None)[source]#

Unused files generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.FilePage]

pagegenerators.UnwatchedPagesPageGenerator(total=None, site=None)[source]#

Unwatched page generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.UserContributionsGenerator(username, namespaces=None, site=None, total=None, _filter_unique=functools.partial(<function filter_unique>, key=<function <lambda>>))[source]#

Yield unique pages edited by user:username.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • namespaces (NamespaceArgType) – list of namespace numbers to fetch contribs from

  • site (BaseSite | None) – Site for generator results.

  • username (str)

  • _filter_unique (None | Callable[[Iterable[pywikibot.page.Page]], Iterable[pywikibot.page.Page]])

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.UserEditFilterGenerator(generator, username, timestamp=None, skip=False, max_revision_depth=None, show_filtered=False)[source]#

Generator which will yield Pages modified by username.

It only looks at the last editors given by max_revision_depth. If timestamp is set in MediaWiki format YYYYMMDDhhmmss, older edits are ignored. If skip is set, pages edited by the given user are ignored; otherwise, only pages edited by this user are yielded.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • username (str) – user name which edited the page

  • timestamp (str | datetime | None) – ignore edits which are older than this timestamp

  • skip (bool) – Ignore pages edited by the given user

  • max_revision_depth (int | None) – number of most recent revisions whose editors are checked

  • show_filtered (bool) – Output a message for each page not yielded

Return type:

Generator[BasePage, None, None]
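When timestamp is given as a string, it must use the 14-digit MediaWiki format. Such strings can be produced or parsed with the standard library, as sketched below; the helper names are hypothetical, and since MediaWiki timestamps are in UTC, the datetimes should be treated as UTC. Per the type annotation, a datetime can also be passed directly.

```python
from datetime import datetime

MW_FORMAT = '%Y%m%d%H%M%S'  # YYYYMMDDhhmmss


def to_mw_timestamp(moment: datetime) -> str:
    """Hypothetical helper: format a (UTC) datetime for MediaWiki."""
    return moment.strftime(MW_FORMAT)


def from_mw_timestamp(value: str) -> datetime:
    """Hypothetical helper: parse a 14-digit MediaWiki timestamp."""
    return datetime.strptime(value, MW_FORMAT)


print(to_mw_timestamp(datetime(2024, 3, 5, 14, 7, 9)))  # 20240305140709
```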

pagegenerators.WantedPagesPageGenerator(total=100, site=None)[source]#

Wanted page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators.WikibaseItemFilterPageGenerator(generator, has_item=True, show_filtered=False)[source]#

A wrapper generator that yields pages depending on whether they have a Wikibase item.

Parameters:
  • generator (Iterable[BasePage]) – Generator to wrap.

  • has_item (bool) – Exclude pages without an item if True, or only include pages without an item if False

  • show_filtered (bool) – Output a message for each page not yielded

Returns:

Wrapped generator

Return type:

Generator[BasePage, None, None]

pagegenerators.WikibaseItemGenerator(gen)[source]#

A wrapper generator used to yield Wikibase items of another generator.

Parameters:

gen (Iterable[Page]) – Generator to wrap.

Returns:

Wrapped generator

Return type:

Generator[ItemPage, None, None]

pagegenerators.WikibaseSearchItemPageGenerator(text, language=None, total=None, site=None)[source]#

Generate pages that contain the provided text.

Parameters:
  • text (str) – Text to look for.

  • language (str | None) – Code of the language to search in. If not specified, value from pywikibot.config.data_lang is used.

  • total (int | None) – Maximum number of pages to retrieve in total, or None in case of no limit.

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.ItemPage, None, None]

pagegenerators.WikidataPageFromItemGenerator(gen, site)[source]#

Generate pages from site based on sitelinks of item pages.

Parameters:
  • gen – item pages whose sitelinks are used to find the pages

  • site – Site for generator results.

Return type:

Generator[Page, None, None]

pagegenerators.WikidataSPARQLPageGenerator(query, site=None, item_name='item', endpoint=None, entity_url=None, result_type=<class 'set'>)[source]#

Generate pages that result from the given SPARQL query.

Parameters:
  • query (str) – the SPARQL query string.

  • site (BaseSite | None) – Site for generator results.

  • item_name (str) – name of the item in the SPARQL query

  • endpoint (str | None) – SPARQL endpoint URL

  • entity_url (str | None) – URL prefix for any entities returned in a query.

  • result_type (Any) – type of the iterable in which SPARQL results are stored (default set)

Return type:

Iterator[pywikibot.page.Page]

pagegenerators.WithoutInterwikiPageGenerator(total=None, site=None)[source]#

Page lacking interwikis generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

class pagegenerators.XMLDumpPageGenerator(filename, start=None, namespaces=None, site=None, text_predicate=None, content=False)[source]#

Bases: Iterator

XML iterator that yields Page objects.

Added in version 7.2: the content parameter

Parameters:
  • filename (str) – filename of XML dump

  • start (str | None) – skip entries below that value

  • namespaces (NamespaceArgType) – namespace filter

  • site (BaseSite | None) – current site for the generator

  • text_predicate (Callable[[str], bool] | None) – a callable that takes entry.text as its argument and returns a boolean indicating whether the generator should yield the page

  • content – If True, assign old page content to Page.text

Variables:
  • skipping – True if start parameter is given, else False

  • parser – holds the xmlreader.XmlDump parse method
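A text_predicate is any callable that takes the entry text and returns a bool. As a hypothetical example, the predicate below keeps pages whose wikitext transcludes a template whose name starts with Infobox; the pattern is an assumption for illustration and does not cover every wiki's naming conventions.

```python
import re

# Assumed pattern: a transclusion whose name starts with "Infobox".
INFOBOX_RE = re.compile(r'\{\{\s*infobox', re.IGNORECASE)


def has_infobox(text: str) -> bool:
    """Hypothetical predicate: True if the wikitext uses an infobox."""
    return bool(INFOBOX_RE.search(text))


print(has_infobox('{{Infobox person\n| name = X\n}}'))  # True
```

Such a function would be passed as text_predicate=has_infobox; the decision is based solely on the text stored in the dump.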

pagegenerators.YearPageGenerator(start=1, end=2050, site=None)[source]#

Year page generator.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • start (int)

  • end (int)

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators.page_with_property_generator(name, total=None, site=None)[source]#

Special:PagesWithProperty page generator.

Parameters:
  • name (str) – Property name of pages to be retrieved

  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._factory — Pagegenerators Options Handler#

GeneratorFactory module which handles pagegenerators options.

class pagegenerators._factory.GeneratorFactory(site=None, positional_arg_name=None, enabled_options=None, disabled_options=None)[source]#

Bases: object

Process command line arguments and return appropriate page generator.

This factory is responsible for processing command line arguments that are used by many scripts and that determine which pages to work on.

Note

GeneratorFactory must be instantiated after global arguments are parsed, unless the site parameter is given.

Parameters:
  • site (BaseSite | None) – Site for generator results

  • positional_arg_name (str | None) – generator to use for positional args, which do not begin with a hyphen

  • enabled_options (Iterable[str] | None) – only enable options given by this Iterable. This is prioritized over disabled_options

  • disabled_options (Iterable[str] | None) – disable these given options and let them be handled by scripts options handler

getCategory(category)[source]#

Return Category and start as defined by category.

Parameters:

category (str) – category name with start parameter

Return type:

tuple[Category, str | None]

getCategoryGen(category, recurse=False, content=False, gen_func=None)[source]#

Return generator based on Category defined by category and gen_func.

Parameters:
  • category (str) – category name with start parameter

  • recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)

  • content (bool) – if True, retrieve the content of the current version of each page (default False)

  • gen_func (Callable | None)

Return type:

Any

getCombinedGenerator(gen=None, preload=False)[source]#

Return the combination of all accumulated generators.

Only call this after all arguments have been parsed.

Changed in version 7.3: set the instance variable is_preloading to True or False.

Changed in version 8.0: if the limit option is set and multiple generators are given, pages are yielded in a round-robin fashion.

Parameters:
  • gen (OPT_GENERATOR_TYPE) – Another generator to be combined with

  • preload (bool) – preload pages using PreloadingGenerator unless self.nopreload is True

Return type:

OPT_GENERATOR_TYPE
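The round-robin order mentioned in the version 8.0 note can be illustrated with a small, self-contained helper. This sketches the interleaving idea only and is not the code getCombinedGenerator() uses; `roundrobin` here is a hypothetical helper (the itertools documentation offers an equivalent recipe). With a limit applied, every generator contributes items in turn instead of the first one using up the whole limit.

```python
from collections import deque
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar('T')


def roundrobin(*iterables: Iterable[T]) -> Iterator[T]:
    """Hypothetical helper: take one item from each iterable in turn."""
    queue = deque(iter(it) for it in iterables)
    while queue:
        iterator = queue.popleft()
        try:
            item = next(iterator)
        except StopIteration:
            continue  # drop the exhausted iterator
        yield item
        queue.append(iterator)  # back of the line for its next turn


# A limit of 4 still draws from all three sources.
print(list(islice(roundrobin('ABC', 'D', 'EF'), 4)))  # ['A', 'D', 'E', 'B']
```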

handle_arg(arg)[source]#

Parse one argument at a time.

If it is recognized as an argument that specifies a generator, a generator is created and added to the accumulation list, and the function returns True. Otherwise, it returns False, so that the caller can try parsing the argument itself. Call getCombinedGenerator() after all arguments have been parsed to get the final output generator.

Added in version 6.0: renamed from handleArg

Parameters:

arg (str) – Pywikibot argument consisting of -name:value

Returns:

True if the argument supplied was recognised by the factory

Return type:

bool
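To make the accept-or-decline contract of handle_arg concrete, here is a toy model. ToyFactory, its -one and -list options and its combined() method are hypothetical; the real GeneratorFactory supports many more options, and its combination rules are described under getCombinedGenerator().

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator


class ToyFactory:
    """Hypothetical, heavily simplified model of the handle_arg contract."""

    def __init__(self, sources: dict[str, Callable[[str], Iterable[str]]]):
        self.sources = sources               # option name -> generator builder
        self.gens: list[Iterable[str]] = []  # accumulation list

    def handle_arg(self, arg: str) -> bool:
        """Accept '-name:value' if name is known; otherwise decline."""
        name, _, value = arg.partition(':')
        build = self.sources.get(name)
        if build is None:
            return False  # unknown option: the caller should handle it
        self.gens.append(build(value))
        return True

    def handle_args(self, args: Iterable[str]) -> list[str]:
        """Consume known options and return the remaining ones."""
        return [arg for arg in args if not self.handle_arg(arg)]

    def combined(self) -> Iterator[str]:
        """Chain the accumulated generators (a simplification)."""
        for gen in self.gens:
            yield from gen
```

A script would pass the leftover arguments returned by handle_args() to its own option handling.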

handle_args(args)[source]#

Handle command line arguments and return the rest as a list.

Added in version 6.0.

Changed in version 7.3: Prioritize -namespaces options to solve problems with several generators like -newpages/-random/-randomredirect/-linter

Parameters:

args (Iterable[str])

Return type:

list[str]

is_preloading: bool | None#

Whether Page objects are preloaded. Use this instance variable after getCombinedGenerator() has been called, e.g.:

gen_factory = GeneratorFactory()
print(gen_factory.is_preloading)  # None
gen = gen_factory.getCombinedGenerator()
print(gen_factory.is_preloading)  # True or False

Before getCombinedGenerator() is called, the value is None.

Added in version 7.3.

property namespaces: frozenset[Namespace]#

List of Namespace parameters.

Converts int or string namespaces to Namespace objects and changes the storage to immutable once it has been accessed.

The resolving and validation of namespace command line arguments is performed in this method, as it depends on the site property which is lazy loaded to avoid being cached before the global arguments are handled.

Returns:

namespaces selected using arguments

Raises:
  • KeyError – a namespace identifier was not resolved

  • TypeError – a namespace identifier has an inappropriate type such as NoneType or bool

property site: BaseSite#

Generator site.

The generator site should not be accessed until after the global arguments have been handled, otherwise the default Site may be changed by global arguments, which will cause this cached value to be stale.

Returns:

Site given to initializer, otherwise the default Site at the time this property is first accessed.

pagegenerators._filters — Filter Functions#

Page filter generators provided by the pagegenerators module.

pagegenerators._filters.CategoryFilterPageGenerator(generator, category_list)[source]#

Wrap a generator to filter pages by categories specified.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • category_list (Sequence[Category]) – categories used to filter generated pages

Return type:

Generator[BasePage, None, None]

pagegenerators._filters.EdittimeFilterPageGenerator(generator, last_edit_start=None, last_edit_end=None, first_edit_start=None, first_edit_end=None, show_filtered=False)[source]#

Wrap a generator to filter pages outside last or first edit range.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • last_edit_start (datetime | None) – Only yield pages last edited after this time

  • last_edit_end (datetime | None) – Only yield pages last edited before this time

  • first_edit_start (datetime | None) – Only yield pages first edited after this time

  • first_edit_end (datetime | None) – Only yield pages first edited before this time

  • show_filtered (bool) – Output a message for each page not yielded

Return type:

Generator[BasePage, None, None]
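The four time bounds can be pictured as a simple window check. In this sketch, `in_edit_window` is a hypothetical helper that receives the first and last edit times directly; treating the bounds as inclusive and None as unbounded are assumptions, not documented behavior.

```python
from __future__ import annotations

from datetime import datetime


def in_edit_window(first_edit: datetime, last_edit: datetime,
                   last_edit_start: datetime | None = None,
                   last_edit_end: datetime | None = None,
                   first_edit_start: datetime | None = None,
                   first_edit_end: datetime | None = None) -> bool:
    """Hypothetical helper: True if both edit times are inside the bounds.

    Inclusive bounds and None meaning "unbounded" are assumptions.
    """
    checks = (
        (last_edit, last_edit_start, last_edit_end),
        (first_edit, first_edit_start, first_edit_end),
    )
    for moment, start, end in checks:
        if start is not None and moment < start:
            return False
        if end is not None and moment > end:
            return False
    return True
```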

class pagegenerators._filters.ItemClaimFilter[source]#

Bases: object

Item claim filter.

classmethod filter(generator, prop, claim, qualifiers=None, negate=False)[source]#

Yield all ItemPages that contain a certain claim in a property.

Parameters:
  • prop (str) – property id to check

  • claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).

  • qualifiers (dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant

  • negate (bool) – True if pages that do not contain the specified claim should be yielded, False otherwise

  • generator (Iterable[WikibasePage])

Return type:

Generator[WikibasePage, None, None]

page_classes = {False: <class 'pywikibot.page._wikibase.ItemPage'>, True: <class 'pywikibot.page._wikibase.PropertyPage'>}#

pagegenerators._filters.ItemClaimFilterPageGenerator(generator, prop, claim, qualifiers=None, negate=False)#

Yield all ItemPages that contain a certain claim in a property.

Parameters:
  • prop (str) – property id to check

  • claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).

  • qualifiers (dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant

  • negate (bool) – True if pages that do not contain the specified claim should be yielded, False otherwise

  • generator (Iterable[WikibasePage])

Return type:

Generator[WikibasePage, None, None]

pagegenerators._filters.NamespaceFilterPageGenerator(generator, namespaces, site=None)[source]#

A generator yielding pages from another generator in given namespaces.

If a site is provided, the namespaces are validated using the namespaces of that site, otherwise the namespaces are validated using the default site.

Note

API-based generators that have a “namespaces” parameter perform namespace filtering more efficiently than this generator.

Parameters:
  • namespaces (frozenset[Namespace] | str | Namespace | Sequence[str | Namespace]) – list of namespace identifiers to limit results

  • site (BaseSite | None) – Site for generator results; mandatory if namespaces contains namespace names. Defaults to the default site.

  • generator (Iterable[pywikibot.page.BasePage])

Raises:
  • KeyError – a namespace identifier was not resolved

  • TypeError – a namespace identifier has an inappropriate type such as NoneType or bool, or more than one namespace if the API module does not support multiple namespaces

Return type:

Generator[pywikibot.page.BasePage, None, None]

pagegenerators._filters.PageTitleFilterPageGenerator(generator, ignore_list)[source]#

Yield only those pages that are not listed in the ignore list.

Parameters:
  • ignore_list (dict[str, dict[str, str]]) – family names are mapped to dictionaries in which language codes are mapped to lists of page titles. Each title must be a valid regex as they are compared using re.search.

  • generator (Iterable[BasePage])

Return type:

Generator[BasePage, None, None]

pagegenerators._filters.QualityFilterPageGenerator(generator, quality)[source]#

Wrap a generator to filter pages according to quality levels.

This is possible only for pages with content_model ‘proofread-page’. In all other cases, no filter is applied.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • quality (list[int]) – proofread-page quality levels (valid range 0-4)

Return type:

Generator[BasePage, None, None]

pagegenerators._filters.RedirectFilterPageGenerator(generator, no_redirects=True, show_filtered=False)[source]#

Yield pages from another generator that are redirects or not.

Parameters:
  • no_redirects (bool) – Exclude redirects if True, else only include redirects.

  • show_filtered (bool) – Output a message for each page not yielded

  • generator (Iterable[BasePage])

Return type:

Generator[BasePage, None, None]

pagegenerators._filters.RegexBodyFilterPageGenerator(generator, regex, quantifier='any')#

Yield pages from another generator whose body matches regex.

Uses regex option re.IGNORECASE depending on the quantifier parameter.

For parameters, see RegexFilter.titlefilter().

Parameters:
  • generator (Iterable[pywikibot.page.BasePage])

  • regex (PATTERN_STR_OR_SEQ_TYPE)

  • quantifier (str)

Return type:

Generator[pywikibot.page.BasePage, None, None]

class pagegenerators._filters.RegexFilter[source]#

Bases: object

Regex filter.

classmethod contentfilter(generator, regex, quantifier='any')[source]#

Yield pages from another generator whose body matches regex.

Uses regex option re.IGNORECASE depending on the quantifier parameter.

For parameters, see titlefilter().

Parameters:
  • generator (Iterable[pywikibot.page.BasePage])

  • regex (PATTERN_STR_OR_SEQ_TYPE)

  • quantifier (str)

Return type:

Generator[pywikibot.page.BasePage, None, None]

classmethod titlefilter(generator, regex, quantifier='any', ignore_namespace=True)[source]#

Yield pages from another generator whose title matches regex.

Uses regex option re.IGNORECASE depending on the quantifier parameter.

If ignore_namespace is False, the whole page title is compared.

Note

To match only at the beginning of the title, start the regex with “^”.

Parameters:
  • generator (Iterable[pywikibot.page.BasePage]) – another generator

  • regex (PATTERN_STR_OR_SEQ_TYPE) – a regex which should match the page title

  • quantifier (str) – must be one of: ‘all’ (yield the page if its title is matched by all regexes), ‘any’ (yield the page if its title is matched by any regex) or ‘none’ (yield the page if its title is matched by none of the regexes)

  • ignore_namespace (bool) – ignore the namespace when matching the title

Returns:

the pages whose titles satisfy the regex and quantifier

Return type:

Generator[pywikibot.page.BasePage, None, None]
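The three quantifier values can be illustrated on plain title strings. This is a sketch, not RegexFilter itself: title_filter is a hypothetical helper, it ignores the ignore_namespace option, and it assumes string patterns are compiled with re.IGNORECASE and matched with re.search, so “^” is needed to anchor at the start of the title.

```python
import re
from typing import Iterable, Iterator


def title_filter(titles: Iterable[str], regex, quantifier: str = 'any') -> Iterator[str]:
    """Yield titles according to the all/any/none quantifier (sketch only)."""
    patterns = [regex] if isinstance(regex, (str, re.Pattern)) else list(regex)
    # Assumption: string patterns are compiled case-insensitively.
    compiled = [re.compile(p, re.IGNORECASE) if isinstance(p, str) else p
                for p in patterns]
    for title in titles:
        hits = [p.search(title) is not None for p in compiled]
        if quantifier == 'all':
            keep = all(hits)
        elif quantifier == 'any':
            keep = any(hits)
        elif quantifier == 'none':
            keep = not any(hits)
        else:
            raise ValueError(f'invalid quantifier: {quantifier!r}')
        if keep:
            yield title


titles = ['Apple pie', 'Banana bread', 'apple crumble']
print(list(title_filter(titles, '^apple')))                 # ['Apple pie', 'apple crumble']
print(list(title_filter(titles, ['apple', 'pie'], 'all')))  # ['Apple pie']
print(list(title_filter(titles, '^apple', 'none')))         # ['Banana bread']
```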

pagegenerators._filters.RegexFilterPageGenerator(generator, regex, quantifier='any', ignore_namespace=True)#

Yield pages from another generator whose title matches regex.

Uses regex option re.IGNORECASE depending on the quantifier parameter.

If ignore_namespace is False, the whole page title is compared.

Note

To match only at the beginning of the title, start the regex with “^”.

Parameters:
  • generator (Iterable[pywikibot.page.BasePage]) – another generator

  • regex (PATTERN_STR_OR_SEQ_TYPE) – a regex which should match the page title

  • quantifier (str) – must be one of: ‘all’ (yield the page if its title is matched by all regexes), ‘any’ (yield the page if its title is matched by any regex) or ‘none’ (yield the page if its title is matched by none of the regexes)

  • ignore_namespace (bool) – ignore the namespace when matching the title

Returns:

the pages whose titles satisfy the regex and quantifier

Return type:

Generator[pywikibot.page.BasePage, None, None]

pagegenerators._filters.SubpageFilterGenerator(generator, max_depth=0, show_filtered=False)[source]#

Generator which filters out subpages based on depth.

It checks whether the namespace of each page has subpages enabled. If so, pages whose subpage depth (the number of forward slashes (‘/’) in the title) exceeds max_depth are excluded.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • max_depth (int) – Max depth of subpages to yield, at least zero

  • show_filtered (bool) – Output a message for each page not yielded

Return type:

Generator[BasePage, None, None]
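The depth rule can be sketched on titles without a namespace prefix. subpage_depth and subpage_filter are hypothetical helpers, and whether a namespace allows subpages is passed in as a flag instead of being read from the site. The sketch shows why a slash in a namespace without subpages (such as “AC/DC” in many main namespaces) does not count.

```python
from typing import Iterable, Iterator


def subpage_depth(title: str, subpages_enabled: bool) -> int:
    """Count '/' separators, but only where the namespace allows subpages."""
    return title.count('/') if subpages_enabled else 0


def subpage_filter(titles: Iterable[str], subpages_enabled: bool,
                   max_depth: int = 0) -> Iterator[str]:
    """Yield titles whose subpage depth does not exceed max_depth."""
    if max_depth < 0:
        raise ValueError('max_depth must be at least zero')
    for title in titles:
        if subpage_depth(title, subpages_enabled) <= max_depth:
            yield title


user_titles = ['Example', 'Example/sandbox', 'Example/sandbox/draft']
print(list(subpage_filter(user_titles, subpages_enabled=True, max_depth=1)))
# ['Example', 'Example/sandbox']
print(list(subpage_filter(['AC/DC'], subpages_enabled=False)))  # ['AC/DC']
```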

pagegenerators._filters.UserEditFilterGenerator(generator, username, timestamp=None, skip=False, max_revision_depth=None, show_filtered=False)[source]#

Generator which will yield Pages modified by username.

Only the last max_revision_depth revisions are inspected. If timestamp is set (in MediaWiki format YYYYMMDDhhmmss), older edits are ignored. If skip is set, pages edited by the given user are ignored; otherwise, only pages edited by this user are yielded.

Parameters:
  • generator (Iterable[BasePage]) – A generator object

  • username (str) – user name which edited the page

  • timestamp (str | datetime | None) – ignore edits which are older than this timestamp

  • skip (bool) – Ignore pages edited by the given user

  • max_revision_depth (int | None) – number of most recent revisions to inspect

  • show_filtered (bool) – Output a message for each page not yielded

Return type:

Generator[BasePage, None, None]
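The interplay of timestamp, max_revision_depth and skip can be sketched without network access. All names below are hypothetical. Revision history is modelled as (username, YYYYMMDDhhmmss) pairs, newest first. Because the format is fixed-width, plain string comparison orders timestamps chronologically.

```python
from datetime import datetime
from typing import Iterable, Iterator, List, Optional, Tuple, Union

Revision = Tuple[str, str]  # hypothetical (username, YYYYMMDDhhmmss), newest first


def to_mw_timestamp(value: Union[str, datetime]) -> str:
    """Normalise a datetime or 14-digit string to YYYYMMDDhhmmss."""
    if isinstance(value, datetime):
        return value.strftime('%Y%m%d%H%M%S')
    if len(value) != 14 or not value.isdigit():
        raise ValueError(f'expected YYYYMMDDhhmmss, got {value!r}')
    return value


def edited_by(revisions: List[Revision], username: str,
              timestamp: Union[str, datetime, None] = None,
              max_revision_depth: Optional[int] = None) -> bool:
    """Return True if username made one of the inspected revisions."""
    cutoff = None if timestamp is None else to_mw_timestamp(timestamp)
    for user, ts in revisions[:max_revision_depth]:
        if cutoff is not None and ts < cutoff:
            break  # revisions are newest first, so the rest are older too
        if user == username:
            return True
    return False


def user_edit_filter(pages: Iterable[Tuple[str, List[Revision]]], username: str,
                     skip: bool = False, **kwargs) -> Iterator[str]:
    """Yield titles edited by username, or the opposite when skip is True."""
    for title, revisions in pages:
        if edited_by(revisions, username, **kwargs) != skip:
            yield title


pages = [
    ('P1', [('Bob', '20240105120000'), ('Alice', '20240101000000')]),
    ('P2', [('Bob', '20240105120000')]),
]
print(list(user_edit_filter(pages, 'Alice')))             # ['P1']
print(list(user_edit_filter(pages, 'Alice', skip=True)))  # ['P2']
```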

pagegenerators._filters.WikibaseItemFilterPageGenerator(generator, has_item=True, show_filtered=False)[source]#

Wrap a generator to filter pages by whether they have a Wikibase item.

Parameters:
  • generator (Iterable[BasePage]) – Generator to wrap.

  • has_item (bool) – Exclude pages without an item if True, or only include pages without an item if False

  • show_filtered (bool) – Output a message for each page not yielded

Returns:

Wrapped generator

Return type:

Generator[BasePage, None, None]

pagegenerators._generators — Generator Functions#

Page generators provided by the pagegenerators module.

pagegenerators._generators.AllpagesPageGenerator(start='!', namespace=0, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#

Iterate Page objects for all titles in a single namespace.

Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.

Parameters:
  • start (str) – if provided, only generate pages >= this title lexically

  • namespace (SingleNamespaceType) – Namespace to retrieve pages from

  • includeredirects (Literal['only'] | bool) – If False, redirects are not included. If equals the string ‘only’, only redirects are added. Otherwise redirects will be included. This parameter is deprecated; use filterredir instead.

  • site (BaseSite | None) – Site for generator results.

  • total (int | None) – Maximum number of pages to retrieve in total

  • content (bool) – If True, load current version of each page (default False)

  • filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).

Returns:

a generator that yields Page objects

Raises:

ValueError – filterredir as well as includeredirects parameters were given. Use filterredir only.

Return type:

Iterable[pywikibot.page.Page]
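The parameter descriptions above imply a mapping from the deprecated includeredirects values onto filterredir. The sketch below spells that mapping out with hypothetical helpers (resolve_filterredir, keep_page). It is not pywikibot's internal code, and the sentinel is an assumption used to detect whether includeredirects was passed at all.

```python
from typing import Optional

_UNSET = object()  # sentinel: "includeredirects was not passed"


def resolve_filterredir(includeredirects: object = _UNSET,
                        filterredir: Optional[bool] = None) -> Optional[bool]:
    """Translate the deprecated includeredirects value into filterredir."""
    if includeredirects is _UNSET:
        return filterredir
    if filterredir is not None:
        raise ValueError('filterredir as well as includeredirects parameters '
                         'were given. Use filterredir only.')
    if includeredirects == 'only':
        return True  # only redirects
    return None if includeredirects else False  # both, or non-redirects only


def keep_page(is_redirect: bool, filterredir: Optional[bool]) -> bool:
    """Apply filterredir: None keeps everything, else match the redirect flag."""
    return filterredir is None or is_redirect == filterredir


print(resolve_filterredir('only'))  # True
print(resolve_filterredir(False))   # False
print(keep_page(True, False))       # False
```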

pagegenerators._generators.AncientPagesPageGenerator(total=100, site=None)[source]#

Ancient page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators._generators.CategorizedPageGenerator(category, recurse=False, start=None, total=None, content=False, namespaces=None)[source]#

Yield all pages in a specific category.

Parameters:
  • recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)

  • start (str | None) – if provided, only generate pages >= this title lexically

  • total (int | None) – iterate no more than this number of pages in total (at all levels)

  • content (bool) – if True, retrieve the content of the current version of each page (default False)

  • category (pywikibot.page.Category)

  • namespaces (NamespaceArgType)

Return type:

Generator[pywikibot.page.Page, None, None]
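The recurse semantics (False or 0 for one level, an int as a depth limit, True for unlimited) can be sketched over a hypothetical in-memory category tree. The dictionary layout and category_members are invented for illustration. The seen set is a safeguard against category loops, not a documented pywikibot behaviour.

```python
from typing import Dict, Iterator, List, Optional, Set, Union

# Hypothetical category tree: name -> {'pages': [...], 'subcats': [...]}
Tree = Dict[str, Dict[str, List[str]]]


def category_members(tree: Tree, category: str, recurse: Union[int, bool] = False,
                     _seen: Optional[Set[str]] = None) -> Iterator[str]:
    """Yield pages of category, descending into subcategories per recurse."""
    seen = set() if _seen is None else _seen
    if category in seen:
        return  # guard against category loops
    seen.add(category)
    yield from tree[category]['pages']
    if not recurse:
        return  # False or 0: this level only
    # True means unlimited depth; an int is decremented at each level.
    next_level = recurse if isinstance(recurse, bool) else recurse - 1
    for sub in tree[category]['subcats']:
        yield from category_members(tree, sub, next_level, seen)


tree = {
    'Fruit': {'pages': ['Apple'], 'subcats': ['Citrus']},
    'Citrus': {'pages': ['Lemon'], 'subcats': ['Oranges']},
    'Oranges': {'pages': ['Blood orange'], 'subcats': []},
}
print(list(category_members(tree, 'Fruit')))             # ['Apple']
print(list(category_members(tree, 'Fruit', recurse=1)))  # ['Apple', 'Lemon']
print(list(category_members(tree, 'Fruit', recurse=True)))
# ['Apple', 'Lemon', 'Blood orange']
```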

pagegenerators._generators.DayPageGenerator(start_month=1, end_month=12, site=None, year=2000)[source]#

Day page generator.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • year (int) – year used to determine the number of days in each month; the default, 2000, is a leap year, so 29 February is included.

  • start_month (int)

  • end_month (int)

Return type:

Generator[pywikibot.page.Page, None, None]
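Why year matters can be shown with the standard calendar module. month_days is a hypothetical helper that yields only (month, day) pairs. The real generator additionally turns each pair into a localized page title, which this sketch leaves out.

```python
import calendar
from typing import Iterator, Tuple


def month_days(start_month: int = 1, end_month: int = 12,
               year: int = 2000) -> Iterator[Tuple[int, int]]:
    """Yield (month, day) pairs, using year to decide month lengths."""
    for month in range(start_month, end_month + 1):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            yield month, day


print(len(list(month_days())))           # 366, because 2000 is a leap year
print(len(list(month_days(year=2001))))  # 365
print(list(month_days(2, 2, 2001))[-1])  # (2, 28)
```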

pagegenerators._generators.DeadendPagesPageGenerator(total=100, site=None)[source]#

Dead-end page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.FileLinksGenerator(referredFilePage, total=None, content=False)[source]#

Yield Pages on which referredFilePage file is displayed.

Parameters:
  • referredFilePage (FilePage)

  • total (int | None)

  • content (bool)

Return type:

Iterable[Page]

class pagegenerators._generators.GoogleSearchPageGenerator(query=None, site=None)[source]#

Bases: GeneratorWrapper

Page generator using Google search results.

To use this generator, you need to install the package ‘google’:

https://pypi.org/project/google

This package has been available since 2010, hosted on GitHub since 2012, and provided by PyPI since 2013.

As there are concerns about Google’s Terms of Service, this generator prints a warning for each query.

Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • query (str | None)

property generator: Generator[Page, None, None]#

Yield results from queryGoogle() query.

Google results may contain links in the format: https://de.wikipedia.org/wiki/en:Foobar

Changed in version 7.6: changed from iterator method to generator property

static queryGoogle(query)[source]#

Perform a query using python package ‘google’.

The terms of service as at June 2014 give two conditions that may apply to use of search:

  1. Don’t access [Google Services] using a method other than the interface and the instructions that [they] provide.

  2. Don’t remove, obscure, or alter any legal notices displayed in or along with [Google] Services.

Both of those issues should be managed by the package ‘google’, however Pywikibot will at least ensure the user sees the TOS in order to comply with the second condition.

Parameters:

query (str)

Return type:

Generator[str, None, None]

pagegenerators._generators.ImagesPageGenerator(pageWithImages, total=None, content=False)[source]#

Yield FilePages displayed on pageWithImages.

Parameters:
  • pageWithImages (Page)

  • total (int | None)

  • content (bool)

Return type:

Iterable[Page]

pagegenerators._generators.InterwikiPageGenerator(page)[source]#

Iterate over all interwiki (non-language) links on a page.

Parameters:

page (Page)

Return type:

Generator[Page, None, None]

pagegenerators._generators.LanguageLinksPageGenerator(page, total=None)[source]#

Iterate over all interwiki language links on a page.

Parameters:
  • page (Page)

  • total (int | None)

Return type:

Generator[Page, None, None]

pagegenerators._generators.LinkedPageGenerator(linkingPage, total=None, content=False)[source]#

Yield all pages linked from a specific page.

See page.BasePage.linkedPages for details.

Parameters:
  • linkingPage (Page) – the page that links to the pages we want

  • total (int | None) – the total number of pages to iterate

  • content (bool) – if True, retrieve the current content of each linked page

Returns:

a generator that yields Page objects of pages linked from linkingPage

Return type:

Iterable[BasePage]

pagegenerators._generators.LinksearchPageGenerator(url, namespaces=None, total=None, site=None, protocol=None)[source]#

Yield all pages that link to a certain URL.

Parameters:
  • url (str) – The URL to search for (with or without the protocol prefix); this may include a ‘*’ as a wildcard, only at the start of the hostname

  • namespaces (NamespaceArgType) – list of namespace numbers to search in

  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results

  • protocol (str | None) – Protocol to search for, likely http or https, http by default. Full list shown on Special:LinkSearch wikipage.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.LiveRCPageGenerator(site=None, total=None)[source]#

Yield pages from the EventStreams recent changes stream.

Generates pages based on the EventStreams Server-Sent-Event (SSE) recent changes stream. The Page objects will have an extra property ._rcinfo containing the literal rc data. This can be used, for example, to filter only new pages. See pywikibot.comms.eventstreams.rc_listener for details on the ._rcinfo format.

Parameters:
  • site (BaseSite | None) – site to return recent changes for

  • total (int | None) – the maximum number of changes to return

Return type:

Generator[pywikibot.page.Page, None, None]
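Filtering for new pages comes down to inspecting the rc data. In the EventStreams recentchange schema, the type field is 'new' for page creations. The sketch below works on plain dictionaries shaped like that data. new_page_titles is a hypothetical helper. With pywikibot, you would instead test page._rcinfo inside a loop over LiveRCPageGenerator.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def new_page_titles(events: Iterable[dict], total: Optional[int] = None) -> Iterator[str]:
    """Yield titles of page-creation events, stopping after total matches."""
    new_pages = (event['title'] for event in events if event.get('type') == 'new')
    return islice(new_pages, total)  # total=None means no limit


events = [
    {'type': 'edit', 'title': 'Alpha'},
    {'type': 'new', 'title': 'Beta'},
    {'type': 'log', 'title': 'Special:Log/delete'},
    {'type': 'new', 'title': 'Gamma'},
]
print(list(new_page_titles(events)))           # ['Beta', 'Gamma']
print(list(new_page_titles(events, total=1)))  # ['Beta']
```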

pagegenerators._generators.LogeventsPageGenerator(logtype=None, user=None, site=None, namespace=None, total=None, start=None, end=None, reverse=False)[source]#

Generate pages for the specified log type.

Parameters:
  • logtype (str | None) – Mode of logs to retrieve

  • user (str | None) – User of logs retrieved

  • site (BaseSite | None) – Site for generator results

  • namespace (SingleNamespaceType | None) – Namespace to retrieve logs from

  • total (int | None) – Maximum number of pages to retrieve in total

  • start (Timestamp | None) – Timestamp to start listing from

  • end (Timestamp | None) – Timestamp to end listing at

  • reverse (bool) – if True, start with oldest changes (default: newest)

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators._generators.LonelyPagesPageGenerator(total=None, site=None)[source]#

Lonely page generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.LongPagesPageGenerator(total=100, site=None)[source]#

Long page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators._generators.MySQLPageGenerator(query, site=None, verbose=None)[source]#

Yield a list of pages based on a MySQL query.

The query should return two columns, page namespace and page title, from some table. An example query that yields all ns0 pages might look like:

SELECT
 page_namespace,
 page_title
FROM page
WHERE page_namespace = 0;

See also

MySQL

Parameters:
  • query (str) – MySQL query to execute

  • site (BaseSite | None) – Site object

  • verbose (bool | None) – if True, print query to be executed; if None, config.verbose_output will be used.

Returns:

generator which yields pywikibot.Page

Return type:

Generator[pywikibot.page.Page, None, None]
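The two-column result described above still has to be turned into titles. The sketch below shows one way to do that with a hypothetical rows_to_titles helper and a hand-written namespace prefix map. It assumes UTF-8 titles. The underscore replacement reflects how the MediaWiki page table stores spaces. Pywikibot's own conversion goes through Page objects rather than this helper.

```python
from typing import Iterable, Iterator, Mapping, Tuple, Union

Row = Tuple[int, Union[bytes, str]]  # (page_namespace, page_title)


def rows_to_titles(rows: Iterable[Row], ns_prefixes: Mapping[int, str]) -> Iterator[str]:
    """Turn (namespace, title) rows into display titles."""
    for namespace, title in rows:
        if isinstance(title, bytes):
            title = title.decode('utf-8')  # assumption: the wiki uses UTF-8
        title = title.replace('_', ' ')  # the page table stores underscores
        prefix = ns_prefixes.get(namespace, '')
        yield f'{prefix}:{title}' if prefix else title


rows = [(0, b'Main_Page'), (2, b'Example_user')]
print(list(rows_to_titles(rows, {0: '', 2: 'User'})))
# ['Main Page', 'User:Example user']
```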

pagegenerators._generators.NewimagesPageGenerator(total=None, site=None)[source]#

New file generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators._generators.NewpagesPageGenerator(site=None, namespaces=(0,), total=None)[source]#

Iterate Page objects for all new titles in a single namespace.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • namespaces (NamespaceArgType) – namespace to retrieve pages from

  • total (int | None) – Maximum number of pages to retrieve in total

Return type:

Generator[pywikibot.page.Page, None, None]

class pagegenerators._generators.PagePilePageGenerator(id)[source]#

Bases: GeneratorWrapper

Queries PagePile to generate pages.

Added in version 9.0.

Parameters:

id (int) – The PagePile id to query

buildQuery(id)[source]#

Get the querystring options to query PagePile.

Parameters:

id (int) – The PagePile id to query

Returns:

Dictionary of querystring parameters to use in the query

property generator: Generator[Page, None, None]#

Yield results from query().

query()[source]#

Query PagePile.

Raises:
  • ServerError – Either ReadTimeout or server status error

  • APIError – error response from PagePile

Return type:

Generator[str, None, None]

pagegenerators._generators.PagesFromPageidGenerator(pageids, site=None)[source]#

Return a page generator from pageids.

Pages are iterated in the same order as the underlying pageids. Duplicate pageids are filtered out, so each page is yielded only once.

Parameters:
  • pageids (Iterable[str]) – an iterable that yields pageids, or a comma-separated string of pageids (e.g. ‘945097,1483753,956608’)

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]
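The order-preserving deduplication and the comma-separated form can be sketched as follows. unique_pageids is a hypothetical helper that yields pageid strings instead of Page objects.

```python
from typing import Iterable, Iterator, Union


def unique_pageids(pageids: Union[str, Iterable[Union[int, str]]]) -> Iterator[str]:
    """Yield pageids in their original order, dropping duplicates."""
    if isinstance(pageids, str):
        pageids = pageids.split(',')  # comma-separated form
    seen = set()
    for pageid in pageids:
        pageid = str(pageid).strip()
        if pageid and pageid not in seen:
            seen.add(pageid)
            yield pageid


print(list(unique_pageids('945097,1483753,945097,956608')))
# ['945097', '1483753', '956608']
```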

pagegenerators._generators.PagesFromTitlesGenerator(iterable, site=None)[source]#

Generate pages from the titles (strings) yielded by iterable.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • iterable (Iterable[str])

Return type:

Generator[pywikibot.page.Page, None, None]

class pagegenerators._generators.PetScanPageGenerator(categories, subset_combination=True, namespaces=None, site=None, extra_options=None)[source]#

Bases: GeneratorWrapper

Queries PetScan to generate pages.

Added in version 3.0.

Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper

Parameters:
  • categories (Sequence[str]) – List of category names to retrieve pages from

  • subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories

  • namespaces (Iterable[int | pywikibot.site.Namespace] | None) – List of namespaces to search in (default is None, meaning all namespaces)

  • site (BaseSite | None) – Site to operate on (default is the default site from the user config)

  • extra_options (dict[Any, Any] | None) – Dictionary of extra options to use (optional)

buildQuery(categories, subset_combination, namespaces, extra_options)[source]#

Get the querystring options to query PetScan.

Parameters:
  • categories (Sequence[str]) – List of categories (as strings)

  • subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories

  • namespaces (Iterable[int | Namespace] | None) – List of namespaces to search in

  • extra_options (dict[Any, Any] | None) – Dictionary of extra options to use

Returns:

Dictionary of querystring parameters to use in the query

Return type:

dict[str, Any]
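The following sketches what a query-string builder of this kind could look like. build_petscan_query is hypothetical. The parameter names (language, project, combination, categories, ns[N], format, doit) are assumptions modelled on PetScan's web form, and the language and project defaults stand in for values that would come from the site. Only the subset/union switch and the merging of extra_options are taken from the description above.

```python
from typing import Any, Dict, Iterable, Optional, Sequence
from urllib.parse import urlencode


def build_petscan_query(categories: Sequence[str], subset_combination: bool = True,
                        namespaces: Optional[Iterable[int]] = None,
                        extra_options: Optional[Dict[str, Any]] = None,
                        language: str = 'en',
                        project: str = 'wikipedia') -> Dict[str, Any]:
    """Assemble querystring parameters (assumed names, see the lead-in)."""
    query: Dict[str, Any] = {
        'language': language,
        'project': project,
        'combination': 'subset' if subset_combination else 'union',
        'categories': '\r\n'.join(categories),  # one category per line
        'format': 'json',
        'doit': '',
    }
    for ns in namespaces or ():
        query[f'ns[{int(ns)}]'] = 1
    query.update(extra_options or {})  # extra options override defaults
    return query


q = build_petscan_query(['Physics', 'Chemistry'], False, [0, 14])
print(urlencode(q))
```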

property generator: Generator[Page, None, None]#

Yield results from query().

Changed in version 7.6: changed from iterator method to generator property

query()[source]#

Query PetScan.

Changed in version 7.4: raises APIError if query returns an error message.

Raises:
  • ServerError – Either ReadTimeout or server status error

  • APIError – error response from petscan

Return type:

Generator[dict[str, Any], None, None]

pagegenerators._generators.PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#

Yield pages whose titles start with the given prefix.

Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.

Parameters:
  • prefix (str) – The prefix of the pages.

  • namespace (SingleNamespaceType | None) – Namespace to retrieve pages from

  • includeredirects (Literal['only'] | bool) – If False, redirects are not included. If equals the string ‘only’, only redirects are added. Otherwise redirects will be included. This parameter is deprecated; use filterredir instead.

  • site (BaseSite | None) – Site for generator results.

  • total (int | None) – Maximum number of pages to retrieve in total

  • content (bool) – If True, load current version of each page (default False)

  • filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).

Returns:

a generator that yields Page objects

Raises:

ValueError – filterredir as well as includeredirects parameters were given. Use filterredir only.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.RandomPageGenerator(total=None, site=None, namespaces=None)[source]#

Random page generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

  • namespaces (NamespaceArgType)

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.RandomRedirectPageGenerator(total=None, site=None, namespaces=None)[source]#

Random redirect generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

  • namespaces (NamespaceArgType)

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.RecentChangesPageGenerator(site=None, _filter_unique=None, **kwargs)[source]#

Generate recent changes pages, including duplicates.

For keyword parameters, refer to APISite.recentchanges().

Changed in version 8.2: The YieldType depends on namespace. It can be pywikibot.Page, pywikibot.User, pywikibot.FilePage or pywikibot.Category.

Changed in version 9.4: Ignore pywikibot.FilePage if it raises a ValueError during upcast e.g. due to an invalid file extension.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • _filter_unique (None | Callable[[Iterable[Page]], Iterable[Page]])

  • kwargs (Any)

Return type:

Generator[Page, None, None]

pagegenerators._generators.SearchPageGenerator(query, total=None, namespaces=None, site=None)[source]#

Yield pages from the MediaWiki internal search engine.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

  • query (str)

  • namespaces (NamespaceArgType)

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.ShortPagesPageGenerator(total=100, site=None)[source]#

Short page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators._generators.SubCategoriesPageGenerator(category, recurse=False, start=None, total=None, content=False)[source]#

Yield all subcategories in a specific category.

Parameters:
  • recurse (int | bool) – if not False or 0, also iterate subcategories of subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will also iterate the subcategories of first-level subcats, but no deeper.)

  • start (str | None) – if provided, only generate pages >= this title lexically

  • total (int | None) – iterate no more than this number of pages in total (at all levels)

  • content (bool) – if True, retrieve the content of the current version of each page (default False)

  • category (Category)

Return type:

Generator[Page, None, None]

pagegenerators._generators.SupersetPageGenerator(query, site=None, schema_name=None, database_id=None)[source]#

Generate pages that result from the given SQL query.

The site of each generated page is determined in the following order:

  1. site retrieved using page_wikidb column in SQL result

  2. site as parameter

  3. site retrieved using schema_name

SQL columns used are

  • page_id

  • page_namespace + page_title

  • page_wikidb

Example SQL queries

SELECT
    gil_wiki AS page_wikidb,
    gil_page AS page_id
FROM globalimagelinks
GROUP BY gil_wiki
LIMIT 10

OR

SELECT
    page_id
FROM page
LIMIT 10

OR

SELECT
    page_namespace,
    page_title
FROM page
LIMIT 10

Added in version 9.2.

Parameters:
  • query (str) – the SQL query string.

  • site (BaseSite | None) – Site for generator results.

  • schema_name (str | None) – target superset schema name

  • database_id (int | None) – target superset database id

Return type:

Iterator[pywikibot.page.Page]

pagegenerators._generators.TextIOPageGenerator(source=None, site=None)[source]#

Iterate pages from a list in a text file or on a webpage.

The text source must contain page links between double-square-brackets or, alternatively, separated by newlines. The generator will yield each corresponding Page object.

Parameters:
  • source (str | None) – the file path or URL that should be read. If no name is given, the generator prompts the user.

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.Page, None, None]
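The two accepted input layouts can be sketched on an in-memory string. titles_from_text and LINK_RE are hypothetical. The link pattern is a simplified assumption, much narrower than pywikibot's own link regex. The sketch also assumes the newline layout is used only when the text contains no [[links]] at all.

```python
import re
from typing import Iterator

# Assumption: a simplified link pattern; pywikibot's own link regex is richer.
LINK_RE = re.compile(r'\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]')


def titles_from_text(text: str) -> Iterator[str]:
    """Yield [[linked]] titles, or one title per non-empty line if none exist."""
    links = LINK_RE.findall(text)
    if links:
        yield from (link.strip() for link in links)
    else:
        yield from (line.strip() for line in text.splitlines() if line.strip())


print(list(titles_from_text('See [[Foo]] and [[Bar|the bar]].')))  # ['Foo', 'Bar']
print(list(titles_from_text('Foo\n\nBar baz\n')))                  # ['Foo', 'Bar baz']
```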

pagegenerators._generators.UnCategorizedCategoryGenerator(total=100, site=None)[source]#

Uncategorized category generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[Category]

pagegenerators._generators.UnCategorizedImageGenerator(total=100, site=None)[source]#

Uncategorized file generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.FilePage]

pagegenerators._generators.UnCategorizedPageGenerator(total=100, site=None)[source]#

Uncategorized page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.UnCategorizedTemplateGenerator(total=100, site=None)[source]#

Uncategorized template generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.UnconnectedPageGenerator(site=None, total=None)[source]#

Iterate Page objects for all pages that are not connected to a Wikibase repository.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.UnusedFilesGenerator(total=None, site=None)[source]#

Unused files generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.FilePage]

pagegenerators._generators.UnwatchedPagesPageGenerator(total=None, site=None)[source]#

Unwatched page generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.UserContributionsGenerator(username, namespaces=None, site=None, total=None, _filter_unique=functools.partial(<function filter_unique>, key=<function <lambda>>))[source]#

Yield unique pages edited by user:username.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • namespaces (NamespaceArgType) – list of namespace numbers to fetch contribs from

  • site (BaseSite | None) – Site for generator results.

  • username (str)

  • _filter_unique (None | Callable[[Iterable[pywikibot.page.Page]], Iterable[pywikibot.page.Page]])

Return type:

Iterable[pywikibot.page.Page]
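The default _filter_unique shown in the signature wraps a filter_unique helper with a key function. Pywikibot ships its own filter_unique. The code below is an independent stdlib sketch of the same idea, keeping the first item for each key, and not that function. The (title, revid) contribution tuples are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Iterator, Optional, Set, TypeVar

T = TypeVar('T')


def filter_unique(iterable: Iterable[T],
                  key: Optional[Callable[[T], Hashable]] = None) -> Iterator[T]:
    """Yield items whose key has not been seen before, keeping first occurrences."""
    seen: Set[Hashable] = set()
    for item in iterable:
        marker = item if key is None else key(item)
        if marker not in seen:
            seen.add(marker)
            yield item


# Hypothetical contributions as (title, revid) pairs, newest first.
contribs = [('Alpha', 103), ('Beta', 102), ('Alpha', 101)]
print(list(filter_unique(contribs, key=lambda c: c[0])))
# [('Alpha', 103), ('Beta', 102)]
```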

pagegenerators._generators.WantedPagesPageGenerator(total=100, site=None)[source]#

Wanted page generator.

Parameters:
  • total (int) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

pagegenerators._generators.WikibaseItemGenerator(gen)[source]#

A wrapper generator used to yield Wikibase items of another generator.

Parameters:

gen (Iterable[Page]) – Generator to wrap.

Returns:

Wrapped generator

Return type:

Generator[ItemPage, None, None]

pagegenerators._generators.WikibaseSearchItemPageGenerator(text, language=None, total=None, site=None)[source]#

Generate Wikibase item pages that match the provided text.

Parameters:
  • text (str) – Text to look for.

  • language (str | None) – Code of the language to search in. If not specified, value from pywikibot.config.data_lang is used.

  • total (int | None) – Maximum number of pages to retrieve in total, or None in case of no limit.

  • site (BaseSite | None) – Site for generator results.

Return type:

Generator[pywikibot.page.ItemPage, None, None]

pagegenerators._generators.WikidataPageFromItemGenerator(gen, site)[source]#

Generate pages from site based on sitelinks of item pages.

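Parameters:
  • gen (Iterable[pywikibot.page.ItemPage]) – generator of ItemPage objects

  • site (BaseSite) – Site for generator results.
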
Return type:

Generator[Page, None, None]

pagegenerators._generators.WikidataSPARQLPageGenerator(query, site=None, item_name='item', endpoint=None, entity_url=None, result_type=<class 'set'>)[source]#

Generate pages that result from the given SPARQL query.

Parameters:
  • query (str) – the SPARQL query string.

  • site (BaseSite | None) – Site for generator results.

  • item_name (str) – name of the item in the SPARQL query

  • endpoint (str | None) – SPARQL endpoint URL

  • entity_url (str | None) – URL prefix for any entities returned in a query.

  • result_type (Any) – type of the iterable in which SPARQL results are stored (default set)

Return type:

Iterator[pywikibot.page.Page]

pagegenerators._generators.WithoutInterwikiPageGenerator(total=None, site=None)[source]#

Page lacking interwikis generator.

Parameters:
  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]

class pagegenerators._generators.XMLDumpPageGenerator(filename, start=None, namespaces=None, site=None, text_predicate=None, content=False)[source]#

Bases: Iterator

XML iterator that yields Page objects.

Added in version 7.2: the content parameter

Parameters:
  • filename (str) – filename of XML dump

  • start (str | None) – skip entries below that value

  • namespaces (NamespaceArgType) – namespace filter

  • site (BaseSite | None) – current site for the generator

  • text_predicate (Callable[[str], bool] | None) – a callable with entry.text as parameter and boolean as result to indicate the generator should return the page or not

  • content (bool) – If True, assign the page content stored in the dump to Page.text

Variables:
  • skipping – True if start parameter is given, else False

  • parser – holds the xmlreader.XmlDump parse method
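The start, namespaces and text_predicate filters can be sketched with ElementTree on a tiny in-memory dump. This does not use pywikibot's xmlreader. dump_titles and local are hypothetical helpers. The interpretation of start (skip entries until the first title that sorts at or after it, then stop skipping) is an assumption based on the skipping variable above.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Callable, Iterable, Iterator, Optional


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit('}', 1)[-1]


def dump_titles(source: BinaryIO, start: Optional[str] = None,
                namespaces: Optional[Iterable[int]] = None,
                text_predicate: Optional[Callable[[str], bool]] = None) -> Iterator[str]:
    """Yield titles from a MediaWiki-style XML dump, applying the filters."""
    wanted = None if namespaces is None else set(namespaces)
    skipping = start is not None
    for _event, elem in ET.iterparse(source):
        if local(elem.tag) != 'page':
            continue
        fields = {local(child.tag): child.text for child in elem.iter()}
        title, ns = fields['title'], int(fields['ns'])
        text = fields.get('text') or ''
        elem.clear()  # free memory held by this <page> subtree
        if skipping:
            if title < start:
                continue
            skipping = False  # reached start; stop skipping from here on
        if wanted is not None and ns not in wanted:
            continue
        if text_predicate is not None and not text_predicate(text):
            continue
        yield title


DUMP = b"""<mediawiki>
<page><title>Alpha</title><ns>0</ns><revision><text>stub</text></revision></page>
<page><title>Beta</title><ns>0</ns><revision><text>{{Infobox}} body</text></revision></page>
<page><title>Talk:Beta</title><ns>1</ns><revision><text>chat</text></revision></page>
</mediawiki>"""

print(list(dump_titles(io.BytesIO(DUMP), start='B')))       # ['Beta', 'Talk:Beta']
print(list(dump_titles(io.BytesIO(DUMP), namespaces=[0])))  # ['Alpha', 'Beta']
```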

pagegenerators._generators.YearPageGenerator(start=1, end=2050, site=None)[source]#

Year page generator.

Parameters:
  • site (BaseSite | None) – Site for generator results.

  • start (int)

  • end (int)

Return type:

Generator[pywikibot.page.Page, None, None]

pagegenerators._generators.page_with_property_generator(name, total=None, site=None)[source]#

Special:PagesWithProperty page generator.

Parameters:
  • name (str) – Property name of pages to be retrieved

  • total (int | None) – Maximum number of pages to retrieve in total

  • site (BaseSite | None) – Site for generator results.

Return type:

Iterable[pywikibot.page.Page]