pagegenerators — Page Generators
This module offers a wide variety of page generators.
A page generator is an object that is iterable (see PEP 255) and that yields page objects on which other scripts can then work.
Most of these functions just wrap a Site or Page method that returns a generator. For testing purposes listpages.py can be used, to print page titles to standard output.
The following parameters are supported to specify which pages to work on:
GENERATOR OPTIONS
- -cat
Work on all pages which are in a specific category. Argument can also be given as “-cat:categoryname” or as “-cat:categoryname|fromtitle” (using # instead of | is also allowed in this one and the following)
- -catr
Like -cat, but also recursively includes pages in subcategories, sub-subcategories etc. of the given category. Argument can also be given as “-catr:categoryname” or as “-catr:categoryname|fromtitle”.
- -subcats
Work on all subcategories of a specific category. Argument can also be given as “-subcats:categoryname” or as “-subcats:categoryname|fromtitle”.
- -subcatsr
Like -subcats, but also includes sub-subcategories etc. of the given category. Argument can also be given as “-subcatsr:categoryname” or as “-subcatsr:categoryname|fromtitle”.
- -uncat
Work on all pages which are not categorised.
- -uncatcat
Work on all categories which are not categorised.
- -uncatfiles
Work on all files which are not categorised.
- -file
Read a list of pages to treat from the named text file. Page titles in the file may be either enclosed with [[brackets]], or be separated by new lines. Argument can also be given as “-file:filename”.
- -filelinks
Work on all pages that use a certain image/media file. Argument can also be given as “-filelinks:filename”.
- -search
Work on all pages that are found in a MediaWiki search across all namespaces.
- -logevents
Work on articles that were on a specified Special:Log. The value may be a comma separated list of these values:
logevent,username,start,end
Deprecated since version 9.2: backward compatible total argument like logevent,username,total; use the -limit filter option instead (see below).
To use the default value, use an empty string.
Note
‘start’ is the most recent date and log events are iterated from present to past. If ‘start’ is not provided, it means ‘now’; if ‘end’ is not provided, it means ‘since the beginning’.
See also
letype of API:Logevents for the supported types of log events.
Examples:
-logevents:move gives pages from move log (usually redirects)
-logevents:delete -limit:20 gives 20 pages from deletion log
-logevents:protect,Usr gives pages from protect log by user Usr
-logevents:patrol,Usr -limit:20 gives 20 patrolled pages by Usr
-logevents:upload,,20121231,20100101 gives upload pages from the years 2010, 2011, and 2012
-logevents:review,,20121231 gives review pages from the beginning until 31 Dec 2012
-logevents:review,Usr,20121231 gives review pages by user Usr from the beginning until 31 Dec 2012
In some cases it must be given as -logevents:"move,Usr,20"
- -interwiki
Work on the given page and all equivalent pages in other languages. This can, for example, be used to fight multi-site spamming. Attention: this will cause the bot to modify pages on several wiki sites. This is not well tested, so check your edits!
- -links
Work on all pages that are linked from a certain page. Argument can also be given as “-links:linkingpagetitle”.
- -liverecentchanges
Work on pages from the live recent changes feed. If used as -liverecentchanges:x, work on x recent changes.
- -imagesused
Work on all images that are contained on a certain page. Can also be given as “-imagesused:linkingpagetitle”.
- -newimages
Work on the most recent new images. If given as -newimages:x, will work on x newest images.
- -newpages
Work on the most recent new pages. If given as -newpages:x, will work on x newest pages.
- -recentchanges
Work on the pages with the most recent changes. If given as -recentchanges:x, will work on the x most recently changed pages. If given as -recentchanges:offset,duration it will work on pages changed from ‘offset’ minutes with ‘duration’ minutes of timespan. rctags are supported too. The rctag must be the very first parameter part.
Examples:
-recentchanges:20 gives the 20 most recently changed pages
-recentchanges:120,70 will give pages with 120 offset minutes and 70 minutes of timespan
-recentchanges:visualeditor,10 gives the 10 most recently changed pages marked with ‘visualeditor’
-recentchanges:"mobile edit,60,35" will retrieve pages marked with ‘mobile edit’ for the given offset and timespan
- -unconnectedpages
Work on the most recent unconnected pages to the Wikibase repository. Given as -unconnectedpages:x, will work on the x most recent unconnected pages.
- -ref
Work on all pages that link to a certain page. Argument can also be given as “-ref:referredpagetitle”.
- -start
Specifies that the robot should go alphabetically through all pages on the home wiki, starting at the named page. Argument can also be given as “-start:pagetitle”.
You can also include a namespace. For example, “-start:Template:!” will make the bot work on all pages in the template namespace.
default value is start:!
- -prefixindex
Work on pages commencing with a common prefix.
- -transcludes
Work on all pages that use a certain template. Argument can also be given as “-transcludes:Title”.
- -unusedfiles
Work on all description pages of images/media files that are not used anywhere. Argument can be given as “-unusedfiles:n” where n is the maximum number of articles to work on.
- -lonelypages
Work on all articles that are not linked from any other article. Argument can be given as “-lonelypages:n” where n is the maximum number of articles to work on.
- -unwatched
Work on all articles that are not watched by anyone. Argument can be given as “-unwatched:n” where n is the maximum number of articles to work on.
- -property
Work on all pages with a given property name from Special:PagesWithProp. Usage:
-property:name
- -usercontribs
Work on all articles that were edited by a certain user. (Example: -usercontribs:DumZiBoT)
- -weblink
Work on all articles that contain an external link to a given URL; may be given as “-weblink:url”
- -withoutinterwiki
Work on all pages that don’t have interlanguage links. Argument can be given as “-withoutinterwiki:n” where n is the total to fetch.
- -mysqlquery
Takes a MySQL query string like “SELECT page_namespace, page_title FROM page WHERE page_namespace = 0” and treats the resulting pages. See MySQL for more details.
- -supersetquery
Takes a SQL query string like “SELECT page_namespace, page_title FROM page WHERE page_namespace = 0”, runs it in https://superset.wmcloud.org/ and treats the resulting pages.
- -sparql
Takes a SPARQL SELECT query string including ?item and works on the resulting pages.
- -sparqlendpoint
Specify SPARQL endpoint URL (optional). (Example: -sparqlendpoint:http://myserver.com/sparql)
- -searchitem
Takes a search string and works on Wikibase pages that contain it. Argument can be given as “-searchitem:text”, where text is the string to look for, or “-searchitem:lang:text”, where lang is the language to search items in.
- -wantedpages
Work on pages that are linked, but do not exist; may be given as “-wantedpages:n” where n is the maximum number of articles to work on.
- -wantedcategories
Work on categories that are used, but do not exist; may be given as “-wantedcategories:n” where n is the maximum number of categories to work on.
- -wantedfiles
Work on files that are used, but do not exist; may be given as “-wantedfiles:n” where n is the maximum number of files to work on.
- -wantedtemplates
Work on templates that are used, but do not exist; may be given as “-wantedtemplates:n” where n is the maximum number of templates to work on.
- -random
Work on random pages returned by [[Special:Random]]. Can also be given as “-random:n” where n is the number of pages to be returned.
- -randomredirect
Work on random redirect pages returned by [[Special:RandomRedirect]]. Can also be given as “-randomredirect:n” where n is the number of pages to be returned.
- -google
Work on all pages that are found in a Google search. You need a Google Web API license key. Note that Google doesn’t give out license keys anymore. See google_key in config.py for instructions. Argument can also be given as “-google:searchstring”.
- -page
Work on a single page. Argument can also be given as “-page:pagetitle”, and supplied multiple times for multiple pages.
- -pageid
Work on a single pageid. Argument can also be given as “-pageid:pageid1,pageid2,.” or “-pageid:’pageid1|pageid2|..’” and supplied multiple times for multiple pages.
- -pagepile
Work on a PagePile. Argument is the pile id (an integer)
- -linter
Work on pages that contain lint errors. Extension Linter must be available on the site. -linter selects all categories. -linter:high, -linter:medium or -linter:low selects all categories for that priority. Single categories can be selected with commas as in -linter:cat1,cat2,cat3
Adding ‘/int’ identifies Lint ID to start querying from: e.g. -linter:high/10000
-linter:show just shows available categories.
- -querypage
Work on pages provided by a QueryPage-based special page. Usage:
-querypage:name
-querypage without argument shows the available special pages.
- -url
Read a list of pages to treat from the provided URL. The URL must return text in the same format as expected for the -file argument, e.g. page titles separated by newlines or enclosed in brackets.
Tip
Use the -limit:n filter option to fetch only n pages.
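The comma-separated -logevents value described above can be illustrated with a minimal sketch that splits such a value into its four fields and treats an empty field as "use the default". The names parse_logevents_value and LogeventsArgs are hypothetical and not part of Pywikibot; the real option parsing may differ.

```python
from typing import NamedTuple, Optional


class LogeventsArgs(NamedTuple):
    """Fields of a -logevents value (hypothetical helper type)."""

    logevent: Optional[str]
    username: Optional[str]
    start: Optional[str]
    end: Optional[str]


def parse_logevents_value(value: str) -> LogeventsArgs:
    """Split 'logevent,username,start,end'; empty fields become None."""
    fields = [field.strip() for field in value.split(',')]
    if len(fields) > 4:
        raise ValueError(f'expected at most 4 fields, got {len(fields)}')
    # Pad missing trailing fields so that short forms such as 'move' work.
    fields += [''] * (4 - len(fields))
    return LogeventsArgs(*(field or None for field in fields))


print(parse_logevents_value('upload,,20121231,20100101'))
```

A None field stands for the documented default: no user restriction, 'now' for start, and 'since the beginning' for end.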
FILTER OPTIONS
- -catfilter
Filter the page generator to only yield pages in the specified category. See -cat generator for argument format.
- -grep
A regular expression that needs to match the article otherwise the page won’t be returned. Multiple -grep:regexpr can be provided and the page will be returned if content is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character, including a newline.
- -grepnot
Like -grep, but return the page only if the regular expression does not match.
- -intersect
Work on the intersection of all the provided generators.
- -limit
When used with any other argument that specifies a set of pages, -limit:n works on no more than n pages in total. If used with multiple generators, pages are yielded in a round-robin way.
- -namespaces
- -namespace
- -ns
Filter the page generator to only yield pages in the specified namespaces. Separate multiple namespace numbers or names with commas.
Examples:
-ns:0,2,4 -ns:Help,MediaWiki
You may use a leading “not” to exclude the namespace.
Examples:
-ns:not:2,3 -ns:not:Help,File
If used with -newpages/-random/-randomredirect/-linter generators, -namespace/ns must be provided before -newpages/-random/-randomredirect/-linter. If used with -recentchanges generator, efficiency is improved if -namespace is provided before -recentchanges.
If used with -start generator, -namespace/ns shall contain only one value.
- -onlyif
A claim the page needs to contain, otherwise the item won’t be returned. The format is property=value,qualifier=value. Multiple (or none) qualifiers can be passed, separated by commas.
Examples:
P1=Q2 (property P1 must contain value Q2), P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and qualifiers: P5 with value Q6 and P6 with value Q7).
Value can be page ID, coordinate in format: latitude,longitude[,precision] (all values are in decimal degrees), year, or plain string.
The argument can be provided multiple times and the item page will be returned only if all claims are present. Argument can be also given as “-onlyif:expression”.
- -onlyifnot
A claim the page must not contain, otherwise the item won’t be returned. For usage and examples, see -onlyif above.
- -ql
Filter pages based on page quality. This is only applicable if contentmodel equals ‘proofread-page’; otherwise it has no effect. Valid values are in range 0-4. Multiple values can be comma-separated.
- -redirect
Filter pages based on whether they are redirects. To return only pages that are not redirects, use -redirect:false
- -subpage
-subpage:n filters pages to only those that have depth n i.e. a depth of 0 filters out all pages that are subpages, and a depth of 1 filters out all pages that are subpages of subpages.
- -titleregex
A regular expression that needs to match the article title otherwise the page won’t be returned. Multiple -titleregex:regexpr can be provided and the page will be returned if title is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character.
- -titleregexnot
Like -titleregex, but return the page only if the regular expression does not match.
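The matching rules stated for -grep and -grepnot (case-insensitive, dot matches newlines, a page passes if any expression matches) can be sketched with the standard re module. This is an illustrative stand-in that works on plain (title, text) tuples; the function name grep_filter is hypothetical and this is not Pywikibot's implementation.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple

# Flags taken from the option description: ignore case, '.' matches newline.
FLAGS = re.IGNORECASE | re.DOTALL


def grep_filter(pages: Iterable[Tuple[str, str]],
                patterns: Sequence[str],
                negate: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) pairs whose text matches any pattern.

    With negate=True (the -grepnot case in this sketch), yield only
    pairs for which no pattern matches.
    """
    compiled = [re.compile(pattern, FLAGS) for pattern in patterns]
    for title, text in pages:
        matched = any(regex.search(text) for regex in compiled)
        if matched != negate:
            yield title, text


sample = [('A', 'Hello\nWorld'), ('B', 'nothing here')]
print([title for title, _ in grep_filter(sample, ['hello.world'])])
```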
- pagegenerators.AllpagesPageGenerator(start='!', namespace=0, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#
Iterate Page objects for all titles in a single namespace.
Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.
- Parameters:
start (str) – if provided, only generate pages >= this title lexically
namespace (SingleNamespaceType) – Namespace to retrieve pages from
includeredirects (Literal['only'] | bool) – If False, redirects are not included. If equals the string ‘only’, only redirects are added. Otherwise redirects will be included. This parameter is deprecated; use filterredir instead.
site (BaseSite | None) – Site for generator results.
total (int | None) – Maximum number of pages to retrieve in total
content (bool) – If True, load current version of each page (default False)
filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).
- Returns:
a generator that yields Page objects
- Raises:
ValueError – filterredir as well as includeredirects parameters were given. Use filterredir only.
- Return type:
Iterable[pywikibot.page.Page]
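The relationship between the deprecated includeredirects parameter and filterredir, as described in the parameter list above, can be summarised in a small sketch. The helper resolve_filterredir is hypothetical, and treating any non-default includeredirects value as "given" is an assumption; the library may detect a conflict differently.

```python
from typing import Optional, Union


def resolve_filterredir(includeredirects: Union[bool, str] = True,
                        filterredir: Optional[bool] = None) -> Optional[bool]:
    """Map the deprecated includeredirects value onto filterredir.

    Returns True for "only redirects", False for "no redirects" and
    None for "both".
    """
    if filterredir is not None:
        # Assumption: a non-default includeredirects counts as "given".
        if includeredirects is not True:
            raise ValueError('Use filterredir only.')
        return filterredir
    if includeredirects == 'only':
        return True
    if not includeredirects:
        return False
    return None


print(resolve_filterredir(includeredirects='only'))
```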
- pagegenerators.AncientPagesPageGenerator(total=100, site=None)[source]#
Ancient page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.CategorizedPageGenerator(category, recurse=False, start=None, total=None, content=False, namespaces=None)[source]#
Yield all pages in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (pywikibot.page.Category)
namespaces (NamespaceArgType)
- Return type:
Generator[pywikibot.page.Page, None, None]
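The meaning of the recurse parameter (False or 0 for no recursion, an int for a depth limit, True for unlimited depth) can be shown on a toy category tree. This sketch uses a plain dictionary instead of Category objects, assumes the tree has no cycles, and the name category_members is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Toy data model: category name -> (member pages, subcategory names).
CategoryTree = Dict[str, Tuple[List[str], List[str]]]


def category_members(tree: CategoryTree, category: str,
                     recurse: Union[int, bool] = False) -> Iterator[str]:
    """Yield member pages, descending into subcategories per recurse."""
    pages, subcats = tree[category]
    yield from pages
    if not recurse:  # False or 0: stay at this level
        return
    # True means unlimited depth; an int loses one level per descent.
    deeper = True if recurse is True else recurse - 1
    for subcat in subcats:
        yield from category_members(tree, subcat, deeper)


tree = {
    'Root': (['a'], ['Sub']),
    'Sub': (['b'], ['Deep']),
    'Deep': (['c'], []),
}
print(list(category_members(tree, 'Root', recurse=1)))
```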
- pagegenerators.CategoryFilterPageGenerator(generator, category_list)[source]#
Wrap a generator to filter pages by categories specified.
- pagegenerators.DayPageGenerator(start_month=1, end_month=12, site=None, year=2000)[source]#
Day page generator.
- Parameters:
site (BaseSite | None) – Site for generator results.
year (int) – the year to use, so that leap years are taken into account.
start_month (int)
end_month (int)
- Return type:
Generator[pywikibot.page.Page, None, None]
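Why the year matters can be seen in a short sketch that enumerates the calendar days between two months using the standard calendar module. It yields (month, day) tuples rather than Page objects, and the name iter_days is hypothetical.

```python
import calendar
from typing import Iterator, Tuple


def iter_days(year: int = 2000, start_month: int = 1,
              end_month: int = 12) -> Iterator[Tuple[int, int]]:
    """Yield (month, day) for every day from start_month to end_month."""
    for month in range(start_month, end_month + 1):
        # monthrange returns (weekday of first day, number of days).
        _, days_in_month = calendar.monthrange(year, month)
        for day in range(1, days_in_month + 1):
            yield month, day


# February has 29 days in the leap year 2000 and 28 days in 2001.
print(len(list(iter_days(2000, 2, 2))), len(list(iter_days(2001, 2, 2))))
```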
- pagegenerators.DeadendPagesPageGenerator(total=100, site=None)[source]#
Dead-end page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.DequePreloadingGenerator(generator, groupsize=50, quiet=False)[source]#
Preload generator of type DequeGenerator.
- Parameters:
generator (DequeGenerator) – pages to iterate over
groupsize (int) – how many pages to preload at once
quiet (bool) – If False (default), show the “Retrieving pages” message
- Return type:
Generator[Page, None, None]
- pagegenerators.EdittimeFilterPageGenerator(generator, last_edit_start=None, last_edit_end=None, first_edit_start=None, first_edit_end=None, show_filtered=False)[source]#
Wrap a generator to filter pages outside last or first edit range.
- Parameters:
generator (Iterable[BasePage]) – A generator object
last_edit_start (datetime | None) – Only yield pages last edited after this time
last_edit_end (datetime | None) – Only yield pages last edited before this time
first_edit_start (datetime | None) – Only yield pages first edited after this time
first_edit_end (datetime | None) – Only yield pages first edited before this time
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Generator[BasePage, None, None]
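The four time bounds can be illustrated with a minimal filter over simple records. The PageInfo tuple and the function names are hypothetical, and treating the bounds as inclusive is an assumption made only for this sketch.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional


class PageInfo(NamedTuple):
    """Stand-in for a page with its first and last edit times."""

    title: str
    first_edit: datetime
    last_edit: datetime


def in_range(moment: datetime, start: Optional[datetime],
             end: Optional[datetime]) -> bool:
    """Return True if moment lies within [start, end]; None means open."""
    return ((start is None or moment >= start)
            and (end is None or moment <= end))


def edittime_filter(pages: Iterable[PageInfo],
                    last_edit_start: Optional[datetime] = None,
                    last_edit_end: Optional[datetime] = None,
                    first_edit_start: Optional[datetime] = None,
                    first_edit_end: Optional[datetime] = None,
                    ) -> Iterator[PageInfo]:
    """Yield only pages whose first and last edits fall in range."""
    for page in pages:
        if (in_range(page.last_edit, last_edit_start, last_edit_end)
                and in_range(page.first_edit, first_edit_start,
                             first_edit_end)):
            yield page


pages = [
    PageInfo('Old', datetime(2005, 1, 1), datetime(2010, 6, 1)),
    PageInfo('New', datetime(2020, 1, 1), datetime(2023, 6, 1)),
]
print([p.title for p in edittime_filter(pages,
                                        last_edit_start=datetime(2015, 1, 1))])
```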
- pagegenerators.FileLinksGenerator(referredFilePage, total=None, content=False)[source]#
Yield Pages on which referredFilePage file is displayed.
- class pagegenerators.GeneratorFactory(site=None, positional_arg_name=None, enabled_options=None, disabled_options=None)[source]#
Bases: object
Process command line arguments and return appropriate page generator.
This factory is responsible for processing command line arguments that are used by many scripts and that determine which pages to work on.
Note
GeneratorFactory must be instantiated after global arguments are parsed except if site parameter is given.
- Parameters:
site (BaseSite | None) – Site for generator results
positional_arg_name (str | None) – generator to use for positional args, which do not begin with a hyphen
enabled_options (Iterable[str] | None) – only enable options given by this Iterable. This is prioritized over disabled_options
disabled_options (Iterable[str] | None) – disable these given options and let them be handled by scripts options handler
- getCategory(category)[source]#
Return Category and start as defined by category.
- Parameters:
category (str) – category name with start parameter
- Return type:
tuple[Category, str | None]
- getCategoryGen(category, recurse=False, content=False, gen_func=None)[source]#
Return generator based on Category defined by category and gen_func.
- Parameters:
category (str) – category name with start parameter
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
content (bool) – if True, retrieve the content of the current version of each page (default False)
gen_func (Callable | None)
- Return type:
Any
- getCombinedGenerator(gen=None, preload=False)[source]#
Return the combination of all accumulated generators.
Only call this after all arguments have been parsed.
Changed in version 7.3: set the instance variable is_preloading to True or False.
Changed in version 8.0: if the limit option is set and multiple generators are given, pages are yielded in a round-robin way.
- Parameters:
gen (OPT_GENERATOR_TYPE) – Another generator to be combined with
preload (bool) – preload pages using PreloadingGenerator unless self.nopreload is True
- Return type:
OPT_GENERATOR_TYPE
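The round-robin behaviour mentioned for version 8.0 can be pictured with plain iterables: one item is taken from each source in turn, and a limit caps the total. This is a generic sketch built on itertools.islice, not the code used by getCombinedGenerator; the name roundrobin is borrowed from the well-known itertools recipe.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')


def roundrobin(*iterables: Iterable[T]) -> Iterator[T]:
    """Yield one item from each iterable in turn until all are exhausted."""
    iterators = [iter(iterable) for iterable in iterables]
    while iterators:
        # Iterate over a copy so exhausted iterators can be removed safely.
        for iterator in list(iterators):
            try:
                yield next(iterator)
            except StopIteration:
                iterators.remove(iterator)


# Three sources combined, limited to four items in total.
limited = list(islice(roundrobin('ABC', 'D', 'EF'), 4))
print(limited)  # ['A', 'D', 'E', 'B']
```

Compared with simply chaining the sources, interleaving ensures that a small limit still draws from every generator rather than only the first one.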
- handle_arg(arg)[source]#
Parse one argument at a time.
If it is recognized as an argument that specifies a generator, a generator is created and added to the accumulation list, and the function returns true. Otherwise, it returns false, so that caller can try parsing the argument. Call getCombinedGenerator() after all arguments have been parsed to get the final output generator.
Added in version 6.0: renamed from handleArg
- Parameters:
arg (str) – Pywikibot argument consisting of -name:value
- Returns:
True if the argument supplied was recognised by the factory
- Return type:
bool
- handle_args(args)[source]#
Handle command line arguments and return the rest as a list.
Added in version 6.0.
Changed in version 7.3: Prioritize -namespaces options to solve problems with several generators like -newpages/-random/-randomredirect/-linter
- Parameters:
args (Iterable[str])
- Return type:
list[str]
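The contract described for handle_arg and handle_args (recognised arguments create a generator and return True; everything else is handed back to the caller) can be sketched with a toy class. MiniFactory and its single -page handler are hypothetical stand-ins and do not reflect how GeneratorFactory is implemented internally.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class MiniFactory:
    """Toy factory that only understands -page:title (hypothetical)."""

    def __init__(self) -> None:
        self.gens: List[Iterator[str]] = []
        self._handlers: Dict[str, Callable[[str], Iterator[str]]] = {
            '-page': lambda value: iter([value]),
        }

    def handle_arg(self, arg: str) -> bool:
        """Return True if arg was recognised and a generator was added."""
        name, _, value = arg.partition(':')
        handler = self._handlers.get(name)
        if handler is None:
            return False
        self.gens.append(handler(value))
        return True

    def handle_args(self, args: Iterable[str]) -> List[str]:
        """Handle all known arguments and return the rest as a list."""
        return [arg for arg in args if not self.handle_arg(arg)]


factory = MiniFactory()
rest = factory.handle_args(['-page:Foo', '-always', '-page:Bar'])
print(rest)  # ['-always']
```

The returned list is what a script would then parse as its own options.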
- is_preloading: bool | None#
Return whether Page objects are preloaded. You may use this instance variable after getCombinedGenerator() is called, e.g.:
    gen_factory = GeneratorFactory()
    print(gen_factory.is_preloading)  # None
    gen = gen_factory.getCombinedGenerator()
    print(gen_factory.is_preloading)  # True or False
Otherwise the value is undefined and gives None.
Added in version 7.3.
- property namespaces: frozenset[Namespace]#
List of Namespace parameters.
Converts int or string namespaces to Namespace objects and changes the storage to immutable once it has been accessed.
The resolving and validation of namespace command line arguments is performed in this method, as it depends on the site property which is lazy loaded to avoid being cached before the global arguments are handled.
- Returns:
namespaces selected using arguments
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool
- property site: BaseSite#
Generator site.
The generator site should not be accessed until after the global arguments have been handled, otherwise the default Site may be changed by global arguments, which will cause this cached value to be stale.
- Returns:
Site given to initializer, otherwise the default Site at the time this property is first accessed.
- class pagegenerators.GoogleSearchPageGenerator(query=None, site=None)[source]#
Bases: GeneratorWrapper
Page generator using Google search results.
To use this generator, you need to install the package ‘google’:
https://pypi.org/project/google
This package has been available since 2010, hosted on GitHub since 2012, and provided by PyPI since 2013.
As there are concerns about Google’s Terms of Service, this generator prints a warning for each query.
Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper
- Parameters:
site (BaseSite | None) – Site for generator results.
query (str | None)
- property generator: Generator[Page, None, None]#
Yield results from queryGoogle() query.
Google contains links in the format: https://de.wikipedia.org/wiki/en:Foobar
Changed in version 7.6: changed from iterator method to generator property
- static queryGoogle(query)[source]#
Perform a query using python package ‘google’.
The terms of service as at June 2014 give two conditions that may apply to use of search:
Don’t access [Google Services] using a method other than the interface and the instructions that [they] provide.
Don’t remove, obscure, or alter any legal notices displayed in or along with [Google] Services.
Both of those issues should be managed by the package ‘google’, however Pywikibot will at least ensure the user sees the TOS in order to comply with the second condition.
- Parameters:
query (str)
- Return type:
Generator[str, None, None]
- pagegenerators.ImagesPageGenerator(pageWithImages, total=None, content=False)[source]#
Yield FilePages displayed on pageWithImages.
- pagegenerators.InterwikiPageGenerator(page)[source]#
Iterate over all interwiki (non-language) links on a page.
- pagegenerators.ItemClaimFilterPageGenerator(generator, prop, claim, qualifiers=None, negate=False)#
Yield all ItemPages which contain certain claim in a property.
- Parameters:
prop (str) – property id to check
claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).
qualifiers (dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant
negate (bool) – true if pages that do not contain specified claim should be yielded, false otherwise
generator (Iterable[WikibasePage])
- Return type:
Generator[WikibasePage, None, None]
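The -onlyif expression format "property=value,qualifier=value" maps naturally onto the prop, claim and qualifiers parameters listed above. The following sketch shows one way such an expression could be split; the function name parse_claim_expression is hypothetical, and folding comma-separated fragments without "=" back into the previous value (so that coordinates survive) is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def parse_claim_expression(expr: str) -> Tuple[str, str, Dict[str, str]]:
    """Split 'P3=Q4,P5=Q6' into (prop, claim, qualifiers)."""
    pairs: List[List[str]] = []
    for chunk in expr.split(','):
        if '=' in chunk:
            key, value = chunk.split('=', 1)
            pairs.append([key.strip(), value.strip()])
        elif pairs:
            # No '=': treat as continuation of the previous value,
            # e.g. the longitude of a 'latitude,longitude' coordinate.
            pairs[-1][1] += ',' + chunk.strip()
        else:
            raise ValueError(f'invalid claim expression: {expr!r}')
    (prop, claim), *qualifier_pairs = pairs
    return prop, claim, {key: value for key, value in qualifier_pairs}


print(parse_claim_expression('P3=Q4,P5=Q6,P6=Q7'))
```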
- pagegenerators.LanguageLinksPageGenerator(page, total=None)[source]#
Iterate over all interwiki language links on a page.
- pagegenerators.LinkedPageGenerator(linkingPage, total=None, content=False)[source]#
Yield all pages linked from a specific page.
See page.BasePage.linkedPages for details.
- Parameters:
linkingPage (Page) – the page that links to the pages we want
total (int | None) – the total number of pages to iterate
content (bool) – if True, retrieve the current content of each linked page
- Returns:
a generator that yields Page objects of pages linked to linkingPage
- Return type:
Iterable[BasePage]
- pagegenerators.LinksearchPageGenerator(url, namespaces=None, total=None, site=None, protocol=None)[source]#
Yield all pages that link to a certain URL.
- Parameters:
url (str) – The URL to search for (with or without the protocol prefix); this may include a ‘*’ as a wildcard, only at the start of the hostname
namespaces (NamespaceArgType) – list of namespace numbers to fetch contribs from
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results
protocol (str | None) – Protocol to search for, likely http or https, http by default. Full list shown on Special:LinkSearch wikipage.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.LiveRCPageGenerator(site=None, total=None)[source]#
Yield pages from a socket.io RC stream.
Generates pages based on the EventStreams Server-Sent-Event (SSE) recent changes stream. The Page objects will have an extra property ._rcinfo containing the literal rc data. This can be used to e.g. filter only new pages. See pywikibot.comms.eventstreams.rc_listener for details on the .rcinfo format.
- Parameters:
site (BaseSite | None) – site to return recent changes for
total (int | None) – the maximum number of changes to return
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.LogeventsPageGenerator(logtype=None, user=None, site=None, namespace=None, total=None, start=None, end=None, reverse=False)[source]#
Generate Pages for specified modes of logevents.
- Parameters:
logtype (str | None) – Mode of logs to retrieve
user (str | None) – User of logs retrieved
site (BaseSite | None) – Site for generator results
namespace (SingleNamespaceType | None) – Namespace to retrieve logs from
total (int | None) – Maximum number of pages to retrieve in total
start (Timestamp | None) – Timestamp to start listing from
end (Timestamp | None) – Timestamp to end listing at
reverse (bool) – if True, start with oldest changes (default: newest)
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.LonelyPagesPageGenerator(total=None, site=None)[source]#
Lonely page generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.LongPagesPageGenerator(total=100, site=None)[source]#
Long page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.MySQLPageGenerator(query, site=None, verbose=None)[source]#
Yield a list of pages based on a MySQL query.
The query should return two columns, page namespace and page title pairs from some table. An example query that yields all ns0 pages might look like:
SELECT page_namespace, page_title FROM page WHERE page_namespace = 0;
- Parameters:
query (str) – MySQL query to execute
site (BaseSite | None) – Site object
verbose (bool | None) – if True, print query to be executed; if None, config.verbose_output will be used.
- Returns:
generator which yields pywikibot.Page
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.NamespaceFilterPageGenerator(generator, namespaces, site=None)[source]#
A generator yielding pages from another generator in given namespaces.
If a site is provided, the namespaces are validated using the namespaces of that site, otherwise the namespaces are validated using the default site.
Note
API-based generators that have a “namespaces” parameter perform namespace filtering more efficiently than this generator.
- Parameters:
namespaces (frozenset[Namespace] | str | Namespace | Sequence[str | Namespace]) – list of namespace identifiers to limit results
site (BaseSite | None) – Site for generator results; mandatory if namespaces contains namespace names. Defaults to the default site.
generator (Iterable[pywikibot.page.BasePage])
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool, or more than one namespace if the API module does not support multiple namespaces
- Return type:
Generator[pywikibot.page.BasePage, None, None]
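The resolution and error behaviour described above (KeyError for an unknown identifier, TypeError for None or bool) can be sketched with a small lookup table. The NAMESPACE_IDS mapping is a hypothetical subset of a site's namespaces, pages are modelled as (namespace number, title) tuples, and the function names are illustrative only.

```python
from typing import Iterable, Iterator, Tuple, Union

# Hypothetical subset of a site's namespaces: lower-case name -> number.
NAMESPACE_IDS = {'': 0, 'talk': 1, 'user': 2, 'file': 6, 'help': 12}


def resolve_namespace(identifier: Union[int, str]) -> int:
    """Turn a namespace number or name into a known namespace number."""
    if identifier is None or isinstance(identifier, bool):
        raise TypeError(
            f'inappropriate namespace type: {type(identifier).__name__}')
    text = str(identifier).strip().lower()
    if text.lstrip('-').isdigit():
        number = int(text)
        if number not in NAMESPACE_IDS.values():
            raise KeyError(identifier)
        return number
    return NAMESPACE_IDS[text]  # raises KeyError for unknown names


def namespace_filter(pages: Iterable[Tuple[int, str]],
                     namespaces: Iterable[Union[int, str]],
                     ) -> Iterator[Tuple[int, str]]:
    """Yield only (namespace, title) pairs in the given namespaces."""
    wanted = {resolve_namespace(ns) for ns in namespaces}
    for ns, title in pages:
        if ns in wanted:
            yield ns, title


pages = [(0, 'Main Page'), (2, 'User:Usr'), (12, 'Help:Contents')]
print(list(namespace_filter(pages, [0, 'Help'])))
```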
- pagegenerators.NewimagesPageGenerator(total=None, site=None)[source]#
New file generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.NewpagesPageGenerator(site=None, namespaces=(0,), total=None)[source]#
Iterate Page objects for all new titles in a single namespace.
- Parameters:
site (BaseSite | None) – Site for generator results.
namespaces (NamespaceArgType) – namespace to retrieve pages from
total (int | None) – Maximum number of pages to retrieve in total
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.PageClassGenerator(generator)[source]#
Yield pages from another generator as Page subclass objects.
The page class type depends on the page namespace. Objects may be Category, FilePage, Userpage or Page.
- class pagegenerators.PagePilePageGenerator(id)[source]#
Bases: GeneratorWrapper
Queries PagePile to generate pages.
Added in version 9.0.
- Parameters:
id (int) – The PagePile id to query
- buildQuery(id)[source]#
Get the querystring options to query PagePile.
- Parameters:
id (int) – The PagePile id to query
- Returns:
Dictionary of querystring parameters to use in the query
- query()[source]#
Query PagePile.
- Raises:
ServerError – Either ReadTimeout or server status error
APIError – error response from petscan
- Return type:
Generator[str, None, None]
- pagegenerators.PageTitleFilterPageGenerator(generator, ignore_list)[source]#
Yield only those pages that are not listed in the ignore list.
- Parameters:
ignore_list (dict[str, dict[str, str]]) – family names are mapped to dictionaries in which language codes are mapped to lists of page titles. Each title must be a valid regex as they are compared using re.search.
generator (Iterable[BasePage])
- Return type:
Generator[BasePage, None, None]
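The nested shape of ignore_list (family name, then language code, then a list of title regexes) is easiest to see in a small example. The sketch below models pages as (family, code, title) tuples and uses re.search as the description states; the function name title_filter and the sample data are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

IgnoreList = Dict[str, Dict[str, List[str]]]
PageKey = Tuple[str, str, str]  # (family, language code, title)


def title_filter(pages: Iterable[PageKey],
                 ignore_list: IgnoreList) -> Iterator[PageKey]:
    """Yield pages whose title matches no regex listed for their site."""
    for family, code, title in pages:
        patterns = ignore_list.get(family, {}).get(code, [])
        if not any(re.search(pattern, title) for pattern in patterns):
            yield family, code, title


ignore = {'wikipedia': {'en': ['^Draft', 'sandbox$']}}
pages = [
    ('wikipedia', 'en', 'Draft one'),
    ('wikipedia', 'en', 'Main Page'),
    ('wikipedia', 'de', 'Draft one'),
]
print(list(title_filter(pages, ignore)))
```

Because re.search is used, a pattern matches anywhere in the title unless it is anchored with ^ or $.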
- pagegenerators.PageWithTalkPageGenerator(generator, return_talk_only=False)[source]#
Yield pages and associated talk pages from another generator.
Only yields talk pages if the original generator yields a non-talk page, and does not check if the talk page in fact exists.
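The behaviour described above can be sketched with (namespace number, title) tuples, relying on the MediaWiki convention that talk namespaces are the odd-numbered ones directly following their subject namespace. The function name with_talk_pages is hypothetical and the exact handling of return_talk_only is an assumption of this sketch.

```python
from typing import Iterable, Iterator, Tuple

PageKey = Tuple[int, str]  # (namespace number, title without prefix)


def with_talk_pages(pages: Iterable[PageKey],
                    return_talk_only: bool = False) -> Iterator[PageKey]:
    """Yield each page plus the talk page of every non-talk page."""
    for ns, title in pages:
        is_talk = ns % 2 == 1
        if is_talk or not return_talk_only:
            yield ns, title
        if not is_talk:
            # The talk page is derived without checking that it exists.
            yield ns + 1, title


print(list(with_talk_pages([(0, 'A'), (1, 'B')])))
```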
- pagegenerators.PagesFromPageidGenerator(pageids, site=None)[source]#
Return a page generator from pageids.
Pages are iterated in the same order as in the underlying pageids. Pageids are filtered and only one page is returned in case of duplicate pageid.
- Parameters:
pageids (Iterable[str]) – an iterable that returns pageids, or a comma-separated string of pageids (e.g. ‘945097,1483753,956608’)
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
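The two accepted input forms and the order-preserving removal of duplicates can be sketched in a few lines. The helper unique_pageids is hypothetical and only normalises the ids; it does not fetch any pages.

```python
from typing import Iterable, List, Union


def unique_pageids(pageids: Union[str, Iterable[str]]) -> List[str]:
    """Return page ids in first-seen order with duplicates dropped."""
    if isinstance(pageids, str):
        pageids = pageids.split(',')
    cleaned = (str(pageid).strip() for pageid in pageids)
    # dict preserves insertion order, so fromkeys de-duplicates stably.
    return list(dict.fromkeys(pageid for pageid in cleaned if pageid))


print(unique_pageids('945097,1483753,945097,956608'))
```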
- pagegenerators.PagesFromTitlesGenerator(iterable, site=None)[source]#
Generate pages from the titles (strings) yielded by iterable.
- Parameters:
site (BaseSite | None) – Site for generator results.
iterable (Iterable[str])
- Return type:
Generator[pywikibot.page.Page, None, None]
- class pagegenerators.PetScanPageGenerator(categories, subset_combination=True, namespaces=None, site=None, extra_options=None)[source]#
Bases: GeneratorWrapper
Queries PetScan to generate pages.
Added in version 3.0.
Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper
- Parameters:
categories (Sequence[str]) – List of category names to retrieve pages from
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Iterable[int | pywikibot.site.Namespace] | None) – List of namespaces to search in (default is None, meaning all namespaces)
site (BaseSite | None) – Site to operate on (default is the default site from the user config)
extra_options (dict[Any, Any] | None) – Dictionary of extra options to use (optional)
- buildQuery(categories, subset_combination, namespaces, extra_options)[source]#
Get the querystring options to query PetScan.
- Parameters:
categories (Sequence[str]) – List of categories (as strings)
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Iterable[int | Namespace] | None) – List of namespaces to search in
extra_options (dict[Any, Any] | None) – Dictionary of extra options to use
- Returns:
Dictionary of querystring parameters to use in the query
- Return type:
dict[str, Any]
- property generator: Generator[Page, None, None]#
Yield results from query().
Changed in version 7.6: changed from iterator method to generator property
- query()[source]#
Query PetScan.
Changed in version 7.4: raises APIError if query returns an error message.
- Raises:
ServerError – Either ReadTimeout or server status error
APIError – error response from petscan
- Return type:
Generator[dict[str, Any], None, None]
- pagegenerators.PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#
Prefixed Page generator.
Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.
- Parameters:
prefix (str) – The prefix of the pages.
namespace (SingleNamespaceType | None) – Namespace to retrieve pages from
includeredirects (Literal['only'] | bool) – If False, redirects are not included. If it equals the string ‘only’, only redirects are included. Otherwise redirects are included as well. This parameter is deprecated; use filterredir instead.
site (BaseSite | None) – Site for generator results.
total (int | None) – Maximum number of pages to retrieve in total
content (bool) – If True, load current version of each page (default False)
filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).
- Returns:
a generator that yields Page objects
- Raises:
ValueError – filterredir as well as includeredirects parameters were given. Use filterredir only.
- Return type:
Iterable[pywikibot.page.Page]
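The relationship between the deprecated includeredirects values and filterredir, as described in the parameter list above, can be written down as a small mapping. This is a sketch only: resolve_filterredir and the _UNSET sentinel are hypothetical, and the library may detect a conflicting call differently.

```python
from typing import Optional

_UNSET = object()  # hypothetical sentinel meaning "includeredirects not passed"


def resolve_filterredir(includeredirects: object = _UNSET,
                        filterredir: Optional[bool] = None) -> Optional[bool]:
    """Translate the deprecated argument into a filterredir value."""
    if includeredirects is _UNSET:
        return filterredir
    if filterredir is not None:
        raise ValueError('filterredir as well as includeredirects parameters '
                         'were given. Use filterredir only.')
    if includeredirects == 'only':
        return True   # only redirects
    if not includeredirects:
        return False  # only non-redirects
    return None       # both redirects and non-redirects
```

Read this way, includeredirects=True corresponds to leaving filterredir at None, False corresponds to filterredir=False, and ‘only’ corresponds to filterredir=True.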
- pagegenerators.PreloadingEntityGenerator(generator, groupsize=50)[source]#
Yield preloaded pages taken from another generator.
Function basically is copied from above, but for Wikibase entities.
- Parameters:
generator (Iterable[WikibaseEntity]) – pages to iterate over
groupsize (int) – how many pages to preload at once
- Return type:
Generator[WikibaseEntity, None, None]
- pagegenerators.PreloadingGenerator(generator, groupsize=50, quiet=False)[source]#
Yield preloaded pages taken from another generator.
- pagegenerators.QualityFilterPageGenerator(generator, quality)[source]#
Wrap a generator to filter pages according to quality levels.
This is possible only for pages with content_model ‘proofread-page’. In all the other cases, no filter is applied.
- pagegenerators.RandomPageGenerator(total=None, site=None, namespaces=None)[source]#
Random page generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
namespaces (NamespaceArgType)
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.RandomRedirectPageGenerator(total=None, site=None, namespaces=None)[source]#
Random redirect generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
namespaces (NamespaceArgType)
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.RecentChangesPageGenerator(site=None, _filter_unique=None, **kwargs)[source]#
Generate recent changes pages, including duplicates.
For keyword parameters refer to APISite.recentchanges().
Changed in version 8.2: The YieldType depends on namespace. It can be pywikibot.Page, pywikibot.User, pywikibot.FilePage or pywikibot.Category.
Changed in version 9.4: Ignore pywikibot.FilePage if it raises a ValueError during upcast, e.g. due to an invalid file extension.
- pagegenerators.RedirectFilterPageGenerator(generator, no_redirects=True, show_filtered=False)[source]#
Yield pages from another generator that are redirects or not.
- pagegenerators.RegexBodyFilterPageGenerator(generator, regex, quantifier='any')#
Yield pages from another generator whose body matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
For parameters see titlefilter above.
- Parameters:
generator (Iterable[pywikibot.page.BasePage])
regex (PATTERN_STR_OR_SEQ_TYPE)
quantifier (str)
- Return type:
Generator[pywikibot.page.BasePage, None, None]
- pagegenerators.RegexFilterPageGenerator(generator, regex, quantifier='any', ignore_namespace=True)#
Yield pages from another generator whose title matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
If ignore_namespace is False, the whole page title is compared.
Note
if you want to check for a match at the beginning of the title, you have to start the regex with “^”
- Parameters:
generator (Iterable[pywikibot.page.BasePage]) – another generator
regex (PATTERN_STR_OR_SEQ_TYPE) – a regex which should match the page title
quantifier (str) – must be one of the following values: ‘all’ yields the page if the title is matched by all regexes; ‘any’ yields the page if the title is matched by any regex; ‘none’ yields the page if the title is not matched by any regex
ignore_namespace (bool) – ignore the namespace when matching the title
- Returns:
return a page depending on the matching parameters
- Return type:
Generator[pywikibot.page.BasePage, None, None]
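The three quantifier values can be illustrated on plain title strings. The sketch below is not the library's implementation: title_filter is a hypothetical function, it works on strings rather than page objects, and compiling every pattern with re.IGNORECASE is an assumption made for the example.

```python
import re
from typing import Iterable, Iterator, Sequence


def title_filter(titles: Iterable[str], regexes: Sequence[str],
                 quantifier: str = 'any') -> Iterator[str]:
    """Yield titles according to the 'all', 'any' or 'none' quantifier."""
    if quantifier not in ('all', 'any', 'none'):
        raise ValueError(f'unknown quantifier: {quantifier!r}')
    # Assumption: case-insensitive matching for every quantifier.
    compiled = [re.compile(r, re.IGNORECASE) for r in regexes]
    for title in titles:
        hits = [bool(rx.search(title)) for rx in compiled]
        if quantifier == 'all':
            keep = all(hits)
        elif quantifier == 'any':
            keep = any(hits)
        else:  # 'none'
            keep = not any(hits)
        if keep:
            yield title


titles = ['Python (language)', 'Monty Python', 'Java', 'python tutorial']
patterns = [r'^python', r'language']
matched_any = list(title_filter(titles, patterns, 'any'))
matched_all = list(title_filter(titles, patterns, 'all'))
matched_none = list(title_filter(titles, patterns, 'none'))
```

Because the first pattern starts with “^”, the title Monty Python is not matched by it, which illustrates the note about anchoring at the beginning of the title.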
- pagegenerators.RepeatingGenerator(generator, key_func=<function <lambda>>, sleep_duration=60, total=None, **kwargs)[source]#
Yield items in live time.
The provided generator must support the parameters ‘start’, ‘end’, ‘reverse’ and ‘total’, as site.recentchanges() and site.logevents() do.
To fetch revisions in recentchanges in live time:
gen = RepeatingGenerator(site.recentchanges, lambda x: x['revid'])
To fetch new pages in live time:
gen = RepeatingGenerator(site.newpages, lambda x: x[0])
Note that other parameters not listed below will be passed to the generator function. The parameters ‘reverse’, ‘start’ and ‘end’ are always discarded to prevent the generator from yielding items in the wrong order.
- Parameters:
generator (Callable[[...], Iterable[BasePage]]) – a function returning a generator that will be queried
key_func (Callable[[BasePage], Any]) – a function returning a key that is used to detect duplicate entries
sleep_duration (int) – duration between each query
total (int | None) – if it is a positive number, iterate no more than this number of items in total. Otherwise, iterate forever
kwargs (Any)
- Returns:
a generator yielding items in ascending order by time
- Return type:
Generator[Page, None, None]
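The roles of key_func, sleep_duration and total can be sketched with a simplified polling loop. This is an illustration under assumptions rather than the library's algorithm: repeat_poll is a hypothetical function, it remembers every key it has seen, and it yields items in the order the fetch function returns them.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Optional


def repeat_poll(fetch: Callable[..., Iterable[Any]],
                key_func: Callable[[Any], Any] = lambda x: x,
                sleep_duration: float = 60,
                total: Optional[int] = None,
                **kwargs: Any) -> Iterator[Any]:
    """Call fetch repeatedly and yield each item only once."""
    # Ordering parameters are dropped, as the description above states.
    for name in ('reverse', 'start', 'end'):
        kwargs.pop(name, None)
    seen: set = set()
    yielded = 0
    while True:
        for item in fetch(**kwargs):
            key = key_func(item)
            if key in seen:
                continue  # duplicate entry from an earlier poll
            seen.add(key)
            yield item
            yielded += 1
            if total is not None and total > 0 and yielded >= total:
                return
        time.sleep(sleep_duration)


# Fake feed standing in for something like site.recentchanges.
batches = iter([
    [{'revid': 1}, {'revid': 2}],
    [{'revid': 2}, {'revid': 3}],
    [{'revid': 3}, {'revid': 4}],
])


def fake_recentchanges(**kwargs: Any) -> list:
    return next(batches)


revids = [change['revid'] for change in repeat_poll(
    fake_recentchanges, lambda x: x['revid'],
    sleep_duration=0, total=4, reverse=True)]
```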
- pagegenerators.SearchPageGenerator(query, total=None, namespaces=None, site=None)[source]#
Yield pages from the MediaWiki internal search engine.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
query (str)
namespaces (NamespaceArgType)
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.ShortPagesPageGenerator(total=100, site=None)[source]#
Short page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.SubCategoriesPageGenerator(category, recurse=False, start=None, total=None, content=False)[source]#
Yield all subcategories in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (Category)
- Return type:
Generator[Page, None, None]
- pagegenerators.SubpageFilterGenerator(generator, max_depth=0, show_filtered=False)[source]#
Generator which filters out subpages based on depth.
It looks at the namespace of each page and checks if that namespace has subpages enabled. If so, pages with forward slashes (‘/’) are excluded.
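The depth check can be sketched on plain titles. This assumes that depth is the number of forward slashes in the title and that it only counts in namespaces with subpages enabled; subpage_depth and filter_subpages are hypothetical helpers, and each page is represented here as a (title, subpages_enabled) tuple.

```python
from typing import Iterable, Iterator, Tuple


def subpage_depth(title: str, subpages_enabled: bool) -> int:
    """Count slashes only where the namespace allows subpages."""
    return title.count('/') if subpages_enabled else 0


def filter_subpages(pages: Iterable[Tuple[str, bool]],
                    max_depth: int = 0) -> Iterator[str]:
    """Yield titles whose subpage depth does not exceed max_depth."""
    for title, subpages_enabled in pages:
        if subpage_depth(title, subpages_enabled) <= max_depth:
            yield title


pages = [
    ('User:Alice', True),
    ('User:Alice/Drafts', True),
    ('User:Alice/Drafts/2024', True),
    ('AC/DC', False),  # a slash in a namespace without subpages
]
top_level = list(filter_subpages(pages))
one_level = list(filter_subpages(pages, max_depth=1))
```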
- pagegenerators.SupersetPageGenerator(query, site=None, schema_name=None, database_id=None)[source]#
Generate pages that result from the given SQL query.
Pages are generated using site in the following order:
site retrieved using page_wikidb column in SQL result
site as parameter
site retrieved using schema_name
SQL columns used are
page_id
page_namespace + page_title
page_wikidb
Example SQL queries
SELECT gil_wiki AS page_wikidb, gil_page AS page_id FROM globalimagelinks GROUP BY gil_wiki LIMIT 10
OR
SELECT page_id FROM page LIMIT 10
OR
SELECT page_namespace, page_title FROM page LIMIT 10
Added in version 9.2.
- Parameters:
query (str) – the SQL query string.
site (BaseSite | None) – Site for generator results.
schema_name (str | None) – target superset schema name
database_id (int | None) – target superset database id
- Return type:
Iterator[pywikibot.page.Page]
- pagegenerators.TextIOPageGenerator(source=None, site=None)[source]#
Iterate pages from a list in a text file or on a webpage.
The text source must contain page links between double-square-brackets or, alternatively, separated by newlines. The generator will yield each corresponding Page object.
- Parameters:
source (str | None) – the file path or URL that should be read. If no name is given, the generator prompts the user.
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
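The two accepted text layouts can be sketched with a small parser. This is an illustration only: titles_from_text is a hypothetical function, the link regex is simplified, and falling back to newline-separated titles only when no bracketed link is found is an assumption.

```python
import re

# Simplified wikilink pattern: [[Title]] or [[Title|label]].
LINK = re.compile(r'\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]')


def titles_from_text(text: str) -> list[str]:
    """Extract page titles from bracketed links or from separate lines."""
    linked = [m.group(1).strip() for m in LINK.finditer(text)]
    if linked:
        return linked
    # No bracketed links: treat every non-empty line as one title.
    return [line.strip() for line in text.splitlines() if line.strip()]


from_links = titles_from_text('See [[Foo]] and [[Bar|the bar]].')
from_lines = titles_from_text('Foo\n\n Baz \n')
```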
- pagegenerators.UnCategorizedCategoryGenerator(total=100, site=None)[source]#
Uncategorized category generator.
- pagegenerators.UnCategorizedImageGenerator(total=100, site=None)[source]#
Uncategorized file generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.FilePage]
- pagegenerators.UnCategorizedPageGenerator(total=100, site=None)[source]#
Uncategorized page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.UnCategorizedTemplateGenerator(total=100, site=None)[source]#
Uncategorized template generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.UnconnectedPageGenerator(site=None, total=None)[source]#
Iterate Page objects for all pages that are not connected to a Wikibase repository.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.UnusedFilesGenerator(total=None, site=None)[source]#
Unused files generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.FilePage]
- pagegenerators.UnwatchedPagesPageGenerator(total=None, site=None)[source]#
Unwatched page generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.UserContributionsGenerator(username, namespaces=None, site=None, total=None, _filter_unique=functools.partial(<function filter_unique>, key=<function <lambda>>))[source]#
Yield unique pages edited by user:username.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
namespaces (NamespaceArgType) – list of namespace numbers to fetch contribs from
site (BaseSite | None) – Site for generator results.
username (str)
_filter_unique (None | Callable[[Iterable[pywikibot.page.Page]], Iterable[pywikibot.page.Page]])
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.UserEditFilterGenerator(generator, username, timestamp=None, skip=False, max_revision_depth=None, show_filtered=False)[source]#
Generator which will yield Pages modified by username.
It only looks at the last editors, up to the number given by max_revision_depth. If timestamp is set (in MediaWiki format YYYYMMDDhhmmss), older edits are ignored. If skip is set, pages edited by the given user are ignored; otherwise only pages edited by this user are yielded.
- Parameters:
generator (Iterable[BasePage]) – A generator object
username (str) – user name which edited the page
timestamp (str | datetime | None) – ignore edits which are older than this timestamp
skip (bool) – Ignore pages edited by the given user
max_revision_depth (int | None) – It only looks at the last editors given by max_revision_depth
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Generator[BasePage, None, None]
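How username, timestamp, skip and max_revision_depth interact can be sketched with stand-in revision data. Rev and user_edit_filter are hypothetical names, revisions are assumed to be ordered newest first, and the timestamp string is parsed with the format described above; none of this is taken from the library's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional, Sequence, Tuple


@dataclass
class Rev:
    """Hypothetical stand-in for one page revision."""
    user: str
    timestamp: datetime


def user_edit_filter(pages: Iterable[Tuple[str, Sequence[Rev]]],
                     username: str,
                     timestamp: Optional[str] = None,
                     skip: bool = False,
                     max_revision_depth: Optional[int] = None
                     ) -> Iterator[str]:
    """Yield titles edited (or, with skip, not edited) by username."""
    since = (datetime.strptime(timestamp, '%Y%m%d%H%M%S')
             if timestamp else None)
    for title, revisions in pages:
        # Assumption: revisions are listed newest first.
        recent = revisions[:max_revision_depth]
        found = any(rev.user == username
                    and (since is None or rev.timestamp >= since)
                    for rev in recent)
        if found != skip:
            yield title


pages = [
    ('A', [Rev('Bob', datetime(2024, 5, 1)), Rev('Alice', datetime(2023, 1, 1))]),
    ('B', [Rev('Alice', datetime(2024, 6, 1))]),
    ('C', [Rev('Bob', datetime(2024, 1, 1))]),
]
recent_by_alice = list(user_edit_filter(pages, 'Alice', '20240101000000'))
not_by_alice = list(user_edit_filter(pages, 'Alice', '20240101000000', skip=True))
```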
- pagegenerators.WantedPagesPageGenerator(total=100, site=None)[source]#
Wanted page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators.WikibaseItemFilterPageGenerator(generator, has_item=True, show_filtered=False)[source]#
A wrapper generator used to exclude pages depending on whether they have a Wikibase item.
- Parameters:
generator (Iterable[BasePage]) – Generator to wrap.
has_item (bool) – Exclude pages without an item if True, or only include pages without an item if False
show_filtered (bool) – Output a message for each page not yielded
- Returns:
Wrapped generator
- Return type:
Generator[BasePage, None, None]
- pagegenerators.WikibaseItemGenerator(gen)[source]#
A wrapper generator used to yield Wikibase items of another generator.
- pagegenerators.WikibaseSearchItemPageGenerator(text, language=None, total=None, site=None)[source]#
Generate pages that contain the provided text.
- Parameters:
text (str) – Text to look for.
language (str | None) – Code of the language to search in. If not specified, value from pywikibot.config.data_lang is used.
total (int | None) – Maximum number of pages to retrieve in total, or None in case of no limit.
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.ItemPage, None, None]
- pagegenerators.WikidataPageFromItemGenerator(gen, site)[source]#
Generate pages from site based on sitelinks of item pages.
- Parameters:
gen (Iterable[ItemPage]) – generator of pywikibot.ItemPage
site (BaseSite) – Site for generator results.
- Return type:
Generator[Page, None, None]
- pagegenerators.WikidataSPARQLPageGenerator(query, site=None, item_name='item', endpoint=None, entity_url=None, result_type=<class 'set'>)[source]#
Generate pages that result from the given SPARQL query.
- Parameters:
query (str) – the SPARQL query string.
site (BaseSite | None) – Site for generator results.
item_name (str) – name of the item in the SPARQL query
endpoint (str | None) – SPARQL endpoint URL
entity_url (str | None) – URL prefix for any entities returned in a query.
result_type (Any) – type of the iterable in which SPARQL results are stored (default set)
- Return type:
Iterator[pywikibot.page.Page]
- pagegenerators.WithoutInterwikiPageGenerator(total=None, site=None)[source]#
Page lacking interwikis generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- class pagegenerators.XMLDumpPageGenerator(filename, start=None, namespaces=None, site=None, text_predicate=None, content=False)[source]#
Bases:
Iterator
Xml iterator that yields Page objects.
Added in version 7.2: the content parameter
- Parameters:
filename (str) – filename of XML dump
start (str | None) – skip entries below that value
namespaces (NamespaceArgType) – namespace filter
site (BaseSite | None) – current site for the generator
text_predicate (Callable[[str], bool] | None) – a callable with entry.text as parameter and boolean as result to indicate the generator should return the page or not
content – If True, assign old page content to Page.text
- Variables:
skipping – True if start parameter is given, else False
parser – holds the xmlreader.XmlDump parse method
- pagegenerators.YearPageGenerator(start=1, end=2050, site=None)[source]#
Year page generator.
- Parameters:
site (BaseSite | None) – Site for generator results.
start (int)
end (int)
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators.page_with_property_generator(name, total=None, site=None)[source]#
Special:PagesWithProperty page generator.
- Parameters:
name (str) – Property name of pages to be retrieved
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
pagegenerators._factory
— Pagegenerators Options Handler#
GeneratorFactory module which handles pagegenerators options.
- class pagegenerators._factory.GeneratorFactory(site=None, positional_arg_name=None, enabled_options=None, disabled_options=None)[source]#
Bases:
object
Process command line arguments and return appropriate page generator.
This factory is responsible for processing command line arguments that are used by many scripts and that determine which pages to work on.
Note
GeneratorFactory must be instantiated after global arguments are parsed except if site parameter is given.
- Parameters:
site (BaseSite | None) – Site for generator results
positional_arg_name (str | None) – generator to use for positional args, which do not begin with a hyphen
enabled_options (Iterable[str] | None) – only enable options given by this Iterable. This takes priority over disabled_options
disabled_options (Iterable[str] | None) – disable these given options and let them be handled by scripts options handler
- getCategory(category)[source]#
Return Category and start as defined by category.
- Parameters:
category (str) – category name with start parameter
- Return type:
tuple[Category, str | None]
- getCategoryGen(category, recurse=False, content=False, gen_func=None)[source]#
Return generator based on Category defined by category and gen_func.
- Parameters:
category (str) – category name with start parameter
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
content (bool) – if True, retrieve the content of the current version of each page (default False)
gen_func (Callable | None)
- Return type:
Any
- getCombinedGenerator(gen=None, preload=False)[source]#
Return the combination of all accumulated generators.
Only call this after all arguments have been parsed.
Changed in version 7.3: set the instance variable is_preloading to True or False.
Changed in version 8.0: if the limit option is set and multiple generators are given, pages are yielded in a roundrobin way.
- Parameters:
gen (OPT_GENERATOR_TYPE) – Another generator to be combined with
preload (bool) – preload pages using PreloadingGenerator unless self.nopreload is True
- Return type:
OPT_GENERATOR_TYPE
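One plausible reading of the roundrobin behaviour is the interleaving known from the itertools recipes: take one item from each generator in turn, so that a limit is not used up by the first generator alone. The sketch below illustrates that reading with plain lists and islice as the limit; it is not the factory's code.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def roundrobin(*iterables: Iterable[Any]) -> Iterator[Any]:
    """Yield one item from each iterable in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        # Iterate over a copy so exhausted iterators can be removed.
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)


gen_a = ['A1', 'A2', 'A3']
gen_b = ['B1']
gen_c = ['C1', 'C2']
limited = list(islice(roundrobin(gen_a, gen_b, gen_c), 5))
```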
- handle_arg(arg)[source]#
Parse one argument at a time.
If it is recognized as an argument that specifies a generator, a generator is created and added to the accumulation list, and the function returns True. Otherwise, it returns False, so that the caller can try parsing the argument itself. Call getCombinedGenerator() after all arguments have been parsed to get the final output generator.
Added in version 6.0: renamed from handleArg
- Parameters:
arg (str) – Pywikibot argument consisting of -name:value
- Returns:
True if the argument supplied was recognised by the factory
- Return type:
bool
- handle_args(args)[source]#
Handle command line arguments and return the rest as a list.
Added in version 6.0.
Changed in version 7.3: Prioritize -namespaces options to solve problems with several generators like -newpages/-random/-randomredirect/-linter
- Parameters:
args (Iterable[str])
- Return type:
list[str]
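The calling pattern described for handle_arg() and handle_args() can be shown with a much reduced stand-in. MiniFactory is hypothetical and only records recognized options instead of building generators; the option names come from the generator options listed at the top of this page.

```python
from typing import Iterable, List, Tuple


class MiniFactory:
    """Hypothetical stand-in that mimics only the argument protocol."""

    def __init__(self) -> None:
        self.known = {'-cat', '-file', '-search'}
        self.accepted: List[Tuple[str, str]] = []

    def handle_arg(self, arg: str) -> bool:
        """Return True if the argument was recognized and consumed."""
        name, _, value = arg.partition(':')
        if name not in self.known:
            return False
        self.accepted.append((name, value))
        return True

    def handle_args(self, args: Iterable[str]) -> List[str]:
        """Consume recognized arguments and return the rest as a list."""
        return [arg for arg in args if not self.handle_arg(arg)]


factory = MiniFactory()
rest = factory.handle_args(
    ['-cat:Physics', '-always', '-search:quark', 'summary text'])
```

A script would then process the returned list with its own option handler.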
- is_preloading: bool | None#
Return whether Page objects are preloaded. You may use this instance variable after getCombinedGenerator() is called, e.g.:
gen_factory = GeneratorFactory()
print(gen_factory.is_preloading)  # None
gen = gen_factory.getCombinedGenerator()
print(gen_factory.is_preloading)  # True or False
Otherwise the value is undefined and gives None.
Added in version 7.3.
- property namespaces: frozenset[Namespace]#
List of Namespace parameters.
Converts int or string namespaces to Namespace objects and changes the storage to immutable once it has been accessed.
The resolving and validation of namespace command line arguments is performed in this method, as it depends on the site property which is lazy loaded to avoid being cached before the global arguments are handled.
- Returns:
namespaces selected using arguments
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool
- property site: BaseSite#
Generator site.
The generator site should not be accessed until after the global arguments have been handled, otherwise the default Site may be changed by global arguments, which will cause this cached value to be stale.
- Returns:
Site given to initializer, otherwise the default Site at the time this property is first accessed.
pagegenerators._filters
— Filter Functions#
Page filter generators provided by the pagegenerators module.
- pagegenerators._filters.CategoryFilterPageGenerator(generator, category_list)[source]#
Wrap a generator to filter pages by categories specified.
- pagegenerators._filters.EdittimeFilterPageGenerator(generator, last_edit_start=None, last_edit_end=None, first_edit_start=None, first_edit_end=None, show_filtered=False)[source]#
Wrap a generator to filter pages outside last or first edit range.
- Parameters:
generator (Iterable[BasePage]) – A generator object
last_edit_start (datetime | None) – Only yield pages last edited after this time
last_edit_end (datetime | None) – Only yield pages last edited before this time
first_edit_start (datetime | None) – Only yield pages first edited after this time
first_edit_end (datetime | None) – Only yield pages first edited before this time
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Generator[BasePage, None, None]
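The four optional bounds can be read as two independent time windows, one for the first edit and one for the last edit. The sketch below assumes inclusive bounds and uses a hypothetical PageTimes record instead of real page objects; it is not the library's implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass
class PageTimes:
    """Hypothetical record holding a title and its edit times."""
    title: str
    first_edit: datetime
    last_edit: datetime


def in_window(value: datetime, start: Optional[datetime],
              end: Optional[datetime]) -> bool:
    """Check value against optional bounds (assumed inclusive)."""
    return ((start is None or value >= start)
            and (end is None or value <= end))


def edittime_filter(pages: Iterable[PageTimes],
                    last_edit_start: Optional[datetime] = None,
                    last_edit_end: Optional[datetime] = None,
                    first_edit_start: Optional[datetime] = None,
                    first_edit_end: Optional[datetime] = None
                    ) -> Iterator[str]:
    """Yield titles whose first and last edit fall inside both windows."""
    for page in pages:
        if (in_window(page.last_edit, last_edit_start, last_edit_end)
                and in_window(page.first_edit, first_edit_start,
                              first_edit_end)):
            yield page.title


pages = [
    PageTimes('Old', datetime(2010, 1, 1), datetime(2012, 1, 1)),
    PageTimes('Active', datetime(2015, 1, 1), datetime(2024, 3, 1)),
    PageTimes('New', datetime(2024, 1, 1), datetime(2024, 2, 1)),
]
recently_edited = list(edittime_filter(
    pages, last_edit_start=datetime(2024, 1, 1)))
old_but_active = list(edittime_filter(
    pages, last_edit_start=datetime(2024, 1, 1),
    first_edit_end=datetime(2020, 1, 1)))
```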
- class pagegenerators._filters.ItemClaimFilter[source]#
Bases:
object
Item claim filter.
- classmethod filter(generator, prop, claim, qualifiers=None, negate=False)[source]#
Yield all ItemPages which contain certain claim in a property.
- Parameters:
prop (str) – property id to check
claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).
qualifiers (dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant
negate (bool) – true if pages that do not contain specified claim should be yielded, false otherwise
generator (Iterable[WikibasePage])
- Return type:
Generator[WikibasePage, None, None]
- page_classes = {False: <class 'pywikibot.page._wikibase.ItemPage'>, True: <class 'pywikibot.page._wikibase.PropertyPage'>}#
- pagegenerators._filters.ItemClaimFilterPageGenerator(generator, prop, claim, qualifiers=None, negate=False)#
Yield all ItemPages which contain certain claim in a property.
- Parameters:
prop (str) – property id to check
claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).
qualifiers (dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant
negate (bool) – true if pages that do not contain specified claim should be yielded, false otherwise
generator (Iterable[WikibasePage])
- Return type:
Generator[WikibasePage, None, None]
- pagegenerators._filters.NamespaceFilterPageGenerator(generator, namespaces, site=None)[source]#
A generator yielding pages from another generator in given namespaces.
If a site is provided, the namespaces are validated using the namespaces of that site, otherwise the namespaces are validated using the default site.
Note
API-based generators that have a “namespaces” parameter perform namespace filtering more efficiently than this generator.
- Parameters:
namespaces (frozenset[Namespace] | str | Namespace | Sequence[str | Namespace]) – list of namespace identifiers to limit results
site (BaseSite | None) – Site for generator results; mandatory if namespaces contains namespace names. Defaults to the default site.
generator (Iterable[pywikibot.page.BasePage])
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool, or more than one namespace if the API module does not support multiple namespaces
- Return type:
Generator[pywikibot.page.BasePage, None, None]
- pagegenerators._filters.PageTitleFilterPageGenerator(generator, ignore_list)[source]#
Yield only those pages that are not listed in the ignore list.
- Parameters:
ignore_list (dict[str, dict[str, str]]) – family names are mapped to dictionaries in which language codes are mapped to lists of page titles. Each title must be a valid regex as they are compared using re.search.
generator (Iterable[BasePage])
- Return type:
Generator[BasePage, None, None]
- pagegenerators._filters.QualityFilterPageGenerator(generator, quality)[source]#
Wrap a generator to filter pages according to quality levels.
This is possible only for pages with content_model ‘proofread-page’. In all the other cases, no filter is applied.
- pagegenerators._filters.RedirectFilterPageGenerator(generator, no_redirects=True, show_filtered=False)[source]#
Yield pages from another generator that are redirects or not.
- pagegenerators._filters.RegexBodyFilterPageGenerator(generator, regex, quantifier='any')#
Yield pages from another generator whose body matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
For parameters see titlefilter above.
- Parameters:
generator (Iterable[pywikibot.page.BasePage])
regex (PATTERN_STR_OR_SEQ_TYPE)
quantifier (str)
- Return type:
Generator[pywikibot.page.BasePage, None, None]
- class pagegenerators._filters.RegexFilter[source]#
Bases:
object
Regex filter.
- classmethod contentfilter(generator, regex, quantifier='any')[source]#
Yield pages from another generator whose body matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
For parameters see titlefilter above.
- Parameters:
generator (Iterable[pywikibot.page.BasePage])
regex (PATTERN_STR_OR_SEQ_TYPE)
quantifier (str)
- Return type:
Generator[pywikibot.page.BasePage, None, None]
- classmethod titlefilter(generator, regex, quantifier='any', ignore_namespace=True)[source]#
Yield pages from another generator whose title matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
If ignore_namespace is False, the whole page title is compared.
Note
if you want to check for a match at the beginning of the title, you have to start the regex with “^”
- Parameters:
generator (Iterable[pywikibot.page.BasePage]) – another generator
regex (PATTERN_STR_OR_SEQ_TYPE) – a regex which should match the page title
quantifier (str) – must be one of the following values: ‘all’ yields the page if the title is matched by all regexes; ‘any’ yields the page if the title is matched by any regex; ‘none’ yields the page if the title is not matched by any regex
ignore_namespace (bool) – ignore the namespace when matching the title
- Returns:
return a page depending on the matching parameters
- Return type:
Generator[pywikibot.page.BasePage, None, None]
- pagegenerators._filters.RegexFilterPageGenerator(generator, regex, quantifier='any', ignore_namespace=True)#
Yield pages from another generator whose title matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
If ignore_namespace is False, the whole page title is compared.
Note
if you want to check for a match at the beginning of the title, you have to start the regex with “^”
- Parameters:
generator (Iterable[pywikibot.page.BasePage]) – another generator
regex (PATTERN_STR_OR_SEQ_TYPE) – a regex which should match the page title
quantifier (str) – must be one of the following values: ‘all’ yields the page if the title is matched by all regexes; ‘any’ yields the page if the title is matched by any regex; ‘none’ yields the page if the title is not matched by any regex
ignore_namespace (bool) – ignore the namespace when matching the title
- Returns:
return a page depending on the matching parameters
- Return type:
Generator[pywikibot.page.BasePage, None, None]
- pagegenerators._filters.SubpageFilterGenerator(generator, max_depth=0, show_filtered=False)[source]#
Generator which filters out subpages based on depth.
It looks at the namespace of each page and checks if that namespace has subpages enabled. If so, pages with forward slashes (‘/’) are excluded.
- pagegenerators._filters.UserEditFilterGenerator(generator, username, timestamp=None, skip=False, max_revision_depth=None, show_filtered=False)[source]#
Generator which will yield Pages modified by username.
It only looks at the last editors, up to the number given by max_revision_depth. If timestamp is set (in MediaWiki format YYYYMMDDhhmmss), older edits are ignored. If skip is set, pages edited by the given user are ignored; otherwise only pages edited by this user are yielded.
- Parameters:
generator (Iterable[BasePage]) – A generator object
username (str) – user name which edited the page
timestamp (str | datetime | None) – ignore edits which are older than this timestamp
skip (bool) – Ignore pages edited by the given user
max_revision_depth (int | None) – It only looks at the last editors given by max_revision_depth
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Generator[BasePage, None, None]
- pagegenerators._filters.WikibaseItemFilterPageGenerator(generator, has_item=True, show_filtered=False)[source]#
A wrapper generator used to exclude pages depending on whether they have a Wikibase item.
- Parameters:
generator (Iterable[BasePage]) – Generator to wrap.
has_item (bool) – Exclude pages without an item if True, or only include pages without an item if False
show_filtered (bool) – Output a message for each page not yielded
- Returns:
Wrapped generator
- Return type:
Generator[BasePage, None, None]
pagegenerators._generators
— Generator Functions#
Page generators provided by the pagegenerators module.
- pagegenerators._generators.AllpagesPageGenerator(start='!', namespace=0, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#
Iterate Page objects for all titles in a single namespace.
Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.
- Parameters:
start (str) – if provided, only generate pages >= this title lexically
namespace (SingleNamespaceType) – Namespace to retrieve pages from
includeredirects (Literal['only'] | bool) – If False, redirects are not included. If equals the string ‘only’, only redirects are added. Otherwise redirects will be included. This parameter is deprecated; use filterredir instead.
site (BaseSite | None) – Site for generator results.
total (int | None) – Maximum number of pages to retrieve in total
content (bool) – If True, load current version of each page (default False)
filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).
- Returns:
a generator that yields Page objects
- Raises:
ValueError – filterredir as well as includeredirects parameters were given. Use filterredir only.
- Return type:
Iterable[pywikibot.page.Page]
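The relationship between the deprecated includeredirects values and filterredir can be written down as a small mapping. The sketch below follows the parameter descriptions above; resolve_filterredir and the _UNSET sentinel are hypothetical names, and how the library itself detects that both parameters were given is an assumption.

```python
_UNSET = object()  # hypothetical sentinel: includeredirects was not passed


def resolve_filterredir(includeredirects=_UNSET, filterredir=None):
    """Translate the deprecated includeredirects value into filterredir."""
    if includeredirects is _UNSET:
        return filterredir
    if filterredir is not None:
        raise ValueError('Use filterredir only, not both parameters.')
    if includeredirects == 'only':
        return True    # only redirects
    if not includeredirects:
        return False   # only non-redirects
    return None        # redirects and non-redirects
```

New code would pass filterredir directly; the mapping is only useful when migrating older calls.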
- pagegenerators._generators.AncientPagesPageGenerator(total=100, site=None)[source]#
Ancient page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.CategorizedPageGenerator(category, recurse=False, start=None, total=None, content=False, namespaces=None)[source]#
Yield all pages in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (pywikibot.page.Category)
namespaces (NamespaceArgType)
- Return type:
Generator[pywikibot.page.Page, None, None]
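The recurse parameter accepts either a bool or an int, which is easy to misread. The following sketch walks a toy category tree with the same depth rule; iter_articles and the dict layout are hypothetical, and the toy tree is assumed to contain no cycles.

```python
def iter_articles(tree, category, recurse=False):
    """Yield article titles from a toy category tree.

    tree maps a category name to (articles, subcategories); this dict is a
    hypothetical stand-in for a wiki's category structure.
    """
    articles, subcats = tree[category]
    yield from articles
    if recurse is True:
        next_level = True            # unlimited depth
    elif recurse:
        next_level = recurse - 1     # one level consumed
    else:
        return                       # False or 0: stay in this category
    for sub in subcats:
        yield from iter_articles(tree, sub, next_level)
```

With a chain A > B > C, recurse=1 reaches the articles of B but not those of C, matching the example in the parameter description.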
- pagegenerators._generators.DayPageGenerator(start_month=1, end_month=12, site=None, year=2000)[source]#
Day page generator.
- Parameters:
site (BaseSite | None) – Site for generator results.
year (int) – the year to use when deciding whether it is a leap year
start_month (int)
end_month (int)
- Return type:
Generator[pywikibot.page.Page, None, None]
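The year parameter matters only for February. A minimal sketch of the day enumeration, assuming end_month is inclusive and using the standard calendar module rather than any Pywikibot call (iter_days is a hypothetical name):

```python
import calendar


def iter_days(start_month=1, end_month=12, year=2000):
    """Yield (month, day) pairs, honouring leap years for the given year."""
    for month in range(start_month, end_month + 1):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            yield month, day
```

The default year 2000 is a leap year, so 29 February is included unless a non-leap year is passed.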
- pagegenerators._generators.DeadendPagesPageGenerator(total=100, site=None)[source]#
Dead-end page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.FileLinksGenerator(referredFilePage, total=None, content=False)[source]#
Yield Pages on which referredFilePage file is displayed.
- class pagegenerators._generators.GoogleSearchPageGenerator(query=None, site=None)[source]#
Bases:
GeneratorWrapper
Page generator using Google search results.
To use this generator, you need to install the package ‘google’:
https://pypi.org/project/google
This package has been available since 2010, hosted on GitHub since 2012, and provided by PyPI since 2013.
As there are concerns about Google’s Terms of Service, this generator prints a warning for each query.
Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper
- Parameters:
site (BaseSite | None) – Site for generator results.
query (str | None)
- property generator: Generator[Page, None, None]#
Yield results from queryGoogle() query.
Google contains links in the format: https://de.wikipedia.org/wiki/en:Foobar
Changed in version 7.6: changed from iterator method to generator property
- static queryGoogle(query)[source]#
Perform a query using python package ‘google’.
The terms of service as at June 2014 give two conditions that may apply to use of search:
Don’t access [Google Services] using a method other than the interface and the instructions that [they] provide.
Don’t remove, obscure, or alter any legal notices displayed in or along with [Google] Services.
Both of those issues should be managed by the package ‘google’, however Pywikibot will at least ensure the user sees the TOS in order to comply with the second condition.
- Parameters:
query (str)
- Return type:
Generator[str, None, None]
- pagegenerators._generators.ImagesPageGenerator(pageWithImages, total=None, content=False)[source]#
Yield FilePages displayed on pageWithImages.
- pagegenerators._generators.InterwikiPageGenerator(page)[source]#
Iterate over all interwiki (non-language) links on a page.
- pagegenerators._generators.LanguageLinksPageGenerator(page, total=None)[source]#
Iterate over all interwiki language links on a page.
- pagegenerators._generators.LinkedPageGenerator(linkingPage, total=None, content=False)[source]#
Yield all pages linked from a specific page.
See page.BasePage.linkedPages for details.
- Parameters:
linkingPage (Page) – the page that links to the pages we want
total (int | None) – the total number of pages to iterate
content (bool) – if True, retrieve the current content of each linked page
- Returns:
a generator that yields Page objects of pages linked to linkingPage
- Return type:
Iterable[BasePage]
- pagegenerators._generators.LinksearchPageGenerator(url, namespaces=None, total=None, site=None, protocol=None)[source]#
Yield all pages that link to a certain URL.
- Parameters:
url (str) – The URL to search for (with or without the protocol prefix); this may include a ‘*’ as a wildcard, only at the start of the hostname
namespaces (NamespaceArgType) – list of namespace numbers to restrict the results to
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results
protocol (str | None) – Protocol to search for, likely http or https, http by default. Full list shown on Special:LinkSearch wikipage.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.LiveRCPageGenerator(site=None, total=None)[source]#
Yield pages from a socket.io RC stream.
Generates pages based on the EventStreams Server-Sent-Event (SSE) recent changes stream. The Page objects will have an extra property ._rcinfo containing the literal rc data. This can be used to e.g. filter only new pages. See pywikibot.comms.eventstreams.rc_listener for details on the ._rcinfo format.
- Parameters:
site (BaseSite | None) – site to return recent changes for
total (int | None) – the maximum number of changes to return
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.LogeventsPageGenerator(logtype=None, user=None, site=None, namespace=None, total=None, start=None, end=None, reverse=False)[source]#
Generate Pages for specified modes of logevents.
- Parameters:
logtype (str | None) – Mode of logs to retrieve
user (str | None) – User of logs retrieved
site (BaseSite | None) – Site for generator results
namespace (SingleNamespaceType | None) – Namespace to retrieve logs from
total (int | None) – Maximum number of pages to retrieve in total
start (Timestamp | None) – Timestamp to start listing from
end (Timestamp | None) – Timestamp to end listing at
reverse (bool) – if True, start with oldest changes (default: newest)
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.LonelyPagesPageGenerator(total=None, site=None)[source]#
Lonely page generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.LongPagesPageGenerator(total=100, site=None)[source]#
Long page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.MySQLPageGenerator(query, site=None, verbose=None)[source]#
Yield a list of pages based on a MySQL query.
The query should return two columns, page namespace and page title pairs from some table. An example query that yields all ns0 pages might look like:
SELECT page_namespace, page_title FROM page WHERE page_namespace = 0;
- Parameters:
query (str) – MySQL query to execute
site (BaseSite | None) – Site object
verbose (bool | None) – if True, print query to be executed; if None, config.verbose_output will be used.
- Returns:
generator which yields pywikibot.Page
- Return type:
Generator[pywikibot.page.Page, None, None]
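The expected row shape (namespace number, then title) can be checked locally before running a query against a replica. The sketch below swaps in an in-memory SQLite database purely for illustration; it does not use MySQL or Pywikibot, and the toy page table with its three rows is made up.

```python
import sqlite3

QUERY = 'SELECT page_namespace, page_title FROM page WHERE page_namespace = 0;'


def run_demo():
    """Run the example query against a toy SQLite table and return rows."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE page (page_namespace INTEGER, page_title TEXT)')
    conn.executemany(
        'INSERT INTO page VALUES (?, ?)',
        [(0, 'Main_Page'), (1, 'Main_Page'), (0, 'Sandbox')],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows  # each row is a (page_namespace, page_title) pair
```

Each returned tuple carries exactly the two columns the generator expects, in that order.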
- pagegenerators._generators.NewimagesPageGenerator(total=None, site=None)[source]#
New file generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.NewpagesPageGenerator(site=None, namespaces=(0,), total=None)[source]#
Iterate Page objects for all new titles in a single namespace.
- Parameters:
site (BaseSite | None) – Site for generator results.
namespaces (NamespaceArgType) – namespace to retrieve pages from
total (int | None) – Maximum number of pages to retrieve in total
- Return type:
Generator[pywikibot.page.Page, None, None]
- class pagegenerators._generators.PagePilePageGenerator(id)[source]#
Bases:
GeneratorWrapper
Queries PagePile to generate pages.
Added in version 9.0.
- Parameters:
id (int) – The PagePile id to query
- buildQuery(id)[source]#
Get the querystring options to query PagePile.
- Parameters:
id (int) – The PagePile id to query
- Returns:
Dictionary of querystring parameters to use in the query
- query()[source]#
Query PagePile.
- Raises:
ServerError – Either ReadTimeout or server status error
APIError – error response from petscan
- Return type:
Generator[str, None, None]
- pagegenerators._generators.PagesFromPageidGenerator(pageids, site=None)[source]#
Return a page generator from pageids.
Pages are iterated in the same order as in the underlying pageids. Pageids are filtered so that only one page is returned for duplicate pageids.
- Parameters:
pageids (Iterable[str]) – an iterable that returns pageids, or a comma-separated string of pageids (e.g. ‘945097,1483753,956608’)
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
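The two accepted input forms and the duplicate handling can be sketched as a normalisation step. normalize_pageids is a hypothetical helper that illustrates the documented behaviour; it is not the library's code.

```python
def normalize_pageids(pageids):
    """Return unique page ids as strings, preserving first-seen order.

    Accepts either an iterable of ids or a comma-separated string.
    """
    if isinstance(pageids, str):
        pageids = pageids.split(',')
    cleaned = (str(p).strip() for p in pageids)
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(p for p in cleaned if p))
```

Because order is preserved, the resulting pages line up with the first occurrence of each id in the input.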
- pagegenerators._generators.PagesFromTitlesGenerator(iterable, site=None)[source]#
Generate pages from the titles (strings) yielded by iterable.
- Parameters:
site (BaseSite | None) – Site for generator results.
iterable (Iterable[str])
- Return type:
Generator[pywikibot.page.Page, None, None]
- class pagegenerators._generators.PetScanPageGenerator(categories, subset_combination=True, namespaces=None, site=None, extra_options=None)[source]#
Bases:
GeneratorWrapper
Queries PetScan to generate pages.
Added in version 3.0.
Changed in version 7.6: subclassed from tools.collections.GeneratorWrapper
- Parameters:
categories (Sequence[str]) – List of category names to retrieve pages from
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Iterable[int | pywikibot.site.Namespace] | None) – List of namespaces to search in (default is None, meaning all namespaces)
site (BaseSite | None) – Site to operate on (default is the default site from the user config)
extra_options (dict[Any, Any] | None) – Dictionary of extra options to use (optional)
- buildQuery(categories, subset_combination, namespaces, extra_options)[source]#
Get the querystring options to query PetScan.
- Parameters:
categories (Sequence[str]) – List of categories (as strings)
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Iterable[int | Namespace] | None) – List of namespaces to search in
extra_options (dict[Any, Any] | None) – Dictionary of extra options to use
- Returns:
Dictionary of querystring parameters to use in the query
- Return type:
dict[str, Any]
- property generator: Generator[Page, None, None]#
Yield results from query().
Changed in version 7.6: changed from iterator method to generator property
- query()[source]#
Query PetScan.
Changed in version 7.4: raises APIError if query returns an error message.
- Raises:
ServerError – Either ReadTimeout or server status error
APIError – error response from petscan
- Return type:
Generator[dict[str, Any], None, None]
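The subset_combination flag chooses between intersection and union of the per-category results. A sketch of that distinction with plain sets, using made-up category contents (combine is a hypothetical name and no PetScan request is made):

```python
def combine(results_by_category, subset_combination=True):
    """Combine per-category title lists as an intersection or a union."""
    sets = [set(titles) for titles in results_by_category.values()]
    if not sets:
        return set()
    if subset_combination:
        return set.intersection(*sets)  # pages present in every category
    return set.union(*sets)             # pages present in any category
```

The default of True is the narrower choice, so switching it off can enlarge the result considerably.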
- pagegenerators._generators.PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site=None, total=None, content=False, *, filterredir=None)[source]#
Prefixed Page generator.
Deprecated since version 10.0: The includeredirects parameter; use filterredir instead.
- Parameters:
prefix (str) – The prefix of the pages.
namespace (SingleNamespaceType | None) – Namespace to retrieve pages from
includeredirects (Literal['only'] | bool) – If False, redirects are not included. If equals the string ‘only’, only redirects are added. Otherwise redirects will be included. This parameter is deprecated; use filterredir instead.
site (BaseSite | None) – Site for generator results.
total (int | None) – Maximum number of pages to retrieve in total
content (bool) – If True, load current version of each page (default False)
filterredir (bool | None) – if True, only yield redirects; if False (and not None), only yield non-redirects (default: yield both).
- Returns:
a generator that yields Page objects
- Raises:
ValueError – filterredir as well as includeredirects parameters were given. Use filterredir only.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.RandomPageGenerator(total=None, site=None, namespaces=None)[source]#
Random page generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
namespaces (NamespaceArgType)
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.RandomRedirectPageGenerator(total=None, site=None, namespaces=None)[source]#
Random redirect generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
namespaces (NamespaceArgType)
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.RecentChangesPageGenerator(site=None, _filter_unique=None, **kwargs)[source]#
Generate recent changes pages, including duplicates.
For keyword parameters refer to APISite.recentchanges().
Changed in version 8.2: The YieldType depends on namespace. It can be pywikibot.Page, pywikibot.User, pywikibot.FilePage or pywikibot.Category.
Changed in version 9.4: Ignore pywikibot.FilePage if it raises a ValueError during upcast e.g. due to an invalid file extension.
- pagegenerators._generators.SearchPageGenerator(query, total=None, namespaces=None, site=None)[source]#
Yield pages from the MediaWiki internal search engine.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
query (str)
namespaces (NamespaceArgType)
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.ShortPagesPageGenerator(total=100, site=None)[source]#
Short page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.SubCategoriesPageGenerator(category, recurse=False, start=None, total=None, content=False)[source]#
Yield all subcategories in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (Category)
- Return type:
Generator[Page, None, None]
- pagegenerators._generators.SupersetPageGenerator(query, site=None, schema_name=None, database_id=None)[source]#
Generate pages that result from the given SQL query.
Pages are generated using site in the following order:
site retrieved using page_wikidb column in SQL result
site as parameter
site retrieved using schema_name
SQL columns used are
page_id
page_namespace + page_title
page_wikidb
Example SQL queries
SELECT gil_wiki AS page_wikidb, gil_page AS page_id FROM globalimagelinks GROUP BY gil_wiki LIMIT 10
OR
SELECT page_id FROM page LIMIT 10
OR
SELECT page_namespace, page_title FROM page LIMIT 10
Added in version 9.2.
- Parameters:
query (str) – the SQL query string.
site (BaseSite | None) – Site for generator results.
schema_name (str | None) – target superset schema name
database_id (int | None) – target superset database id
- Return type:
Iterator[pywikibot.page.Page]
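The site precedence and the column alternatives listed above can be expressed as two small lookups. This is a sketch under stated assumptions: pick_site_source and page_key are hypothetical helpers, a result row is modelled as a dict keyed by column name, and preferring page_id over the namespace and title pair is assumed from the listing order.

```python
def pick_site_source(row, site=None, schema_name=None):
    """Return which source decides the site for one result row."""
    if row.get('page_wikidb'):
        return 'page_wikidb', row['page_wikidb']
    if site is not None:
        return 'site', site
    if schema_name is not None:
        return 'schema_name', schema_name
    return None  # fallback behaviour is not modelled in this sketch


def page_key(row):
    """Return the columns that identify the page in a row."""
    if 'page_id' in row:
        return ('page_id', row['page_id'])
    return ('title', row['page_namespace'], row['page_title'])
```

A query that returns page_wikidb can therefore mix pages from several wikis in one result set, while the other two sources apply to every row alike.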
- pagegenerators._generators.TextIOPageGenerator(source=None, site=None)[source]#
Iterate pages from a list in a text file or on a webpage.
The text source must contain page links between double-square-brackets or, alternatively, separated by newlines. The generator will yield each corresponding Page object.
- Parameters:
source (str | None) – the file path or URL that should be read. If no name is given, the generator prompts the user.
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.Page, None, None]
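The two accepted source layouts, bracketed links or one title per line, can be illustrated with a simplified parser. The regular expression and titles_from_text are hypothetical; the library's own link pattern is likely more thorough.

```python
import re

# Hypothetical, simplified pattern: captures the target of [[Target|label]].
LINK = re.compile(r'\[\[([^\]|]+)')


def titles_from_text(text):
    """Extract page titles from bracketed links, else from separate lines."""
    linked = [m.group(1).strip() for m in LINK.finditer(text)]
    if linked:
        return linked
    return [line.strip() for line in text.splitlines() if line.strip()]
```

In this sketch the newline fallback is used only when no bracketed link is found anywhere in the text, so a file should stick to one layout.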
- pagegenerators._generators.UnCategorizedCategoryGenerator(total=100, site=None)[source]#
Uncategorized category generator.
- pagegenerators._generators.UnCategorizedImageGenerator(total=100, site=None)[source]#
Uncategorized file generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.FilePage]
- pagegenerators._generators.UnCategorizedPageGenerator(total=100, site=None)[source]#
Uncategorized page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.UnCategorizedTemplateGenerator(total=100, site=None)[source]#
Uncategorized template generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.UnconnectedPageGenerator(site=None, total=None)[source]#
Iterate Page objects for all pages not connected to a Wikibase repository.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.UnusedFilesGenerator(total=None, site=None)[source]#
Unused files generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.FilePage]
- pagegenerators._generators.UnwatchedPagesPageGenerator(total=None, site=None)[source]#
Unwatched page generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.UserContributionsGenerator(username, namespaces=None, site=None, total=None, _filter_unique=functools.partial(<function filter_unique>, key=<function <lambda>>))[source]#
Yield unique pages edited by user:username.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
namespaces (NamespaceArgType) – list of namespace numbers to fetch contribs from
site (BaseSite | None) – Site for generator results.
username (str)
_filter_unique (None | Callable[[Iterable[pywikibot.page.Page]], Iterable[pywikibot.page.Page]])
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.WantedPagesPageGenerator(total=100, site=None)[source]#
Wanted page generator.
- Parameters:
total (int) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- pagegenerators._generators.WikibaseItemGenerator(gen)[source]#
A wrapper generator used to yield Wikibase items of another generator.
- pagegenerators._generators.WikibaseSearchItemPageGenerator(text, language=None, total=None, site=None)[source]#
Generate pages that contain the provided text.
- Parameters:
text (str) – Text to look for.
language (str | None) – Code of the language to search in. If not specified, value from pywikibot.config.data_lang is used.
total (int | None) – Maximum number of pages to retrieve in total, or None in case of no limit.
site (BaseSite | None) – Site for generator results.
- Return type:
Generator[pywikibot.page.ItemPage, None, None]
- pagegenerators._generators.WikidataPageFromItemGenerator(gen, site)[source]#
Generate pages from site based on sitelinks of item pages.
- Parameters:
gen (Iterable[ItemPage]) – generator of pywikibot.ItemPage
site (BaseSite) – Site for generator results.
- Return type:
Generator[Page, None, None]
- pagegenerators._generators.WikidataSPARQLPageGenerator(query, site=None, item_name='item', endpoint=None, entity_url=None, result_type=<class 'set'>)[source]#
Generate pages that result from the given SPARQL query.
- Parameters:
query (str) – the SPARQL query string.
site (BaseSite | None) – Site for generator results.
item_name (str) – name of the item in the SPARQL query
endpoint (str | None) – SPARQL endpoint URL
entity_url (str | None) – URL prefix for any entities returned in a query.
result_type (Any) – type of the iterable in which SPARQL results are stored (default set)
- Return type:
Iterator[pywikibot.page.Page]
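The roles of item_name, entity_url and result_type become clearer with a query and a few result rows side by side. The sketch below assumes Wikidata's entity URL prefix and models result rows as dicts from variable name to URL; entity_ids is a hypothetical helper and no endpoint is contacted.

```python
ENTITY_URL = 'http://www.wikidata.org/entity/'  # assumed Wikidata prefix

# The selected variable name must match item_name ('item' by default).
QUERY = 'SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 10'


def entity_ids(bindings, item_name='item', entity_url=ENTITY_URL,
               result_type=set):
    """Strip the entity URL prefix from each row's item value."""
    return result_type(
        row[item_name][len(entity_url):]
        for row in bindings
        if row.get(item_name, '').startswith(entity_url)
    )
```

With the default result_type of set, duplicate items collapse and order is lost; passing list keeps both.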
- pagegenerators._generators.WithoutInterwikiPageGenerator(total=None, site=None)[source]#
Page lacking interwikis generator.
- Parameters:
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]
- class pagegenerators._generators.XMLDumpPageGenerator(filename, start=None, namespaces=None, site=None, text_predicate=None, content=False)[source]#
Bases:
Iterator
Xml iterator that yields Page objects.
Added in version 7.2: the content parameter
- Parameters:
filename (str) – filename of XML dump
start (str | None) – skip entries below that value
namespaces (NamespaceArgType) – namespace filter
site (BaseSite | None) – current site for the generator
text_predicate (Callable[[str], bool] | None) – a callable with entry.text as parameter and boolean as result to indicate the generator should return the page or not
content – If True, assign old page content to Page.text
- Variables:
skipping – True if start parameter is given, else False
parser – holds the xmlreader.XmlDump parse method
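A text_predicate is any callable that takes the entry text and returns a boolean. The following sketch shows one such predicate applied to made-up dump entries; filter_entries and the (title, text) tuples are hypothetical and stand in for the dump parser.

```python
def mentions_infobox(text):
    """Example predicate: keep entries whose text uses an infobox."""
    return '{{Infobox' in text


def filter_entries(entries, text_predicate=None):
    """Yield titles of entries accepted by text_predicate.

    entries is a hypothetical iterable of (title, text) tuples.
    """
    for title, text in entries:
        if text_predicate is None or text_predicate(text):
            yield title
```

Leaving text_predicate as None accepts every entry, so the predicate only ever narrows the result.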
- pagegenerators._generators.YearPageGenerator(start=1, end=2050, site=None)[source]#
Year page generator.
- Parameters:
site (BaseSite | None) – Site for generator results.
start (int)
end (int)
- Return type:
Generator[pywikibot.page.Page, None, None]
- pagegenerators._generators.page_with_property_generator(name, total=None, site=None)[source]#
Special:PagesWithProperty page generator.
- Parameters:
name (str) – Property name of pages to be retrieved
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results.
- Return type:
Iterable[pywikibot.page.Page]