pagegenerators
— Page Generators#
This module offers a wide variety of page generators.
A page generator is an object that is iterable (see PEP 255) and that yields page objects on which other scripts can then work.
Most of these functions just wrap a Site or Page method that returns a generator. For testing purposes, listpages.py can be used to print page titles to standard output.
These parameters are supported to specify which page titles to print:
GENERATOR OPTIONS#
- -cat
Work on all pages which are in a specific category. Argument can also be given as “-cat:categoryname” or as “-cat:categoryname|fromtitle” (using # instead of | is also allowed in this one and the following)
- -catr
Like -cat, but also recursively includes pages in subcategories, sub-subcategories etc. of the given category. Argument can also be given as “-catr:categoryname” or as “-catr:categoryname|fromtitle”.
- -subcats
Work on all subcategories of a specific category. Argument can also be given as “-subcats:categoryname” or as “-subcats:categoryname|fromtitle”.
- -subcatsr
Like -subcats, but also includes sub-subcategories etc. of the given category. Argument can also be given as “-subcatsr:categoryname” or as “-subcatsr:categoryname|fromtitle”.
- -uncat
Work on all pages which are not categorised.
- -uncatcat
Work on all categories which are not categorised.
- -uncatfiles
Work on all files which are not categorised.
- -file
Read a list of pages to treat from the named text file. Page titles in the file may be either enclosed with [[brackets]], or be separated by new lines. Argument can also be given as “-file:filename”.
- -filelinks
Work on all pages that use a certain image/media file. Argument can also be given as “-filelinks:filename”.
- -search
Work on all pages that are found in a MediaWiki search across all namespaces.
- -logevents
Work on articles that were on a specified Special:Log. The value may be a comma separated list of these values:
logevent,username,start,end
or for backward compatibility:
logevent,username,total
Note
‘start’ is the most recent date and log events are iterated from present to past. If ‘start’ is not provided, it means ‘now’; if ‘end’ is not provided, it means ‘since the beginning’.
To use the default value, use an empty string. Every log type given by the log event parameter is supported; it can be one of the following:
spamblacklist, titleblacklist, gblblock, renameuser, globalauth, gblrights, gblrename, abusefilter, massmessage, thanks, usermerge, block, protect, rights, delete, upload, move, import, patrol, merge, suppress, tag, managetags, contentmodel, review, stable, timedmediahandler, newusers
By default, 10 pages are fetched.
Examples:
-logevents:move gives pages from the move log (usually redirects)
-logevents:delete,,20 gives 20 pages from the deletion log
-logevents:protect,Usr gives pages from the protect log by user Usr
-logevents:patrol,Usr,20 gives 20 patrolled pages by Usr
-logevents:upload,,20121231,20100101 gives upload pages from 2010 through 2012
-logevents:review,,20121231 gives review pages from the beginning until 31 Dec 2012
-logevents:review,Usr,20121231 gives review pages by user Usr from the beginning until 31 Dec 2012
In some cases it must be given as -logevents:”move,Usr,20”
- -interwiki
Work on the given page and all equivalent pages in other languages. This can, for example, be used to fight multi-site spamming. Attention: this will cause the bot to modify pages on several wiki sites, this is not well tested, so check your edits!
- -links
Work on all pages that are linked from a certain page. Argument can also be given as “-links:linkingpagetitle”.
- -liverecentchanges
Work on pages from the live recent changes feed. If used as -liverecentchanges:x, work on x recent changes.
- -imagesused
Work on all images that are contained on a certain page. Can also be given as “-imagesused:linkingpagetitle”.
- -newimages
Work on the most recent new images. If given as -newimages:x, will work on x newest images.
- -newpages
Work on the most recent new pages. If given as -newpages:x, will work on x newest pages.
- -recentchanges
Work on the pages with the most recent changes. If given as -recentchanges:x, will work on the x most recently changed pages. If given as -recentchanges:offset,duration it will work on pages changed from ‘offset’ minutes with ‘duration’ minutes of timespan. rctags are supported too. The rctag must be the very first parameter part.
Examples:
-recentchanges:20 gives the 20 most recently changed pages
-recentchanges:120,70 will give pages with 120 offset minutes and 70 minutes of timespan
-recentchanges:visualeditor,10 gives the 10 most recently changed pages marked with ‘visualeditor’
-recentchanges:”mobile edit,60,35” will retrieve pages marked with ‘mobile edit’ for the given offset and timespan
- -unconnectedpages
Work on the most recent pages that are not connected to a Wikibase repository. If given as -unconnectedpages:x, will work on the x most recent unconnected pages.
- -ref
Work on all pages that link to a certain page. Argument can also be given as “-ref:referredpagetitle”.
- -start
Specifies that the robot should go alphabetically through all pages on the home wiki, starting at the named page. Argument can also be given as “-start:pagetitle”.
You can also include a namespace. For example, “-start:Template:!” will make the bot work on all pages in the template namespace.
The default value is “-start:!”.
- -prefixindex
Work on pages commencing with a common prefix.
- -transcludes
Work on all pages that use a certain template. Argument can also be given as “-transcludes:Title”.
- -unusedfiles
Work on all description pages of images/media files that are not used anywhere. Argument can be given as “-unusedfiles:n” where n is the maximum number of articles to work on.
- -lonelypages
Work on all articles that are not linked from any other article. Argument can be given as “-lonelypages:n” where n is the maximum number of articles to work on.
- -unwatched
Work on all articles that are not watched by anyone. Argument can be given as “-unwatched:n” where n is the maximum number of articles to work on.
- -property:name Work on all pages with a given property name from
Special:PagesWithProp.
- -usercontribs
Work on all articles that were edited by a certain user. (Example : -usercontribs:DumZiBoT)
- -weblink
Work on all articles that contain an external link to a given URL; may be given as “-weblink:url”
- -withoutinterwiki
Work on all pages that don’t have interlanguage links. Argument can be given as “-withoutinterwiki:n” where n is the total to fetch.
- -mysqlquery
Takes a MySQL query string like “SELECT page_namespace, page_title FROM page WHERE page_namespace = 0” and treats the resulting pages. See MySQL for more details.
- -sparql
Takes a SPARQL SELECT query string including ?item and works on the resulting pages.
- -sparqlendpoint
Specify SPARQL endpoint URL (optional). (Example: -sparqlendpoint:http://myserver.com/sparql)
- -searchitem
Takes a search string and works on Wikibase pages that contain it. Argument can be given as “-searchitem:text”, where text is the string to look for, or “-searchitem:lang:text”, where lang is the language to search items in.
- -wantedpages
Work on pages that are linked, but do not exist; may be given as “-wantedpages:n” where n is the maximum number of articles to work on.
- -wantedcategories
Work on categories that are used, but do not exist; may be given as “-wantedcategories:n” where n is the maximum number of categories to work on.
- -wantedfiles
Work on files that are used, but do not exist; may be given as “-wantedfiles:n” where n is the maximum number of files to work on.
- -wantedtemplates
Work on templates that are used, but do not exist; may be given as “-wantedtemplates:n” where n is the maximum number of templates to work on.
- -random
Work on random pages returned by [[Special:Random]]. Can also be given as “-random:n” where n is the number of pages to be returned.
- -randomredirect
Work on random redirect pages returned by [[Special:RandomRedirect]]. Can also be given as “-randomredirect:n” where n is the number of pages to be returned.
- -google
Work on all pages that are found in a Google search. You need a Google Web API license key. Note that Google doesn’t give out license keys anymore. See google_key in config.py for instructions. Argument can also be given as “-google:searchstring”.
- -page
Work on a single page. Argument can also be given as “-page:pagetitle”, and supplied multiple times for multiple pages.
- -pageid
Work on a single pageid. Argument can also be given as “-pageid:pageid1,pageid2,..” or “-pageid:’pageid1|pageid2|..’” and supplied multiple times for multiple pages.
- -linter
Work on pages that contain lint errors. The Linter extension must be available on the site. -linter selects all categories; -linter:high, -linter:medium or -linter:low select all categories of that priority. Single categories can be selected with commas, as in -linter:cat1,cat2,cat3.
Adding ‘/int’ identifies the lint ID to start querying from, e.g. -linter:high/10000.
-linter:show just shows the available categories.
- -querypage:name Work on pages provided by a QueryPage-based special page,
see API:Querypage. (tip: use -limit:n to fetch only n pages).
-querypage shows special pages available.
- -url
Read a list of pages to treat from the provided URL. The URL must return text in the same format as expected for the -file argument, e.g. page titles separated by newlines or enclosed in brackets.
FILTER OPTIONS#
- -catfilter
Filter the page generator to only yield pages in the specified category. See -cat generator for argument format.
- -grep
A regular expression that needs to match the article otherwise the page won’t be returned. Multiple -grep:regexpr can be provided and the page will be returned if content is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character, including a newline.
- -grepnot
Like -grep, but return the page only if the regular expression does not match.
- -intersect
Work on the intersection of all the provided generators.
- -limit
When used with any other argument, -limit:n specifies that no more than n pages should be worked on in total. If used with multiple generators, pages are yielded in a round-robin way.
- -namespaces
- -namespace
- -ns
Filter the page generator to only yield pages in the specified namespaces. Separate multiple namespace numbers or names with commas.
Examples:
-ns:0,2,4 -ns:Help,MediaWiki
You may use a preceding “not” to exclude the namespace.
Examples:
-ns:not:2,3 -ns:not:Help,File
If used with -newpages/-random/-randomredirect/-linter generators, -namespace/ns must be provided before -newpages/-random/-randomredirect/-linter. If used with -recentchanges generator, efficiency is improved if -namespace is provided before -recentchanges.
If used with the -start generator, -namespace/-ns must contain only one value.
- -onlyif
A claim the page needs to contain, otherwise the item won’t be returned. The format is property=value,qualifier=value. Multiple (or none) qualifiers can be passed, separated by commas.
Examples:
P1=Q2 (property P1 must contain value Q2), P3=Q4,P5=Q6,P6=Q7 (property P3 with value Q4 and qualifiers: P5 with value Q6 and P6 with value Q7).
Value can be page ID, coordinate in format: latitude,longitude[,precision] (all values are in decimal degrees), year, or plain string.
The argument can be provided multiple times and the item page will be returned only if all claims are present. Argument can be also given as “-onlyif:expression”.
- -onlyifnot
A claim the page must not contain, otherwise the item won’t be returned. For usage and examples, see -onlyif above.
- -ql
Filter pages based on page quality. This is applicable only if the content model equals ‘proofread-page’; otherwise it has no effect. Valid values are in the range 0-4. Multiple values can be comma-separated.
- -subpage
-subpage:n filters pages to only those that have depth n i.e. a depth of 0 filters out all pages that are subpages, and a depth of 1 filters out all pages that are subpages of subpages.
- -titleregex
A regular expression that needs to match the article title otherwise the page won’t be returned. Multiple -titleregex:regexpr can be provided and the page will be returned if title is matched by any of the regexpr provided. Case insensitive regular expressions will be used and dot matches any character.
- -titleregexnot
Like -titleregex, but return the page only if the regular expression does not match.
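The round-robin interleaving that -limit applies when multiple generators are combined can be sketched in plain Python. This is an illustrative sketch of the behaviour described above, not Pywikibot’s actual implementation; the generator and function names are hypothetical:

```python
from itertools import islice

def roundrobin_limit(generators, limit):
    """Yield items from several generators in round-robin order,
    stopping after `limit` items in total."""
    iterators = [iter(g) for g in generators]

    def interleave():
        while iterators:
            # Iterate over a copy so exhausted iterators can be dropped.
            for it in list(iterators):
                try:
                    yield next(it)
                except StopIteration:
                    iterators.remove(it)

    return islice(interleave(), limit)

# Two hypothetical title streams, e.g. from a -cat and a -links generator.
print(list(roundrobin_limit(['ABC', 'XYZ'], 4)))  # → ['A', 'X', 'B', 'Y']
```

Shorter generators simply drop out of the rotation while the remaining ones keep yielding.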
- pagegenerators.AllpagesPageGenerator(start='!', namespace=0, includeredirects=True, site=None, total=None, content=False)[source]#
Iterate Page objects for all titles in a single namespace.
If includeredirects is False, redirects are not included. If includeredirects equals the string ‘only’, only redirects are added.
- pagegenerators.CategorizedPageGenerator(category, recurse=False, start=None, total=None, content=False, namespaces=None)[source]#
Yield all pages in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (Category) –
namespaces (Sequence[int] | None) –
- Return type:
Iterable[Page]
- pagegenerators.CategoryFilterPageGenerator(generator, category_list)[source]#
Wrap a generator to filter pages by categories specified.
- pagegenerators.DayPageGenerator(start_month=1, end_month=12, site=None, year=2000)[source]#
Day page generator.
- pagegenerators.DequePreloadingGenerator(generator, groupsize=50, quiet=False)[source]#
Preload generator of type DequeGenerator.
- pagegenerators.EdittimeFilterPageGenerator(generator, last_edit_start=None, last_edit_end=None, first_edit_start=None, first_edit_end=None, show_filtered=False)[source]#
Wrap a generator to filter pages outside last or first edit range.
- Parameters:
generator (Iterable[Page]) – A generator object
last_edit_start (datetime | None) – Only yield pages last edited after this time
last_edit_end (datetime | None) – Only yield pages last edited before this time
first_edit_start (datetime | None) – Only yield pages first edited after this time
first_edit_end (datetime | None) – Only yield pages first edited before this time
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Iterator[Page]
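The edit-time filtering above can be sketched with plain datetimes. The `FakePage` class and its `latest_edit` attribute are hypothetical stand-ins for real Page objects; only the range logic mirrors the documented parameters:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FakePage:
    """Stand-in for a pywikibot.Page with a last-edit timestamp."""
    title: str
    latest_edit: datetime

def filter_by_last_edit(pages, start=None, end=None):
    """Yield pages whose last edit falls between start and end."""
    for page in pages:
        if start and page.latest_edit < start:
            continue  # edited too long ago
        if end and page.latest_edit > end:
            continue  # edited too recently
        yield page

pages = [FakePage('Old', datetime(2019, 1, 1)),
         FakePage('New', datetime(2023, 6, 1))]
print([p.title for p in filter_by_last_edit(pages, start=datetime(2022, 1, 1))])
```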
- pagegenerators.FileLinksGenerator(referredFilePage, total=None, content=False)[source]#
Yield Pages on which referredFilePage file is displayed.
- class pagegenerators.GeneratorFactory(site=None, positional_arg_name=None, enabled_options=None, disabled_options=None)[source]#
Bases:
object
Process command line arguments and return appropriate page generator.
This factory is responsible for processing command line arguments that are used by many scripts and that determine which pages to work on.
Note
GeneratorFactory must be instantiated after global arguments are parsed except if site parameter is given.
- Parameters:
site (pywikibot.site.BaseSite | None) – Site for generator results
positional_arg_name (str | None) – generator to use for positional args, which do not begin with a hyphen
enabled_options (Iterable[str] | None) – only enable options given by this Iterable. This is prioritized over disabled_options
disabled_options (Iterable[str] | None) – disable these given options and let them be handled by the script’s options handler
- getCategory(category)[source]#
Return Category and start as defined by category.
- Parameters:
category (str) – category name with start parameter
- Return type:
Tuple[Category, str | None]
- getCategoryGen(category, recurse=False, content=False, gen_func=None)[source]#
Return generator based on Category defined by category and gen_func.
- Parameters:
category (str) – category name with start parameter
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
content (bool) – if True, retrieve the content of the current version of each page (default False)
gen_func (Callable | None) –
- Return type:
Any
- getCombinedGenerator(gen=None, preload=False)[source]#
Return the combination of all accumulated generators.
Only call this after all arguments have been parsed.
Changed in version 7.3: set the instance variable is_preloading to True or False.
Changed in version 8.0: if the limit option is set and multiple generators are given, pages are yielded in a round-robin way.
- handle_arg(arg)[source]#
Parse one argument at a time.
If it is recognized as an argument that specifies a generator, a generator is created and added to the accumulation list, and the function returns true. Otherwise, it returns false, so that the caller can try parsing the argument itself. Call getCombinedGenerator() after all arguments have been parsed to get the final output generator.
New in version 6.0: renamed from
handleArg
- Parameters:
arg (str) – Pywikibot argument consisting of -name:value
- Returns:
True if the argument supplied was recognised by the factory
- Return type:
bool
- handle_args(args)[source]#
Handle command line arguments and return the rest as a list.
New in version 6.0.
Changed in version 7.3: Prioritize -namespaces options to solve problems with several generators like -newpages/-random/-randomredirect/-linter
- Parameters:
args (Iterable[str]) –
- Return type:
List[str]
- is_preloading: bool | None#
Return whether Page objects are preloaded. You may use this instance variable after getCombinedGenerator() is called, e.g.:
gen_factory = GeneratorFactory()
print(gen_factory.is_preloading)  # None
gen = gen_factory.getCombinedGenerator()
print(gen_factory.is_preloading)  # True or False
Otherwise the value is undefined and gives None.
New in version 7.3.
- property namespaces: FrozenSet[Namespace]#
List of Namespace parameters.
Converts int or string namespaces to Namespace objects and change the storage to immutable once it has been accessed.
The resolving and validation of namespace command line arguments is performed in this method, as it depends on the site property which is lazy loaded to avoid being cached before the global arguments are handled.
- Returns:
namespaces selected using arguments
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool
- property site: BaseSite#
Generator site.
The generator site should not be accessed until after the global arguments have been handled, otherwise the default Site may be changed by global arguments, which will cause this cached value to be stale.
- Returns:
Site given to initializer, otherwise the default Site at the time this property is first accessed.
- class pagegenerators.GoogleSearchPageGenerator(query=None, site=None)[source]#
Bases:
GeneratorWrapper
Page generator using Google search results.
To use this generator, you need to install the package ‘google’:
https://pypi.org/project/google
This package has been available since 2010, hosted on GitHub since 2012, and provided by PyPI since 2013.
As there are concerns about Google’s Terms of Service, this generator prints a warning for each query.
Changed in version 7.6: subclassed from
tools.collections.GeneratorWrapper
- Parameters:
site (pywikibot.site.BaseSite | None) – Site for generator results.
query (str | None) –
- property generator: Iterator[Page]#
Yield results from queryGoogle() query.
Google contains links in the format: https://de.wikipedia.org/wiki/en:Foobar
Changed in version 7.6: changed from iterator method to generator property
- static queryGoogle(query)[source]#
Perform a query using python package ‘google’.
The terms of service as at June 2014 give two conditions that may apply to use of search:
Don’t access [Google Services] using a method other than the interface and the instructions that [they] provide.
Don’t remove, obscure, or alter any legal notices displayed in or along with [Google] Services.
Both of those issues should be managed by the package ‘google’, however Pywikibot will at least ensure the user sees the TOS in order to comply with the second condition.
- Parameters:
query (str) –
- Return type:
Iterator[Any]
- pagegenerators.ImagesPageGenerator(pageWithImages, total=None, content=False)[source]#
Yield FilePages displayed on pageWithImages.
- pagegenerators.InterwikiPageGenerator(page)[source]#
Iterate over all interwiki (non-language) links on a page.
- pagegenerators.ItemClaimFilterPageGenerator(generator, prop, claim, qualifiers=None, negate=False)#
Yield all ItemPages which contain a certain claim in a property.
- Parameters:
prop (str) – property id to check
claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).
qualifiers (Dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant
negate (bool) – true if pages that do not contain specified claim should be yielded, false otherwise
generator (Iterable[Page]) –
- Return type:
Iterator[Page]
- pagegenerators.LanguageLinksPageGenerator(page, total=None)[source]#
Iterate over all interwiki language links on a page.
- pagegenerators.LinkedPageGenerator(linkingPage, total=None, content=False)[source]#
Yield all pages linked from a specific page.
See page.BasePage.linkedPages for details.
- Parameters:
linkingPage (Page) – the page that links to the pages we want
total (int | None) – the total number of pages to iterate
content (bool) – if True, retrieve the current content of each linked page
- Returns:
a generator that yields Page objects of pages linked to linkingPage
- Return type:
Iterable[Page]
- pagegenerators.LinksearchPageGenerator(url, namespaces=None, total=None, site=None, protocol=None)[source]#
Yield all pages that link to a certain URL.
- Parameters:
url (str) – The URL to search for (with or without the protocol prefix); this may include a ‘*’ as a wildcard, only at the start of the hostname
namespaces (List[int] | None) – list of namespace numbers to fetch contribs from
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results
protocol (str | None) – Protocol to search for, likely http or https, http by default. Full list shown on Special:LinkSearch wikipage
- Return type:
Iterable[Page]
- pagegenerators.LiveRCPageGenerator(site=None, total=None)[source]#
Yield pages from a socket.io RC stream.
Generates pages based on the EventStreams Server-Sent-Event (SSE) recent changes stream. The Page objects will have an extra property ._rcinfo containing the literal rc data. This can be used to e.g. filter only new pages. See
pywikibot.comms.eventstreams.rc_listener
for details on the .rcinfo format.
- pagegenerators.LogeventsPageGenerator(logtype=None, user=None, site=None, namespace=None, total=None, start=None, end=None, reverse=False)[source]#
Generate Pages for specified modes of logevents.
- Parameters:
logtype (str | None) – Mode of logs to retrieve
user (str | None) – User of logs retrieved
site (BaseSite | None) – Site for generator results
namespace (int | None) – Namespace to retrieve logs from
total (int | None) – Maximum number of pages to retrieve in total
start (Timestamp | None) – Timestamp to start listing from
end (Timestamp | None) – Timestamp to end listing at
reverse (bool) – if True, start with oldest changes (default: newest)
- Return type:
Iterator[Page]
- pagegenerators.MySQLPageGenerator(query, site=None, verbose=None)[source]#
Yield a list of pages based on a MySQL query.
The query should return two columns, page namespace and page title pairs from some table. An example query that yields all ns0 pages might look like:
SELECT page_namespace, page_title FROM page WHERE page_namespace = 0;
See also
- pagegenerators.NamespaceFilterPageGenerator(generator, namespaces, site=None)[source]#
A generator yielding pages from another generator in given namespaces.
If a site is provided, the namespaces are validated using the namespaces of that site, otherwise the namespaces are validated using the default site.
Note
API-based generators that have a “namespaces” parameter perform namespace filtering more efficiently than this generator.
- Parameters:
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool, or more than one namespace if the API module does not support multiple namespaces
- Return type:
Iterator[Page]
- pagegenerators.NewpagesPageGenerator(site=None, namespaces=(0,), total=None)[source]#
Iterate Page objects for all new titles in a single namespace.
- pagegenerators.PageClassGenerator(generator)[source]#
Yield pages from another generator as Page subclass objects.
The page class type depends on the page namespace. Objects may be Category, FilePage, Userpage or Page.
- pagegenerators.PageTitleFilterPageGenerator(generator, ignore_list)[source]#
Yield only those pages that are not listed in the ignore list.
- pagegenerators.PageWithTalkPageGenerator(generator, return_talk_only=False)[source]#
Yield pages and associated talk pages from another generator.
Only yields talk pages if the original generator yields a non-talk page, and does not check if the talk page in fact exists.
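The pairing behaviour can be sketched with plain title strings. This is an illustrative sketch only; the real generator works on Page objects and derives the talk page from the namespace, not from a string prefix:

```python
def with_talk_pages(titles, return_talk_only=False):
    """Yield each title followed by its talk-page title; titles that
    are already talk pages are yielded once."""
    for title in titles:
        is_talk = title.startswith('Talk:')
        if not return_talk_only and not is_talk:
            yield title
        yield title if is_talk else 'Talk:' + title

print(list(with_talk_pages(['Foo', 'Bar'])))
```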
- pagegenerators.PagesFromPageidGenerator(pageids, site=None)[source]#
Return a page generator from pageids.
Pages are iterated in the same order as the underlying pageids. Pageids are filtered, and only one page is returned in case of a duplicate pageid.
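The order-preserving deduplication of pageids can be sketched as follows (an illustrative sketch of the described behaviour, not the actual implementation):

```python
def unique_pageids(pageids):
    """Yield pageids in their original order, dropping duplicates,
    so each duplicate pageid produces only one page."""
    seen = set()
    for pid in pageids:
        if pid not in seen:
            seen.add(pid)
            yield pid

print(list(unique_pageids([42, 7, 42, 13, 7])))  # → [42, 7, 13]
```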
- pagegenerators.PagesFromTitlesGenerator(iterable, site=None)[source]#
Generate pages from the titles (strings) yielded by iterable.
- class pagegenerators.PetScanPageGenerator(categories, subset_combination=True, namespaces=None, site=None, extra_options=None)[source]#
Bases:
GeneratorWrapper
Queries PetScan to generate pages.
See also
New in version 3.0.
Changed in version 7.6: subclassed from
tools.collections.GeneratorWrapper
- Parameters:
categories (Sequence[str]) – List of category names to retrieve pages from
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Sequence[str | pywikibot.site.Namespace] | None) – List of namespaces to search in (default is None, meaning all namespaces)
site (pywikibot.site.BaseSite | None) – Site to operate on (default is the default site from the user config)
extra_options (Dict[Any, Any] | None) – Dictionary of extra options to use (optional)
- buildQuery(categories, subset_combination, namespaces, extra_options)[source]#
Get the querystring options to query PetScan.
- Parameters:
categories (Sequence[str]) – List of categories (as strings)
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Sequence[str | Namespace] | None) – List of namespaces to search in
extra_options (Dict[Any, Any] | None) – Dictionary of extra options to use
- Returns:
Dictionary of querystring parameters to use in the query
- Return type:
Dict[str, Any]
- property generator: Iterator[Page]#
Yield results from query().
Changed in version 7.6: changed from iterator method to generator property
- query()[source]#
Query PetScan.
Changed in version 7.4: raises APIError if the query returns an error message.
- Raises:
ServerError – Either ReadTimeout or server status error
APIError – error response from petscan
- Return type:
Iterator[Dict[str, Any]]
- pagegenerators.PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site=None, total=None, content=False)[source]#
Prefixed Page generator.
- Parameters:
prefix (str) – The prefix of the pages.
namespace (int | Namespace | None) – Namespace to retrieve pages from
includeredirects (None | bool | str) – If includeredirects is None, False or an empty string, redirects will not be found. If includeredirects equals the string ‘only’, only redirects will be found. Otherwise redirects will be included.
site (BaseSite | None) – Site for generator results.
total (int | None) – Maximum number of pages to retrieve in total
content (bool) – If True, load current version of each page (default False)
- Returns:
a generator that yields Page objects
- Return type:
Iterable[Page]
- pagegenerators.PreloadingEntityGenerator(generator, groupsize=50)[source]#
Yield preloaded pages taken from another generator.
This function is basically a copy of PreloadingGenerator, but for Wikibase entities.
- pagegenerators.PreloadingGenerator(generator, groupsize=50, quiet=False)[source]#
Yield preloaded pages taken from another generator.
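The groupsize batching behind preloading can be sketched in plain Python. This is only a sketch of the batching idea; the real generator issues one API request per batch and still yields single pages:

```python
def grouped(generator, groupsize=50):
    """Collect items from a generator into batches of at most
    `groupsize`, the way a preloading generator fetches content
    for many pages in one bulk request."""
    batch = []
    for item in generator:
        batch.append(item)
        if len(batch) == groupsize:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch

print(list(grouped(range(5), groupsize=2)))  # → [[0, 1], [2, 3], [4]]
```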
- pagegenerators.QualityFilterPageGenerator(generator, quality)[source]#
Wrap a generator to filter pages according to quality levels.
This is possible only for pages with content_model ‘proofread-page’. In all the other cases, no filter is applied.
- pagegenerators.RandomPageGenerator(total=None, site=None, namespaces=None)[source]#
Random page generator.
- pagegenerators.RandomRedirectPageGenerator(total=None, site=None, namespaces=None)[source]#
Random redirect generator.
- pagegenerators.RecentChangesPageGenerator(site=None, _filter_unique=None, **kwargs)[source]#
Generate pages that are in the recent changes list, including duplicates.
For keyword parameters refer to APISite.recentchanges().
Changed in version 8.2: the yield type depends on the namespace. It can be pywikibot.Page, pywikibot.User, pywikibot.FilePage or pywikibot.Category.
- pagegenerators.RedirectFilterPageGenerator(generator, no_redirects=True, show_filtered=False)[source]#
Yield pages from another generator that are redirects or not.
- pagegenerators.RegexBodyFilterPageGenerator(generator, regex, quantifier='any')#
Yield pages from another generator whose body matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
For parameters see titlefilter above.
- pagegenerators.RegexFilterPageGenerator(generator, regex, quantifier='any', ignore_namespace=True)#
Yield pages from another generator whose title matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
If ignore_namespace is False, the whole page title is compared.
Note
if you want to check for a match at the beginning of the title, you have to start the regex with “^”
- Parameters:
generator (Iterable[Page]) – another generator
regex (str | Pattern[str] | Sequence[str] | Sequence[Pattern[str]]) – a regex which should match the page title
quantifier (str) – must be one of the following values: ‘all’ - yields page if title is matched by all regexes ‘any’ - yields page if title is matched by any regexes ‘none’ - yields page if title is NOT matched by any regexes
ignore_namespace (bool) – ignore the namespace when matching the title
- Returns:
return a page depending on the matching parameters
- Return type:
Iterator[Page]
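The ‘all’/‘any’/‘none’ quantifier semantics described above can be sketched for a single title (an illustrative sketch of the documented behaviour, not the generator itself):

```python
import re

def title_matches(title, patterns, quantifier='any'):
    """Apply the quantifier semantics: case-insensitive search,
    as the generator documentation describes."""
    hits = [bool(re.search(p, title, re.IGNORECASE)) for p in patterns]
    if quantifier == 'all':
        return all(hits)
    if quantifier == 'none':
        return not any(hits)
    return any(hits)  # 'any'

print(title_matches('Main Page', ['^main', 'page$'], 'all'))  # → True
```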
- pagegenerators.RepeatingGenerator(generator, key_func=<function <lambda>>, sleep_duration=60, total=None, **kwargs)[source]#
Yield items in live time.
The provided generator must support parameter ‘start’, ‘end’, ‘reverse’, and ‘total’ such as site.recentchanges(), site.logevents().
To fetch revisions in recentchanges in live time:
gen = RepeatingGenerator(site.recentchanges, lambda x: x['revid'])
To fetch new pages in live time:
gen = RepeatingGenerator(site.newpages, lambda x: x[0])
Note that other parameters not listed below will be passed to the generator function. Parameter ‘reverse’, ‘start’, ‘end’ will always be discarded to prevent the generator yielding items in wrong order.
- Parameters:
generator (Callable) – a function returning a generator that will be queried
key_func (Callable[[Any], Any]) – a function returning a key that is used to detect duplicate entries
sleep_duration (int) – duration in seconds between queries
total (int | None) – if it is a positive number, iterate no more than this number of items in total. Otherwise, iterate forever
kwargs (Any) –
- Returns:
a generator yielding items in ascending order by time
- Return type:
Iterator[Page]
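The duplicate detection that RepeatingGenerator performs between polls can be sketched as follows (a standalone illustration with hypothetical dict entries, not the actual implementation):

```python
# Entries whose key_func value was already seen in an earlier poll are
# skipped, so only new items are yielded on each iteration.
def dedup(entries, key_func, seen):
    for entry in entries:
        key = key_func(entry)
        if key not in seen:
            seen.add(key)
            yield entry

seen = set()
first_poll = [{"revid": 1}, {"revid": 2}]
second_poll = [{"revid": 2}, {"revid": 3}]  # revid 2 repeats
out = list(dedup(first_poll, lambda x: x["revid"], seen))
out += list(dedup(second_poll, lambda x: x["revid"], seen))
# out yields revids 1, 2, 3 -- the repeated revid 2 is dropped
```

This is why key_func must return a stable identifier such as x['revid'] for recentchanges or x[0] for newpages.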
- pagegenerators.SearchPageGenerator(query, total=None, namespaces=None, site=None)[source]#
Yield pages from the MediaWiki internal search engine.
- pagegenerators.SubCategoriesPageGenerator(category, recurse=False, start=None, total=None, content=False)[source]#
Yield all subcategories in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (Category) –
- Return type:
Iterable[Page]
- pagegenerators.SubpageFilterGenerator(generator, max_depth=0, show_filtered=False)[source]#
Generator which filters out subpages based on depth.
It looks at the namespace of each page and checks if that namespace has subpages enabled. If so, pages with forward slashes (‘/’) are excluded.
- pagegenerators.TextIOPageGenerator(source=None, site=None)[source]#
Iterate pages from a list in a text file or on a webpage.
The text source must contain page links between double-square-brackets or, alternatively, separated by newlines. The generator will yield each corresponding Page object.
- pagegenerators.UnCategorizedCategoryGenerator(total=100, site=None)[source]#
Uncategorized category generator.
- pagegenerators.UnCategorizedImageGenerator(total=100, site=None)[source]#
Uncategorized file generator.
- pagegenerators.UnCategorizedPageGenerator(total=100, site=None)[source]#
Uncategorized page generator.
- pagegenerators.UnCategorizedTemplateGenerator(total=100, site=None)[source]#
Uncategorized template generator.
- pagegenerators.UnconnectedPageGenerator(site=None, total=None)[source]#
Iterate Page objects for all unconnected pages to a Wikibase repository.
- pagegenerators.UnwatchedPagesPageGenerator(total=None, site=None)[source]#
Unwatched page generator.
- pagegenerators.UserContributionsGenerator(username, namespaces=None, site=None, total=None, _filter_unique=functools.partial(<function filter_unique>, key=<function <lambda>>))[source]#
Yield unique pages edited by user:username.
- Parameters:
- Return type:
Iterator[Page]
- pagegenerators.UserEditFilterGenerator(generator, username, timestamp=None, skip=False, max_revision_depth=None, show_filtered=False)[source]#
Generator which will yield Pages modified by username.
It only looks at the last editors given by max_revision_depth. If timestamp is given in MediaWiki format YYYYMMDDhhmmss, older edits are ignored. If skip is set, pages edited by the given user are skipped; otherwise only pages edited by this user are yielded.
- Parameters:
generator (Iterable[Page]) – A generator object
username (str) – user name which edited the page
timestamp (None | str | datetime) – ignore edits which are older than this timestamp
skip (bool) – Ignore pages edited by the given user
max_revision_depth (int | None) – It only looks at the last editors given by max_revision_depth
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Iterator[Page]
- pagegenerators.WikibaseItemFilterPageGenerator(generator, has_item=True, show_filtered=False)[source]#
A wrapper generator used to filter pages by whether or not they have a Wikibase item.
- pagegenerators.WikibaseItemGenerator(gen)[source]#
A wrapper generator used to yield Wikibase items of another generator.
- pagegenerators.WikibaseSearchItemPageGenerator(text, language=None, total=None, site=None)[source]#
Generate pages that contain the provided text.
- Parameters:
text (str) – Text to look for.
language (str | None) – Code of the language to search in. If not specified, value from pywikibot.config.data_lang is used.
total (int | None) – Maximum number of pages to retrieve in total, or None in case of no limit.
site (BaseSite | None) – Site for generator results.
- Return type:
Iterator[ItemPage]
- pagegenerators.WikidataPageFromItemGenerator(gen, site)[source]#
Generate pages from site based on sitelinks of item pages.
- Parameters:
gen (Iterable[ItemPage]) – generator of
pywikibot.ItemPage
site (BaseSite) – Site for generator results.
- Return type:
Iterator[Page]
- pagegenerators.WikidataSPARQLPageGenerator(query, site=None, item_name='item', endpoint=None, entity_url=None, result_type=<class 'set'>)[source]#
Generate pages that result from the given SPARQL query.
- Parameters:
query (str) – the SPARQL query string.
site (BaseSite | None) – Site for generator results.
item_name (str) – name of the item in the SPARQL query
endpoint (str | None) – SPARQL endpoint URL
entity_url (str | None) – URL prefix for any entities returned in a query.
result_type (Any) – type of the iterable in which SPARQL results are stored (default set)
- Return type:
Iterator[Page]
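A hypothetical query string for use with WikidataSPARQLPageGenerator; the key constraint is that the variable selected in the query must match the item_name parameter (‘item’ by default). P31 (“instance of”) and Q146 (“house cat”) are well-known Wikidata identifiers used here for illustration:

```python
# The '?item' variable must agree with item_name (default: 'item').
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .   # instances of 'house cat'
}
LIMIT 10
"""
```

Passing a different item_name would require renaming the SELECT variable accordingly.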
- pagegenerators.WithoutInterwikiPageGenerator(total=None, site=None)[source]#
Page lacking interwikis generator.
- class pagegenerators.XMLDumpPageGenerator(filename, start=None, namespaces=None, site=None, text_predicate=None, content=False)[source]#
Bases:
Iterator
Xml iterator that yields Page objects.
New in version 7.2: the content parameter
- Parameters:
filename (str) – filename of XML dump
start (str | None) – skip entries below that value
namespaces (None | str | pywikibot.site.Namespace | Sequence[str | pywikibot.site.Namespace]) – namespace filter
site (pywikibot.site.BaseSite | None) – current site for the generator
text_predicate (Callable[[str], bool] | None) – a callable with entry.text as parameter and boolean as result to indicate the generator should return the page or not
content – If True, assign old page content to Page.text
- Variables:
skipping – True if start parameter is given, else False
parser – holds the xmlreader.XmlDump parse method
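The role of the text_predicate parameter can be sketched with the standard library’s ElementTree over a tiny inline XML string standing in for a real MediaWiki dump (a simplified illustration, not the actual xmlreader-based implementation):

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a MediaWiki XML dump (hypothetical data).
DUMP = """<mediawiki>
  <page><title>A</title><text>contains needle</text></page>
  <page><title>B</title><text>nothing here</text></page>
</mediawiki>"""

def pages_from_dump(xml_text, text_predicate=None):
    # Yield titles of pages whose text passes the predicate.
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        text = page.findtext("text") or ""
        if text_predicate is None or text_predicate(text):
            yield page.findtext("title")

hits = list(pages_from_dump(DUMP, lambda t: "needle" in t))
# hits -> ["A"]
```

The real generator streams the dump entry by entry, so the predicate is evaluated without loading the whole file into memory.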
- pagegenerators.page_with_property_generator(name, total=None, site=None)[source]#
Special:PagesWithProperty page generator.
pagegenerators._factory
— Pagegenerators Options Handler#
GeneratorFactory module which handles pagegenerators options.
- class pagegenerators._factory.GeneratorFactory(site=None, positional_arg_name=None, enabled_options=None, disabled_options=None)[source]#
Bases:
object
Process command line arguments and return appropriate page generator.
This factory is responsible for processing command line arguments that are used by many scripts and that determine which pages to work on.
Note
GeneratorFactory must be instantiated after global arguments are parsed except if site parameter is given.
- Parameters:
site (pywikibot.site.BaseSite | None) – Site for generator results
positional_arg_name (str | None) – generator to use for positional args, which do not begin with a hyphen
enabled_options (Iterable[str] | None) – only enable options given by this Iterable. This is prioritized over disabled_options
disabled_options (Iterable[str] | None) – disable these given options and let them be handled by scripts options handler
- getCategory(category)[source]#
Return Category and start as defined by category.
- Parameters:
category (str) – category name with start parameter
- Return type:
Tuple[Category, str | None]
- getCategoryGen(category, recurse=False, content=False, gen_func=None)[source]#
Return generator based on Category defined by category and gen_func.
- Parameters:
category (str) – category name with start parameter
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
content (bool) – if True, retrieve the content of the current version of each page (default False)
gen_func (Callable | None) –
- Return type:
Any
- getCombinedGenerator(gen=None, preload=False)[source]#
Return the combination of all accumulated generators.
Only call this after all arguments have been parsed.
Changed in version 7.3: set the instance variable is_preloading to True or False.
Changed in version 8.0: if the limit option is set and multiple generators are given, pages are yielded in a round-robin way.
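The round-robin ordering introduced in 8.0 can be sketched with itertools (an illustration of the interleaving only, not the actual implementation):

```python
from itertools import zip_longest

_SKIP = object()  # sentinel for exhausted sources

def roundrobin(*iterables):
    # One item from each source in turn, instead of exhausting the
    # first source before starting the next.
    for group in zip_longest(*iterables, fillvalue=_SKIP):
        yield from (item for item in group if item is not _SKIP)

combined = list(roundrobin(["a1", "a2"], ["b1"], ["c1", "c2", "c3"]))
# combined -> ["a1", "b1", "c1", "a2", "c2", "c3"]
```

With a limit in effect, this interleaving ensures every accumulated generator contributes pages before the limit is reached.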
- handle_arg(arg)[source]#
Parse one argument at a time.
If it is recognized as an argument that specifies a generator, a generator is created and added to the accumulation list, and the function returns true. Otherwise, it returns false, so that the caller can try parsing the argument. Call getCombinedGenerator() after all arguments have been parsed to get the final output generator.
New in version 6.0: renamed from
handleArg
- Parameters:
arg (str) – Pywikibot argument consisting of -name:value
- Returns:
True if the argument supplied was recognised by the factory
- Return type:
bool
- handle_args(args)[source]#
Handle command line arguments and return the rest as a list.
New in version 6.0.
Changed in version 7.3: Prioritize -namespaces options to solve problems with several generators like -newpages/-random/-randomredirect/-linter
- Parameters:
args (Iterable[str]) –
- Return type:
List[str]
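The handle_arg/handle_args protocol (recognized options are consumed, the rest are returned to the caller) can be sketched with a toy stand-in that only understands a -cat option (a hypothetical simplification, not the real GeneratorFactory):

```python
class ToyFactory:
    """Toy stand-in illustrating the handle_arg protocol."""

    def __init__(self):
        self.gens = []  # accumulated generator specs

    def handle_arg(self, arg):
        # Split "-name:value" into its parts.
        name, _, value = arg.lstrip("-").partition(":")
        if name != "cat":
            return False  # unrecognized: caller handles it
        self.gens.append(("cat", value))
        return True

    def handle_args(self, args):
        # Return the arguments the factory did not consume.
        return [a for a in args if not self.handle_arg(a)]

factory = ToyFactory()
rest = factory.handle_args(["-cat:Physics", "-summary:test"])
# rest -> ["-summary:test"]; factory.gens -> [("cat", "Physics")]
```

A real script would pass the unconsumed arguments on to its own option handling before calling getCombinedGenerator().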
- is_preloading: bool | None#
Return whether Page objects are preloaded. You may use this instance variable after getCombinedGenerator() is called, e.g.:
gen_factory = GeneratorFactory()
print(gen_factory.is_preloading)  # None
gen = gen_factory.getCombinedGenerator()
print(gen_factory.is_preloading)  # True or False
Otherwise the value is undefined and gives None.
New in version 7.3.
- property namespaces: FrozenSet[Namespace]#
List of Namespace parameters.
Converts int or string namespaces to Namespace objects and changes the storage to immutable once it has been accessed.
The resolving and validation of namespace command line arguments is performed in this method, as it depends on the site property which is lazy loaded to avoid being cached before the global arguments are handled.
- Returns:
namespaces selected using arguments
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool
- property site: BaseSite#
Generator site.
The generator site should not be accessed until after the global arguments have been handled, otherwise the default Site may be changed by global arguments, which will cause this cached value to be stale.
- Returns:
Site given to initializer, otherwise the default Site at the time this property is first accessed.
pagegenerators._filters
— Filter Functions#
Page filter generators provided by the pagegenerators module.
- pagegenerators._filters.CategoryFilterPageGenerator(generator, category_list)[source]#
Wrap a generator to filter pages by categories specified.
- pagegenerators._filters.EdittimeFilterPageGenerator(generator, last_edit_start=None, last_edit_end=None, first_edit_start=None, first_edit_end=None, show_filtered=False)[source]#
Wrap a generator to filter pages outside last or first edit range.
- Parameters:
generator (Iterable[Page]) – A generator object
last_edit_start (datetime | None) – Only yield pages last edited after this time
last_edit_end (datetime | None) – Only yield pages last edited before this time
first_edit_start (datetime | None) – Only yield pages first edited after this time
first_edit_end (datetime | None) – Only yield pages first edited before this time
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Iterator[Page]
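The last-edit window check applied per page can be sketched as follows, using mock timestamps instead of real Page objects (a simplified illustration of the range logic only):

```python
from datetime import datetime

def in_window(last_edit, start=None, end=None):
    # A page passes when its last edit lies inside [start, end];
    # either bound may be None (unbounded).
    if start is not None and last_edit < start:
        return False
    if end is not None and last_edit > end:
        return False
    return True

# Hypothetical last-edit timestamps keyed by page title.
edits = {"A": datetime(2023, 1, 5), "B": datetime(2021, 6, 1)}
recent = [t for t, ts in edits.items()
          if in_window(ts, start=datetime(2022, 1, 1))]
# recent -> ["A"]
```

The first_edit_start/first_edit_end pair applies the same window test to the page’s first edit instead of its last.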
- class pagegenerators._filters.ItemClaimFilter[source]#
Bases:
object
Item claim filter.
- classmethod filter(generator, prop, claim, qualifiers=None, negate=False)[source]#
Yield all ItemPages which contain certain claim in a property.
- Parameters:
prop (str) – property id to check
claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).
qualifiers (Dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant
negate (bool) – true if pages that do not contain specified claim should be yielded, false otherwise
generator (Iterable[Page]) –
- Return type:
Iterator[Page]
- page_classes = {False: <class 'pywikibot.page._wikibase.ItemPage'>, True: <class 'pywikibot.page._wikibase.PropertyPage'>}#
- pagegenerators._filters.ItemClaimFilterPageGenerator(generator, prop, claim, qualifiers=None, negate=False)#
Yield all ItemPages which contain certain claim in a property.
- Parameters:
prop (str) – property id to check
claim (str) – value of the property to check. Can be exact value (for instance, ItemPage instance) or a string (e.g. ‘Q37470’).
qualifiers (Dict[str, str] | None) – dict of qualifiers that must be present, or None if qualifiers are irrelevant
negate (bool) – true if pages that do not contain specified claim should be yielded, false otherwise
generator (Iterable[Page]) –
- Return type:
Iterator[Page]
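The claim test can be sketched over mock items represented as plain dicts mapping property ids to value lists (hypothetical data; the real filter inspects ItemPage claims):

```python
def has_claim(item, prop, value, negate=False):
    # True when the property holds the value; inverted when negate is set.
    found = value in item.get(prop, [])
    return found != negate

# Mock items: P31 ("instance of") with Q5 ("human") or Q4167410
# ("disambiguation page") as illustrative values.
items = [{"P31": ["Q5"]}, {"P31": ["Q4167410"]}]
humans = [i for i in items if has_claim(i, "P31", "Q5")]
# humans -> [{"P31": ["Q5"]}]
```

With negate=True the selection is inverted, yielding only items that lack the claim.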
- pagegenerators._filters.NamespaceFilterPageGenerator(generator, namespaces, site=None)[source]#
A generator yielding pages from another generator in given namespaces.
If a site is provided, the namespaces are validated using the namespaces of that site, otherwise the namespaces are validated using the default site.
Note
API-based generators that have a “namespaces” parameter perform namespace filtering more efficiently than this generator.
- Parameters:
- Raises:
KeyError – a namespace identifier was not resolved
TypeError – a namespace identifier has an inappropriate type such as NoneType or bool, or more than one namespace if the API module does not support multiple namespaces
- Return type:
Iterator[Page]
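The filtering step itself is straightforward and can be sketched with mock pages represented as (namespace, title) tuples (an illustration only; the real generator resolves identifiers via the site first):

```python
def namespace_filter(pages, namespaces):
    # Yield only pages whose namespace is in the requested set.
    wanted = set(namespaces)
    return (p for p in pages if p[0] in wanted)

pages = [(0, "Main page"), (14, "Category:X"), (0, "Other")]
articles = list(namespace_filter(pages, {0}))
# articles -> [(0, "Main page"), (0, "Other")]
```

As the note above says, prefer the “namespaces” parameter of API-based generators when available, since filtering server-side avoids fetching pages that are discarded anyway.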
- pagegenerators._filters.PageTitleFilterPageGenerator(generator, ignore_list)[source]#
Yield only those pages that are not listed in the ignore list.
- pagegenerators._filters.QualityFilterPageGenerator(generator, quality)[source]#
Wrap a generator to filter pages according to quality levels.
This is possible only for pages with content_model ‘proofread-page’. In all the other cases, no filter is applied.
- pagegenerators._filters.RedirectFilterPageGenerator(generator, no_redirects=True, show_filtered=False)[source]#
Yield pages from another generator, keeping only non-redirects if no_redirects is True, otherwise only redirects.
- pagegenerators._filters.RegexBodyFilterPageGenerator(generator, regex, quantifier='any')#
Yield pages from another generator whose body matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
For parameters see titlefilter above.
- class pagegenerators._filters.RegexFilter[source]#
Bases:
object
Regex filter.
- classmethod contentfilter(generator, regex, quantifier='any')[source]#
Yield pages from another generator whose body matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
For parameters see titlefilter above.
- classmethod titlefilter(generator, regex, quantifier='any', ignore_namespace=True)[source]#
Yield pages from another generator whose title matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
If ignore_namespace is False, the whole page title is compared.
Note
if you want to check for a match at the beginning of the title, you have to start the regex with “^”
- Parameters:
generator (Iterable[Page]) – another generator
regex (str | Pattern[str] | Sequence[str] | Sequence[Pattern[str]]) – a regex which should match the page title
quantifier (str) – must be one of the following values: ‘all’ – yields page if title is matched by all regexes; ‘any’ – yields page if title is matched by any regex; ‘none’ – yields page if title is NOT matched by any regex
ignore_namespace (bool) – ignore the namespace when matching the title
- Returns:
return a page depending on the matching parameters
- Return type:
Iterator[Page]
- pagegenerators._filters.RegexFilterPageGenerator(generator, regex, quantifier='any', ignore_namespace=True)#
Yield pages from another generator whose title matches regex.
Uses regex option re.IGNORECASE depending on the quantifier parameter.
If ignore_namespace is False, the whole page title is compared.
Note
if you want to check for a match at the beginning of the title, you have to start the regex with “^”
- Parameters:
generator (Iterable[Page]) – another generator
regex (str | Pattern[str] | Sequence[str] | Sequence[Pattern[str]]) – a regex which should match the page title
quantifier (str) – must be one of the following values: ‘all’ – yields page if title is matched by all regexes; ‘any’ – yields page if title is matched by any regex; ‘none’ – yields page if title is NOT matched by any regex
ignore_namespace (bool) – ignore the namespace when matching the title
- Returns:
return a page depending on the matching parameters
- Return type:
Iterator[Page]
- pagegenerators._filters.SubpageFilterGenerator(generator, max_depth=0, show_filtered=False)[source]#
Generator which filters out subpages based on depth.
It looks at the namespace of each page and checks if that namespace has subpages enabled. If so, pages with forward slashes (‘/’) are excluded.
- pagegenerators._filters.UserEditFilterGenerator(generator, username, timestamp=None, skip=False, max_revision_depth=None, show_filtered=False)[source]#
Generator which will yield Pages modified by username.
It only looks at the last editors given by max_revision_depth. If timestamp is given in MediaWiki format YYYYMMDDhhmmss, older edits are ignored. If skip is set, pages edited by the given user are skipped; otherwise only pages edited by this user are yielded.
- Parameters:
generator (Iterable[Page]) – A generator object
username (str) – user name which edited the page
timestamp (None | str | datetime) – ignore edits which are older than this timestamp
skip (bool) – Ignore pages edited by the given user
max_revision_depth (int | None) – It only looks at the last editors given by max_revision_depth
show_filtered (bool) – Output a message for each page not yielded
- Return type:
Iterator[Page]
pagegenerators._generators
— Generator Functions#
Page filter generators provided by the pagegenerators module.
- pagegenerators._generators.AllpagesPageGenerator(start='!', namespace=0, includeredirects=True, site=None, total=None, content=False)[source]#
Iterate Page objects for all titles in a single namespace.
If includeredirects is False, redirects are not included. If includeredirects equals the string ‘only’, only redirects are added.
- pagegenerators._generators.AncientPagesPageGenerator(total=100, site=None)[source]#
Ancient page generator.
- pagegenerators._generators.CategorizedPageGenerator(category, recurse=False, start=None, total=None, content=False, namespaces=None)[source]#
Yield all pages in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (Category) –
namespaces (Sequence[int] | None) –
- Return type:
Iterable[Page]
- pagegenerators._generators.DayPageGenerator(start_month=1, end_month=12, site=None, year=2000)[source]#
Day page generator.
- pagegenerators._generators.DeadendPagesPageGenerator(total=100, site=None)[source]#
Dead-end page generator.
- pagegenerators._generators.FileLinksGenerator(referredFilePage, total=None, content=False)[source]#
Yield Pages on which referredFilePage file is displayed.
- class pagegenerators._generators.GoogleSearchPageGenerator(query=None, site=None)[source]#
Bases:
GeneratorWrapper
Page generator using Google search results.
To use this generator, you need to install the package ‘google’:
https://pypi.org/project/google
This package has been available since 2010, hosted on GitHub since 2012, and provided by PyPI since 2013.
As there are concerns about Google’s Terms of Service, this generator prints a warning for each query.
Changed in version 7.6: subclassed from
tools.collections.GeneratorWrapper
- Parameters:
site (pywikibot.site.BaseSite | None) – Site for generator results.
query (str | None) –
- property generator: Iterator[Page]#
Yield results from queryGoogle() query.
Google contains links in the format: https://de.wikipedia.org/wiki/en:Foobar
Changed in version 7.6: changed from iterator method to generator property
- static queryGoogle(query)[source]#
Perform a query using python package ‘google’.
The terms of service as of June 2014 give two conditions that may apply to use of search:
Don’t access [Google Services] using a method other than the interface and the instructions that [they] provide.
Don’t remove, obscure, or alter any legal notices displayed in or along with [Google] Services.
Both of those issues should be managed by the package ‘google’, however Pywikibot will at least ensure the user sees the TOS in order to comply with the second condition.
- Parameters:
query (str) –
- Return type:
Iterator[Any]
- pagegenerators._generators.ImagesPageGenerator(pageWithImages, total=None, content=False)[source]#
Yield FilePages displayed on pageWithImages.
- pagegenerators._generators.InterwikiPageGenerator(page)[source]#
Iterate over all interwiki (non-language) links on a page.
- pagegenerators._generators.LanguageLinksPageGenerator(page, total=None)[source]#
Iterate over all interwiki language links on a page.
- pagegenerators._generators.LinkedPageGenerator(linkingPage, total=None, content=False)[source]#
Yield all pages linked from a specific page.
See page.BasePage.linkedPages for details.
- Parameters:
linkingPage (Page) – the page that links to the pages we want
total (int | None) – the total number of pages to iterate
content (bool) – if True, retrieve the current content of each linked page
- Returns:
a generator that yields Page objects of pages linked to linkingPage
- Return type:
Iterable[Page]
- pagegenerators._generators.LinksearchPageGenerator(url, namespaces=None, total=None, site=None, protocol=None)[source]#
Yield all pages that link to a certain URL.
- Parameters:
url (str) – The URL to search for (with or without the protocol prefix); this may include a ‘*’ as a wildcard, only at the start of the hostname
namespaces (List[int] | None) – list of namespace numbers to fetch contribs from
total (int | None) – Maximum number of pages to retrieve in total
site (BaseSite | None) – Site for generator results
protocol (str | None) – Protocol to search for, likely http or https, http by default. The full list is shown on the Special:LinkSearch wiki page
- Return type:
Iterable[Page]
- pagegenerators._generators.LiveRCPageGenerator(site=None, total=None)[source]#
Yield pages from a socket.io RC stream.
Generates pages based on the EventStreams Server-Sent-Event (SSE) recent changes stream. The Page objects will have an extra property ._rcinfo containing the literal rc data. This can be used to e.g. filter only new pages. See pywikibot.comms.eventstreams.rc_listener for details on the ._rcinfo format.
- pagegenerators._generators.LogeventsPageGenerator(logtype=None, user=None, site=None, namespace=None, total=None, start=None, end=None, reverse=False)[source]#
Generate Pages for specified modes of logevents.
- Parameters:
logtype (str | None) – Mode of logs to retrieve
user (str | None) – User of logs retrieved
site (BaseSite | None) – Site for generator results
namespace (int | None) – Namespace to retrieve logs from
total (int | None) – Maximum number of pages to retrieve in total
start (Timestamp | None) – Timestamp to start listing from
end (Timestamp | None) – Timestamp to end listing at
reverse (bool) – if True, start with oldest changes (default: newest)
- Return type:
Iterator[Page]
- pagegenerators._generators.LonelyPagesPageGenerator(total=None, site=None)[source]#
Lonely page generator.
- pagegenerators._generators.LongPagesPageGenerator(total=100, site=None)[source]#
Long page generator.
- pagegenerators._generators.MySQLPageGenerator(query, site=None, verbose=None)[source]#
Yield a list of pages based on a MySQL query.
The query should return two columns, page namespace and page title pairs from some table. An example query that yields all ns0 pages might look like:
SELECT page_namespace, page_title FROM page WHERE page_namespace = 0;
- pagegenerators._generators.NewimagesPageGenerator(total=None, site=None)[source]#
New file generator.
- pagegenerators._generators.NewpagesPageGenerator(site=None, namespaces=(0,), total=None)[source]#
Iterate Page objects for all new titles in a single namespace.
- pagegenerators._generators.PagesFromPageidGenerator(pageids, site=None)[source]#
Return a page generator from pageids.
Pages are iterated in the same order as in the underlying pageids. Pageids are filtered and only one page is returned in case of duplicate pageids.
- pagegenerators._generators.PagesFromTitlesGenerator(iterable, site=None)[source]#
Generate pages from the titles (strings) yielded by iterable.
- class pagegenerators._generators.PetScanPageGenerator(categories, subset_combination=True, namespaces=None, site=None, extra_options=None)[source]#
Bases:
GeneratorWrapper
Queries PetScan to generate pages.
New in version 3.0.
Changed in version 7.6: subclassed from
tools.collections.GeneratorWrapper
- Parameters:
categories (Sequence[str]) – List of category names to retrieve pages from
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Sequence[str | pywikibot.site.Namespace] | None) – List of namespaces to search in (default is None, meaning all namespaces)
site (pywikibot.site.BaseSite | None) – Site to operate on (default is the default site from the user config)
extra_options (Dict[Any, Any] | None) – Dictionary of extra options to use (optional)
- buildQuery(categories, subset_combination, namespaces, extra_options)[source]#
Get the querystring options to query PetScan.
- Parameters:
categories (Sequence[str]) – List of categories (as strings)
subset_combination (bool) – Combination mode. If True, returns the intersection of the results of the categories, else returns the union of the results of the categories
namespaces (Sequence[str | Namespace] | None) – List of namespaces to search in
extra_options (Dict[Any, Any] | None) – Dictionary of extra options to use
- Returns:
Dictionary of querystring parameters to use in the query
- Return type:
Dict[str, Any]
- property generator: Iterator[Page]#
Yield results from query().
Changed in version 7.6: changed from iterator method to generator property
- query()[source]#
Query PetScan.
Changed in version 7.4: raises APIError if query returns an error message.
- Raises:
ServerError – Either ReadTimeout or server status error
APIError – error response from petscan
- Return type:
Iterator[Dict[str, Any]]
- pagegenerators._generators.PrefixingPageGenerator(prefix, namespace=None, includeredirects=True, site=None, total=None, content=False)[source]#
Prefixed Page generator.
- Parameters:
prefix (str) – The prefix of the pages.
namespace (int | Namespace | None) – Namespace to retrieve pages from
includeredirects (None | bool | str) – If includeredirects is None, False or an empty string, redirects will not be found. If includeredirects equals the string ‘only’, only redirects will be found. Otherwise redirects will be included.
site (BaseSite | None) – Site for generator results.
total (int | None) – Maximum number of pages to retrieve in total
content (bool) – If True, load current version of each page (default False)
- Returns:
a generator that yields Page objects
- Return type:
Iterable[Page]
- pagegenerators._generators.RandomPageGenerator(total=None, site=None, namespaces=None)[source]#
Random page generator.
- pagegenerators._generators.RandomRedirectPageGenerator(total=None, site=None, namespaces=None)[source]#
Random redirect generator.
- pagegenerators._generators.RecentChangesPageGenerator(site=None, _filter_unique=None, **kwargs)[source]#
Generate pages that are in the recent changes list, including duplicates.
For keyword parameters refer to APISite.recentchanges().
Changed in version 8.2: The yield type depends on the namespace. It can be pywikibot.Page, pywikibot.User, pywikibot.FilePage or pywikibot.Category.
- pagegenerators._generators.SearchPageGenerator(query, total=None, namespaces=None, site=None)[source]#
Yield pages from the MediaWiki internal search engine.
- pagegenerators._generators.ShortPagesPageGenerator(total=100, site=None)[source]#
Short page generator.
- pagegenerators._generators.SubCategoriesPageGenerator(category, recurse=False, start=None, total=None, content=False)[source]#
Yield all subcategories in a specific category.
- Parameters:
recurse (int | bool) – if not False or 0, also iterate articles in subcategories. If an int, limit recursion to this number of levels. (Example: recurse=1 will iterate articles in first-level subcats, but no deeper.)
start (str | None) – if provided, only generate pages >= this title lexically
total (int | None) – iterate no more than this number of pages in total (at all levels)
content (bool) – if True, retrieve the content of the current version of each page (default False)
category (Category) –
- Return type:
Iterable[Page]
- pagegenerators._generators.TextIOPageGenerator(source=None, site=None)[source]#
Iterate pages from a list in a text file or on a webpage.
The text source must contain page links between double-square-brackets or, alternatively, separated by newlines. The generator will yield each corresponding Page object.
- pagegenerators._generators.UnCategorizedCategoryGenerator(total=100, site=None)[source]#
Uncategorized category generator.
- pagegenerators._generators.UnCategorizedImageGenerator(total=100, site=None)[source]#
Uncategorized file generator.
- pagegenerators._generators.UnCategorizedPageGenerator(total=100, site=None)[source]#
Uncategorized page generator.
- pagegenerators._generators.UnCategorizedTemplateGenerator(total=100, site=None)[source]#
Uncategorized template generator.
- pagegenerators._generators.UnconnectedPageGenerator(site=None, total=None)[source]#
Iterate Page objects for all unconnected pages to a Wikibase repository.
- pagegenerators._generators.UnusedFilesGenerator(total=None, site=None)[source]#
Unused files generator.
- pagegenerators._generators.UnwatchedPagesPageGenerator(total=None, site=None)[source]#
Unwatched page generator.
- pagegenerators._generators.UserContributionsGenerator(username, namespaces=None, site=None, total=None, _filter_unique=functools.partial(<function filter_unique>, key=<function <lambda>>))[source]#
Yield unique pages edited by user:username.
- Parameters:
- Return type:
Iterator[Page]
- pagegenerators._generators.WantedPagesPageGenerator(total=100, site=None)[source]#
Wanted page generator.
- pagegenerators._generators.WikibaseItemGenerator(gen)[source]#
A wrapper generator used to yield Wikibase items of another generator.
- pagegenerators._generators.WikibaseSearchItemPageGenerator(text, language=None, total=None, site=None)[source]#
Generate pages that contain the provided text.
- Parameters:
text (str) – Text to look for.
language (str | None) – Code of the language to search in. If not specified, value from pywikibot.config.data_lang is used.
total (int | None) – Maximum number of pages to retrieve in total, or None in case of no limit.
site (BaseSite | None) – Site for generator results.
- Return type:
Iterator[ItemPage]
- pagegenerators._generators.WikidataPageFromItemGenerator(gen, site)[source]#
Generate pages from site based on sitelinks of item pages.
- Parameters:
gen (Iterable[ItemPage]) – generator of pywikibot.ItemPage
site (BaseSite) – Site for generator results.
- Return type:
Iterator[Page]
- pagegenerators._generators.WikidataSPARQLPageGenerator(query, site=None, item_name='item', endpoint=None, entity_url=None, result_type=<class 'set'>)[source]#
Generate pages that result from the given SPARQL query.
- Parameters:
query (str) – the SPARQL query string.
site (BaseSite | None) – Site for generator results.
item_name (str) – name of the item in the SPARQL query
endpoint (str | None) – SPARQL endpoint URL
entity_url (str | None) – URL prefix for any entities returned in a query.
result_type (Any) – type of the iterable in which SPARQL results are stored (default set)
- Return type:
Iterator[Page]
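The selected variable in the query must match the item_name parameter ('item' by default). A minimal example query, with hypothetical usage shown in comments (running it requires pywikibot and network access; the Q146 "house cat" item is just an illustration):

```python
# A sample SPARQL query whose selected variable (?item) matches the
# generator's default item_name='item': instances of house cat (Q146).
query = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .
} LIMIT 10
"""

# Hypothetical usage:
# import pywikibot
# from pywikibot import pagegenerators
# site = pywikibot.Site('wikidata', 'wikidata')
# for page in pagegenerators.WikidataSPARQLPageGenerator(query, site=site):
#     print(page.title())
```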
- pagegenerators._generators.WithoutInterwikiPageGenerator(total=None, site=None)[source]#
Page lacking interwikis generator.
- class pagegenerators._generators.XMLDumpPageGenerator(filename, start=None, namespaces=None, site=None, text_predicate=None, content=False)[source]#
Bases:
Iterator
XML iterator that yields Page objects.
New in version 7.2: the content parameter.
- Parameters:
filename (str) – filename of XML dump
start (str | None) – skip entries below that value
namespaces (None | str | pywikibot.site.Namespace | Sequence[str | pywikibot.site.Namespace]) – namespace filter
site (pywikibot.site.BaseSite | None) – current site for the generator
text_predicate (Callable[[str], bool] | None) – a callable that takes entry.text as its parameter and returns a boolean indicating whether the generator should yield the page
content – If True, assign old page content to Page.text
- Variables:
skipping – True if start parameter is given, else False
parser – holds the xmlreader.XmlDump parse method
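A text_predicate is simply any callable mapping the revision text to a bool. A small sketch, with hypothetical usage in comments (the template name and dump path are illustrative, not from the source):

```python
# Select only pages whose wikitext contains a given template;
# "Infobox" is just an example template name.
def has_infobox(text: str) -> bool:
    return "{{Infobox" in text

# Hypothetical usage with a local dump file:
# from pywikibot import pagegenerators
# gen = pagegenerators.XMLDumpPageGenerator(
#     'dump.xml.bz2', text_predicate=has_infobox, content=True)
# for page in gen:
#     print(page.title())
```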
- pagegenerators._generators.YearPageGenerator(start=1, end=2050, site=None)[source]#
Year page generator.