Scripts package#
The scripts folder contains predefined, ready-to-use scripts.
Scripts are only available with Pywikibot if it is installed in directory mode, not as a site package. They can be run from the command line using the pwb wrapper script:
python pwb.py <global options> <name_of_script> <options>
Every script provides a -help option which shows all available options, their explanations, and usage examples. Global options will be shown by -help:global or using:
python pwb.py -help
The advantages of the pwb.py wrapper script are:
check for framework and script dependencies and show a warning if a package is missing or outdated, or if the Python release is not supported
check whether the user config file (user-config.py) is available and ask to create it by starting the generate_user_files.py script
enable global options even if a script does not support them
start private scripts located in the userscripts sub-folder
find a script even if the given script name does not match a filename, e.g. due to a spelling mistake
- scripts.base_dir = PosixPath('/src/scripts')#
Defines the entry point for the pywikibot-scripts package.
add_text script#
Append text to the top or bottom of a page
By default this adds the text to the bottom of the page, above the categories and interwiki links.
Use the following command line parameters to specify what to add:
-text Text to append. "\n" are interpreted as newlines.
-textfile Path to a file with text to append
-summary Change summary to use
-up Append text to the top of the page rather than the bottom
-create Create the page if necessary. Note that talk pages are
already created without this option.
-createonly Only create the page but do not edit existing ones
-always If used, the bot won't ask if it should add the specified
text
-major If used, the edit will be saved without the "minor edit" flag
-talkpage, -talk Put the text onto the talk page instead
-excepturl Skip pages with a url that matches this regular expression
-noreorder Place the text beneath the categories and interwiki links
Furthermore, the following can be used to specify which pages to process…
This script supports use of pagegenerators
arguments.
Examples
Append ‘hello world’ to the bottom of the sandbox:
python pwb.py add_text -page:Wikipedia:Sandbox \
-summary:"Bot: pywikibot practice" -text:"hello world"
Add a template to the top of the pages with ‘category:catname’:
python pwb.py add_text -cat:catname -summary:"Bot: Adding a template" \
-text:"{{Something}}" -except:"\{\{([Tt]emplate:|)[Ss]omething" -up
Command used on it.wikipedia to put the template in the page without any category:
python pwb.py add_text -except:"\{\{([Tt]emplate:|)[Cc]ategorizzare" \
-text:"{{Categorizzare}}" -excepturl:"class='catlinks'>" -uncat \
-summary:"Bot: Aggiungo template Categorizzare"
- class scripts.add_text.AddTextBot(**kwargs)[source]#
Bases: AutomaticTWSummaryBot, ExistingPageBot
A bot which adds text to a page.
- Parameters:
kwargs (Any) – bot options
- Keyword Arguments:
generator – a generator processed by the run() method
- summary_key: str | None = 'add_text-adding'#
Must be defined in subclasses.
- property summary_parameters#
Return a dictionary of all parameters for i18n.
Line breaks are replaced by dashes.
- update_options: dict[str, Any] = {'always': False, 'create': False, 'createonly': False, 'minor': True, 'regex_skip_url': '', 'reorder': True, 'summary': '', 'talk_page': False, 'text': '', 'textfile': '', 'up': False}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
- use_redirects: bool | None = False#
Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:
class MyRedirectBot(ExistingPageBot):
    '''Bot who only works on existing redirects.'''
    use_redirects = True
Added in version 7.2.
- scripts.add_text.main(*argv)[source]#
Process command line arguments and invoke bot.
If argv is an empty list, sys.argv is used.
- Parameters:
argv (str) – Command line arguments
- Return type:
None
- scripts.add_text.parse(argv, generator_factory)[source]#
Parse our arguments and provide a dictionary with their values.
- Parameters:
argv (Sequence[str]) – input arguments to be parsed
generator_factory (GeneratorFactory) – factory that will determine what pages to process
- Returns:
dictionary with our parsed arguments
- Raises:
ValueError – if we receive invalid arguments
- Return type:
dict[str, bool | str]
archivebot script#
archivebot.py - discussion page archiving bot
usage:
python pwb.py archivebot [OPTIONS] [TEMPLATE_PAGE]
Several TEMPLATE_PAGE templates can be given at once. The default is
User:MiszaBot/config. The bot examines backlinks (Special:WhatLinksHere)
to all TEMPLATE_PAGE templates. It then goes through all pages (unless a
specific page is specified using options) and archives old discussions.
This is done by breaking a page into threads, then scanning each thread
for timestamps. Threads older than a specified threshold are then moved
to another page (the archive), which can be named either based on the
thread’s name, or the name can contain a counter which is incremented
when the archive reaches a certain size.
The transcluded template may contain the following parameters:
{{TEMPLATE_PAGE
|archive =
|algo =
|counter =
|maxarchivesize =
|minthreadsleft =
|minthreadstoarchive =
|archiveheader =
|key =
}}
Meanings of parameters are:
archive Name of the page to which archived threads will be put.
Must be a subpage of the current page. Variables are
supported.
algo Specifies the maximum age of a thread. Must be
in the form old(<delay>) where <delay> specifies
the age in seconds (s), hours (h), days (d),
weeks (w), or years (y) like 24h or 5d. Default is
old(24h).
counter The current value of a counter which can be used as a
variable. It will be updated by the bot. Initial value is 1.
maxarchivesize The maximum archive size before incrementing the counter.
The value can be given with an appended letter like K or M,
which indicates KByte or MByte. Default value is 200K.
minthreadsleft Minimum number of threads that should be left on a page.
Default value is 5.
minthreadstoarchive The minimum number of threads to archive at once. Default
value is 2.
archiveheader Content that will be put on new archive pages as the
header. This parameter supports the use of variables.
Default value is {{talkarchive}}
key A secret key that (if valid) allows archives not to be
subpages of the page being archived.
Variables below can be used in the value for “archive” in the template above; numbers are Latin digits:
%(counter)d the current value of the counter
%(year)d year of the thread being archived
%(isoyear)d ISO year of the thread being archived
%(isoweek)d ISO week number of the thread being archived
%(semester)d semester term of the year of the thread being archived
%(quarter)d quarter of the year of the thread being archived
%(month)d month (as a number 1-12) of the thread being archived
%(monthname)s localized name of the month above
%(monthnameshort)s first three letters of the name above
%(week)d week number of the thread being archived
Alternatively you may use localized digits. This is only available for a
few site languages. Refer NON_LATIN_DIGITS
whether
there is a localized one:
%(localcounter)s the current value of the counter
%(localyear)s year of the thread being archived
%(localisoyear)s ISO year of the thread being archived
%(localisoweek)s ISO week number of the thread being archived
%(localsemester)s semester term of the year of the thread being archived
%(localquarter)s quarter of the year of the thread being archived
%(localmonth)s month (as a number 1-12) of the thread being archived
%(localweek)s week number of the thread being archived
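Putting these together, a filled-in configuration transcluded on a talk page might look like this (the page name and values here are illustrative, not defaults to copy blindly):
{{User:MiszaBot/config
|archive = Talk:Example/Archive %(counter)d
|algo = old(30d)
|counter = 1
|maxarchivesize = 150K
|minthreadsleft = 4
|archiveheader = {{talkarchive}}
}}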
The ISO calendar starts with the Monday of the week which has at least four days in the new Gregorian calendar. If January 1st falls between Monday and Thursday (inclusive), the first week of that year starts on the Monday of that week, which lies in the previous year if January 1st is not a Monday. If it falls between Friday and Sunday (inclusive), the following week is the first week of the year. So up to three days may still be counted as belonging to the previous year.
See also
Python datetime.date.isocalendar, https://webspace.science.uu.nl/~gent0113/calendar/isocalendar.htm
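This boundary behaviour can be checked with Python's standard library; for example, 2021-01-01 was a Friday, so the first three days of 2021 still belong to ISO year 2020:
from datetime import date

print(date(2021, 1, 1).isocalendar())  # ISO year 2020, week 53, weekday 5
print(date(2021, 1, 4).isocalendar())  # ISO year 2021, week 1, weekday 1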
Options (may be omitted):
-help show this help message and exit
-calc:PAGE calculate key for PAGE and exit
-file:FILE load list of pages from FILE
-force override security options
-locale:LOCALE switch to locale LOCALE
-namespace:NS only archive pages from a given namespace
-page:PAGE archive a single PAGE; the default namespace is the user talk namespace
-salt:SALT specify salt
-keep Preserve thread order in archive even if threads are
archived later
-sort Sort archive by timestamp; should not be used with -keep
-async Run the bot in parallel tasks.
Changed in version 7.6: Localized variables for the “archive” template parameter are supported. User:MiszaBot/config is the default template. The -keep option was added.
Changed in version 7.7: The -sort and -async options were added.
Changed in version 8.2: KeyboardInterrupt was enabled with the -async option.
- exception scripts.archivebot.ArchiveBotSiteConfigError(arg)[source]#
Bases:
Error
There is an error originating from archivebot’s on-site configuration.
- Parameters:
arg (Exception | str)
- Return type:
None
- exception scripts.archivebot.ArchiveSecurityError(arg)[source]#
Bases:
ArchiveBotSiteConfigError
Page title is not a valid archive of the page being archived.
The page title is neither a subpage of the page being archived, nor does it match the key specified in the archive configuration template.
- Parameters:
arg (Exception | str)
- Return type:
None
- class scripts.archivebot.DiscussionPage(source, archiver, params=None, keep=False)[source]#
Bases:
Page
A class that represents a single page of discussion threads.
Feed threads to it and run an update() afterwards.
- feed_thread(thread, max_archive_size)[source]#
Append a new thread to the archive.
- Parameters:
thread (DiscussionThread)
max_archive_size (tuple[int, str])
- Return type:
bool
- is_full(max_archive_size)[source]#
Check whether the archive size is exceeded.
- Parameters:
max_archive_size (tuple[int, str])
- Return type:
bool
- load_page()[source]#
Load the page to be archived and break it up into threads.
Changed in version 7.6: If the -keep option is given, run through all threads and set the current timestamp to the previous one if the current is lower.
Changed in version 7.7: Load unsigned threads using the timestamp of the next thread.
- Return type:
None
- static max(ts1, ts2)[source]#
Calculate the maximum of two timestamps but allow None as value.
Added in version 7.6.
- class scripts.archivebot.DiscussionThread(title, timestripper)[source]#
Bases:
object
An object representing a discussion thread on a page.
It represents something that is of the form:
== Title of thread ==
Thread content here. ~~~~
:Reply, etc. ~~~~
- Parameters:
title (str)
timestripper (TimeStripper)
- feed_line(line)[source]#
Add a line to the content and find the newest timestamp.
- Parameters:
line (str)
- Return type:
None
- exception scripts.archivebot.MalformedConfigError(arg)[source]#
Bases:
ArchiveBotSiteConfigError
There is an error in the configuration template.
- Parameters:
arg (Exception | str)
- Return type:
None
- exception scripts.archivebot.MissingConfigError(arg)[source]#
Bases:
ArchiveBotSiteConfigError
The config is missing in the header.
It is in one of the threads or transcluded from another page instead.
- Parameters:
arg (Exception | str)
- Return type:
None
- class scripts.archivebot.PageArchiver(page, template, salt, force=False, keep=False, sort=False)[source]#
Bases:
object
A class that encapsulates all archiving methods.
- Parameters:
page (pywikibot.Page) – a page object to be archived
template (pywikibot.Page) – a template with configuration settings
salt (str) – salt value
force (bool) – override security value
keep (bool)
sort (bool)
- algo = 'none'#
- get_archive_page(title, params=None)[source]#
Return the page for archiving.
If it doesn’t exist yet, create and cache it. Also check for security violations.
- Parameters:
title (str)
- Return type:
DiscussionPage
- get_params(timestamp, counter)[source]#
Make params for archiving template.
- Parameters:
counter (int)
- Return type:
dict
- preload_pages(counter, thread, pattern)[source]#
Preload pages if counter matters.
- Parameters:
counter (int)
- Return type:
None
- set_attr(attr, value, out=True)[source]#
Set an archiver attribute.
- Parameters:
out (bool)
- Return type:
None
- should_archive_thread(thread)[source]#
Check whether a thread has to be archived.
- Returns:
the archiving reason as a tuple of localization args
- Parameters:
thread (DiscussionThread)
- Return type:
tuple[str, str] | None
- scripts.archivebot.calc_md5_hexdigest(txt, salt)[source]#
Return the MD5 hexdigest computed from text and salt.
- Return type:
str
- scripts.archivebot.main(*args)[source]#
Process command line arguments and invoke bot.
If args is an empty list, sys.argv is used.
- Parameters:
args (str) – command line arguments
- Return type:
None
- scripts.archivebot.process_page(page, *args)[source]#
Call PageArchiver for a single page.
- Returns:
Return True to continue with the next page, False to break the loop.
- Parameters:
args (Any)
- Return type:
bool
Added in version 7.6.
Changed in version 7.7: pass an unspecified number of arguments to the bot using
*args
- scripts.archivebot.show_md5_key(calc, salt, site)[source]#
Show calculated MD5 hexdigest.
- Return type:
bool
- scripts.archivebot.str2localized_duration(site, string)[source]#
Localise a shorthand duration.
Translates a duration written in the shorthand notation (ex. “24h”, “7d”) into an expression in the local wiki language (“24 hours”, “7 days”).
- Parameters:
string (str)
- Return type:
str
- scripts.archivebot.str2size(string)[source]#
Return a size for a shorthand size.
Accepts a string defining a size:
1337 - 1337 bytes
150K - 150 kilobytes
2M - 2 megabytes
- Returns:
a tuple (size, unit), where size is an integer and unit is 'B' (bytes) or 'T' (threads).
- Parameters:
string (str)
- Return type:
tuple[int, str]
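A minimal sketch of how such a parser could work, assuming K means 1024 bytes and a trailing T counts threads; the real str2size may accept more input variants, and the names below are illustrative:
import re

def parse_size(string: str) -> tuple[int, str]:
    # '2M' -> (2097152, 'B'); '5T' -> (5, 'T'); '1337' -> (1337, 'B')
    match = re.fullmatch(r'(\d+) *([KMT]?)', string)
    if not match:
        raise ValueError(f'invalid size: {string!r}')
    value, unit = int(match[1]), match[2]
    if unit == 'T':  # a thread count rather than a byte size
        return value, 'T'
    return value * {'': 1, 'K': 1024, 'M': 1024 ** 2}[unit], 'B'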
- scripts.archivebot.template_title_regex(tpl_page)[source]#
Return a regex that matches variations of the template title.
It supports the transcluding variant as well as localized namespaces and case-insensitivity depending on the namespace.
- Parameters:
tpl_page (pywikibot.page.Page) – The template page
- Return type:
Pattern
basic script#
An incomplete sample script
This is not a complete bot; rather, it is a template from which simple bots can be made. You can rename it to mybot.py, then edit it in whatever way you want.
Use the global -simulate option for test purposes. No changes will be made to the live wiki.
The following parameters are supported:
-always The bot won't ask for confirmation when putting a page
-text: Use this text to be added; otherwise 'Test' is used
-replace: Don't add text but replace it
-top Place additional text on top of the page
-summary: Set the action summary message for the edit.
This sample script is a
ConfigParserBot
. All settings can be
made either by giving options on the command line or with a settings file,
which is scripts.ini by default. If you don’t want the default values, you can
add any option you want to change to that settings file below the [basic]
section, like:
[basic] ; inline comments start with a semicolon
# This is a comment line. Assignments may be done with '=' or ':'
text: A text with line break and
    continuing on next line to be put
replace: yes ; yes/no, on/off, true/false and 1/0 are also valid
summary = Bot: My first test edit with pywikibot
Every script has its own section with the script name as header.
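For reference, a settings file in this format can be read with the standard library's configparser, which is roughly what ConfigParserBot does internally (a sketch, not the bot's exact code):
import configparser

cfg = configparser.ConfigParser(inline_comment_prefixes=(';',))
cfg.read('scripts.ini')
section = cfg['basic']
print(section.get('summary'))         # Bot: My first test edit with pywikibot
print(section.getboolean('replace'))  # True; yes/no, on/off, true/false, 1/0 all work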
In addition the following generators and filters are supported but cannot be set by settings file:
This script supports use of pagegenerators
arguments.
- class scripts.basic.BasicBot(site=True, **kwargs)[source]#
Bases: SingleSiteBot, ConfigParserBot, ExistingPageBot, AutomaticTWSummaryBot
An incomplete sample bot.
- Variables:
summary_key – Edit summary message key. The message that should be used is placed on /i18n subdirectory. The file containing these messages should have the same name as the caller script (i.e. basic.py in this case). Use summary_key to set a default edit summary message.
- Parameters:
site (BaseSite | bool | None)
kwargs (Any)
Create a SingleSiteBot instance.
- Parameters:
site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.
kwargs (Any)
- summary_key: str | None = 'basic-changing'#
Must be defined in subclasses.
- update_options: dict[str, Any] = {'replace': False, 'summary': None, 'text': 'Test', 'top': False}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
- use_redirects: bool | None = False#
Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:
class MyRedirectBot(ExistingPageBot):
    '''Bot who only works on existing redirects.'''
    use_redirects = True
Added in version 7.2.
blockpageschecker script#
A bot to remove stale protection templates from pages that are not protected
Very often sysops protect pages for a set time but then forget to remove the warning! This script is useful if you want to remove those useless warnings left on these pages.
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators
arguments.
Furthermore, the following command line parameters are supported:
-protectedpages Check all the protected pages; useful when you have no
categories or when you have problems with them. (Add the
namespace after ":" where you want to check - default checks
all protected pages.)
-moveprotected Same as -protectedpages, for moveprotected pages
This script is a ConfigParserBot
.
The following options can be set within a settings file, which is scripts.ini
by default:
-always Doesn't ask every time whether the bot should make the change.
Do it always.
-show When the bot can't delete the template from the page (wrong
regex or something like that) it will ask you whether it
should open the page in your browser.
(Attention: pages included may give false positives!)
-move The bot will check whether the page is also protected against
moving, not only against editing
Examples:
python pwb.py blockpageschecker -always
python pwb.py blockpageschecker -cat:Geography -always
python pwb.py blockpageschecker -show -protectedpages:4
- class scripts.blockpageschecker.CheckerBot(site=True, **kwargs)[source]#
Bases: ConfigParserBot, ExistingPageBot, SingleSiteBot
Bot to remove stale protection templates from unprotected pages.
Changed in version 7.0: CheckerBot is a ConfigParserBot
Create a SingleSiteBot instance.
- Parameters:
site (BaseSite | bool | None) – If True it’ll be set to the configured site using pywikibot.Site.
kwargs (Any)
- update_options: dict[str, Any] = {'move': False, 'show': False}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
category script#
Script to manage categories
Syntax:
python pwb.py category action [-option]
where action can be one of these
- add
mass-add a category to a list of pages.
- remove
remove category tag from all pages in a category. If a pagegenerators option is given, the intersection with category pages is processed.
- move
move all pages in a category to another category. If a pagegenerators option is given, the intersection with category pages is processed.
- tidy
tidy up a category by moving its pages into subcategories.
- tree
show a tree of subcategories of a given category.
- listify
make a list of all of the articles that are in a category.
- clean
Removes redundant grandchildren from the specified category by removing the direct link to the grandparent. In other words, a grandchild should not also be a child. For example, if category A contains B and B contains C, a direct link from C to A is redundant and is removed.
and option can be one of these
Options for “add” action:
-person - Sort persons by their last name.
-create - If a page doesn't exist, do not skip it, create it instead.
-redirect - Follow redirects.
Options for “listify” action:
-append - This appends the list to the current page that is already
existing (appending to the bottom by default).
-overwrite - This overwrites the current page with the list even if
something is already there.
-showimages - This displays images rather than linking them in the list.
-talkpages - This outputs the links to talk pages of the pages to be
listified in addition to the pages themselves.
-prefix:# - You may specify a list prefix like "#" for a numbered list or
any other prefix. Default is a bullet list with prefix "*".
Options for “remove” action:
-nodelsum - This specifies not to use the custom edit summary as the
deletion reason. Instead, it uses the default deletion reason
for the language, which is "Category was disbanded" in
English.
Options for “move” action:
-hist - Creates a nice wikitable on the talk page of target category
that contains detailed page history of the source category.
-nodelete - Don't delete the old category after move.
-nowb - Don't update the Wikibase repository.
-allowsplit - If that option is not set, it only moves the talk and main
page together.
-mvtogether - Only move the pages/subcategories of a category, if the
target page (and talk page, if -allowsplit is not set)
doesn't exist.
-keepsortkey - Use sortKey of the old category also for the new category.
If not specified, sortKey is removed.
An alternative method to keep sortKey is to use -inplace
option.
Options for “listify” and “tidy” actions:
-namespaces Filter the articles in the specified namespaces. Separate
-namespace multiple namespace numbers or names with commas. Examples:
-ns -ns:0,2,4
-ns:Help,MediaWiki
Options for “clean” action:
-always
Options for several actions:
-rebuild - Reset the database.
-from: - The category to move from (for the move option)
Also, the category to remove from in the remove option
Also, the category to make a list of in the listify option.
-to: - The category to move to (for the move option).
- Also, the name of the list to make in the listify option.
-batch - Don't prompt to delete emptied categories (do it
automatically).
-summary: - Pick a custom edit summary for the bot.
-inplace - Use this flag to change categories in place rather than
rearranging them.
-recurse[:<depth>]
- Recurse through subcategories of the category to
optional depth.
-pagesonly - While removing pages from a category, keep the subpage links
and do not remove them.
-match - Only work on pages whose titles match the given regex (for
move and remove actions).
-depth: - The max depth limit beyond which no subcategories will be
listed.
Note
If the category names have spaces in them you may need to use
a special syntax in your shell so that the names aren’t treated as
separate parameters. For instance, in BASH, use single quotes, e.g.
-from:'Polar bears'
.
If action is “add”, “move” or “remove”, the following additional options are supported:
This script supports use of pagegenerators
arguments.
For the actions tidy and tree, the bot will store the category structure locally in category.dump. This saves time and server load, but if it uses these data later, they may be outdated; use the -rebuild parameter in this case.
For example, to create a new category from a list of persons, type:
python pwb.py category add -person
and follow the on-screen instructions.
Or to do it all from the command-line, use the following syntax:
python pwb.py category move -from:US -to:"United States"
This will move all pages in the category US to the category United States.
A pagegenerators option can be given with the move and remove actions:
pwb category -site:wikipedia:en remove -from:Hydraulics -cat:Pneumatics
The sample above would remove ‘Hydraulics’ category from all pages which are also in ‘Pneumatics’ category.
Changed in version 8.0: pagegenerators are supported with the “move” and “remove” actions.
- class scripts.category.CategoryAddBot(generator, newcat=None, sort_by_last_name=False, create=False, comment='', follow_redirects=False)[source]#
Bases:
CategoryPreprocess
A robot to mass-add a category to a list of pages.
- Parameters:
sort_by_last_name (bool)
create (bool)
comment (str)
follow_redirects (bool)
- static sorted_by_last_name(catlink, pagelink)[source]#
Return a Category with key that sorts persons by their last name.
- Parameters:
catlink – The Category to be linked.
pagelink – The Page to be placed in the category.
Trailing words in brackets will be removed. Example: If category_name is ‘Author’ and pl is a Page to [[Alexandre Dumas (senior)]], this function will return this Category: [[Category:Author|Dumas, Alexandre]].
- Return type:
Category
- class scripts.category.CategoryDatabase(rebuild=False, filename='category.dump.bz2')[source]#
Bases:
object
Temporary database saving pages and subcategories for each category.
This prevents loading the category pages over and over again.
- Parameters:
rebuild (bool)
filename (str)
- dump(filename=None)[source]#
Save the dictionaries to disk if not empty.
Pickle the contents of the dictionaries superclass_db and cat_content_db if at least one is not empty. If both are empty, removes the file from the disk.
If the filename is None, it’ll use the filename determined in __init__.
- Return type:
None
- get_articles(cat)[source]#
Return the list of pages for a given category.
Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.
- Return type:
set[Page]
- get_subcats(supercat)[source]#
Return the list of subcategories for a given supercategory.
Saves this list in a temporary database so that it won’t be loaded from the server next time it’s required.
- Return type:
set[Category]
- get_supercats(subcat)[source]#
Return the supercategories (as a set) for a given subcategory.
- Return type:
set[Category]
- property is_loaded: bool#
Return whether the contents have been loaded.
- class scripts.category.CategoryListifyRobot(cat_title, list_title, edit_summary, append=False, overwrite=False, show_images=False, *, talk_pages=False, recurse=False, namespaces=None, **kwargs)[source]#
Bases:
object
Create a list containing all of the members in a category.
- Parameters:
cat_title (str | None)
list_title (str | None)
edit_summary (str)
append (bool)
overwrite (bool)
show_images (bool)
talk_pages (bool)
recurse (int | bool)
- class scripts.category.CategoryMoveRobot(oldcat, newcat=None, batch=False, comment='', inplace=False, move_oldcat=True, delete_oldcat=True, title_regex=None, history=False, pagesonly=False, deletion_comment=0, move_comment=None, wikibase=True, allow_split=False, move_together=False, keep_sortkey=None, generator=None)[source]#
Bases:
CategoryPreprocess
Change or remove the category from the pages.
If the new category is given, the category is changed from the old to the new one. Otherwise the category is removed from the pages, and the category itself is deleted if it’s empty.
By default the operation applies to pages and subcategories.
Added in version 8.0: The generator parameter.
Store all given parameters in the object’s attributes.
- Parameters:
oldcat – The move source.
newcat – The move target.
batch (bool) – If True the user does not have to confirm the deletion.
comment (str) – The edit summary for all pages where the category is changed, and also for moves and deletions if not overridden.
inplace (bool) – If True the categories are not reordered.
move_oldcat (bool) – If True the category page (and talkpage) is copied to the new category.
delete_oldcat (bool) – If True the oldcat page and talkpage are deleted (or nominated for deletion) if it is empty.
title_regex – Only pages (and subcats) with a title that matches the regex are moved.
history (bool) – If True the history of the oldcat is posted on the talkpage of newcat.
pagesonly (bool) – If True only move pages, not subcategories.
deletion_comment (int | str) – Either string or special value: DELETION_COMMENT_AUTOMATIC: use a generated message, DELETION_COMMENT_SAME_AS_EDIT_COMMENT: use the same message for delete that is used for the edit summary of the pages whose category was changed (see the comment param above). If the value is not recognized, it’s interpreted as DELETION_COMMENT_AUTOMATIC.
move_comment – If set, uses this as the edit summary on the actual move of the category page. Otherwise, defaults to the value of the comment parameter.
wikibase (bool) – If True, update the Wikibase item of the old category.
allow_split (bool) – If False only moves page and talk page together.
move_together (bool) – If True moves the pages/subcategories only if page and talk page could be moved or both source page and target page don’t exist.
generator – a generator from pagegenerators.GeneratorFactory. If given an intersection to the oldcat category members is used.
- DELETION_COMMENT_AUTOMATIC = 0#
- DELETION_COMMENT_SAME_AS_EDIT_COMMENT = 1#
- static check_move(name, old_page, new_page)[source]#
Return whether the old page can be safely moved to the new page.
- Parameters:
name (str) – Title of the new page
old_page (pywikibot.page.BasePage) – Page to be moved
new_page (pywikibot.page.BasePage) – Page to be moved to
- Returns:
True if it is possible to move the page, False otherwise
- Return type:
bool
- run()[source]#
The main bot function that does all the work.
For readability it is split into several helper functions:
- _movecat()
- _movetalk()
- _hist()
- _change()
- _delete()
Changed in version 8.0: If a page generator is given to the bot, the intersection with pagegenerators.CategorizedPageGenerator() or pagegenerators.SubCategoriesPageGenerator() is used.
- Return type:
None
- class scripts.category.CategoryPreprocess(follow_redirects=False, edit_redirects=False, create=False, **kwargs)[source]#
Bases:
BaseBot
A class to prepare a list of pages for robots.
- Parameters:
follow_redirects (bool)
edit_redirects (bool)
create (bool)
- class scripts.category.CategoryTidyRobot(cat_title, cat_db, namespaces=None, comment=None)[source]#
Bases: Bot, CategoryPreprocess
Robot to move members of a category into sub- or super-categories.
Specify the category title on the command line. The robot will pick up the page, look for all sub- and super-categories, and show them listed as possibilities to move the page into, each with an assigned number. It will ask you to type the number of the appropriate replacement, and perform the change robotically. It will then automatically loop over all pages in the category.
If you don’t want to move the member to a sub- or super-category, but to another category, you can use the ‘j’ (jump) command.
By typing ‘s’ you can leave the complete page unchanged.
By typing ‘m’ you can show more content of the current page, helping you to find out what the page is about and in which other categories it currently is.
- Parameters:
cat_title (str | None) – a title of the category to process.
cat_db (CategoryDatabase object) – a CategoryDatabase object.
namespaces (iterable of pywikibot.Namespace) – namespaces to focus on.
comment (str | None) – a custom summary for edits.
- move_to_category(member, original_cat, current_cat)[source]#
Ask whether to move it to one of the sub- or super-categories.
Given a page in the original_cat category, ask the user whether to move it to one of original_cat’s sub- or super-categories. Recursively run through subcategories’ subcategories.
Note
current_cat is only used for internal recursion. You should always use
current_cat = original_cat
.
- class scripts.category.CategoryTreeRobot(cat_title, cat_db, filename=None, max_depth=10)[source]#
Bases:
object
Robot to create tree overviews of the category structure.
- Parameters:
cat_title – The category which will be the tree's root.
cat_db – A CategoryDatabase object.
max_depth (int) – The limit beyond which no subcategories will be listed. This also guarantees that loops in the category structure won’t be a problem.
filename – The textfile where the tree should be saved; None to print the tree to stdout.
- run()[source]#
Handle the multi-line string generated by treeview.
After the string has been generated by treeview, it is either printed to the console or saved to a file.
- Return type:
None
- treeview(cat, current_depth=0, parent=None)[source]#
Return a tree view of all subcategories of cat.
The multi-line string contains a tree view of all subcategories of cat, up to level max_depth. Recursively calls itself.
- Parameters:
cat – the Category of the node we're currently opening.
current_depth (int) – the current level in the tree.
parent – the Category of the category we're coming from.
- Return type:
str
- class scripts.category.CleanBot(**kwargs)[source]#
Bases:
Bot
Automatically cleans up specified category.
Removes redundant grandchildren from the specified category by removing the direct link to the grandparent.
In other words, a grandchild should not also be a child.
Stub categories are an exception.
Added in version 7.0.
- update_options: dict[str, Any] = {'recurse': False}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
category_graph script#
Visualizes category hierarchy
Generates a graphical representation of the category hierarchy in dot, svg, and html5 formats.
Usage:
pwb.py category_graph [-style STYLE] [-depth DEPTH] [-from FROM] [-to TO]
actions:
-from [FROM] Category name to scan, default is main category, "?" to ask.
optional arguments:
-to TO base file name to save, "?" to ask
-style STYLE graphviz style definitions in dot format (see below)
-depth DEPTH maximal hierarchy depth. 2 by default
-downsize K font size divider for subcategories. 4 by default
Use 1 for the same font size
See also
https://graphviz.org/doc/info/attrs.html for graphviz style definitions.
Example
Visualizes main category:
pwb.py -v category_graph -from
Extended example with style settings:
pwb.py category_graph -from Life -downsize 1.5 \
-style 'graph[rankdir=BT ranksep=0.5] node[shape=circle style=filled \
fillcolor=green] edge[style=dashed penwidth=3]'
Added in version 8.0.
- class scripts.category_graph.CategoryGraphBot(args)[source]#
Bases:
SingleSiteBot
Bot to create graph of the category structure.
- Parameters:
args (argparse.Namespace)
category_redirect script#
This bot will move pages out of redirected categories
The bot will look for categories that are marked with a category
redirect template, take the first parameter of the template as the
target of the redirect, and move all pages and subcategories of the
category there. It also changes hard redirects into soft redirects, and
fixes double redirects. A log is written under
<userpage>/category_redirect_log
. A log is written under
<userpage>/category_edit_requests
if a page cannot be moved, so it can be
handled manually. Only category pages that haven’t been edited for a
certain cooldown period (default 7 days) are taken into account.
The following parameters are supported:
-always If used, the bot won't ask if it should add the specified
text
-delay:# Set an amount of days. If the category is edited more
recently than given days, ignore it. Default is 7.
-tiny Only loops over Category:Non-empty_category_redirects and
moves all images, pages and categories in redirect categories
to the target category.
-category:<cat> Category to be used with this script. If not given,
either wikibase entry Q4616723 or Q8099903 is used.
Usage:
python pwb.py category_redirect [options]
Note
This script is a
ConfigParserBot
. All options
can be set within a settings file which is scripts.ini by default.
- class scripts.category_redirect.CategoryRedirectBot(**kwargs)[source]#
Bases: ConfigParserBot, SingleSiteBot, AutomaticTWSummaryBot
Page category update bot.
Changed in version 7.0: CategoryRedirectBot is a ConfigParserBot
Changed in version 9.0: A log entry is written to <userpage>/category_edit_requests if a page cannot be moved
- check_hard_redirect()[source]#
Check for hard-redirected categories.
Check categories that are not already marked with an appropriate softredirect template and replace the content with a redirect template.
- Return type:
None
- move_contents(old_cat_title, new_cat_title, edit_summary)[source]#
The worker function that moves pages out of oldCat into newCat.
- Parameters:
old_cat_title (str)
new_cat_title (str)
edit_summary (str)
- Return type:
tuple[int, int]
- update_options: dict[str, Any] = {'category': '', 'delay': 7, 'tiny': False}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
change_pagelang script#
This script changes the content language of pages
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators
arguments.
Furthermore, the following command line parameters are supported:
-setlang What language the pages should be set to
-always If a language is already set for a page, always change
it to the one set in -setlang.
-never If a language is already set for a page, never change
it to the one set in -setlang (keep the current
language).
Note
This script is a
ConfigParserBot
. All options can be set
within a settings file which is scripts.ini by default.
Added in version 5.1.
- class scripts.change_pagelang.ChangeLangBot(**kwargs)[source]#
Bases: ConfigParserBot, SingleSiteBot
Change page language bot.
Changed in version 7.0: ChangeLangBot is a ConfigParserBot
- changelang(page)[source]#
Set page language.
- Parameters:
page (pywikibot.page.BasePage) – The page to update and save
- Return type:
None
- treat(page)[source]#
Treat a page.
- Parameters:
page (pywikibot.page.BasePage) – The page to treat
- Return type:
None
- update_options: dict[str, Any] = {'never': False, 'setlang': ''}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
checkimages script#
Script to check recently uploaded files
This script checks if a file description is present and if there are other problems in the image’s description.
This script will have to be configured for each language. Please submit translations as additions to the Pywikibot framework.
Everything that needs customisation is indicated by comments.
This script understands the following command-line arguments:
-limit The number of images to check (default: 80)
-commons The bot will check if an image on Commons has the same name
and if true it reports the image.
-duplicates[:#] Check if the image has duplicates (if an argument is
given, set how many rollbacks to wait before reporting the
image in the report instead of tagging it); default: 1
rollback.
-duplicatesreport Report the duplicates in a log *AND* put the template in
the images.
-maxusernotify Maximum notifications added to a user talk page in a single
check, to avoid email spamming.
-sendemail Send an email after tagging.
-break To break the bot after the first check (default: recursive)
-sleep[:#] Time in seconds between repeat runs (default: 30)
-wait[:#] Wait x seconds before checking the images (default: 0)
-skip[:#] The bot skips the first [:#] images (default: 0)
-start[:#] Use allimages() as generator
(it starts already from File:[:#])
-cat[:#] Use a category as generator
-regex[:#] Use regex, must be used with -url or -page
-page[:#] Define the name of the wiki page where the images are
-url[:#] Define the URL where the images are
-nologerror If given, this option will disable the error that is raised
when the log is full.
Instructions for the real-time settings. For every new block you have to add:
<------- ------->
In this way the bot can understand where the block starts in order to take the right parameter.
Name= Set the name of the block
Find= search this text in the image’s description
Findonly= search for exactly this text in the image’s description
- Summary= That’s the summary that the bot will use when it notifies the
problem.
Head= That’s the incipit (opening line) that the bot will use for the message.
- Text= This is the template that the bot will use when it reports the
image’s problem.
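For illustration, a settings block on the configuration page could look like this (every name and value below is made up; the real keys are only those documented above):
<------- ------->
Name= Test upload
Find= sandbox
Summary= Bot: file marked as a test upload
Head= Test upload notice
Text= {{subst:Test upload notice}}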
Changed in version 8.4: Welcome messages are imported from scripts.welcome
script.
- scripts.checkimages.CATEGORIES_WITH_LICENSES = ('Q4481876', 'Q7451504')#
Category items with the licenses; subcategories may contain other licenses.
Changed in version 7.2: uses wikibase items instead of category titles.
- class scripts.checkimages.CheckImagesBot(site, log_full_number=25000, sendemail_active=False, duplicates_report=False, log_full_error=True, max_user_notify=None)[source]#
Bases:
object
A robot to check recently uploaded files.
Initializer, define some instance variables.
- Parameters:
log_full_number (int)
sendemail_active (bool)
duplicates_report (bool)
log_full_error (bool)
- check_image_duplicated(duplicates_rollback)[source]#
Function to check the duplicated files.
- Return type:
bool
- find_additional_problems()[source]#
Extract additional settings from configuration page.
- Return type:
None
- ignore_server_errors = False#
- static important_image(list_given)[source]#
Get tuples of image and time, return the most used or oldest image.
Changed in version 7.2: itertools.zip_longest is used to stop
using_pages
as soon as possible.
- load_hidden_templates()[source]#
Function to load the whitelisted templates.
- Return type:
None
- load_licenses()[source]#
Load the list of the licenses.
Changed in version 7.2: return a set instead of a list for quicker lookup.
- Return type:
set[Page]
- mini_template_check(template)[source]#
Check if template is in allowed licenses or in licenses to skip.
- Return type:
bool
- put_mex_in_talk()[source]#
Function to put the warning in talk page of the uploader.
When the bot finds that the user talk page is empty, it adds the welcome message first. The messages are imported from the welcome.py script.
- Return type:
None
- regex_generator(regexp, textrun)[source]#
Find page to yield using regex to parse text.
- Return type:
Generator[FilePage]
- report(newtext, image_to_report, notification=None, head=None, notification2=None, unver=True, comm_talk=None, comm_image=None)[source]#
Function to make the reports easier.
- Parameters:
unver (bool)
- Return type:
None
- report_image(image_to_report, rep_page=None, com=None, rep_text=None, addings=True)[source]#
Report the files to the report page when needed.
- Parameters:
addings (bool)
- Return type:
bool
- skip_images(skip_number, limit)[source]#
Given a number of files, skip the first -number- files.
- Return type:
bool
- smart_detection()[source]#
Detect templates.
Instead of merely checking whether there is a simple template in the image’s description, the bot also checks whether that template is a license or something else. In this sense this type of check is smart.
- Return type:
tuple[str, bool]
- tag_image(put=True)[source]#
Add template to the Image page and find out the uploader.
- Parameters:
put (bool)
- Return type:
bool
- template_in_list()[source]#
Check if template is in list.
The problem is that calls to the MediaWiki system can be pretty slow, while searching in a list of objects is really fast. So first of all let’s see if we can find something in the info that we already have, then make a deeper check.
- Return type:
None
- exception scripts.checkimages.LogIsFull(arg)[source]#
Bases:
Error
The log is full and the bot cannot add more data; this prevents errors.
- Parameters:
arg (Exception | str)
- Return type:
None
claimit script#
A script that adds claims to Wikidata items based on a list of pages
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators
arguments.
Usage:
python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456
You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property->target pairs to add.
For geographic coordinates:
python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]
[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.
Example
python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001
By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:
python pwb.py claimit [pagegenerators] P246 "string example" -exists:p
Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. Now, claimit.py will not add the claim if it has the same target or source, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.
For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:
python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq
Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.
- class scripts.claimit.ClaimRobot(claims, exists_arg='', **kwargs)[source]#
Bases:
WikidataBot
A bot to add Wikidata claims.
- Parameters:
claims (list) – A list of wikidata claims
exists_arg (str) – String specifying how to handle duplicate claims
- treat_page_and_item(page, item)[source]#
Treat each page.
- Parameters:
page (pywikibot.page.BasePage) – The page to update and change
item (pywikibot.page.ItemPage) – The item to treat
- Return type:
None
- use_from_page = None#
clean_sandbox script#
This bot resets a (user) sandbox with predefined text
This script understands the following command-line arguments:
This script supports use of pagegenerators
arguments.
Furthermore, the following command line parameters are supported:
-hours:# Use this parameter to make the script repeat itself
after # hours. Hours can be defined as a decimal. 0.01
hours are 36 seconds; 0.1 are 6 minutes.
-delay:# Use this parameter for a wait time after the last edit
was made. If no parameter is given, the delay is derived
from -hours and limited to between 5 and 15 minutes
(see the sketch after this list).
The minimum delay time is 5 minutes.
-text The text to substitute into the sandbox; you can use this
when you haven't configured clean_sandbox for your wiki.
-textfile As an alternative to -text, you can use this to provide
a file containing the text to be used.
-summary Summary of the edit made by the bot. Overrides the default
from i18n.
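The clamping rule described for -delay could be sketched like this (function and parameter names are illustrative, not the script's actual code):
def effective_delay(hours: float, delay: int = -1) -> int:
    # No -delay given: derive the wait time from -hours,
    # limited to between 5 and 15 minutes.
    if delay < 0:
        delay = min(max(int(hours * 60), 5), 15)
    return max(delay, 5)  # the minimum delay time is 5 minutes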
This script is a ConfigParserBot
.
All local parameters can be given inside a scripts.ini file. Options
passed to the script take priority over options read from the ini file.
For example:
[clean_sandbox]
# the parameter section for clean_sandbox script
summary = Bot: Cleaning sandbox
text = {{subst:Clean Sandbox}}
hours: 0.5
delay: 7
- class scripts.clean_sandbox.SandboxBot(**kwargs)[source]#
Bases: Bot, ConfigParserBot
Sandbox reset bot.
- available_options: dict[str, Any] = {'delay': -1, 'hours': -1.0, 'summary': '', 'text': ''}#
Handler configuration attribute. Only the keys of the dict can be passed as __init__ options. The values are the default values. Overwrite this in subclasses!
commons_information script#
This bot adds a language template to the file’s description field
The Information
template is commonly used to provide formatting to
the basic information for files (description, source, author, etc.). The
description
field should provide brief but complete information
about the image. The description format should use Language templates
like {{En}}
or {{De}}
to specify the language of the description.
This script adds these language templates if missing. For example the
description of
{{Information
| Description = A simplified icon for [[Pywikibot]]
| Date = 2003-06-14
| Other fields =
}}
will be analyzed as en
language with ~100% accuracy and the bot
replaces its content with
{{Information
| Description = {{en|A simplified icon for [[Pywikibot]]}}
| Date = 2003-06-14
| Other fields =
}}
Note
The langdetect
package is needed for full support of language
detection. Install it with:
pip install langdetect
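For reference, langdetect's basic API is a single call; the return value shown is what you would typically get for this input:
from langdetect import detect

print(detect('A simplified icon for Pywikibot'))  # typically 'en'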
This script understands the following command-line arguments:
This script supports use of pagegenerators
arguments.
Usage:
python pwb.py commons_information [pagegenerators]
You can use any typical pagegenerator (like categories) to provide
a list of pages. If no pagegenerator is given, pages transcluding the
Information
template are used.
Hint
This script uses commons
site as default. For other sites
use the global -site
option.
Example for going through all files:
python pwb.py commons_information -start:File:!
Added in version 6.0.
Changed in version 9.2: accelerate script with preloading pages; use commons
as default
site; use transcluded pages of Information
template.
- class scripts.commons_information.InformationBot(**kwargs)[source]#
Bases: SingleSiteBot, ExistingPageBot
Bot for the Information template.
Initializer.
- comment = {'en': 'Bot: wrap the description parameter of Information in the appropriate language template'}#
- desc_params = ('Description', 'description')#
- lang_tmp_cat = 'Language templates'#
- process_desc_other(wikicode, nodes)[source]#
Process other description text.
The description text may consist of different Node types except Template, which is handled by
process_desc_template()
. Combine all nodes and replace the last with a newly created Template while removing the remaining nodes from the wikicode.
Added in version 9.2.
- Parameters:
wikicode (Wikicode) – The Wikicode of the parsed page text.
nodes (list[Node]) – wikitext nodes to be processed
- Returns:
whether the description nodes were changed
- Return type:
bool
- process_desc_template(template)[source]#
Process description template.
- Parameters:
template (Template) – a mwparserfromhell Template found in the description parameter of
Information
template.- Returns:
whether the template node was changed.
- Return type:
bool
commonscat script#
With this tool you can add the template {{commonscat}} to categories
The tool works by following the interwiki links. If the template is present on another language page, the bot will use it.
You could probably use it on articles as well, but this isn’t tested.
The following parameters are supported:
-checkcurrent Work on all category pages that use the primary commonscat
template.
This script is a ConfigParserBot
.
The following options can be set within a settings file, which is scripts.ini
by default:
-always Don't prompt you for each replacement. The warning message
does not have to be confirmed. ATTENTION: Use this with care!
-summary:XYZ Set the action summary message for the edit to XYZ,
otherwise it uses messages from add_text.py as default.
This bot uses pagegenerators to get a list of pages. The following options are supported:
This script supports use of pagegenerators
arguments.
For example to go through all categories:
python pwb.py commonscat -start:Category:!
- class scripts.commonscat.CommonscatBot(**kwargs)[source]#
Bases: ConfigParserBot, ExistingPageBot
Commons categorisation bot.
Changed in version 7.0: CommonscatBot is a ConfigParserBot
- Parameters:
kwargs (Any) – bot options
- Keyword Arguments:
generator – a
generator
processed byrun()
method
- changeCommonscat(page=None, oldtemplate='', oldcat='', newtemplate='', newcat='', linktitle='')[source]#
Change the current commonscat template and target.
- Parameters:
oldtemplate (str)
oldcat (str)
newtemplate (str)
newcat (str)
linktitle (str)
- Return type:
None
- checkCommonscatLink(name='')[source]#
Return the name of a valid commons category.
If the page is a redirect, this function tries to follow it. If the page doesn’t exist, the function will return an empty string.
- Parameters:
name (str)
- findCommonscatLink(page)[source]#
Find CommonsCat template on interwiki pages.
- Returns:
name of a valid commons category
- Return type:
str
- find_commons_category(page)[source]#
Find CommonsCat template on Wikibase repository.
Use Wikibase property to get the category if possible. Otherwise check all langlinks to find it.
- Returns:
name of a valid commons category
- Return type:
str
- static getCommonscatLink(page)[source]#
Find CommonsCat template on page.
- Return type:
tuple of (<templatename>, <target>, <linktext>, <note>)
- treat_page()[source]#
Add CommonsCat template to page.
Take a page and go through all its interwiki pages looking for a commonscat template. When all the interwiki links are checked and a proper category is found, add it to the page.
- Return type:
None
- update_options: dict[str, Any] = {'summary': ''}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such a case.
Added in version 6.4.
- use_disambigs: bool | None = False#
Attribute to determine whether to use disambiguation pages. Set it to True to use disambigs only, set it to False to skip disambigs. If None both are processed.
Added in version 7.2.
- use_redirects: bool | None = False#
Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:
class MyRedirectBot(ExistingPageBot):
    '''Bot who only works on existing redirects.'''
    use_redirects = True
Added in version 7.2.
coordinate_import script#
Coordinate importing script
Usage:
python pwb.py coordinate_import -site:wikipedia:en \
-cat:Category:Coordinates_not_on_Wikidata
This will work on all pages in the category “coordinates not on Wikidata” and will import the coordinates on these pages to Wikidata.
The data from the “GeoData” extension (https://www.mediawiki.org/wiki/Extension:GeoData) is used so that extension has to be setup properly. You can look at the [[Special:Nearby]] page on your local Wiki to see if it’s populated.
You can use any typical pagegenerator to provide a list of pages:
python pwb.py coordinate_import -lang:it -family:wikipedia -namespace:0 \
-transcludes:Infobox_stazione_ferroviaria
You can also run over a set of items on the repo without coordinates and try to import them from any connected page. To do this, you have to explicitly provide the repo as the site using -site argument.
Example
python pwb.py coordinate_import -site:wikidata:wikidata -namespace:0
-querypage:Deadendpages
The following command line parameters are supported:
-always If used, the bot won't ask if it should add the specified
text
-create Create items for pages without one.
Note
This script is a
ConfigParserBot
. All options
can be set within a settings file which is scripts.ini by default.
This script supports use of pagegenerators
arguments.
- class scripts.coordinate_import.CoordImportRobot(**kwargs)[source]#
Bases: ConfigParserBot, WikidataBot
A bot to import coordinates to Wikidata.
Changed in version 7.0: CoordImportRobot is a ConfigParserBot
- has_coord_qualifier(claims)[source]#
Check if self.prop is used as property for a qualifier.
- Parameters:
claims (dict) – the Wikibase claims to check in
- Returns:
the first property for which self.prop is used as a qualifier, or None if there is none
- Return type:
str | None
- item_has_coordinates(item)[source]#
Check if the item has coordinates.
- Returns:
whether the item has coordinates
- Return type:
bool
- try_import_coordinates_from_page(page, item)[source]#
Try import coordinate from the given page to the given item.
- Returns:
whether any coordinates were found and the import was successful
- Return type:
bool
- use_from_page = None#
cosmetic_changes script#
This module can do slight modifications to tidy a wiki page’s source code
The changes are not supposed to change the look of the rendered wiki page.
The following parameters are supported:
-always Don't prompt you for each replacement. The warning (see below)
does not have to be confirmed. ATTENTION: Use this with care!
-async Put page on queue to be saved to wiki asynchronously.
-summary:XYZ Set the summary message text for the edit to XYZ, bypassing
the predefined message texts with original and replacements
inserted.
-ignore: Ignore errors and either skip the page or only the
failing method. It can be set to:
all - does not ignore errors
match - ignores ISBN related errors (default)
method - ignores fixing method errors
page - ignores page related errors
The following generators and filters are supported:
This script supports use of pagegenerators
arguments.
ATTENTION: You can run this script as a stand-alone for testing purposes. However, the changes that are made are only minor, and other users might get angry if you fill the version histories and watchlists with such irrelevant changes. Some wikis prohibit stand-alone running.
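For instance, a stand-alone test invocation might look like this (page title illustrative):

python pwb.py cosmetic_changes -page:Wikipedia:Sandbox -ignore:match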
For further information see pywikibot/cosmetic_changes.py
- class scripts.cosmetic_changes.CosmeticChangesBot(**kwargs)[source]#
Bases: AutomaticTWSummaryBot, ExistingPageBot
Cosmetic changes bot.
- Parameters:
kwargs (Any) – bot options
- Keyword Arguments:
generator – a generator processed by the run() method
- summary_key: str | None = 'cosmetic_changes-standalone'#
Must be defined in subclasses.
- treat_page()[source]#
Treat page with the cosmetic toolkit.
Changed in version 7.0: skip if InvalidPageError is raised
- Return type:
None
- update_options: dict[str, Any] = {'async': False, 'ignore': CANCEL.MATCH, 'summary': ''}#
update_options can be used to update available_options; do not use it if the bot class is to be derived, but use self.available_options.update(<dict>) in the initializer in such case.
Added in version 6.4.
- use_redirects: bool | None = False#
Attribute to determine whether to use redirect pages. Set it to True to use redirects only, set it to False to skip redirects. If None both are processed. For example to create a RedirectBot you may define:
class MyRedirectBot(ExistingPageBot):

    '''Bot who only works on existing redirects.'''

    use_redirects = True
Added in version 7.2.
create_isbn_edition script#
Pywikibot script to load ISBN related data into Wikidata
Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for edition (with the P212=ISBN number as unique external ID).
Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.
Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.
- Parameters (all optional):
P1: digital library (default goob; pass “-” to keep the default)
bnf      Catalogue General (France)
bol      Bol.com
dnb      Deutsche National Library
goob     Google Books
kb       National Library of the Netherlands
loc      Library of Congress US
mcues    Ministerio de Cultura (Spain)
openl    OpenLibrary.org
porbase  urn.porbase.org Portugal
sbn      Servizio Bibliotecario Nazionale
wiki     wikipedia.org
worldcat WorldCat
- P2: ISO 639-1 language code
Default LANG; e.g. en, nl, fr, de, es, it, etc.
- P3 P4…: P/Q pairs to add additional claims (repeated)
e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID)
- stdin:
ISBN numbers (International Standard Book Number)
Free text (e.g. a Wikipedia reference list, or a publication list) is accepted. Identification is done via an ISBN regex.
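The exact pattern is internal to the script; an extraction along these lines would behave as described (a sketch, not the script's actual regex):

import re

# Rough ISBN-13/ISBN-10 matcher; hyphens and spaces are allowed
# between digits, and ISBN-10 may end in 'X'.
ISBN_RE = re.compile(
    r'97[89][-\s]?(?:\d[-\s]?){9}\d'    # ISBN-13
    r'|\d[-\s]?(?:\d[-\s]?){8}[\dXx]'   # ISBN-10
)

def find_isbns(text):
    # Return normalized ISBN candidates found in free text.
    return [re.sub(r'[-\s]', '', match.group())
            for match in ISBN_RE.finditer(text)]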
- Functionality:
The ISBN number is used as a primary key (P212), where no duplicates are allowed. The item update is not performed when there is no unique match.
Statements are added or merged incrementally; existing data is not overwritten.
Authors and publishers are searched to get their item number (ambiguous items are skipped)
Book title and subtitle are separated with ‘.’, ‘:’, or ‘-’
This script can be run incrementally with the same parameters. Caveat: take the Wikidata Query database replication delay into account; wait at least 5 minutes to avoid creating duplicate objects.
- Data quality:
Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.
The following properties should only be used for written works:
P5331: OCLC work ID (editions should only have P243)
P8383: Goodreads work ID (editions should only have P2969)
Examples
Default library (Google Books), language (LANG), no additional statements:
pwb create_isbn_edition.py 9789042925564
Wikimedia, language English, main subject: database management:
pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6
Standard ISBN properties:
P31:Q3331189: instance of edition
P50: author
P123: publisher
P212: canonical ISBN number (lookup via Wikidata Query)
P407: language of work (Qnumber linked to ISO 639-1 language code)
P577: date of publication (year)
P1476: book title
P1680: subtitle
Other ISBN properties:
P291: place of publication
P921: main subject (inverse lookup from external Fast ID P2163)
P629: work for edition
P747: edition of work
P1104: number of pages
Qualifiers:
P1545: (author) sequence number
External identifiers:
P213: ISNI ID
P243: OCLC ID
P496: ORCID iD
P675: Google Books ID
P1036: Dewey Decimal Classification
P2163: Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
P2969: Goodreads ID
Only for written works:
P5331: OCLC work ID (editions should only have P243)
P8383: Goodreads work ID (editions should only have P2969)
- Author:
Geert Van Pamel, 2022-08-04, GNU General Public License v3.0, User:Geertivp
- Documentation:
https://www.freecodecamp.org/news/python-json-how-to-convert-a-string-to-json/
https://buildmedia.readthedocs.org/media/pdf/isbnlib/v3.4.5/isbnlib.pdf
WikiProject Books: https://www.wikidata.org/wiki/Q21831105
https://www.wikidata.org/wiki/Wikidata:List_of_properties/work
https://www.wikidata.org/wiki/Template:Bibliographic_properties
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.html
http://www.isbn.org/standards/home/isbn/international/hyphenation-instructions.asp
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_qualifiers
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_statements
- Prerequisites:
pywikibot
Install the following isbnlib packages (see https://pypi.org/search/?q=isbnlib):
pip install isbnlib (mandatory)
Optionally:
pip install isbnlib-bol
pip install isbnlib-bnf
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
etc.
- Restrictions:
- Prefer passing the ISO 639-1 language code parameter explicitly
The language code is not always available from the digital library.
- SPARQL queries run on a replicated database
Replication delay can be significant; wait 5 minutes before retrying, otherwise you risk creating duplicates.
- Algorithm:
1. Get parameters
2. Validate parameters
3. Get ISBN data
4. Convert ISBN data
5. Get additional data
6. Register ISBN data into Wikidata (create or amend items or claims)
Environment:
The Python script can run on the following platforms:
Linux client
Google Chromebook (Linux container)
Toolforge Portal
PAWS
LANG: ISO 639-1 language code
Applications:
Generate a book reference
Example: {{Cite Q|Q63413107}} (wp.en)
See also:
https://meta.wikimedia.org/wiki/WikiCite
https://www.wikidata.org/wiki/Q21831105 (WikiCite)
https://www.wikidata.org/wiki/Q22321052 (Cite_Q)
https://www.mediawiki.org/wiki/Global_templates
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
https://phabricator.wikimedia.org/tag/wikicite/
https://meta.wikimedia.org/wiki/WikiCite/Shared_Citations
- Wikidata Query:
List of editions about musicians: https://w.wiki/5aaz
List of editions having ISBN number: https://w.wiki/5akq
- Related projects:
- Other systems:
Bibliographic database: https://en.wikipedia.org/wiki/Bibliographic_database
Added in version 7.7.
- scripts.create_isbn_edition.add_claims(isbn_data)[source]#
Inspect isbn_data and add claims if possible.
- Parameters:
isbn_data (dict[str, Any])
- Return type:
None
- scripts.create_isbn_edition.amend_isbn_edition(isbn_number)[source]#
Amend ISBN registration.
Amend Wikidata by registering the ISBN-13 data via P212, depending on the data obtained from the digital library.
- Parameters:
isbn_number (str) – ISBN number (10 or 13 digits with optional hyphens)
- Return type:
None
- scripts.create_isbn_edition.get_item_list(item_name, instance_id)[source]#
Get list of items by name, belonging to an instance (list).
- Parameters:
item_name (str) – Item name (case sensitive)
instance_id – Instance ID (string, set, or list)
- Returns:
Set of items (Q-numbers)
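Conceptually this is a label search filtered by instance of (P31); a rough sketch using pywikibot's entity search (an illustration, not the script's actual implementation):

import pywikibot

def get_item_list(item_name, instance_ids):
    # Collect Q-numbers whose label matches item_name and whose
    # P31 targets one of instance_ids.
    repo = pywikibot.Site('wikidata', 'wikidata')
    result = set()
    for entry in repo.search_entities(item_name, 'en'):
        item = pywikibot.ItemPage(repo, entry['id'])
        item.get()
        for claim in item.claims.get('P31', []):
            if claim.getTarget().getID() in instance_ids:
                result.add(entry['id'])
    return result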
- scripts.create_isbn_edition.is_in_list(statement_list, checklist)[source]#
Verify if statement list contains at least one item from the checklist.
- Parameters:
statement_list – Statement list
checklist (list[str]) – List of values
- Returns:
True when a match is found
- Return type:
bool
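The underlying check is essentially a containment test over claim targets; a minimal sketch of the idea, assuming item-valued claims (the script's actual implementation may differ):

def is_in_list(statement_list, checklist):
    # True when any claim targets an item whose Q-number is
    # listed in checklist.
    return any(claim.getTarget().getID() in checklist
               for claim in statement_list)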
dataextend script#
Script to add properties, identifiers and sources to Wikibase items
Usage:
dataextend <item> [<property>[+*]] [args]
In the basic usage, where no property is specified, item is the Q-number of the item to work on.
If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
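For example (Q-number and property are placeholders; P214 is the VIAF identifier):

python pwb.py dataextend Q42

python pwb.py dataextend Q42 P214+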
The following parameters are supported:
-always If this is supplied, the bot will not ask for permission
after each external link has been handled.
-showonly Only show claims for a given ItemPage. Don't try to add any
properties
The bot will load the corresponding pages for these identifiers, and try to find the meaning of that string for the specified type of thing (for example ‘city’ or ‘gender’). If you want to use it, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’, which means that it will be skipped in each subsequent run as well.
After an identifier has been worked on, there might be a list of names that has been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N) or ask to get asked for each name separately (answer S), the latter being the default if you do not fill in anything.
After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers, text is shown that usually gives parts of the description that are hard to parse automatically, so you can see whether there are any additional pieces of data that can be added.
It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or cases where a more precise and less precise value have both been included.
Added in version 7.2.
- class scripts.dataextend.AKLAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AbartAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AcademiaeGroninganaeAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AcademicTreeAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AcademieFrancaiseAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AcademieRouenAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AccademiaCruscaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AdultFilmAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AgorhaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AinmAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AlkindiAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AlvinAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AmericanArtAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AmericanBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.Analyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
object
- SCRIPTRE = re.compile('(?s)<script.*?</script>', re.DOTALL)#
- TAGRE = re.compile('<[^<>]*>')#
- property alturl#
- property extraurls: list[str]#
- property url#
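These two patterns reduce fetched HTML to plain text; a minimal sketch of how such patterns are typically combined (not necessarily the class's actual method):

import re

SCRIPTRE = re.compile('(?s)<script.*?</script>', re.DOTALL)
TAGRE = re.compile('<[^<>]*>')

def strip_html(html):
    # Drop <script> blocks first so their contents do not leak
    # into the text, then remove all remaining tags.
    return TAGRE.sub('', SCRIPTRE.sub('', html))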
- class scripts.dataextend.AngelicumAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AnimeConsAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArchivesDuSpectacleAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArmbAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArtHistoriansAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArtUkAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArtcyclopediaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArticArtistAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArtistsCanadaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.ArtnetAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AthenaeumAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AustrianBiographicalAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AuteursLuxembourgAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.AutoresArAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BabelioAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BacklinkAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BandcampAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BdelAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BdfaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BedethequeAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BelgianPhotographerAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BenezitAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BenezitUrlAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
UrlAnalyzer
- class scripts.dataextend.BewebAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BibliotecaNacionalAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
UrlAnalyzer
- class scripts.dataextend.BibsysAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BiografischPortaalAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BiuSanteAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BnaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BnbAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BneAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BnfAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BookTradeAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BritishExecutionsAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BritishMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.BrooklynMuseumAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
UrlAnalyzer
- class scripts.dataextend.CageMatchAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CanadianBiographyAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CanticAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
MarcAnalyzer
- class scripts.dataextend.CbdbAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CcedAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CerlAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CesarAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.Chess365Analyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CinemagiaAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer
- class scripts.dataextend.CiniiAnalyzer(ident, data=None, item=None, bot=None)[source]#
Bases:
Analyzer