Scripts package#

The scripts folder contains predefined, easy-to-use scripts.

Scripts are only available with Pywikibot if it is installed in directory mode and not as a site package. They can be run from the command line using the pwb wrapper script:

python pwb.py <global options> <name_of_script> <options>

Every script provides a -help option which shows all available options, their explanations and usage examples. Global options are shown by -help:global or by running:

python pwb.py -help
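When a script is launched from another Python program, the invocation pattern above can be assembled as an argument list. The following is a minimal sketch under stated assumptions: the helper name build_pwb_command is hypothetical and not part of Pywikibot, and pwb.py is assumed to be in the current working directory.

```python
import sys
from typing import List, Sequence


def build_pwb_command(
    script: str,
    global_options: Sequence[str] = (),
    script_options: Sequence[str] = (),
    wrapper: str = "pwb.py",  # assumption: wrapper is in the current directory
) -> List[str]:
    """Return an argument list in the documented order.

    python pwb.py <global options> <name_of_script> <options>
    """
    return [sys.executable, wrapper, *global_options, script, *script_options]


if __name__ == "__main__":
    # Build the command that shows the help text of the listpages script.
    # The list can be handed to subprocess.run() to actually execute it.
    print(build_pwb_command("listpages", script_options=["-help"]))
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems when an option value contains spaces.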

The advantages of pwb.py wrapper script are:

  • check for framework and script dependencies and show a warning if a package is missing or outdated or if the Python release does not fit

  • check whether user config file (user-config.py) is available and ask to create it by starting the generate_user_files.py script

  • enable global options even if a script does not support them

  • start private scripts located in userscripts sub-folder

  • find a script even if the given script name does not match a filename, e.g. due to a spelling mistake
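The last point, tolerating a misspelled script name, can be illustrated with the standard library's difflib module. This is only a sketch of the idea and not the lookup code used by pwb.py; the function name suggest_script, the shortened candidate list and the cutoff value are assumptions made for the example.

```python
import difflib
from typing import Optional, Sequence

# A few script names from the scripts package (without the .py suffix).
KNOWN_SCRIPTS = (
    "add_text", "archivebot", "category", "listpages",
    "movepages", "redirect", "replace", "touch", "upload",
)


def suggest_script(
    name: str,
    known: Sequence[str] = KNOWN_SCRIPTS,
    cutoff: float = 0.7,  # assumption: minimum similarity to accept a match
) -> Optional[str]:
    """Return the closest known script name, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(suggest_script("listpage"))
```

A higher cutoff makes the lookup stricter, so an unrelated name returns None instead of a misleading suggestion.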

Wiki etiquette should be followed when running a script on any wiki.

To get started on proper usage of the bot framework, refer to Manual:Pywikibot.

The contents of the package#

Bots and scripts#

Bots and Scripts

add_text.py

Adds text at the top or end of pages.

archivebot.py

Archives discussion threads.

basic.py

Is a template from which simple bots can be made.

blockpageschecker.py

Deletes any protection templates that are on pages which aren’t actually protected.

category.py

Add a category link to all pages mentioned on a page, change or remove category tags.

category_graph.py

Visualizes the category hierarchy.

category_redirect.py

Maintain category redirects and replace links to redirected categories.

change_pagelang.py

Changes the content language of pages.

checkimages.py

Check recently uploaded files. Checks if a file description is present and if there are other problems in the image’s description.

claimit.py

Adds claims to Wikidata items based on categories.

clean_sandbox.py

This bot resets a sandbox with predefined text.

commonscat.py

Adds {{commonscat}} to Wikipedia categories (or articles), if other language Wikipedia already has such a template.

commons_information.py

Insert a language template into the description field.

coordinate_import.py

Coordinate importing script.

cosmetic_changes.py

Can do slight modifications to a wiki page source code such that the code looks cleaner.

data_ingestion.py

A generic bot to do batch uploading to Commons.

delete.py

This script can be used to delete pages en masse.

delinker.py

Unlink file references of deleted images.

djvutext.py

Extracts OCR text from djvu files and uploads onto pages in the “Page” namespace on Wikisource.

download_dump.py

Downloads dumps from dumps.wikimedia.org.

fixing_redirects.py

Correct all redirect links of processed pages.

harvest_template.py

Template harvesting script.

illustrate_wikidata.py

Bot to add images to Wikidata items.

image.py

Script to replace transclusions of files.

imagetransfer.py

Given a wiki page, check the interwiki links for images, and let the user choose among them for images to upload.

interwiki.py

A robot to check interwiki links on all pages (or a range of pages) of a wiki.

interwikidata.py

Script to handle interwiki links based on Wikibase.

listpages.py

Print a list of pages, defined by a page generator.

misspelling.py

Similar to solve_disambiguation.py. It is supposed to fix links that contain common spelling mistakes.

movepages.py

Bot that can move pages to another title.

newitem.py

Script creates new items on Wikidata based on criteria.

noreferences.py

Searches for pages where <references /> is missing although a <ref> tag is present, and in that case adds a new references section.

nowcommons.py

This bot can delete images with NowCommons template.

pagefromfile.py

This bot takes its input from a file that contains a number of pages to be put on the wiki.

parser_function_count.py

Find expensive templates that are candidates for conversion to Lua.

patrol.py

Obtains a list of pages and marks the edits as patrolled based on a whitelist.

protect.py

Protect and unprotect pages en masse.

redirect.py

Fix double redirects and broken redirects. Note: solve_disambiguation also has functions which treat redirects.

reflinks.py

Search for references which are only made of a link without title and fetch the html title from the link to use it as the title of the wiki link in the reference.

replace.py

Search articles for a text and replace it with another text. Both texts are set in two configurable text files. The bot can either work on a set of given pages or crawl an SQL dump.

replicate_wiki.py

Replicates pages in a wiki to a second wiki within the same family.

revertbot.py

Script that can be used for reverting certain edits.

solve_disambiguation.py

Interactive robot doing disambiguation.

speedy_delete.py

Help sysops to quickly check and/or delete pages listed for speedy deletion.

template.py

Change one template (that is {{…}}) into another.

templatecount.py

Display the list of pages transcluding a given list of templates.

touch.py

Bot goes over all pages of the home wiki, and edits them without changes.

transferbot.py

Transfers pages from a source wiki to a target wiki.

transwikiimport.py

Transfers pages from a source wiki to a target wiki including edit history using API:Import.

unusedfiles.py

Bot appends some text to all unused images and other text to the respective uploaders.

unlink.py

This bot unlinks a page on every page that links to it.

upload.py

Upload an image to a wiki.

watchlist.py

Allows access to the account’s watchlist.

weblinkchecker.py

Check if external links are still working.

welcome.py

Script to welcome new users.
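As an illustration of the kind of check described for noreferences.py in the list above, the sketch below tests wikitext for a <ref> tag that has no accompanying <references /> tag. It is a simplified assumption of the idea using plain regular expressions; the real script is more thorough and is not reproduced here. The function name needs_references_section is hypothetical.

```python
import re

# <ref> or <ref name="...">; the \b keeps <references> from matching here.
REF_TAG = re.compile(r"<ref\b[^>]*>", re.IGNORECASE)
# <references />, <references/> or an opening <references> tag.
REFERENCES_TAG = re.compile(r"<references\b[^>]*>", re.IGNORECASE)


def needs_references_section(wikitext: str) -> bool:
    """Return True if a <ref> tag is present but <references /> is missing."""
    has_ref = bool(REF_TAG.search(wikitext))
    has_references = bool(REFERENCES_TAG.search(wikitext))
    return has_ref and not has_references


if __name__ == "__main__":
    sample = "Water boils at 100 °C.<ref>Some source</ref>"
    print(needs_references_section(sample))
```

A page for which this returns True would be a candidate for getting a new references section appended.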

Maintenance#

maintenance

Framework helper scripts

addwikis.py

Script to add wikis to a family file.

cache.py

Script for showing and deleting API cache.

colors.py

Utility to show pywikibot colors.

make_i18n_dict.py

Generate an i18n file from a given script.

unidata.py

Updates _first_upper_exception_dict in pywikibot.tools._unidata.

External packages could be required with Pywikibot:

The pwb.py wrapper script informs about the requirement and how to install it.

Script descriptions#

Scripts changes#

Scripts Changelog#

10.0.0#

  • Require Python 3.8 or higher

  • Require Pywikibot 10.0.0

dataextend#
  • Script was removed from repository

9.6.1#

  • Require Pywikibot 9.6.1 (T358635)

  • i18n updates

9.6.0#

  • i18n updates

create_isbn_edition#
dataextend#
  • The script is deprecated and will be removed from script package with Pywikibot 10.

replace#
  • Strip newlines from pairsfile lines (T378647)

weblinkchecker#
  • Remove unused error parameter from History.log() method (T380693)

9.5.0#

  • i18n updates

interwiki#
  • Remove -repository option which was never implemented in core

9.4.1#

  • import scripts from pywikibot-scripts if site-package is installed (T377056)

  • i18n updates

9.4.0#

delinker#
  • Use difflib.get_close_matches() to find the closest image match

  • Add -category option to work from given category and look for the latest file deletion first (T372206)

  • Check whether image exists first (T372106)

unusedfiles#
  • L10N updates

  • flow support was dropped, it never worked (T372477)

9.3.1#

9.3.0#

delinker#
  • Ignore file extension check (T352237)

fixing_redirects#
  • Ignore SectionError in fixing_redirects.py script (T370295)

interwiki#
  • -wiktionary option was removed

redirect#
  • Show the current redirect target with redirect summary (T254839)

9.2.0#

addwikis#
  • This maintenance script was added to add wikis to the Family.codes set

commons_information#
  • Do not remove valid description parts of Information template (T364640)

  • Use transclusions of Information template as default generator

  • Preload pages to make the script up to 10 times faster

illustrate_wikidata#
  • -always option is supported

interwikidata#
  • Do not create an option named None (T366409)

noreferences#
  • L10N updates

9.1.0#

colors#
noreferences#
  • L10N updates

  • Show an error message and leave if script is not localized (T362103)

replace#
  • Permit strings as exceptions for fixes

  • Do not apply replacements multiple times (T363047)

  • Respect ‘text-contains’ from fixes dictionaries (T142324)

9.0.0#

category_graph#
  • Check for -from option first (T354162)

  • Validate file path input (T346417)

category_redirect#
interwiki#
touch#
  • Use site.ratelimit for bulk purge in PurgeBot

8.5.0#

category_graph#
  • Change category output string to format string (T348709)

commonscat#
  • Fix skip page template parameter check (T106952)

8.4.0#

  • L10N for several scripts

category_graph#
  • Wrap DOT-string in curly braces (T346007)

checkimages#
newitem#
  • Enable -touch in newitem script for confirmed user (T343877)

maintenance#
  • new script unidata to update _first_upper_exception_dict of pywikibot.tools._unidata.

8.3.0#

patrol#

8.2.0#

archivebot#
  • KeyboardInterrupt was enabled for -async option

listpages#
  • -tofile option was added to save list to a file

noreferences#
replicate_wiki#
transwikiimport#

8.1.0#

archivebot#
  • Processing speed was improved and is up to 20 times faster, 2-3 times on average

redirect#
  • Use Bot: prefixed summary (T161459)

  • Fix -namespace usage if RedirectGenerator is used (T331243)

8.0.2#

clean_sandbox#
  • L10N for es-wikis

8.0.1#

clean_sandbox#
  • L10N for several wikis

touch#
  • Login first when starting the script (T328204)

8.0.0#

blockpageschecker#
  • Fix neutral additive element

category#
  • Enable pagegenerators options with move and remove actions (T318239)

category_graph#
  • category_graph script was added which creates category graph in formats dot, svg and html5

clean_sandbox#
  • L10N updates

  • A -textfile option was added to fetch the text from a file

create_isbn_edition#
  • Fix argument parsing

fixing_redirects#
  • Skip invalid link titles (T324434)

interwiki#

  • Fix string concatenation (T322180)

touch#

  • Provide bulk purge to run up to 1000 times faster

7.7.0#

archivebot#
  • Process pages in parallel tasks with -async option (T57899)

  • Add -sort option to sort archives by (latest) timestamp

  • Archive unsigned threads using timestamp of the next thread (T69663, T182685)

category_redirect#
  • Use localized template prefix (T318049)

create_isbn_edition#
  • New script to load ISBN related data into Wikidata (T314942)

watchlist#
  • Watchlist is retrieved faster in parallel tasks (T57899)

  • Enable watchlist.refresh_all for API generator access (T316359)

7.6.0#

21 August 2022

archivebot#
  • Use User:MiszaBot/config as default template

  • Raise MalformedConfigError if ‘maxarchivesize’ is 0 (T313886)

  • Preserve thread order in archive even if threads are archived later (T312773, T314560)

  • Skip the page if it does not exist

  • Fix for DiscussionPage.size() (T313886)

  • Decrease memory usage and improve processing speed

interwiki#
  • Fix wrong Subject property

pagefromfile#
  • Derive PageFromFileReader from tools.collections.GeneratorWrapper

7.5.2#

26 July 2022

archivebot#

7.5.1#

24 July 2022

archivebot#
  • Replace archive pattern fields to string conversion (T313692)

7.5.0#

22 July 2022

harvest_template#
  • Support harvesting time values (T66503)

  • Do not rely on self.current_page.site

  • Add -inverse option for inverse claims (T173238)

  • Only follow redirects in harvest_template.py if no wikibase item exists (T311883)

7.4.0#

26 June 2022

addtext#
  • Fix for -createonly option (T311173)

harvest_template#
  • Add -confirm option which sets ‘always’ option to False (T310356)

  • Do not show a warning if generator is specified later (T310418)

interwiki#
  • Fix regression where interwiki script removes all interwiki links (T310964)

  • Assign compareLanguages to be reused and fix process_limit_two call (T310908)

listpages#
  • Print the page list immediately unless pages are preloaded

nowcommons#

7.3.0#

21 May 2022

general#
  • Call ExistingPageBot.skip_page() first (T86491)

delete#
  • Count deleted pages and other actions (T212040)

replace#
  • A -nopreload option was added

weblinkchecker#
  • Throttle connections to the same host (T152350)

  • Do not kill threads after generator is exhausted (T113139)

  • Use Page.extlinks() to get external links (T60812)

update_script#
  • update_script script was removed

7.2.1#

07 May 2022

movepages#
  • Fix regression of option parsing (T307826)

7.2.0#

26 April 2022

general#
  • Archived scripts were removed

archive#
checkimages#
  • Use page_from_repository() method to read categories from wikibase

  • Use itertools.zip_longest to find the most important image

dataextend#
  • A -showonly option was added to only show claims of an ItemPage

  • This new script was added. It is able to add properties, identifiers and sources to WikiBase items

delinker#
  • New delinker script was added; it replaces compat’s CommonsDelinker (T299563)

image#
reflinks#
replace#
  • A -quiet option was added to omit message when no change was made

7.1.1#

15 April 2022

replace#
  • Fix regression of XmlDumpPageGenerator

7.1.0#

26 March 2022

fixing_redirects#
  • -always option was enabled

reflinks#
  • Solve UnicodeDecodeError in ReferencesRobot.treat() (T304288)

  • Decode pdfinfo if it is bytes content (T303731)

7.0.0#

26 February 2022

general#
  • L10N updates

  • Provide ConfigParserBot for several scripts (T223778)

add_text#
  • Provide -create and -createonly options (T291354)

  • Deprecated function get_text() was removed in favour of Page.text and BaseBot.skip_page()

  • Deprecated function put_text() was removed in favour of BaseBot.userPut() method

  • Deprecated function add_text() was removed in favour of textlib.add_text()

blockpageschecker#
  • Use different edit comments when adding, changing or removing templates (T291345)

  • Derive CheckerBot from ConfigParserBot (T57106)

  • Derive CheckerBot from CurrentPageBot (T196851, T171713)

category#
  • CleanBot was added which can be invoked by clean action option

  • Recurse CategoryListifyRobot with depth

  • Show a warning if a pagegenerator option is not enabled (T298522)

  • Deprecated code parts were removed

checkimages#
  • Skip PageSaveRelatedError and ServerError when putting talk page (T302174)

commonscat#
  • Ignore InvalidTitleError in CommonscatBot.findCommonscatLink (T291783)

cosmetic_changes#
  • Ignore InvalidTitleError in CosmeticChangesBot.treat_page (T293612)

djvutext#
  • pass site arg only once (T292367)

fixing_redirects#
  • Let only put_current show the message “No changes were needed”

  • Use concurrent.futures to retrieve redirect or moved targets (T298789)

  • Add an option to ignore solving moved targets (T298789)

imagetransfer#
  • Add support for chunked uploading (T300531)

newitem#
  • Do not pass OtherPageSaveRelatedError silently

pagefromfile#
  • Preload pages instead of reading them one by one before putting changes

  • Don’t ask for confirmation by default (T291757)

redirect#
  • Use site.maxlimit to determine the highest limit to load (T299859)

replace#
  • Enable default behaviour with -mysqlquery (T299306)

  • Deprecated “acceptall” and “addedCat” parameters were replaced by “always” and “addcat”

revertbot#
  • Add support for translated dates/times (T102174)

  • Deprecated “max” parameter was replaced by “total”

solve_disambiguation#
  • Remove deprecated properties in favour of DisambiguationRobot.opt options

touch#

  • Do not pass OtherPageSaveRelatedError silently

unusedfiles#
  • Use oldest_file_info.user as uploader (T301768)

6.6.1#

21 September 2021

category#
  • Fix -match option

6.6.0#

15 September 2021

add_text#
  • Add -major flag to disable minor edit flag when saving

6.5.0#

05 August 2021

reflinks#
  • Don’t ignore identical references with newline in ref content (T286369)

  • L10N updates

6.4.0#

01 July 2021

general#
  • show a warning if pywikibot.__version__ is behind scripts.__version__

addtext#
  • Deprecate get_text, put_text and add_text functions (T284388)

  • Use AutomaticTWSummaryBot and NoRedirectPageBot bot class instead of functions (T196851)

blockpageschecker#
  • Script was unarchived

commonscat#
  • Enable multiple sites (T57083)

  • Use new textlib.add_text function

cosmetic_changes#
  • set -ignore option to CANCEL.MATCH by default (T108446)

fixing_redirects#
imagetransfer#
  • Skip pages which do not exist on source site (T284414)

  • Use roundrobin_generators to combine multiple template inclusions

  • Allow images existing in the shared repo (T267535)

template#
  • Do not try to initialize generator twice in TemplateRobot (T284534)

update_script#
  • compat2core script was restored and renamed to update_script

version#
  • Show all mandatory dependencies

6.3.0#

31 May 2021

addtext#
  • -except option was removed in favour of commonly used -grepnot

archivebot#
  • Durations must have a time unit

6.2.0#

28 May 2021

general#
  • image.py was restored

  • nowcommons.py was restored

  • i18n updates

  • L10N updates

category#
  • dry parameter of CategoryAddBot will be removed

commonscat#
  • Ignore InvalidTitleError (T267742)

  • exit checkCommonscatLink method if target name is empty (T282693)

fixing_redirects#
  • ValueError will be ignored (T283403, T111513)

  • InterwikiRedirectPageError will be ignored (T137754)

  • InvalidPageError will be ignored (T280043)

reflinks#
  • Use consecutive reference numbers for autogenerated links

replace#
  • InvalidPageError will be ignored (T280043)

upload#
  • Support async chunked uploads (T129216)

6.1.0#

17 April 2021

general#
  • commonscat.py was restored

  • compat2core.py script was archived

  • djvutext.py was restored

  • interwiki.py was restored

  • patrol.py was restored

  • watchlist.py was restored

archivebot#
  • PageArchiver.maxsize must be defined before load_config() (T277547)

  • Time period must have a qualifier

imagetransfer#
  • Fix usage of -tofamily -tolang options (T279232)

misspelling#
  • Use the new DisambiguationRobot interface and options

reflinks#
  • Catch urllib3.LocationParseError and skip link (T280356)

  • L10N updates

  • Avoid duplicate reference names (T278040)

solve_disambiguation#
  • Keyword arguments are recommended if deriving the bot; opt option handler is used.

welcome#
  • Fix reporting bad account names

6.0.0#

15 March 2021

general#
  • interwikidumps.py, cfd.py and featured.py scripts were deleted (T223826)

  • Long time unused scripts were archived (T223826). Ask to recover if needed.

  • pagegenerators.handle_args() is used in several scripts

archivebot#
  • Always take ‘maxarticlesize’ into account when saving (T276937)

  • Remove deprecated parts

category#
  • add ‘namespaces’ option to category ‘listify’

commons_information#
  • New script to wrap Commons file descriptions in language templates

generate_family_file#
  • Ignore ssl certificate validation (T265210)

login#
  • update help string

maintenance#
  • Add a preload_sites.py script to preload site information (T226157)

reflinks#
  • Force pdf file to be closed (T276747)

  • Fix http.fetch response data attribute

  • Fix treat process flow

replace#
  • Add replacement description to -summary message

replicate_wiki#
  • replace pages in all sites (T275291)

solve_disambiguation#
  • Deprecated methods were removed

  • Positional arguments of DisambiguationRobot are deprecated, also some keywords were replaced

unusedfiles#
  • Update unusedfiles.py to add custom templates

5.6.0#

24 January 2021

general#
  • pagegenerators handleArg was renamed to handle_arg (T271437)

  • i18n updates

add_text#
  • bugfix: str.join() expects an iterable not multiple args (T272223)

redirect#
  • pagegenerators -page option was implemented (T100643)

  • pagegenerators namespace filter was implemented (T234133, T271116)

weblinkchecker#

  • Deprecated LinkChecker class was removed

5.5.0#

12 January 2021

general#
  • i18n updates

  • L10N updates

add_text#
  • -except option was renamed to -grepnot from pagegenerators

solve_disambiguation#
  • ignore ValueError when parsing a Link object (T111513)

5.4.0#

2 January 2021

general#
  • i18n updates

replace#
  • Desupported ReplaceRobot.doReplacements method was removed

5.3.0#

19 December 2020

data_ingestion#
  • Remove deprecated Photo.reader property and Photo.doSingle() method

replicate_wiki#
  • Remove deprecated namespace function

template#
  • remove deprecated XmlDumpTemplatePageGenerator

5.2.0#

10 December 2020

general#
  • Removed unsupported BadTitle Exception (T267768)

  • Replaced PageNotSaved by PageSaveRelatedError (T267821)

  • Update scripts to support Python 3.5+ only

  • i18n updates

  • L10N updates

basic#
  • Make BasicBot example a ConfigParserBot to explain the usage

clean_sandbox#
fixing_redirects#
  • Ignore RuntimeError for missing ‘redirects’ in api response (T267567)

imagetransfer#
  • Implement -tosite command and other improvements

  • Do not use UploadRobot.run() with imagetransfer (T267579)

interwiki#
  • Use textfile for interwiki dumps and enable -restore:all option (T74943, T213624)

makecat#
  • Use input_choice for options

  • New option handling

  • Other improvements

revertbot#
  • Take rollbacktoken to revert (T250509)

solve_disambiguation#
  • Write ignoring pages as a whole

touch#
  • Fix available_options and purge options (T268394)

weblinkchecker#
  • Fix AttributeError of HttpRequest (T269821)

5.1.0#

1 November 2020

general#
  • i18n updates

  • switch to new OptionHandler interface (T264721)

change_pagelang#
  • New script was added

download_dump#
  • Make dumpdate param work when using the script in Toolforge (T266630)

imagetransfer#
  • Remove outdated “followRedirects” parameter from imagelinks(); treat instead of run method (T266867, T196851, T171713)

interwiki#
  • Replace deprecated originPage by origin in Subjects

misspelling#
  • Enable misspelling.py for several sites using wikidata (T258859, T94681)

noreferences#
  • Rename NoReferencesBot.run to treat (T196851, T171713)

  • Use wikidata item instead of dropped MediaWiki message for default category (T266413)

reflinks#
  • Derive ReferencesRobot from ExistingPageBot and NoRedirectPageBot

  • Use chardet to find a valid encoding (T266862)

  • Rename ReferencesRobot.run to treat (T196851, T171713)

  • Ignore duplication replacements inside templates (T266411)

  • Fix edit summary (T265968)

  • Add Server414Error in and close file after reading (T266000)

  • Call ReferencesRobot.setup() (T265928)

welcome#
  • Replace _COLORS and _MSGS dicts by Enum

5.0.0#

19 October 2020

general#
  • i18n updates

  • L10N updates

  • Remove deprecated use of fileUrl

  • Remove ArgumentDeprecationWarning for several scripts

casechecker#
  • Split initializer and put getting whitelist to its own method

checkimages#
  • Re-enable -sleep parameter (T264521)

commonscat#
  • get commons category from wikibase (T175207)

  • Adjust save counter (T262772)

flickrripper#
  • Improve option handling

imagecopy_self#
  • Improvements were made

imagetransfer#
  • Do not encode str to bytes (T265257)

match_images#
  • Improvements

parser_function_count#

  • Port parser_function_count.py from compat to core/scripts (T66878)

reflinks#

  • Decode bytes-like object meta_content.group() (T264575)

speedy_delete#
  • port speedy_delete.py to core (T66880)

weblinkchecker#
  • Use ThreadList with weblinkchecker

maintenance#
  • new maintenance script sorting_order was added

  • new maintenance script update_linktrails was added

4.3.0#

2 September 2020

general#
  • i18n updates

4.2.0#

28 August 2020

general#
  • i18n updates

archivebot#
  • Determine whether counter matters only once

4.1.1#

18 August 2020

general#
  • Add missing commas in string constants

4.1.0#

16 August 2020

general#
  • i18n updates

download_dump#

replace#

  • Show a FutureWarning for deprecated doReplacements method

replicate_wiki#

  • Show a FutureWarning for deprecated namespace function

template#

  • Show a FutureWarning for deprecated XmlDumpTemplatePageGenerator class

4.0.0#

4 August 2020

general#
  • Remove Python 2 related code (T257399)

  • i18n updates

  • L10N updates

archivebot#
  • Only mention archives where something was really archived

  • Reset counter when “era” changes (T215247)

  • Code improvements and cleanups

  • Fix ShouldArchive type

  • Refactor PageArchiver’s main loop

  • Move archiving logic to PageArchiver

  • Fix str2size to allow space separators

cfd#
  • Script was archived and is no longer supported (T223826)

delete#
  • Use Dict in place of DefaultDict (T257770)