Wikibase scripts
claimit script
A script that adds claims to Wikidata items based on a list of pages
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
Usage:
python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456
You can use any typical page generator (such as categories) to provide a list of pages. Then list the property/target pairs to add.
For geographic coordinates:
python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]
[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.
Example
python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001
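The coordinate argument can be illustrated with a small parser. This is a minimal sketch, not claimit's own code: `CoordinateArg`, `parse_coordinate_arg` and `dms_to_decimal` are hypothetical names, and the latitude/longitude range check is an assumption added by the sketch. It also shows how a position given in DMS is converted to the decimal degrees that claimit expects.

```python
from typing import NamedTuple

DEFAULT_PRECISION = 0.0001  # default precision (degrees) documented for P625


class CoordinateArg(NamedTuple):
    lat: float
    lon: float
    precision: float


def parse_coordinate_arg(arg: str) -> CoordinateArg:
    """Parse '[lat-dec],[long-dec][,prec]' (decimal degrees) into a tuple."""
    parts = [part.strip() for part in arg.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected lat,long[,prec], got {arg!r}")
    lat, lon = float(parts[0]), float(parts[1])
    precision = float(parts[2]) if len(parts) == 3 else DEFAULT_PRECISION
    # Range check added by this sketch; not documented claimit behavior.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinate out of range: {arg!r}")
    return CoordinateArg(lat, lon, precision)


def dms_to_decimal(degrees: int, minutes: int, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees (S, W < 0)."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# 23°23'56.76"S 52°05'27.6"W written as claimit's decimal argument
lat = dms_to_decimal(23, 23, 56.76, "S")
lon = dms_to_decimal(52, 5, 27.6, "W")
print(f"P625 {lat:.4f},{lon:.4f},0.0001")  # P625 -23.3991,-52.0910,0.0001
```

Run as a script, it prints the same P625 argument as the example command.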
By default, claimit.py does not add a claim if one with the same property already exists on the item. To override this behavior, use the ‘exists’ option:
python pwb.py claimit [pagegenerators] P246 "string example" -exists:p
Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. In that case, claimit.py will still not add the claim if it has the same target or sources, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.
For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:
python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq
Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.
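How the ‘exists’ letters combine can be illustrated with a simplified model. This is a hypothetical sketch of the rules described above, not claimit.py's implementation: `parse_exists_arg` and `may_add` are invented names, it follows the reading that with ‘p’ only a claim with the same target can still block the new one, and the source (‘s’) and qualifier (‘q’) checks are deliberately left out.

```python
from typing import FrozenSet, Iterable

ALLOWED_LETTERS = frozenset("ptsq")


def parse_exists_arg(value: str) -> FrozenSet[str]:
    """Turn an -exists value such as 'qtp' into a set; letter order is irrelevant."""
    letters = frozenset(value.lower())
    unknown = letters - ALLOWED_LETTERS
    if unknown:
        raise ValueError(f"unknown -exists letters: {''.join(sorted(unknown))}")
    if letters and "p" not in letters:
        raise ValueError("'p' must be included in the -exists argument")
    return letters


def may_add(new_target: str, existing_targets: Iterable[str],
            letters: FrozenSet[str]) -> bool:
    """Property/target part of the rules only; 's' and 'q' are not modelled."""
    for target in existing_targets:  # targets of claims with the same property
        if "p" not in letters:
            return False  # default: any claim with this property blocks
        if target == new_target and "t" not in letters:
            return False  # same property and same target
    return True


print(may_add("Q5", ["Q6"], parse_exists_arg("")))    # False
print(may_add("Q5", ["Q6"], parse_exists_arg("p")))   # True
print(may_add("Q5", ["Q5"], parse_exists_arg("p")))   # False
print(may_add("Q5", ["Q5"], parse_exists_arg("tp")))  # True
```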
create_isbn_edition script
Pywikibot script to load ISBN related data into Wikidata
Pywikibot script to get ISBN data from a digital library and create or amend the related Wikidata edition item (with P212 = ISBN number as the unique external ID).
Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.
The resulting item number can then be used, e.g., to generate Wikipedia references using the Cite_Q template.
- Parameters (all optional):
P1: digital library (default goob, selected with “-”)
bnf: Catalogue General (France)
bol: Bol.com
dnb: Deutsche Nationalbibliothek (German National Library)
goob: Google Books
kb: National Library of the Netherlands
loc: Library of Congress US
mcues: Ministerio de Cultura (Spain)
openl: OpenLibrary.org
porbase: urn.porbase.org Portugal
sbn: Servizio Bibliotecario Nazionale
wiki: wikipedia.org
worldcat: WorldCat
- P2: ISO 639-1 language code
Default: the LANG environment variable; e.g. en, nl, fr, de, es, it.
- P3 P4…: P/Q pairs to add additional claims (can be repeated)
e.g. P921 Q107643461 (main subject: database management linked to P2163 Fast ID)
- Standard input (stdin):
ISBN numbers (International Standard Book Number)
Free text (e.g. a Wikipedia reference list or a publication list) is accepted. ISBNs are identified with a regular expression.
- Functionality:
The ISBN number is used as a primary key (P212, where no duplicates are allowed). The item is not updated when there is no unique match.
Statements are added or merged incrementally; existing data is not overwritten.
Authors and publishers are searched to get their item number (ambiguous items are skipped).
Book title and subtitle are separated with ‘.’, ‘:’, or ‘-’.
This script can be run incrementally with the same parameters. Caveat: take into account the Wikidata Query database replication delay. Wait at least 5 minutes to avoid creating duplicate objects.
- Data quality:
Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.
The following properties should only be used for written works:
P5331: OCLC work ID (editions should only have P243)
P8383: Goodreads work ID (editions should only have P2969)
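Extracting ISBNs from free text on stdin can be sketched as a regular expression followed by checksum validation. This is a minimal sketch under stated assumptions: the pattern and the helpers `canonical`, `is_isbn10`, `is_isbn13` and `extract_isbns` are hypothetical, not the script's own code. In practice the mandatory isbnlib package provides comparable helpers (for example `get_isbnlike` and `canonical`).

```python
import re
from typing import List

# Hypothetical pattern: optional 978/979 prefix, nine digits, then a check
# character, with optional hyphens or spaces between the digits.
ISBN_LIKE = re.compile(r"\b(?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx]\b")


def canonical(isbn_like: str) -> str:
    """Drop separators and upper-case a trailing 'x'."""
    return re.sub(r"[-\s]", "", isbn_like).upper()


def is_isbn10(isbn: str) -> bool:
    """Weights 10..1 over ten characters ('X' = 10); sum divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c))
                for i, c in enumerate(isbn))
    return total % 11 == 0


def is_isbn13(isbn: str) -> bool:
    """Alternating weights 1 and 3 over 13 digits; sum divisible by 10."""
    if not re.fullmatch(r"97[89]\d{10}", isbn):
        return False
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn))
    return total % 10 == 0


def extract_isbns(text: str) -> List[str]:
    """Return valid ISBNs found in free text, in order, without duplicates."""
    found: List[str] = []
    for match in ISBN_LIKE.finditer(text):
        isbn = canonical(match.group())
        if (is_isbn10(isbn) or is_isbn13(isbn)) and isbn not in found:
            found.append(isbn)
    return found


refs = "Reference A, ISBN 978-0-596-10089-6. Reference B: 9789042925564."
print(extract_isbns(refs))  # ['9780596100896', '9789042925564']
```

Matches that fail the checksum (for example a mistyped last digit) are silently dropped rather than passed on to the digital library.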
Examples
Default library (Google Books), language (LANG), no additional statements:
pwb create_isbn_edition.py 9789042925564
Wikipedia library (wiki), language English (en), main subject: database management:
pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6
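The positional parameters P1, P2 and the P/Q pairs could be parsed and validated as in the sketch below, using the parameters of the second example. It is hypothetical: `parse_params` is an invented name, ISBNs are left to the stdin handling described earlier, and cutting the LANG default down to its first two letters (en_US.UTF-8 to en) is an assumption.

```python
import os
import re
from typing import List, Mapping, Optional, Sequence, Tuple

LIBRARIES = frozenset({"bnf", "bol", "dnb", "goob", "kb", "loc", "mcues",
                       "openl", "porbase", "sbn", "wiki", "worldcat"})
PROPERTY = re.compile(r"P\d+")
ITEM = re.compile(r"Q\d+")
LANGUAGE = re.compile(r"[a-z]{2}")  # ISO 639-1 codes have two letters


def parse_params(args: Sequence[str], env: Optional[Mapping[str, str]] = None
                 ) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Return (library, language, [(property, item), ...])."""
    env = os.environ if env is None else env
    rest = list(args)
    library = rest.pop(0) if rest else "-"
    if library == "-":
        library = "goob"  # documented default digital library
    if library not in LIBRARIES:
        raise ValueError(f"unknown digital library: {library!r}")
    # Assumption: the LANG default is cut to its first two letters.
    lang = rest.pop(0) if rest else env.get("LANG", "en")[:2]
    if not LANGUAGE.fullmatch(lang):
        raise ValueError(f"not an ISO 639-1 code: {lang!r}")
    if len(rest) % 2:
        raise ValueError("additional claims must be given as P/Q pairs")
    pairs = list(zip(rest[::2], rest[1::2]))
    for prop, item in pairs:
        if not (PROPERTY.fullmatch(prop) and ITEM.fullmatch(item)):
            raise ValueError(f"not a P/Q pair: {prop} {item}")
    return library, lang, pairs


print(parse_params(["wiki", "en", "P921", "Q107643461"], env={}))
# ('wiki', 'en', [('P921', 'Q107643461')])
```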
Standard ISBN properties:
P31:Q3331189: instance of edition
P50: author
P123: publisher
P212: canonical ISBN number (lookup via Wikidata Query)
P407: language of work (Qnumber linked to ISO 639-1 language code)
P577: date of publication (year)
P1476: book title
P1680: subtitle
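Splitting a full title into P1476 (title) and P1680 (subtitle) at the separators listed under Functionality can be sketched as follows. Assumptions: `split_title` is a hypothetical helper, only the first separator counts, ‘.’ or ‘:’ must be followed by whitespace, and ‘-’ must have spaces on both sides so that hyphenated words are not split. The script's actual rules may differ.

```python
import re
from typing import Optional, Tuple

# Assumed separators: '.' or ':' followed by whitespace, or ' - ' with spaces.
SEPARATOR = re.compile(r"\s*[.:]\s+|\s+-\s+")


def split_title(full_title: str) -> Tuple[str, Optional[str]]:
    """Split at the first separator into (title, subtitle); subtitle may be None."""
    parts = SEPARATOR.split(full_title.strip(), maxsplit=1)
    title = parts[0].strip()
    subtitle = parts[1].strip() if len(parts) > 1 else ""
    return title, subtitle or None


print(split_title("Database Design: A Practical Guide"))
# ('Database Design', 'A Practical Guide')
```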
Other ISBN properties:
P291: place of publication
P921: main subject (inverse lookup from external Fast ID P2163)
P629: work for edition
P747: edition of work
P1104: number of pages
Qualifiers:
P1545: (author) sequence number
External identifiers:
P213: ISNI ID
P243: OCLC ID
P496: ORCID iD
P675: Google Books ID
P1036: Dewey Decimal Classification
P2163: Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
P2969: Goodreads version/edition ID
(only for written works)
P5331: OCLC work ID (editions should only have P243)
P8383: Goodreads work ID (editions should only have P2969)
- Author:
Geert Van Pamel, 2022-08-04, GNU General Public License v3.0, User:Geertivp
- Documentation:
https://www.freecodecamp.org/news/python-json-how-to-convert-a-string-to-json/
https://buildmedia.readthedocs.org/media/pdf/isbnlib/v3.4.5/isbnlib.pdf
WikiProject Books: https://www.wikidata.org/wiki/Q21831105
https://www.wikidata.org/wiki/Wikidata:List_of_properties/work
https://www.wikidata.org/wiki/Template:Bibliographic_properties
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
https://doc.wikimedia.org/pywikibot/master/api_ref/pywikibot.html
http://www.isbn.org/standards/home/isbn/international/hyphenation-instructions.asp
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_qualifiers
https://www.wikidata.org/wiki/Wikidata:Pywikibot_-_Python_3_Tutorial/Setting_statements
- Prerequisites:
pywikibot
Install the following ISBN lib packages: https://pypi.org/search/?q=isbnlib_
pip install isbnlib (mandatory)
(optional)
pip install isbnlib-bol
pip install isbnlib-bnf
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
etc.
- Restrictions:
- Preferably pass the ISO 639-1 language code parameter as a default
The language code is not always available from the digital library.
- SPARQL queries run on a replicated database
There can be a significant replication delay; wait 5 minutes before retrying, otherwise you risk creating duplicates.
- Algorithm:
1. Get parameters
2. Validate parameters
3. Get ISBN data
4. Convert ISBN data
5. Get additional data
6. Register ISBN data into Wikidata (create or amend items or claims)
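The P212 lookup and the create/amend/skip decision described under Functionality could look like the sketch below. It assumes the ISBN is already in the hyphenated form stored in P212 (isbnlib's `mask` function can produce it) and that the query is run elsewhere, for instance with pywikibot's `SparqlQuery` class; `p212_query` and `choose_action` are hypothetical names. Because the query service is replicated, an item created a few minutes earlier may not be found yet, which is why a rerun should wait at least 5 minutes.

```python
import re
from typing import Sequence

# Hyphenated ISBN-13 as stored in P212, e.g. 978-0-596-10089-6 (five groups).
HYPHENATED_ISBN13 = re.compile(r"97[89](?:-\d+){3}-\d")


def p212_query(isbn: str) -> str:
    """Build a SPARQL query for items whose P212 equals the given ISBN-13."""
    if not HYPHENATED_ISBN13.fullmatch(isbn):
        raise ValueError(f"expected a hyphenated ISBN-13, got {isbn!r}")
    return f'SELECT ?item WHERE {{ ?item wdt:P212 "{isbn}" . }}'


def choose_action(matching_items: Sequence[str]) -> str:
    """No match: create; exactly one: amend; several: skip (merge first)."""
    if not matching_items:
        return "create"
    if len(matching_items) == 1:
        return "amend"
    return "skip"


print(p212_query("978-0-596-10089-6"))
# SELECT ?item WHERE { ?item wdt:P212 "978-0-596-10089-6" . }
```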
Environment:
The Python script can run on the following platforms:
Linux client
Google Chromebook (Linux container)
Toolforge Portal
PAWS
LANG: ISO 639-1 language code
Applications:
Generate a book reference
Example: {{Cite Q|Q63413107}} (wp.en)
See also:
https://meta.wikimedia.org/wiki/WikiCite
https://www.wikidata.org/wiki/Q21831105 (WikiCite)
https://www.wikidata.org/wiki/Q22321052 (Cite_Q)
https://www.mediawiki.org/wiki/Global_templates
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
https://phabricator.wikimedia.org/tag/wikicite/
https://meta.wikimedia.org/wiki/WikiCite/Shared_Citations
- Wikidata Query:
List of editions about musicians: https://w.wiki/5aaz
List of editions having ISBN number: https://w.wiki/5akq
- Related projects:
- Other systems:
Added in version 7.7.
dataextend script
Script to add properties, identifiers and sources to Wikibase items
Usage:
dataextend <item> [<property>[+*]] [args]
In the basic usage, where no property is specified, item is the Q-number of the item to work on.
If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
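The effect of the ‘+’ and ‘*’ suffixes on an ordered list of identifiers can be sketched as follows. `select_identifiers` and the sample P-numbers are hypothetical; the special values ‘Wiki’ and ‘Data’ are treated as ordinary entries, and the sketch ignores identifiers that dataextend adds while it is running.

```python
from typing import List


def select_identifiers(identifiers: List[str], spec: str) -> List[str]:
    """Apply a spec such as 'P227', 'P227+' or 'P227*' to an ordered list."""
    mode = spec[-1] if spec.endswith(("+", "*")) else ""
    prop = spec[:-1] if mode else spec
    if not mode:
        return [prop]                   # only this identifier
    start = identifiers.index(prop)     # ValueError if the property is absent
    if mode == "+":
        return identifiers[start:]      # the identifier and everything after it
    return identifiers[start + 1:]      # '*': only the identifiers after it


ids = ["P214", "P227", "P244", "P1006"]
print(select_identifiers(ids, "P227+"))  # ['P227', 'P244', 'P1006']
print(select_identifiers(ids, "P227*"))  # ['P244', 'P1006']
```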
The following parameters are supported:
-always If this is supplied, the bot will not ask for permission
after each external link has been handled.
-showonly Only show claims for a given ItemPage. Don't try to add any
properties
The bot will load the corresponding pages for these identifiers and try to determine the meaning of each string it finds for the specified type of thing (for example ‘city’ or ‘gender’). If it cannot, it asks you for the Q-number. If you want to use the value but not save it (which can happen if the string specifies a certain value now but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit Enter, or give the special value ‘XXX’, which means that it will be skipped in each subsequent run as well.
After an identifier has been worked on, there might be a list of names that has been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N) or ask to get asked for each name separately (answer S), the latter being the default if you do not fill in anything.
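The answers described in the two paragraphs above can be modelled as a small interpreter. This is a hypothetical sketch, not dataextend's code: `ValueAnswer`, `interpret_value_answer` and `interpret_names_answer` are invented names, and accepting lower-case input is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ValueAnswer(NamedTuple):
    action: str          # "use", "skip" or "skip_always"
    qid: Optional[str]   # item to use, if any
    save: bool           # whether the answer is remembered for later runs


def interpret_value_answer(answer: str) -> ValueAnswer:
    text = answer.strip().upper()
    if not text:
        return ValueAnswer("skip", None, False)        # plain Enter: skip now
    if text == "XXX":
        return ValueAnswer("skip_always", None, True)  # skip in later runs too
    match = re.fullmatch(r"([QX])(\d+)", text)
    if not match:
        raise ValueError(f"unexpected answer: {answer!r}")
    prefix, number = match.groups()
    # 'Q123' uses and saves the mapping; 'X123' uses Q123 without saving it.
    return ValueAnswer("use", "Q" + number, prefix == "Q")


def interpret_names_answer(answer: str) -> str:
    choice = answer.strip().upper() or "S"  # S (ask per name) is the default
    actions = {"Y": "accept_all", "N": "accept_none", "S": "ask_each"}
    if choice not in actions:
        raise ValueError(f"unexpected answer: {answer!r}")
    return actions[choice]


print(interpret_value_answer("X90"))  # ValueAnswer(action='use', qid='Q90', save=False)
```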
After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers, text is shown that usually contains parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.
It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or cases where a more precise and less precise value have both been included.
Added in version 7.2.
harvest_template script
Template harvesting script
Usage (see below for explanations and examples):
python pwb.py harvest_template -transcludes:"..." \
[default optional arguments] template_parameter PID \
[local optional arguments] \
[template_parameter PID [local optional arguments]]
python pwb.py harvest_template [generators] -template:"..." \
[default optional arguments] template_parameter PID \
[local optional arguments] \
[template_parameter PID [local optional arguments]]
This will work on all pages that transclude the template in the article namespace.
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
You can also use additional parameters:
-confirm If used, the bot will ask if it should make changes
-create Create missing items before importing.
The following command line parameters can be used to change the bot’s behavior. If you specify them before all parameters, they are global and are applied to all param-property pairs. If you specify them after a param-property pair, they are local and are only applied to this pair. If you specify the same argument as both local and global, the local argument overrides the global one (see also examples):
-islink Treat plain text values as links ("text" -> "[[text]]").
-exists If set to 'p', add a new value, even if the item already
has the imported property but not the imported value.
If set to 'pt', add a new value, even if the item already
has the imported property with the imported value and
some qualifiers.
-multi If set, try to match multiple values from parameter.
-inverse Import this property as the inverse claim.
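The global/local rule for these options can be sketched as a small parser. This is a hypothetical illustration (`pair_options` and `split_option` are invented names); it expects only the template_parameter/PID pairs and their options, so generator arguments such as -template: or -namespace: must already have been removed.

```python
from typing import Dict, List, Tuple, Union

Options = Dict[str, Union[str, bool]]


def split_option(arg: str) -> Tuple[str, Union[str, bool]]:
    """'-exists:p' -> ('exists', 'p'); '-islink' -> ('islink', True)."""
    name, _, value = arg[1:].partition(":")
    return name, value if value else True


def pair_options(args: List[str]) -> List[Tuple[str, str, Options]]:
    """Resolve each template_parameter/PID pair to its effective options."""
    global_opts: Options = {}
    pairs: List[Tuple[str, str, Options]] = []
    current = global_opts  # options before the first pair are global
    i = 0
    while i < len(args):
        if args[i].startswith("-"):
            name, value = split_option(args[i])
            current[name] = value
            i += 1
        else:
            if i + 1 >= len(args):
                raise ValueError(f"parameter {args[i]!r} has no property")
            local: Options = {}
            pairs.append((args[i], args[i + 1], local))
            current = local  # later options belong to this pair
            i += 2
    # Local options override global ones with the same name.
    return [(param, pid, {**global_opts, **local})
            for param, pid, local in pairs]


print(pair_options(["birth_place", "P19", "-islink", "death_place", "P20"]))
# [('birth_place', 'P19', {'islink': True}), ('death_place', 'P20', {})]
```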
Examples
The following command will try to import existing images from the “image” parameter of “Infobox person” on English Wikipedia as Wikidata property “P18” (image):
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
-template:"Infobox person" image P18
The following command will behave the same as the previous example and also try to import [[links]] from the “birth_place” parameter of the same template as Wikidata property “P19” (place of birth):
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
-template:"Infobox person" image P18 birth_place P19
The following command will import both the “birth_place” and “death_place” parameters with the -islink modifier, i.e. the bot will try to import values even if it doesn’t find a [[link]]:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
-template:"Infobox person" -islink birth_place P19 death_place P20
The following command will do the same but only “birth_place” can be imported without a link:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
-template:"Infobox person" birth_place P19 -islink death_place P20
The following command will import an occupation from the “occupation” parameter of “Infobox person” on English Wikipedia as Wikidata property “P106” (occupation). The page won’t be skipped if the item already has that property but not the new value:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
-template:"Infobox person" occupation P106 -exists:p
The following command will import band members from the “current_members” parameter of “Infobox musical artist” on English Wikipedia as Wikidata property “P527” (has part). This will only extract multiple band members if each is linked, and will not add duplicate claims for the same member:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 \
-template:"Infobox musical artist" current_members P527 -exists:p -multi
The following command will import the category’s main topic from the first anonymous parameter of “Cat main” on English Wikipedia as Wikidata property “P301” (category’s main topic) and whenever a new value is imported, the inverse claim is imported to the topic item as Wikidata property “P910” (topic’s main category) unless a claim of that property is already there:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:14 \
-template:"Cat main" 1 P301 -inverse:P910 -islink
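How -islink and -multi shape the values taken from a parameter can be sketched with a simplified wikilink pattern. The pattern and `link_targets` are hypothetical; harvest_template's real link handling is more thorough, and for item-valued properties it must also turn each link into a Wikidata item, which is not shown here.

```python
import re
from typing import List

# Simplified: [[Target]], [[Target|label]] or [[Target#section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_targets(value: str, islink: bool = False,
                 multi: bool = False) -> List[str]:
    """Return link targets from a template parameter value."""
    targets = [target.strip() for target in WIKILINK.findall(value)]
    if not targets and islink and value.strip():
        targets = [value.strip()]  # -islink: "text" is treated as "[[text]]"
    return targets if multi else targets[:1]


members = "[[Member One]]<br />[[Member Two|Two]]"
print(link_targets(members, multi=True))       # ['Member One', 'Member Two']
print(link_targets("Liverpool"))               # []
print(link_targets("Liverpool", islink=True))  # ['Liverpool']
```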
Note
This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
Added in version 7.5: the -inverse option.
illustrate_wikidata script
Bot to add images to Wikidata items
The image is extracted from the page_props. For this to be available the PageImages extension (https://www.mediawiki.org/wiki/Extension:PageImages) needs to be installed.
The following options are provided:
-always Don't prompt to make changes, just do them.
-property The property to add. Should be of type commonsMedia.
Usage:
python pwb.py illustrate_wikidata <some generator>
This script supports use of pagegenerators arguments.
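Reading the lead image from the page properties can be sketched as a MediaWiki API request plus response parsing. Assumptions: the request itself is sent by some other means, the property name is page_image_free (the free-image page property recorded by PageImages), and underscores are turned into spaces to get a commonsMedia file name; `pageprops_params` and `image_from_response` are hypothetical names, the sample response is made up, and checking that the file actually lives on Commons is left out.

```python
from typing import Any, Dict, Mapping, Optional


def pageprops_params(title: str) -> Dict[str, str]:
    """Query parameters asking the MediaWiki API for one page's free lead image."""
    return {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "page_image_free",  # assumed PageImages property name
        "titles": title,
        "format": "json",
        "formatversion": "2",  # pages come back as a list
    }


def image_from_response(response: Mapping[str, Any]) -> Optional[str]:
    """Return a commonsMedia-style file name, or None if no image is recorded."""
    for page in response.get("query", {}).get("pages", []):
        name = page.get("pageprops", {}).get("page_image_free")
        if name:
            return name.replace("_", " ")  # stored with underscores
    return None


sample = {"query": {"pages": [{"pageid": 1, "ns": 0, "title": "Example",
                               "pageprops": {"page_image_free": "Example_photo.jpg"}}]}}
print(image_from_response(sample))  # Example photo.jpg
```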
interwikidata script
Script to handle interwiki links based on Wikibase
This script connects pages to Wikibase items using language links on the page. If multiple language links are present, and they are connected to different items, the bot skips. After connecting the page to an item, language links can be removed from the page.
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
Furthermore, the following command line parameters are supported:
-always If used, the bot won't ask for confirmation before saving changes.
-clean Clean pages.
-create Create items.
-merge Merge items.
-summary: Use your own edit summary for cleaning the page.
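The rule that the bot skips pages whose language links point to different items can be sketched as a decision function. `choose_item` is hypothetical; creating an item when no link is connected and -create is given is an assumption, and -merge is not modelled.

```python
from typing import Mapping, Optional, Tuple


def choose_item(link_items: Mapping[str, Optional[str]],
                create: bool = False) -> Tuple[str, Optional[str]]:
    """Map each language link to its connected item (or None), pick an action."""
    items = {qid for qid in link_items.values() if qid}
    if len(items) == 1:
        return "connect", next(iter(items))  # all connected links agree
    if len(items) > 1:
        return "skip", None  # links are connected to different items
    return ("create", None) if create else ("skip", None)  # assumed -create


print(choose_item({"de:Berlin": "Q64", "fr:Berlin": "Q64"}))  # ('connect', 'Q64')
```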
Note
This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
newitem script#
This script creates new items on Wikidata based on the following criteria:
When was the (Wikipedia) page created?
When was the last edit on the page?
Does the page contain interwikis?
This script understands various command-line arguments:
-lastedit The minimum number of days that must have passed since the page was
last edited.
-pageage The minimum number of days that must have passed since the page was
created.
-touch Do a null edit on every page which has a Wikibase item.
Be careful, this option can trigger edit rate limits or captchas
if your account is not autoconfirmed.
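The -pageage and -lastedit thresholds amount to a date comparison. This sketch is hypothetical (`is_candidate` is an invented name), and treating pages that contain interwikis as ineligible is an assumption about how the third criterion is applied.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_candidate(created: datetime, last_edited: datetime,
                 has_interwikis: bool, pageage: int, lastedit: int,
                 now: Optional[datetime] = None) -> bool:
    """Check the age thresholds (in days) before creating a new item."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now - created < timedelta(days=pageage):
        return False  # page is too young (-pageage)
    if now - last_edited < timedelta(days=lastedit):
        return False  # page was edited too recently (-lastedit)
    # Assumption: pages with interwikis may already belong to an item.
    return not has_interwikis


now = datetime(2024, 2, 1, tzinfo=timezone.utc)
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
edited = datetime(2024, 1, 20, tzinfo=timezone.utc)
print(is_candidate(created, edited, False, pageage=21, lastedit=7, now=now))  # True
```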