Wikibase scripts#
claimit script#
A script that adds claims to Wikidata items based on a list of pages
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
Usage:
python pwb.py claimit [pagegenerators] P1 Q2 P123 Q456
You can use any typical pagegenerator (like categories) to provide a list of pages. Then list the property->target pairs to add.
For geographic coordinates:
python pwb.py claimit [pagegenerators] P625 [lat-dec],[long-dec],[prec]
[lat-dec] and [long-dec] represent the latitude and longitude respectively, and [prec] represents the precision. All values are in decimal degrees, not DMS. If [prec] is omitted, the default precision is 0.0001 degrees.
Example
python pwb.py claimit [pagegenerators] P625 -23.3991,-52.0910,0.0001
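The coordinate argument format described above can be illustrated with a small parser. This is a minimal sketch, not the code claimit.py uses: the function name `parse_coordinate_arg`, the constant `DEFAULT_PRECISION` and the range check are assumptions made for illustration; only the `lat,long,prec` layout and the 0.0001 default come from the text above.

```python
DEFAULT_PRECISION = 0.0001  # degrees; the default stated in the text above


def parse_coordinate_arg(value):
    """Split 'lat,long[,prec]' into three floats (decimal degrees).

    Hypothetical helper for illustration; not part of claimit.py.
    """
    parts = [part.strip() for part in value.split(',')]
    if len(parts) not in (2, 3):
        raise ValueError(f'expected lat,long[,prec], got {value!r}')
    latitude, longitude = float(parts[0]), float(parts[1])
    # Fall back to the documented default when [prec] is omitted.
    precision = float(parts[2]) if len(parts) == 3 else DEFAULT_PRECISION
    # Assumed sanity check: reject values outside the usual degree ranges.
    if not -90 <= latitude <= 90 or not -180 <= longitude <= 180:
        raise ValueError(f'coordinate out of range: {value!r}')
    return latitude, longitude, precision
```

Calling `parse_coordinate_arg('-23.3991,-52.0910')` would yield the two coordinates together with the default precision.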
By default, claimit.py does not add a claim if one with the same property already exists on the page. To override this behavior, use the ‘exists’ option:
python pwb.py claimit [pagegenerators] P246 "string example" -exists:p
Suppose the claim you want to add has the same property as an existing claim and the “-exists:p” argument is used. In that case, claimit.py still does not add the claim if an existing claim has the same target or the same sources, or if the existing claim has qualifiers. To override this behavior, add ‘t’ (target), ‘s’ (sources), or ‘q’ (qualifiers) to the ‘exists’ argument.
For instance, to add the claim to each page even if one with the same property and target and some qualifiers already exists:
python pwb.py claimit [pagegenerators] P246 "string example" -exists:ptq
Note that the ordering of the letters in the ‘exists’ argument does not matter, but ‘p’ must be included.
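One way to read the ‘exists’ rules is as a chain of checks, each of which can be switched off by a letter. The sketch below is a simplified model of that reading and not the script's implementation: the `ExistingClaim` class and the `may_add_claim` function are hypothetical, and sources and qualifiers are reduced to plain booleans.

```python
from dataclasses import dataclass


@dataclass
class ExistingClaim:
    """Hypothetical stand-in for a claim already present on the item."""
    target: str
    has_qualifiers: bool = False
    has_sources: bool = False


def may_add_claim(existing, new_target, exists_arg=''):
    """Return True if a new claim may be added under this simplified model.

    `existing` lists the claims that already use the same property.
    """
    if not existing:
        return True              # property not used yet: always add
    if 'p' not in exists_arg:
        return False             # default: same property blocks the claim
    for claim in existing:
        if claim.target != new_target:
            continue             # different value, no conflict here
        if 't' not in exists_arg:
            return False         # same target blocks unless 't' is given
        if claim.has_qualifiers and 'q' not in exists_arg:
            return False         # qualifiers block unless 'q' is given
        if claim.has_sources and 's' not in exists_arg:
            return False         # sources block unless 's' is given
    return True
```

Because the checks use membership tests on the argument string, `'ptq'` and `'qtp'` behave the same, which matches the note that letter order does not matter.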
create_isbn_edition script#
Pywikibot client to load ISBN linked data into Wikidata
Pywikibot script to get ISBN data from a digital library, and create or amend the related Wikidata item for the edition (with P212, ISBN number, as unique external ID).
Use digital libraries to get ISBN data in JSON format, and integrate the results into Wikidata.
Note
ISBN data should only be used for editions, and not for written works.
Then the resulting item number can be used e.g. to generate Wikipedia references using template Cite_Q.
- Parameters:
All parameters are optional:
*P1:* digital library (default wiki "-")
    bnf       Catalogue General (France)
    bol       Bol.com
    dnb       Deutsche National Library
    goob      Google Books
    kb        National Library of the Netherlands
    loc       Library of Congress US
    mcues     Ministerio de Cultura (Spain)
    openl     OpenLibrary.org
    porbase   urn.porbase.org Portugal
    sbn       Servizio Bibliotecario Nazionale (Italy)
    wiki      wikipedia.org
    worldcat  WorldCat (wc)
*P2:* ISO 639-1 language code. Default LANG; e.g. en, nl, fr, de, es, it, etc.
*P3 P4...:* P/Q pairs to add additional claims (repeated), e.g. P921 Q107643461 (main subject: database management linked to P2163, Fast ID 888037)
*stdin:* List of ISBN numbers (International standard book number, version 10 or 13). Free text (e.g. Wikipedia references list, or publication list) is accepted. Identification is done via an ISBN regex expression.
- Functionality:
Both ISBN-10 and ISBN-13 numbers are accepted as input.
Only ISBN-13 numbers are stored. ISBN-10 numbers are only used for identification purposes; they are not stored.
The ISBN number is used as a primary key; no two items can have the same P212 ISBN number. The item update is not performed when there is no unique match. Only editions are updated or created.
Individual statements are added or merged incrementally; existing data is not overwritten.
Authors and publishers are searched to get their item number; unknown or ambiguous items are skipped.
Book title and subtitle are separated with either ‘.’, ‘:’, or ‘-’ in that order.
Detect author, illustrator, preface writer, and afterword writer instances.
Add profession “author” to individual authors.
This script can be run incrementally.
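Three of the steps listed above (finding ISBN numbers in free text, storing only ISBN-13, and splitting title from subtitle) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script's code: the script relies on isbnlib, while the pattern `ISBN_RE` and the helpers `to_isbn13`, `extract_isbn13` and `split_title` are hypothetical names. The pattern is deliberately loose and does not validate check digits of the input.

```python
import re

# Loose pattern (an assumption): optional 978/979 prefix, then 9 digits and
# a final digit or X, with optional hyphens or spaces between groups.
ISBN_RE = re.compile(r'\b(?:97[89][- ]?)?(?:\d[- ]?){9}[\dXx]\b')


def to_isbn13(isbn):
    """Normalise an ISBN-10 or ISBN-13 string to 13 plain digits."""
    digits = re.sub(r'[- ]', '', isbn).upper()
    if len(digits) == 13:
        return digits
    body = '978' + digits[:9]  # drop the ISBN-10 check digit
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


def extract_isbn13(text):
    """Return the distinct ISBN-13 numbers found in free text, in order."""
    return list(dict.fromkeys(to_isbn13(m) for m in ISBN_RE.findall(text)))


def split_title(full_title):
    """Split at the first separator found, trying '.', ':', '-' in order."""
    for separator in ('.', ':', '-'):
        title, found, subtitle = full_title.partition(separator)
        if found:
            return title.strip(), subtitle.strip()
    return full_title.strip(), ''
```

Because `'.'` is tried before `':'`, a title such as `'Dr. No: a novel'` is split at the period rather than the colon, which is a consequence of the documented separator order.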
- Examples:
Default library (Google Books), language (LANG), no additional statements:
pwb create_isbn_edition.py 9789042925564
Wikimedia, language English, main subject: database management:
pwb create_isbn_edition.py wiki en P921 Q107643461 978-0-596-10089-6
- Data quality:
ISBN numbers (P212) are only assigned to editions.
A written work should not have an ISBN number (P212).
For targets of P629 (edition of), amend the “is a Q47461344 (written work) instance” and “inverse P747 (work has edition)” statements.
Use https://query.wikidata.org/querybuilder/ to identify P212 duplicates. Merge duplicate items before running the script again.
The following properties should only be used for written works, not for editions:
P5331: OCLC work ID (editions should only have P243)
P8383: Goodreads identification code for work (editions should only have P2969)
- Return status:
The following status codes are returned to the shell:
3   Invalid or missing parameter
4   Library not installed
12  Item does not exist
20  Network error
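A wrapper that launches the script may want to turn these codes into readable messages. The sketch below is one possible mapping; the enum `ExitStatus`, the dictionary `MESSAGES` and the function `describe_status` are hypothetical names, and treating 0 as success is the usual shell convention rather than something stated above.

```python
from enum import IntEnum


class ExitStatus(IntEnum):
    """Hypothetical names for the documented shell status codes."""
    INVALID_PARAMETER = 3
    LIBRARY_NOT_INSTALLED = 4
    ITEM_DOES_NOT_EXIST = 12
    NETWORK_ERROR = 20


MESSAGES = {
    ExitStatus.INVALID_PARAMETER: 'Invalid or missing parameter',
    ExitStatus.LIBRARY_NOT_INSTALLED: 'Library not installed',
    ExitStatus.ITEM_DOES_NOT_EXIST: 'Item does not exist',
    ExitStatus.NETWORK_ERROR: 'Network error',
}


def describe_status(code):
    """Translate a return code into a message."""
    if code == 0:
        return 'Success'  # assumption: conventional shell meaning of 0
    try:
        return MESSAGES[ExitStatus(code)]
    except ValueError:
        # Codes outside the documented list are reported, not guessed.
        return f'Unknown status {code}'
```

The return code of a finished `subprocess.run(...)` call (its `returncode` attribute) could be passed straight to `describe_status`.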
- Standard ISBN properties for editions:
P31:Q3331189: instance of edition (mandatory statement)
P50: author
P123: publisher
P212: canonical ISBN number (with dashes; searchable via Wikidata Query)
P407: language of work (Qnumber linked to ISO 639-1 language code)
P577: date of publication (year)
P1476: book title
P1680: subtitle
- Other ISBN properties:
P921: main subject (inverse lookup from external Fast ID P2163)
P629: work for edition
P747: edition of work
- Qualifiers:
P248: Source
P813: Retrieval date
P1545: (author) sequence number
- External identifiers:
P243: OCLC ID
P1036: Dewey Decimal Classification
P2163: Fast ID (inverse lookup via Wikidata Query) -> P921: main subject
(not implemented)
P2969: Goodreads identification code
(only for written works)
P5331: OCLC work ID (editions should only have P243)
(not implemented)
P8383: Goodreads identification code for work (editions should only have P2969)
P213: ISNI ID
P496: ORCID ID
P675: Google Books identification code
- Unavailable properties from digital library:
(not implemented by isbnlib)
P98: Editor
P110: Illustrator/photographer
P291: place of publication
P1104: number of pages
?: edition format (hardcover, paperback)
- Author:
Geert Van Pamel (User:Geertivp), MIT License, 2022-08-04,
- Prerequisites:
In addition to Pywikibot the following ISBN lib package is mandatory; install it with:
pip install isbnlib
The following ISBN lib packages are optional; install them with:
pip install isbnlib-bnf
pip install isbnlib-bol
pip install isbnlib-dnb
pip install isbnlib-kb
pip install isbnlib-loc
pip install isbnlib-worldcat2
- Restrictions:
Better use the ISO 639-1 language code parameter as a default. The language code is not always available from the digital library; therefore we need a default.
Publisher unknown:
* Missing P31:Q2085381 statement, missing subclass in script
* Missing alias
* Create publisher
Unknown author: create author as a person
- Known Problems:
Unknown ISBN, e.g. 9789400012820
If no ISBN data is available for an edition, the library either returns no output (goob = Google Books) or an error message (wiki, openl). The script handles both cases. Try another library instance.
Only 6 specific ISBN attributes are listed by the webservice(s), missing are e.g.: place of publication, number of pages
Some digital libraries have more registrations than others.
Some digital libraries have data quality problems.
Not all ISBN attributes have data values (authors, publisher, date of publication); the language can be missing at the digital library.
How to add still more digital libraries?
This would require an additional isbnlib module
Does the KBR (Koninklijke Bibliotheek van België) have a public ISBN service?
The script uses multiple webservice calls; script might take time, but it is automated.
Need to manually amend ISBN items that have no author, publisher, or other required data
* You could use another digital library
* Which other services to use?
BibTex service is currently unavailable
Filter for work properties: https://www.wikidata.org/wiki/Q63413107
['9781282557246', '9786612557248', '9781847196057', '9781847196040']
P5331: OCLC identification code for work 793965595 (should only have P243)
P8383: Goodreads identification code for work 13957943 (should only have P2969)
ERROR: an HTTP error has ocurred e.g. (503) Service Unavailable
error: externally-managed-environment
isbnlib-kb cannot be installed via the pip install command. It raises error: externally-managed-environment because this environment is externally managed.
To install Python packages system-wide, try apt install python3-xyz, where xyz is the package you are trying to install.
If you wish to install a non-Debian-packaged Python package, create a virtual environment using python3 -m venv path/to/venv. Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make sure you have python3-full installed.
If you wish to install a non-Debian packaged Python application, it may be easiest to use pipx install xyz, which will manage a virtual environment for you. Make sure you have pipx installed.
See also
See Python Library venv for more information about virtual environments.
Note
If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages to pip.
Hint
See PEP 668 for the detailed specification.
You need to install a local python environment:
sudo -s
apt install python3-full
python3 -m venv /opt/python
/opt/python/bin/pip install pywikibot
/opt/python/bin/pip install isbnlib-kb
/opt/python/bin/python ../userscripts/create_isbn_edition.py kb
- Environment:
The python script can run on the following platforms:
Linux client
Google Chromebook (Linux container)
Toolforge Portal
PAWS
LANG: default ISO 639-1 language code
- Applications:
Generate a book reference. Example for wp.en only:
{{Cite Q|Q63413107}}
Use the Visual editor reference with Qnumber.
See also
https://www.wikidata.org/wiki/Q21831105 (WikiProject Books)
https://www.wikidata.org/wiki/Q21831105 (WikiCite)
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
https://www.wikidata.org/wiki/Q36524 (Authority control)
- Wikidata Query:
List of editions about musicians: https://w.wiki/5aaz
List of editions having ISBN number: https://w.wiki/5akq
- Related projects:
- Other systems:
- Documentation:
https://buildmedia.readthedocs.org/media/pdf/isbnlib/v3.4.5/isbnlib.pdf
http://www.isbn.org/standards/home/isbn/international/hyphenation-instructions.asp
https://www.wikidata.org/wiki/Wikidata:List_of_properties/work
https://www.wikidata.org/wiki/Template:Bibliographic_properties
https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData
https://www.wikidata.org/wiki/Q22696135 (Wikidata references module)
https://www.wikidata.org/wiki/Special:BookSources/978-94-014-9746-6
Goodreads:
Added in version 7.7.
Changed in version 9.6: several implementation improvements
dataextend script#
Script to add properties, identifiers and sources to WikiBase items
Usage:
dataextend <item> [<property>[+*]] [args]
In the basic usage, where no property is specified, item is the Q-number of the item to work on.
If a property (P-number, or the special value ‘Wiki’ or ‘Data’) is specified, only the data from that identifier are added. With a ‘+’ after it, work starts on that identifier, then goes on to identifiers after that (including new identifiers added while working on those identifiers). With a ‘*’ after it, the identifier itself is skipped, but those coming after it (not those coming before it) are included.
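The effect of the ‘+’ and ‘*’ suffixes can be pictured as slicing an ordered list of identifiers. The following is a minimal sketch of that idea only: the function `select_identifiers` is hypothetical, the list order is supplied by the caller, identifiers added while the bot is running are not modelled, and working on every identifier when no property is given is an assumption.

```python
def select_identifiers(ordered, spec=None):
    """Pick the identifiers to work on from an ordered list.

    `spec` is a property such as 'P214', optionally followed by '+' or '*'.
    Hypothetical helper; not part of dataextend.
    """
    if spec is None:
        return list(ordered)      # assumption: no property means everything
    suffix = spec[-1] if spec[-1] in '+*' else ''
    prop = spec[:-1] if suffix else spec
    if prop not in ordered:
        raise ValueError(f'{prop} is not among the known identifiers')
    index = ordered.index(prop)
    if suffix == '+':
        return list(ordered[index:])       # start at prop, then continue
    if suffix == '*':
        return list(ordered[index + 1:])   # skip prop, take what follows
    return [prop]                          # only this identifier
```

With the list `['P213', 'P214', 'P227', 'P244']`, the spec `'P214*'` selects only the two identifiers after P214, while `'P214+'` also includes P214 itself.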
The following parameters are supported:
- -always
If this is supplied, the bot will not ask for permission after each external link has been handled.
- -showonly
Only show claims for a given ItemPage. Don’t try to add any properties.
The bot will load the corresponding pages for these identifiers and try to extract data from them. When it finds a string it cannot interpret, it asks for the meaning of that string for the specified type of thing (for example ‘city’ or ‘gender’), given as a Q-number. If you want to use it, but not save it (which can happen if the string specifies a certain value now, but might show another value elsewhere, or if it is so specific that you’re pretty sure it won’t occur a second time), you can provide the Q-number with X rather than Q. If you do not want to use the string, you can just hit enter, or give the special value ‘XXX’, which means that it will be skipped in each subsequent run as well.
After an identifier has been worked on, there might be a list of names that has been found, in lc:name format, where lc is a language code. You can accept all suggested names (answer Y), none (answer N) or ask to get asked for each name separately (answer S), the latter being the default if you do not fill in anything.
After all identifiers have been worked on, possible descriptions in various languages are presented, and you get to choose one. The default here is 0, which is always the current description for that language. Finally, for a number of identifiers, text is shown that usually gives parts of the description that are hard to parse automatically, so you can see if there are any additional pieces of data that can be added.
It is advisable to (re)load the item page that the bot has been working on in the browser afterward, to correct any mistakes it has made, or cases where a more precise and less precise value have both been included.
Added in version 7.2.
Deprecated since version 9.6: will be removed with Pywikibot 10.
harvest_template script#
Template harvesting script
Usage (see below for explanations and examples):
python pwb.py harvest_template -transcludes:"…" [default optional arguments] template_parameter PID [local optional arguments] [template_parameter PID [local optional arguments]]
python pwb.py harvest_template [generators] -template:"…" [default optional arguments] template_parameter PID [local optional arguments] [template_parameter PID [local optional arguments]]
This will work on all pages that transclude the template in the article namespace
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
You can also use additional parameters:
- -confirm
If used, the bot will ask if it should make changes
- -create
Create missing items before importing.
The following command line parameters can be used to change the bot’s behavior. If you specify them before all parameters, they are global and are applied to all param-property pairs. If you specify them after a param-property pair, they are local and are only applied to this pair. If you specify the same argument as both local and global, the local argument overrides the global one (see also examples):
- -islink
Treat plain text values as links (“text” -> “[[text]]”).
- -exists
If set to ‘p’, add a new value, even if the item already has the imported property but not the imported value. If set to ‘pt’, add a new value, even if the item already has the imported property with the imported value and some qualifiers.
- -multi
If set, try to match multiple values from parameter.
- -inverse
Import this property as the inverse claim.
Examples
The following command will try to import existing images from “image” parameter of “Infobox person” on English Wikipedia as Wikidata property “P18” (image):
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" image P18
The following command will behave the same as the previous example and also try to import [[links]] from “birth_place” parameter of the same template as Wikidata property “P19” (place of birth):
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" image P18 birth_place P19
The following command will import both “birth_place” and “death_place” params with the -islink modifier, i.e. the bot will try to import values even if it doesn’t find a [[link]]:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" -islink birth_place P19 death_place P20
The following command will do the same but only “birth_place” can be imported without a link:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" birth_place P19 -islink death_place P20
The following command will import an occupation from the “occupation” parameter of “Infobox person” on English Wikipedia as Wikidata property “P106” (occupation). The page won’t be skipped if the item already has that property but not the new value:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox person" occupation P106 -exists:p
The following command will import band members from the “current_members” parameter of “Infobox musical artist” on English Wikipedia as Wikidata property “P527” (has part). This will only extract multiple band members if each is linked, and will not add duplicate claims for the same member:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:0 -template:"Infobox musical artist" current_members P527 -exists:p -multi
The following command will import the category’s main topic from the first anonymous parameter of “Cat main” on English Wikipedia as Wikidata property “P301” (category’s main topic) and whenever a new value is imported, the inverse claim is imported to the topic item as Wikidata property “P910” (topic’s main category) unless a claim of that property is already there:
python pwb.py harvest_template -lang:en -family:wikipedia -namespace:14 -template:"Cat main" 1 P301 -inverse:P910 -islink
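The rule that options given before all param-property pairs are global, while options given after a pair are local and take precedence, can be sketched as a small argument parser. This is an illustration only, not how harvest_template parses its command line: the function `parse_pairs` is hypothetical and expects just the part of the command line that follows the generator and -template arguments.

```python
def parse_pairs(args):
    """Group arguments into (parameter, PID, options) tuples.

    Options seen before the first pair are global; options seen after a
    pair apply to that pair only and override global ones.
    """
    global_options = {}
    pairs = []       # each entry: [parameter, PID, local options]
    pending = None   # template parameter still waiting for its PID
    for arg in args:
        if arg.startswith('-'):
            name, _, value = arg[1:].partition(':')
            target = pairs[-1][2] if pairs else global_options
            target[name] = value or True   # flags without a value are True
        elif pending is None:
            pending = arg
        else:
            pairs.append([pending, arg, {}])
            pending = None
    # Local options are merged last so that they win over global ones.
    return [(param, pid, {**global_options, **local})
            for param, pid, local in pairs]
```

For example, `parse_pairs(['birth_place', 'P19', '-islink', 'death_place', 'P20'])` attaches `islink` to the first pair only, mirroring the example above in which only “birth_place” can be imported without a link.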
Note
This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
Added in version 7.5: the -inverse option.
illustrate_wikidata script#
Bot to add images to Wikidata items
The image is extracted from the page_props. For this to be available the PageImages extension (https://www.mediawiki.org/wiki/Extension:PageImages) needs to be installed.
The following options are provided:
- -always
Don’t prompt to make changes, just do them.
- -property
The property to add. Should be of type commonsMedia.
Usage:
python pwb.py illustrate_wikidata <some generator>
This script supports use of pagegenerators arguments.
interwikidata script#
Script to handle interwiki links based on Wikibase
This script connects pages to Wikibase items using language links on the page. If multiple language links are present and they are connected to different items, the bot skips the page. After connecting the page to an item, language links can be removed from the page.
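The decision described above (connect only when the language links agree on one item) can be sketched as follows. This is a simplified model, not the script's code: the function `choose_item` is hypothetical, and the mapping from language code to item ID is assumed to have been collected beforehand, with `None` for linked pages that have no item.

```python
def choose_item(linked_items):
    """Return the single item the language links agree on, else None.

    `linked_items` maps a language code to an item ID such as 'Q42',
    or to None when the linked page is not connected to any item.
    """
    found = {item for item in linked_items.values() if item is not None}
    if len(found) == 1:
        return found.pop()   # all connected links point to the same item
    return None              # no item at all, or conflicting items: skip
```

With this model, `{'de': 'Q42', 'fr': 'Q42', 'nl': None}` resolves to one item, whereas links pointing at two different items yield `None` and the page is skipped.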
These command line parameters can be used to specify which pages to work on:
This script supports use of pagegenerators arguments.
Furthermore, the following command line parameters are supported:
- -always
If used, the bot won’t ask if it should add the specified text.
- -clean
Clean pages.
- -create
Create items.
- -merge
Merge items.
- -summary:
(str) Use your own edit summary for cleaning the page.
Note
This script is a ConfigParserBot. All options can be set within a settings file which is scripts.ini by default.
newitem script#
This script creates new items on Wikidata based on certain criteria:
When was the (Wikipedia) page created?
When was the last edit on the page?
Does the page contain interwikis?
This script understands various command-line arguments:
- -lastedit
The minimum number of days that has passed since the page was last edited.
- -pageage
The minimum number of days that has passed since the page was created.
- -touch
Do a null edit on every page which has a Wikibase item. Be careful, this option can trigger edit rate limits or captchas if your account is not autoconfirmed.
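The two age criteria behind -pageage and -lastedit amount to comparing timestamps against a minimum number of days. The sketch below shows that comparison in isolation; the function `is_old_enough` is hypothetical, the timestamps are assumed to be timezone-aware `datetime` objects obtained elsewhere, and no default values are assumed for either option.

```python
from datetime import datetime, timedelta, timezone


def is_old_enough(created, last_edited, pageage, lastedit, now=None):
    """Check both minimum-age criteria for a page.

    `pageage` and `lastedit` are minimum numbers of days, as in the
    -pageage and -lastedit options. Hypothetical helper for illustration.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    old_page = now - created >= timedelta(days=pageage)
    settled = now - last_edited >= timedelta(days=lastedit)
    return old_page and settled
```

Passing an explicit `now` keeps the check reproducible, for instance when testing against fixed dates.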