Crate wikipedia_prosesize

Calculate Wikipedia prose size

This crate is a rough port of the Wikipedia Prosesize script. It counts the bytes of prose on a page, rather than the bytes of wikitext markup or generated HTML.

The input is typically an ImmutableWikicode, which you will most likely fetch using the parsoid crate.

The response from prosesize() provides the text-only prose size, word count and text-only references size. Enabling the optional serde-1 feature makes the size struct serializable and deserializable.
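To make the distinction between byte size and word count concrete, here is a minimal standard-library sketch that measures text which has already been extracted from a page. It is not this crate's implementation and does no HTML or wikitext handling: `TextStats` and `text_stats` are hypothetical names introduced only for illustration, and splitting on whitespace is an assumed definition of a word.

```rust
/// Hypothetical stand-in for the kind of numbers a prose-size report carries.
/// This is NOT the crate's `ProseSize` struct.
#[derive(Debug, PartialEq, Eq)]
pub struct TextStats {
    /// Length of the text in UTF-8 bytes.
    pub bytes: usize,
    /// Number of whitespace-separated words (an assumed definition).
    pub words: usize,
}

/// Measure plain text that has already been extracted from a page.
/// No markup is parsed or removed here.
pub fn text_stats(text: &str) -> TextStats {
    TextStats {
        // `str::len` returns the byte length, not the character count.
        bytes: text.len(),
        // `split_whitespace` skips runs of whitespace and empty pieces.
        words: text.split_whitespace().count(),
    }
}
```

Because the size is measured in bytes, non-ASCII text weighs more than its character count suggests: `text_stats("café")` reports 5 bytes and 1 word, since `é` occupies two bytes in UTF-8.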

§Contributing

wikipedia_prosesize is part of the mwbot-rs project. We’re always looking for new contributors, so please reach out if you’re interested!

Structs§

ProseSize

Constants§

PROSE_SELECTOR
Selector for prose, assuming you’ve already removed other non-counted elements

Functions§

parsoid_stylesheet
Get a stylesheet for Parsoid HTML that highlights elements counted for prosesize in yellow and references in light blue.
prosesize
Calculate the prose size for the given HTML. Note that if you provide a mutable parsoid::Wikicode instance, the document will be modified!
remove_noncounted_elements
Remove elements that we absolutely don’t plan on counting. This can be run against your own Wikicode instance if you just want to analyze prosesize-counted content.