Calculate Wikipedia prose size
This crate is a rough port of the Wikipedia Prosesize script that allows for counting the bytes of prose on a page rather than the wikitext markup or generated HTML.
You will most likely fetch ImmutableWikicode using the parsoid crate.
The response from prosesize() provides the text-only prose size, the word count, and the text-only references size. Enabling the optional serde-1 feature makes the size struct serializable and deserializable.
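To make the difference between "prose size" and "HTML size" concrete, here is a minimal, standard-library-only sketch. It is not this crate's implementation and does not use its API: the `TextSize` struct and the `strip_tags` and `text_size` functions are hypothetical names invented for illustration, and the tag stripping is deliberately naive (no entity decoding, no element filtering, no reference handling).

```rust
/// Hypothetical result type for this sketch; not the crate's own struct.
#[derive(Debug, PartialEq, Eq)]
struct TextSize {
    /// Bytes of visible text (UTF-8), with tags excluded.
    bytes: usize,
    /// Number of whitespace-separated words in the visible text.
    words: usize,
}

/// Drop everything between `<` and `>` and keep the rest.
/// This is a naive stand-in for real HTML parsing: it ignores
/// entities, comments, and the contents of script/style elements.
fn strip_tags(html: &str) -> String {
    let mut text = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => text.push(ch),
            _ => {} // character inside a tag: skip it
        }
    }
    text
}

/// Measure the text-only size of an HTML fragment.
fn text_size(html: &str) -> TextSize {
    let text = strip_tags(html);
    TextSize {
        bytes: text.len(), // `len()` on a String counts UTF-8 bytes
        words: text.split_whitespace().count(),
    }
}
```

For the fragment `<p>Rust is <b>fast</b>.</p>`, the HTML is 27 bytes long, while `text_size` reports 13 bytes and 3 words for the text alone. The real crate works on parsed Parsoid HTML and additionally excludes non-counted elements, which this sketch does not attempt.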
§Contributing
wikipedia_prosesize is part of the mwbot-rs project.
We’re always looking for new contributors; please reach out if you’re interested!
Structs§
Constants§
- PROSE_SELECTOR - Selector for prose, assuming you’ve already removed other non-counted elements.
Functions§
- parsoid_stylesheet - Get a stylesheet for Parsoid HTML that highlights elements counted for prosesize in yellow and references in light blue.
- prosesize - Calculate the prose size for the given HTML. Note that if you provide a mutable parsoid::Wikicode instance, the document will be modified!
- remove_noncounted_elements - Remove elements that we absolutely don’t plan on counting. This can be run against your own Wikicode instance if you just want to analyze prosesize-counted content.