HtmlFormatter
Performs transformations of HTML by wrapping around libxml2 and working around its countless bugs.
HtmlFormatter\HtmlFormatter Class Reference

Public Member Functions

 __construct (string $html)
 
 getDoc ()
 
 setRemoveComments (bool $flag=true)
 Sets whether comments should be removed from output.
 
 setRemoveMedia (bool $flag=true)
 Sets whether images/videos/sounds should be removed from output.
 
 remove ( $selectors)
 Adds one or more selectors of content to remove.
 
 flatten ( $elements)
 Adds one or more element names to the list to flatten (remove the tag, but not its content). Can accept non-delimited regexes.
 
 flattenAllTags ()
 Instructs the formatter to flatten all tags, and remove comments.
 
 getText ( $element=null)
 Performs final transformations and returns resulting HTML.
 

Static Public Member Functions

static wrapHTML (string $html)
 Turns a chunk of HTML into a proper document.
 
static removeBeforeIncluding (string $haystack, string $needle)
 Removes everything from the beginning of the string to the first occurrence of $needle, including $needle.
 
static removeAfterIncluding (string $haystack, string $needle)
 Removes everything from the first occurrence of $needle to the end of the string, including $needle.
 
static removeBetweenIncluding (string $haystack, string $open, string $close)
 Removes everything between $open and $close, including $open and $close.
 

Protected Member Functions

 onHtmlReady (string $html)
 Override this in a descendant class to modify HTML after it has been converted from the DOM tree.
 
 parseSelector (string $selector, string &$type, string &$rawName)
 Helper function for parseItemsToRemove().
 
 parseItemsToRemove ()
 Transforms CSS-style selectors into an internal representation suitable for processing by filterContent()
 

Protected Attributes

bool $removeMedia = false
 
bool $removeComments = false
 

Constructor & Destructor Documentation

◆ __construct()

HtmlFormatter\HtmlFormatter::__construct ( string $html)
Parameters
string $html  Text to process

Member Function Documentation

◆ flatten()

HtmlFormatter\HtmlFormatter::flatten ( $elements)

Adds one or more element names to the list to flatten (remove the tag, but not its content). Can accept non-delimited regexes.

Note that this interface may fail in surprising ways because it relies on regexes, so it should not be relied on as an HTML markup security measure.

Parameters
string[] | string $elements  Name(s) of tag(s) to flatten
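
To make the regex caveat concrete, here is a minimal sketch of how tag flattening with non-delimited regex fragments could work. The function name `flattenSketch` and the pattern it builds are assumptions made for illustration; they are not taken from the HtmlFormatter source.

```php
<?php
// Hypothetical helper, not part of HtmlFormatter: strips the opening and
// closing tags of the named elements but keeps whatever they enclose.
function flattenSketch(string $html, $elements): string
{
    // Accept a single name or a list; each name is used as a regex
    // fragment without delimiters, e.g. 'h[1-6]'.
    $alternation = implode('|', (array)$elements);
    // '#' is the delimiter here, so a fragment must not contain '#'.
    return preg_replace("#</?(?:{$alternation})\\b[^>]*>#is", '', $html);
}

echo flattenSketch('<p>Hello <b>world</b></p>', 'b'), "\n";
echo flattenSketch('<h2>Title</h2><p>Text</p>', ['h[1-6]', 'p']), "\n";
// A '>' inside an attribute value ends the match early and leaves debris.
echo flattenSketch('<a title="x>y">t</a>', 'a'), "\n";
```

The last call shows one way a regex-based approach can misbehave: the tag is cut at the first `>`, so part of the attribute value survives in the output.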

◆ getDoc()

HtmlFormatter\HtmlFormatter::getDoc ( )
Returns
DOMDocument DOM to manipulate

◆ getText()

HtmlFormatter\HtmlFormatter::getText ( $element = null)

Performs final transformations and returns resulting HTML.

Note that if you want to call this both without an element and with an element, you should call it without an element first. If you specify $element, the method changes the underlying DOM and you won't be able to get the original tree back.

Parameters
DOMElement | string | null $element  ID of the element to get HTML from, or null to get it from the whole tree
Returns
string Processed HTML

◆ onHtmlReady()

HtmlFormatter\HtmlFormatter::onHtmlReady ( string $html)
protected

Override this in a descendant class to modify HTML after it has been converted from the DOM tree.

Parameters
string $html  HTML to process
Returns
string Processed HTML

◆ parseItemsToRemove()

HtmlFormatter\HtmlFormatter::parseItemsToRemove ( )
protected

Transforms CSS-style selectors into an internal representation suitable for processing by filterContent()

Returns
array

◆ parseSelector()

HtmlFormatter\HtmlFormatter::parseSelector ( string $selector,
string & $type,
string & $rawName )
protected

Helper function for parseItemsToRemove().

This function extracts the selector type and the raw name of a selector from a CSS-style selector string and assigns those values to parameters passed by reference. For example, if given '#toc' as the $selector parameter, it will assign 'ID' as the $type and 'toc' as the $rawName.

Parameters
string $selector  CSS selector to parse
string &$type  The type of selector (ID, CLASS, TAG_CLASS, or TAG)
string &$rawName  The raw name of the selector
Returns
bool Whether the selector was successfully recognised
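
The classification described above can be sketched as a standalone function. This is an illustration only: `parseSelectorSketch` is a hypothetical stand-in for the protected method, and both the branch order and the raw name returned for TAG_CLASS selectors are assumptions. Only the `'#toc'` case is documented above.

```php
<?php
// Hypothetical stand-in for the protected parseSelector(); the branch order
// and the TAG_CLASS raw name are assumptions made for illustration.
function parseSelectorSketch(string $selector, string &$type, string &$rawName): bool
{
    if ($selector === '') {
        return false;
    }
    if ($selector[0] === '#') {
        $type = 'ID';
        $rawName = substr($selector, 1);
    } elseif ($selector[0] === '.') {
        $type = 'CLASS';
        $rawName = substr($selector, 1);
    } elseif (strpos($selector, '.') !== false) {
        $type = 'TAG_CLASS';
        $rawName = $selector; // kept whole here, e.g. 'div.thumb'
    } elseif (strpbrk($selector, '[]') === false) {
        $type = 'TAG';
        $rawName = $selector;
    } else {
        return false; // e.g. attribute selectors fall outside the subset
    }
    return true;
}

// The by-reference arguments are typed string, so initialise them first.
$type = '';
$rawName = '';
foreach (['#toc', '.noprint', 'div.thumb', 'table', 'a[href]'] as $selector) {
    $ok = parseSelectorSketch($selector, $type, $rawName);
    echo $selector, ' => ', $ok ? "$type / $rawName" : 'not recognised', "\n";
}
```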

◆ remove()

HtmlFormatter\HtmlFormatter::remove ( $selectors)

Adds one or more selectors of content to remove.

A subset of CSS selector syntax is supported:

<tag>
<tag>.class
.<class>
#<id>

Parameters
string[] | string $selectors  Selector(s) of stuff to remove

◆ removeAfterIncluding()

static HtmlFormatter\HtmlFormatter::removeAfterIncluding ( string $haystack,
string $needle )
static

Removes everything from the first occurrence of $needle to the end of the string, including $needle.

Equivalent to the regex /<\/body>.*$/s when $needle = '</body>'
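
The equivalence can be checked with a small standalone sketch. `removeAfterIncludingSketch` is a hypothetical re-implementation written for this example, not the library's code, and returning the input unchanged when $needle is absent is an assumption.

```php
<?php
// Hypothetical re-implementation for illustration; not the library's code.
function removeAfterIncludingSketch(string $haystack, string $needle): string
{
    $pos = strpos($haystack, $needle); // position of the first occurrence
    // Assumption: leave the string untouched when the needle is missing.
    return $pos === false ? $haystack : substr($haystack, 0, $pos);
}

$html = "<body><p>Hi</p></body>\n</html>";
$viaRegex = preg_replace('/<\/body>.*$/s', '', $html);
// Both approaches should leave the same prefix of the document.
var_dump(removeAfterIncludingSketch($html, '</body>') === $viaRegex);
```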

◆ removeBeforeIncluding()

static HtmlFormatter\HtmlFormatter::removeBeforeIncluding ( string $haystack,
string $needle )
static

Removes everything from the beginning of the string to the first occurrence of $needle, including $needle.

Equivalent to the regex /^.*?<body>/s when $needle = '<body>'
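
The quoted pattern can be tried directly. The sketch below generalises it to an arbitrary needle; the function name is hypothetical and this is not the library's implementation. Because `.*?` is lazy, the pattern stops at the first occurrence of the needle in the input.

```php
<?php
// Hypothetical helper that applies the quoted pattern to an arbitrary
// needle; preg_quote() escapes the needle, including the '/' delimiter.
function removeBeforeIncludingRegexSketch(string $haystack, string $needle): string
{
    $pattern = '/^.*?' . preg_quote($needle, '/') . '/s';
    return preg_replace($pattern, '', $haystack);
}

echo removeBeforeIncludingRegexSketch('<html><body><p>Hi</p>', '<body>'), "\n";
// With two needles, only the text up to the first one is removed.
echo removeBeforeIncludingRegexSketch('x<body>y<body>z', '<body>'), "\n";
```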

◆ setRemoveComments()

HtmlFormatter\HtmlFormatter::setRemoveComments ( bool $flag = true)

Sets whether comments should be removed from output.

Parameters
bool $flag  Whether to remove or not

◆ setRemoveMedia()

HtmlFormatter\HtmlFormatter::setRemoveMedia ( bool $flag = true)

Sets whether images/videos/sounds should be removed from output.

Parameters
bool $flag  Whether to remove or not

◆ wrapHTML()

static HtmlFormatter\HtmlFormatter::wrapHTML ( string $html)
static

Turns a chunk of HTML into a proper document.

Parameters
string $html  HTML to wrap
Returns
string

The documentation for this class was generated from the following file: