CirrusSearch
Elasticsearch-powered search for MediaWiki
An approximate, incomplete and rather dangerous algorithm to reduce the size of a CirrusSearch document.
Public Member Functions
__construct (array $profile)
resize (Document $document)
    Truncate some textual data from the input Document.
Static Public Member Functions
static estimateDataSize (Document $document)
An approximate, incomplete and rather dangerous algorithm to reduce the size of a CirrusSearch document.
This class is meant to reduce the size of abnormally large documents. What counts as abnormally large is open to interpretation, but this class was designed with sizes around 1 MB considered extremely large. Do not expect this class to be byte-precise: there is no guarantee that the size after the operation will be below the requested maximum. There might be various reasons for this:
If the goal is to ensure that the resulting JSON representation stays below a size S, account for some overhead and ask this class to reduce the document to something smaller than S (e.g. S*0.9).
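The safety margin described above can be sketched as follows. This is a minimal illustration, not part of CirrusSearch: the helper name `targetSizeWithMargin` and the 0.9 default are assumptions taken from the S*0.9 example, and how the resulting number is passed to the limiter profile is deliberately left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a CirrusSearch API): given a hard limit S in bytes,
 * return the smaller size to request from the limiter so that an imprecise
 * reduction still has room to land below S.
 */
function targetSizeWithMargin(int $hardLimit, float $margin = 0.9): int
{
    if ($hardLimit <= 0) {
        throw new InvalidArgumentException('hardLimit must be positive');
    }
    if ($margin <= 0.0 || $margin > 1.0) {
        throw new InvalidArgumentException('margin must be in (0, 1]');
    }
    // Round down so the requested size never exceeds S * margin.
    return (int) floor($hardLimit * $margin);
}

/**
 * Measure the real serialized size afterwards, since the limiter only
 * estimates. $data stands in for whatever array is sent as JSON.
 */
function fitsInJson(array $data, int $hardLimit): bool
{
    $json = json_encode($data);
    return $json !== false && strlen($json) <= $hardLimit;
}

$hardLimit = 1024 * 1024;                    // S: 1 MiB, illustrative
$target = targetSizeWithMargin($hardLimit);  // size to ask the limiter for
```

Checking the serialized size after resizing, as `fitsInJson` does, is one way to detect the cases where the limiter could not reach the requested size.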
Limiter heuristics are controlled by a profile that supports the following criteria:
Text fields are truncated using mb_strcut. If the string is part of an array and becomes empty after truncation, it is removed from the array. If the string is a "keyword" (a non-tokenized field), it is not truncated but simply removed from its array.
An array that mixes string and non-string data is ignored.
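The rules above can be sketched for a single field as follows. This is a simplified illustration and not the class's actual implementation: the function name `shrinkField`, the `'text'`/`'keyword'` labels, and the per-field byte budget are assumptions made for the example, and it requires the mbstring extension.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: shrink one field value to at most $budget bytes.
 *
 * @param string|array $value field value
 * @param string $type 'text' or 'keyword' (hypothetical labels)
 * @param int $budget bytes allowed for this field (assumed semantics)
 * @return string|array
 */
function shrinkField($value, string $type, int $budget)
{
    $remaining = max(0, $budget);
    if (is_string($value)) {
        // A lone keyword string is left untouched in this sketch (assumption).
        return $type === 'text' ? mb_strcut($value, 0, $remaining, 'UTF-8') : $value;
    }
    if (!is_array($value)) {
        return $value;
    }
    // Arrays mixing string and non-string data are ignored.
    foreach ($value as $item) {
        if (!is_string($item)) {
            return $value;
        }
    }
    $kept = [];
    foreach ($value as $item) {
        if ($type === 'text') {
            // mb_strcut cuts on a byte count without splitting a character.
            $item = mb_strcut($item, 0, $remaining, 'UTF-8');
            if ($item === '') {
                continue; // empty after truncation: drop it from the array
            }
        } elseif (strlen($item) > $remaining) {
            continue; // keyword: never truncated, dropped whole
        }
        $kept[] = $item;
        $remaining -= strlen($item);
    }
    return $kept;
}
```

Because mb_strcut works on bytes while respecting character boundaries, a truncated string can end up slightly shorter than the budget, which is one reason the result is not byte-precise.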
CirrusSearch\BuildDocument\DocumentSizeLimiter::resize ( Document $document )
Truncate some textual data from the input Document.
Parameters
    Document $document