CirrusSearch
Elasticsearch-powered search for MediaWiki
Builds one Elasticsearch analyzer to add to an analysis config array.
Public Member Functions
__construct (string $langName, string $analyzerName='text')
withCharFilters (array $charFilters)
withTokenizer (string $tokenizer)
withFilters (array $filters)
withCharMap (array $mappings, string $name=null)
withNumberCharFilter (int $langZero, string $name=null)
withElision (array $articles, bool $articleCase=true)
withLangLowercase ()
withStop ($stop, string $name=null)
withExtraStop ($stop, string $name, $beforeFilter=self::APPEND, bool $ignoreCase=null)
withExtraStemmer (string $lang, string $name=null)
withStemmerOverride ($rules, string $name=null)
Rules can be a single rule string, or an array of rules.
withUnpackedAnalyzer ()
insertFiltersBefore ($beforeFilter, array $filterList)
appendFilters (array $filterList)
prependFilters (array $filterList)
omitDottedI ()
withLightStemmer ()
omitStemmer ()
withAsciifoldingPreserve ()
omitAsciifolding ()
withRemoveEmpty ()
withDecimalDigit ()
build (array $config)
Create a basic analyzer with support for various common options.
Static Public Member Functions
static patternFilter (string $pat, string $repl='')
Create a pattern_replace filter/char_filter with the pattern and replacement provided.
static mappingCharFilter (array $mappings)
Create a mapping character filter with the mappings provided.
static numberCharFilter (int $langZero)
Create a character filter that maps non-Arabic digits (e.g., ០-៩ or ０-９) to Arabic digits (0-9).
static elisionFilter (array $articles, bool $case=true)
Create an elision filter with the "articles" provided; $case determines whether stripping is case sensitive or not.
static stopFilterFromList ($stopwords, bool $ignoreCase=null)
Create a stop word filter with the provided config.
static stemmerFilter (string $stemmer)
Create a stemmer filter with the provided config.
Public Attributes
const APPEND = 1
Indicate that filters should be automatically appended or prepended, rather than inserted before a given filter.
const PREPEND = 2
Builds one Elasticsearch analyzer to add to an analysis config array.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. http://www.gnu.org/copyleft/gpl.html
CirrusSearch\Maintenance\AnalyzerBuilder::__construct ( string $langName, string $analyzerName = 'text' )
string | $langName
string | $analyzerName | defaults to 'text'
CirrusSearch\Maintenance\AnalyzerBuilder::appendFilters ( array $filterList )
string[] | $filterList | list of additional filters to append |
CirrusSearch\Maintenance\AnalyzerBuilder::build ( array $config )
Create a basic analyzer with support for various common options.
Can create various filters and character filters as specified. None are automatically added to the char_filter or filter list because the best order for these basic analyzers depends on the details of various third-party plugins.
type: custom
tokenizer: standard
char_filter: as per $this->charFilters
filter: as per $this->filters
mixed[] | $config | to be updated
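The shape listed above can be illustrated with a minimal sketch. The function sketchBuild() is a hypothetical stand-in, not the CirrusSearch implementation, and storing the result under an 'analyzer' key named after the analyzer is an assumption based on the usual layout of an Elasticsearch analysis config.

```php
<?php
// Hypothetical stand-in for build(); it only mirrors the documented shape.
function sketchBuild(
    array $config,
    string $analyzerName,
    array $charFilters,
    array $filters
): array {
    // Assumption: analyzers live under $config['analyzer'][<name>].
    $config['analyzer'][$analyzerName] = [
        'type' => 'custom',
        'tokenizer' => 'standard',
        'char_filter' => $charFilters,
        'filter' => $filters,
    ];
    return $config;
}

// The filter names here are illustrative only.
$config = sketchBuild( [], 'text', [ 'my_char_map' ], [ 'lowercase', 'my_stop' ] );
print_r( $config );
```

The caller decides the order of the char_filter and filter lists, which matches the note that nothing is added to either list automatically.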
static CirrusSearch\Maintenance\AnalyzerBuilder::elisionFilter ( array $articles, bool $case = true )
Create an elision filter with the "articles" provided; $case determines whether stripping is case sensitive or not.
string[] | $articles
bool | $case
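As an illustration, an Elasticsearch elision token filter definition generally looks like the array produced below. The function sketchElisionFilter() is hypothetical, and the assumption that $case is passed straight through to the articles_case setting is not confirmed by this page.

```php
<?php
// Hypothetical stand-in for elisionFilter(); not the CirrusSearch source.
function sketchElisionFilter( array $articles, bool $case = true ): array {
    return [
        'type' => 'elision',
        // Assumption: $case maps directly onto Elasticsearch's articles_case.
        'articles_case' => $case,
        'articles' => $articles,
    ];
}

// A few French elided articles, as in l'avion or d'accord.
$elision = sketchElisionFilter( [ 'l', 'd', 'qu' ] );
print_r( $elision );
```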
CirrusSearch\Maintenance\AnalyzerBuilder::insertFiltersBefore ( $beforeFilter, array $filterList )
mixed | $beforeFilter | specific filter to insert $filterList before; use APPEND or PREPEND to always add to the end or beginning of the list
string[] | $filterList | list of additional filters to insert
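The insertion rule can be sketched in plain PHP. The constants and sketchInsertBefore() below are hypothetical, and the fallback for an anchor filter that is not in the list is an assumption, since this page does not say what the class does in that case.

```php
<?php
// Hypothetical sentinels mirroring APPEND = 1 and PREPEND = 2.
const SKETCH_APPEND = 1;
const SKETCH_PREPEND = 2;

function sketchInsertBefore( array $filters, $beforeFilter, array $filterList ): array {
    if ( $beforeFilter === SKETCH_APPEND ) {
        return array_merge( $filters, $filterList );
    }
    if ( $beforeFilter === SKETCH_PREPEND ) {
        return array_merge( $filterList, $filters );
    }
    $pos = array_search( $beforeFilter, $filters, true );
    if ( $pos === false ) {
        // Assumption: an unknown anchor falls back to appending.
        return array_merge( $filters, $filterList );
    }
    // Splice the new filters in directly before the anchor.
    array_splice( $filters, $pos, 0, $filterList );
    return $filters;
}

$result = sketchInsertBefore(
    [ 'lowercase', 'arabic_stemmer' ],
    'arabic_stemmer',
    [ 'arabic_normalization' ]
);
// $result is [ 'lowercase', 'arabic_normalization', 'arabic_stemmer' ]
```

Because the sentinels are integers and filter names are strings, a strict comparison keeps the two kinds of $beforeFilter values from colliding.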
static CirrusSearch\Maintenance\AnalyzerBuilder::mappingCharFilter ( array $mappings )
Create a mapping character filter with the mappings provided.
string[] | $mappings
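static CirrusSearch\Maintenance\AnalyzerBuilder::numberCharFilter ( int $langZero )
Create a character filter that maps non-Arabic digits (e.g., ០-៩ or ０-９) to Arabic digits (0-9).
Because the ten digits of a script usually occupy consecutive code points, only the code point of the starting digit (the one equal to 0) is needed.
int | $langZero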
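A minimal sketch of the idea, assuming $langZero is the Unicode code point of the script's zero digit and that the result is a mapping character filter. The function sketchNumberCharFilter() is hypothetical and relies on mb_chr() from the mbstring extension.

```php
<?php
// Hypothetical stand-in for numberCharFilter(); not the CirrusSearch source.
function sketchNumberCharFilter( int $langZero ): array {
    $mappings = [];
    for ( $i = 0; $i <= 9; $i++ ) {
        // mb_chr() converts a code point to a UTF-8 string.
        $mappings[] = mb_chr( $langZero + $i, 'UTF-8' ) . '=>' . $i;
    }
    return [ 'type' => 'mapping', 'mappings' => $mappings ];
}

// U+17E0 is KHMER DIGIT ZERO; the other nine Khmer digits follow it.
$khmerDigits = sketchNumberCharFilter( 0x17E0 );
echo $khmerDigits['mappings'][0], "\n"; // prints ០=>0
```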
CirrusSearch\Maintenance\AnalyzerBuilder::omitAsciifolding ( )
CirrusSearch\Maintenance\AnalyzerBuilder::omitDottedI ( )
CirrusSearch\Maintenance\AnalyzerBuilder::omitStemmer ( )
static CirrusSearch\Maintenance\AnalyzerBuilder::patternFilter ( string $pat, string $repl = '' )
Create a pattern_replace filter/char_filter with the pattern and replacement provided.
string | $pat
string | $repl
CirrusSearch\Maintenance\AnalyzerBuilder::prependFilters ( array $filterList )
string[] | $filterList | list of additional filters to prepend |
static CirrusSearch\Maintenance\AnalyzerBuilder::stemmerFilter ( string $stemmer )
Create a stemmer filter with the provided config.
string | $stemmer
static CirrusSearch\Maintenance\AnalyzerBuilder::stopFilterFromList ( $stopwords, bool $ignoreCase = null )
Create a stop word filter with the provided config.
The config can be an array of stop words, or a string like _french_ that refers to a pre-defined list.
mixed | $stopwords
bool|null | $ignoreCase
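The two accepted forms of $stopwords can be sketched as follows. The function sketchStopFilter() is hypothetical, and omitting ignore_case when $ignoreCase is null (so that Elasticsearch applies its own default) is an assumption.

```php
<?php
// Hypothetical stand-in for stopFilterFromList(); not the CirrusSearch source.
function sketchStopFilter( $stopwords, ?bool $ignoreCase = null ): array {
    $filter = [ 'type' => 'stop', 'stopwords' => $stopwords ];
    if ( $ignoreCase !== null ) {
        // Assumption: the key is only set when a value was given.
        $filter['ignore_case'] = $ignoreCase;
    }
    return $filter;
}

// A pre-defined Elasticsearch list, referenced by name.
$predefined = sketchStopFilter( '_french_' );
// An explicit list of stop words, matched case-insensitively.
$custom = sketchStopFilter( [ 'the', 'of' ], true );
print_r( $predefined );
print_r( $custom );
```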
CirrusSearch\Maintenance\AnalyzerBuilder::withAsciifoldingPreserve ( )
CirrusSearch\Maintenance\AnalyzerBuilder::withCharFilters ( array $charFilters )
string[] | $charFilters
CirrusSearch\Maintenance\AnalyzerBuilder::withCharMap ( array $mappings, string $name = null )
string[] | $mappings
string|null | $name
CirrusSearch\Maintenance\AnalyzerBuilder::withDecimalDigit ( )
CirrusSearch\Maintenance\AnalyzerBuilder::withElision ( array $articles, bool $articleCase = true )
string[] | $articles | "articles" to be elided
bool | $articleCase | whether elision is case insensitive
CirrusSearch\Maintenance\AnalyzerBuilder::withExtraStemmer ( string $lang, string $name = null )
string | $lang
string|null | $name
CirrusSearch\Maintenance\AnalyzerBuilder::withExtraStop ( $stop, string $name, $beforeFilter = self::APPEND, bool $ignoreCase = null )
mixed | $stop | pre-defined list like _french_ or an array of stopwords
string | $name
mixed | $beforeFilter | filter to insert extra stop before
bool|null | $ignoreCase
CirrusSearch\Maintenance\AnalyzerBuilder::withFilters ( array $filters )
string[] | $filters
CirrusSearch\Maintenance\AnalyzerBuilder::withLangLowercase ( )
CirrusSearch\Maintenance\AnalyzerBuilder::withLightStemmer ( )
CirrusSearch\Maintenance\AnalyzerBuilder::withNumberCharFilter ( int $langZero, string $name = null )
int | $langZero
string|null | $name
CirrusSearch\Maintenance\AnalyzerBuilder::withRemoveEmpty ( )
CirrusSearch\Maintenance\AnalyzerBuilder::withStemmerOverride ( $rules, string $name = null )
Rules can be a single rule string, or an array of rules.
mixed | $rules | stemmer override rules
string|null | $name
CirrusSearch\Maintenance\AnalyzerBuilder::withStop ( $stop, string $name = null )
mixed | $stop | pre-defined list like _french_ or an array of stopwords
string|null | $name
CirrusSearch\Maintenance\AnalyzerBuilder::withTokenizer ( string $tokenizer )
string | $tokenizer
CirrusSearch\Maintenance\AnalyzerBuilder::withUnpackedAnalyzer ( )