CirrusSearch
Elasticsearch-powered search for MediaWiki
CirrusSearch\Maintenance\AnalyzerBuilder Class Reference

Builds one Elasticsearch analyzer to add to an analysis config array.

Public Member Functions

 __construct (string $langName, string $analyzerName='text')
 
 withCharFilters (array $charFilters)
 
 withTokenizer (string $tokenizer)
 
 withFilters (array $filters)
 
 withCharMap (array $mappings, string $name=null, bool $limited=false)
 
 withLimitedCharMap (array $mappings, string $name=null)
 
 withReversedNumberCharFilter (int $langZero, string $name=null)
 
 withNumberCharFilter (int $langZero, string $name=null, bool $reversed=false)
 
 withElision (array $articles, bool $articleCase=true)
 
 withLangLowercase (string $name=null)
 
 withStop ( $stop, string $name=null)
 
 withExtraStop ( $stop, string $name, $beforeFilter=self::APPEND, bool $ignoreCase=null)
 
 withExtraStemmer (string $lang, string $name=null)
 
 withStemmerOverride ( $rules, string $name=null)
 Rules can be a single rule string, or an array of rules.
 
 withUnpackedAnalyzer ()
 
 insertFiltersBefore ( $beforeFilter, array $filterList)
 
 appendFilters (array $filterList)
 
 prependFilters (array $filterList)
 
 withLightStemmer ()
 
 omitStemmer ()
 
 withAsciifoldingPreserve ()
 
 omitAsciifolding ()
 
 withRemoveEmpty ()
 
 withDecimalDigit ()
 
 build (array $config)
 Create a basic analyzer with support for various common options.
 

Static Public Member Functions

static patternFilter (string $pat, string $repl='')
 Create a pattern_replace filter/char_filter with the pattern and replacement provided.
 
static mappingCharFilter (array $mappings, bool $limited)
 Create a mapping or limited_mapping character filter with the mappings provided.
 
static numberCharFilter (int $langZero, bool $reversed=false)
 Create a character filter that maps non-Arabic digits (e.g., ០-៩ or ０-９) to Arabic digits (0-9).
 
static elisionFilter (array $articles, bool $case=true)
 Create an elision filter with the "articles" provided; $case determines whether matching is case insensitive (true, the default) or case sensitive (false).
 
static stopFilterFromList ( $stopwords, bool $ignoreCase=null)
 Create a stop word filter with the provided config.
 
static stemmerFilter (string $stemmer)
 Create a stemmer filter with the provided config.
 

Public Attributes

const APPEND = 1
 Indicate that filters should be automatically appended or prepended, rather than inserted before a given filter.
 
const PREPEND = 2
 

Detailed Description

Builds one Elasticsearch analyzer to add to an analysis config array.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. http://www.gnu.org/copyleft/gpl.html

Constructor & Destructor Documentation

◆ __construct()

CirrusSearch\Maintenance\AnalyzerBuilder::__construct ( string $langName, string $analyzerName = 'text' )
Parameters
string $langName
string $analyzerName: defaults to 'text'

Member Function Documentation

◆ appendFilters()

CirrusSearch\Maintenance\AnalyzerBuilder::appendFilters ( array $filterList)
Parameters
string[] $filterList: list of additional filters to append
Returns
self

◆ build()

CirrusSearch\Maintenance\AnalyzerBuilder::build ( array $config)

Create a basic analyzer with support for various common options.

Can create various filters and character filters as specified. None are automatically added to the char_filter or filter list because the best order for these basic analyzers depends on the details of various third-party plugins.

type: custom
tokenizer: standard
char_filter: as per $this->charFilters
filter: as per $this->filters

Parameters
mixed[] $config: to be updated
Returns
mixed[] updated config
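
As a rough illustration of the analyzer shape listed above, the following sketch assembles a custom analyzer entry by hand. The function name sketchBuildAnalyzer and the 'analyzer' key used inside $config are assumptions made for this example; it is not the source of build().

```php
<?php
// Hypothetical helper: mirrors the documented shape (type, tokenizer,
// char_filter, filter). It is not the actual AnalyzerBuilder::build() code.
function sketchBuildAnalyzer(
    array $config,
    string $analyzerName,
    array $charFilters,
    array $filters,
    string $tokenizer = 'standard'
): array {
    // Assumption: analyzers live under an 'analyzer' key of the config array.
    $config['analyzer'][$analyzerName] = [
        'type' => 'custom',
        'tokenizer' => $tokenizer,
        'char_filter' => $charFilters,
        'filter' => $filters,
    ];
    return $config;
}

// The filter names here are placeholders for illustration only.
$config = sketchBuildAnalyzer([], 'text', ['my_char_map'], ['lowercase', 'my_stop']);
```

The filter order is passed in explicitly, which matches the note above that nothing is added to the lists automatically.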

◆ elisionFilter()

static CirrusSearch\Maintenance\AnalyzerBuilder::elisionFilter ( array $articles, bool $case = true )

Create an elision filter with the "articles" provided; $case determines whether matching is case insensitive (true, the default) or case sensitive (false).

Parameters
string[] $articles
bool $case
Returns
mixed[] token filter
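
For orientation, an Elasticsearch elision token filter definition has roughly the shape below. This is a minimal sketch: the function name sketchElisionFilter is hypothetical, and mapping $case onto the articles_case setting is an assumption about how the real method behaves.

```php
<?php
// Hypothetical stand-in, not the class's own implementation.
function sketchElisionFilter(array $articles, bool $case = true): array {
    return [
        'type' => 'elision',
        // In Elasticsearch, articles_case = true makes matching case insensitive.
        'articles_case' => $case,
        'articles' => $articles,
    ];
}

// French-style elided articles, e.g. "l'avion" is indexed as "avion".
$filter = sketchElisionFilter(['l', 'd', 'qu']);
```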

◆ insertFiltersBefore()

CirrusSearch\Maintenance\AnalyzerBuilder::insertFiltersBefore ( $beforeFilter, array $filterList )
Parameters
mixed $beforeFilter: specific filter to insert $filterList before; use APPEND to always add to the end of the list, or PREPEND to always add to the beginning
string[] $filterList: list of additional filters to insert
Returns
self
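
The insertion rule can be sketched on a plain list of filter names. The constants and the function below are hypothetical stand-ins, and appending when the target filter is missing is an assumption; the documentation does not say what the real method does in that case.

```php
<?php
// Hypothetical stand-ins for AnalyzerBuilder::APPEND and ::PREPEND.
const SKETCH_APPEND = 1;
const SKETCH_PREPEND = 2;

function sketchInsertFiltersBefore(array $filters, $beforeFilter, array $filterList): array {
    if ($beforeFilter === SKETCH_APPEND) {
        return array_merge($filters, $filterList);
    }
    if ($beforeFilter === SKETCH_PREPEND) {
        return array_merge($filterList, $filters);
    }
    $pos = array_search($beforeFilter, $filters, true);
    if ($pos === false) {
        // Assumption: fall back to appending when the target filter is absent.
        return array_merge($filters, $filterList);
    }
    // Insert the new filters in front of the target without removing anything.
    array_splice($filters, $pos, 0, $filterList);
    return $filters;
}

$result = sketchInsertFiltersBefore(['lowercase', 'stop', 'stemmer'], 'stop', ['extra_stop']);
```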

◆ mappingCharFilter()

static CirrusSearch\Maintenance\AnalyzerBuilder::mappingCharFilter ( array $mappings, bool $limited )

Create a mapping or limited_mapping character filter with the mappings provided.

Parameters
string[] $mappings
bool $limited
Returns
mixed[] character filter
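
A character filter definition of this kind might look like the sketch below. The function name is hypothetical, and the array keys follow the Elasticsearch mapping character filter; whether the real method emits exactly this array is an assumption.

```php
<?php
// Hypothetical stand-in, not the class's own implementation.
function sketchMappingCharFilter(array $mappings, bool $limited): array {
    return [
        // $limited selects the limited_mapping type named in the description.
        'type' => $limited ? 'limited_mapping' : 'mapping',
        'mappings' => $mappings,
    ];
}

// Each mapping is a "source=>target" string.
$filter = sketchMappingCharFilter(['ß=>ss'], false);
```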

◆ numberCharFilter()

static CirrusSearch\Maintenance\AnalyzerBuilder::numberCharFilter ( int $langZero, bool $reversed = false )

Create a character filter that maps non-Arabic digits (e.g., ០-៩ or ０-９) to Arabic digits (0-9).

Since the digits of a script usually occupy ten consecutive code points, only the starting digit (the one equal to 0) is needed.

Optionally reverse the mapping so that it goes from Arabic to non-Arabic digits. For example, the ICU tokenizer works better on Thai digits in Thai text than it does on Arabic digits.

Parameters
int $langZero: code point of the script's digit zero
bool $reversed: reverse the mapping from Arabic to non-Arabic
Returns
mixed[] character filter
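
Because only the zero digit is supplied, the ten mappings can be generated in a loop. The sketch below shows one way to do that; the function name is hypothetical, and writing the script digit as a \uXXXX escape assumes the mapping character filter accepts such escapes.

```php
<?php
// Hypothetical helper: builds the ten "source=>target" mapping strings.
function sketchNumberMappings(int $langZero, bool $reversed = false): array {
    $mappings = [];
    for ($i = 0; $i <= 9; $i++) {
        // Script digit i sits i code points after the script's zero.
        $langDigit = sprintf('\\u%04x', $langZero + $i);
        $mappings[] = $reversed
            ? $i . '=>' . $langDigit
            : $langDigit . '=>' . $i;
    }
    return $mappings;
}

$khmer = sketchNumberMappings(0x17e0);      // Khmer digits to 0-9
$thai = sketchNumberMappings(0x0e50, true); // 0-9 to Thai digits
```

The resulting list could then be passed as the mappings of a (non-limited) mapping character filter.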

◆ omitAsciifolding()

CirrusSearch\Maintenance\AnalyzerBuilder::omitAsciifolding ( )
Returns
self

◆ omitStemmer()

CirrusSearch\Maintenance\AnalyzerBuilder::omitStemmer ( )
Returns
self

◆ patternFilter()

static CirrusSearch\Maintenance\AnalyzerBuilder::patternFilter ( string $pat, string $repl = '' )

Create a pattern_replace filter/char_filter with the pattern and replacement provided.

Parameters
string $pat
string $repl
Returns
mixed[] filter
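
In Elasticsearch, a pattern_replace definition takes a pattern and a replacement, and the same shape serves as either a token filter or a character filter. A minimal sketch with a hypothetical function name, assuming the real method produces an equivalent array:

```php
<?php
// Hypothetical stand-in, not the class's own implementation.
function sketchPatternFilter(string $pat, string $repl = ''): array {
    return [
        'type' => 'pattern_replace',
        'pattern' => $pat,
        'replacement' => $repl,
    ];
}

// With the default empty replacement, matches are simply deleted.
$stripDashes = sketchPatternFilter('-+');
```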

◆ prependFilters()

CirrusSearch\Maintenance\AnalyzerBuilder::prependFilters ( array $filterList)
Parameters
string[] $filterList: list of additional filters to prepend
Returns
self

◆ stemmerFilter()

static CirrusSearch\Maintenance\AnalyzerBuilder::stemmerFilter ( string $stemmer)

Create a stemmer filter with the provided config.

Parameters
string $stemmer
Returns
mixed[] token filter
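
An Elasticsearch stemmer token filter is selected by language name. The sketch below shows that shape; the function name is hypothetical, and passing $stemmer through as the language setting is an assumption about the real method.

```php
<?php
// Hypothetical stand-in, not the class's own implementation.
function sketchStemmerFilter(string $stemmer): array {
    return [
        'type' => 'stemmer',
        'language' => $stemmer,
    ];
}

$filter = sketchStemmerFilter('light_french');
```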

◆ stopFilterFromList()

static CirrusSearch\Maintenance\AnalyzerBuilder::stopFilterFromList ( $stopwords, bool $ignoreCase = null )

Create a stop word filter with the provided config.

The config can be an array of stop words, or a string like _french_ that refers to a pre-defined list.

Parameters
mixed $stopwords
bool|null $ignoreCase
Returns
mixed[] token filter
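
The two accepted forms of stop word config can be sketched as follows. Elasticsearch names its predefined lists with surrounding underscores, such as _french_. The function name is hypothetical, and leaving out ignore_case when $ignoreCase is null is an assumption about the real method.

```php
<?php
// Hypothetical stand-in, not the class's own implementation.
function sketchStopFilter($stopwords, ?bool $ignoreCase = null): array {
    $filter = [
        'type' => 'stop',
        'stopwords' => $stopwords,
    ];
    // Assumption: null means "use the Elasticsearch default", so omit the key.
    if ($ignoreCase !== null) {
        $filter['ignore_case'] = $ignoreCase;
    }
    return $filter;
}

$predefined = sketchStopFilter('_french_');
$custom = sketchStopFilter(['le', 'la', 'les'], true);
```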

◆ withAsciifoldingPreserve()

CirrusSearch\Maintenance\AnalyzerBuilder::withAsciifoldingPreserve ( )
Returns
self

◆ withCharFilters()

CirrusSearch\Maintenance\AnalyzerBuilder::withCharFilters ( array $charFilters)
Parameters
string[] $charFilters
Returns
self

◆ withCharMap()

CirrusSearch\Maintenance\AnalyzerBuilder::withCharMap ( array $mappings, string $name = null, bool $limited = false )
Parameters
string[] $mappings
string|null $name
bool $limited
Returns
self

◆ withDecimalDigit()

CirrusSearch\Maintenance\AnalyzerBuilder::withDecimalDigit ( )
Returns
self

◆ withElision()

CirrusSearch\Maintenance\AnalyzerBuilder::withElision ( array $articles, bool $articleCase = true )
Parameters
string[] $articles: "articles" to be elided
bool $articleCase: whether elision is case insensitive
Returns
self

◆ withExtraStemmer()

CirrusSearch\Maintenance\AnalyzerBuilder::withExtraStemmer ( string $lang, string $name = null )
Parameters
string $lang
string|null $name
Returns
self

◆ withExtraStop()

CirrusSearch\Maintenance\AnalyzerBuilder::withExtraStop ( $stop, string $name, $beforeFilter = self::APPEND, bool $ignoreCase = null )
Parameters
mixed $stop: pre-defined list like _french_ or an array of stopwords
string $name
mixed $beforeFilter: filter to insert the extra stop filter before
bool|null $ignoreCase
Returns
self

◆ withFilters()

CirrusSearch\Maintenance\AnalyzerBuilder::withFilters ( array $filters)
Parameters
string[] $filters
Returns
self

◆ withLangLowercase()

CirrusSearch\Maintenance\AnalyzerBuilder::withLangLowercase ( string $name = null)
Parameters
string|null $name
Returns
self

◆ withLightStemmer()

CirrusSearch\Maintenance\AnalyzerBuilder::withLightStemmer ( )
Returns
self

◆ withLimitedCharMap()

CirrusSearch\Maintenance\AnalyzerBuilder::withLimitedCharMap ( array $mappings, string $name = null )
Parameters
string[] $mappings
string|null $name
Returns
self

◆ withNumberCharFilter()

CirrusSearch\Maintenance\AnalyzerBuilder::withNumberCharFilter ( int $langZero, string $name = null, bool $reversed = false )
Parameters
int $langZero
string|null $name
bool $reversed: reverse the mapping from Arabic to non-Arabic
Returns
self

◆ withRemoveEmpty()

CirrusSearch\Maintenance\AnalyzerBuilder::withRemoveEmpty ( )
Returns
self

◆ withReversedNumberCharFilter()

CirrusSearch\Maintenance\AnalyzerBuilder::withReversedNumberCharFilter ( int $langZero, string $name = null )
Parameters
int $langZero
string|null $name
Returns
self

◆ withStemmerOverride()

CirrusSearch\Maintenance\AnalyzerBuilder::withStemmerOverride ( $rules, string $name = null )

Rules can be a single rule string, or an array of rules.

Parameters
mixed $rules: stemmer override rules
string|null $name
Returns
self
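
Both accepted forms of $rules can be illustrated with an Elasticsearch stemmer_override definition. The function name below is hypothetical, and wrapping a single rule string in an array is an assumption made for the sketch, not documented behaviour of the real method.

```php
<?php
// Hypothetical stand-in, not the class's own implementation.
function sketchStemmerOverride($rules): array {
    return [
        'type' => 'stemmer_override',
        // Assumption: a lone rule string is wrapped into a one-element list.
        'rules' => (array)$rules,
    ];
}

// Each rule has the form "token1[, ..., tokenN] => override".
$single = sketchStemmerOverride('mice => mouse');
$many = sketchStemmerOverride(['mice => mouse', 'running, runs => run']);
```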

◆ withStop()

CirrusSearch\Maintenance\AnalyzerBuilder::withStop ( $stop, string $name = null )
Parameters
mixed $stop: pre-defined list like _french_ or an array of stopwords
string|null $name
Returns
self

◆ withTokenizer()

CirrusSearch\Maintenance\AnalyzerBuilder::withTokenizer ( string $tokenizer)
Parameters
string $tokenizer
Returns
self

◆ withUnpackedAnalyzer()

CirrusSearch\Maintenance\AnalyzerBuilder::withUnpackedAnalyzer ( )
Returns
self

The documentation for this class was generated from the following file: