CirrusSearch
Elasticsearch-powered search for MediaWiki
Builds Elasticsearch analysis config arrays.
Public Member Functions | |
__construct ( $langCode, array $plugins, ?SearchConfig $config=null, ?CirrusSearchHookRunner $cirrusSearchHookRunner=null) | |
shouldActivateIcuFolding ( $language) | |
Determine if asciifolding should be upgraded to icu_folding, or icu_folding should be stripped. | |
shouldActivateIcuTokenization ( $language) | |
Determine if the icu_tokenizer can replace the standard tokenizer for this language. | |
buildConfig ( $language=null) | |
Build the analysis config. | |
buildSimilarityConfig () | |
enableICUTokenizer (array $config) | |
Replace the standard tokenizer with icu_tokenizer. | |
standardTokenizerOnlyCleanup (array $config) | |
Replace STANDARD_TOKENIZER_ONLY with the actual standard tokenizer. | |
disableLimitedMappings (array $config) | |
Replace limited_mappings with mappings if limited_mapping is unavailable. | |
enableICUFolding (array $config, $language) | |
Activate ICU folding instead of asciifolding. | |
getDefaultTextAnalyzerType ( $language) | |
Pick the appropriate default analyzer based on the language. | |
buildLanguageConfigs (array &$config, array $languages, array $analyzers) | |
Create per-language configs for specific analyzers, separating and namespacing filters that differ between languages. | |
isIcuAvailable () | |
isTextifyAvailable () | |
enableGlobalCustomFilters (array $config, string $language) | |
Update languages with global custom filters (e.g., homoglyph & nnbsp filters). | |
Public Attributes | |
const | VERSION = '0.12' |
Version number for the core analysis. | |
$globalCustomFilters | |
Protected Member Functions | |
addRemoveEmpty (array $config) | |
Add remove_empty as needed after icu_folding/preserve_original. | |
getICUSetFilter ( $language) | |
Return the list of chars to exclude from ICU folding. | |
getICUNormSetFilter ( $language) | |
Return the list of chars to exclude from ICU normalization. | |
Protected Attributes | |
$config | |
$defaultLanguage | |
Builds Elasticsearch analysis config arrays.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. http://www.gnu.org/copyleft/gpl.html
CirrusSearch\Maintenance\AnalysisConfigBuilder::__construct | ( | $langCode, | |
array | $plugins, | ||
?SearchConfig | $config = null, | ||
?CirrusSearchHookRunner | $cirrusSearchHookRunner = null ) |
string | $langCode | The language code to build config for |
string[] | $plugins | List of plugins installed in Elasticsearch |
SearchConfig | null | $config | |
CirrusSearchHookRunner | null | $cirrusSearchHookRunner |
CirrusSearch\Maintenance\AnalysisConfigBuilder::addRemoveEmpty | ( | array | $config | ) |
protected |
Add remove_empty as needed after icu_folding/preserve_original.
mixed[] | $config |
CirrusSearch\Maintenance\AnalysisConfigBuilder::buildConfig | ( | $language = null | ) |
Build the analysis config.
string | null | $language | Config language |
Reimplemented in CirrusSearch\Maintenance\SuggesterAnalysisConfigBuilder.
CirrusSearch\Maintenance\AnalysisConfigBuilder::buildLanguageConfigs | ( | array & | $config, |
array | $languages, | ||
array | $analyzers ) |
Create per-language configs for specific analyzers, separating and namespacing filters that differ between languages.
array | &$config | Existing config, will be modified |
string[] | $languages | List of languages to process |
string[] | $analyzers | List of analyzers to process |
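To make the idea of namespacing concrete, here is a minimal sketch of one way filters that differ between languages could be kept apart. It is not the CirrusSearch implementation: the function name sketchNamespaceFilters, the input shape, and the "{language}_{name}" naming scheme are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical sketch: merge per-language filter definitions into one set,
 * prefixing a filter name with its language code only when languages
 * disagree on the definition behind that name.
 *
 * @param array $filtersByLanguage language code => filter name => definition
 * @return array merged filter name => definition
 */
function sketchNamespaceFilters( array $filtersByLanguage ): array {
	// Collect every distinct definition seen for each filter name.
	$seen = [];
	foreach ( $filtersByLanguage as $filters ) {
		foreach ( $filters as $name => $definition ) {
			$seen[$name][ serialize( $definition ) ] = true;
		}
	}

	$merged = [];
	foreach ( $filtersByLanguage as $lang => $filters ) {
		foreach ( $filters as $name => $definition ) {
			// Identical everywhere: keep the plain name. Otherwise namespace it.
			$shared = count( $seen[$name] ) === 1;
			$merged[ $shared ? $name : "{$lang}_{$name}" ] = $definition;
		}
	}
	return $merged;
}

$merged = sketchNamespaceFilters( [
	'en' => [
		'lowercase' => [ 'type' => 'lowercase' ],
		'stop' => [ 'type' => 'stop', 'stopwords' => '_english_' ],
	],
	'fr' => [
		'lowercase' => [ 'type' => 'lowercase' ],
		'stop' => [ 'type' => 'stop', 'stopwords' => '_french_' ],
	],
] );
```

In this sketch the shared lowercase filter keeps its name, while the two stop filters become en_stop and fr_stop because their definitions differ.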
CirrusSearch\Maintenance\AnalysisConfigBuilder::buildSimilarityConfig | ( | ) |
CirrusSearch\Maintenance\AnalysisConfigBuilder::disableLimitedMappings | ( | array | $config | ) |
Replace limited_mappings with mappings if limited_mapping is unavailable.
mixed[] | $config |
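As an illustration of the kind of rewrite described here, the following sketch swaps one filter type for another across a config array. It assumes that limited_mapping and mapping are character filter types stored under a 'char_filter' key; the function name sketchDisableLimitedMappings and the sample filter are hypothetical, and the real method may differ.

```php
<?php
/**
 * Hypothetical sketch: downgrade every 'limited_mapping' character filter
 * to the built-in 'mapping' type, leaving its mappings untouched.
 */
function sketchDisableLimitedMappings( array $config ): array {
	foreach ( $config['char_filter'] ?? [] as $name => $filter ) {
		if ( ( $filter['type'] ?? null ) === 'limited_mapping' ) {
			$config['char_filter'][$name]['type'] = 'mapping';
		}
	}
	return $config;
}

$limitedConfig = [
	'char_filter' => [
		// Sample filter name and mapping chosen for illustration only.
		'sample_chars' => [ 'type' => 'limited_mapping', 'mappings' => [ '`=>\'' ] ],
		'strip_html' => [ 'type' => 'html_strip' ],
	],
];
$mappedConfig = sketchDisableLimitedMappings( $limitedConfig );
```

The array is passed and returned by value, matching the (array $config) signature documented above, so the caller's original config is not modified.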
CirrusSearch\Maintenance\AnalysisConfigBuilder::enableGlobalCustomFilters | ( | array | $config, |
string | $language ) |
Update languages with global custom filters (e.g., homoglyph & nnbsp filters).
mixed[] | $config | |
string | $language | language to add plugin to |
CirrusSearch\Maintenance\AnalysisConfigBuilder::enableICUFolding | ( | array | $config, |
$language ) |
Activate ICU folding instead of asciifolding.
mixed[] | $config | |
string | $language | Config language |
CirrusSearch\Maintenance\AnalysisConfigBuilder::enableICUTokenizer | ( | array | $config | ) |
Replace the standard tokenizer with icu_tokenizer.
mixed[] | $config |
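A rough sketch of what such a replacement might look like on an Elasticsearch analysis array follows. It assumes analyzers live under an 'analyzer' key and name their tokenizer in a 'tokenizer' field; sketchEnableIcuTokenizer is a hypothetical stand-in, not the method documented here.

```php
<?php
/**
 * Hypothetical sketch: point every analyzer that uses the 'standard'
 * tokenizer at 'icu_tokenizer' instead. Other tokenizers are left alone.
 */
function sketchEnableIcuTokenizer( array $config ): array {
	foreach ( $config['analyzer'] ?? [] as $name => $analyzer ) {
		if ( ( $analyzer['tokenizer'] ?? null ) === 'standard' ) {
			$config['analyzer'][$name]['tokenizer'] = 'icu_tokenizer';
		}
	}
	return $config;
}

$icuInput = [
	'analyzer' => [
		'text' => [ 'type' => 'custom', 'tokenizer' => 'standard', 'filter' => [ 'lowercase' ] ],
		'keyword' => [ 'type' => 'custom', 'tokenizer' => 'keyword' ],
	],
];
$icuOutput = sketchEnableIcuTokenizer( $icuInput );
```

A caller would typically apply a step like this only when the ICU plugin is installed and shouldActivateIcuTokenization() approves the language, though how the real class sequences those checks is not stated in this reference.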
CirrusSearch\Maintenance\AnalysisConfigBuilder::getDefaultTextAnalyzerType | ( | $language | ) |
Pick the appropriate default analyzer based on the language.
Rather than thinking of this as per-language customization, think of it as an effort to pick a reasonable default in case CirrusSearch isn't customized for the language.
string | $language | Config language |
CirrusSearch\Maintenance\AnalysisConfigBuilder::getICUNormSetFilter | ( | $language | ) |
protected |
Return the list of chars to exclude from ICU normalization.
string | $language | Config language |
CirrusSearch\Maintenance\AnalysisConfigBuilder::getICUSetFilter | ( | $language | ) |
protected |
Return the list of chars to exclude from ICU folding.
string | $language | Config language |
CirrusSearch\Maintenance\AnalysisConfigBuilder::isIcuAvailable | ( | ) |
CirrusSearch\Maintenance\AnalysisConfigBuilder::isTextifyAvailable | ( | ) |
CirrusSearch\Maintenance\AnalysisConfigBuilder::shouldActivateIcuFolding | ( | $language | ) |
Determine if asciifolding should be upgraded to icu_folding, or icu_folding should be stripped.
string | $language | Config language |
CirrusSearch\Maintenance\AnalysisConfigBuilder::shouldActivateIcuTokenization | ( | $language | ) |
Determine if the icu_tokenizer can replace the standard tokenizer for this language.
string | $language | Config language |
CirrusSearch\Maintenance\AnalysisConfigBuilder::standardTokenizerOnlyCleanup | ( | array | $config | ) |
Replace STANDARD_TOKENIZER_ONLY with the actual standard tokenizer.
mixed[] | $config |
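The placeholder cleanup can be pictured with the sketch below. The literal string behind STANDARD_TOKENIZER_ONLY is not given in this reference, so the constant value used here is an assumption, as is the function name sketchStandardTokenizerOnlyCleanup. The sketch also assumes the placeholder appears in an analyzer's 'tokenizer' field.

```php
<?php
// Assumed placeholder value; the real constant's string is not documented here.
const SKETCH_STANDARD_TOKENIZER_ONLY = 'STANDARD_TOKENIZER_ONLY';

/**
 * Hypothetical sketch: resolve the placeholder tokenizer name to the
 * built-in 'standard' tokenizer so the config is valid for Elasticsearch.
 */
function sketchStandardTokenizerOnlyCleanup( array $config ): array {
	foreach ( $config['analyzer'] ?? [] as $name => $analyzer ) {
		if ( ( $analyzer['tokenizer'] ?? null ) === SKETCH_STANDARD_TOKENIZER_ONLY ) {
			$config['analyzer'][$name]['tokenizer'] = 'standard';
		}
	}
	return $config;
}

$placeholderInput = [
	'analyzer' => [
		'pinned' => [ 'type' => 'custom', 'tokenizer' => SKETCH_STANDARD_TOKENIZER_ONLY ],
		'text' => [ 'type' => 'custom', 'tokenizer' => 'icu_tokenizer' ],
	],
];
$placeholderOutput = sketchStandardTokenizerOnlyCleanup( $placeholderInput );
```

One plausible reason for such a placeholder is to let some analyzers keep the standard tokenizer while enableICUTokenizer() rewrites the rest; that motivation is an inference from the method names, not something this page confirms.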
const CirrusSearch\Maintenance\AnalysisConfigBuilder::VERSION = '0.12' |
Version number for the core analysis.
Increment the major version when the analysis changes in an incompatible way, and the minor version when it changes in a compatible way.
You may also need to increment MetaStoreIndex::METASTORE_VERSION manually.
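The major/minor convention can be illustrated with a small comparison helper. This is a sketch under assumptions: sketchCompareAnalysisVersions and its return labels are hypothetical, and nothing here reflects how CirrusSearch itself reads or compares the constant.

```php
<?php
/**
 * Hypothetical helper: split a "major.minor" version string into two ints.
 * A missing minor part is treated as 0.
 */
function sketchParseVersion( string $version ): array {
	$parts = array_pad( explode( '.', $version, 2 ), 2, '0' );
	return [ (int)$parts[0], (int)$parts[1] ];
}

/**
 * Hypothetical sketch: classify the difference between two analysis versions.
 * A different major number signals an incompatible change; a different
 * minor number signals a compatible one.
 */
function sketchCompareAnalysisVersions( string $stored, string $current ): string {
	[ $storedMajor, $storedMinor ] = sketchParseVersion( $stored );
	[ $currentMajor, $currentMinor ] = sketchParseVersion( $current );

	if ( $storedMajor !== $currentMajor ) {
		return 'incompatible';
	}
	if ( $storedMinor !== $currentMinor ) {
		return 'compatible-change';
	}
	return 'unchanged';
}

$minorBump = sketchCompareAnalysisVersions( '0.9', '0.12' );
$majorBump = sketchCompareAnalysisVersions( '0.12', '1.0' );
$noChange = sketchCompareAnalysisVersions( '0.12', '0.12' );
```

The two parts are compared as integers rather than casting the whole string to a float, because a float comparison would rank '0.9' above '0.12' even though 12 is the later minor version.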