CirrusSearch
Elasticsearch-powered search for MediaWiki
Full text query parser that uses regular expressions to parse its tokens.
Public Member Functions
__construct (\CirrusSearch\Parser\KeywordRegistry $keywordRegistry, Escaper $escaper, $qmarkStripLevel, ParsedQueryClassifiersRepository $classifierRepository, NamespacePrefixParser $namespacePrefixParser, ?int $maxQueryLen)
parse (string $query)
Public Attributes
const QUERY_LEN_HARD_LIMIT = 4096
Full text query parser that uses regular expressions to parse its tokens.
Far from being a state-of-the-art parser, it detects most of its tokens using regular expressions and makes arbitrary decisions at tokenization time.
The tokenizer understands only a few token types:
PARSED_NODE is a type that groups:
The phrase delimiter " does not have its own token: it is handled as part of tokenization and is never exposed to the parser. The same applies to the negation prefixes (! and -), which are parsed at tokenization time.
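The idea of resolving phrases and negation prefixes during tokenization can be illustrated with a minimal sketch. This is not the CirrusSearch implementation: the function name sketchTokenize, the regular expression, the token type labels WORD and PHRASE, and the array shape are all hypothetical and chosen only for illustration. It assumes PHP 7.4 or later for PREG_UNMATCHED_AS_NULL.

```php
<?php
// Hypothetical sketch, NOT the CirrusSearch tokenizer.
// A single regex consumes the optional negation prefix and the phrase
// quotes, so neither ever shows up as a token of its own.

/**
 * @param string $query raw user query
 * @return array[] tokens shaped as [ 'type' => string, 'text' => string, 'negated' => bool ]
 */
function sketchTokenize( string $query ): array {
	// Optional prefix (! or -), then either a quoted phrase or a bare word.
	$pattern = '/(?<neg>[!-])?(?:"(?<phrase>[^"]*)"|(?<word>[^\s"]+))/u';
	preg_match_all( $pattern, $query, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL );

	$tokens = [];
	foreach ( $matches as $m ) {
		// Groups that did not take part in the match are null.
		$phrase = $m['phrase'] ?? null;
		$tokens[] = [
			// Illustrative labels only; the real token types differ.
			'type' => $phrase !== null ? 'PHRASE' : 'WORD',
			'text' => $phrase ?? $m['word'],
			'negated' => ( $m['neg'] ?? null ) !== null,
		];
	}
	return $tokens;
}

$tokens = sketchTokenize( '-foo "bar baz" !"qux" plain' );
```

In this sketch the consumer of $tokens only sees words and phrases carrying a negated flag. The cost of this design is that decisions such as "a leading - means negation" are fixed by the regular expression and cannot be revisited later with more context.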
NOTE that this parser is broken by design:
CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser::__construct ( \CirrusSearch\Parser\KeywordRegistry $keywordRegistry, Escaper $escaper, $qmarkStripLevel, ParsedQueryClassifiersRepository $classifierRepository, NamespacePrefixParser $namespacePrefixParser, ?int $maxQueryLen )
Parameters
\CirrusSearch\Parser\KeywordRegistry | $keywordRegistry |
Escaper | $escaper |
string | $qmarkStripLevel | Level of question mark stripping to apply, either "all", "break", or "final"
ParsedQueryClassifiersRepository | $classifierRepository |
NamespacePrefixParser | $namespacePrefixParser |
int|null | $maxQueryLen | Maximum length of the query, in characters
CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser::parse ( string $query )
Parameters
string | $query
Exceptions
SearchQueryParseException
Implements CirrusSearch\Parser\QueryParser.