CirrusSearch
Elasticsearch-powered search for MediaWiki
CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser Class Reference

Full text query parser that uses regex to parse its tokens.


Public Member Functions

 __construct (\CirrusSearch\Parser\KeywordRegistry $keywordRegistry, Escaper $escaper, $qmarkStripLevel, ParsedQueryClassifiersRepository $classifierRepository, NamespacePrefixParser $namespacePrefixParser, ?int $maxQueryLen)
 
 parse (string $query)
 

Public Attributes

const QUERY_LEN_HARD_LIMIT = 4096
 

Detailed Description

Full text query parser that uses regex to parse its tokens.

Far from being a state-of-the-art parser, it detects most of its tokens using regular expressions and makes arbitrary decisions at tokenization time.

The tokenizer understands a few token types:

  • WHITESPACE: all Unicode whitespace and control characters ([\pZ\pC]); the WHITESPACE token is ignored and never presented to the parser
  • EOF: dummy type used to mark the end of the string
  • BOOL_AND/BOOL_OR/BOOL_NOT: explicit boolean operators
  • PARSED_NODE: complex type (usually part of the query)
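
The token types listed above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the CirrusSearch implementation: the function names are hypothetical, the operators are assumed to be spelled as the uppercase words AND, OR and NOT, and every other run of characters is lumped into a single PARSED_NODE. Python's `re` module has no `\p{..}` classes, so `[\pZ\pC]` is approximated with `unicodedata` general categories.

```python
import unicodedata

# Hypothetical token type names mirroring the list above.
EOF, BOOL_AND, BOOL_OR, BOOL_NOT, PARSED_NODE = (
    "EOF", "BOOL_AND", "BOOL_OR", "BOOL_NOT", "PARSED_NODE")

# Assumption: explicit boolean operators are uppercase words.
BOOL_WORDS = {"AND": BOOL_AND, "OR": BOOL_OR, "NOT": BOOL_NOT}


def is_ignorable(ch):
    # Approximates [\pZ\pC]: general categories starting with Z (separators)
    # or C (control, format, unassigned, ...).
    return unicodedata.category(ch)[0] in ("Z", "C")


def tokenize(query):
    """Return a list of (type, text) pairs; WHITESPACE is never emitted."""
    tokens = []
    pos, n = 0, len(query)
    while pos < n:
        if is_ignorable(query[pos]):
            pos += 1  # WHITESPACE: skipped, never presented to the parser
            continue
        start = pos
        while pos < n and not is_ignorable(query[pos]):
            pos += 1
        text = query[start:pos]
        tokens.append((BOOL_WORDS.get(text, PARSED_NODE), text))
    tokens.append((EOF, ""))  # dummy token marking the end of the string
    return tokens
```

Calling `tokenize("foo AND bar")` in this sketch yields a PARSED_NODE, a BOOL_AND, another PARSED_NODE and a closing EOF token.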

PARSED_NODE is a type that groups:

  • Keywords
  • Phrase
  • Words
  • Wildcards/Prefix

A phrase does not have its own token: the double quote (") is handled during tokenization and is never exposed to the parser. The same goes for the negation prefixes (! and -), which are parsed at tokenization time.
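
As a rough illustration of handling quotes and negation prefixes inside the tokenizer, the following Python sketch matches one node at a time and returns it already classified, so a caller never sees a quote or prefix token. The regular expression, the `parse_node` helper and the dictionary layout are assumptions made for this example; the actual CirrusSearch patterns (including any phrase modifiers) are not reproduced here.

```python
import re

# Assumption: a phrase is a double-quoted run that may contain backslash
# escapes; anything else up to the next whitespace or quote is a word.
NODE_RE = re.compile(r'''
    (?P<neg>[-!])?                       # optional negation prefix
    (?:
        "(?P<phrase>(?:\\.|[^"\\])*)"    # quoted phrase, quotes not captured
      | (?P<word>[^\s"]+)                # bare word
    )
''', re.VERBOSE)


def parse_node(text, pos=0):
    """Return (node, next_pos) for the node starting at pos, or None."""
    m = NODE_RE.match(text, pos)
    if m is None:
        return None
    kind = "phrase" if m.group("phrase") is not None else "word"
    node = {
        "type": kind,
        "value": m.group(kind),
        "negated": m.group("neg") is not None,
    }
    return node, m.end()
```

The design point is that negation and quoting are resolved while the node is being recognized, which matches the description above but also means the parser cannot reinterpret them later.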

NOTE that this parser is broken by design:

  • no lexical context support: keywords are parsed first
  • no support for grouping (parentheses)

Constructor & Destructor Documentation

◆ __construct()

CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser::__construct ( \CirrusSearch\Parser\KeywordRegistry $keywordRegistry,
Escaper $escaper,
$qmarkStripLevel,
ParsedQueryClassifiersRepository $classifierRepository,
NamespacePrefixParser $namespacePrefixParser,
?int $maxQueryLen )
Parameters
\CirrusSearch\Parser\KeywordRegistry $keywordRegistry
Escaper $escaper
string $qmarkStripLevel: level of question mark stripping to apply, either "all", "break", or "final"
ParsedQueryClassifiersRepository $classifierRepository
NamespacePrefixParser $namespacePrefixParser
int | null $maxQueryLen: maximum length of the query in chars
See also
Util::stripQuestionMarks() for acceptable $qmarkStripLevel values

Member Function Documentation

◆ parse()

CirrusSearch\Parser\QueryStringRegex\QueryStringRegexParser::parse ( string $query)
Parameters
string $query
Returns
\CirrusSearch\Parser\AST\ParsedQuery
Exceptions
SearchQueryParseException

Implements CirrusSearch\Parser\QueryParser.
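
This page does not say how $maxQueryLen and QUERY_LEN_HARD_LIMIT interact, so the following Python sketch only shows one plausible shape for a length guard that raises a parse exception. It assumes that a null limit falls back to the hard limit, that a configured limit is capped by it, and that length is counted in characters; `check_query_length` and `SearchQueryParseError` are hypothetical names, not CirrusSearch APIs.

```python
QUERY_LEN_HARD_LIMIT = 4096  # value of the class constant documented above


class SearchQueryParseError(Exception):
    """Hypothetical stand-in for SearchQueryParseException."""


def check_query_length(query, max_query_len=None):
    """Return the query unchanged, or raise if it exceeds the limit."""
    # Assumption: no configured limit means the hard limit applies, and a
    # configured limit can never exceed the hard limit.
    if max_query_len is None:
        limit = QUERY_LEN_HARD_LIMIT
    else:
        limit = min(max_query_len, QUERY_LEN_HARD_LIMIT)
    if len(query) > limit:
        raise SearchQueryParseError(
            f"query is {len(query)} chars long, limit is {limit}")
    return query
```

A caller would wrap the parse call in a handler for this exception and report the error to the user instead of running the search.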


The documentation for this class was generated from the following file: