AhoCorasick
PHP implementation of the Aho-Corasick string search algorithm
|
Represents a finite state machine that can find all occurrences of a set of search keywords in a body of text. More...
Public Member Functions | |
__construct (array $searchKeywords) | |
Constructor. | |
getKeywords () | |
Accessor for the search keywords. | |
nextState ( $currentState, $inputChar) | |
Map the current state and input character to the next state. | |
searchIn ( $text) | |
Locate the search keywords in some text. | |
Protected Member Functions | |
computeYesTransitions () | |
Get the state transitions which the string-matching automaton shall make as it advances through input text. | |
computeNoTransitions () | |
Get the state transitions which the string-matching automaton shall make when a partial match proves false. | |
Protected Attributes | |
$searchKeywords = [] | |
$numStates = 1 | |
$outputs = [] | |
$noTransitions = [] | |
$yesTransitions = [] | |
Represents a finite state machine that can find all occurrences of a set of search keywords in a body of text.
The time it takes to construct the finite state machine is proportional to the sum of the lengths of the search keywords. Once constructed, the machine can locate all occurences of all search keywords in a body of text in a single pass, making exactly one state transition per input character.
This is an implementation of the Aho-Corasick string matching algorithm.
Alfred V. Aho and Margaret J. Corasick, "Efficient string matching: an aid to bibliographic search", CACM, 18(6):333-340, June 1975.
AhoCorasick\MultiStringMatcher::__construct | ( | array | $searchKeywords | ) |
Constructor.
string[] | $searchKeywords | The set of keywords to be matched. |
Reimplemented in AhoCorasick\MultiStringReplacer.
|
protected |
Get the state transitions which the string-matching automaton shall make as it advances through input text.
Constructs a directed tree with a root node which represents the initial state of the string-matching automaton and from which a path exists which spells out each search keyword.
AhoCorasick\MultiStringMatcher::getKeywords | ( | ) |
Accessor for the search keywords.
AhoCorasick\MultiStringMatcher::nextState | ( | $currentState, | |
$inputChar ) |
Map the current state and input character to the next state.
int | $currentState | The current state of the string-matching automaton. |
string | $inputChar | The character the string-matching automaton is currently processing. |
AhoCorasick\MultiStringMatcher::searchIn | ( | $text | ) |
Locate the search keywords in some text.
string | $text | The string to search in. |