AhoCorasick
PHP implementation of the Aho-Corasick string search algorithm
AhoCorasick is a PHP implementation of the Aho-Corasick string search algorithm, which is an efficient way of searching a body of text for multiple search keywords.
Here is how you use it:
The algorithm works by constructing a finite-state machine out of the set of search keywords. The time it takes to construct the finite-state machine is proportional to the sum of the lengths of the search keywords. Once constructed, the machine can locate all occurrences of all search keywords in any body of text in a single pass, making exactly one state transition per input character.
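To make the construction and the single-pass search concrete, here is a minimal, self-contained sketch of the algorithm in PHP. It is an illustration only and is not this library's code or API: the function names `buildAutomaton` and `searchText` are hypothetical. It also assumes the textbook formulation with failure links, so a mismatch may follow several failure links before consuming the next character; a machine that makes exactly one transition per character would precompute those fallbacks into its transition table.

```php
<?php
// Illustrative sketch only: NOT the AhoCorasick library's code or API.
// The function names below are hypothetical.

/** Build the automaton: trie transitions, failure links, and outputs. */
function buildAutomaton(array $keywords): array
{
    $goto = [[]];   // $goto[$state][$char] => next state; state 0 is the root
    $fail = [0];    // failure link for each state
    $out  = [[]];   // keywords recognized on entering each state

    // Phase 1: insert every keyword into the trie.
    foreach ($keywords as $keyword) {
        if ($keyword === '') {
            continue; // an empty keyword would match at every position
        }
        $state = 0;
        $len = strlen($keyword);
        for ($i = 0; $i < $len; $i++) {
            $ch = $keyword[$i];
            if (!isset($goto[$state][$ch])) {
                $goto[] = [];
                $fail[] = 0;
                $out[]  = [];
                $goto[$state][$ch] = count($goto) - 1;
            }
            $state = $goto[$state][$ch];
        }
        $out[$state][] = $keyword;
    }

    // Phase 2: breadth-first pass that computes failure links.
    // (array_shift keeps the sketch short; a real queue would be faster.)
    $queue = array_values($goto[0]); // depth-1 states fail to the root
    while ($queue) {
        $state = array_shift($queue);
        foreach ($goto[$state] as $ch => $next) {
            $queue[] = $next;
            // Find the longest proper suffix that is also a trie path.
            $f = $fail[$state];
            while ($f !== 0 && !isset($goto[$f][$ch])) {
                $f = $fail[$f];
            }
            $fail[$next] = $goto[$f][$ch] ?? 0;
            // Inherit keywords that end at the failure target.
            $out[$next] = array_merge($out[$next], $out[$fail[$next]]);
        }
    }

    return [$goto, $fail, $out];
}

/** Scan $text once and return a list of [byte offset, keyword] pairs. */
function searchText(array $automaton, string $text): array
{
    [$goto, $fail, $out] = $automaton;
    $matches = [];
    $state = 0;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        // On a mismatch, fall back along failure links.
        while ($state !== 0 && !isset($goto[$state][$ch])) {
            $state = $fail[$state];
        }
        $state = $goto[$state][$ch] ?? 0;
        foreach ($out[$state] as $keyword) {
            $matches[] = [$i - strlen($keyword) + 1, $keyword];
        }
    }
    return $matches;
}

$automaton = buildAutomaton(['ore', 'hell']);
$matches = searchText($automaton, 'She sells sea shells by the sea shore.');
// $matches is [[15, 'hell'], [34, 'ore']]
```

The sketch works on bytes rather than characters, so offsets into multibyte text are byte offsets. Building the automaton once and reusing it across many texts is what makes the approach pay off when the keyword set is large.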
The algorithm originates from "Efficient string matching: an aid to bibliographic search" (CACM, Volume 18, Issue 6, June 1975) by Alfred V. Aho and Margaret J. Corasick.
See also the definition and reference implementation on nist.gov.
If you are having issues, please let us know.
The project is licensed under the Apache license.