Wikibase
MediaWiki Wikibase extension
Hash based implementation of DedupeBag.
Public Member Functions
__construct ( $cutoff=5)
Constructs a new HashDedupeBag with the given cutoff value, which is the number of hash characters to use.
alreadySeen ( $hash, $namespace='')
Private Attributes
$bag
$cutoff
Hash based implementation of DedupeBag.
This implementation of DedupeBag operates like a rather lossy cache; it's implemented as a hash that just evicts old values when a collision occurs.
The idea for this implementation was taken mainly from blog posts:
The implementation of alreadySeen() works as follows:
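A minimal sketch of the behavior described above, not the Wikibase source. It assumes the key is the namespace followed by the first $cutoff characters of the hash, and that a collision simply overwrites the stored value. The class name SketchDedupeBag is hypothetical.

```php
<?php
// Hypothetical stand-in for illustration only; not the Wikibase source.
class SketchDedupeBag {
	/** @var string[] Maps a short key to the last full hash stored under it. */
	private $bag = [];
	/** @var int Number of leading hash characters used to build the key. */
	private $cutoff;

	public function __construct( $cutoff = 5 ) {
		$this->cutoff = $cutoff;
	}

	public function alreadySeen( $hash, $namespace = '' ) {
		// Assumption: the key is the namespace plus the first $cutoff characters of the hash.
		$key = $namespace . substr( $hash, 0, $this->cutoff );

		// A hit requires the full hash to match, so a different hash
		// stored under the same key is never reported as seen.
		if ( isset( $this->bag[$key] ) && $this->bag[$key] === $hash ) {
			return true;
		}

		// Miss or collision: overwrite (evict) whatever was stored under this key.
		$this->bag[$key] = $hash;
		return false;
	}
}

$bag = new SketchDedupeBag( 2 );
$first = $bag->alreadySeen( 'abc123' );  // nothing stored under "ab" yet
$second = $bag->alreadySeen( 'abc123' ); // same hash is stored under "ab"
$third = $bag->alreadySeen( 'abffff' );  // collides on "ab" and evicts 'abc123'
$fourth = $bag->alreadySeen( 'abc123' ); // was evicted, so this is a false negative
```

In this sketch the four calls return false, true, false and false. The last result shows the lossy behavior: the value was seen earlier, but a colliding hash has since replaced it.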
Wikibase\Repo\Rdf\HashDedupeBag::__construct ( $cutoff = 5 )
Constructs a new HashDedupeBag with the given cutoff value, which is the number of hash characters to use.
A larger number means fewer collisions (and so fewer false negatives), but a larger bag. The number can be read as an exponent applied to the size of the hash's alphabet: with a hex hash and $cutoff = 5, you'd get a maximum bag size of 16^5 = 1,048,576, and a collision probability of 16^-5 = 1/1,048,576.
int $cutoff
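The relationship between the cutoff and the number of distinct keys can be checked with a few lines of arithmetic. This is only an illustration: the variable names are made up, and the probability assumes two independent hashes with uniformly distributed characters.

```php
<?php
// Illustrative arithmetic only; the variable names are made up for this example.
$alphabetSize = 16; // a hexadecimal hash has 16 possible characters
$cutoff = 5;        // number of leading hash characters used

// Number of distinct key prefixes, which bounds the size of the bag.
$maxKeys = $alphabetSize ** $cutoff;

// Chance that two independent, uniformly distributed hashes share a prefix.
$collisionProbability = 1 / $maxKeys;

printf( "max keys: %d\n", $maxKeys );
printf( "collision probability: %.3e\n", $collisionProbability );
```

With these values $maxKeys is 1048576. Each extra character of cutoff multiplies the possible bag size by 16 and divides the collision probability by the same factor.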
Wikibase\Repo\Rdf\HashDedupeBag::alreadySeen ( $hash, $namespace = '' )
Returns true if the given combination of $hash and $namespace has been seen before, that is, if alreadySeen() has already been called on this HashDedupeBag with the same values for $hash and $namespace. Returning false is inconclusive: the hash and namespace may or may not have been seen before, so false negatives are possible. The probability of false negatives can be controlled using the $cutoff parameter passed to the constructor.
See the class level documentation for an explanation of the algorithm.
string $hash
string $namespace
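From the caller's side, a false result has to be treated as "not known to be seen", so the safe reaction is to process the value again. The sketch below shows such a loop. The anonymous class is a hypothetical bag that follows the same contract with an assumed cutoff of one character, chosen only to force a collision. It is not the Wikibase class, and the namespace 'V' is an arbitrary example value.

```php
<?php
// Hypothetical bag with the same contract as alreadySeen(); not the Wikibase class.
$bag = new class {
	private $slots = [];

	public function alreadySeen( $hash, $namespace = '' ) {
		// Assumption: one slot per namespace plus first hash character (cutoff of 1).
		$key = $namespace . substr( $hash, 0, 1 );
		if ( isset( $this->slots[$key] ) && $this->slots[$key] === $hash ) {
			return true;
		}
		// Store the hash, replacing any other hash that shared the slot.
		$this->slots[$key] = $hash;
		return false;
	}
};

$emitted = [];
foreach ( [ 'a1', 'a1', 'b7', 'a9', 'a1' ] as $hash ) {
	// false means "not known to be seen", so the value is emitted (possibly again).
	if ( !$bag->alreadySeen( $hash, 'V' ) ) {
		$emitted[] = $hash;
	}
}
```

Here $emitted ends up as a1, b7, a9, a1. The immediate repeat of a1 is suppressed, but the final a1 is emitted a second time because a9 evicted it in between. A false negative costs some redundant output and never drops a value.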
Implements Wikibase\Repo\Rdf\DedupeBag.