MediaWiki 1.33.0
LinkFilter Class Reference

Some functions to help implement an external link filter for spam control.

Static Public Member Functions

static getQueryConditions ($filterEntry, array $options=[])
 Return query conditions which will match the specified string.

static keepOneWildcard ($arr)
 Filters an array returned by makeLikeArray(), removing everything past first pattern placeholder.

static makeIndexes ($url)
 Converts a URL into a format for el_index.

static makeLikeArray ($filterEntry, $protocol='http://')
 Make an array to be used for calls to Database::buildLike(), which will match the specified string.

static matchEntry (Content $content, $filterEntry, $protocol='http://')
 Check whether $content contains a link to $filterEntry.

static supportsIDN ()
 Indicate whether LinkFilter IDN support is available.
 

Public Attributes

const VERSION = 1
 Increment this when makeIndexes output changes.
 

Static Private Member Functions

static indexifyHost ($host)
 Canonicalize a hostname for el_index.

static makeRegex ($filterEntry, $protocol)
 Builds a regex pattern for $filterEntry.
 

Detailed Description

Some functions to help implement an external link filter for spam control.

Todo:
Implement the filter. Currently these are just some functions to help maintenance/cleanupSpam.php remove links to a single specified domain. The next step is to implement functions for checking a given page against a big list of domains.

Another cool thing to do would be a web interface for fast spam removal.

Definition at line 34 of file LinkFilter.php.

Member Function Documentation

◆ getQueryConditions()

static LinkFilter::getQueryConditions ($filterEntry, array $options = [])
static

Return query conditions which will match the specified string.

There are several kinds of filter entry:

*.domain.com    -  Matches domain.com and www.domain.com
domain.com      -  Matches domain.com or domain.com/ but not www.domain.com
*.domain.com/x  -  Matches domain.com/xy or www.domain.com/xy. Also probably matches
                   domain.com/foobar/xy due to limitations of LIKE syntax.
domain.com/x    -  Matches domain.com/xy but not www.domain.com/xy
192.0.2.*       -  Matches any IP in 192.0.2.0/24. Can also have a path appended.
[2001:db8::*]   -  Matches any IP in 2001:db8::/112. Can also have a path appended.
[2001:db8:*]    -  Matches any IP in 2001:db8::/32. Can also have a path appended.
foo@domain.com  -  With protocol 'mailto:', matches the email address foo@domain.com.
*@domain.com    -  With protocol 'mailto:', matches any email address at domain.com, but
                   not subdomains like foo@mail.domain.com

Asterisks in any other location are considered invalid.

Since
1.33
Parameters
string $filterEntry  Filter entry, as described above
array $options  Options are:
  • protocol: (string) Protocol to query (default http://)
  • oneWildcard: (bool) Stop at the first wildcard (default false)
  • prefix: (string) Field prefix (default 'el'). The query will test fields '{$prefix}_index' and '{$prefix}_index_60'
  • db: (IDatabase|null) Database to use.
Returns
array|bool Conditions to be used for the query (to be ANDed) or false on error. To determine if the query is constant on the el_index_60 field, check whether key 'el_index_60' is set.
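
To make the host rules in the table of filter entries above concrete, here is a minimal standalone sketch. It does not call LinkFilter or touch a database: hostMatchesEntry() is a hypothetical helper that only models the documented behavior of the `domain.com` and `*.domain.com` forms. Paths, IP ranges, and mailto entries are left out.

```php
<?php
/**
 * Hypothetical helper, not part of LinkFilter: models the documented
 * host semantics of a filter entry without any database access.
 */
function hostMatchesEntry( string $entry, string $host ): bool {
	$entry = strtolower( $entry );
	$host = strtolower( $host );

	if ( substr( $entry, 0, 2 ) === '*.' ) {
		// "*.domain.com" covers the bare domain and any subdomain.
		$base = substr( $entry, 2 );
		$suffix = '.' . $base;
		return $host === $base
			|| substr( $host, -strlen( $suffix ) ) === $suffix;
	}

	// "domain.com" covers only that exact host.
	return $host === $entry;
}

var_dump( hostMatchesEntry( '*.domain.com', 'www.domain.com' ) ); // bool(true)
var_dump( hostMatchesEntry( 'domain.com', 'www.domain.com' ) );   // bool(false)
```

The real method expresses the same idea as LIKE conditions on the index fields, which is why the path forms can over-match as noted in the table.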

Definition at line 254 of file LinkFilter.php.

References $options, captcha-old\count, DB_REPLICA, keepOneWildcard(), makeLikeArray(), and wfGetDB().

Referenced by ApiQueryExternalLinks\execute(), DeleteSelfExternals\execute(), CleanupSpam\execute(), LinkSearchPage\getQueryInfo(), ApiQueryExtLinksUsage\run(), and LinkFilterTest\testGetQueryConditions().

◆ indexifyHost()

static LinkFilter::indexifyHost ($host)
static private

Canonicalize a hostname for el_index.

Parameters
string $host
Returns
string

Definition at line 97 of file LinkFilter.php.

References captcha-old\count, StringUtils\isUtf8(), IP\isValid(), and IP\sanitizeIP().

Referenced by makeIndexes(), and makeLikeArray().

◆ keepOneWildcard()

static LinkFilter::keepOneWildcard ($arr)
static

Filters an array returned by makeLikeArray(), removing everything past first pattern placeholder.

Note
You probably want self::getQueryConditions() instead.
Parameters
array $arr  Array to filter
Returns
array Filtered array
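
The truncation described above can be sketched in isolation. This is an illustration only: WildcardToken is a hypothetical stand-in for whatever placeholder object the array from makeLikeArray() contains, and keepFirstWildcard() is a hypothetical name, not the LinkFilter method.

```php
<?php
/** Hypothetical stand-in for the wildcard token a LIKE array may contain. */
class WildcardToken {
}

/**
 * Sketch of the truncation step: keep everything up to and including
 * the first wildcard token, drop the rest.
 */
function keepFirstWildcard( array $parts ): array {
	foreach ( array_values( $parts ) as $i => $part ) {
		if ( $part instanceof WildcardToken ) {
			return array_slice( $parts, 0, $i + 1 );
		}
	}
	// No wildcard present: nothing to trim.
	return $parts;
}

$any = new WildcardToken();
$trimmed = keepFirstWildcard( [ 'http://com.example.', $any, '/path', $any ] );
echo count( $trimmed ), "\n"; // 2
```

Stopping at the first wildcard leaves a pure prefix pattern, which is presumably the point of the oneWildcard option of getQueryConditions().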

Definition at line 385 of file LinkFilter.php.

References $value, and as.

Referenced by getQueryConditions(), and ApiQueryBase\prepareUrlQuerySearchString().

◆ makeIndexes()

static LinkFilter::makeIndexes ($url)
static

Converts a URL into a format for el_index.

Since
1.33
Parameters
string $url
Returns
string[] Usually one entry, but might be two in case of protocol-relative URLs. Empty array on error.
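
As a rough illustration of the return shape, the sketch below builds index strings from a URL. Several things here are assumptions rather than facts stated on this page: that the index stores the host with its labels reversed and a trailing dot, and that a protocol-relative URL expands to one http: and one https: entry. sketchUrlIndexes() is a hypothetical name, and ports, query strings, mailto links, IDN, and IP hosts are ignored.

```php
<?php
/**
 * Hypothetical sketch, not the LinkFilter implementation: build index
 * strings with the host labels reversed, e.g. "www.example.com" becomes
 * "com.example.www.". The reversed layout is an assumption here.
 */
function sketchUrlIndexes( string $url ): array {
	$bits = parse_url( $url );
	if ( $bits === false || !isset( $bits['host'] ) ) {
		return []; // unparsable input or no host component
	}

	$labels = array_reverse( explode( '.', strtolower( $bits['host'] ) ) );
	$index = '//' . implode( '.', $labels ) . '.' . ( $bits['path'] ?? '' );

	if ( isset( $bits['scheme'] ) ) {
		return [ $bits['scheme'] . ':' . $index ];
	}
	// Protocol-relative URL: one entry per assumed protocol.
	return [ 'http:' . $index, 'https:' . $index ];
}

print_r( sketchUrlIndexes( 'http://www.example.com/page' ) ); // one entry
print_r( sketchUrlIndexes( '//example.com/page' ) );          // two entries
```

Under these assumptions the first call yields `http://com.example.www./page`, and the second yields the http: and https: variants of `//com.example./page`.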

Definition at line 171 of file LinkFilter.php.

References captcha-old\count, indexifyHost(), and wfParseUrl().

Referenced by RefreshExternallinksIndex\doDBUpdates(), LinksUpdate\getExternalInsertions(), LinkFilterTest\testMakeIndexes(), LinkFilterTest\testMakeLikeArrayWithValidPatterns(), and wfMakeUrlIndexes().

◆ makeLikeArray()

static LinkFilter::makeLikeArray ($filterEntry, $protocol = 'http://')
static

Make an array to be used for calls to Database::buildLike(), which will match the specified string.

This function does the same as LinkFilter::makeIndexes(), except it also takes care of adding wildcards.

Note
You probably want self::getQueryConditions() instead.
Parameters
string $filterEntry  Filter entry; see self::getQueryConditions()
string $protocol  Protocol (default http://)
Returns
array|bool Array to be passed to Database::buildLike() or false on error

Definition at line 312 of file LinkFilter.php.

References as, captcha-old\count, DB_REPLICA, indexifyHost(), wfGetDB(), and wfParseUrl().

Referenced by getQueryConditions(), ApiQueryBase\prepareUrlQuerySearchString(), LinkFilterTest\testMakeLikeArrayWithInvalidPatterns(), and LinkFilterTest\testMakeLikeArrayWithValidPatterns().

◆ makeRegex()

static LinkFilter::makeRegex ($filterEntry, $protocol)
static private

Builds a regex pattern for $filterEntry.

Todo:
This doesn't match the rest of the functionality here.
Parameters
string $filterEntry  URL; if it begins with "*.", that prefix is replaced so the pattern matches any subdomain
string $protocol  'http://' or 'https://'
Returns
string Regex pattern, for preg_match()

Definition at line 73 of file LinkFilter.php.

Referenced by matchEntry().

◆ matchEntry()

static LinkFilter::matchEntry (Content $content, $filterEntry, $protocol = 'http://'
)
static

Check whether $content contains a link to $filterEntry.

Parameters
Content $content  Content to check
string $filterEntry  Domain parts; see makeRegex() for more details
string $protocol  'http://' or 'https://'
Returns
int 0 if no match or 1 if there's at least one match
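
The regex-based check can be approximated in a few lines. This is a hedged sketch, not the code in LinkFilter.php: it works on a plain string rather than a Content object, the function names are hypothetical, and the character class used for subdomain labels is an assumption.

```php
<?php
/**
 * Hypothetical sketch of the regex approach. A leading "*." in the
 * entry is turned into an optional subdomain prefix.
 */
function sketchEntryRegex( string $filterEntry, string $protocol ): string {
	$regex = '!' . preg_quote( $protocol, '!' );
	if ( substr( $filterEntry, 0, 2 ) === '*.' ) {
		// Optional run of subdomain labels ending in a dot.
		$regex .= '(?:[A-Za-z0-9.-]+\.|)';
		$filterEntry = substr( $filterEntry, 2 );
	}
	return $regex . preg_quote( $filterEntry, '!' ) . '!i';
}

function sketchMatchEntry( string $text, string $filterEntry, string $protocol = 'http://' ): int {
	// preg_match() yields 1 on the first hit and 0 when nothing matches.
	return (int)preg_match( sketchEntryRegex( $filterEntry, $protocol ), $text );
}

echo sketchMatchEntry( 'See http://www.example.com/x', '*.example.com' ), "\n"; // 1
echo sketchMatchEntry( 'See http://www.example.com/x', 'example.com' ), "\n";   // 0
```

Because preg_match() stops at the first hit, the result only says whether at least one link matched, which agrees with the 0 or 1 return value documented above.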

Definition at line 49 of file LinkFilter.php.

References $content, and makeRegex().

Referenced by CleanupSpam\cleanupArticle().

◆ supportsIDN()

static LinkFilter::supportsIDN ()
static

Indicate whether LinkFilter IDN support is available.

Since
1.33
Returns
bool

Definition at line 88 of file LinkFilter.php.

Referenced by RefreshExternallinksIndex\getUpdateKey(), and LinkFilterTest\testMakeLikeArrayWithValidPatterns().

Member Data Documentation

◆ VERSION

const LinkFilter::VERSION = 1

Increment this when makeIndexes output changes.

It'll cause maintenance/refreshExternallinksIndex.php to run from update.php.

Definition at line 39 of file LinkFilter.php.

Referenced by RefreshExternallinksIndex\getUpdateKey().


The documentation for this class was generated from the following file: