MediaWiki  master
MediaWiki\ExternalLinks\LinkFilter Class Reference

Some functions to help implement an external link filter for spam control.

Static Public Member Functions

static getIndexedUrlsNonReversed ( $urls)
 Converts a set of URLs to be able to compare them with existing indexes.
 
static getProtocolPrefix ( $protocol)
 
static getQueryConditions ( $filterEntry, array $options=[])
 Return query conditions which will match the specified string.
 
static keepOneWildcard ( $arr)
 Filters an array returned by makeLikeArray(), removing everything past first pattern placeholder.
 
static makeIndexes ( $url, $reverseDomain=true)
 Converts a URL into a format for el_index.
 
static makeLikeArray ( $filterEntry, $protocol='http://')
 Make an array to be used for calls to Database::buildLike(), which will match the specified string.
 
static matchEntry (Content $content, $filterEntry, $protocol='http://')
 Check whether $content contains a link to $filterEntry.
 
static prepareProtocols ()
 
static reverseIndexe ( $domainIndex)
 

Public Attributes

const VERSION = 1
 Increment this when makeIndexes output changes.
 

Detailed Description

Some functions to help implement an external link filter for spam control.

Todo:
Implement the filter. Currently these are just some functions that help maintenance/cleanupSpam.php remove links to a single specified domain. The next step is to implement functions for checking a given page against a large list of domains.

Another cool thing to do would be a web interface for fast spam removal.

Definition at line 43 of file LinkFilter.php.

Member Function Documentation

◆ getIndexedUrlsNonReversed()

static MediaWiki\ExternalLinks\LinkFilter::getIndexedUrlsNonReversed ( $urls )

Converts a set of URLs to be able to compare them with existing indexes.

Since
1.41
Parameters
string[]  $urls  List of URLs to be indexed
Returns
string[]

Definition at line 232 of file LinkFilter.php.

References MediaWiki\MainConfigNames\ExternalLinksSchemaMigrationStage, MediaWiki\MediaWikiServices\getInstance(), MediaWiki\ExternalLinks\LinkFilter\makeIndexes(), and SCHEMA_COMPAT_READ_OLD.

◆ getProtocolPrefix()

static MediaWiki\ExternalLinks\LinkFilter::getProtocolPrefix ( $protocol )

◆ getQueryConditions()

static MediaWiki\ExternalLinks\LinkFilter::getQueryConditions ( $filterEntry, array $options = [] )

Return query conditions which will match the specified string.

There are several kinds of filter entry:

*.domain.com    -  Matches domain.com and www.domain.com
domain.com      -  Matches domain.com or domain.com/ but not www.domain.com
*.domain.com/x  -  Matches domain.com/xy or www.domain.com/xy. Also probably matches
                   domain.com/foobar/xy due to limitations of LIKE syntax.
domain.com/x    -  Matches domain.com/xy but not www.domain.com/xy
192.0.2.*       -  Matches any IP in 192.0.2.0/24. Can also have a path appended.
[2001:db8::*]   -  Matches any IP in 2001:db8::/112. Can also have a path appended.
[2001:db8:*]    -  Matches any IP in 2001:db8::/32. Can also have a path appended.
foo@domain.com  -  With protocol 'mailto:', matches the email address foo@domain.com.
*@domain.com    -  With protocol 'mailto:', matches any email address at domain.com, but
                   not subdomains like foo@mail.domain.com

Asterisks in any other location are considered invalid.
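
The domain rows of the table above can be illustrated with a small Python sketch. It is not MediaWiki's implementation: `entry_matches` is a hypothetical helper, it ignores the IP and mailto forms, and it does not reproduce the LIKE quirk noted for `*.domain.com/x`. It also assumes that an entry without a path accepts any path on the host.

```python
def entry_matches(filter_entry: str, host: str, path: str = "/") -> bool:
    """Approximate the domain rows of the filter-entry table (hypothetical helper)."""
    # Split "host/path-prefix" at the first slash; the prefix may be empty.
    host_pattern, _, path_prefix = filter_entry.partition("/")
    host = host.lower()
    host_pattern = host_pattern.lower()
    path = path or "/"  # treat "domain.com" and "domain.com/" alike

    if host_pattern.startswith("*."):
        base = host_pattern[2:]
        # "*.domain.com" covers domain.com itself and every subdomain.
        host_ok = host == base or host.endswith("." + base)
    else:
        # Without a wildcard the host must match exactly.
        host_ok = host == host_pattern

    # Assumption: an entry without a path accepts any path on the host.
    return host_ok and path.startswith("/" + path_prefix)
```

For instance, `entry_matches("domain.com/x", "domain.com", "/xy")` is true in this sketch, while the same entry checked against host `www.domain.com` is not.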

Since
1.33
Parameters
string  $filterEntry  Filter entry, as described above
array   $options      Options are:
  • protocol: (string) Protocol to query (default http://)
  • oneWildcard: (bool) Stop at the first wildcard (default false)
  • db: (IDatabase|null) Database to use.
Returns
array|false Conditions to be used for the query (to be ANDed) or false on error. To determine if the query is constant on the el_index_60 field, check whether key 'el_index_60' is set.

Definition at line 304 of file LinkFilter.php.

References DB_REPLICA, MediaWiki\MainConfigNames\ExternalLinksSchemaMigrationStage, MediaWiki\MediaWikiServices\getInstance(), MediaWiki\ExternalLinks\LinkFilter\keepOneWildcard(), MediaWiki\ExternalLinks\LinkFilter\makeLikeArray(), SCHEMA_COMPAT_READ_OLD, and wfGetDB().

Referenced by MediaWiki\Specials\SpecialLinkSearch\getQueryInfo().

◆ keepOneWildcard()

static MediaWiki\ExternalLinks\LinkFilter::keepOneWildcard ( $arr )

Filters an array returned by makeLikeArray(), removing everything past first pattern placeholder.

Note
You probably want self::getQueryConditions() instead
Parameters
array  $arr  Array to filter
Returns
array Filtered array

Definition at line 513 of file LinkFilter.php.

Referenced by MediaWiki\ExternalLinks\LinkFilter\getQueryConditions().
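
The filtering described for keepOneWildcard() can be sketched in Python as follows. The `Wildcard` class is a hypothetical stand-in for whatever placeholder object makeLikeArray() emits, and the sketch assumes the first placeholder itself is kept, with only the elements after it dropped.

```python
class Wildcard:
    """Hypothetical stand-in for a LIKE pattern placeholder."""


def keep_one_wildcard(parts):
    """Return the list truncated just after its first Wildcard, if any."""
    for i, part in enumerate(parts):
        if isinstance(part, Wildcard):
            # Keep the placeholder itself, drop everything after it.
            return parts[: i + 1]
    # No placeholder found: nothing to trim.
    return parts
```

Truncating at the first placeholder leaves a pure prefix pattern, which is the shape the oneWildcard option of getQueryConditions() asks for.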

◆ makeIndexes()

static MediaWiki\ExternalLinks\LinkFilter::makeIndexes ( $url, $reverseDomain = true )

Converts a URL into a format for el_index.

Since
1.33
Parameters
string  $url
bool    $reverseDomain
Returns
string[][] An array containing one entry, or an empty array on error. Each entry is an array of the form <host, path>.

Definition at line 171 of file LinkFilter.php.

References wfParseUrl().

Referenced by MediaWiki\ExternalLinks\LinkFilter\getIndexedUrlsNonReversed(), and MediaWiki\Deferred\LinksUpdate\ExternalLinksTable\insertLink().
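
A rough Python sketch of the <host, path> shape is given below. This page does not specify the exact output of makeIndexes(), so the details are assumptions: the host labels are reversed and followed by a trailing dot, the scheme is kept as a prefix of the host part, and the query string is appended to the path. `make_index_sketch` is a hypothetical name, and IP addresses and mailto links are not handled.

```python
from urllib.parse import urlsplit


def make_index_sketch(url: str, reverse_domain: bool = True) -> list:
    """Return [[host_part, path_part]] for a URL, or [] if it cannot be parsed."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return []
    if not parts.scheme or not parts.hostname:
        return []

    host = parts.hostname
    if reverse_domain:
        # Assumed form: "www.example.com" becomes "com.example.www."
        host = ".".join(reversed(host.split("."))) + "."

    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return [[f"{parts.scheme}://{host}", path]]
```

Reversing the labels puts the most significant part of the domain first, so a prefix match on the host part can cover a domain together with its subdomains.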

◆ makeLikeArray()

static MediaWiki\ExternalLinks\LinkFilter::makeLikeArray ( $filterEntry, $protocol = 'http://' )

Make an array to be used for calls to Database::buildLike(), which will match the specified string.

This function does the same as LinkFilter::makeIndexes(), except that it also takes care of adding wildcards.

Note
You probably want self::getQueryConditions() instead
Parameters
string  $filterEntry  Filter entry; see self::getQueryConditions()
string  $protocol     Protocol (default http://)
Returns
array|false Array to be passed to Database::buildLike() or false on error

Definition at line 442 of file LinkFilter.php.

References DB_REPLICA, wfGetDB(), and wfParseUrl().

Referenced by MediaWiki\ExternalLinks\LinkFilter\getQueryConditions().

◆ matchEntry()

static MediaWiki\ExternalLinks\LinkFilter::matchEntry ( Content $content, $filterEntry, $protocol = 'http://' )

Check whether $content contains a link to $filterEntry.

Parameters
Content  $content      Content to check
string   $filterEntry  Domain parts; see makeRegex() for more details
string   $protocol     'http://' or 'https://'
Returns
int 0 if no match or 1 if there's at least one match

Definition at line 58 of file LinkFilter.php.

References $content.

◆ prepareProtocols()

static MediaWiki\ExternalLinks\LinkFilter::prepareProtocols ( )

◆ reverseIndexe()

static MediaWiki\ExternalLinks\LinkFilter::reverseIndexe ( $domainIndex )

Member Data Documentation

◆ VERSION

const MediaWiki\ExternalLinks\LinkFilter::VERSION = 1

Increment this when makeIndexes output changes.

It'll cause maintenance/refreshExternallinksIndex.php to run from update.php.

Definition at line 48 of file LinkFilter.php.


The documentation for this class was generated from the following file: