PHP Cross Reference of MediaWiki-1.24.0

MediaWiki extension: SpamBlacklist
----------------------------------

SpamBlacklist is a simple edit filter extension. When someone tries to save a
page, it checks the text against a potentially very large list of "bad"
hostnames. If there is a match, it displays an error message to the user and
refuses to save the page.

To enable it, first download a copy of the SpamBlacklist directory and put it
into your extensions directory. Then put the following at the end of your
LocalSettings.php:

require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );

The list of bad URLs can be drawn from multiple sources. These sources are
configured with the $wgSpamBlacklistFiles global variable, which should be set
in LocalSettings.php AFTER including SpamBlacklist.php.

$wgSpamBlacklistFiles is an array; each value is either a URL, a filename
or a database location. Specifying a database location allows you to draw the
blacklist from a page on your wiki. The format of the database location
specifier is "DB: <db name> <title>".

Example:

require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );
$wgSpamBlacklistFiles = array(
    "$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list

//          database    title
    "DB: wikidb My_spam_blacklist",
);

The local pages [[MediaWiki:Spam-blacklist]] and [[MediaWiki:Spam-whitelist]]
are always used, regardless of which additional sources are listed.
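The sketch below shows one way to tell apart the three kinds of source described above. It is not the extension's code. The function name classifyBlacklistSource(), the example path and the placeholder URL are hypothetical. The rule that anything starting with http:// or https:// counts as a URL, and that everything else except a "DB:" specifier counts as a filename, is our assumption.

```php
<?php
// Hypothetical helper, NOT part of SpamBlacklist: it only illustrates the three
// kinds of $wgSpamBlacklistFiles entries described above.
function classifyBlacklistSource( $source ) {
    // Database location: "DB: <db name> <title>"
    if ( preg_match( '/^DB:\s*(\S+)\s+(\S.*)$/', $source, $m ) ) {
        return array( 'type' => 'db', 'db' => $m[1], 'title' => rtrim( $m[2] ) );
    }
    // Assumption: anything with an http(s) scheme is fetched over the network
    if ( preg_match( '!^https?://!i', $source ) ) {
        return array( 'type' => 'url', 'url' => $source );
    }
    // Everything else is treated as a local filename
    return array( 'type' => 'file', 'path' => $source );
}

$sources = array(
    '/var/www/w/extensions/SpamBlacklist/wikimedia_blacklist', // example path
    'DB: wikidb My_spam_blacklist',
    'http://example.org/spam_blacklist.txt',                   // placeholder URL
);
foreach ( $sources as $source ) {
    $info = classifyBlacklistSource( $source );
    echo $info['type'], "\t", $source, "\n"; // file, db, url in this order
}
```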

Compatibility
-------------

This extension is primarily maintained to run on the latest release version
of MediaWiki (1.22.x as of this writing) and on development versions; however,
the current version should also work on versions as old as 1.21.

If you are using an older version of MediaWiki, you can check out an
older release branch; for example, MediaWiki 1.20 would use REL1_20.

For even older versions, you may be able to dig a working version out of the
Git repository. However, if you use Wikimedia's blacklist file you will likely
run into failures, because old versions of the code cannot handle such a large
blacklist.


File format
-----------

In simple terms:
   * Everything from a "#" character to the end of the line is a comment
   * Every non-blank line is a regex fragment which will only match inside URLs

Internally, a regex is formed which looks like this:

   !http://[a-z0-9\-.]*(line 1|line 2|line 3|....)!Si

A few notes about this format. It's not necessary to add www to the start of
hostnames; the [a-z0-9\-.]* prefix lets the regex match any subdomain. Don't
add patterns to your file which may run off the end of the URL, e.g. anything
containing ".*". Unlike in some similar systems, the line-end metacharacter "$"
does not assert the end of the hostname; it asserts the end of the page text.
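The following PHP sketch builds a regex of the form shown above from the text of a blacklist file. It then tests that regex against some sample links. This is a simplified illustration, not the extension's actual implementation. The name buildSpamRegex() is hypothetical, and the sketch assumes no fragment contains the "!" delimiter. Like the pattern above, it only handles http:// links. It also makes no attempt to split very large lists into several regexes.

```php
<?php
// Simplified sketch of the regex construction described above; not the
// extension's actual implementation.
function buildSpamRegex( $text ) {
    $fragments = array();
    foreach ( preg_split( '/\r?\n/', $text ) as $line ) {
        // Everything from "#" to the end of the line is a comment
        $line = trim( preg_replace( '/#.*/', '', $line ) );
        if ( $line !== '' ) {
            $fragments[] = $line; // every remaining non-blank line is a regex fragment
        }
    }
    if ( count( $fragments ) === 0 ) {
        return null; // nothing to match against
    }
    // Assumes no fragment contains the "!" delimiter
    return '!http://[a-z0-9\-.]*(' . implode( '|', $fragments ) . ')!Si';
}

$re = buildSpamRegex( "example\\.com   # a spam domain\n# whole-line comment\ncheap-pills\\.net\n" );
echo $re, "\n"; // !http://[a-z0-9\-.]*(example\.com|cheap-pills\.net)!Si
var_dump( preg_match( $re, 'See http://www.example.com/offer' ) ); // int(1): no "www" needed
var_dump( preg_match( $re, 'See http://wiki.example.org/' ) );     // int(0)

// "$" anchors to the end of the whole text, not the end of the hostname
$anchored = buildSpamRegex( 'example\.com$' );
var_dump( preg_match( $anchored, 'http://example.com/page' ) );    // int(0)
var_dump( preg_match( $anchored, 'Link: http://example.com' ) );   // int(1)
```

The last two calls show the "$" pitfall. A link in the middle of the page is not matched. Only a link that happens to sit at the very end of the text is.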

Performance
-----------

This extension uses a small "loader" file to avoid loading all the code on
every page view. This means that page view performance will not be affected
even if you are not running a PHP bytecode cache such as Turck MMCache. Note
that a bytecode cache is strongly recommended for any MediaWiki installation.

The regex match itself generally adds an insignificant overhead to page saves
(on the order of 100ms in our experience). However, loading the spam file from
disk or the database, and constructing the regex, may take a significant
amount of time depending on your hardware. If you find that enabling this
extension slows down saves excessively, try installing memcached or another
supported data caching solution. The SpamBlacklist extension will cache the
constructed regex if such a system is present.
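For example, the following LocalSettings.php lines point MediaWiki's main object cache at a memcached daemon. They assume the daemon runs on the same host on its default port, 11211. $wgMainCacheType and $wgMemCachedServers are standard MediaWiki settings. This README does not say that SpamBlacklist keeps its compiled regex in this particular cache; that is our assumption.

```php
$wgMainCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = array( '127.0.0.1:11211' ); // assumed local memcached instance
```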

Caching behavior
----------------

Blacklist files loaded from remote web sites are cached locally, in the cache
subsystem used for MediaWiki's localization. (On a default install this is
usually the objectcache table.)

By default, the list is cached for 15 minutes (if successfully fetched) or
10 minutes (if the network fetch failed), after which point it will be fetched
again when next requested. This should strike a reasonable balance between
avoiding too-frequent fetches on a busy site and staying up to date.

Fully processed blacklist data may be cached in memcached or another shared
memory cache if one has been configured in MediaWiki.
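The 15/10-minute refresh policy described above can be sketched as follows. This is a hypothetical illustration, not the extension's code. The function getRemoteList() and the plain PHP array standing in for MediaWiki's cache are assumptions. So are the injected $fetch callback and the choice to fall back to the previous copy (or an empty list) when a fetch fails.

```php
<?php
// Hypothetical sketch of the refresh policy described above; not the
// extension's actual code. A plain array stands in for MediaWiki's cache.
const TTL_SUCCESS = 900; // 15 minutes after a successful fetch
const TTL_FAILURE = 600; // 10 minutes after a failed fetch

// $fetch is a callback returning the list text, or false on network failure.
// $now is the current Unix time, passed in so the logic is easy to test.
function getRemoteList( $url, array &$cache, $fetch, $now ) {
    if ( isset( $cache[$url] ) && $cache[$url]['expires'] > $now ) {
        return $cache[$url]['text']; // still fresh: no network request
    }
    $text = $fetch( $url );
    if ( $text === false ) {
        // Assumption: keep serving the previous copy (or an empty list) and retry sooner
        $text = isset( $cache[$url] ) ? $cache[$url]['text'] : '';
        $ttl = TTL_FAILURE;
    } else {
        $ttl = TTL_SUCCESS;
    }
    $cache[$url] = array( 'text' => $text, 'expires' => $now + $ttl );
    return $text;
}

// Usage: the first call fetches; later calls within 15 minutes hit the cache
$cache = array();
$fetcher = function ( $url ) {
    return "example\\.com\n"; // stand-in for a real HTTP request
};
getRemoteList( 'http://example.org/spam_blacklist.txt', $cache, $fetcher, time() );
```

The sketch passes the current time in rather than calling time() inside the function. This keeps the expiry logic deterministic and easy to check.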


Stability
---------

This extension has not been widely tested outside Wikimedia. Although it has
been in production on Wikimedia websites since December 2004, it should be
considered experimental. Its design is simple, with little input validation, so
unexpected behavior due to incorrect regular expression input or non-standard
configuration is entirely possible.

Obtaining or making blacklists
------------------------------

The primary source for a MediaWiki-compatible blacklist file is the Wikimedia
spam blacklist on Meta:

    http://meta.wikimedia.org/wiki/Spam_blacklist

In the default configuration, the extension loads this list from our site
once every 10-15 minutes.

The Wikimedia spam blacklist can only be edited by trusted administrators.
Because Wikimedia hosts large, diverse wikis with many thousands of external
links, its blacklist is comparatively conservative in the links it blocks.
You may want to add your own keyword blocks or even ccTLD blocks.
You may suggest modifications to the Wikimedia blacklist at:

    http://meta.wikimedia.org/wiki/Talk:Spam_blacklist

To make maintenance of local lists easier, you may wish to add a DB: source to
$wgSpamBlacklistFiles and hence create a blacklist on your wiki. If you do this,
it is strongly recommended that you protect the page from general editing.
Besides the obvious danger that someone may add a regex that matches everything,
note that an attacker with the ability to input arbitrary regular expressions
may be able to cause segfaults in the PCRE library.

Whitelisting
------------

You may sometimes find that a site listed in a centrally maintained blacklist
contains something you nonetheless want to link to.

A local whitelist can be maintained by creating a [[MediaWiki:Spam-whitelist]]
page and listing hostnames in it, using the same format as the blacklists.
URLs matching the whitelist are exempted from the blacklist check on your wiki.
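The sketch below shows one way to picture how the whitelist interacts with the blacklist. It drops whitelisted URLs first and then tests the remaining URLs against the blacklist. The function names listToRegex() and findBlockedUrls() are assumptions made for illustration, and so is the per-URL approach. The extension itself may apply the whitelist differently.

```php
<?php
// Hypothetical sketch, not the extension's code: both lists use the blacklist
// file format, and whitelisted URLs are skipped before the blacklist is applied.
function listToRegex( $text ) {
    $parts = array();
    foreach ( preg_split( '/\r?\n/', $text ) as $line ) {
        $line = trim( preg_replace( '/#.*/', '', $line ) ); // strip "#" comments
        if ( $line !== '' ) {
            $parts[] = $line;
        }
    }
    return $parts ? '!http://[a-z0-9\-.]*(' . implode( '|', $parts ) . ')!Si' : null;
}

// Returns the URLs that would cause the save to be refused.
function findBlockedUrls( array $urls, $blacklistText, $whitelistText ) {
    $black = listToRegex( $blacklistText );
    $white = listToRegex( $whitelistText );
    $blocked = array();
    foreach ( $urls as $url ) {
        if ( $white !== null && preg_match( $white, $url ) ) {
            continue; // whitelisted: ignored by the blacklist check
        }
        if ( $black !== null && preg_match( $black, $url ) ) {
            $blocked[] = $url;
        }
    }
    return $blocked;
}

$blacklist = "example\\.com   # spam domain\nspam\\.net\n";
$whitelist = "docs\\.example\\.com   # we still want to link here\n";
print_r( findBlockedUrls(
    array( 'http://docs.example.com/manual', 'http://shop.example.com/buy', 'http://www.mediawiki.org/' ),
    $blacklist,
    $whitelist
) );
// Only http://shop.example.com/buy is reported
```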

Logging
-------

To help track which domains are being spammed, this extension has
multiple logging features. By default, hits are included in the standard
debug log (controlled by $wgDebugLogFile). Grep for 'SpamBlacklistHit' to find
them; each entry includes the IP of the user and the URL they tried to submit.
This file is only available to people with server access and includes private
info.

You can also enable logging to [[Special:Log]] by setting $wgLogSpamBlacklistHits
to true. This log records the account which tripped the blacklist, the title of
the page the edit was attempted on, and the specific URL. By default this log is
only viewable by wiki administrators; you can grant other groups access by
giving them the "spamblacklistlog" permission.
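Here is a minimal LocalSettings.php sketch for the options above. $wgDebugLogFile and $wgLogSpamBlacklistHits are the settings named in this section, and the "spamblacklistlog" right is the permission it describes. $wgGroupPermissions is MediaWiki's standard setting for granting rights. The log path and the choice of the bureaucrat group are only examples.

```php
// Write the debug log (including 'SpamBlacklistHit' lines) to a file
$wgDebugLogFile = '/var/log/mediawiki/debug.log'; // example path

// Record blacklist hits in Special:Log
$wgLogSpamBlacklistHits = true;

// Example: let bureaucrats view the spam blacklist log as well
$wgGroupPermissions['bureaucrat']['spamblacklistlog'] = true;
```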

Copyright
---------
This extension and this documentation were written by Tim Starling and are
ambiguously licensed.


Generated: Fri Nov 28 14:03:12 2014 Cross-referenced by PHPXref 0.7.1