MediaWiki extension: SpamBlacklist
----------------------------------

SpamBlacklist is a simple edit filter extension. When someone tries to save a
page, it checks the text against a potentially very large list of "bad"
hostnames. If there is a match, it displays an error message to the user and
refuses to save the page.

To enable it, first download a copy of the SpamBlacklist directory and put it
into your extensions directory. Then put the following at the end of your
LocalSettings.php:

  require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );

The list of bad URLs can be drawn from multiple sources. These sources are
configured with the $wgSpamBlacklistFiles global variable. This global variable
can be set in LocalSettings.php, AFTER including SpamBlacklist.php.

$wgSpamBlacklistFiles is an array, each value containing either a URL, a
filename or a database location. Specifying a database location allows you to
draw the blacklist from a page on your wiki. The format of the database
location specifier is "DB: <db name> <title>".

Example:

  require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );
  $wgSpamBlacklistFiles = array(
     "$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimedia's list

     // database title
     "DB: wikidb My_spam_blacklist",
  );

The local pages [[MediaWiki:Spam-blacklist]] and [[MediaWiki:Spam-whitelist]]
will always be used, whatever additional files are listed.

Compatibility
-------------

This extension is primarily maintained to run on the latest release version
of MediaWiki (1.22.x as of this writing) and on development versions; however,
the current version should also work back to MediaWiki 1.21.

If you are using an older version of MediaWiki, you can check out an older
release branch; for example, MediaWiki 1.20 would use the REL1_20 branch.

For even older versions, you may be able to dig a working version out of the
Git repository, but if you use Wikimedia's blacklist file you will likely run
into failures, because old versions of the code do not handle such a large
blacklist.


File format
-----------

In simple terms:
* Everything from a "#" character to the end of the line is a comment
* Every non-blank line is a regex fragment which will only match inside URLs

Internally, a regex is formed which looks like this:

  !http://[a-z0-9\-.]*(line 1|line 2|line 3|....)!Si

A few notes about this format. It is not necessary to add www to the start of
hostnames; the regex is designed to match any subdomain. Don't add patterns
to your file which may run off the end of the URL, e.g. anything containing
".*". Unlike in some similar systems, the line-end metacharacter "$" will not
assert the end of the hostname; it will assert the end of the page.
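
As a purely illustrative sketch of this format, a small blacklist file might
look like the following (the domains are made-up examples, not entries from
any real blacklist):

  # Pharmacy spam (everything after "#" is ignored)
  cheap-pills-online\.example\.com
  pills-[a-z0-9]+\.example\.net    # trailing comments work too

  # Another made-up pattern
  sp4mlink\.example\.org

Each remaining fragment is substituted into the generated regex shown above,
so a plain hostname entry such as "cheap-pills-online\.example\.com" also
matches any of its subdomains in a submitted URL.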

Performance
-----------

This extension uses a small "loader" file to avoid loading all of its code on
every page view. This means that page view performance will not be affected
even if you are not running a PHP bytecode cache such as Turck MMCache. Note
that a bytecode cache is strongly recommended for any MediaWiki installation.

The regex match itself generally adds an insignificant overhead to page saves,
on the order of 100ms in our experience. However, loading the spam file from
disk or the database, and constructing the regex, may take a significant
amount of time depending on your hardware. If you find that enabling this
extension slows down saves excessively, try installing memcached or another
supported data caching solution. The SpamBlacklist extension will cache the
constructed regex if such a system is present.

Caching behavior
----------------

Blacklist files loaded from remote web sites are cached locally, in the cache
subsystem used for MediaWiki's localization. (This usually means the
objectcache table on a default install.)

By default, the list is cached for 15 minutes (if successfully fetched) or
10 minutes (if the network fetch failed), after which point it will be fetched
again when next requested. This should be a decent balance between avoiding
too-frequent fetches on a busy site and staying up to date.

Fully-processed blacklist data may be cached in memcached or another shared
memory cache if one has been configured in MediaWiki.


Stability
---------

This extension has not been widely tested outside Wikimedia. Although it has
been in production on Wikimedia websites since December 2004, it should be
considered experimental. Its design is simple, with little input validation,
so unexpected behavior due to incorrect regular expression input or
non-standard configuration is entirely possible.

Obtaining or making blacklists
------------------------------

The primary source for a MediaWiki-compatible blacklist file is the Wikimedia
spam blacklist on Meta:

  http://meta.wikimedia.org/wiki/Spam_blacklist

In the default configuration, the extension loads this list from our site
once every 10-15 minutes.

The Wikimedia spam blacklist can only be edited by trusted administrators.
Wikimedia hosts large, diverse wikis with many thousands of external links,
so the Wikimedia blacklist is comparatively conservative in the links it
blocks. You may want to add your own keyword blocks or even ccTLD blocks.
You may suggest modifications to the Wikimedia blacklist at:

  http://meta.wikimedia.org/wiki/Talk:Spam_blacklist

To make maintenance of local lists easier, you may wish to add a DB: source to
$wgSpamBlacklistFiles and hence create a blacklist on your wiki. If you do
this, it is strongly recommended that you protect the page from general
editing. Besides the obvious danger that someone may add a regex that matches
everything, please note that an attacker with the ability to input arbitrary
regular expressions may be able to generate segfaults in the PCRE library.

Whitelisting
------------

You may sometimes find that a site listed in a centrally-maintained blacklist
contains something you nonetheless want to link to.

A local whitelist can be maintained by creating a [[MediaWiki:Spam-whitelist]]
page and listing hostnames in it, using the same format as the blacklists.
URLs matching the whitelist will be ignored locally.

Logging
-------

To aid with tracking which domains are being spammed, this extension has
multiple logging features. By default, hits are recorded in the standard
debug log (controlled by $wgDebugLogFile). You can grep that file for
'SpamBlacklistHit', which includes the IP address of the user and the URL they
tried to submit. This file is only available to people with server access and
contains private information.

You can also enable logging to [[Special:Log]] by setting
$wgLogSpamBlacklistHits to true. This log records the account which tripped
the blacklist, the page title the edit was attempted on, and the specific URL.
By default it is only viewable by wiki administrators; you can grant other
groups access by giving them the "spamblacklistlog" permission.
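
As a minimal sketch of the settings described above, the relevant lines in
LocalSettings.php (after including SpamBlacklist.php) might look like this;
granting the permission to the "user" group is only an example, any group
can be used:

  // Record blacklist hits to [[Special:Log]] as well as the debug log.
  $wgLogSpamBlacklistHits = true;

  // Example only: also let ordinary registered users (the "user" group)
  // view the log; by default only administrators can see it.
  $wgGroupPermissions['user']['spamblacklistlog'] = true;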

Copyright
---------
This extension and this documentation were written by Tim Starling and are
ambiguously licensed.