PHP Cross Reference of vtigercrm-6.1.0 |
Install
    How to install HTML Purifier

HTML Purifier is designed to run out of the box, so actually using the
library is extremely easy.  (Although... if you were looking for a
step-by-step installation GUI, you've downloaded the wrong software!)

While the impatient can get going immediately with some of the sample
code at the bottom of this file, it's well worth reading this entire
document--most of the other documentation assumes that you are familiar
with these contents.


---------------------------------------------------------------------------
1.  Compatibility

HTML Purifier is PHP 5 only, and is actively tested from PHP 5.0.5 and
up.  It has no core dependencies on other libraries.  PHP 4 support was
deprecated on December 31, 2007 with HTML Purifier 3.0.0.

These optional extensions can enhance the capabilities of HTML Purifier:

    * iconv  : Converts text to and from non-UTF-8 encodings
    * bcmath : Used for unit conversion and imagecrash protection
    * tidy   : Used for pretty-printing HTML


---------------------------------------------------------------------------
2.  Reconnaissance

A big plus of HTML Purifier is its inerrant support of standards, so
your web-pages should be standards-compliant.  (They should also use
semantic markup, but that's another issue altogether, one HTML Purifier
cannot fix without reading your mind.)

HTML Purifier can process these doctypes:

* XHTML 1.0 Transitional (default)
* XHTML 1.0 Strict
* HTML 4.01 Transitional
* HTML 4.01 Strict
* XHTML 1.1

...and these character encodings:

* UTF-8 (default)
* Any encoding iconv supports (with crippled internationalization support)

These defaults reflect what my choices would be if I were authoring an
HTML document; however, what you choose depends on the nature of your
codebase.
If you don't know what doctype you are using, you can determine
the doctype from this identifier at the top of your source code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

...and the character encoding from this code:

    <meta http-equiv="Content-type" content="text/html;charset=ENCODING">

If the character encoding declaration is missing, STOP NOW, and
read 'docs/enduser-utf8.html' (web accessible at
http://htmlpurifier.org/docs/enduser-utf8.html).  In fact, even if it is
present, read that document anyway, as many websites specify their
document's character encoding incorrectly.


---------------------------------------------------------------------------
3.  Including the library

The procedure is quite simple:

    require_once '/path/to/library/HTMLPurifier.auto.php';

This will set up an autoloader, so the library's files are only included
when you use them.

Only the contents in the library/ folder are necessary, so you can remove
everything else when using HTML Purifier in a production environment.

If you installed HTML Purifier via PEAR, all you need to do is:

    require_once 'HTMLPurifier.auto.php';

Please note that the usual PEAR practice of including just the classes you
want will not work with HTML Purifier's autoloading scheme.

Advanced users, read on; other users can skip to section 4.

Autoload compatibility
----------------------

    HTML Purifier attempts to be as smart as possible when registering an
    autoloader, but there are some cases where you will need to change
    your own code to accommodate HTML Purifier.  These are those cases:

    PHP VERSION IS LESS THAN 5.1.2, AND YOU'VE DEFINED __autoload
        Because spl_autoload_register() doesn't exist in early versions
        of PHP 5, HTML Purifier has no way of adding itself to the autoload
        stack.
        Modify your __autoload function to try
        HTMLPurifier_Bootstrap::autoload($class) first.

        For example, suppose your autoload function looks like this:

            function __autoload($class) {
                require str_replace('_', '/', $class) . '.php';
                return true;
            }

        A modified version with HTML Purifier would look like this:

            function __autoload($class) {
                if (HTMLPurifier_Bootstrap::autoload($class)) return true;
                require str_replace('_', '/', $class) . '.php';
                return true;
            }

        Note that there *is* some custom behavior in our autoloader; the
        original autoloader in our example would work 99% of the time,
        but would fail when including language files.

    AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
        spl_autoload_register() has the curious behavior of disabling
        the existing __autoload() handler.  Users need to explicitly
        call spl_autoload_register('__autoload').  Because we use SPL when
        it is available, __autoload() will ALWAYS be disabled.  If
        __autoload() is declared before HTML Purifier is loaded, this is
        not a problem: HTML Purifier will register the function for you.
        But if it is declared afterwards, it will mysteriously not work.
        This snippet of code (after your autoloader is defined) will fix it:

            spl_autoload_register('__autoload');

    Users should also be on guard if they use a version of PHP previous
    to 5.1.2 without an autoloader--HTML Purifier will define __autoload()
    for you, which can collide with an autoloader that was added by *you*
    later.


For better performance
----------------------

    Opcode caches, which greatly speed up PHP initialization for scripts
    with large amounts of code (HTML Purifier included), don't like
    autoloaders.
    We offer an include file that includes all of HTML Purifier's
    files in one go in an opcode cache friendly manner:

        // If /path/to/library isn't already in your include path, uncomment
        // the below line:
        // require '/path/to/library/HTMLPurifier.path.php';

        require 'HTMLPurifier.includes.php';

    Optional components still need to be included--you'll know if you try to
    use a feature and you get a "class doesn't exist" error!  The autoloader
    can be used in conjunction with this approach to catch classes that are
    missing.  Simply add this afterwards:

        require 'HTMLPurifier.autoload.php';

Standalone version
------------------

    HTML Purifier has a standalone distribution; you can also generate
    a standalone file from the full version by running the script
    maintenance/generate-standalone.php.  The standalone version has the
    benefit of having most of its code in one file, so parsing is much
    faster and the library is easier to manage.

    If HTMLPurifier.standalone.php exists in the library directory, you
    can use it like this:

        require '/path/to/HTMLPurifier.standalone.php';

    This is equivalent to including HTMLPurifier.includes.php, except that
    the contents of standalone/ will be added to your path.  To override this
    behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
    be found (usually, this will be one directory up, the "true" library
    directory in full distributions).  Don't forget to set your path too!

    The autoloader can be added to the end to ensure the classes are
    loaded when necessary; otherwise you can manually include them.
    To use the autoloader, use this:

        require 'HTMLPurifier.autoload.php';

For advanced users
------------------

    HTMLPurifier.auto.php performs a number of operations that can be done
    individually.
    These are:

        HTMLPurifier.path.php
            Puts /path/to/library in the include path.  For high performance,
            this should be done in php.ini.

        HTMLPurifier.autoload.php
            Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).

    You can do these operations by yourself--in fact, you must modify your own
    autoload handler if you are using a version of PHP earlier than PHP 5.1.2
    (see "Autoload compatibility" above).


---------------------------------------------------------------------------
4.  Configuration

HTML Purifier is designed to run out-of-the-box, but occasionally HTML
Purifier needs to be told what to do.  If you answer no to any of these
questions, read on; otherwise, you can skip to the next section (or, if you're
into configuring things just for the heck of it, skip to 4.3).

    * Am I using UTF-8?
    * Am I using XHTML 1.0 Transitional?

If you answered no to any of these questions, instantiate a configuration
object and read on:

    $config = HTMLPurifier_Config::createDefault();


4.1. Setting a different character encoding

You really shouldn't use any other encoding except UTF-8, especially if you
plan to support multilingual websites (read section 2 for more details).
However, switching to UTF-8 is not always immediately feasible, so we can
adapt.

HTML Purifier uses iconv to support other character encodings; as such,
HTML Purifier supports any encoding that iconv supports
<http://www.gnu.org/software/libiconv/> with this code:

    $config->set('Core', 'Encoding', /* put your encoding here */);

An example usage for Latin-1 websites (the most common encoding for English
websites):

    $config->set('Core', 'Encoding', 'ISO-8859-1');

Note that HTML Purifier's support for non-Unicode encodings is crippled by the
fact that any character not supported by that encoding will be silently
dropped, EVEN if it is ampersand escaped.  If you want to work around
this, you are welcome to read docs/enduser-utf8.html for a fix,
but please be cognizant of the issues the "solution" creates (for this
reason, I do not include the solution in this document).


4.2. Setting a different doctype

For those of you using HTML 4.01 Transitional, you can disable
XHTML output like this:

    $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');

The supported doctypes are:

    * HTML 4.01 Strict
    * HTML 4.01 Transitional
    * XHTML 1.0 Strict
    * XHTML 1.0 Transitional
    * XHTML 1.1


4.3. Other settings

There are more configuration directives which can be read about
here: <http://htmlpurifier.org/live/configdoc/plain.html>  They're a bit boring,
but they can help out for those of you who like to exert maximum control over
your code.  Some of the more interesting ones are configurable at the
demo <http://htmlpurifier.org/demo.php> and are well worth looking into
for your own system.

For example, you can fine tune allowed elements and attributes, convert
relative URLs to absolute ones, and even autoparagraph input text!  These
are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
%AutoFormat.AutoParagraph.
The %Namespace.Directive naming convention
translates to:

    $config->set('Namespace', 'Directive', $value);

E.g.

    $config->set('HTML', 'Allowed', 'p,b,a[href],i');
    $config->set('URI', 'Base', 'http://www.example.com');
    $config->set('URI', 'MakeAbsolute', true);
    $config->set('AutoFormat', 'AutoParagraph', true);


---------------------------------------------------------------------------
5.  Caching

HTML Purifier generates some cache files (generally one or two) to speed up
its execution.  For maximum performance, make sure that
library/HTMLPurifier/DefinitionCache/Serializer is writable by the webserver.

If you are in the library/ folder of HTML Purifier, you can set the
appropriate permissions using:

    chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer

If the above command doesn't work, you may need to assign write permissions
to all.  This may be necessary if your webserver runs as nobody, but is
not recommended since it means any other user can write files in the
directory.  Use:

    chmod -R 0777 HTMLPurifier/DefinitionCache/Serializer

You can also chmod files via your FTP client; this option
is usually accessible by right clicking the corresponding directory and
then selecting "chmod" or "file permissions".

Starting with 2.0.1, HTML Purifier will generate friendly error messages
that will tell you exactly what you have to chmod the directory to; if in
doubt, follow its advice.
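
If you are not sure whether the directory is writable, a short PHP check
can tell you before you reach for chmod.  The following is only a minimal
sketch and is not part of HTML Purifier: cache_dir_status() is a
hypothetical helper, and the path is a placeholder you must adjust.  Note
that is_writable() answers for the user running the script, so run it
through the webserver rather than from the command line:

```php
<?php
// Hypothetical helper (not part of HTML Purifier): reports whether a
// directory exists and is writable by the user running this script.
function cache_dir_status($dir)
{
    if (!is_dir($dir)) {
        return 'missing';
    }
    return is_writable($dir) ? 'writable' : 'not writable';
}

// Assumed location; adjust to wherever library/ lives on your system.
$dir = '/path/to/library/HTMLPurifier/DefinitionCache/Serializer';
echo $dir . ': ' . cache_dir_status($dir) . "\n";
```

If this reports "not writable", the chmod commands above are the place
to start.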

If you are unable or unwilling to give write permissions to the cache
directory, you can either disable the cache (and suffer a performance
hit):

    $config->set('Core', 'DefinitionCache', null);

Or move the cache directory somewhere else (no trailing slash):

    $config->set('Cache', 'SerializerPath', '/home/user/absolute/path');


---------------------------------------------------------------------------
6.  Using the code

The interface is mind-numbingly simple:

    $purifier = new HTMLPurifier();
    $clean_html = $purifier->purify( $dirty_html );

...or, if you're using the configuration object:

    $purifier = new HTMLPurifier($config);
    $clean_html = $purifier->purify( $dirty_html );

That's it!  For more examples, check out docs/examples/ (they aren't very
different though).  Also, docs/enduser-slow.html gives advice on what to
do if HTML Purifier is slowing down your application.


---------------------------------------------------------------------------
7.  Quick install

First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
writable by the webserver (see Section 5: Caching above for details).
If your website is in UTF-8 and XHTML Transitional, use this code:

    <?php
        require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

        $purifier = new HTMLPurifier();
        $clean_html = $purifier->purify($dirty_html);
    ?>

If your website is in a different encoding or doctype, use this code:

    <?php
        require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';

        $config = HTMLPurifier_Config::createDefault();
        $config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
        $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
        $purifier = new HTMLPurifier($config);

        $clean_html = $purifier->purify($dirty_html);
    ?>

    vim: et sw=4 sts=4
Generated: Fri Nov 28 20:08:37 2014 | Cross-referenced by PHPXref 0.7.1 |