PHP Cross Reference of vtigercrm-6.1.0

/libraries/htmlpurifier/ -> INSTALL (source)

   1  
   2  Install
   3      How to install HTML Purifier
   4  
   5  HTML Purifier is designed to run out of the box, so actually using the
   6  library is extremely easy.  (Although... if you were looking for a
   7  step-by-step installation GUI, you've downloaded the wrong software!)
   8  
   9  While the impatient can get going immediately with some of the sample
  10  code at the bottom of this document, it's well worth reading this entire
  11  document--most of the other documentation assumes that you are familiar
  12  with these contents.
  13  
  14  
  15  ---------------------------------------------------------------------------
  16  1.  Compatibility
  17  
  18  HTML Purifier is PHP 5 only, and is actively tested from PHP 5.0.5 and
  19  up. It has no core dependencies on other libraries. PHP
  20  4 support was deprecated on December 31, 2007 with HTML Purifier 3.0.0.
  21  
  22  These optional extensions can enhance the capabilities of HTML Purifier:
  23  
  24      * iconv  : Converts text to and from non-UTF-8 encodings
  25      * bcmath : Used for unit conversion and imagecrash protection
  26      * tidy   : Used for pretty-printing HTML
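
A quick way to see which of these apply to a given server is to ask PHP
directly. The sketch below uses only the built-in version_compare() and
extension_loaded() functions; the name purifier_environment_report() is
made up for this example and is not part of HTML Purifier.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: reports whether the
// running PHP meets the minimum version named above and which of the
// optional extensions are loaded.
function purifier_environment_report()
{
    $report = array(
        'php_ok' => version_compare(PHP_VERSION, '5.0.5', '>='),
    );
    foreach (array('iconv', 'bcmath', 'tidy') as $extension) {
        $report[$extension] = extension_loaded($extension);
    }
    return $report;
}

foreach (purifier_environment_report() as $name => $available) {
    echo str_pad($name, 8), $available ? 'yes' : 'no', "\n";
}
```

Run it through the webserver rather than the command line if the two use
different php.ini files, since the loaded extensions may differ.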
  27  
  28  
  29  ---------------------------------------------------------------------------
  30  2.  Reconnaissance
  31  
  32  A big plus of HTML Purifier is its inerrant support of standards, so
  33  your web-pages should be standards-compliant.  (They should also use
  34  semantic markup, but that's another issue altogether, one HTML Purifier
  35  cannot fix without reading your mind.)
  36  
  37  HTML Purifier can process these doctypes:
  38  
  39  * XHTML 1.0 Transitional (default)
  40  * XHTML 1.0 Strict
  41  * HTML 4.01 Transitional
  42  * HTML 4.01 Strict
  43  * XHTML 1.1
  44  
  45  ...and these character encodings:
  46  
  47  * UTF-8 (default)
  48  * Any encoding iconv supports (with crippled internationalization support)
  49  
  50  These defaults reflect what my choices would be if I were authoring an
  51  HTML document; however, what you choose depends on the nature of your
  52  codebase.  If you don't know what doctype you are using, you can determine
  53  the doctype from this identifier at the top of your source code:
  54  
  55      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  56          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  57  
  58  ...and the character encoding from this code:
  59  
  60      <meta http-equiv="Content-type" content="text/html;charset=ENCODING">
  61  
  62  If the character encoding declaration is missing, STOP NOW, and
  63  read 'docs/enduser-utf8.html' (web accessible at
  64  http://htmlpurifier.org/docs/enduser-utf8.html).  In fact, even if it is
  65  present, read this document anyway, as many websites specify their
  66  document's character encoding incorrectly.
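
If you have many pages to check, the two lookups above can be roughly
automated. This is only a sketch: sniff_declarations() is a hypothetical
name, and the regular expressions cover the declaration styles shown in
this section, not every form a browser would accept.

```php
<?php
// Hypothetical helper: pulls the doctype's public identifier and the
// declared charset out of a page's source. A rough regex sketch for
// inspecting your own pages, not a full HTML parser.
function sniff_declarations($html)
{
    $found = array('doctype' => null, 'charset' => null);
    if (preg_match('/<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"/i', $html, $m)) {
        $found['doctype'] = $m[1];
    }
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        $found['charset'] = strtoupper($m[1]);
    }
    return $found;
}

$page = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"'
      . ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
      . '<meta http-equiv="Content-type" content="text/html;charset=utf-8">';
print_r(sniff_declarations($page));
```

A null charset is the "STOP NOW" case described above. A non-null one
still only tells you what the page claims, not what it actually contains.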
  67  
  68  
  69  ---------------------------------------------------------------------------
  70  3.  Including the library
  71  
  72  The procedure is quite simple:
  73  
  74      require_once '/path/to/library/HTMLPurifier.auto.php';
  75  
  76  This will set up an autoloader, so the library's files are only included
  77  when you use them.
  78  
  79  Only the contents in the library/ folder are necessary, so you can remove
  80  everything else when using HTML Purifier in a production environment.
  81  
  82  If you installed HTML Purifier via PEAR, all you need to do is:
  83  
  84      require_once 'HTMLPurifier.auto.php';
  85  
  86  Please note that the usual PEAR practice of including just the classes you
  87  want will not work with HTML Purifier's autoloading scheme.
  88  
  89  Advanced users, read on; other users can skip to section 4.
  90  
  91  Autoload compatibility
  92  ----------------------
  93  
  94      HTML Purifier attempts to be as smart as possible when registering an
  95      autoloader, but there are some cases where you will need to change
  96      your own code to accommodate HTML Purifier. These are those cases:
  97  
  98      PHP VERSION IS LESS THAN 5.1.2, AND YOU'VE DEFINED __autoload
  99          Because spl_autoload_register() doesn't exist in early versions
 100          of PHP 5, HTML Purifier has no way of adding itself to the autoload
 101          stack. Modify your __autoload function to call
 102          HTMLPurifier_Bootstrap::autoload($class) first.
 103  
 104          For example, suppose your autoload function looks like this:
 105  
 106              function __autoload($class) {
 107                  require str_replace('_', '/', $class) . '.php';
 108                  return true;
 109              }
 110  
 111          A modified version with HTML Purifier would look like this:
 112  
 113              function __autoload($class) {
 114                  if (HTMLPurifier_Bootstrap::autoload($class)) return true;
 115                  require str_replace('_', '/', $class) . '.php';
 116                  return true;
 117              }
 118  
 119          Note that there *is* some custom behavior in our autoloader; the
 120          original autoloader in our example would work 99% of the time,
 121          but would fail when including language files.
 122  
 123      AN __autoload FUNCTION IS DECLARED AFTER OUR AUTOLOADER IS REGISTERED
 124          spl_autoload_register() has the curious behavior of disabling
 125          the existing __autoload() handler. Users need to explicitly
 126          spl_autoload_register('__autoload'). Because we use SPL when it
 127          is available, __autoload() will ALWAYS be disabled. If __autoload()
 128          is declared before HTML Purifier is loaded, this is not a problem:
 129          HTML Purifier will register the function for you. But if it is
 130          declared afterwards, it will mysteriously not work. This
 131          snippet of code (after your autoloader is defined) will fix it:
 132  
 133              spl_autoload_register('__autoload');
 134  
 135      Users should also be on guard if they use a version of PHP previous
 136      to 5.1.2 without an autoloader--HTML Purifier will define __autoload()
 137      for you, which can collide with an autoloader that was added by *you*
 138      later.
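
On PHP 5.1.2 and later, one way to stay clear of both cases is to put
your own loader on the SPL stack instead of defining __autoload() at all
(later PHP versions deprecated and then removed __autoload()). A minimal
sketch, assuming a hypothetical app_autoload() function and a Foo_Bar to
Foo/Bar.php file layout:

```php
<?php
// Hypothetical application autoloader: maps Foo_Bar to Foo/Bar.php
// relative to this file. It only requires files that exist, so other
// loaders on the stack still get their turn.
function app_autoload($class)
{
    $file = __DIR__ . '/' . str_replace('_', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
}

// Adds the loader to the SPL stack; the order of this call relative to
// other spl_autoload_register() calls no longer breaks anything.
spl_autoload_register('app_autoload');

// Every registered loader is tried in turn until the class is defined.
print_r(spl_autoload_functions());
```

Because nothing here is named __autoload(), it does not matter whether
this runs before or after HTMLPurifier.auto.php is included.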
 139  
 140  
 141  For better performance
 142  ----------------------
 143  
 144      Opcode caches, which greatly speed up PHP initialization for scripts
 145      with large amounts of code (HTML Purifier included), don't like
 146      autoloaders. We offer an include file that includes all of HTML Purifier's
 147      files in one go in an opcode cache friendly manner:
 148  
 149          // If /path/to/library isn't already in your include path, uncomment
 150          // the below line:
 151          // require '/path/to/library/HTMLPurifier.path.php';
 152  
 153          require 'HTMLPurifier.includes.php';
 154  
 155      Optional components still need to be included--you'll know if you try to
 156      use a feature and you get a "class doesn't exist" error! The autoloader
 157      can be used in conjunction with this approach to catch classes that are
 158      missing. Simply add this afterwards:
 159  
 160          require 'HTMLPurifier.autoload.php';
 161  
 162  Standalone version
 163  ------------------
 164  
 165      HTML Purifier has a standalone distribution; you can also generate
 166      a standalone file from the full version by running the script
 167      maintenance/generate-standalone.php. The standalone version has the
 168      benefit of having most of its code in one file, so parsing is much
 169      faster and the library is easier to manage.
 170  
 171      If HTMLPurifier.standalone.php exists in the library directory, you
 172      can use it like this:
 173  
 174          require '/path/to/HTMLPurifier.standalone.php';
 175  
 176      This is equivalent to including HTMLPurifier.includes.php, except that
 177      the contents of standalone/ will be added to your path. To override this
 178      behavior, specify a new HTMLPURIFIER_PREFIX where standalone files can
 179      be found (usually, this will be one directory up, the "true" library
 180      directory in full distributions). Don't forget to set your path too!
 181  
 182      The autoloader can be added to the end to ensure the classes are
 183      loaded when necessary; otherwise you can manually include them.
 184      To use the autoloader, use this:
 185  
 186          require 'HTMLPurifier.autoload.php';
 187  
 188  For advanced users
 189  ------------------
 190  
 191      HTMLPurifier.auto.php performs a number of operations that can be done
 192      individually. These are:
 193  
 194          HTMLPurifier.path.php
 195              Puts /path/to/library in the include path. For high performance,
 196              this should be done in php.ini.
 197  
 198          HTMLPurifier.autoload.php
 199              Registers our autoload handler HTMLPurifier_Bootstrap::autoload($class).
 200  
 201      You can do these operations by yourself--in fact, you must modify your own
 202      autoload handler if you are using a version of PHP earlier than PHP 5.1.2
 203      (See "Autoload compatibility" above).
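
    Done by hand, the two steps might look like the sketch below. The
    path is a placeholder, and the is_file() guard is only there so the
    sketch does not fail on a machine where that path does not exist.

```php
<?php
// Assumed location of the library/ folder; adjust for your installation.
$library = '/path/to/library';

// Step 1: what HTMLPurifier.path.php is described as doing above, namely
// putting the library directory on the include path.
set_include_path($library . PATH_SEPARATOR . get_include_path());

// Step 2: register the autoload handler by including the file named
// above. Skipped quietly when the placeholder path is not real.
if (is_file($library . '/HTMLPurifier.autoload.php')) {
    require_once 'HTMLPurifier.autoload.php';
}
```

    If the include path is already set in php.ini, step 1 can be dropped.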
 204  
 205  
 206  ---------------------------------------------------------------------------
 207  4. Configuration
 208  
 209  HTML Purifier is designed to run out-of-the-box, but occasionally HTML
 210  Purifier needs to be told what to do.  If you answer no to any of these
 211  questions, read on; otherwise, you can skip to the next section (or, if you're
 212  into configuring things just for the heck of it, skip to 4.3).
 213  
 214  * Am I using UTF-8?
 215  * Am I using XHTML 1.0 Transitional?
 216  
 217  If you answered no to any of these questions, instantiate a configuration
 218  object and read on:
 219  
 220      $config = HTMLPurifier_Config::createDefault();
 221  
 222  
 223  4.1. Setting a different character encoding
 224  
 225  You really shouldn't use any other encoding except UTF-8, especially if you
 226  plan to support multilingual websites (read section 2 for more details).
 227  However, switching to UTF-8 is not always immediately feasible, so we can
 228  adapt.
 229  
 230  HTML Purifier uses iconv to support other character encodings; as such,
 231  it supports any encoding that iconv supports
 232  <http://www.gnu.org/software/libiconv/>, using this code:
 233  
 234      $config->set('Core', 'Encoding', /* put your encoding here */);
 235  
 236  An example usage for Latin-1 websites (the most common encoding for English
 237  websites):
 238  
 239      $config->set('Core', 'Encoding', 'ISO-8859-1');
 240  
 241  Note that HTML Purifier's support for non-Unicode encodings is crippled by the
 242  fact that any character not supported by that encoding will be silently
 243  dropped, EVEN if it is ampersand escaped.  If you want to work around
 244  this, you are welcome to read docs/enduser-utf8.html for a fix,
 245  but please be cognizant of the issues the "solution" creates (for this
 246  reason, I do not include the solution in this document).
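
To get a feel for which text is at risk, you can test whether a string
survives a round trip through the target encoding. The sketch below
assumes the mbstring extension is loaded (it is not something HTML
Purifier requires) and uses a hypothetical helper name. It illustrates
the limitation of the encoding itself and is not a claim about how iconv
or HTML Purifier handle the conversion internally.

```php
<?php
// Hypothetical helper, assuming the mbstring extension is loaded.
// Converts UTF-8 text to the target encoding and back; if the result
// differs, some characters have no equivalent in the target encoding.
function survives_encoding($utf8Text, $encoding)
{
    $encoded = mb_convert_encoding($utf8Text, $encoding, 'UTF-8');
    $decoded = mb_convert_encoding($encoded, 'UTF-8', $encoding);
    return $decoded === $utf8Text;
}

$samples = array(
    "caf\xC3\xA9",      // e-acute exists in ISO-8859-1
    "5 \xE2\x82\xAC",   // the euro sign does not
);
foreach ($samples as $text) {
    echo $text, ': ', survives_encoding($text, 'ISO-8859-1') ? 'kept' : 'lost', "\n";
}
```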
 247  
 248  
 249  4.2. Setting a different doctype
 250  
 251  For those of you using HTML 4.01 Transitional, you can disable
 252  XHTML output like this:
 253  
 254      $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional');
 255  
 256  The supported doctypes are:
 257  
 258      * HTML 4.01 Strict
 259      * HTML 4.01 Transitional
 260      * XHTML 1.0 Strict
 261      * XHTML 1.0 Transitional
 262      * XHTML 1.1
 263  
 264  
 265  4.3. Other settings
 266  
 267  There are more configuration directives which can be read about
 268  here: <http://htmlpurifier.org/live/configdoc/plain.html>. They're a bit boring,
 269  but they can help out for those of you who like to exert maximum control over
 270  your code.  Some of the more interesting ones are configurable at the
 271  demo <http://htmlpurifier.org/demo.php> and are well worth looking into
 272  for your own system.
 273  
 274  For example, you can fine tune allowed elements and attributes, convert
 275  relative URLs to absolute ones, and even autoparagraph input text! These
 276  are, respectively, %HTML.Allowed, %URI.MakeAbsolute and %URI.Base, and
 277  %AutoFormat.AutoParagraph. The %Namespace.Directive naming convention
 278  translates to:
 279  
 280      $config->set('Namespace', 'Directive', $value);
 281  
 282  E.g.
 283  
 284      $config->set('HTML', 'Allowed', 'p,b,a[href],i');
 285      $config->set('URI', 'Base', 'http://www.example.com');
 286      $config->set('URI', 'MakeAbsolute', true);
 287      $config->set('AutoFormat', 'AutoParagraph', true);
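
If you keep directive names in the %Namespace.Directive form (for
example, copied from the configuration documentation), the translation
can be done mechanically. directive_parts() below is a hypothetical
helper written for this example, not a library function.

```php
<?php
// Hypothetical helper: splits a %Namespace.Directive name into the two
// strings that the set() calls above take as their first arguments.
function directive_parts($name)
{
    $parts = explode('.', ltrim($name, '%'), 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException("Not a Namespace.Directive name: $name");
    }
    return $parts;
}

list($namespace, $directive) = directive_parts('%AutoFormat.AutoParagraph');
echo $namespace, ' / ', $directive, "\n";
```

With a configuration object in hand, the result would feed straight into
$config->set($namespace, $directive, $value). Newer HTML Purifier
releases are documented as taking the dotted name directly, so check the
documentation for the version you have.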
 288  
 289  
 290  ---------------------------------------------------------------------------
 291  5. Caching
 292  
 293  HTML Purifier generates some cache files (generally one or two) to speed up
 294  its execution. For maximum performance, make sure that
 295  library/HTMLPurifier/DefinitionCache/Serializer is writable by the webserver.
 296  
 297  If you are in the library/ folder of HTML Purifier, you can set the
 298  appropriate permissions using:
 299  
 300      chmod -R 0755 HTMLPurifier/DefinitionCache/Serializer
 301  
 302  If the above command doesn't work, you may need to assign write permissions
 303  to all. This may be necessary if your webserver runs as nobody, but is
 304  not recommended since it means any other user can write files in the
 305  directory. Use:
 306  
 307      chmod -R 0777 HTMLPurifier/DefinitionCache/Serializer
 308  
 309  You can also chmod files via your FTP client; this option
 310  is usually accessible by right clicking the corresponding directory and
 311  then selecting "chmod" or "file permissions".
 312  
 313  Starting with 2.0.1, HTML Purifier will generate friendly error messages
 314  that will tell you exactly what you have to chmod the directory to; if in doubt,
 315  follow its advice.
 316  
 317  If you are unable or unwilling to give write permissions to the cache
 318  directory, you can either disable the cache (and suffer a performance
 319  hit):
 320  
 321      $config->set('Core', 'DefinitionCache', null);
 322  
 323  Or move the cache directory somewhere else (no trailing slash):
 324  
 325      $config->set('Cache', 'SerializerPath', '/home/user/absolute/path');
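
Before choosing between these options, it can help to check what PHP
itself sees. A minimal sketch using the built-in is_dir() and
is_writable(); the function name and the path are placeholders.

```php
<?php
// Hypothetical helper: classifies a cache directory as 'missing',
// 'read-only' or 'writable' for the user PHP is running as.
function cache_dir_status($dir)
{
    if (!is_dir($dir)) {
        return 'missing';
    }
    return is_writable($dir) ? 'writable' : 'read-only';
}

// Assumed path; point this at your own installation.
$cache = '/path/to/library/HTMLPurifier/DefinitionCache/Serializer';
echo $cache, ': ', cache_dir_status($cache), "\n";
```

Run the check through the webserver, not from a shell: the command-line
user often has different permissions from the user the webserver runs as.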
 326  
 327  
 328  ---------------------------------------------------------------------------
 329  6.   Using the code
 330  
 331  The interface is mind-numbingly simple:
 332  
 333      $purifier = new HTMLPurifier();
 334      $clean_html = $purifier->purify( $dirty_html );
 335  
 336  ...or, if you're using the configuration object:
 337  
 338      $purifier = new HTMLPurifier($config);
 339      $clean_html = $purifier->purify( $dirty_html );
 340  
 341  That's it!  For more examples, check out docs/examples/ (they aren't very
 342  different though).  Also, docs/enduser-slow.html gives advice on what to
 343  do if HTML Purifier is slowing down your application.
 344  
 345  
 346  ---------------------------------------------------------------------------
 347  7.   Quick install
 348  
 349  First, make sure library/HTMLPurifier/DefinitionCache/Serializer is
 350  writable by the webserver (see Section 5: Caching above for details).
 351  If your website is in UTF-8 and XHTML 1.0 Transitional, use this code:
 352  
 353  <?php
 354      require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
 355  
 356      $purifier = new HTMLPurifier();
 357      $clean_html = $purifier->purify($dirty_html);
 358  ?>
 359  
 360  If your website is in a different encoding or doctype, use this code:
 361  
 362  <?php
 363      require_once '/path/to/htmlpurifier/library/HTMLPurifier.auto.php';
 364  
 365      $config = HTMLPurifier_Config::createDefault();
 366      $config->set('Core', 'Encoding', 'ISO-8859-1'); // replace with your encoding
 367      $config->set('HTML', 'Doctype', 'HTML 4.01 Transitional'); // replace with your doctype
 368      $purifier = new HTMLPurifier($config);
 369  
 370      $clean_html = $purifier->purify($dirty_html);
 371  ?>
 372  