PHP Cross Reference of MediaWiki-1.24.0
HTML sanitizer for MediaWiki. Copyright © 2002-2005 Brion Vibber et al. https://www.mediawiki.org/
File size: 1885 lines (56 KB)
Included or required: 0 times
Referenced: 1 time
Includes or requires: 0 files
Sanitizer (41 methods):
getAttribsRegex()
removeHTMLtags()
removeHTMLcomments()
validateTag()
validateTagAttributes()
validateAttributes()
mergeAttributes()
normalizeCss()
checkCss()
cssDecodeCallback()
fixTagAttributes()
encodeAttribute()
safeEncodeAttribute()
escapeId()
escapeClass()
escapeHtmlAllowEntities()
armorLinksCallback()
decodeTagAttributes()
safeEncodeTagAttributes()
getTagAttributeCallback()
normalizeAttributeValue()
normalizeWhitespace()
normalizeSectionNameWhitespace()
normalizeCharReferences()
normalizeCharReferencesCallback()
normalizeEntity()
decCharReference()
hexCharReference()
validateCodepoint()
decodeCharReferences()
decodeCharReferencesAndNormalize()
decodeCharReferencesCallback()
decodeChar()
decodeEntity()
attributeWhitelist()
setupAttributeWhitelist()
stripAllTags()
hackDocType()
cleanUrl()
cleanUrlCallback()
validateEmail()
getAttribsRegex()
Regular expression to match HTML/XML attribute pairs within a tag. Allows some... latitude. Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes.
return: string
removeHTMLtags( $text, $processCallback = null, $args = array(), $extratags = array(), $removetags = array() )
Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.
param: string $text
param: callable $processCallback Callback to do any variable or parameter replacements in HTML attribute values
param: array|bool $args Arguments for the processing callback
param: array $extratags For any extra tags to include
param: array $removetags For any tags (default or extra) to exclude
return: string
removeHTMLcomments( $text )
Remove '<!--', '-->', and everything between. To avoid leaving blank lines, when a comment is both preceded and followed by a newline (ignoring spaces), trim leading and trailing spaces and one of the newlines.
param: string $text
return: string
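The blank-line rule is easiest to see in code. The following is a minimal sketch, not MediaWiki's implementation; the function name stripComments() is hypothetical, and treating an unterminated comment as running to the end of the text is an assumption.

```php
<?php
// Hypothetical sketch of the comment-removal rule described above.
function stripComments(string $text): string
{
    $out = '';
    $pos = 0;
    while (($start = strpos($text, '<!--', $pos)) !== false) {
        $end = strpos($text, '-->', $start + 4);
        // Assumption: an unterminated comment swallows the rest of the text.
        $stop = ($end === false) ? strlen($text) : $end + 3;
        $before = substr($text, $pos, $start - $pos);
        $after = substr($text, $stop);
        // Comment alone on its line: newline (plus spaces) on both sides.
        if (preg_match('/\n *\z/', $before) && preg_match('/^ *\n/', $after)) {
            $before = rtrim($before, ' ');     // drop spaces before the comment
            $stop += strspn($after, ' ') + 1;  // drop spaces after it and one newline
        }
        $out .= $before;
        $pos = $stop;
    }
    return $out . substr($text, $pos);
}
```

For example, "a\n<!-- x -->\nb" becomes "a\nb" rather than leaving an empty line, while an inline comment in "a <!-- x --> b" simply disappears.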
validateTag( $params, $element )
Takes attribute names and values for a tag and the tag name, and validates that the tag is allowed to be present. This DOES NOT validate the attributes, nor does it validate the tags themselves. This method only handles the special circumstances where we may want to allow a tag within content but ONLY when it has specific attributes set.
param: string $params
param: string $element
return: bool
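A gate of this kind can be sketched as a table of required attributes per element. The table below (itemprop plus content for meta, itemprop plus href for link) and the name isTagAllowed() are illustrative assumptions; the sketch also takes an already-decoded attribute array rather than the raw string the documented method accepts.

```php
<?php
// Hypothetical rule table: element => attributes that must all be present.
const REQUIRED_ATTRIBUTES = [
    'meta' => ['itemprop', 'content'],
    'link' => ['itemprop', 'href'],
];

// $attribs is a decoded name => value array (an assumption of this sketch).
function isTagAllowed(array $attribs, string $element): bool
{
    foreach (REQUIRED_ATTRIBUTES[$element] ?? [] as $name) {
        if (!isset($attribs[$name])) {
            return false; // a required attribute is missing
        }
    }
    // Elements without special requirements are not restricted here.
    return true;
}
```

Keeping the rules in data rather than in branches makes it easy to see which elements are conditionally allowed.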
validateTagAttributes( $attribs, $element )
Take an array of attribute names and values and normalize or discard illegal values for the given element type.
- Discards attributes not on a whitelist for the given element
- Unsafe style attributes are discarded
- Invalid id attributes are re-encoded
param: array $attribs
param: string $element
return: array
validateAttributes( $attribs, $whitelist )
Take an array of attribute names and values and normalize or discard illegal values for the given whitelist.
- Discards attributes not on the given whitelist
- Unsafe style attributes are discarded
- Invalid id attributes are re-encoded
param: array $attribs
param: array $whitelist List of allowed attribute names
return: array
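The whitelist step can be sketched as a simple filter. This is a reduced illustration, not MediaWiki's code: filterAttributes() is a hypothetical name, the "unsafe style" pattern is a crude stand-in for a real CSS check, and id re-encoding is not shown.

```php
<?php
// Sketch: keep only whitelisted attributes and drop style values that
// match an illustrative (assumed) blacklist of hostile CSS constructs.
function filterAttributes(array $attribs, array $whitelist): array
{
    $allowed = array_flip($whitelist); // name => index, for O(1) lookup
    $out = [];
    foreach ($attribs as $name => $value) {
        if (!isset($allowed[$name])) {
            continue; // not on the whitelist
        }
        if ($name === 'style' && preg_match('/expression|url\s*\(|javascript:/i', $value)) {
            continue; // unsafe style attribute
        }
        $out[$name] = $value;
    }
    return $out;
}
```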
mergeAttributes( $a, $b )
Merge two sets of HTML attributes. Conflicting items in the second set will override those in the first, except for 'class' attributes which will be combined (if they're both strings).
param: array $a
param: array $b
return: array
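The merge rule can be sketched in a few lines. The name mergeAttribs() is hypothetical, and the way the two class strings are combined (split on whitespace, duplicates removed, space-joined) is an assumption.

```php
<?php
// Sketch: later values win, except that two string 'class' values are combined.
function mergeAttribs(array $a, array $b): array
{
    $out = array_merge($a, $b); // for string keys, values from $b override $a
    if (isset($a['class'], $b['class'])
        && is_string($a['class']) && is_string($b['class'])
    ) {
        $classes = preg_split('/\s+/', "{$a['class']} {$b['class']}", -1, PREG_SPLIT_NO_EMPTY);
        $out['class'] = implode(' ', array_unique($classes));
    }
    return $out;
}
```

For example, merging ['id' => 'a', 'class' => 'x y'] with ['id' => 'b', 'class' => 'y z'] yields ['id' => 'b', 'class' => 'x y z'].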
normalizeCss( $value )
Normalize CSS into a format we can easily search for hostile input:
- decode character references
- decode escape sequences
- convert characters that IE6 interprets into ASCII
- remove comments, unless the entire value is one single comment
param: string $value the css string
return: string normalized css
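The first, second, and fourth steps can be sketched as below, showing why obfuscations such as "ex/**/pression" or "\65 xpression" become searchable. This is a reduced illustration, not MediaWiki's code: the function names are hypothetical, PHP's html_entity_decode() stands in for MediaWiki's own reference decoder, and the IE6 folding and single-comment exception are omitted.

```php
<?php
// Encode one codepoint as UTF-8, using U+FFFD for out-of-range values.
function cssCodepointToUtf8(int $cp): string
{
    if ($cp === 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        $cp = 0xFFFD;
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

function normalizeCssSketch(string $value): string
{
    // 1. Character references such as "&#117;" become literal characters.
    $value = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 2. CSS escapes: "\" + 1-6 hex digits (plus one optional whitespace
    //    character), or "\" followed by any other character.
    $value = preg_replace_callback(
        '/\\\\(?:([0-9a-fA-F]{1,6})[ \t\r\n\f]?|(.))/s',
        function (array $m): string {
            return $m[1] !== '' ? cssCodepointToUtf8(hexdec($m[1])) : $m[2];
        },
        $value
    );
    // 3. Strip comments so that "ex/**/pression" cannot hide a keyword.
    return preg_replace('!/\*.*?\*/!s', '', $value);
}
```

Decoding before stripping comments matters: an escaped "/*" only becomes a comment delimiter once it has been decoded.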
checkCss( $value )
No description
cssDecodeCallback( $matches )
param: array $matches
return: string
fixTagAttributes( $text, $element )
Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes. Output is safe for further wikitext processing, with escaping of values that could trigger problems.
- Normalizes attribute names to lowercase
- Discards attributes not on a whitelist for the given element
- Turns broken or invalid entities into plaintext
- Double-quotes all attribute values
- Attributes without values are given their name as the value
- Double attributes are discarded
- Unsafe style attributes are discarded
- Prepends a space if there are attributes
param: string $text
param: string $element
return: string
encodeAttribute( $text )
Encode an attribute value for HTML output.
param: string $text
return: string HTML-encoded text fragment
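A minimal sketch of attribute encoding, assuming it means HTML-escaping plus writing raw newlines, carriage returns, and tabs as numeric references (XML attribute-value normalization would otherwise turn them into plain spaces). The name encodeAttr() is hypothetical.

```php
<?php
// Sketch: escape &, <, >, " and ', then protect whitespace controls.
function encodeAttr(string $text): string
{
    $enc = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return strtr($enc, ["\n" => '&#10;', "\r" => '&#13;', "\t" => '&#9;']);
}
```

With ENT_QUOTES, both quote styles are escaped, so the result is safe inside either double- or single-quoted attribute values.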
safeEncodeAttribute( $text )
Encode an attribute value for HTML tags, with extra armoring against further wiki processing.
param: string $text
return: string HTML-encoded text fragment
escapeId( $id, $options = array() )
Given a value, escape it so that it can be used in an id attribute and return it. This will use HTML5 validation if $wgExperimentalHtmlIds is true, allowing anything but ASCII whitespace. Otherwise it will use HTML 4 rules, which means a narrow subset of ASCII, with bad characters escaped with lots of dots.
To ensure we don't have to bother escaping anything, we also strip ', ", & even if $wgExperimentalHtmlIds is true. TODO: Is this the best tactic?
We also strip # because it upsets IE, and % because it could be ambiguous if it's part of something that looks like a percent escape (which don't work reliably in fragments cross-browser).
param: string $id Id to escape
param: string|array $options String or array of strings (default is array()):
return: string
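The HTML 4 "lots of dots" escaping can be sketched as percent-encoding followed by swapping % for a dot. This is a hypothetical illustration: escapeIdHtml4() is not a MediaWiki function, converting spaces to underscores first is an assumption, and the HTML5 mode and $options handling are not shown.

```php
<?php
// Sketch of dot-escaping for HTML 4 ids.
function escapeIdHtml4(string $id): string
{
    $id = str_replace(' ', '_', $id); // assumption: spaces become underscores
    // urlencode() leaves [A-Za-z0-9._-] alone and writes other bytes as %XX;
    // each % then becomes a dot, e.g. "#" -> ".23".
    return str_replace('%', '.', urlencode($id));
}
```

A UTF-8 character such as "ü" (bytes C3 BC) therefore becomes ".C3.BC", which is why non-ASCII headings produce dotted anchors under HTML 4 rules.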
escapeClass( $class )
Given a value, escape it so that it can be used as a CSS class and return it.
param: string $class
return: string
escapeHtmlAllowEntities( $html )
Given HTML input, escape with htmlspecialchars but un-escape entities. This allows (generally harmless) entities like &#160; to survive.
param: string $html HTML to escape
return: string Escaped input
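The escape-then-restore approach can be sketched as follows. The name escapeAllowEntities() is hypothetical, and the entity pattern is an assumption; the documented method may validate entity names more strictly.

```php
<?php
// Sketch: escape everything, then restore sequences shaped like references.
function escapeAllowEntities(string $html): string
{
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
    // "&amp;name;", "&amp;#123;", "&amp;#x1F;" -> "&name;", "&#123;", "&#x1F;"
    return preg_replace('/&amp;(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/', '&$1;', $escaped);
}
```

Tags stay escaped as &lt; and &gt;, while "&#160;" in the input comes through intact.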
armorLinksCallback( $matches )
Regex replace callback for armoring links against further processing.
param: array $matches
return: string
decodeTagAttributes( $text )
Return an associative array of attribute names and values from a partial tag string. Attribute names are forced to lowercase, and character references are decoded to UTF-8 text.
param: string $text
return: array
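Attribute decoding can be sketched with one regex covering double-quoted, single-quoted, unquoted, and bare attributes. This is not MediaWiki's code: decodeAttribs() is hypothetical, the regex is a simplified stand-in for the one from getAttribsRegex(), and html_entity_decode() stands in for MediaWiki's own reference decoder. Giving a bare attribute its own name as the value follows the fixTagAttributes() description.

```php
<?php
// Sketch of parsing a partial tag string into name => value pairs.
function decodeAttribs(string $text): array
{
    $pattern = '/([^\s\/>=]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+)))?/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    $attribs = [];
    foreach ($matches as $m) {
        $name = strtolower($m[1]);
        if (($m[2] ?? null) !== null) {
            $value = $m[2];              // double-quoted value
        } elseif (($m[3] ?? null) !== null) {
            $value = $m[3];              // single-quoted value
        } elseif (($m[4] ?? null) !== null) {
            $value = $m[4];              // unquoted value
        } else {
            $value = $name;              // bare attribute: value is its name
        }
        // Decode character references to UTF-8 text.
        $attribs[$name] = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return $attribs;
}
```

In this sketch a repeated attribute name simply overwrites the earlier value.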
safeEncodeTagAttributes( $assoc_array )
Build a partial tag string from an associative array of attribute names and values as returned by decodeTagAttributes.
param: array $assoc_array
return: string
getTagAttributeCallback( $set )
Pick the appropriate attribute value from a match set from the attribs regex matches.
param: array $set
return: string
normalizeAttributeValue( $text )
Normalize whitespace and character references in XML source-encoded text for an attribute value. See http://www.w3.org/TR/REC-xml/#AVNormalize for background, but note that we are not returning the value; we are returning XML source fragments that will be slapped into output.
param: string $text
return: string
normalizeWhitespace( $text )
param: string $text
return: string
normalizeSectionNameWhitespace( $section )
Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the ids that are used for section links.
param: string $section
return: string
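A minimal sketch, assuming that section-name whitespace normalization means treating underscores like spaces, collapsing runs of them, and trimming the ends. The name normalizeSectionName() is hypothetical.

```php
<?php
// Sketch: "  Foo__bar  baz " -> "Foo bar baz".
function normalizeSectionName(string $section): string
{
    return trim(preg_replace('/[ _]+/', ' ', $section));
}
```

Folding underscores into spaces means "Foo_bar" and "Foo bar" map to the same section id.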
normalizeCharReferences( $text )
Ensure that any entities and character references are legal for XML and XHTML specifically. Any stray bits will be &-escaped to result in a valid text fragment.
a. named char refs can only be &lt; &gt; &amp; &quot;, others are numericized (this way we're well-formed even without a DTD)
b. any numeric char refs must be legal chars, not invalid or forbidden
c. use lower cased "&#x", not "&#X"
d. fix or reject non-valid attributes
param: string $text
return: string
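Rules (a) through (c) can be sketched with a single callback regex. This is a reduced illustration with hypothetical names, not MediaWiki's code. As a simplification, named entities other than the core four are &-escaped instead of being numericized, and overly long numeric references are treated as stray text.

```php
<?php
// XML 1.0 Char production.
function isXmlCodepoint(int $cp): bool
{
    return $cp === 0x09 || $cp === 0x0A || $cp === 0x0D
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

function normalizeRefs(string $text): string
{
    return preg_replace_callback(
        '/&(?:(lt|gt|amp|quot)|#([0-9]{1,7})|#[xX]([0-9a-fA-F]{1,6}));|&/',
        function (array $m): string {
            $dec = $m[2] ?? '';
            $hex = $m[3] ?? '';
            if (($m[1] ?? '') !== '') {
                return $m[0]; // core named entity: keep as-is
            }
            if ($dec !== '' || $hex !== '') {
                $cp = $dec !== '' ? (int)$dec : (int)hexdec($hex);
                if (isXmlCodepoint($cp)) {
                    // Canonical form; hex references use a lower-case "&#x".
                    return $dec !== '' ? "&#{$cp};" : sprintf('&#x%x;', $cp);
                }
            }
            return '&amp;' . substr($m[0], 1); // stray or invalid: escape the "&"
        },
        $text
    );
}
```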
normalizeCharReferencesCallback( $matches )
param: string $matches
return: string
normalizeEntity( $name )
If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, returns the equivalent numeric entity reference (except for the core &lt; &gt; &amp; &quot;). If the entity is a MediaWiki-specific alias, returns the HTML equivalent. Otherwise, returns HTML-escaped text of pseudo-entity source (e.g. &foo;).
param: string $name
return: string
decCharReference( $codepoint )
param: int $codepoint
return: null|string
hexCharReference( $codepoint )
param: int $codepoint
return: null|string
validateCodepoint( $codepoint )
Returns true if a given Unicode codepoint is a valid character in XML.
param: int $codepoint
return: bool
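"Valid character in XML" corresponds to the Char production of XML 1.0: #x9, #xA, #xD, and the ranges #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF. The sketch below encodes that production and shows how the decimal and hexadecimal reference builders above could use it to produce their documented null|string results. The names isXmlChar(), decRef(), and hexRef() are hypothetical.

```php
<?php
// XML 1.0 Char production.
function isXmlChar(int $cp): bool
{
    return $cp === 0x09 || $cp === 0x0A || $cp === 0x0D
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

// "&#65;" for a valid codepoint, null otherwise.
function decRef(int $cp): ?string
{
    return isXmlChar($cp) ? sprintf('&#%d;', $cp) : null;
}

// "&#x1f600;" style, with the lower-case "&#x" prefix; null if invalid.
function hexRef(int $cp): ?string
{
    return isXmlChar($cp) ? sprintf('&#x%x;', $cp) : null;
}
```

Surrogates (U+D800-U+DFFF), U+FFFE/U+FFFF, and most C0 controls fail the check.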
decodeCharReferences( $text )
Decode any character references, numeric or named entities, in the text and return a UTF-8 string.
param: string $text
return: string
decodeCharReferencesAndNormalize( $text )
Decode any character references, numeric or named entities, in the text and normalize the resulting string. (bug 14952) This is useful for page titles, not for text to be displayed; MediaWiki allows HTML entities to escape normalization as a feature.
param: string $text Already normalized, containing entities
return: string Still normalized, without entities
decodeCharReferencesCallback( $matches )
param: string $matches
return: string
decodeChar( $codepoint )
Return UTF-8 string for a codepoint if that is a valid character reference, otherwise U+FFFD REPLACEMENT CHARACTER.
param: int $codepoint
return: string
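The codepoint-to-UTF-8 step can be written out by hand, which makes the byte layout explicit. This sketch uses a hypothetical decodeCodepoint() and assumes "valid" means the XML 1.0 Char production; anything else is replaced with U+FFFD.

```php
<?php
// Sketch: hand-rolled UTF-8 encoding with U+FFFD substitution.
function decodeCodepoint(int $cp): string
{
    $valid = $cp === 0x09 || $cp === 0x0A || $cp === 0x0D
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
    if (!$valid) {
        $cp = 0xFFFD; // REPLACEMENT CHARACTER
    }
    if ($cp < 0x80) {           // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {          // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}
```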
decodeEntity( $name )
If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the UTF-8 encoding of that character. Otherwise, returns pseudo-entity source (e.g. "&foo;").
param: string $name
return: string
attributeWhitelist( $element )
Fetch the whitelist of acceptable attributes for a given element name.
param: string $element
return: array
setupAttributeWhitelist()
Returns an array whose keys are the allowed HTML elements and whose values are arrays of the attributes allowed on each element.
return: array
stripAllTags( $text )
Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text. Warning: this return value must be further escaped for literal inclusion in HTML output as of 1.10!
param: string $text HTML fragment
return: string
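A minimal sketch of tag stripping, using PHP's built-in strip_tags() and html_entity_decode() as stand-ins rather than MediaWiki's own helpers. The name stripAllTagsSketch() is hypothetical, and collapsing whitespace runs is an assumption. Because the output is decoded plain text, it must be escaped again (for example with htmlspecialchars()) before being placed in HTML, which is exactly the warning above.

```php
<?php
// Sketch: remove tags, decode references, collapse ASCII whitespace.
function stripAllTagsSketch(string $html): string
{
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return preg_replace('/[ \t\r\n]+/', ' ', $text);
}
```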
hackDocType()
Hack up a private DOCTYPE with HTML's standard entity declarations. PHP 4 seemed to know these if you gave it an HTML doctype, but PHP 5.1 doesn't. Use for passing XHTML fragments to PHP's XML parsing functions.
return: string
cleanUrl( $url )
param: string $url
return: mixed|string
cleanUrlCallback( $matches )
param: array $matches
return: string
validateEmail( $addr )
Does a string look like an e-mail address?
This validates an email address using an HTML5 specification found at: http://www.whatwg.org/html/states-of-the-type-attribute.html#valid-e-mail-address
As of 2011-01-24, it says: A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str *( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.
This function is an implementation of the specification as requested in bug 22449.
Client-side forms will use the same standard validation rules via JS or HTML 5 validation; additional restrictions can be enforced server-side by extensions via the 'isValidEmailAddr' hook.
Note that this validation doesn't 100% match RFC 2822, but is believed to be liberal enough for wide use. Some invalid addresses will still pass validation here.
param: string $addr E-mail address
return: bool
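The ABNF production translates almost directly into a regular expression. The sketch below uses a hypothetical looksLikeEmail(). The atext set is taken from RFC 5322 section 3.2.3, and ldh-str is read as one or more letters, digits, or hyphens. The D modifier stops "$" from matching before a trailing newline.

```php
<?php
// Sketch of: 1*( atext / "." ) "@" ldh-str *( "." ldh-str )
function looksLikeEmail(string $addr): bool
{
    $atext = "a-z0-9!#$%&'*+\\/=?^_`{|}~-"; // RFC 5322 atext (hyphen last = literal)
    $ldh = 'a-z0-9-';                        // RFC 1034 letters, digits, hyphens
    $re = "/^[.{$atext}]+@[{$ldh}]+(?:\\.[{$ldh}]+)*$/iD";
    return preg_match($re, $addr) === 1;
}
```

As the description warns, this is deliberately liberal: "a@localhost" passes because the production does not require a dot in the domain part.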
Generated: Fri Nov 28 14:03:12 2014. Cross-referenced by PHPXref 0.7.1