XHTML sanitizer for MediaWiki.

Static Public Member Functions
static	attributeWhitelist ($element)
	Fetch the whitelist of acceptable attributes for a given element name.
static	checkCss ($value)
	Pick apart some CSS and check it for forbidden or unsafe structures.
static	cleanUrl ($url)
static	cleanUrlCallback ($matches)
static	cssDecodeCallback ($matches)
static	cssNormalizeUnicodeWidth ($matches)
	Normalize Unicode U+FF01 to U+FF5A.
static	decCharReference ($codepoint)
static	decodeChar ($codepoint)
	Return UTF-8 string for a codepoint if that is a valid character reference, otherwise U+FFFD REPLACEMENT CHARACTER.
static	decodeCharReferences ($text)
	Decode any character references, numeric or named entities, in the text and return a UTF-8 string.
static	decodeCharReferencesAndNormalize ($text)
	Decode any character references, numeric or named entities, in the text and normalize the resulting string.
static	decodeCharReferencesCallback ($matches)
static	decodeEntity ($name)
	If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the UTF-8 encoding of that character.
static	decodeTagAttributes ($text)
	Return an associative array of attribute names and values from a partial tag string.
static	encodeAttribute ($text)
	Encode an attribute value for HTML output.
static	escapeClass ($class)
	Given a value, escape it so that it can be used as a CSS class and return it.
static	escapeHtmlAllowEntities ($html)
	Given HTML input, escape with htmlspecialchars but un-escape entities.
static	escapeId ($id, $options=array())
	Given a value, escape it so that it can be used in an id attribute and return it.
static	fixDeprecatedAttributes ($attribs, $element)
	Take an array of attribute names and values and fix some deprecated values for the given element type.
static	fixTagAttributes ($text, $element)
	Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes.
static	getAttribsRegex ()
	Regular expression to match HTML/XML attribute pairs within a tag.
static	hackDocType ()
	Hack up a private DOCTYPE with HTML's standard entity declarations.
static	hexCharReference ($codepoint)
static	mergeAttributes ($a, $b)
	Merge two sets of HTML attributes.
static	normalizeCharReferences ($text)
	Ensure that any entities and character references are legal for XML and XHTML specifically.
static	normalizeCharReferencesCallback ($matches)
static	normalizeCss ($value)
	Normalize CSS into a format we can easily search for hostile input.
static	normalizeEntity ($name)
	If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the equivalent numeric entity reference (except for the core &lt; &gt; &amp; &quot;).
static	normalizeSectionNameWhitespace ($section)
	Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the ids that are used for section links.
static	removeHTMLcomments ($text)
	Remove '<!--', '-->', and everything between.
static	removeHTMLtags ($text, $processCallback=null, $args=array(), $extratags=array(), $removetags=array())
	Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.
static	safeEncodeAttribute ($text)
	Encode an attribute value for HTML tags, with extra armoring against further wiki processing.
static	setupAttributeWhitelist ()
	For each array key (an allowed HTML element), return an array of allowed attributes.
static	stripAllTags ($text)
	Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text.
static	validateAttributes ($attribs, $whitelist)
	Take an array of attribute names and values and normalize or discard illegal values for the given whitelist.
static	validateEmail ($addr)
	Does a string look like an e-mail address?
static	validateTagAttributes ($attribs, $element)
	Take an array of attribute names and values and normalize or discard illegal values for the given element type.
Public Attributes
const	CHAR_REFS_REGEX
	Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.
const	EVIL_URI_PATTERN = '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i'
	Blacklist for evil URIs like "javascript:". WARNING: DO NOT use this in any place that actually requires blacklisting for security reasons.
const	XMLNS_ATTRIBUTE_PATTERN = "/^xmlns:[:A-Z_a-z-.0-9]+$/"
Static Public Attributes
static	$attribsRegex
	Lazy-initialised attributes regex, see getAttribsRegex()
static	$htmlEntities
	List of all named character entities defined in HTML 4.01 (http://www.w3.org/TR/html4/sgml/entities.html), as well as &apos;, which is only defined starting in XHTML1.
static	$htmlEntityAliases
	Character entity aliases accepted by MediaWiki.
Static Private Member Functions
static	armorLinksCallback ($matches)
	Regex replace callback for armoring links against further processing.
static	getTagAttributeCallback ($set)
	Pick the appropriate attribute value from a match set from the attribs regex matches.
static	normalizeAttributeValue ($text)
	Normalize whitespace and character references in an XML source-encoded text for an attribute value.
static	normalizeWhitespace ($text)
static	validateCodepoint ($codepoint)
	Returns true if a given Unicode codepoint is a valid character in XML.

Detailed Description

XHTML sanitizer for MediaWiki.

Definition at line 31 of file Sanitizer.php.

Member Function Documentation

static Sanitizer::armorLinksCallback ( $ matches ) [static, private]

Regex replace callback for armoring links against further processing.

Parameters:

$matches Array

Returns:: string

Definition at line 1199 of file Sanitizer.php.

References $matches.

static Sanitizer::attributeWhitelist ( $ element ) [static]

Fetch the whitelist of acceptable attributes for a given element name.

Parameters:

$element String

Returns:: Array

Definition at line 1507 of file Sanitizer.php.

References setupAttributeWhitelist().

Referenced by validateTagAttributes().

static Sanitizer::checkCss ( $ value ) [static]

Pick apart some CSS and check it for forbidden or unsafe structures.

Returns a sanitized string. This sanitized string will have character references and escape sequences decoded and comments stripped (unless it is itself one valid comment, in which case the value will be passed through). If the input is just too evil, only a comment complaining about evilness will be returned.

Currently URL references, 'expression', and 'tps' are forbidden.

NOTE: Despite the fact that character references are decoded, the returned string may contain character references given certain clever input strings. These character references must be escaped before the return value is embedded in HTML.

Parameters:

string $value

Returns:: string

Definition at line 946 of file Sanitizer.php.

References normalizeCss().

Referenced by SanitizerTest\testCssCommentsChecking(), and validateAttributes().

static Sanitizer::cleanUrl ( $ url ) [static]

Parameters:

$url string

Returns:: mixed|string

Definition at line 1732 of file Sanitizer.php.

Referenced by Parser\makeFreeExternalLink(), and Parser\replaceExternalLinks().

static Sanitizer::cleanUrlCallback ( $ matches ) [static]

Parameters:

$matches array

Returns:: string

Definition at line 1779 of file Sanitizer.php.

static Sanitizer::cssDecodeCallback ( $ matches ) [static]

Parameters:

$matches array

Returns:: String

Definition at line 984 of file Sanitizer.php.

References $matches, and codepointToUtf8().

static Sanitizer::cssNormalizeUnicodeWidth ( $ matches ) [static]

Normalize Unicode U+FF01 to U+FF5A.

Parameters:

character $char

Returns:: character in ASCII range

Definition at line 972 of file Sanitizer.php.

References $matches, and utf8ToCodepoint().
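
The fullwidth forms U+FF01 to U+FF5A mirror ASCII 0x21 to 0x7A at a fixed offset of 0xFEE0, so the normalization can be sketched as a subtraction. The function name below is hypothetical, and this is only an illustration of the idea, not the code in Sanitizer.php:

```php
<?php
// Hypothetical sketch: fold fullwidth forms (U+FF01..U+FF5A) to ASCII.
function foldFullwidthToAscii(string $text): string {
    $out = preg_replace_callback(
        '/[\x{FF01}-\x{FF5A}]/u',
        function (array $m): string {
            // Every character in this range is a 3-byte UTF-8 sequence.
            $b = $m[0];
            $cp = ((ord($b[0]) & 0x0F) << 12)
                | ((ord($b[1]) & 0x3F) << 6)
                | (ord($b[2]) & 0x3F);
            // Fullwidth and ASCII counterparts differ by exactly 0xFEE0.
            return chr($cp - 0xFEE0);
        },
        $text
    );
    // preg_replace_callback() returns null on invalid UTF-8; keep the input then.
    return $out ?? $text;
}
```

Folding first means a later search for a keyword such as "expression" also catches its fullwidth spelling.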

static Sanitizer::decCharReference ( $ codepoint ) [static]

Parameters:

$codepoint

Returns:: null|string

Definition at line 1377 of file Sanitizer.php.

References validateCodepoint().

Referenced by normalizeCharReferencesCallback().

static Sanitizer::decodeChar ( $ codepoint ) [static]

Return UTF-8 string for a codepoint if that is a valid character reference, otherwise U+FFFD REPLACEMENT CHARACTER.

Parameters:

$codepoint Integer

Returns:: String

Access:: private

Definition at line 1474 of file Sanitizer.php.

References codepointToUtf8(), and validateCodepoint().

Referenced by decodeCharReferencesCallback().
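
As a rough illustration of the behaviour described above, the sketch below validates a code point against the XML 1.0 Char production and falls back to U+FFFD. Both helper names are hypothetical stand-ins for validateCodepoint() and codepointToUtf8(); the real implementations may differ:

```php
<?php
// Hypothetical stand-in for validateCodepoint(): XML 1.0 "Char" production.
function isXmlChar(int $cp): bool {
    return $cp === 0x09 || $cp === 0x0A || $cp === 0x0D
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

// Hypothetical stand-in for codepointToUtf8(): manual UTF-8 encoding,
// written out by hand so the sketch needs no extensions.
function codepointToUtf8Sketch(int $cp): string {
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Valid code points are encoded; anything else becomes U+FFFD.
function decodeCharSketch(int $cp): string {
    return isXmlChar($cp) ? codepointToUtf8Sketch($cp) : "\xEF\xBF\xBD";
}
```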

static Sanitizer::decodeCharReferences ( $ text ) [static]

Decode any character references, numeric or named entities, in the text and return a UTF-8 string.

Parameters:

$text String

Returns:: String

Definition at line 1420 of file Sanitizer.php.

Referenced by CoreLinkFunctions\categoryLinkHook(), RecentChange\cleanupForIRC(), decodeTagAttributes(), UploadBase\detectScript(), Skin\doEditSectionLink(), escapeHtmlAllowEntities(), escapeId(), normalizeCss(), ImageCleanup\processRow(), Parser\replaceInternalLinks2(), SanitizerTest\testDecodeMixedComplexEntities(), SanitizerTest\testDecodeMixedEntities(), SanitizerTest\testDecodeNamedEntities(), SanitizerTest\testDecodeNumericEntities(), SanitizerTest\testInvalidAmpersand(), SanitizerTest\testInvalidEntities(), and SanitizerTest\testInvalidNumberedEntities().

static Sanitizer::decodeCharReferencesAndNormalize ( $ text ) [static]

Decode any character references, numeric or named entities, in the text and normalize the resulting string.

(bug 14952)

This is useful for page titles, not for text to be displayed; MediaWiki allows HTML entities to escape normalization as a feature.

Parameters:

$text String (already normalized, containing entities)

Returns:: String (still normalized, without entities)

Definition at line 1437 of file Sanitizer.php.

References $count, and $wgContLang.

Referenced by Title\newFromText().

static Sanitizer::decodeCharReferencesCallback ( $ matches ) [static]

Parameters:

$matches String

Returns:: String

Definition at line 1455 of file Sanitizer.php.

References $matches, decodeChar(), and decodeEntity().

static Sanitizer::decodeEntity ( $ name ) [static]

If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the UTF-8 encoding of that character.

Otherwise, returns pseudo-entity source (eg &foo;)

Parameters:

$name String

Returns:: String

Definition at line 1490 of file Sanitizer.php.

References codepointToUtf8().

Referenced by decodeCharReferencesCallback().

static Sanitizer::decodeTagAttributes ( $ text ) [static]

Return an associative array of attribute names and values from a partial tag string.

Attribute names are forced to lowercase, and character references are decoded to UTF-8 text.

Parameters:

$text String

Returns:: Array

Definition at line 1211 of file Sanitizer.php.

References decodeCharReferences(), and getTagAttributeCallback().

Referenced by LanguageConverter\autoConvert(), Parser\extensionSubstitution(), Parser\extractTagsAndParams(), fixTagAttributes(), Linker\makeKnownLinkObj(), and SanitizerTest\testDecodeTagAttributes().

static Sanitizer::encodeAttribute ( $ text ) [static]

Encode an attribute value for HTML output.

Parameters:

$text String

Returns:: HTML-encoded text fragment

Definition at line 1048 of file Sanitizer.php.

Referenced by Xml\expandAttributes(), ApiFormatXml\recXmlPrint(), and safeEncodeAttribute().

static Sanitizer::escapeClass ( $ class ) [static]

Given a value, escape it so that it can be used as a CSS class and return it.

Todo:: For extra validity, input should be validated UTF-8.

See also:: http://www.w3.org/TR/CSS21/syndata.html Valid characters/format

Parameters:

$class String

Returns:: String

Definition at line 1171 of file Sanitizer.php.

Referenced by ChangeTags\formatSummaryRow(), SpecialStatistics\getGroupStats(), Skin\getPageClasses(), OutputPage\headElement(), SkinTemplate\outputPage(), EnhancedChangesList\recentChangesBlockGroup(), EnhancedChangesList\recentChangesBlockLine(), and OldChangesList\recentChangesLine().
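
A minimal sketch of this kind of escaping, assuming the approach is to replace characters that are awkward in CSS class names with underscores. The function name and the exact character set are assumptions made for illustration, not a copy of the MediaWiki code:

```php
<?php
// Hypothetical sketch: make a string usable as a CSS class name.
function escapeClassSketch(string $class): string {
    $class = preg_replace(
        [
            // A leading digit or hyphen, ASCII punctuation and whitespace,
            // and the UTF-8 no-break space all become underscores.
            '/(^[0-9\-])|[\x00-\x20!"#$%&\'()*+,.\/:;<=>?@[\]^`{|}~]|\xC2\xA0/',
            // Collapse runs of underscores.
            '/_+/',
        ],
        '_',
        $class
    );
    // Trailing underscores carry no information.
    return rtrim($class, '_');
}
```

With this sketch, 'foo bar' becomes 'foo_bar' and 'a.b#c' becomes 'a_b_c'.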

static Sanitizer::escapeHtmlAllowEntities ( $ html ) [static]

Given HTML input, escape with htmlspecialchars but un-escape entities.

This allows (generally harmless) entities like &#160; to survive.

Parameters:

$html String to escape

Returns:: String: escaped input

Definition at line 1186 of file Sanitizer.php.

References decodeCharReferences().

Referenced by Linker\formatComment(), AllmessagesTablePager\formatValue(), and wfMsgExt().
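
The decode-then-escape idea can be sketched with PHP's built-in functions. Here html_entity_decode() stands in for Sanitizer::decodeCharReferences(), which is an assumption; the two do not necessarily accept the same set of entities:

```php
<?php
// Hypothetical sketch: escape markup but let entities survive as characters.
function escapeAllowEntitiesSketch(string $html): string {
    // Decode entities to UTF-8 first (stand-in for decodeCharReferences())...
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // ...then escape everything, so tags are neutralised while the decoded
    // characters (for example a no-break space) pass through unchanged.
    return htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
}
```

An input such as '&amp;' round-trips unchanged, while '<b>' comes out as '&lt;b&gt;'.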

static Sanitizer::escapeId ( $ id, $ options = array() ) [static]

Given a value, escape it so that it can be used in an id attribute and return it.

This will use HTML5 validation if $wgExperimentalHtmlIds is true, allowing anything but ASCII whitespace. Otherwise it will use HTML 4 rules, which means a narrow subset of ASCII, with bad characters escaped with lots of dots.

To ensure we don't have to bother escaping anything, we also strip ', ", & even if $wgExperimentalHtmlIds is true. TODO: Is this the best tactic? We also strip # because it upsets IE, and % because it could be ambiguous if it's part of something that looks like a percent escape (which doesn't work reliably in fragments cross-browser).

See also:: http://www.w3.org/TR/html401/types.html#type-name Valid characters in the id and name attributes; http://www.w3.org/TR/html401/struct/links.html#h-12.2.3 Anchors with the id attribute; http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#the-id-attribute HTML5 definition of id attribute

Parameters:

$id String: id to escape

$options Mixed: string or array of strings (default is array()):
'noninitial': This is a non-initial fragment of an id, not a full id, so don't pay attention if the first character isn't valid at the beginning of an id. Only matters if $wgExperimentalHtmlIds is false.
'legacy': Behave the way the old HTML 4-based ID escaping worked even if $wgExperimentalHtmlIds is used, so we can generate extra anchors and links won't break.

Returns:: String

Definition at line 1127 of file Sanitizer.php.

References $options, $wgExperimentalHtmlIds, $wgHtml5, and decodeCharReferences().

Referenced by HTMLFormField\__construct(), Skin\addToSidebarPlain(), MonoBookTemplate\customBox(), HTMLForm\displaySection(), Title\escapeFragmentForURL(), SpecialListGroupRights\execute(), VectorTemplate\execute(), Parser\formatHeadings(), HTMLRadioField\formatOptions(), AllmessagesTablePager\getRowAttrs(), Parser\guessLegacySectionNameFromWikiText(), Parser\guessSectionNameFromWikiText(), ImagePage\makeMetadataTable(), and validateAttributes().
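
The "escaped with lots of dots" behaviour of the HTML 4 mode can be sketched as percent-encoding with '.' in place of '%'. The function below is a hypothetical illustration: it skips the character-reference decoding step and the HTML5 mode, and the 'x' prefix for ids that do not start with a letter is an assumption about how the 'noninitial' option is honoured:

```php
<?php
// Hypothetical sketch of HTML 4 style id escaping.
function escapeIdLegacySketch(string $id, bool $nonInitial = false): string {
    // Spaces become underscores; everything else outside the URL-safe
    // set is percent-encoded.
    $id = urlencode(strtr($id, ' ', '_'));
    // Keep ':' readable, then turn the remaining '%XX' escapes into '.XX'.
    $id = str_replace(['%3A', '%'], [':', '.'], $id);
    // HTML 4 ids must begin with a letter; prefix one when needed.
    if (!$nonInitial && !preg_match('/^[a-zA-Z]/', $id)) {
        $id = 'x' . $id;
    }
    return $id;
}
```

Under these assumptions 'a/b' becomes 'a.2Fb', and '1st' becomes 'x1st' unless it is marked as a non-initial fragment.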

static Sanitizer::fixDeprecatedAttributes ( $ attribs, $ element ) [static]

Take an array of attribute names and values and fix some deprecated values for the given element type.

This does not validate properties, so you should ensure that you call validateTagAttributes AFTER this to ensure that the resulting style rule this may add is safe.

Converts most presentational attributes, such as align, into inline CSS.

Parameters:

$attribs	Array
$element	String

Returns:: Array

Definition at line 628 of file Sanitizer.php.

References $wgCleanupPresentationalAttributes, and $wgHtml5.

Referenced by fixTagAttributes().

static Sanitizer::fixTagAttributes ( $ text, $ element ) [static]

Take a tag soup fragment listing an HTML element's attributes and normalize it to well-formed XML, discarding unwanted attributes.

Output is safe for further wikitext processing, with escaping of values that could trigger problems.

Normalizes attribute names to lowercase
Discards attributes not on a whitelist for the given element
Turns broken or invalid entities into plaintext
Double-quotes all attribute values
Attributes without values are given their own name as the value
Double attributes are discarded
Unsafe style attributes are discarded
Prepends space if there are attributes.

Parameters:

$text	String
$element	String

Returns:: String

Definition at line 1024 of file Sanitizer.php.

References decodeTagAttributes(), fixDeprecatedAttributes(), safeEncodeAttribute(), and validateTagAttributes().

Referenced by Parser\doTableStuff(), removeHTMLtags(), and SanitizerTest\testDeprecatedAttributes().

static Sanitizer::getAttribsRegex ( ) [static]

Regular expression to match HTML/XML attribute pairs within a tag.

Allows some... latitude. Used in Sanitizer::fixTagAttributes and Sanitizer::decodeTagAttributes

Definition at line 333 of file Sanitizer.php.

References $attribsRegex.

static Sanitizer::getTagAttributeCallback ( $ set ) [static, private]

Pick the appropriate attribute value from a match set from the attribs regex matches.

Parameters:

$set Array

Returns:: String

Definition at line 1247 of file Sanitizer.php.

Referenced by decodeTagAttributes().

static Sanitizer::hackDocType ( ) [static]

Hack up a private DOCTYPE with HTML's standard entity declarations.

PHP 4 seemed to know these if you gave it an HTML doctype, but PHP 5.1 doesn't.

Use for passing XHTML fragments to PHP's XML parsing functions

Returns:: String

Definition at line 1719 of file Sanitizer.php.

Referenced by Xml\isWellFormedXmlFragment(), and ParserTest\wellFormed().

static Sanitizer::hexCharReference ( $ codepoint ) [static]

Parameters:

$codepoint

Returns:: null|string

Definition at line 1390 of file Sanitizer.php.

References validateCodepoint().

Referenced by normalizeCharReferencesCallback().

static Sanitizer::mergeAttributes ( $ a, $ b ) [static]

Merge two sets of HTML attributes.

Conflicting items in the second set will override those in the first, except for 'class' attributes which will be combined (if they're both strings).

Todo:: implement merging for other attributes such as style

Parameters:

$a	Array
$b	Array

Returns:: array

Definition at line 827 of file Sanitizer.php.

References $out.

Referenced by Linker\linkAttribs(), Linker\makeKnownLinkObj(), and ImageGallery\toHTML().
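
The merge rule described above can be sketched as follows. The function name is hypothetical, and dropping duplicate class names is an assumption on top of the documented "combined" behaviour:

```php
<?php
// Hypothetical sketch: merge two attribute arrays, combining 'class'.
function mergeAttributesSketch(array $a, array $b): array {
    // For string keys, array_merge() lets the second array win on conflict.
    $out = array_merge($a, $b);
    if (isset($a['class'], $b['class'])
        && is_string($a['class']) && is_string($b['class'])
    ) {
        // Combine both class lists instead of overriding
        // (de-duplication is an assumption of this sketch).
        $classes = preg_split(
            '/\s+/',
            $a['class'] . ' ' . $b['class'],
            -1,
            PREG_SPLIT_NO_EMPTY
        );
        $out['class'] = implode(' ', array_unique($classes));
    }
    return $out;
}
```

For example, merging ['class' => 'a b', 'id' => 'x'] with ['class' => 'b c', 'id' => 'y'] gives ['class' => 'a b c', 'id' => 'y'].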

static Sanitizer::normalizeAttributeValue ( $ text ) [static, private]

Normalize whitespace and character references in an XML source-encoded text for an attribute value.

See http://www.w3.org/TR/REC-xml/#AVNormalize for background, but note that we're not returning the value, but are returning XML source fragments that will be slapped into output.

Parameters:

$text String

Returns:: String

Definition at line 1280 of file Sanitizer.php.

References normalizeCharReferences().

static Sanitizer::normalizeCharReferences ( $ text ) [static]

Ensure that any entities and character references are legal for XML and XHTML specifically.

Any stray bits will be &amp;-escaped to result in a valid text fragment.

a. named char refs can only be &lt; &gt; &amp; &quot;; others are numericized (this way we're well-formed even without a DTD)
b. any numeric char refs must be legal chars, not invalid or forbidden
c. use &#x, not &#X
d. fix or reject non-valid attributes

Parameters:

$text String

Returns:: String

Access:: private

Definition at line 1324 of file Sanitizer.php.

Referenced by CoreParserFunctions\displaytitle(), normalizeAttributeValue(), Parser\parse(), and OutputPage\setPageTitle().

static Sanitizer::normalizeCharReferencesCallback ( $ matches ) [static]

Parameters:

$matches String

Returns:: String

Definition at line 1334 of file Sanitizer.php.

References $matches, decCharReference(), hexCharReference(), and normalizeEntity().

static Sanitizer::normalizeCss ( $ value ) [static]

Normalize CSS into a format we can easily search for hostile input.

Decode character references
Decode escape sequences
Convert characters that IE6 interprets into ASCII
Remove comments, unless the entire value is one single comment

Parameters:

string $value the CSS string

Returns:: string normalized CSS

Definition at line 848 of file Sanitizer.php.

References decodeCharReferences(), and StringUtils\delimiterReplace().

Referenced by checkCss(), and UploadBase\checkSvgScriptCallback().

static Sanitizer::normalizeEntity ( $ name ) [static]

If the named entity is defined in the HTML 4.0/XHTML 1.0 DTD, return the equivalent numeric entity reference (except for the core &lt; &gt; &amp; &quot;).

If the entity is a MediaWiki-specific alias, returns the HTML equivalent. Otherwise, returns HTML-escaped text of pseudo-entity source (eg &amp;foo;)

Parameters:

$name String

Returns:: String

Definition at line 1360 of file Sanitizer.php.

Referenced by normalizeCharReferencesCallback().

static Sanitizer::normalizeSectionNameWhitespace ( $ section ) [static]

Normalizes whitespace in a section name, such as might be returned by Parser::stripSectionName(), for use in the ids that are used for section links.

Parameters:

$section String

Returns:: String

Definition at line 1305 of file Sanitizer.php.

References $section.

Referenced by Linker\formatAutocommentsCallback(), Parser\formatHeadings(), Parser\guessLegacySectionNameFromWikiText(), and Parser\guessSectionNameFromWikiText().
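
A minimal sketch of such a normalization, assuming that underscores count as whitespace (as they do in page titles) and that runs collapse to a single space. The function name is hypothetical:

```php
<?php
// Hypothetical sketch: collapse runs of spaces/underscores, then trim.
function normalizeSectionWhitespaceSketch(string $section): string {
    return trim(preg_replace('/[ _]+/', ' ', $section));
}
```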

static Sanitizer::normalizeWhitespace ( $ text ) [static, private]

Parameters:

$text string

Returns:: mixed

Definition at line 1290 of file Sanitizer.php.

static Sanitizer::removeHTMLcomments ( $ text ) [static]

Remove '<!--', '-->', and everything between.

To avoid leaving blank lines, when a comment is both preceded and followed by a newline (ignoring spaces), trim leading and trailing spaces and one of the newlines.

Access:: private

Parameters:

$text String

Returns:: string

Definition at line 580 of file Sanitizer.php.

References wfProfileIn(), and wfProfileOut().

Referenced by removeHTMLtags().
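
The blank-line rule above can be sketched with two regular expressions. This is a hypothetical illustration rather than the Sanitizer.php implementation; in particular, leaving unterminated comments untouched is an assumption of the sketch:

```php
<?php
// Hypothetical sketch: strip HTML comments without leaving blank lines.
function removeCommentsSketch(string $text): string {
    // A comment alone on its line (only spaces around it): remove it
    // together with the surrounding spaces and one of the two newlines.
    $text = preg_replace('/\n *<!--(?:(?!-->).)*--> *(?=\n)/s', '', $text);
    // Any other terminated comment is simply cut out.
    return preg_replace('/<!--(?:(?!-->).)*-->/s', '', $text);
}
```

The tempered pattern `(?:(?!-->).)*` keeps each match inside a single comment, so text between two comments on the same line is preserved.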

static Sanitizer::removeHTMLtags ( $ text, $ processCallback = null, $ args = array(), $ extratags = array(), $ removetags = array() ) [static]

Cleans up HTML, removes dangerous tags and attributes, and removes HTML comments.

Access:: private

Parameters:

$text	String
$processCallback	Callback to do any variable or parameter replacements in HTML attribute values
$args	Array for the processing callback
$extratags	Array for any extra tags to include
$removetags	Array for any tags (default or extra) to exclude

Returns:: string

Definition at line 366 of file Sanitizer.php.

References $t, $wgAllowImageTag, $wgUseTidy, fixTagAttributes(), removeHTMLcomments(), wfProfileIn(), wfProfileOut(), wfRestoreWarnings(), and wfSuppressWarnings().

Referenced by CoreParserFunctions\displaytitle(), Parser\internalParse(), OutputPage\setPageTitle(), SanitizerTest\testSelfClosingTag(), and Parser\testSrvus().

static Sanitizer::safeEncodeAttribute ( $ text ) [static]

Encode an attribute value for HTML tags, with extra armoring against further wiki processing.

Parameters:

$text String

Returns:: HTML-encoded text fragment

Definition at line 1069 of file Sanitizer.php.

References encodeAttribute(), and wfUrlProtocols().

Referenced by fixTagAttributes().

static Sanitizer::setupAttributeWhitelist ( ) [static]

For each array key (an allowed HTML element), return an array of allowed attributes.

Returns:: Array

Definition at line 1522 of file Sanitizer.php.

References $wgAllowImageTag, $wgAllowMicrodataAttributes, $wgAllowRdfaAttributes, and $wgHtml5.

Referenced by attributeWhitelist().

static Sanitizer::stripAllTags ( $ text ) [static]

Take a fragment of (potentially invalid) HTML and return a version with any tags removed, encoded as plain text.

Warning: this return value must be further escaped for literal inclusion in HTML output as of 1.10!

Parameters:

$text String: HTML fragment

Returns:: String

Definition at line 1699 of file Sanitizer.php.

Referenced by CoreParserFunctions\displaytitle(), OutputPage\setPageTitle(), and Parser\stripAltText().

static Sanitizer::validateAttributes ( $ attribs, $ whitelist ) [static]

Take an array of attribute names and values and normalize or discard illegal values for the given whitelist.

Discards attributes not on the given whitelist
Unsafe style attributes are discarded
Invalid id attributes are reencoded

Parameters:

$attribs	Array
$whitelist	Array: list of allowed attribute names

Returns:: Array

Todo:

Check for legal values where the DTD limits things.

Check for unique id attribute :P

Definition at line 746 of file Sanitizer.php.

References $out, $wgAllowMicrodataAttributes, $wgAllowRdfaAttributes, $wgHtml5, checkCss(), escapeId(), and wfUrlProtocols().

Referenced by validateTagAttributes().

static Sanitizer::validateCodepoint ( $ codepoint ) [static, private]

Returns true if a given Unicode codepoint is a valid character in XML.

Parameters:

$codepoint Integer

Returns:: Boolean

Definition at line 1404 of file Sanitizer.php.

Referenced by decCharReference(), decodeChar(), and hexCharReference().
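
For reference, the XML 1.0 specification defines a valid character as #x9, #xA, #xD, or a code point in the ranges #x20 to #xD7FF, #xE000 to #xFFFD, and #x10000 to #x10FFFF. A check along those lines might look like the sketch below; the function name is hypothetical and the actual method may be written differently:

```php
<?php
// Hypothetical sketch of the XML 1.0 "Char" production as a predicate.
function isValidXmlCodepoint(int $codepoint): bool {
    return $codepoint === 0x09              // tab
        || $codepoint === 0x0A              // line feed
        || $codepoint === 0x0D              // carriage return
        || ($codepoint >= 0x20 && $codepoint <= 0xD7FF)
        || ($codepoint >= 0xE000 && $codepoint <= 0xFFFD)   // skips surrogates
        || ($codepoint >= 0x10000 && $codepoint <= 0x10FFFF);
}
```

This rejects NUL, most C0 control characters, the surrogate range, and the noncharacters U+FFFE and U+FFFF.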

static Sanitizer::validateEmail ( $ addr ) [static]

Does a string look like an e-mail address?

This validates an email address using an HTML5 specification found at http://www.whatwg.org/specs/web-apps/current-work/multipage/states-of-the-type-attribute.html#valid-e-mail-address, which as of 2011-01-24 says:

A valid e-mail address is a string that matches the ABNF production 1*( atext / "." ) "@" ldh-str *( "." ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.

This function is an implementation of the specification as requested in bug 22449.

Client-side forms will use the same standard validation rules via JS or HTML 5 validation; additional restrictions can be enforced server-side by extensions via the 'isValidEmailAddr' hook.

Note that this validation doesn't 100% match RFC 2822, but is believed to be liberal enough for wide use. Some invalid addresses will still pass validation here.

Since:: 1.18

Parameters:

$addr String E-mail address

Returns:: Bool

Definition at line 1811 of file Sanitizer.php.

Referenced by LoginForm\addNewAccount(), LoginForm\addNewAccountInternal(), SpecialChangeEmail\attemptChange(), Autopromote\checkCondition(), SanitizerValidateEmailTest\checkEmail(), EmailConfirmation\execute(), User\isEmailConfirmed(), User\isValidEmailAddr(), SpecialPasswordReset\onSubmit(), and WebInstaller_Name\submit().
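
The ABNF production quoted above translates fairly directly into a regular expression. The sketch below is one possible reading, with a hypothetical function name; it does not run the 'isValidEmailAddr' hook and is not claimed to match the MediaWiki implementation:

```php
<?php
// Hypothetical sketch: 1*( atext / "." ) "@" ldh-str *( "." ldh-str )
function looksLikeEmail(string $addr): bool {
    // The non-alphanumeric atext characters from RFC 5322 section 3.2.3,
    // escaped for use inside a character class.
    $atextSymbols = preg_quote('!#$%&\'*+-/=?^_`{|}~', '/');
    $pattern = '/^[a-zA-Z0-9.' . $atextSymbols . ']+'   // 1*( atext / "." )
        . '@'
        . '[a-zA-Z0-9-]+'                                // ldh-str
        . '(?:\.[a-zA-Z0-9-]+)*$/D';                     // *( "." ldh-str )
    return preg_match($pattern, $addr) === 1;
}
```

As the documentation notes, this kind of check is deliberately liberal: it accepts some addresses that are not deliverable and only rejects strings that clearly cannot be addresses.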

static Sanitizer::validateTagAttributes ( $ attribs, $ element ) [static]

Take an array of attribute names and values and normalize or discard illegal values for the given element type.

Discards attributes not on a whitelist for the given element
Unsafe style attributes are discarded
Invalid id attributes are reencoded

Parameters:

$attribs	Array
$element	String

Returns:: Array

Todo:

Check for legal values where the DTD limits things.

Check for unique id attribute :P

Definition at line 726 of file Sanitizer.php.

References attributeWhitelist(), and validateAttributes().

Referenced by fixTagAttributes(), CoreTagHooks\pre(), and Parser\renderImageGallery().

Member Data Documentation

Sanitizer::$attribsRegex [static]

Lazy-initialised attributes regex, see getAttribsRegex()

Definition at line 326 of file Sanitizer.php.

Referenced by getAttribsRegex().

Sanitizer::$htmlEntities [static]

List of all named character entities defined in HTML 4.01 (http://www.w3.org/TR/html4/sgml/entities.html), as well as &apos;, which is only defined starting in XHTML1.

Access:: private

Definition at line 59 of file Sanitizer.php.

Sanitizer::$htmlEntityAliases [static]

Initial value:

 array(
                'רלמ' => 'rlm',
                'رلم' => 'rlm',
        )

Character entity aliases accepted by MediaWiki.

Definition at line 318 of file Sanitizer.php.

const Sanitizer::CHAR_REFS_REGEX

Initial value:

                '/&([A-Za-z0-9\x80-\xff]+);
                 |&\#([0-9]+);
                 |&\#[xX]([0-9A-Fa-f]+);
                 |(&)/x'

Regular expression to match various types of character references in Sanitizer::normalizeCharReferences and Sanitizer::decodeCharReferences.

Definition at line 36 of file Sanitizer.php.
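
To make the four alternatives concrete, the sketch below copies the pattern shown above and reports which capture group fired for each match: group 1 for named entities, group 2 for decimal references, group 3 for hexadecimal references, and group 4 for a stray ampersand. The constant and function names are hypothetical:

```php
<?php
// The pattern as documented; the /x flag allows it to span several lines.
const CHAR_REFS_REGEX_SKETCH =
    '/&([A-Za-z0-9\x80-\xff]+);
     |&\#([0-9]+);
     |&\#[xX]([0-9A-Fa-f]+);
     |(&)/x';

// Hypothetical helper: label every character reference found in $text.
function classifyCharRefs(string $text): array {
    preg_match_all(CHAR_REFS_REGEX_SKETCH, $text, $sets, PREG_SET_ORDER);
    $kinds = [];
    foreach ($sets as $m) {
        // Unmatched groups are empty strings or missing from the match set.
        if (($m[1] ?? '') !== '') {
            $kinds[] = 'named:' . $m[1];
        } elseif (($m[2] ?? '') !== '') {
            $kinds[] = 'decimal:' . $m[2];
        } elseif (($m[3] ?? '') !== '') {
            $kinds[] = 'hex:' . $m[3];
        } else {
            $kinds[] = 'stray-ampersand';
        }
    }
    return $kinds;
}
```

A callback passed to preg_replace_callback() can branch on the same groups to decode or normalize each reference.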

const Sanitizer::EVIL_URI_PATTERN = '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i'

Blacklist for evil URIs like "javascript:". WARNING: DO NOT use this in any place that actually requires blacklisting for security reasons.

There are NUMEROUS[1] ways to bypass blacklisting; the only way to be secure from javascript: URI-based XSS vectors is to whitelist things that you know are safe and deny everything else. [1]: http://ha.ckers.org/xss.html

Definition at line 50 of file Sanitizer.php.
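
The difference between the two approaches can be sketched as follows. The pattern is copied from the constant above; the whitelist function, its name, and its default list of schemes are hypothetical and only illustrate the "allow what you know is safe" idea:

```php
<?php
// The blacklist pattern as documented above.
const EVIL_URI_PATTERN_SKETCH = '!(^|\s|\*/\s*)(javascript|vbscript)([^\w]|$)!i';

// Hypothetical whitelist check: accept only URLs whose scheme is known.
function hasAllowedScheme(
    string $url,
    array $allowed = ['http', 'https', 'ftp', 'mailto']
): bool {
    // A scheme is a letter followed by letters, digits, '+', '.' or '-'.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $url, $m)) {
        return false; // no recognisable scheme at all
    }
    return in_array(strtolower($m[1]), $allowed, true);
}
```

An obfuscated value such as 'java&#115;cript:alert(1)' slips past the blacklist pattern because the literal text never spells "javascript", yet the whitelist check still rejects it because it has no allowed scheme.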

const Sanitizer::XMLNS_ATTRIBUTE_PATTERN = "/^xmlns:[:A-Z_a-z-.0-9]+$/"

Definition at line 51 of file Sanitizer.php.

The documentation for this class was generated from the following file:

includes/Sanitizer.php
