MediaWiki
master
Class for reading XMP data containing properties relevant to images and producing an array that FormatMetadata accepts.
Public Member Functions
__construct (LoggerInterface $logger=null)
Constructor.
char ($parser, $data)
Character data handler. Called whenever character data is found in the XMP document.
endElement ($parser, $elm)
Handler for hitting a closing element.
getResults ()
Get the result array.
parse ($content, $allOfIt=true)
Main function to call to parse XMP.
parseExtended ($content)
Entry point for XMPExtended blocks in JPEG files.
setLogger (LoggerInterface $logger)
startElement ($parser, $elm, $attribs)
Hits an opening element.
Static Public Member Functions
static isSupported ()
Check if this instance supports using this class.
Public Attributes
const MODE_ALT = 15
const MODE_BAG = 13
const MODE_BAGSTRUCT = 16
const MODE_IGNORE = 1
const MODE_INITIAL = 0
These are various mode constants.
const MODE_LANG = 14
const MODE_LI = 2
const MODE_LI_LANG = 3
const MODE_QDESC = 4
const MODE_SEQ = 12
const MODE_SIMPLE = 10
const MODE_STRUCT = 11
const NS_RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
const NS_XML = 'http://www.w3.org/XML/1998/namespace'
const PARSABLE_BUFFERING = 2
const PARSABLE_NO = 3
const PARSABLE_OK = 1
const PARSABLE_UNKNOWN = 0
Protected Attributes
array $items
XMP item configuration array.
Private Member Functions
checkParseSafety ($content)
Check if a block of XML is safe to pass to xml_parse.
destroyXMLParser ()
Free the XML parser.
doAttribs ($attribs)
Process attributes.
endElementModeIgnore ($elm)
Called when we hit a closing element in MODE_IGNORE. Checks whether this is the element we started to ignore, in which case we leave MODE_IGNORE.
endElementModeLi ($elm)
Hit a closing element in MODE_LI (either rdf:Seq or rdf:Bag). Adds information about what type of element this is.
endElementModeQDesc ($elm)
End element while in MODE_QDESC. Mostly used when ending an element that has a simple value with qualifiers.
endElementModeSimple ($elm)
Hit a closing element when in MODE_SIMPLE.
endElementNested ($elm)
Hit a closing element in MODE_STRUCT, MODE_SEQ, or MODE_BAG. This generally means we've finished processing a nested structure.
resetXMLParser ()
Main use is if a single item has multiple XMP documents describing it.
saveValue ($ns, $tag, $val)
Given an extracted value, save it to the results array.
startElementModeBag ($elm)
Start element in MODE_BAG (unordered array). This should always be <rdf:Bag>.
startElementModeIgnore ($elm)
Hit an opening element while in MODE_IGNORE.
startElementModeInitial ($ns, $tag, $attribs)
Starting an element when in MODE_INITIAL. This usually happens when we hit an element inside the outer rdf:Description.
startElementModeLang ($elm)
Start element in MODE_LANG (language alternative). This should always be <rdf:Alt>.
startElementModeLi ($elm, $attribs)
Opening element in MODE_LI. Processes elements of arrays.
startElementModeLiLang ($elm, $attribs)
Opening element in MODE_LI_LANG.
startElementModeQDesc ($elm)
Start an element when in MODE_QDESC.
startElementModeSeq ($elm)
Start element in MODE_SEQ (ordered array). This should always be <rdf:Seq>.
startElementModeSimple ($elm, $attribs)
Handle an opening element when in MODE_SIMPLE.
startElementModeStruct ($ns, $tag, $attribs)
Hit an opening element when in a struct (MODE_STRUCT). This is generally for fields of a compound property.
Private Attributes
bool|string $ancestorStruct = false
The structure name when processing nested structures.
bool|string $charContent = false
Temporary holder for character data that appears in the XMP document.
bool|string $charset = false
Character set like 'UTF-8'.
array $curItem = []
Array to hold the current element (and previous element, and so on).
int $extendedXMPOffset = 0
bool|string $itemLang = false
Used for lang alts only.
LoggerInterface $logger
array $mode = []
Stores the state the XMPReader is in (see MODE_FOO constants).
int $parsable = 0
Flag determining if the XMP is safe to parse.
bool $processingArray = false
If we're doing a seq or bag.
array $results = []
Array to hold results.
string $xmlParsableBuffer = ''
Buffer of XML to parse.
resource $xmlParser
A resource handle for the XML parser.
Class for reading XMP data containing properties relevant to images and producing an array that FormatMetadata accepts.
Note that this is not meant to recognize every possible thing you can encode in XMP. It should recognize all the properties we want. For example, it doesn't support structures with multiple nesting levels, as none of the properties we support use that feature. If it comes across properties it doesn't recognize, it should ignore them.
The public methods one would call in this class are parse(), parseExtended(), and getResults().
Note that XMP looks somewhat like RDF, but they are not the same thing: XMP is encoded as a specific subset of RDF. This class can read XMP. It cannot read arbitrary RDF.
XMPReader::__construct | ( | LoggerInterface | $logger = null | ) |
Constructor.
Its primary job is to initialize the XML parser.
Definition at line 137 of file XMP.php.
References $logger, XMPInfo\getItems(), resetXMLParser(), and setLogger().
XMPReader::char | ( | $parser, | |
$data | |||
) |
Character data handler. Called whenever character data is found in the XMP document.
Does nothing if we're in MODE_IGNORE or if the data is whitespace. Throws an error if we're not in MODE_SIMPLE (character data is not allowed in the other modes).
As an example, this happens when we encounter XMP like: <exif:DigitalZoomRatio>0/10</exif:DigitalZoomRatio> and are processing the 0/10 bit.
XMLParser | $parser | XMLParser reference to the xml parser |
string | $data | Character data |
RuntimeException | On invalid data |
Definition at line 498 of file XMP.php.
Referenced by doAttribs().
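The handler signatures above match PHP's namespace-aware SAX parser. As a minimal sketch (not XMPReader's actual code), the following shows how start, end, and character-data callbacks fire for the DigitalZoomRatio example. The helper name collectXmpEvents() is hypothetical, and using a space as the namespace separator is an assumption inferred from the "Namespace . ' ' . element name" parameter descriptions on this page.

```php
<?php
// Sketch only: records SAX events from PHP's namespace-aware XML parser.
// collectXmpEvents() is a hypothetical helper, not part of XMPReader.
function collectXmpEvents( string $xml ): array {
	$events = [];
	// With ' ' as separator, element names arrive as "<namespace URI> <local name>".
	$parser = xml_parser_create_ns( 'UTF-8', ' ' );
	// Case folding would upper-case names (including the URI), so turn it off.
	xml_parser_set_option( $parser, XML_OPTION_CASE_FOLDING, 0 );
	xml_set_element_handler(
		$parser,
		function ( $p, $elm, $attribs ) use ( &$events ) {
			$events[] = [ 'start', $elm ];
		},
		function ( $p, $elm ) use ( &$events ) {
			$events[] = [ 'end', $elm ];
		}
	);
	xml_set_character_data_handler( $parser, function ( $p, $data ) use ( &$events ) {
		if ( trim( $data ) === '' ) {
			return; // whitespace-only data is skipped
		}
		// The parser may deliver text in several chunks; merge adjacent ones.
		$last = count( $events ) - 1;
		if ( $last >= 0 && $events[$last][0] === 'char' ) {
			$events[$last][1] .= $data;
		} else {
			$events[] = [ 'char', $data ];
		}
	} );
	if ( !xml_parse( $parser, $xml, true ) ) {
		throw new RuntimeException( xml_error_string( xml_get_error_code( $parser ) ) );
	}
	return $events;
}
```

Feeding it a document that contains <exif:DigitalZoomRatio>0/10</exif:DigitalZoomRatio> yields a start event, a char event carrying "0/10", and an end event for that element, in that order.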
XMPReader::checkParseSafety | ( | $content | ) |
private |
Check if a block of XML is safe to pass to xml_parse, i.e. that it doesn't contain a doctype declaration, which could carry a DoS attack if we parsed it and expanded internal entities (T85848).
string | $content | xml string to check for parse safety |
Definition at line 535 of file XMP.php.
References $content.
Referenced by parse().
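One way to perform such a check is to walk the document with XMLReader and stop at the first element, rejecting the input if a doctype node appears first. The sketch below illustrates that idea only; hasDoctypeBeforeRoot() is a hypothetical name, and unlike the method documented here it does not deal with content that arrives in partial chunks (the PARSABLE_BUFFERING case).

```php
<?php
// Sketch only: hasDoctypeBeforeRoot() is a hypothetical helper, not XMPReader's method.
function hasDoctypeBeforeRoot( string $xml ): bool {
	if ( $xml === '' ) {
		return false; // nothing to inspect
	}
	$reader = new XMLReader();
	$reader->XML( $xml );
	// Walk nodes until the root element is reached. read() emits warnings on
	// malformed input, so they are silenced here.
	while ( @$reader->read() ) {
		if ( $reader->nodeType === XMLReader::DOC_TYPE ) {
			$reader->close();
			return true; // a doctype precedes the root element
		}
		if ( $reader->nodeType === XMLReader::ELEMENT ) {
			break; // reached the root element without seeing a doctype
		}
	}
	$reader->close();
	return false;
}
```

A caller would refuse to hand the content to xml_parse whenever this returns true.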
XMPReader::destroyXMLParser | ( | ) |
private |
Free the XML parser.
Definition at line 165 of file XMP.php.
Referenced by parse(), and resetXMLParser().
XMPReader::doAttribs | ( | $attribs | ) |
private |
Process attributes.
Simple values can be stored as either a tag or an attribute.
Often the initial "<rdf:Description>" tag just has all the simple properties as attributes.
array | $attribs | Array attribute=>value |
RuntimeException |
Definition at line 1289 of file XMP.php.
References $attribs, $name, $tag, as, char(), list, and saveValue().
Referenced by startElementModeInitial(), startElementModeLi(), and startElementModeStruct().
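To illustrate the attribute form, the sketch below turns an attribute array keyed by "namespace tag" into namespace/tag/value triples. It is a simplified assumption of what such processing could look like: attribsToProperties() is a hypothetical helper, and skipping attributes without a namespace is a choice made for this example. The method documented here instead lists char() and saveValue() among its references.

```php
<?php
// Sketch only: attribsToProperties() is a hypothetical helper, not XMPReader::doAttribs().
function attribsToProperties( array $attribs ): array {
	$properties = [];
	foreach ( $attribs as $name => $value ) {
		// Namespace-qualified names look like "<namespace URI> <local name>".
		if ( strpos( $name, ' ' ) === false ) {
			continue; // assumption: unqualified attributes are not properties
		}
		[ $ns, $tag ] = explode( ' ', $name, 2 );
		$properties[$ns][$tag] = $value;
	}
	return $properties;
}
```

For <rdf:Description exif:DigitalZoomRatio="0/10">, this yields the same namespace, tag, and value that the element form <exif:DigitalZoomRatio>0/10</exif:DigitalZoomRatio> would produce.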
XMPReader::endElement | ( | $parser, | |
$elm | |||
) |
Handler for hitting a closing element.
Generally just calls a helper function depending on what mode we're in.
Ignores the outer wrapping elements that are optional in XMP and have no meaning.
XMLParser | $parser | |
string | $elm | Namespace . ' ' . element name |
RuntimeException |
Definition at line 783 of file XMP.php.
References endElementModeIgnore(), endElementModeLi(), endElementModeQDesc(), endElementModeSimple(), and endElementNested().
XMPReader::endElementModeIgnore | ( | $elm | ) |
private |
Called when we hit a closing element in MODE_IGNORE. Checks whether this is the element we started to ignore, in which case we leave MODE_IGNORE.
string | $elm | Namespace of element followed by a space and then tag name of element. |
Definition at line 590 of file XMP.php.
Referenced by endElement().
XMPReader::endElementModeLi | ( | $elm | ) |
private |
Hit a closing element in MODE_LI (either rdf:Seq or rdf:Bag). Adds information about what type of element this is.
Note we still have to hit the outer "</property>".
This method is called when we hit the "</rdf:Seq>". (For comparison, we call endElementModeSimple when we hit the "</rdf:li>".)
string | $elm | Namespace . ' ' . element name |
RuntimeException |
Definition at line 717 of file XMP.php.
Referenced by endElement().
XMPReader::endElementModeQDesc | ( | $elm | ) |
private |
End element while in MODE_QDESC. Mostly used when ending an element that has a simple value with qualifiers.
Qualifiers aren't all that common, and we don't do anything with them.
string | $elm | Namespace and element |
Definition at line 757 of file XMP.php.
References $tag, list, and saveValue().
Referenced by endElement().
XMPReader::endElementModeSimple | ( | $elm | ) |
private |
Hit a closing element when in MODE_SIMPLE.
This generally means that we have finished processing a property value and now have to save the result to the results array.
For example, when processing <exif:DigitalZoomRatio>0/10</exif:DigitalZoomRatio>, this handles the point where we hit </exif:DigitalZoomRatio>.
It can also be the end element of a property of a compound data structure (like a member of an array).
string | $elm | Namespace, space, and tag name. |
Definition at line 612 of file XMP.php.
References $tag, list, and saveValue().
Referenced by endElement().
XMPReader::endElementNested | ( | $elm | ) |
private |
Hit a closing element in MODE_STRUCT, MODE_SEQ, or MODE_BAG. This generally means we've finished processing a nested structure.
Resets some internal variables to indicate that.
Note this means we hit the closing element of the property, not the "</rdf:Seq>".
For example, this method is called when we hit the "</exif:ISOSpeedRatings>" tag.
string | $elm | Namespace . space . tag name. |
RuntimeException |
Definition at line 647 of file XMP.php.
Referenced by endElement().
XMPReader::getResults | ( | ) |
Get the result array.
XMPReader::isSupported | ( | ) |
static |
Check if this instance supports using this class.
Definition at line 197 of file XMP.php.
Referenced by BitmapMetadataHandler\GIF(), BitmapMetadataHandler\Jpeg(), BitmapMetadataHandler\PNG(), and JpegMetadataExtractor\segmentSplitter().
XMPReader::parse | ( | $content, | |
$allOfIt = true | |||
) |
Main function to call to parse XMP.
Use getResults to get results.
Also catches any errors during processing, writes them to debug log, blanks result array and returns false.
string | $content | XMP data |
bool | $allOfIt | If this is all the data (true) or if it's split up (false). Default true |
RuntimeException |
Definition at line 302 of file XMP.php.
References $code, $content, $e, $line, checkParseSafety(), destroyXMLParser(), and resetXMLParser().
Referenced by parseExtended(), and XMPTest\testXMPParse().
XMPReader::parseExtended | ( | $content | ) |
Entry point for XMPExtended blocks in jpeg files.
string | $content | XMPExtended block minus the namespace signature |
Definition at line 418 of file XMP.php.
References $content, parse(), and resetXMLParser().
XMPReader::resetXMLParser | ( | ) |
private |
Main use is if a single item has multiple XMP documents describing it.
For example, in JPEGs with extended XMP.
Definition at line 176 of file XMP.php.
References destroyXMLParser().
Referenced by __construct(), parse(), and parseExtended().
XMPReader::saveValue | ( | $ns, $tag, $val | ) |
private |
Given an extracted value, save it to the results array.
Note that this also uses $this->ancestorStruct and $this->processingArray (in addition to $tag) to determine what name to save the value under.
string | $ns | Namespace of tag this is for |
string | $tag | Tag name |
string | $val | Value to save |
Definition at line 1338 of file XMP.php.
References $ancestorStruct, $itemLang, and $tag.
Referenced by doAttribs(), endElementModeQDesc(), endElementModeSimple(), and startElementModeSimple().
XMPReader::setLogger | ( | LoggerInterface | $logger | ) |
XMPReader::startElement | ( | $parser, | |
$elm, | |||
$attribs | |||
) |
Hits an opening element.
Generally just calls a helper based on what mode we're in. Also does some initial setup for the wrapper element.
XMLParser | $parser | |
string | $elm | Namespace "<space>" element |
array | $attribs | Attribute name => value |
RuntimeException |
Definition at line 1197 of file XMP.php.
References $attribs, $tag, list, startElementModeBag(), startElementModeIgnore(), startElementModeInitial(), startElementModeLang(), startElementModeLi(), startElementModeLiLang(), startElementModeQDesc(), startElementModeSeq(), startElementModeSimple(), and startElementModeStruct().
XMPReader::startElementModeBag | ( | $elm | ) |
private |
Start element in MODE_BAG (unordered array). This should always be <rdf:Bag>.
string | $elm | Namespace . ' ' . tag |
RuntimeException | If we have an element that's not <rdf:Bag> |
Definition at line 879 of file XMP.php.
Referenced by startElement().
XMPReader::startElementModeIgnore | ( | $elm | ) |
private |
Hit an opening element while in MODE_IGNORE.
XMP is extensible, so ignore any tag we don't understand.
Mostly ignores the element, unless it is the same element as the one we are ignoring, in which case we add it to the item stack so that nested occurrences are ignored correctly.
string | $elm | Namespace . ' ' . tag name |
Definition at line 865 of file XMP.php.
Referenced by startElement().
XMPReader::startElementModeInitial | ( | $ns, $tag, $attribs | ) |
private |
Starting an element when in MODE_INITIAL. This usually happens when we hit an element inside the outer rdf:Description.
This is generally where most properties start.
string | $ns | Namespace |
string | $tag | Tag name (without namespace prefix) |
array | $attribs | Array of attributes |
RuntimeException |
Definition at line 1006 of file XMP.php.
References $attribs, $mode, $tag, and doAttribs().
Referenced by startElement().
XMPReader::startElementModeLang | ( | $elm | ) |
private |
Start element in MODE_LANG (language alternative). This should always be <rdf:Alt>.
This tag tends to be used for metadata such as a description of the picture, which can be translated into multiple languages.
XMP also supports non-linguistic alternative selections, which are really only used for thumbnails, which we don't care about.
string | $elm | Namespace . ' ' . tag |
RuntimeException | If we have an element that's not <rdf:Alt> |
Definition at line 921 of file XMP.php.
Referenced by startElement().
XMPReader::startElementModeLi | ( | $elm, $attribs | ) |
private |
Opening element in MODE_LI. Processes elements of arrays.
Example: <exif:ISOSpeedRatings> <rdf:Seq> <rdf:li>64</rdf:li> </rdf:Seq> </exif:ISOSpeedRatings>
This method is called when we hit the <rdf:li> element.
string | $elm | Namespace . ' ' . tagname |
array | $attribs | Attributes. (needed for BAGSTRUCTS) |
RuntimeException | If gets a tag other than <rdf:li> |
Definition at line 1116 of file XMP.php.
References $attribs, doAttribs(), and list.
Referenced by startElement().
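The interplay between the mode-specific handlers can be illustrated with a much-simplified mode stack. The sketch below handles only a single property containing an rdf:Seq. The function parseSeqProperty(), the string mode names, and the result layout are all hypothetical and chosen for this example; the class documented here uses the MODE_* integer constants and more state.

```php
<?php
// Sketch only: a simplified mode stack for one property holding an rdf:Seq.
const NS_RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

function parseSeqProperty( string $xml ): array {
	$mode = [ 'initial' ]; // innermost mode is kept at index 0
	$results = [];
	$current = null;       // "namespace tag" of the property being filled
	$text = '';
	$parser = xml_parser_create_ns( 'UTF-8', ' ' );
	xml_parser_set_option( $parser, XML_OPTION_CASE_FOLDING, 0 );
	xml_set_element_handler(
		$parser,
		function ( $p, $elm ) use ( &$mode, &$results, &$current, &$text ) {
			switch ( $mode[0] ) {
				case 'initial': // property element; an <rdf:Seq> must follow
					$current = $elm;
					$results[$elm] = [];
					array_unshift( $mode, 'seq' );
					break;
				case 'seq':
					if ( $elm !== NS_RDF . ' Seq' ) {
						throw new RuntimeException( "Expected rdf:Seq, got $elm" );
					}
					array_unshift( $mode, 'li' );
					break;
				case 'li':
					if ( $elm !== NS_RDF . ' li' ) {
						throw new RuntimeException( "Expected rdf:li, got $elm" );
					}
					$text = '';
					array_unshift( $mode, 'simple' );
					break;
				default:
					throw new RuntimeException( "Unexpected element $elm" );
			}
		},
		function ( $p, $elm ) use ( &$mode, &$results, &$current, &$text ) {
			if ( $mode[0] === 'simple' ) {
				$results[$current][] = $text; // </rdf:li>: save the item value
			}
			// </rdf:li>, </rdf:Seq> and the closing property tag each pop one mode.
			array_shift( $mode );
		}
	);
	xml_set_character_data_handler( $parser, function ( $p, $data ) use ( &$mode, &$text ) {
		if ( $mode[0] === 'simple' ) {
			$text .= $data;
		} elseif ( trim( $data ) !== '' ) {
			throw new RuntimeException( 'Character data outside a simple value' );
		}
	} );
	if ( !xml_parse( $parser, $xml, true ) ) {
		throw new RuntimeException( xml_error_string( xml_get_error_code( $parser ) ) );
	}
	return $results;
}
```

Each opening element pushes the mode expected for its children, and each closing element pops it, which is why the closing </rdf:li>, </rdf:Seq>, and property tags reach different handlers.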
XMPReader::startElementModeLiLang | ( | $elm, $attribs | ) |
private |
Opening element in MODE_LI_LANG.
Processes elements of language alternatives.
Example: <dc:title> <rdf:Alt> <rdf:li xml:lang="x-default">My house </rdf:li> </rdf:Alt> </dc:title>
This method is called when we hit the <rdf:li> element.
string | $elm | Namespace . ' ' . tag |
array | $attribs | Array of elements (most importantly xml:lang) |
RuntimeException | If gets a tag other than <rdf:li> or if no xml:lang |
Definition at line 1166 of file XMP.php.
References $attribs.
Referenced by startElement().
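With a namespace-aware parser, the xml:lang attribute arrives under a key built from the XML namespace URI (the NS_XML value listed on this page), a space, and "lang". The following sketch collects language alternatives keyed by language code. collectLangAlternatives() is a hypothetical helper that assumes a space separator, and it ignores every element other than rdf:li.

```php
<?php
// Sketch only: collectLangAlternatives() is a hypothetical helper.
const NS_RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const NS_XML = 'http://www.w3.org/XML/1998/namespace';

function collectLangAlternatives( string $xml ): array {
	$values = [];
	$lang = null; // language of the <rdf:li> currently open, if any
	$parser = xml_parser_create_ns( 'UTF-8', ' ' );
	xml_parser_set_option( $parser, XML_OPTION_CASE_FOLDING, 0 );
	xml_set_element_handler(
		$parser,
		function ( $p, $elm, $attribs ) use ( &$values, &$lang ) {
			if ( $elm !== NS_RDF . ' li' ) {
				return;
			}
			$key = NS_XML . ' lang';
			if ( !isset( $attribs[$key] ) ) {
				throw new RuntimeException( 'rdf:li without xml:lang' );
			}
			$lang = $attribs[$key];
			$values[$lang] = '';
		},
		function ( $p, $elm ) use ( &$lang ) {
			if ( $elm === NS_RDF . ' li' ) {
				$lang = null;
			}
		}
	);
	xml_set_character_data_handler( $parser, function ( $p, $data ) use ( &$values, &$lang ) {
		if ( $lang !== null ) {
			$values[$lang] .= $data;
		}
	} );
	if ( !xml_parse( $parser, $xml, true ) ) {
		throw new RuntimeException( xml_error_string( xml_get_error_code( $parser ) ) );
	}
	return $values;
}
```

For the dc:title example above, the result has a single entry under the key "x-default".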
XMPReader::startElementModeQDesc | ( | $elm | ) |
private |
Start an element when in MODE_QDESC.
This generally happens when a simple element has an inner rdf:Description to hold qualifier elements.
For example, in: <exif:DigitalZoomRatio><rdf:Description><rdf:value>0/10</rdf:value> <foo:someQualifier>Bar</foo:someQualifier> </rdf:Description> </exif:DigitalZoomRatio>
this method is called when processing the <rdf:value> or <foo:someQualifier> element.
string | $elm | Namespace and tag name separated by a space. |
Definition at line 984 of file XMP.php.
Referenced by startElement().
XMPReader::startElementModeSeq | ( | $elm | ) |
private |
Start element in MODE_SEQ (ordered array). This should always be <rdf:Seq>.
string | $elm | Namespace . ' ' . tag |
RuntimeException | If we have an element that's not <rdf:Seq> |
Definition at line 894 of file XMP.php.
Referenced by startElement().
XMPReader::startElementModeSimple | ( | $elm, $attribs | ) |
private |
Handle an opening element when in MODE_SIMPLE.
This should not happen often. It covers the case where a simple element that is already open has a child element, which can happen for a qualified element.
For example: <exif:DigitalZoomRatio><rdf:Description><rdf:value>0/10</rdf:value> <foo:someQualifier>Bar</foo:someQualifier> </rdf:Description> </exif:DigitalZoomRatio>
This method is called when processing the <rdf:Description> element.
string | $elm | Namespace and tag names separated by space. |
array | $attribs | Attributes of the element. |
RuntimeException |
Definition at line 947 of file XMP.php.
References $attribs, $tag, list, and saveValue().
Referenced by startElement().
XMPReader::startElementModeStruct | ( | $ns, $tag, $attribs | ) |
private |
Hit an opening element when in a struct (MODE_STRUCT). This is generally for fields of a compound property.
Example of a struct (abbreviated; Flash has more properties):
<exif:Flash> <rdf:Description> <exif:Fired>True</exif:Fired> <exif:Mode>1</exif:Mode></rdf:Description></exif:Flash>
or:
<exif:Flash rdf:parseType='Resource'> <exif:Fired>True</exif:Fired> <exif:Mode>1</exif:Mode></exif:Flash>
string | $ns | Namespace |
string | $tag | Tag name (no ns) |
array | $attribs | Array of attribs w/ values. |
RuntimeException |
Definition at line 1068 of file XMP.php.
References $attribs, $tag, and doAttribs().
Referenced by startElement().
bool|string XMPReader::$ancestorStruct = false |
private |
The structure name when processing nested structures.
Definition at line 61 of file XMP.php.
Referenced by saveValue().
LoggerInterface XMPReader::$logger |
private |
Definition at line 130 of file XMP.php.
Referenced by __construct(), and setLogger().
array XMPReader::$mode = [] |
private |
Stores the state the XMPReader is in (see MODE_FOO constants).
Definition at line 67 of file XMP.php.
Referenced by startElementModeInitial().
const XMPReader::MODE_INITIAL = 0 |
These are various mode constants.
They are used to figure out what to do with an element when it's encountered.
For example, MODE_IGNORE is used when processing a property we're not interested in. So if a new element pops up when we're in that mode, we ignore it.
const XMPReader::NS_RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#' |
const XMPReader::NS_XML = 'http://www.w3.org/XML/1998/namespace' |