Another way of dealing with the semi-structured data is to query for one of
a number of possibilities. This section covers UNION
patterns, where one of a number of
possibilities is tried.
Both the vCard vocabulary and the FOAF vocabulary have properties for people's names. In vCard, it is vCard:FN, the "formatted name", and in FOAF, it is foaf:name. In this section, we will look at a small set of data where the names of people can be given by either the FOAF or the vCard vocabulary.
Suppose we have an RDF graph that contains name information using both the vCard and FOAF vocabularies.
@prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> . _:a foaf:name "Matt Jones" . _:b foaf:name "Sarah Jones" . _:c vcard:FN "Becky Smith" . _:d vcard:FN "John Smith" .
A query to access the name information, when it can be in either form, could be (q-union1.rq):
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?name WHERE { { [] foaf:name ?name } UNION { [] vCard:FN ?name } }
This returns the results:
----------------- | name | ================= | "Matt Jones" | | "Sarah Jones" | | "Becky Smith" | | "John Smith" | -----------------
It didn't matter which form of expression was used for the name, the ?name
variable is set. This can be achieved using a FILTER
as this query (q-union-1alt.rq)
shows:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?name WHERE { [] ?p ?name FILTER ( ?p = foaf:name || ?p = vCard:FN ) }
testing whether the property is one URI or another. The solutions may
not come out in the same order. The first form is more likely to be
faster, depending on the data and the storage used, because the second form may have to get
all the triples from the graph to match the triple pattern with unbound
variables (or blank nodes) in each slot, then test each ?p
to
see it it matches one of the values. It will depend on the sophistication of
the query optimizer as to whether it spots that it can perform the query
more efficiently and is able to pass the constraint down as will as to the storage
layer.
The example above used the same variable in each branch. If different variables are used, the application can discover which sub-pattern caused the match (q-union2.rq):
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?name1 ?name2 WHERE { { [] foaf:name ?name1 } UNION { [] vCard:FN ?name2 } }
--------------------------------- | name1 | name2 | ================================= | "Matt Jones" | | | "Sarah Jones" | | | | "Becky Smith" | | | "John Smith" | ---------------------------------
This second query has retained information of where the name of the person came from by assigning the name to different variables.
In practice, OPTIONAL
is more common than UNION
but they both have their uses. OPTIONAL
are useful for augmenting the
solutions found, UNION
is useful for concatenating the solutions from two
possibilities. They don't necessary return the information in the same way:
Query(q-union3.rq):
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX vCard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?name1 ?name2 WHERE { ?x a foaf:Person OPTIONAL { ?x foaf:name ?name1 } OPTIONAL { ?x vCard:FN ?name2 } }
--------------------------------- | name1 | name2 | ================================= | "Matt Jones" | | | "Sarah Jones" | | | | "Becky Smith" | | | "John Smith" | ---------------------------------
but beware of using ?name
in each OPTIONAL
because that is an order-dependent query.