About XQuery

Timothy J. Finney


Table of Contents

1. Introduction
2. History
3. Characteristics
4. Omissions
5. Algebra
6. Examples
7. Toolkit
8. Books
Reference List

XQuery is to XML as SQL is to relational databases.

Unknown

1. Introduction

XQuery 1.0: An XML Query Language became a W3C Candidate Recommendation on 3 November 2005. The specification's abstract says,

XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.

2. History

Charles Goldfarb invented SGML (1974). This markup language had its beginnings in IBM's GML (1960s). Tim Berners-Lee used SGML to make HTML (1993). SGML is also the basis of the XML specification (1998) edited by Tim Bray, Jean Paoli and Michael Sperberg-McQueen. Michael Sperberg-McQueen is also one of the editors of the original TEI specification (1994).

XML has now become a popular medium for data storage and exchange. A number of other specifications have been produced to make XML easier to work with. One of these is XQuery, which grew out of a conference held in Boston in 1998 to discuss a query language for XML.

3. Characteristics

This section borrows heavily from the Wikipedia entries for XQuery and Functional Programming:

Data model

XML is a tree-like data structure with seven kinds of nodes (document, element, attribute, text, comment, processing instruction, namespace). The XQuery data model has the same seven node kinds, together with 50 atomic types. The atomic types may be arranged under the headings of untyped, Boolean, numeric, string, calendar, qualified name, and other types.
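The split between nodes and atomic values can be probed with the instance of operator. The following is only a sketch: the inline book element is made up for illustration, and each test is expected to be true.

```xquery
xquery version "1.0";

(: Illustrative data only: a single made-up book element. :)
let $book := <book year="1994"><title>TCP/IP Illustrated</title></book>
return (
  $book instance of element(),               (: an element node :)
  $book/@year instance of attribute(),       (: an attribute node :)
  $book/title/text() instance of text(),     (: a text node :)
  42 instance of xs:integer,                 (: an atomic value: numeric :)
  "abc" instance of xs:string,               (: an atomic value: string :)
  xs:date("2005-11-03") instance of xs:date  (: an atomic value: calendar :)
)
```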

XPath navigation

Documents are interrogated using XPath expressions. (See the Wikipedia entry for XPath.)
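A few path expressions give the flavour. This is a minimal sketch that assumes a cut-down bibliography declared inline (the variable name $bib and its contents are illustrative); a real query would more likely start from doc().

```xquery
xquery version "1.0";

(: Illustrative data only: a cut-down bibliography declared inline. :)
let $bib :=
  <bib>
    <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
    <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
  </bib>
return (
  $bib/book/title,                (: child steps: every title of every book :)
  $bib/book[@year > 1995]/title,  (: a predicate filters on the year attribute :)
  $bib//price[. < 50],            (: descendant step with a value test :)
  count($bib/book)                (: a path can be passed to a function :)
)
```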

Constructs XML

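It is very easy to construct XML using XQuery. An element constructor is written as literal XML, and any expression enclosed in curly braces is evaluated and its result placed in the output.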
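As a rough illustration, the query below builds the same element twice, once with a direct constructor and once with a computed constructor. The variable names and values are hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical values chosen for illustration. :)
let $title := "Data on the Web"
let $year := 2000
return (
  (: Direct constructor: literal XML with enclosed expressions in braces. :)
  <book year="{ $year }"><title>{ $title }</title></book>,
  (: Computed constructor: the same element built from names and content. :)
  element book {
    attribute year { $year },
    element title { $title }
  }
)
```

Both constructors should yield a book element with a year attribute of 2000 and a single title child.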

Operates on sequences

In XQuery, all input and output must be an instance of the XQuery data model. Every instance is a sequence of zero or more items, and every item is either a node (of one of the seven kinds) or an atomic value (of one of the 50 atomic types).
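A short sketch of how sequences behave, using made-up values: sequences may mix nodes and atomic values, nested sequences flatten, and a single item is the same thing as a sequence of length one.

```xquery
xquery version "1.0";

let $empty := ()
let $mixed := (1, "two", <three/>)   (: items may be atomic values or nodes :)
let $flat  := (1, (2, 3), (), 4)     (: nesting flattens to (1, 2, 3, 4) :)
return (
  count($empty),   (: 0 :)
  count($mixed),   (: 3 :)
  count($flat),    (: 4 :)
  $flat[2],        (: the second item, 2 :)
  count(5)         (: 1, since a single item is a sequence of length one :)
)
```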

Functional

Some of you are old enough to remember the GOTO statement, which all but died out because of a 1968 letter by Edsger Dijkstra, submitted as A Case against the GO TO Statement and published as Go To Statement Considered Harmful. The 1977 Turing Award lecture by John Backus (who invented FORTRAN and BNF) entitled Can Programming Be Liberated From the von Neumann Style? may send the assignment statement the same way. [Kay 2004, 625-9]

What!? How then do I program? To become functional, you would do best to forget everything you know about procedural programming. Functional programs are like that frightful mathematics you learned at university, with statements such as "for every", "there exists", and "let N be the set of all positive integers". Functions do everything interesting. Side effects are forbidden (i.e. existing values are not clobbered). If the procedural left side of your brain tells you to use iteration, the functional right side must come up with a recursive solution instead.

Advantages of this approach are:

  • invariance: the result of a function will be the same for a given set of parameters no matter how it is evaluated. This makes it easier to prove program correctness and to use parallel computing.

  • closure: no side effects (i.e. no changes of state) occur. Consequently, pure functional programs are guaranteed to be thread safe.

  • modularity: order of execution no longer matters so a result built from independent parts can be constructed as the parts come to hand. One need not wait for all of the parts before beginning to produce the output. One need not worry about other parts when one part is updated.
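To make the recursion-instead-of-iteration point concrete, here is a sketch of a function that totals a sequence of prices without a loop counter or any assignment. The name local:total is a hypothetical helper invented for this example; the built-in sum function already does the same job.

```xquery
xquery version "1.0";

(: local:total is a hypothetical helper, not a built-in function. :)
declare function local:total($numbers as xs:decimal*) as xs:decimal {
  if (empty($numbers))
  then 0                                   (: base case: nothing left to add :)
  else $numbers[1] + local:total(subsequence($numbers, 2))
};

(: The four prices from the sample bibliography in the Examples section. :)
local:total((65.95, 65.95, 39.95, 129.95))
```

No value is ever overwritten: each call receives a shorter sequence and returns a new number, so the total for these four prices comes to 301.8.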

4. Omissions

These things are not in the current specification, but are expected in a future edition.

Updates

The original XML document can be transformed but not updated.
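In practice, transforming means building a new tree from pieces of the old one. The sketch below assumes an inline book element and a hypothetical 10% discount; the original value bound to $book is left exactly as it was.

```xquery
xquery version "1.0";

(: Illustrative data and a hypothetical 10% discount. :)
let $book :=
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
return
  (: $book is untouched; a new element is assembled from its parts. :)
  <book>{
    $book/@*,       (: copy the attributes :)
    $book/title,    (: copy the title :)
    <price>{ round-half-to-even(xs:decimal($book/price) * 0.9, 2) }</price>
  }</book>
```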

Full text search

Pattern matching can only be applied to individual strings, such as the content of a single text node. Searching a whole document means writing a query that visits each text node in turn.
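Such a search might look like the following sketch, which assumes a small inline bibliography and uses the standard matches function on one text node at a time.

```xquery
xquery version "1.0";

(: Illustrative data only. :)
let $bib :=
  <bib>
    <book><title>TCP/IP Illustrated</title></book>
    <book><title>Data on the Web</title></book>
  </bib>
(: Visit every text node and test each one on its own. :)
for $t in $bib//text()
where matches($t, "web", "i")   (: "i" makes the match case-insensitive :)
return $t/..                    (: the element that contains the matching text :)
```

This finds a word inside one text node, but it knows nothing about stemming, relevance ranking or phrases that span several nodes.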

It would also be nice to make XQuery object-oriented (without getters and setters, which require assignment), but I don't even know if it makes sense to try.

5. Algebra

Joins and other forms of algebra are achieved using the FLWOR construct (for, let, where, order by, return). This is an operation, not a loop! It transforms one sequence into another. Don't ask how, just believe. The return of FLWOR has nothing to do with the return of a procedural language subroutine.
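A join can be sketched as follows. Both inline documents are made up for illustration (the reviews element in particular is a hypothetical second source), and the element and attribute names in the result are arbitrary.

```xquery
xquery version "1.0";

(: Illustrative data only; <reviews> is a made-up second source. :)
let $bib :=
  <bib>
    <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
    <book><title>Data on the Web</title><price>39.95</price></book>
  </bib>
let $reviews :=
  <reviews>
    <entry><title>Data on the Web</title><rating>5</rating></entry>
    <entry><title>TCP/IP Illustrated</title><rating>4</rating></entry>
  </reviews>
for $b in $bib/book, $r in $reviews/entry   (: every pairing of book and entry :)
where $b/title = $r/title                   (: keep the pairs that share a title :)
order by xs:decimal($b/price)               (: cheapest first :)
return
  <rated title="{ $b/title }" rating="{ $r/rating }"/>
```

The for clause describes the pairings, where filters them, and return says what each surviving pairing contributes to the output sequence. Nothing here dictates the order in which a processor actually does the work.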

6. Examples

These are lifted from XML Query Use Cases:

Sample data

<bib>
    
    <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
    </book>
 
    <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
    </book>
 
    <book year="2000">
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
        <publisher>Morgan Kaufmann Publishers</publisher>
        <price>39.95</price>
    </book>
 
    <book year="1999">
        <title>The Economics of Technology and Content for Digital TV</title>
        <editor>
            <last>Gerbarg</last><first>Darcy</first>
            <affiliation>CITI</affiliation>
        </editor>
        <publisher>Kluwer Academic Publishers</publisher>
        <price>129.95</price>
    </book>
 
</bib>

    
List books published by Addison-Wesley after 1991, including their year and title.

<bib>
 {
  for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
    <book year="{ $b/@year }">
     { $b/title }
    </book>
 }
</bib>

          
Result

<bib>
    <book year="1994">
        <title>TCP/IP Illustrated</title>
    </book>
    <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
    </book>
</bib>

          
For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a "result" element.

<results>
  {
    let $a := doc("http://bstore1.example.com/bib.xml")//author
    for $last in distinct-values($a/last),
        $first in distinct-values($a[last=$last]/first)
    order by $last, $first
    return
        <result>
            <author>
               <last>{ $last }</last>
               <first>{ $first }</first>
            </author>
            {
                for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
                where some $ba in $b/author 
                      satisfies ($ba/last = $last and $ba/first=$first)
                return $b/title
            }
        </result>
  }
</results>

          
Result

<results>
    <result>
        <author>
            <last>Abiteboul</last>
            <first>Serge</first>
        </author>
        <title>Data on the Web</title>
    </result>
    <result>
        <author>
            <last>Buneman</last>
            <first>Peter</first>
        </author>
        <title>Data on the Web</title>
    </result>
    <result>
        <author>
            <last>Stevens</last>
            <first>W.</first>
        </author>
        <title>TCP/IP Illustrated</title>
        <title>Advanced Programming in the Unix environment</title>
    </result>
    <result>
        <author>
            <last>Suciu</last>
            <first>Dan</first>
        </author>
        <title>Data on the Web</title>
    </result>
</results>

          
For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors.

<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book
    where count($b/author) > 0
    return
        <book>
            { $b/title }
            {
                for $a in $b/author[position()<=2]  
                return $a
            }
            {
                if (count($b/author) > 2)
                 then <et-al/>
                 else ()
            }
        </book>
  }
</bib>
          
Result

<bib>
    <book>
        <title>TCP/IP Illustrated</title>
        <author>
            <last>Stevens</last>
            <first>W.</first>
        </author>
    </book>
    <book>
        <title>Advanced Programming in the Unix environment</title>
        <author>
            <last>Stevens</last>
            <first>W.</first>
        </author>
    </book>
    <book>
        <title>Data on the Web</title>
        <author>
            <last>Abiteboul</last>
            <first>Serge</first>
        </author>
        <author>
            <last>Buneman</last>
            <first>Peter</first>
        </author>
        <et-al/>
    </book>
</bib>

          

7. Toolkit

8. Books

The book by Brundage and the one edited by Katz given in the references will get you started. Look out for a forthcoming book by Priscilla Walmsley, possibly from O'Reilly.

Note

Priscilla Walmsley's book on XQuery was published the year after this short introduction was written. The book is now listed in the reference list.

Reference List

Boag, Scott, Don Chamberlin, Mary F. Fernández, Daniela Florescu, Jonathan Robie, Jérôme Siméon. XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery/.

Brundage, Michael. 2004. XQuery: The XML Query Language. Boston: Addison-Wesley.

Chamberlin, Don, Peter Fankhauser, Daniela Florescu, Massimo Marchiori, Jonathan Robie. XML Query Use Cases. http://www.w3.org/TR/xquery-use-cases/.

Functional Programming. Wikipedia. http://en.wikipedia.org/wiki/Functional_language.

Katz, Howard, ed. 2004. XQuery from the Experts: A Guide to the W3C XML Query Language. Boston: Addison-Wesley.

Kay, Michael. 2004. XSLT 2.0. 3rd ed. Indianapolis: Wiley Publishing.

Walmsley, Priscilla. 2007. XQuery. Sebastopol: O'Reilly.