Blog of Raivo Laanemets

Stories about web development, consulting and personal computers.

Extracting values from XML with Xidel

On 2014-02-13

Today I needed to extract some values from an XML file. The file was rather large, with long lines, and I did not have much success processing it with simpler tools like grep and sed.

The file (available from the European Central Bank website) contains historical ECB exchange rates with the following structure:

<Cube>
    <Cube time="2014-02-10">
        <Cube currency="USD" rate="1.3638"/>
        <Cube currency="JPY" rate="139.26"/>
        <Cube currency="BGN" rate="1.9558"/>
        <Cube currency="CZK" rate="27.547"/>
        ...
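Because every rate sits inside a day-level Cube carrying a time attribute, a single lookup does not need the whole tree in memory. One option is to stream through the file and stop at the matching day. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree.iterparse instead of a command-line tool. It assumes only the nesting shown above. The helper names find_rate and local_name are hypothetical, and this is not the approach the post ends up using.

```python
import io
import xml.etree.ElementTree as ET

# The excerpt from the post, completed with closing tags (only two currencies).
SAMPLE = """<Cube>
    <Cube time="2014-02-10">
        <Cube currency="USD" rate="1.3638"/>
        <Cube currency="JPY" rate="139.26"/>
    </Cube>
</Cube>"""


def local_name(tag):
    # Drop a "{namespace}" prefix, if present, so namespaced and plain files match.
    return tag.rsplit("}", 1)[-1]


def find_rate(source, date, currency):
    """Return the rate for (date, currency) as a string, or None if absent.

    source is a filename or a binary file object. The file is streamed with
    iterparse, and day Cubes that do not match are cleared as we go.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "Cube" or elem.get("time") is None:
            continue
        # An "end" event for a day Cube means all its currency children are parsed.
        if elem.get("time") == date:
            for child in elem:
                if child.get("currency") == currency:
                    return child.get("rate")
            return None
        elem.clear()  # free the children of a day we do not need
    return None


print(find_rate(io.BytesIO(SAMPLE.encode("utf-8")), "2014-02-10", "JPY"))  # 139.26
```

In principle, the path to eurofxref-hist.xml can be passed instead of the in-memory sample. The rate is returned as a string, exactly as it appears in the attribute.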

I needed to extract the rate for a given date and currency. The file contains rates going back to 1999. As I only needed values for very few dates, I hoped there was a command-line tool for running XPath or CSS queries against it. I first looked at xsltproc, which is installed by default on most *nix systems. However, it did not help, because I would have had to write an XSLT stylesheet. I wanted to avoid writing XML as much as possible, so I kept looking and finally found Xidel.

Xidel is a command-line tool for downloading and extracting data from HTML/XML pages. It supports XPath 2, CSS 3, XQuery 1 and JSONiq, and XPath and CSS queries can easily be mixed. With Xidel I was able to get what I needed by combining CSS attribute selectors and XPath in a single query:

$ xidel eurofxref-hist.xml -e "css('Cube[time=1999-01-13] Cube[currency=USD]')/@rate"
1.1744

I know that the CSS part of the query could probably be done with XPath too, but I prefer CSS. Xidel is written in Free Pascal, but it is statically linked and can easily be installed from a binary package. Binary packages and the source code are available from its home page.
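For comparison, here is a rough sketch of the same date-and-currency lookup written purely as a path expression. It uses Python's standard xml.etree.ElementTree rather than Xidel. ElementTree only supports a limited XPath subset, but attribute predicates such as [@time='...'] are part of it. The sample data is the excerpt above, and find_rate is a hypothetical name. The {*} namespace wildcard (Python 3.8+) is there in case the downloaded file declares a default namespace.

```python
import xml.etree.ElementTree as ET

# The excerpt from the post, completed with closing tags (only two currencies).
SAMPLE = """<Cube>
    <Cube time="2014-02-10">
        <Cube currency="USD" rate="1.3638"/>
        <Cube currency="JPY" rate="139.26"/>
    </Cube>
</Cube>"""


def find_rate(root, date, currency):
    """Look up a rate with a single ElementTree path expression.

    "{*}Cube" matches Cube in any namespace or none (Python 3.8+).
    date and currency are inserted verbatim, so they must not contain quotes.
    """
    path = f".//{{*}}Cube[@time='{date}']/{{*}}Cube[@currency='{currency}']"
    cube = root.find(path)
    return None if cube is None else cube.get("rate")


root = ET.fromstring(SAMPLE)
print(find_rate(root, "2014-02-10", "USD"))  # 1.3638
```

This loads the whole document into memory, which is the trade-off for keeping the query to one expression.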