Comparing XML using line-oriented diff tools

If you need to compare two XML files that have only a few differences, you can use vimdiff or other line-oriented diff tools. You first need to format the XML with one attribute per line. I use xml_pp like this:

xml_pp -i -s nsgmls data.xml

The -i option formats the file in-place (that is, it modifies the file). The -s nsgmls option formats one attribute per line, with no indentation. From the documentation:

Line breaks are inserted in safe places: that is within tags, between a tag and an attribute, between attributes and before the > at the end of a tag.

This is quite ugly but better than “none”, and it is very safe, the document will still be valid (conforming to its DTD).
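If you would rather not modify your original files, the steps above can be wrapped in a small shell function. This is only a sketch: the function name xml_linediff and the DIFFTOOL variable are my own inventions for illustration, not part of xml_pp or vimdiff. It copies both files to a scratch directory, formats the copies, and opens them in the diff tool.

```shell
# Hypothetical helper: compare two XML files without touching the originals.
# Usage: xml_linediff old.xml new.xml
xml_linediff() {
    # Work on copies, because xml_pp -i rewrites the file it is given.
    xld_tmp=$(mktemp -d) || return 1
    cp "$1" "$xld_tmp/left.xml" || return 1
    cp "$2" "$xld_tmp/right.xml" || return 1

    # Same options as above: in-place, one attribute per line.
    xml_pp -i -s nsgmls "$xld_tmp/left.xml" || return 1
    xml_pp -i -s nsgmls "$xld_tmp/right.xml" || return 1

    # DIFFTOOL is an assumed override; vimdiff is the default.
    "${DIFFTOOL:-vimdiff}" "$xld_tmp/left.xml" "$xld_tmp/right.xml"
    xld_status=$?

    # Remove the scratch copies once the diff tool exits.
    rm -rf "$xld_tmp"
    return $xld_status
}
```

Setting DIFFTOOL=diff before calling the function should give a plain text diff instead of opening vimdiff.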

My thanks go to the authors of these useful tools for making them freely available: Michel Rodriguez for xml_pp and Bram Moolenaar for vimdiff.

To install xml_pp (and its underlying Perl module XML::Twig, plus dependencies) on Mac OS X:

  1. Download and install MacPorts
  2. sudo port selfupdate
  3. sudo port install p5-xml-twig

Don’t try to install it using cpan XML::Twig. Installation fails miserably and messily.

Update: Here’s how to truncate numeric attribute values to 6 decimal places:

perl -pi -w -e 's/(\.[0-9]{6})[0-9]+/$1/' data.xml

Again, this modifies the file in-place. The number of decimal places is between the curly braces. Adjust it as needed.
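To see what the substitution does before running it on real data, you can try it on a throwaway file. The sample content below is made up for illustration. Note the single quotes around the Perl expression, which stop the shell from expanding $1 before Perl sees it.

```shell
# Throwaway sample file (made-up content) so no real data is touched.
sample=$(mktemp)
printf '%s\n' 'x="1.23456789"' 'y="0.5"' > "$sample"

# Keep the decimal point plus 6 digits, and drop any digits that follow.
perl -pi -w -e 's/(\.[0-9]{6})[0-9]+/$1/' "$sample"

# Show the result.
cat "$sample"
```

The first line becomes x="1.234567" (truncated, not rounded), and the second line is left alone because it has fewer than 7 decimal places.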

Published 26 November 2008

Comments

You can email me at stephen#viles.nz (change # to @) or tweet me at @svilesnz.
