What I wanted to do was perform a 'diff' on two XML documents and determine whether they are the same document, regardless of whitespace or the order of child elements.
In Java, I came across XmlUnit, which is excellent for determining whether two separate XML documents are equal.
In Perl, I came across XML-SemanticDiff. I thought it was great until I re-ordered the elements in one of my documents. Then this module wasn't so great anymore.
Since I really needed a Perl equivalent of XmlUnit, I decided to write my own module and call it XML-SemanticCompare. This module really does perform a semantic diff on XML documents:
- Reordering child elements doesn't cause equivalent documents to be reported as different.
- By default, whitespace is trimmed from text and attribute values before they are compared (this can be turned off).
- Attributes can optionally be ignored (this is off by default).
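To make these three rules concrete, here is a minimal sketch of one way an order-insensitive comparison can work. It is not XML::SemanticCompare's actual implementation, and it skips XML parsing entirely: it assumes each document has already been turned into nested hashes with the hypothetical keys `name`, `attrs`, `text` and `children` (a real version would build these from a parser such as XML::LibXML). Each element is reduced to a canonical string in which attributes are sorted by name and the children's canonical strings are sorted, so two trees match regardless of child order. The functions `canonical` and `semantically_equal` and the options `trim_whitespace` and `ignore_attributes` are made up for this illustration.

```perl
use strict;
use warnings;

# Length-prefix a string so concatenated parts can never be confused.
sub _enc {
    my ($s) = @_;
    return length($s) . ':' . $s;
}

# Reduce an element (hypothetical hash: name, attrs, text, children)
# to a canonical string that ignores child order.
sub canonical {
    my ($node, $opts) = @_;
    $opts ||= {};
    my $trim = exists $opts->{trim_whitespace} ? $opts->{trim_whitespace} : 1;

    # Normalise a text or attribute value, trimming whitespace unless disabled.
    my $norm = sub {
        my ($s) = @_;
        $s = '' unless defined $s;
        if ($trim) {
            $s =~ s/^\s+//;
            $s =~ s/\s+$//;
        }
        return $s;
    };

    my @parts = ('E', _enc($node->{name}));

    # Attributes are sorted by name (unless ignored), so their order never matters.
    unless ($opts->{ignore_attributes}) {
        my $attrs = $node->{attrs} || {};
        for my $k (sort keys %$attrs) {
            push @parts, 'A', _enc($k), _enc($norm->($attrs->{$k}));
        }
    }

    push @parts, 'T', _enc($norm->($node->{text}));

    # Canonicalise children recursively, then sort them: reordering is ignored.
    my @kids = map { canonical($_, $opts) } @{ $node->{children} || [] };
    @kids = sort @kids;
    push @parts, 'C', scalar(@kids), map { _enc($_) } @kids;

    return join '|', @parts;
}

# Two trees are semantically equal if their canonical strings match.
sub semantically_equal {
    my ($x, $y, %opts) = @_;
    return canonical($x, \%opts) eq canonical($y, \%opts);
}

# Same children in a different order, with extra whitespace in one text value.
my $control = {
    name     => 'book',
    attrs    => { id => '1' },
    children => [
        { name => 'title',  text => 'Perl' },
        { name => 'author', text => '  Larry ' },
    ],
};
my $test = {
    name     => 'book',
    attrs    => { id => '1' },
    children => [
        { name => 'author', text => 'Larry' },
        { name => 'title',  text => 'Perl' },
    ],
};

my $verdict = semantically_equal($control, $test) ? 'same' : 'different';
print "$verdict\n";    # prints "same"

# With trimming disabled, the spaces around 'Larry' make the documents differ.
my $untrimmed = semantically_equal($control, $test, trim_whitespace => 0) ? 'same' : 'different';
print "$untrimmed\n";    # prints "different"
```

Sorting the children's canonical strings is what makes the comparison order-insensitive; it also costs a sort at every level of the tree, which is one reason a semantic comparison is slower than a plain string comparison.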
Using the module is extremely straightforward:
```perl
use XML::SemanticCompare;

# $control_xml, $test_xml and $xpath are assumed to be set up beforehand
my $x = XML::SemanticCompare->new;

# compare two different XML files
my $isSame = $x->compare($control_xml, $test_xml);

# are they the same?
print "XML matches!\n" if $isSame;
print "XML files are semantically different!\n" unless $isSame;

# get the differences as an array reference
my $diffs_arrayref = $x->diff($control_xml, $test_xml);

# test an XPath expression against the XML
my $success = $x->test_xpath($xpath, $test_xml);
print "xpath success!\n" if $success;
```
The only downside of this module is that it isn't especially efficient, although it isn't terribly inefficient either.
If you find yourself comparing the DOM trees of XML documents for equality in Perl, please check out XML-SemanticCompare. If you can make the code more robust or efficient, please do!