XPath 2.0 deep-equal() Does Not Match As Expected: The Problem With Whitespace

I just stumbled across a problem with the deep-equal() function introduced in XPath 2.0.
It cost me at least two hours to find out what was going on.
So I want to share it with you, in case you are wasting time on the same problem and are searching Google for a solution ;)

If you have never heard of deep-equal() and just wonder how to compare XML nodes the right way, you should probably start with this excellent article about equality in XSLT.

My Problem

My problem was that I wanted to process/output a node only if no node on the ancestor axis has an exact duplicate of that node as a direct child.

The Difference Between A Comparison With = And With deep-equal()

If you just use simple equality (with = or eq), the two compared nodes are implicitly converted into strings (atomized to their string values).
That is no problem if you are comparing attributes or nodes that contain only text.
In all other cases, however, you only compare the text content of the two nodes and all their descendants, concatenated into one string; element names and attributes play no role.
Hence, if the nodes differ only in an attribute, your test will report that they are equal, which is probably not what you expect.

For example, the XPath expression

//child/ref[ancestor::parent/ref=.]

will match the <ref> node with @id='bar' that is nested inside the <child> node in this example XML, which I was not expecting:

<root>
  <parent>
    <ref id="foo"><content>Same Text-Content</content></ref>
    <child>
      <ref id="bar"><content>Same Text-Content</content></ref>
    </child>
  </parent>
</root>
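
A small Python model makes the implicit conversion visible. It uses only the standard xml.etree.ElementTree module, not an XPath processor, and the helper name string_value is my own: it computes what XPath calls the string value of an element (all descendant text, concatenated in document order), which is what ref=. ends up comparing.

```python
import xml.etree.ElementTree as ET

# The example above, with the closing tags fixed.
EXAMPLE_1 = """<root>
  <parent>
    <ref id="foo"><content>Same Text-Content</content></ref>
    <child>
      <ref id="bar"><content>Same Text-Content</content></ref>
    </child>
  </parent>
</root>"""


def string_value(elem):
    """XPath-style string value: all descendant text, concatenated in document order."""
    return "".join(elem.itertext())


root = ET.fromstring(EXAMPLE_1)
outer = root.find("parent/ref")        # <ref id="foo">
inner = root.find("parent/child/ref")  # <ref id="bar">

# Only the text survives the conversion; the differing id attributes are lost.
print(string_value(outer) == string_value(inner))  # True
print(outer.get("id"), inner.get("id"))            # foo bar
```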

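So, after I found out about deep-equal(), I tried the following XPath expression, which solves the problem in the above example:

//child/ref[some $r in ancestor::parent/ref satisfies deep-equal($r, .)]

(The quantified some-expression is needed because deep-equal() compares two sequences item by item: if a <parent> had several <ref> children, passing all of them at once as the first argument would never be deep-equal to a single node.)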
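
To illustrate what deep-equal() checks instead, here is a rough Python model of its rules for untyped elements: same name, same attributes, and pairwise deep-equal child elements and text nodes. It is a sketch under simplifying assumptions, using only the standard xml.etree.ElementTree module; the helpers children and deep_equal are hypothetical names, and schema types and collations are ignored.

```python
import xml.etree.ElementTree as ET

# The first example, with the closing tags fixed.
EXAMPLE_1 = """<root>
  <parent>
    <ref id="foo"><content>Same Text-Content</content></ref>
    <child>
      <ref id="bar"><content>Same Text-Content</content></ref>
    </child>
  </parent>
</root>"""


def children(elem):
    """Child nodes of elem in document order: text nodes as str, elements as Element.

    ElementTree keeps text in .text/.tail instead of separate nodes, so they are
    unpacked here. ET.fromstring drops comments and processing instructions by
    default, which matches deep-equal() skipping them.
    """
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes


def deep_equal(a, b):
    """Rough model of XPath 2.0 deep-equal() for two untyped element trees."""
    if isinstance(a, str) or isinstance(b, str):
        return a == b  # text nodes are compared by their string value
    if a.tag != b.tag or a.attrib != b.attrib:
        return False   # different name or different attributes
    ca, cb = children(a), children(b)
    return len(ca) == len(cb) and all(deep_equal(x, y) for x, y in zip(ca, cb))


root = ET.fromstring(EXAMPLE_1)
outer = root.find("parent/ref")
inner = root.find("parent/child/ref")
print(deep_equal(outer, inner))  # False: id="foo" vs. id="bar"
```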

The Unexpected Behaviour Of deep-equal()

But moving on, I stumbled across cases where I expected a match, but deep-equal() did not match the nodes.
For example:

<root>
  <parent>
    <ref id="same">
      <content>Same Text-Content</content>
    </ref>
    <child>
      <ref id="same">
        <content>Same Text-Content</content>
      </ref>
    </child>
  </parent>
</root>

You probably spot the difference at first glance, since I laid out the examples accordingly and gave you a hint in the title of this post, but it really took me a long time to see it:

It is all about whitespace!

deep-equal() compares all child nodes and only reports a match if the compared nodes have exactly the same children (comments and processing instructions are skipped, but text nodes are not).
In the second example, however, the compared <ref> nodes contain whitespace before and after their child node <content>.
This whitespace forms implicit child nodes of type text.
Hence, the two nodes in the second example differ, because the indentation of the second one is two spaces deeper.
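
These whitespace text nodes can be made visible with Python's standard xml.etree.ElementTree, used here only as a stand-in for an XPath processor: it stores the text before the first child in .text and the text after each child in that child's .tail, which correspond to the text-node children of <ref> in the XPath data model.

```python
import xml.etree.ElementTree as ET

# The second example, with the closing tags fixed. The indentation is significant.
EXAMPLE_2 = """<root>
  <parent>
    <ref id="same">
      <content>Same Text-Content</content>
    </ref>
    <child>
      <ref id="same">
        <content>Same Text-Content</content>
      </ref>
    </child>
  </parent>
</root>"""

root = ET.fromstring(EXAMPLE_2)
outer = root.find("parent/ref")
inner = root.find("parent/child/ref")

# Whitespace before <content> (ref.text) and after it (content.tail).
for ref in (outer, inner):
    print(repr(ref.text), repr(ref[0].tail))

print(len(outer.text), len(inner.text))  # 7 9: newline plus 6 vs. 8 spaces
print(outer.text == inner.text)          # False
```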

The solution…?

Unfortunately, I do not really know a good solution.
(If you come up with one, feel free to describe or link it in the comments!)

The best solution would be an optional additional argument for deep-equal() that tells the function to ignore such whitespace.
In fact, some XSLT processors do provide such an argument.
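
What such an option could look like is sketched here in the same kind of Python model (standard library only). The parameter ignore_whitespace is a hypothetical name of my own, not part of XPath 2.0 or of any particular processor's API; when it is switched on, whitespace-only text nodes are dropped before the children are compared.

```python
import xml.etree.ElementTree as ET

# The second example, with the closing tags fixed.
EXAMPLE_2 = """<root>
  <parent>
    <ref id="same">
      <content>Same Text-Content</content>
    </ref>
    <child>
      <ref id="same">
        <content>Same Text-Content</content>
      </ref>
    </child>
  </parent>
</root>"""


def children(elem, ignore_whitespace=False):
    """Child nodes of elem in document order (text as str, elements as Element)."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    if ignore_whitespace:
        # Hypothetical option: drop text nodes that consist only of whitespace.
        nodes = [n for n in nodes if not (isinstance(n, str) and n.strip() == "")]
    return nodes


def deep_equal(a, b, ignore_whitespace=False):
    """Rough model of deep-equal() with a hypothetical ignore_whitespace switch."""
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    ca = children(a, ignore_whitespace)
    cb = children(b, ignore_whitespace)
    return len(ca) == len(cb) and all(
        deep_equal(x, y, ignore_whitespace) for x, y in zip(ca, cb)
    )


root = ET.fromstring(EXAMPLE_2)
outer = root.find("parent/ref")
inner = root.find("parent/child/ref")
print(deep_equal(outer, inner))                          # False
print(deep_equal(outer, inner, ignore_whitespace=True))  # True
```

Such a switch has to be opt-in: in mixed content like <p><b>a</b> <i>b</i></p> the single space between the elements carries meaning, and dropping it would make that paragraph equal to one without the space.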

The only other solution I can think of is to write another XSLT script that removes all whitespace between tags, to work around this behaviour of deep-equal(), which is unexpected at first glance.
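
The preprocessing idea can be sketched in Python with the standard library: the hypothetical helper strip_whitespace_text removes every whitespace-only text node in place, after which the two <ref> elements from the second example serialize identically. If you are in XSLT anyway, the standard declaration <xsl:strip-space elements="*"/> asks the processor to strip whitespace-only text nodes from the source document before the transformation runs, which may spare you the separate script.

```python
import xml.etree.ElementTree as ET

# The second example, with the closing tags fixed.
EXAMPLE_2 = """<root>
  <parent>
    <ref id="same">
      <content>Same Text-Content</content>
    </ref>
    <child>
      <ref id="same">
        <content>Same Text-Content</content>
      </ref>
    </child>
  </parent>
</root>"""


def strip_whitespace_text(elem):
    """Hypothetical helper: remove whitespace-only text nodes below elem, in place."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        strip_whitespace_text(child)
        if child.tail is not None and not child.tail.strip():
            child.tail = None


root = ET.fromstring(EXAMPLE_2)
strip_whitespace_text(root)

outer = root.find("parent/ref")
inner = root.find("parent/child/ref")
# Comparing serializations is only a quick check: unlike deep-equal(),
# it is sensitive to attribute order.
print(ET.tostring(outer) == ET.tostring(inner))  # True
print(ET.tostring(inner, encoding="unicode"))
```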

Funded by the European Union

This article was published in the course of a research project that is funded by the European Union and the federal state of North Rhine-Westphalia.


European Union: Investing in our future - European Regional Development Fund
EFRE.NRW 2014-2020: Investments in growth and employment