This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Any (efficient) way of finding the parent Node of a scala.xml.Node?

9 replies
Ken McDonald
Joined: 2011-02-13,
User offline. Last seen 42 years 45 weeks ago.
There doesn't seem to be anything in the API, and I'm guessing that XML trees are implemented as immutable data structures, which means no backlinks without contortions, which means finding the parent would pretty much involve traversing, on average, one half of the tree. But I'm curious if I'm missing something :-)
Thanks,Ken
Alec Zorab
Joined: 2010-05-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Any (efficient) way of finding the parent Node of a scala.x

There's a rather nice xml zippper implementation at [1] which I've
used with great success. Might be worth taking a look?

[1] http://szeiger.de/blog/2009/12/27/a-zipper-for-scala-xml/

On Wed, Jul 13, 2011 at 10:21 PM, Ken McDonald wrote:
> There doesn't seem to be anything in the API, and I'm guessing that XML
> trees are implemented as immutable data structures, which means no backlinks
> without contortions, which means finding the parent would pretty much
> involve traversing, on average, one half of the tree. But I'm curious if I'm
> missing something :-)
> Thanks,
> Ken

vpatryshev
Joined: 2009-02-16,
User offline. Last seen 1 year 24 weeks ago.
Re: Any (efficient) way of finding the parent Node of a scala.x
I may be wrong, but I believe that a node does not have a parent. That is, another node can have a node as a child, but that's similar to one object pointing to another - do we want an object to know all the object that reference it? Or rather know "the" referencing object?


Thanks,
-Vlad


On Wed, Jul 13, 2011 at 2:21 PM, Ken McDonald <ykkenmcd@gmail.com> wrote:
There doesn't seem to be anything in the API, and I'm guessing that XML trees are implemented as immutable data structures, which means no backlinks without contortions, which means finding the parent would pretty much involve traversing, on average, one half of the tree. But I'm curious if I'm missing something :-)
Thanks,Ken

Chris Twiner
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Any (efficient) way of finding the parent Node of a scala.x

On Wed, Jul 13, 2011 at 11:21 PM, Ken McDonald wrote:
> There doesn't seem to be anything in the API, and I'm guessing that XML
> trees are implemented as immutable data structures, which means no backlinks
> without contortions, which means finding the parent would pretty much
> involve traversing, on average, one half of the tree. But I'm curious if I'm
> missing something :-)
> Thanks,
> Ken

Aside from the zipper that Alec mentions there is no straightforward
way directly in Scala XML. Its one of the things that drove me to
make ScalesXML, in particular I wanted the structure of the xml and
the data to be seperated, allowing a zipper to be the base data
structure.

This allows, for example, that XPaths could be modelled almost exactly
directly in Scala and that xml manipulation could be based not only in
a DSL but also via XPaths.

Funny thing is I started doing that in Nov 2009 and a month later the
zipper blog post was made, kinda validating really.

Andreas Scheinert
Joined: 2011-02-11,
User offline. Last seen 42 years 45 weeks ago.
Re: Any (efficient) way of finding the parent Node of a scala.xm

Hi Chris!
How does ScalesXML differ from Anti-XML?

Regards Andreas

On 14 Jul., 12:13, Chris Twiner wrote:
> On Wed, Jul 13, 2011 at 11:21 PM, Ken McDonald wrote:
> > There doesn't seem to be anything in the API, and I'm guessing that XML
> > trees are implemented as immutable data structures, which means no backlinks
> > without contortions, which means finding the parent would pretty much
> > involve traversing, on average, one half of the tree. But I'm curious if I'm
> > missing something :-)
> > Thanks,
> > Ken
>
> Aside from the zipper that Alec mentions there is no straightforward
> way directly in Scala XML.  Its one of the things that drove me to
> make ScalesXML, in particular I wanted the structure of the xml and
> the data to be seperated, allowing a zipper to be the base data
> structure.
>
> This allows, for example, that XPaths could be modelled almost exactly
> directly in Scala and that xml manipulation could be based not only in
> a DSL but also via XPaths.
>
> Funny thing is I started doing that in Nov 2009 and a month later the
> zipper blog post was made, kinda validating really.

Chris Twiner
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Any (efficient) way of finding the parent Node of a sca

Hi Andreas/All,

I'm not a big fan of such comparisons coming from the developers, they
are always one sided (hopefully led by passion though ^_^), or come
off as attacking.

That said I think the large visible differences between ScalesXML and
both ScalaXML and Ant-XML are:

- Scales has QName types instead of strings for qnames, this also
applies to querying via XPaths
- Scales has a unified model. This means that a full parse (SAX)
uses the same types as Pull Parsing (via StAX)
- Scales allows optimisation during the parse. This could be memory
reduction via cached QNames or Elems or even modifying trees at parse
(e.g. skip stuff when needed).
- Scales leverages Scalaz iteratees for pull parsing, eg. to provide
sub sections of a tree as a stream

Probably best to look at:
http://scala-scales.googlecode.com/svn/sites/snapshots/scales/scales-xml...

That should give a feel for where I'm going with the library.

cheers,
Chris

On Thu, Jul 14, 2011 at 2:16 PM, Andreas Scheinert
wrote:
> Hi Chris!
> How does ScalesXML differ from Anti-XML?
>
> Regards Andreas
>
> On 14 Jul., 12:13, Chris Twiner wrote:
>> On Wed, Jul 13, 2011 at 11:21 PM, Ken McDonald wrote:
>> > There doesn't seem to be anything in the API, and I'm guessing that XML
>> > trees are implemented as immutable data structures, which means no backlinks
>> > without contortions, which means finding the parent would pretty much
>> > involve traversing, on average, one half of the tree. But I'm curious if I'm
>> > missing something :-)
>> > Thanks,
>> > Ken
>>
>> Aside from the zipper that Alec mentions there is no straightforward
>> way directly in Scala XML.  Its one of the things that drove me to
>> make ScalesXML, in particular I wanted the structure of the xml and
>> the data to be seperated, allowing a zipper to be the base data
>> structure.
>>
>> This allows, for example, that XPaths could be modelled almost exactly
>> directly in Scala and that xml manipulation could be based not only in
>> a DSL but also via XPaths.
>>
>> Funny thing is I started doing that in Nov 2009 and a month later the
>> zipper blog post was made, kinda validating really.

DaveScala
Joined: 2011-03-18,
User offline. Last seen 1 year 21 weeks ago.
Re: Any (efficient) way of finding the parent Node of a scala.xm

Hi Chris,
Scales looks interesting. I am curious about its XPath compliance so I
am looking forward to the release.

Do you also have performance measurements like anti-xml
http://anti-xml.org/performance.html

What is Scales good at in terms of speed/memory usage and in what not
comparing to javax.xml, anti-xml and scala.xml?

Andreas Scheinert
Joined: 2011-02-11,
User offline. Last seen 42 years 45 weeks ago.
Re: Any (efficient) way of finding the parent Node of a scala.xm

Hi Chris,

thanks for explaining, I think i will take a closer look.

regards andreas

On 14 Jul., 21:35, Chris Twiner wrote:
> Hi Andreas/All,
>
> I'm not a big fan of such comparisons coming from the developers, they
> are always one sided (hopefully led by passion though ^_^), or come
> off as attacking.
>
> That said I think the large visible differences between ScalesXML and
> both ScalaXML and Ant-XML are:
>
> - Scales has QName types instead of strings for qnames, this also
> applies to querying via XPaths
> - Scales has a unified model.   This means that a full parse (SAX)
> uses the same types as Pull Parsing (via StAX)
> - Scales allows optimisation during the parse.  This could be memory
> reduction via cached QNames or Elems or even modifying trees at parse
> (e.g. skip stuff when needed).   QNames and the model Scales eats more that Scala, but can eat less
> than DOM>
> - Scales leverages Scalaz iteratees for pull parsing, eg. to provide
> sub sections of a tree as a stream
>
> Probably best to look at:http://scala-scales.googlecode.com/svn/sites/snapshots/scales/scales-...
>
> That should give a feel for where I'm going with the library.
>
> cheers,
> Chris
>
> On Thu, Jul 14, 2011 at 2:16 PM, Andreas Scheinert
>
>
>
> wrote:
> > Hi Chris!
> > How does ScalesXML differ from Anti-XML?
>
> > Regards Andreas
>
> > On 14 Jul., 12:13, Chris Twiner wrote:
> >> On Wed, Jul 13, 2011 at 11:21 PM, Ken McDonald wrote:
> >> > There doesn't seem to be anything in the API, and I'm guessing that XML
> >> > trees are implemented as immutable data structures, which means no backlinks
> >> > without contortions, which means finding the parent would pretty much
> >> > involve traversing, on average, one half of the tree. But I'm curious if I'm
> >> > missing something :-)
> >> > Thanks,
> >> > Ken
>
> >> Aside from the zipper that Alec mentions there is no straightforward
> >> way directly in Scala XML.  Its one of the things that drove me to
> >> make ScalesXML, in particular I wanted the structure of the xml and
> >> the data to be seperated, allowing a zipper to be the base data
> >> structure.
>
> >> This allows, for example, that XPaths could be modelled almost exactly
> >> directly in Scala and that xml manipulation could be based not only in
> >> a DSL but also via XPaths.
>
> >> Funny thing is I started doing that in Nov 2009 and a month later the
> >> zipper blog post was made, kinda validating really.

Chris Twiner
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Any (efficient) way of finding the parent Node of a sca

Hi Dave,

tricky question to answer. The reason I haven't put any such page up
is because all it shows is how the performance is for that particular
set of documents and the actual performance will depend on the
applications actual XML and constraints.

However performance is the thing I'm currently working on, memory
being compared to Scala XML and JAXP DOM (parsed and then streamed to
force a like for like comparison), speed wise I've not added jaxp -
yet but I expect the DTM xerces approach to be faster. I'm using
Caliper to perform tests, and yourkit to tune. One of the main tests
I'm doing is:

Recon

Chris Twiner
Joined: 2008-12-17,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Any (efficient) way of finding the parent Node of a sca

On Fri, Jul 15, 2011 at 10:58 AM, Chris Twiner wrote:
> Hi Dave,
>
> tricky question to answer.  The reason I haven't put any such page up
> is because all it shows is how the performance is for that particular
> set of documents and the actual performance will depend on the
> applications actual XML and constraints.
>
> However performance is the thing I'm currently working on, memory
> being compared to Scala XML and JAXP DOM (parsed and then streamed to
> force a like for like comparison), speed wise I've not added jaxp -
> yet but I expect the DTM xerces approach to be faster.   I'm using
> Caliper to perform tests, and yourkit to tune.  One of the main tests
> I'm doing is:
>
> Recon
>  |-- Parts
>    |-- Part    <--- X times
>       | id
>       | version
>  |-- BOMs
>    |-- BOM    <--- X times
>       | id
>       | version
>  |-- Records
>    |-- Record    <--- X times
>       | id
>       | version
>
> Where there are are varying amounts of Part, BOM and Record.  The
> reason for this is to test the underlying IndexedSeq used and overall
> performance improvements for such a mass of elements.
>
> In short (for an essay ^_^), for smaller numbers of X (< 100) Scales
> is twice as fast, for 100s its still faster, for 1000 it struggles to
> keep up (when using the fast as possible optimisation its within 4%)
> and for 10000 and above I'm simply fighting the jvm/memory and timings
> are pretty much random.
>
> An example of the randomness is:
>
> [info] 40000  -Xmx512M           Scala 581284 ===========================
> [info] 40000  -Xmx512M   QNameAndSpeed 607828 ============================
> [info] 40000  -Xmx512M HighPerformance 633423 ==============================
> [info] 40000 -Xmx1024M           Scala 410289 ===================
> [info] 40000 -Xmx1024M   QNameAndSpeed 387535 ==================
> [info] 40000 -Xmx1024M HighPerformance 593929 ============================
>
> The 1024M QNameAndSpeed optimisation run is faster than the Scala, but
> the 512M is slower.  The HighPerformance option for Scales, in both
> cases, is generating so much data that new (+configuration) calls
> start to play havoc with the timings.
>
> This measuring strangeness is one of:
>
> 1) JVM doing what it does best
> 2) Really a result of my optimisations
> 3) Vector having issues at higher loads
> 4) Still not found the variable
>
> I suspect its a bit of all of the above :<
>
> When running a small homebrew test interleaving Scala and Scales
> parsing, Scales would always be faster or less than 10% slower.  After
> migrating to Caliper its clear that I was just seeing all of the
> rubbish that one sees with microbenchmarks :-)
>
> As for memory with the same structure, Scala comes in lower (yourkit
> measurements), Scales without memory optimisation (caching of QNames
> and Elems) is significantly larger and JAXP sits in the middle.  When
> optimising Scales memory usage its below JAXP levels.  Unoptimised
> Scales can't parse 40000 X in 256M at all.
>
> Honestly though, the clear lesson is that very large Vector usage can
> be slow, and such data is better parsed via pull, better for memory
> usage and runtime costs.  For full parses its best to use a fully
> memory optimisation at the cost of performance if its on a constrained
> system (or indeed you are simultaneously parsing in multiple threads).
>  Of course if the developer wants fullish speed and memory
> optimisation they can hand craft an OptimisationStrategy for exactly
> their needs.
>
> The next major memory optimisation I'm doing is to speed lookups of
> Elems in the cache and only create when necessary.
>
> As for the Vector performance both of the measurements above are
> working purely via a VectorBuilder like class pretending to be
> IndexedSeqs (only allowing += and updated( size -1, y)) and then
> performing init via a VectorPointer.  Thats as fast as I can make it
> for building.
>
> As to XPath compliance, its close enough (reverse axis aren't there)
> to be very usable, for example (found in the link I sent)
>
>   path.\\.*("urn:default"::"ShouldRedeclare").\^.\+.text.pos(4)
>
> where \+ (similar to child::) is due to an unfortunate problem with
> E1/E2 in XPath, E2 depends on E1s context to correctly evaluate.
>
> Of course unless I want to use threadlocals / wrap all xpath
> comprehensions in a monad I'll also be stuck with approaches like:
>
>   path.\*("NoNamespace").\*(Elements.localName("prefixed"))
>
> where Elements.localName returns an XmlPath => Boolean, or {implicit
> attr => pqName + "=" + value} style using the implicit param to
> provide pqName and value with the path.
>
> After getting a 0.1 out the door I may get an adapter for JXPath or
> Jaxen made as a separate lib.
>
> On Thu, Jul 14, 2011 at 10:16 PM, Dave wrote:
>> Hi Chris,
>> Scales looks interesting. I am curious about its XPath compliance so I
>> am looking forward to the release.
>>
>> Do you also have performance measurements like anti-xml
>> http://anti-xml.org/performance.html
>>
>> What is Scales good at in terms of speed/memory usage and in what not
>> comparing to javax.xml, anti-xml and scala.xml?
>

Hi All,

I don't want this to be a continuing thread of updates, its probably
not the best of places for that, however I just wanted to note the
JAXP performance and some small improvements I've made.

Basically optimising on QNames and performing extra arraycopys
improves Scales performance (reduced gc load I assume) so that its now
8-10% faster (512M and 1024M) - odd man out is 1000 -> 10000 elements
where Scala is faster - , expect other improvements to follow....

JAXP / Xerces DOMBuilder gets over 2x the performance of this however
(irrespective of deferred or non-deferred) for larger docs. For
smaller docs (X < 1000) its 10% faster than Scales and less than 5%
for X = 10.

Cheers,
Chris

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland