
How do I read very large XML files

1 reply
martin2

Hi, I'm new to Scala and would like to write a program to edit X3D files.
These tend to be very large XML files that contain large arrays of integers
and floating point numbers. It would therefore not be very practical to read
the file into memory in its semistructured text format. Instead I would like
to read it directly into a tree structure of classes, each of which
stores the numbers as Array[Int] and Array[Double] (or perhaps List[Int] and
List[Double]?).

If I were using another language I would use the SAX interface, and I can do
so in this case; however, since Scala has built-in support for XML, it seems a
pity if I can't use it.

As I understand it, scala.xml.XML.loadFile uses the DOM interface, which puts
the whole file into memory in text format. That would use all the RAM and
slow everything down. I realise I could then create my own class structure
from this, but then there would be two versions of the whole file in memory
at once.

So is there any built-in way that I can read these files directly into class
types that I have defined rather than a DOM-type structure? Or, to put it
another way, will Scala live up to its name and scale up to reading very
large XML files?

Thanks,

Martin

hrj
Re: How do I read very large XML files

martin2 wrote:

>
> So is there any built-in way that I can read these files directly into
> class types that I have defined rather than a DOM-type structure? Or, to
> put it another way, will Scala live up to its name and scale up to reading
> very large XML files?

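I am not sure what you mean by built-in, but there's a pull parser available in the scala.xml.pull package (I have never used it myself). Pull parsers are often described as a middle path between SAX and DOM parsers: the application asks for parse events one at a time, instead of receiving callbacks (SAX) or a complete in-memory tree (DOM).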

There's also a library, called FrostBridge, to make it easier to use a pull parser:
http://code.google.com/p/frostbridge/
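
To illustrate the pull-parsing idea mentioned above, here is a minimal sketch that reads numeric arrays straight into user-defined classes without building a document tree. It uses the JDK's StAX API (javax.xml.stream) rather than scala.xml.pull, because whether scala.xml.pull is available depends on which scala-xml version is installed. The class FaceSet, the object X3dPull and the sample document are hypothetical names made up for this example. The element and attribute names (IndexedFaceSet/coordIndex, Coordinate/point) follow common X3D usage, but a real file may be structured differently.

```scala
import java.io.Reader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Hypothetical target class: holds the numbers as primitive arrays, not text.
final case class FaceSet(coordIndex: Array[Int], point: Array[Double])

object X3dPull {
  // Tiny illustrative document; real X3D files are far larger.
  val sample: String =
    """<X3D><Scene><Shape>
      |  <IndexedFaceSet coordIndex="0 1 2 -1">
      |    <Coordinate point="0 0 0, 1 0 0, 0 1 0"/>
      |  </IndexedFaceSet>
      |</Shape></Scene></X3D>""".stripMargin

  // Split an attribute value on whitespace and/or commas.
  private def tokens(s: String): Array[String] =
    s.split("[\\s,]+").filter(_.nonEmpty)

  // Pull events one at a time and keep only the converted numbers.
  def readFaceSets(in: Reader): Vector[FaceSet] = {
    val xml = XMLInputFactory.newInstance().createXMLStreamReader(in)
    val result = Vector.newBuilder[FaceSet]
    var indices = Array.empty[Int]
    var points = Array.empty[Double]
    try {
      while (xml.hasNext) {
        xml.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            // A missing attribute is treated as an empty list (an assumption).
            def attr(name: String): String =
              Option(xml.getAttributeValue(null, name)).getOrElse("")
            xml.getLocalName match {
              case "IndexedFaceSet" =>
                indices = tokens(attr("coordIndex")).map(_.toInt)
                points = Array.empty[Double]
              case "Coordinate" =>
                points = tokens(attr("point")).map(_.toDouble)
              case _ => ()
            }
          case XMLStreamConstants.END_ELEMENT
              if xml.getLocalName == "IndexedFaceSet" =>
            result += FaceSet(indices, points)
          case _ => ()
        }
      }
    } finally xml.close()
    result.result()
  }
}
```

For a file on disk one would probably pass a buffered reader or input stream over the file instead of a StringReader. Note that the parser still hands over each attribute value as one String, so a very large array exists briefly as text before it is converted; only one such attribute is held at a time, though, rather than the whole document.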
