XMLEventReader and large XML file

Xiaobo Yang
Joined: 2010-01-28,

Hi,

I'm using a Scala 2.8.0beta1 prerelease to parse a large XML file with
scala.xml.pull.XMLEventReader. The XML file looks something like the
sample below. What I'm doing is extracting XML fragments ... if, for
example, attr1 equals abc.

......

Everything works fine, except that XMLEventReader does not stop once a
user is found (see my code below). Any idea why this happens and how to
fix it?

    import java.io.File
    import scala.io.Source
    import scala.xml.pull.{XMLEventReader, EvElemStart}

    val p = new XMLEventReader(Source.fromFile(new File(inputFile)))
    var bEnd: Boolean = false
    while (p.hasNext && !bEnd) {
      p.next match {
        case EvElemStart(_, "attr1", attrs, _) =>
          // user found, extract XML fragments
          bEnd = true
        case _ => // ignore every other event (avoids a MatchError)
      }
    }
    // loop stops fine
    p.stop
    // PROBLEM: the java process is still running!

Many thanks,
X YANG

Seth Tisue
Joined: 2008-12-16,
Re: XMLEventReader and large XML file

>>>>> "Xiaobo" == Xiaobo Yang writes:

Xiaobo> Hi, I'm using Scala 2.8.0beta1 prerelease to parse a large XML
Xiaobo> file with scala.xml.pull.XMLEventReader. The XML file is
Xiaobo> something like below. What I'm doing is to extract XML
Xiaobo> fragments ... if for example attr1 equals to abc.

Xiaobo> ......
Xiaobo>

Xiaobo> Scala works fine except XMLEventReader does not stop when a
Xiaobo> user is found - see my code below. Any idea why this happens
Xiaobo> and how to fix it?

I think it's a bug, and I don't see a ticket for it in Trac; I suggest
you open one. (You might also look at the source for XMLEventReader and
try to figure out why parserThread isn't terminating; I looked and it
wasn't obvious to me.)

huynhjl
Joined: 2009-10-27,
Re: XMLEventReader and large XML file

What I have noticed in my own use of XMLEventReader is that I needed to close
the source object in order to interrupt the processing, like this:

    er.stop
    source.close

Then I get a stack trace, but at least it stops.

I've tried to see how to fix that before but did not go very far.

Also see this comment in the source
(http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/library/scala/...):

    // Calling interrupt() on the parserThread is the only way we can get
    // it to stop producing tokens since it's lost deep in document() -
    // we cross our fingers the interrupt() gets to its target, but if it
    // fails for whatever reason the iterator correctness is not impacted,
    // only performance (because it will finish the entire XML document,
    // or at least as much as it can fit in the queue.)
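
To illustrate the mechanism that comment describes, here is a minimal sketch using only java.util.concurrent, not the actual XMLEventReader code. The names InterruptSketch and startProducer are hypothetical, and the bounded queue of Ints merely stands in for the parser's event queue. It shows the case where interrupt() does reach its target: the producer is interrupted while calling put() and exits. A producer that is instead busy inside code that never checks the interrupt flag would keep running, and as long as such a thread is a non-daemon thread the JVM will not exit, which may be what is being observed here.

```scala
import java.util.concurrent.LinkedBlockingQueue

object InterruptSketch {
  // Hypothetical stand-in for the parser thread: it pushes "tokens" into a
  // bounded queue until it is interrupted.
  def startProducer(queue: LinkedBlockingQueue[Int]): Thread = {
    val t = new Thread(new Runnable {
      def run(): Unit = {
        var i = 0
        try {
          while (true) {
            // put() blocks while the queue is full and throws
            // InterruptedException if the thread is interrupted.
            queue.put(i)
            i += 1
          }
        } catch {
          case _: InterruptedException => // interrupted: stop producing
        }
      }
    })
    t.start()
    t
  }

  def main(args: Array[String]): Unit = {
    val queue = new LinkedBlockingQueue[Int](4) // small bounded queue
    val producer = startProducer(queue)
    queue.take()          // the consumer reads one token and decides it is done
    producer.interrupt()  // ask the producer to stop
    producer.join(2000)   // wait up to two seconds for it to finish
    println("producer alive after interrupt: " + producer.isAlive)
  }
}
```

In this sketch the interrupt works because the producer spends its time in an interruptible call. Closing the underlying source, as suggested above, is a different way of forcing a stuck reader to fail and terminate.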

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland