XML pull parser ignores CDATA?
Mon, 2010-08-02, 11:04
I am using Scala version 2.8.0.final with Java 1.6.0_21 on 32-bit Linux.
When I tried to use the pull parser scala.xml.pull._ to read a rather
large XML file into Scala, I was surprised to get no content from the
XML at all.
As it turned out, all content was embedded in CDATA sections, and the
pull parser seems to ignore those entirely:
new XMLEventReader(Source.fromString("<tag><![CDATA[some text]]></tag>")).foreach(println)
gives
EvElemStart(null,tag,,)
EvElemEnd(null,tag)
but
new XMLEventReader(Source.fromString("<tag>some text</tag>")).foreach(println)
gives
EvElemStart(null,tag,,)
EvText(some text)
EvElemEnd(null,tag)
When mixing CDATA with plain text:
new XMLEventReader(Source.fromString("outside")).foreach(println)
gives
EvElemStart(null,tag,,)
EvText(outside)
EvElemEnd(null,tag)
Am I missing something here, or is this broken?
I am not fixated on the pull parser; it just seems to be the only
Scala-esque way to read a large XML file, i.e. the only way to do it
without simply using the Java libraries the Java way from within
Scala.
Cheers,
Johann
Mon, 2010-08-02, 20:57
#2
Re: XML pull parser ignores CDATA?
I've created the following ticket:
https://lampsvn.epfl.ch/trac/scala/ticket/3720
It seems like a bug to me.
Line 329 of
http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/library/scala/...
does not contain a callback to handle.text.
If I recompile a version of the file with:
def mkResult(pos: Int, s: String): NodeSeq = {
  // report the CDATA content to the handler before building the node
  handle.text(pos, s)
  PCData(s)
}
I then get:
scala> :load Test.scala
Loading Test.scala...
import io.Source
import xml.pull._
EvElemStart(null,tag,,)
EvText(some text)
EvElemEnd(null,tag)
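To illustrate why the missing callback makes the text disappear, here is a minimal, self-contained sketch. It is a toy model only: Handler, ToyParser, cdataSilent and cdataReported are hypothetical names invented for this example, not the real scala.xml classes. The assumption is that the event reader only learns about text through the handler callback, so a result builder that returns a value without calling the handler produces no event.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-ins; NOT the real scala.xml.pull classes.
sealed trait Event
case class EvText(text: String) extends Event

// The parser reports text to its consumer only through this callback.
trait Handler { def text(pos: Int, s: String): Unit }

class ToyParser(handle: Handler) {
  // Models the reported behaviour: builds a result, never notifies the handler.
  def cdataSilent(pos: Int, s: String): String = s

  // Models the patched behaviour: notify the handler, then build the result.
  def cdataReported(pos: Int, s: String): String = { handle.text(pos, s); s }
}

object CDataCallbackDemo {
  // Runs one parser action and returns the events the handler received.
  def collect(run: ToyParser => Unit): List[Event] = {
    val events = ListBuffer.empty[Event]
    val handler = new Handler {
      def text(pos: Int, s: String): Unit = { events += EvText(s); () }
    }
    run(new ToyParser(handler))
    events.toList
  }

  def main(args: Array[String]): Unit = {
    println(collect(p => p.cdataSilent(0, "some text")))   // prints List()
    println(collect(p => p.cdataReported(0, "some text"))) // prints List(EvText(some text))
  }
}
```

The only difference between the two methods is the call to handle.text, which mirrors the one-line change proposed above.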
I tested by downloading just that one file, editing it, compiling it with
scalac, packaging the classes into a jar (00MarkupParser.jar), and copying
it to the lib directory of the distribution.
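For anyone who cannot patch the library, the Java route mentioned in the original question is one possible workaround. The following is a sketch only, assuming the JDK's standard StAX API (javax.xml.stream) is acceptable; StaxCDataDemo and textRuns are names made up for this example. It streams the document and collects text from both plain character data and CDATA sections.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Hypothetical helper for illustration; uses only the JDK's StAX API.
object StaxCDataDemo {
  // Returns every run of character data (plain text or CDATA) in document order.
  def textRuns(xml: String): List[String] = {
    val factory = XMLInputFactory.newInstance()
    // Ask the parser to coalesce adjacent character data into single events.
    factory.setProperty(XMLInputFactory.IS_COALESCING, java.lang.Boolean.TRUE)
    val reader = factory.createXMLStreamReader(new StringReader(xml))
    val runs = List.newBuilder[String]
    try {
      while (reader.hasNext) {
        reader.next() match {
          // Accept both event types so CDATA is kept however it is reported.
          case XMLStreamConstants.CHARACTERS | XMLStreamConstants.CDATA =>
            runs += reader.getText
          case _ => // start/end tags and other events are skipped in this sketch
        }
      }
    } finally reader.close()
    runs.result()
  }

  def main(args: Array[String]): Unit = {
    textRuns("<tag><![CDATA[some text]]></tag>").foreach(println)
  }
}
```

Because the reader pulls one event at a time, this should also suit large files; for a file on disk, a java.io.InputStream can be passed to createXMLStreamReader instead of the StringReader.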