This page is no longer maintained — Please continue to the home page at www.scala-lang.org

XML RuleTransformer in 2.7 vs. 2.8

2 replies
Patrick Roemer
Joined: 2009-02-18,
User offline. Last seen 42 years 45 weeks ago.
Hi, I've just been wondering about the behavior of a RuleTransformer in 2.7 (it seemed to swallow the header of an HTML document) and in 2.8 (it took almost a minute to complete a moderately complex HTML document and really seemed to fry my CPU). I've tried to cut the behavior down to a simple example: object SimpleXmlRewrite { def main(args: Array[String]) = { val orig = ConstructingParser.fromFile( new java.io.File(args(0)), false).document val rule = new RewriteRule() { override def transform(node: Node) = { println(System.identityHashCode(node)) node match { case
=>

ok

case n => n } } } val trans = new RuleTransformer(rule)(orig.docElem) println(new PrettyPrinter(80, 3).format(trans)) } } ...applied to... <?xml version="1.0" encoding="iso-8859-15"?>

...seems to eat the -Tag and its children with 2.7.7.final. In 2.8.0.final, the output looks ok, but it already seems to take 3x the time compared to 2.7 - furthermore, the output from the node logging seems to indicate that with 2.8, nodes are re-visited over and over again: With 2.7, 14 node visits are logged, with 2.8 it's 3000+ (including lots of duplicates, of course). Am I doing anything terribly wrong? Has something changed significantly in the way transformers should be used? Or are these bugs? Thanks in advance! Best regards, Patrick
dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: XML RuleTransformer in 2.7 vs. 2.8
Personally, I'd put it down as a bug (in fact, I just opened a ticket for it: http://lampsvn.epfl.ch/trac/scala/ticket/3689) -- it seems to me the complexity is exponential in N, where N is the nesting level of the XML. Here is the problem, from BasicTransformer.scala:
  def transform(ns: Seq[Node]): Seq[Node] = {    val (xs1, xs2) = ns span (n => unchanged(n, transform(n)))        if (xs2.isEmpty) ns    else xs1 ++ transform(xs2.head) ++ transform(xs2.tail)   }
What happens here is that each node that is transformed is called twice: once at the span, and then at the if/else. So consider what happens to <td><tr><br/></tr></td>:
1. the entire expression is called on span2. <tr><br/></tr> is called on span3. <br/> is called on span4. <br/> is called on if/else 5. <tr><br/></tr> is called on if/else6. <br/> is called on span7. <br/> is called on if else8. the entire expression is called on if/else9 through 14 -- repeat of 2 through 7
On Thu, Jul 15, 2010 at 5:39 PM, Patrick Roemer <sangamon@netcologne.de> wrote:
Hi,

I've just been wondering about the behavior of a RuleTransformer in 2.7 (it seemed to swallow the header of an HTML document) and in 2.8 (it took almost a minute to complete a moderately complex HTML document and really seemed to fry my CPU).

I've tried to cut the behavior down to a simple example:

object SimpleXmlRewrite {
 def main(args: Array[String]) = {
   val orig = ConstructingParser.fromFile(
       new java.io.File(args(0)), false).document
   val rule = new RewriteRule() {
     override def transform(node: Node) = {
       println(System.identityHashCode(node))
       node match {
         case <br/> => <p>ok</p>
         case n => n
       }
     }
   }
   val trans = new RuleTransformer(rule)(orig.docElem)
   println(new PrettyPrinter(80, 3).format(trans))
 }
}

...applied to...

<?xml version="1.0" encoding="iso-8859-15"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15" />
</head>
<body>
<center>
<table>
<tr>
 <td>
   <center>
   <table>
   <tr>
     <td>
       <br />
     </td>
     <td>
       <br />
     </td>
   </tr>
   </table>
   </center>
 </td>
</tr>
</table>
</center>
</body>
</html>

...seems to eat the <head/>-Tag and its children with 2.7.7.final. In 2.8.0.final, the output looks ok, but it already seems to take 3x the time compared to 2.7 - furthermore, the output from the node logging seems to indicate that with 2.8, nodes are re-visited over and over again: With 2.7, 14 node visits are logged, with 2.8 it's 3000+ (including lots of duplicates, of course).

Am I doing anything terribly wrong? Has something changed significantly in the way transformers should be used? Or are these bugs?

Thanks in advance!

Best regards,
Patrick




--
Daniel C. Sobral

I travel to the future all the time.
Patrick Roemer
Joined: 2009-02-18,
User offline. Last seen 42 years 45 weeks ago.
Re: XML RuleTransformer in 2.7 vs. 2.8
Responding to Daniel Sobral: > Personally, I'd put it down as a bug (in fact, I just opened a ticket > for it: http://lampsvn.epfl.ch/trac/scala/ticket/3689) -- it seems to me > the complexity is exponential in N, where N is the nesting level of the > XML. Here is the problem, from BasicTransformer.scala: > > def transform(ns: Seq[Node]): Seq[Node] = { > val (xs1, xs2) = ns span (n => unchanged(n, transform(n))) > if (xs2.isEmpty) ns > else xs1 ++ transform(xs2.head) ++ transform(xs2.tail) > } > > What happens here is that each node that is transformed is called twice: > once at the span, and then at the if/else. So consider what happens to >
: > > 1. the entire expression is called on span > 2.
is called on span > 3.
is called on span > 4.
is called on if/else > 5.
is called on if/else > 6.
is called on span > 7.
is called on if else > 8. the entire expression is called on if/else > 9 through 14 -- repeat of 2 through 7 Ah, I see. That looks wasteful indeed. Thanks a lot for the explanation. In the meantime, I've discovered some more pitfalls. I've added another rule to replace attribute values (which seems surprisingly painful, I've used an approach suggested on Stackoverflow[1] - am I missing a more straightforward way?), and these transformations seem to get lost higher up the stack, even though they're invoked. First I thought this was only due to some quirk in equality comparisons on Metadata, so I experimentally added protected override def unchanged(n: Node, ns: Seq[Node]) = false to my rule(s) - but this only seemed to push the vanishing of the transformed node one level "higher". The only thing that seems to help is to additionally explicitly invoke something like protected def transChild(node: Node) = node match { case elem: Elem => elem.copy(child = transform(elem.child)) case other => other } on anything that's piped through #transform() - but that certainly can't be the intended usage pattern, right? :/ I'll try to reduce my issue to a simple example and see if I can get my head around what's going on there. Best regards, Patrick [1] http://stackoverflow.com/questions/2569580/how-to-change-attribute-on-sc...

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland