
Re: Re: Ampersands are escaped inside <![CDATA[ ... ]]> literals.

Dmitry Grigoriev
Hello Burak & all,

> The line of thinking that led me to not include CDATA nodes in the original scala.xml API is all based on the motto "CDATA is nothing but syntactic sugar". There are many people who expect it this way.
Yes, this is what CDATA was originally created for. For example, if one has a <textarea> with HTML contents, then instead of manually escaping each < and & as &lt; and &amp;, like this:

<textarea>&lt;p>Hello world!&lt;/p></textarea>

one could write:

<textarea><![CDATA[<p>Hello world!</p>]]></textarea>
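
In scala.xml terms, the "syntactic sugar" view can be sketched roughly as follows. This is only an illustration, assuming the scala-xml library is on the classpath; the object name CdataAsSugar is made up for the example. Both nodes carry the same character data and differ only in how they serialize:

```scala
import scala.xml.{PCData, Text}

// Hypothetical demo object, not part of scala.xml.
object CdataAsSugar {
  val html = "<p>Hello world!</p>"

  // A Text node escapes markup characters when serialized.
  val escaped = Text(html)

  // A PCData node serializes the same characters inside a CDATA section.
  val cdata = PCData(html)

  // The character data seen by the application is identical in both cases.
  val sameCharacterData: Boolean = escaped.text == cdata.text
}
```

So the two spellings are interchangeable for a consumer that only reads the text; they diverge only for a consumer that cares about the serialized form.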


> Are folks aware of the <xml:unparsed> syntax?
Not until now. Honestly, this looks like a reinvention of CDATA. It is non-standard and quite unnatural to any XML programmer (invalid content inside valid markup?! 8-()), and thus a bad idea. I much prefer embedding code like {CDATA("""...""")}, as David suggested; curiously, it's much easier to read. Besides, what if one day Scala starts supporting XML namespaces?
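
To make the "invalid content inside valid markup" objection concrete, here is a rough sketch, assuming Scala 2 with scala-xml on the classpath. It uses the scala.xml.Unparsed node class instead of the literal syntax, and scala.xml.PCData as a stand-in for the CDATA(...) helper mentioned above (that substitution is an assumption). The object name UnparsedVsCdata is made up for the example:

```scala
import scala.util.Try
import scala.xml.{PCData, Unparsed, XML}

// Hypothetical demo object, not part of scala.xml.
object UnparsedVsCdata {
  // An HTML-style fragment that is not well-formed XML (no closing tag).
  val fragment = "<br>"

  // Unparsed writes its content verbatim, so the output itself can be malformed.
  val viaUnparsed: String = (<div>{Unparsed(fragment)}</div>).toString

  // PCData wraps the same characters in a CDATA section, so the output stays well-formed.
  val viaCdata: String = (<div>{PCData(fragment)}</div>).toString

  // Re-parsing the serialized output shows the difference.
  val unparsedReparses: Boolean = Try(XML.loadString(viaUnparsed)).isSuccess
  val cdataReparses: Boolean = Try(XML.loadString(viaCdata)).isSuccess
}
```

With the unparsed variant, whether the result is still XML depends entirely on what was put inside; the embedded-expression variant cannot break the surrounding document.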


Together with my comment in ticket #3368, the dilemma is simple:

- if Scala behaves as with cdata-sections=false (the current behavior), the opposite is emulated with embedded code, as David suggested: {CDATA(NodeSeq)};

- if Scala behaves as with cdata-sections=true (the suggested behavior), the opposite is emulated with embedded code: {NodeSeq.toString}.
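
The two embedded-code forms from this list can be sketched like this, assuming Scala 2 with scala-xml on the classpath. scala.xml.PCData again stands in for the CDATA(...) helper, and since PCData takes a String, the NodeSeq is converted with toString first; both choices are assumptions made for the illustration, as is the object name EmbeddingChoices:

```scala
import scala.xml.{NodeSeq, PCData}

// Hypothetical demo object, not part of scala.xml.
object EmbeddingChoices {
  val inner: NodeSeq = <p>Hello world!</p>

  // Embedding a plain String yields an escaped text node on output.
  val asEscapedText: String = (<textarea>{inner.toString}</textarea>).toString

  // Embedding a PCData node yields a CDATA section on output.
  val asCdata: String = (<textarea>{PCData(inner.toString)}</textarea>).toString
}
```

Whichever default the literal syntax picks, the other form stays one embedded expression away, which is why the choice comes down to which case is more common in practice.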


Both behaviors are correct, so I suppose the choice must be made based on practice, not theory: support the widest range of use cases. When I asked about the standards, I was told:

- W3C DOM Core Level 3 (year 2004) preserves CDATA sections by default;

- Before 2004, there were no concrete instructions on CDATA processing, with one exception: Canonical XML (2001), which defines a means of establishing XML semantic equality, states that CDATA sections must be converted to Text. But this spec explicitly states: "Although two XML documents are equivalent (aside from limitations given in this section) if their canonical forms are identical, it is not a goal of this work to establish a method such that two XML documents are equivalent if and only if their canonical forms are identical. Such a method is unachievable..." Also, Canonical XML is a very restrictive subset of XML that Scala does not conform to anyway (e.g. Scala preserves user formatting and empty text nodes, and it serializes element attributes in non-deterministic order). Finally, Canonical XML is used extremely rarely and for very specific tasks (AFAIR, some kinds of services were supposed to canonicalize XML before digitally signing it), so it's actually of no concern.

- The XML spec itself only states that a parser must provide the application with all document character data, but does not specify how. So, again, stick to practice: DOM and the web. The example I started this discussion with works in all major browsers.
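
The DOM side of this can be checked from Scala with the JDK's own parser. The sketch below uses the JAXP DocumentBuilderFactory, whose coalescing flag (off by default) is used here as a rough analogue of the DOM Level 3 cdata-sections parameter; treating the two as equivalent is an assumption of this example, and the object name DomCdataDemo is made up:

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node

// Hypothetical demo object; it only uses the DOM parser shipped with the JDK.
object DomCdataDemo {
  val source = "<textarea><![CDATA[<p>Hello world!</p>]]></textarea>"

  // Parse the source and return the first child of the root element.
  def firstChild(coalescing: Boolean): Node = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setCoalescing(coalescing) // false is the JAXP default
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(source.getBytes("UTF-8")))
    doc.getDocumentElement.getFirstChild
  }

  // Default behavior: the CDATA section survives as its own node type.
  val preserved: Node = firstChild(coalescing = false)

  // Coalescing: the CDATA section is converted to an ordinary text node.
  val merged: Node = firstChild(coalescing = true)
}
```

In both cases the node value is the same character data; only the node type differs, which matches the XML spec's requirement that all character data reach the application without saying how.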

Thanks for reading so many letters. :)

-- 
Cheers,
dimgel

http://dimgel.ru/lib.web
Thin, stateless, strictly typed Scala web framework.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland