[scala-bts] #3286: scala.xml.PrettyPrinter changes attribute values by removing multiple whitespace
Tue, 2010-04-13, 17:22
-------------------------------------------+--------------------------------
Reporter: nikolaj | Owner: scala-xml_team
Type: defect | Status: new
Priority: normal | Component: XML support
Keywords: PrettyPrinter, xml, whitespace |
-------------------------------------------+--------------------------------
{{{scala.xml.PrettyPrinter}}} seems to change the values of attributes in
some instances, by replacing repeated white space. Not always, though.
Notice in the example below how
{{{
<babba orth="B A"/>
}}}
turns into
{{{
<babba orth="B A"></babba>
}}}
after {{{PrettyPrinting}}}:
{{{
Welcome to Scala version 2.8.0.Beta1-prerelease (Java HotSpot(TM) Client
VM, Java 1.6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.
scala> <abba orth="A B"><babba orth="B A"/></abba>
res0: scala.xml.Elem = <abba orth="A B"><babba orth="B A"></babba></abba>
scala> new xml.PrettyPrinter(200, 2)
res1: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1886a34
scala> res1.format(res0)
res2: String =
<abba orth="A B">
<babba orth="B A"></babba>
</abba>
}}}
Crazy width and indentation nukes the multiple whitespaces in the
attributes of both nodes:
{{{
scala> new xml.PrettyPrinter(2, 20)
res8: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1f0f0c8
scala> res8.format(res0)
res9: String =
<abba orth="A B"><babba orth="B A"></babba></abba>
}}}
We ran into this problem when checking whether some XML attributes were
identical to the original input.
I guess you should be able to trust {{{Pretty Mr Printer}}} not to change
the values of any attributes?
Since I'm far from an expert in XML, I might be wrong about what is the
correct way of treating whitespace inside attribute values. Sorry if this
already works according to the XML specs.
Kind regards,
/nikolaj lindberg
Wed, 2010-04-14, 22:07
#2
Re: [scala-bts] #3286: scala.xml.PrettyPrinter changes attribu
Tony,
thanks for the clarification. (I did take a quick look at the spec you refer to, but I was not sure whether it was relevant in this case or not. That is, I didn't quite get it...)
Thanks,
/nikolaj
On Wed, Apr 14, 2010 at 9:39 PM, Anthony B. Coates (Londata) <abcoates@londata.com> wrote:
The XML Information Set spec (http://www.w3.org/TR/xml-infoset/, Appendix B) says
4. An XML processor must normalize the value of attributes according to the rules in clause 3.3.3 before passing them to the application.
As such, it is certainly a mistake for an application to rely on whitespace in attributes being preserved. If you need whitespace to be preserved, use element content, not attribute content.
The question of whether a PrettyPrinter should perform this normalization of whitespace could go either way, depending on whether you consider it to be a "processor" or not. My personal view is that normalizing attribute content is appropriate, since (in my view) one thing a PrettyPrinter should allow you to do is format an XML document consistently so that you can compare it with a document with the same or similar content. Since applications should never rely on attribute whitespace being preserved, it is fair then for a PrettyPrinter to normalize the attribute content to facilitate comparison.
Cheers, Tony.
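The whitespace part of clause 3.3.3 of the XML 1.0 spec, which the Infoset passage above points to, can be sketched roughly as follows. This is only an illustration under stated assumptions: the object and method names are hypothetical, entity and character-reference expansion is left out, and line endings are assumed to be already normalized to a single line feed. It is not how {{{scala.xml.PrettyPrinter}}} is implemented. Note that for the default CDATA attribute type the clause replaces each tab or line break with a space but does not collapse runs of spaces; trimming and collapsing apply only to non-CDATA (tokenized) attribute types.

```scala
// Hypothetical helper, not part of scala.xml: a sketch of the whitespace
// handling in XML 1.0 clause 3.3.3 (attribute-value normalization).
// Assumption: entity/character references are already expanded and
// line endings are already normalized to '\n'.
object AttrNormalization {

  // CDATA attributes (the default type): every tab, line feed, or carriage
  // return becomes exactly one space. Runs of spaces are left untouched.
  def normalizeCdata(value: String): String =
    value.map(c => if (c == '\t' || c == '\n' || c == '\r') ' ' else c)

  // Non-CDATA (tokenized) attributes: after the step above, leading and
  // trailing spaces are dropped and each run of spaces becomes one space.
  def normalizeTokenized(value: String): String =
    normalizeCdata(value).split(' ').filter(_.nonEmpty).mkString(" ")
}
```

With these definitions, {{{AttrNormalization.normalizeCdata}}} leaves a value containing several consecutive spaces unchanged, while {{{AttrNormalization.normalizeTokenized}}} reduces the same value to single spaces between tokens.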
On Tue, 13 Apr 2010 17:21:34 +0100, Scala <scala-devel@epfl.ch> wrote:
#3286: scala.xml.PrettyPrinter changes attribute values by removing multiple
whitespace
-------------------------------------------+--------------------------------
Reporter: nikolaj | Owner: scala-xml_team
Type: defect | Status: new
Priority: normal | Component: XML support
Keywords: PrettyPrinter, xml, whitespace |
-------------------------------------------+--------------------------------
{{{scala.xml.PrettyPrinter}}} seems to change the values of attributes in
some instances, by replacing repeated white space. Not always, though.
Notice in the example below how
{{{
<babba orth="B A"/>
}}}
turns into
{{{
<babba orth="B A"></babba>
}}}
after {{{PrettyPrinting}}}:
{{{
Welcome to Scala version 2.8.0.Beta1-prerelease (Java HotSpot(TM) Client
VM, Java 1.6.0_16).
Type in expressions to have them evaluated.
Type :help for more information.
scala> <abba orth="A B"><babba orth="B A"/></abba>
res0: scala.xml.Elem = <abba orth="A B"><babba orth="B A"></babba></abba>
scala> new xml.PrettyPrinter(200, 2)
res1: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1886a34
scala> res1.format(res0)
res2: String =
<abba orth="A B">
<babba orth="B A"></babba>
</abba>
}}}
Crazy width and indentation nukes the multiple whitespaces in the
attributes of both nodes:
{{{
scala> new xml.PrettyPrinter(2, 20)
res8: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@1f0f0c8
scala> res8.format(res0)
res9: String =
<abba orth="A B"><babba orth="B A"></babba></abba>
}}}
We ran into this problem when checking whether some XML attributes were
identical to the original input.
I guess you should be able to trust {{{Pretty Mr Printer}}} not to change
the values of any attributes?
Since I'm far from an expert in XML, I might be wrong about what is the
correct way of treating whitespace inside attribute values. Sorry if this
already works according to the XML specs.
Kind regards,
/nikolaj lindberg
Thu, 2010-04-15, 06:57
#3
Re: [scala-bts] #3286: scala.xml.PrettyPrinter changes attribu
On Wed, Apr 14, 2010 at 9:39 PM, Anthony B. Coates (Londata) <abcoates@londata.com> wrote:
The XML Information Set spec (http://www.w3.org/TR/xml-infoset/, Appendix B) says
4. An XML processor must normalize the value of attributes according to the rules in clause 3.3.3 before passing them to the application.
As such, it is certainly a mistake for an application to rely on whitespace in attributes being preserved. If you need whitespace to be preserved, use element content, not attribute content.
PS, it seems as if PrettyPrinter, at least by default, normalizes whitespace in element content as well:
scala> <a b=" I love space "><c d=" Me to "> It's lonely here </c></a>
res0: scala.xml.Elem = <a b=" I love space "><c d=" Me to "> It's lonely here </c></a>
scala> new xml.PrettyPrinter(200, 2)
res1: scala.xml.PrettyPrinter = scala.xml.PrettyPrinter@197d09f
scala> res1.format(res0)
res2: String =
<a b=" I love space ">
<c d=" Me to "> It's lonely here </c>
</a>
The lesson (for me) is to stay away from PrettyPrinter.
Kind regards,
/nikolaj