Empty elements, #1118

4 replies
Alex Cruise

I've been thinking about https://lampsvn.epfl.ch/trac/scala/ticket/1118
tonight.

There are really several related problems here:
- The original "emptiness" of each element that was parsed from XML is lost;
- There's no way to specify, on an element-by-element basis, whether
that element should be serialized as an empty tag or an open/close pair;
- Making an arbitrary choice at serialization time isn't optimal.

Ignoring backward compatibility for a moment, I posit that it might make
sense for Elem.child to be an Either[Option[Boolean],Seq[Node]], where
the Right would be the usual children, but the Left would have semantics
something like the following:

- None [the default] => the element was built synthetically (as opposed
to having been parsed) and the user has expressed no preference as to
empty elements. Use the current behaviour, in which we default to
open/close tags and the serializer can be asked to do things differently
on an ad hoc basis.

- Some(true) => Serialize the element as an empty tag (the short "/>" form),
either by preference or because that's how it was parsed

- Some(false) => Serialize the element as open/close tags
(a start tag immediately followed by its end tag)

The two cases are really mutually exclusive: Either it's not empty, so
there's no point asking how empty elements should be serialized, or it
is, and you don't need children.
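
To make the proposal above concrete, here is a minimal sketch. It deliberately does not use scala.xml: Node, Text, Elem and render are hypothetical stand-ins defined only for this illustration, and escaping and attributes are left out.

```scala
object EitherChildSketch {
  sealed trait Node
  final case class Text(value: String) extends Node

  // Hypothetical Elem: Left carries the empty-element preference,
  // Right carries the usual children. Left(None) is the default.
  final case class Elem(
      label: String,
      child: Either[Option[Boolean], Seq[Node]] = Left(None)
  ) extends Node

  def render(node: Node): String = node match {
    case Text(value) => value // no escaping in this sketch
    // Some(true): the short, empty-tag form was requested or parsed.
    case Elem(label, Left(Some(true))) => s"<$label/>"
    // Some(false), or None (no preference): open/close pair, the current default.
    case Elem(label, Left(_)) => s"<$label></$label>"
    // Non-empty element: the preference does not apply.
    case Elem(label, Right(children)) =>
      s"<$label>${children.map(render).mkString}</$label>"
  }
}
```

With these definitions, render(Elem("br", Left(Some(true)))) gives the short form, while an Elem built without a second argument falls back to the open/close pair.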

Returning to the real world where we like backward compatibility, we
could hide the Either behind a pair of overloaded constructors, although
I'm not sure how well that would work with repeated parameters.
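
A rough sketch of what that pair of overloads might look like, again with hypothetical stand-in types rather than the real scala.xml classes. Treating "no children passed" as Left(None) is my assumption here, not something settled above.

```scala
object OverloadSketch {
  sealed trait Node
  final case class Text(value: String) extends Node

  // A plain class with a private constructor, so the Either stays hidden
  // and the two apply methods below are the only ways to build an Elem.
  final class Elem private (
      val label: String,
      val child: Either[Option[Boolean], Seq[Node]]
  ) extends Node

  object Elem {
    // Existing-style call site: children as repeated parameters.
    // Assumption: zero children means "empty, no preference expressed".
    def apply(label: String, children: Node*): Elem =
      if (children.isEmpty) new Elem(label, Left(None))
      else new Elem(label, Right(children))

    // New overload: an empty element with an explicit preference.
    def apply(label: String, emptyStyle: Option[Boolean]): Elem =
      new Elem(label, Left(emptyStyle))
  }
}
```

In this small case the overloads resolve cleanly, because an Option[Boolean] is never a Node: Elem("p", Text("hi")) picks the first and Elem("br", Some(true)) picks the second. Whether that still holds once prefix, attributes and scope are added to the signature is a separate question.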

It's tempting to say all these extra bits don't belong in every instance
of Elem, but in the absence of type information, and with the desire to
keep things simple for people who don't want to invest in a schema/DTD,
I don't see too many other ways of meeting this request. This approach
does seem somewhat heavyweight; it'd be nice to hear some options on how
else we might tackle it.

Thoughts?

Thanks,

-0xe1a

Anthony B. Coates
Re: Empty elements, #1118

In truth, I don't think there is much of a use case for "read and write
the following document, which sometimes uses empty elements and sometimes
an open/close tag pair with empty content, keeping the same style on
output", well, not unless you are writing (effectively) an XML editor.
Additionally, I rarely see anyone deliberately choosing the full
open/close tag form over the shorter empty element form.

I don't see any problem with having the use of empty elements or not as a
formatter or serializer option, that's not unreasonable, though I would
set the default to be empty elements. I'm less sure of the value of
tracking what syntax was used for an element with no content, since in XML
infoset terms it's the same thing, just a lexical difference, and I wonder
if people would want the overhead.
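
As a sketch of the serializer-option approach (not the actual scala.xml API): the tree below stores nothing about emptiness, and a single flag on a hypothetical serialize function decides the form, defaulting to empty elements. All names are made up for illustration.

```scala
object SerializerOptionSketch {
  sealed trait Node
  final case class Text(value: String) extends Node
  final case class Elem(label: String, children: Seq[Node] = Nil) extends Node

  // minimizeEmpty is a serializer-wide choice; the default is the short form.
  def serialize(node: Node, minimizeEmpty: Boolean = true): String = node match {
    case Text(value) => value // no escaping in this sketch
    case Elem(label, children) if children.isEmpty =>
      if (minimizeEmpty) s"<$label/>" else s"<$label></$label>"
    case Elem(label, children) =>
      children.map(serialize(_, minimizeEmpty)).mkString(s"<$label>", "", s"</$label>")
  }
}
```

The trade-off is visible in the signature: the choice applies to the whole document, so it cannot reproduce input that mixes both forms.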

So, perhaps I should ask, who is actually asking for this, and what do
they need it for?

Cheers, Tony.

In reply to Alex Cruise's message of Tue, 16 Feb 2010 07:20:47 -0000.

Mark Howe
Re: Empty elements, #1118

On Tuesday 16 February 2010 at 08:10 +0000, Anthony B. Coates (Londata)
wrote:

> I don't see any problem with having the use of empty elements or not as a
> formatter or serializer option, that's not unreasonable, though I would
> set the default to be empty elements.

I agree. The main place I've seen this become an issue is when
outputting XHTML, because some browsers object to the way certain
empty elements are written. That sort of problem is probably best dealt
with by an intelligent HTML-aware serialisation option.
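
A minimal sketch of what such an HTML-aware option could look like, using hypothetical stand-in types rather than scala.xml. The voidElements set is an illustrative list of elements that XHTML 1.0 declares as always empty; only those are written in short form, with a space before "/>".

```scala
object XhtmlAwareSketch {
  sealed trait Node
  final case class Text(value: String) extends Node
  final case class Elem(label: String, children: Seq[Node] = Nil) extends Node

  // Illustrative list, not authoritative: elements XHTML 1.0 declares EMPTY.
  val voidElements: Set[String] =
    Set("base", "meta", "link", "hr", "br", "param", "img", "area", "input", "col")

  def toXhtml(node: Node): String = node match {
    case Text(value) => value // no escaping in this sketch
    // Only always-empty elements get the short form.
    case Elem(label, children) if children.isEmpty && voidElements(label) =>
      s"<$label />"
    // Everything else, including an empty script or div, gets an open/close pair.
    case Elem(label, children) =>
      children.map(toXhtml).mkString(s"<$label>", "", s"</$label>")
  }
}
```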

Alex Cruise
Re: Empty elements, #1118

On 2/16/2010 12:10 AM, Anthony B. Coates (Londata) wrote:
> In truth, I don't think there is much of a use case for "read and
> write the following document, which sometimes uses empty elements and
> sometimes an open/close tag pair with empty content, keeping the same
> style on output", well, not unless you are writing (effectively) an
> XML editor. Additionally, I rarely see anyone deliberately choosing
> the full open/close tag form over the shorter empty element form.
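XHTML mandates that some elements must not be in short form, and indeed
Firefox chokes on <script language="text/javascript" src="http://www.scala-lang.org/foo.js"/>,
preferring <script language="text/javascript" src="http://www.scala-lang.org/foo.js"></script>,
as nonsensical as it may seem. No doubt there are other applications that
are unforgiving of this kind of change.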
> I don't see any problem with having the use of empty elements or not
> as a formatter or serializer option, that's not unreasonable, though I
> would set the default to be empty elements. I'm less sure of the
> value of tracking what syntax was used for an element with no content,
> since in XML infoset terms it's the same thing, just a lexical
> difference, and I wonder if people would want the overhead.
The major complaint is that when you parse a document that has some
elements in short form, optionally make some changes, and write it back
out again, all the empty elements are now in open/close form, no matter
what they looked like on input. And, to Mark's suggestion, in fact
there's already scala.xml.Xhtml.toXhtml(Node) that does the right thing
(as long as you know it's there :)

Optimizing for memory efficiency, a nullable Boolean would capture the
three states we need (don't care/default, short, long) but wouldn't
exactly be idiomatic Scala. Option[Boolean] sounds like the right type
to me, although it would consume more memory per element. I like the
Either option (ha) from a type semantics perspective, but it is certainly
easy to argue against on grounds of aesthetics and memory efficiency.
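
For what it's worth, the two three-state encodings convert into each other trivially, as this small sketch shows. The helper names are hypothetical; null stands for "don't care/default", true for short and false for long.

```scala
object TriStateSketch {
  // Nullable java.lang.Boolean -> Option[Boolean]; Option(null) is None.
  def fromNullable(flag: java.lang.Boolean): Option[Boolean] =
    Option(flag).map(_.booleanValue)

  // Option[Boolean] -> nullable java.lang.Boolean.
  def toNullable(style: Option[Boolean]): java.lang.Boolean = style match {
    case Some(b) => java.lang.Boolean.valueOf(b)
    case None    => null
  }

  // The three states named in the message above.
  def describe(style: Option[Boolean]): String = style match {
    case None        => "default"
    case Some(true)  => "short"
    case Some(false) => "long"
  }
}
```

So the choice between them is mostly about what the public API exposes; an implementation could store the nullable form internally and still hand out an Option[Boolean].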

-0xe1a

Anthony B. Coates
Re: Empty elements, #1118

OK, fair point. There is clearly a use case there for any application
which is a 'filter' for XHTML files (in the sense that 'sed' is an editing
filter). It may be safer to re-create the original use of empty elements
vs. open/close tags with empty content, rather than trying to apply an
'XHTML normalisation' that might be effective for some browsers but not
for others.

It does kind of raise the question of whether we need to support having
a space or not before the "/>" in an empty element. I know some browsers
used to be sensitive to that, I've no idea if that is still an issue in
practice or not.

Cheers, Tony.

In reply to Alex Cruise's message of Wed, 17 Feb 2010 05:02:25 -0000.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland