
Need some feedback about PrettyPrinter, and how it should work

Anthony B. Coates
Hi everyone. I've been looking through the code of scala.xml.PrettyPrinter in the Scala trunk. It depends on scala.xml.Utility#toXML, so the two go hand-in-hand.

Right at the outset, I want to ask who has a strong dependency on the PrettyPrinter keeping its current behaviour and API. It's important to know whether, subject to what this group thinks, we can change the API for 2.8.

In the current implementation, instantiating a PrettyPrinter requires an integer width and an integer step (indent). The 'format' methods call Utility#toXML, which has already been enhanced in 2.8 with the new parameters 'decodeEntities', 'preserveWhitespace' and 'minimizeTags'. None of these can be set directly when using PrettyPrinter in its current form.
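For anyone who hasn't used it, here is a minimal sketch of the current usage. The object name and the sample XML are only illustrative, and on Scala 2.11 and later scala.xml ships as the separate scala-xml module, so that needs to be on the classpath.

```scala
import scala.xml.{Node, PrettyPrinter}

object PrettyPrinterUsage {
  // Both constructor arguments are mandatory: the maximum line width
  // and the indent step. There is no way to pass toXML's options here.
  val printer = new PrettyPrinter(80, 2)

  // format(node) returns the pretty-printed markup as a String.
  def render(node: Node): String = printer.format(node)

  def main(args: Array[String]): Unit = {
    // Mixed content with an inline <b> and an empty <br/> element.
    val doc = <div><p>Some <b>bold</b> text<br/></p></div>
    println(render(doc))
  }
}
```

I've left the expected output out on purpose, since the exact layout depends on the Scala version.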

I've written a new 'partest' test for PrettyPrinter (well, for a slightly enhanced experimental version of PrettyPrinter), and in writing it I found that:
  • text isn't wrapped at all, so the 'width' setting is misleading: it applies to markup but not to content;
  • inline elements like a <b> in an XHTML document are indented, which is often not what you want with such mixed content;
  • with the 'toXML' method, when the parameter 'minimizeTags' is 'true', it creates XHTML-style empty tags with an extra space, e.g. '<br />', rather than normal XML empty tags without that space, e.g. '<br/>'. This is likely to annoy people who aren't working with XHTML.
Here are some thoughts on what one could do for PrettyPrinter in 2.8, if there isn't a strong backwards compatibility requirement for this and for Utility#toXML:
  • change the PrettyPrinter constructor so you don't need to specify a width if you don't care about a particular width. Myself, I often only care about the indent and the fact that the XML is line-wrapped; I often don't care about the actual width;
  • if a width is specified, make sure it is applied to everything, including text;
  • move options, e.g. 'decodeEntities', 'preserveWhitespace' and 'minimizeTags', into a separate XMLFormatOptions class, rather than having a long list of parameters;
  • have a parameter to control whether an empty element should precede the "/>" with a space (for XHTML) or not;
  • allow elements in mixed content to remain inline without being specifically indented (but wrapped with the text as part of the text);
  • also provide a way to write the XML on a single line with minimal whitespace (or should 'toXML' do that as its default behaviour?).
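To make the options idea concrete, here is a rough sketch of what such a class might look like. Nothing here exists in scala.xml: apart from the three option names taken from toXML, every field name, default value and the 'emptyTag' helper is hypothetical and only meant to help the discussion.

```scala
// Hypothetical sketch only; no such class exists in scala.xml.
final case class XMLFormatOptions(
  width: Option[Int] = None,              // None: wrap, but no particular width
  indent: Int = 2,                        // indent step per nesting level
  decodeEntities: Boolean = true,         // assumed default
  preserveWhitespace: Boolean = false,    // assumed default
  minimizeTags: Boolean = false,          // assumed default
  spaceBeforeEmptyClose: Boolean = false, // true gives XHTML-style "<br />"
  inlineMixedContent: Boolean = true,     // keep e.g. <b> inline with its text
  singleLine: Boolean = false             // one line, minimal whitespace
)

object XMLFormatOptions {
  val default: XMLFormatOptions = XMLFormatOptions()

  // A preset for XHTML output: minimized tags with the extra space.
  val xhtml: XMLFormatOptions =
    default.copy(minimizeTags = true, spaceBeforeEmptyClose = true)

  // Illustrative helper: how an empty element would be written
  // under the given options.
  def emptyTag(label: String, opts: XMLFormatOptions): String =
    if (!opts.minimizeTags) s"<$label></$label>"
    else if (opts.spaceBeforeEmptyClose) s"<$label />"
    else s"<$label/>"
}
```

With named defaults, a caller who only cares about the indent could write something like XMLFormatOptions(indent = 4) and leave the width unspecified.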
How does that sound?  Anything else?  Is this what people would like?  Again, how big an issue is backwards compatibility of pretty printing with 2.7.x?  Please say, so we can decide what kind of proposal should be made for modifying the PrettyPrinter for 2.8.
Thanks a lot in advance,
Cheers, Tony.
--
Anthony B. Coates
Director and CTO
Londata Ltd
abcoates@londata.com
UK: +44 (20) 8816 7700, US: +1 (239) 344 7700
Mobile/Cell: +44 (79) 0543 9026
Skype: abcoates
Data standards participant: genericode, ISO 20022 (ISO 15022 XML), UN/CEFACT, MDDL, FpML, UBL.
http://www.londata.com/
