canonical AST representation

7 replies

Tue, 2009-06-09, 14:54

extempore

Joined: 2008-12-17,

Since it'll labor in obscurity as a trac comment, I duplicate to
internals and I solicit your further examples of usefulness so as to
sharply unbalance any scales which might presently merely be teetering.

http://lampsvn.epfl.ch/trac/scala/ticket/1980#comment:4

Funnily enough, on Saturday I briefly advocated to Martin that the
parser does too much and that I would be very keen on our separating
some of the fancier desugaring that's done on the fly from the pure
parsing. The context for that discussion was that we are losing
parentheses too early to disambiguate foo(x = 5) from foo((x = 5)), but
this issue is also a direct result of it.
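
To make the parentheses point concrete, here is a minimal sketch. It is not scalac's tree API: every name in it (`ParenSketch`, `Parens`, `stripParens`, `classify`) is invented for illustration, and it assumes that foo(x = 5) is meant as a named argument while foo((x = 5)) passes the result of an assignment. With an explicit `Parens` node the two calls stay distinguishable; once parentheses are dropped eagerly they collapse into the same tree.

```scala
object ParenSketch {
  // Hypothetical tree shape, not scalac's actual classes.
  sealed trait Tree
  case class Ident(name: String) extends Tree
  case class Lit(value: Int) extends Tree
  case class Assign(lhs: Tree, rhs: Tree) extends Tree
  case class Parens(inner: Tree) extends Tree
  case class Apply(fun: Tree, args: List[Tree]) extends Tree

  // Models a parser that throws parentheses away while building the tree.
  def stripParens(t: Tree): Tree = t match {
    case Parens(inner) => stripParens(inner)
    case Assign(l, r)  => Assign(stripParens(l), stripParens(r))
    case Apply(f, as)  => Apply(stripParens(f), as.map(stripParens))
    case other         => other
  }

  sealed trait ArgKind
  case object NamedArg extends ArgKind
  case object Positional extends ArgKind

  // A bare `name = expr` argument reads as a named argument;
  // anything else, including a parenthesized assignment, is positional.
  def classify(arg: Tree): ArgKind = arg match {
    case Assign(Ident(_), _) => NamedArg
    case _                   => Positional
  }

  // foo(x = 5)
  val named = Apply(Ident("foo"), List(Assign(Ident("x"), Lit(5))))
  // foo((x = 5))
  val assigned = Apply(Ident("foo"), List(Parens(Assign(Ident("x"), Lit(5)))))
}
```

Classifying the argument of `named` yields `NamedArg` and that of `assigned` yields `Positional`, but after `stripParens` both trees compare equal, which is the information loss described above.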

Yet another justification for it came up Sunday when brainstorming about
the viability of Scala->JavaScript translation. We decided the most
viable approach was likely to involve using scalac on the original
source and running it at least past the typer phase, then doing a
syntax-based translation from the original parse tree which essentially
abandoned all the types. However to do that we need an "original" parse
tree, something not presently offered by scalac!

And the Eclipse and other IDE people would gain much from this as well.

And all forms of translation (I would be tempted to pick up scalify
again and perform a direct AST->AST translation). A nice consistent
Scala pretty printer could be written to go AST->source.
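
As a rough illustration of the AST->source direction, the sketch below prints a tiny tree back to text. The node types and the `show` function are hypothetical and cover only a handful of constructs. The point is that printing becomes a plain structural recursion once the tree records what was written, parentheses included.

```scala
object PrettySketch {
  // Hypothetical node types; a real printer would cover the whole language.
  sealed trait Tree
  case class Ident(name: String) extends Tree
  case class Lit(value: Int) extends Tree
  case class Infix(lhs: Tree, op: String, rhs: Tree) extends Tree
  case class Parens(inner: Tree) extends Tree
  case class Apply(fun: Tree, args: List[Tree]) extends Tree

  // Prints each node exactly as recorded. Parentheses appear only where
  // the tree contains a Parens node, so nothing has to be guessed.
  def show(t: Tree): String = t match {
    case Ident(n)        => n
    case Lit(v)          => v.toString
    case Infix(l, op, r) => show(l) + " " + op + " " + show(r)
    case Parens(inner)   => "(" + show(inner) + ")"
    case Apply(f, args)  => show(f) + args.map(show).mkString("(", ", ", ")")
  }
}
```

For example, the tree for a call to `foo` whose single argument is `(a + 1) * b` prints back as `foo((a + 1) * b)`.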

The downside would be the creation of some additional AST nodes which
would quickly be eliminated, and a (very minor I believe) performance
penalty. Defining a canonical AST representation for a given source
representation has many many upsides which I think dwarf the downside.

Tue, 2009-06-09, 15:17

Adriaan Moors

Joined: 2009-04-03,

Re: canonical AST representation

+1
I think a precise syntax tree would be great to have. (I'm not sure about the specifics, but maybe we want a ConcreteST, which is turned into an AST with phases such as infer semicolons/resolve fixity&associativity/add parens&dots/determine variable binding/... ? Then again, this will probably be too inefficient. Dreaming some more, maybe we can express these transforms using combinators and fuse them into a single one, while using a zipper-like structure to allow undoing some of the transformations when we need the original source.)
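
One way to picture a single step of such a ConcreteST-to-AST pipeline is sketched below, under heavy simplifying assumptions: only identifiers and flat infix sequences, two precedence levels chosen by the operator's first character, and every operator treated as left-associative. All names (`PhaseSketch`, `CInfixSeq`, `desugar`, `climb`) are invented for the example. The concrete tree keeps operands and operators as written, and a separate pass resolves precedence and adds the dots and parentheses, turning `a + b * c` into `a.+(b.*(c))`.

```scala
object PhaseSketch {
  // Concrete syntax: operands and operators in source order, unresolved.
  sealed trait Concrete
  case class CIdent(name: String) extends Concrete
  case class CInfixSeq(first: Concrete, rest: List[(String, Concrete)]) extends Concrete

  // Abstract syntax: `a op b` has become the method call `a.op(b)`.
  sealed trait Abstract
  case class Ident(name: String) extends Abstract
  case class Select(qual: Abstract, name: String) extends Abstract
  case class Apply(fun: Abstract, args: List[Abstract]) extends Abstract

  // Simplified two-level table; assumes operator names are non-empty.
  def precedence(op: String): Int = op.head match {
    case '*' | '/' | '%' => 2
    case '+' | '-'       => 1
    case _               => 0
  }

  def desugar(c: Concrete): Abstract = c match {
    case CIdent(n)              => Ident(n)
    case CInfixSeq(first, rest) => climb(desugar(first), rest, 0)._1
  }

  // Precedence climbing: consumes operators of at least minPrec and
  // returns the tree built so far plus the operators left unconsumed.
  private def climb(
      lhs: Abstract,
      rest: List[(String, Concrete)],
      minPrec: Int
  ): (Abstract, List[(String, Concrete)]) = rest match {
    case (op, operand) :: tail if precedence(op) >= minPrec =>
      val (rhs, remaining) = climb(desugar(operand), tail, precedence(op) + 1)
      climb(Apply(Select(lhs, op), List(rhs)), remaining, minPrec)
    case _ => (lhs, rest)
  }
}
```

Because the concrete tree is left untouched by `desugar`, a tool that needs the original source shape can still consult it after the pass has run.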

Tue, 2009-06-09, 15:27

Randall R Schulz

Joined: 2008-12-16,

Re: canonical AST representation

On Tuesday June 9 2009, Adriaan Moors wrote:
> +1
> I think a precise syntax tree would be great to have. (I'm not sure
> about the specifics, but maybe we want a ConcreteST, which is turned
> into an AST with phases such as infer semicolons/resolve
> fixity&associativity/add parens&dots/determine variable binding/...
> ...

An AST would be nice and useful for various purposes. But a parse tree?
Yuck!

Being able to, say, apply Stratego or Kiama or something like them to a
Scala program's AST would be very useful, I think.

Randall Schulz

Tue, 2009-06-09, 19:37

Jorge Ortiz

Joined: 2008-12-16,

Re: canonical AST representation

Do want.

Something like this could also be useful for an eventual revival of scala.reflect.Code. (A boy can dream...)

--j

Tue, 2009-06-09, 19:47

David Pollak

Joined: 2008-12-16,

Re: canonical AST representation

On Tue, Jun 9, 2009 at 6:54 AM, Paul Phillips <paulp@improving.org> wrote:

> Yet another justification for it came up Sunday when brainstorming about
> the viability of scala->javascript translation.

This is weird... I've been noodling about Scala -> JS xlation as well... was it something in the air at the Scala Lift Off?

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Tue, 2009-06-09, 19:57

geoff

Joined: 2008-08-20,

Re: canonical AST representation

I think this could be useful for some of the things that I have in mind,
especially the possibility of seeing non-desugared for expressions.
Wouldn't this also make it easier to, at some point in the future,
replace the parser with one based on combinators?
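
A sketch of what keeping for expressions undesugared could look like, assuming a much-reduced form of the rewrite (simple named generators only, no guards, no patterns, `yield` only). The node names are hypothetical. The parser would produce a `ForYield` node, and a later pass would turn it into `map`/`flatMap` calls.

```scala
object ForSketch {
  // Hypothetical tree shape; ForYield is the node a parser would keep.
  sealed trait Tree
  case class Ident(name: String) extends Tree
  case class Select(qual: Tree, name: String) extends Tree
  case class Apply(fun: Tree, args: List[Tree]) extends Tree
  case class Lambda(param: String, body: Tree) extends Tree
  case class Generator(name: String, rhs: Tree)
  case class ForYield(gens: List[Generator], body: Tree) extends Tree

  // Separate desugaring pass: the last generator becomes map,
  // every earlier one becomes flatMap around the rest.
  def desugar(t: Tree): Tree = t match {
    case ForYield(Generator(x, rhs) :: Nil, body) =>
      Apply(Select(desugar(rhs), "map"), List(Lambda(x, desugar(body))))
    case ForYield(Generator(x, rhs) :: more, body) =>
      Apply(Select(desugar(rhs), "flatMap"),
        List(Lambda(x, desugar(ForYield(more, body)))))
    case ForYield(Nil, body) => desugar(body) // not valid source; kept for totality
    case Select(q, n)        => Select(desugar(q), n)
    case Apply(f, as)        => Apply(desugar(f), as.map(desugar))
    case Lambda(p, b)        => Lambda(p, desugar(b))
    case id: Ident           => id
  }
}
```

A tool that wants the original `for` sees the `ForYield` node; the rest of the compiler sees only the result of `desugar`.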

Wed, 2009-06-10, 10:27

odersky

Joined: 2008-07-29,

Re: canonical AST representation

I think one could envision a separate parse tree + desugaring phase,
but there are constraints:

First, performance: Parsing speed does not matter much for builds, but
it will matter for the IDEs. Essentially, a unit is parsed and a tree
is built on every keystroke. So parsing needs to be very fast (and it
can be fast enough if we optimize it).

Second, type and symbol attributes. There is no hope of combining a
precise parse tree with those attributes. Several parts of Scala are
specified as ``first transform then typecheck''. So it will have to be
either unattributed parse tree or attributed transformed AST. I am not
sure what the utility of an unattributed parse tree would be.

Cheers

Wed, 2009-06-10, 17:17

Blair Zajac

Joined: 2009-01-12,

Re: canonical AST representation

On Jun 10, 2009, at 2:20 AM, martin odersky wrote:

> I think one could envision a separate parse tree + desugaring phase,
> but there are constraints:
>
> First, performance: Parsing speed does not matter much for builds, but
> it will matter for the IDEs. Essentially, a unit is parsed and a tree
> is built on every keystroke. So parsing needs to be very fast (and it
> can be fast enough if we optimize it).

Where is most of the time spent in a compile?

We have an application with 261 Scala source files containing 42,824
lines of code which takes a good amount of time to compile with
-optimize. This is on a 2.16 GHz MacBook Pro running a 32-bit JDK 7
build with Scala 2.7.5:

$ /usr/bin/time ant

main_scala_compile:
[mkdir] Created dir: /Users/blair/Code/foobar.git/vnp-ice-server/build_scala/main
[scalac] Compiling 197 source files to /Users/blair/Code/foobar.git/vnp-ice-server/build_scala/main

test_scala_compile:
[mkdir] Created dir: /Users/blair/Code/foobar.git/vnp-ice-server/build_scala/test
[scalac] Compiling 64 source files to /Users/blair/Code/foobar.git/vnp-ice-server/build_scala/test

BUILD SUCCESSFUL
Total time: 2 minutes 1 second
121.81 real 176.60 user 14.12 sys

I guess this is a case where any optimizations that Scala does for
itself pay off in the compiler.

I would rather see Scala gain new features and capabilities, but with
larger code bases using Scala, at some point, getting some
optimizations in would make the interactive edit, compile, test loop
faster for Emacs and ant users :)

Regards,
Blair
