
Speeding up the compiler

2 replies
odersky

A happy new year 2010 to everyone!

I used some of the free time over the holidays to instrument the
compiler for better performance analysis. I have tried profilers
(JProfiler and YourKit) in the past, but somehow they were never able
to give me the information I needed. Maybe things have improved by
now; I have not tried them in the last year.

I concentrated initially on the typechecker. Here are some of the
things I found:

For a medium-sized project (~80 source files, ~4,000 lines):

tree nodes created by the parser: ~25,000
once the type checker is done: ~60,000, of which ~30,000 are retained
in the produced tree
once cleanup is done: ~150,000, of which ~60,000 are retained in the
produced tree.

So trees grow by about 140% from parsing to cleanup, and every tree
node is rewritten more than once on average.

raw types created: ~1,200,000 (yes, one point two million)
unique types created: ~60,000 (so hash consing types is quite effective)
time spent in implicit search relative to all of type checking: 50-65%
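The payoff of hash consing comes precisely from that roughly 20:1 ratio of raw to unique types: every freshly built type is looked up in a table of canonical instances and, if a structurally equal one already exists, the cached instance is reused. A toy sketch of the idea (Ty, unique, and the counters are made-up names for illustration, not the compiler's actual internals):

```scala
// Toy illustration of hash consing: structurally equal types are
// collapsed to a single shared instance via a lookup table.
import scala.collection.mutable

case class Ty(name: String, args: List[Ty])

object Ty {
  private val uniques = mutable.HashMap.empty[Ty, Ty]
  var rawCreated = 0 // counts every construction request ("raw types")

  // Return the canonical instance for a structurally equal type.
  def unique(name: String, args: List[Ty] = Nil): Ty = {
    rawCreated += 1
    val t = Ty(name, args)
    uniques.getOrElseUpdate(t, t)
  }

  def uniqueCount: Int = uniques.size
}
```

Asking for Int twice creates two raw types but only one unique instance, so structural equality checks can become cheap reference comparisons.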

For the compiler itself, figures were more than 10 times higher, but
roughly in the same proportion.

I then narrowed down one particularly wasteful aspect of implicit
search: it was in the parts method of Implicits.scala,
which computes all the parts of a type whose companion objects are
searched for implicits. This despite the fact that the parts are
already cached. Some optimizations here and in other spots knocked
off about half of the implicit search time.
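I have not seen the actual fix, but the general shape of caching such a computation is simple: memoize the expensive traversal per type, so that repeated implicit searches over the same type pay the cost only once. A hedged sketch with made-up names (Tpe, PartsCache, partsOf; the real code in Implicits.scala is of course different):

```scala
import scala.collection.mutable

// Toy stand-in for a compiler type: a name plus type arguments.
case class Tpe(name: String, args: List[Tpe] = Nil)

object PartsCache {
  private val cache = mutable.HashMap.empty[Tpe, Set[Tpe]]
  var misses = 0 // how often the expensive traversal actually ran

  // All "parts" of a type: the type itself plus, recursively, the
  // parts of its type arguments. (In implicit search, the companion
  // objects of these parts are the places to be searched.)
  def partsOf(tp: Tpe): Set[Tpe] =
    cache.get(tp) match {
      case Some(ps) => ps
      case None =>
        misses += 1
        val ps = tp.args.flatMap(partsOf).toSet + tp
        cache(tp) = ps
        ps
    }
}
```

Computing the parts of List[Int] a second time is then a single hash lookup rather than a fresh traversal.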

That was quite encouraging, so I'd like to continue with this for a while
to see how far we can push things. A faster compiler helps build
times, and a faster type checker in particular also makes IDEs more
responsive, so it's a double win.

If you want to have a peek yourself at how scalac performs on your
code base, use the option -Ystatistics. If you find anything that
differs significantly from the data I showed, I'd like to know. I
attach the complete output of my test project to this mail (after the
optimizations to implicits). The project, which was done by my son
Jakob, can be found on SourceForge under the name "simplemechanics",
in case you want to reproduce the data.

Cheers

Viktor Klang
Re: Speeding up the compiler
Thankyouthankyouthankyouthankyou

On Tue, Jan 5, 2010 at 12:55 AM, martin odersky <martin.odersky@epfl.ch> wrote:

Miguel Garcia
Re: Speeding up the compiler

Hi,

I confess to not having run -Ystatistics (not even once), nor to
having looked at the caching techniques used during typing. Instead,
I've focused on the AST transformation phases (superaccessors through
cleanup), paying attention to the data and control dependencies
between them, so as to detect opportunities for task-style
parallelism.

Most (yes, most) of the transformations that each of the phases from
superaccessors through cleanup performs, one after the other, one
compilation unit after the other, actually operate on their own
portion of an AST. For example, the effects of a single tailcalls
rewriting are limited to a single method. In principle, all those
rewritings could be performed in parallel (the other method
definitions need not know how a particular method is rewritten). I'm
not saying the current implementation of tailcalls is thread-safe (I
don't know). What I'm saying is that if the phases were structured
following the architecture for task-level parallelism, then most
worries about compilation speed would vanish.

Summing up, the proposed "architecture for task-level parallelism"
involves: all collected ASTs constitute the shared mutable state, and
each transformation is realized by a component spawning tasks,
"component" meaning that all of the mutable state required for its
operation is kept stack-local or thread-confined.
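I don't claim to know how hard this would be in the actual compiler, but the shape of the proposal can be sketched with plain Futures from the standard library. MethodDef, CompilationUnit, and rewriteMethod below are toy stand-ins (not real compiler classes), with a pure per-method rewrite playing the role of something like the tailcalls transform:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Toy AST: a compilation unit is just a list of method definitions.
case class MethodDef(name: String, body: String)
case class CompilationUnit(methods: List[MethodDef])

// A per-method rewrite (stand-in for e.g. a tailcalls transform).
// It reads and produces only its own method, so each task's mutable
// state is stack-local: thread confinement by construction.
def rewriteMethod(m: MethodDef): MethodDef =
  m.copy(body = m.body + " // rewritten")

// Spawn one task per method; the shared AST is only read, each new
// method is built locally, and the results are reassembled at the end.
def transformInParallel(unit: CompilationUnit): CompilationUnit = {
  val tasks = unit.methods.map(m => Future(rewriteMethod(m)))
  CompilationUnit(Await.result(Future.sequence(tasks), Duration.Inf))
}
```

Because Future.sequence preserves order, the transformed unit has the same method layout as the input; only the bodies change, independently and in parallel.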

The programming techniques above (which have to be applied manually)
are adopted as the "native" programming model by X10 and Fortress
(trees instead of lists, semi-balanced trees, hierarchical places),
only there for numerical computing. But the lessons are the same for
AST processing.

Miguel
http://www.sts.tu-harburg.de/people/mi.garcia

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland