
shaving nanos by the billion

extempore

I took 30+% off all startup times tonight.

% time rcscalac -J-d32 -d /tmp/out /scala/trunk/src/library/scala/Immutable.scala
real 0m2.515s

% time pscalac -J-d32 -d /tmp/out /scala/trunk/src/library/scala/Immutable.scala
real 0m1.584s

% time rcscala -J-d32 -nc -e '5+5'
real 0m3.161s

% time pscala -J-d32 -nc -e '5+5'
real 0m2.278s

Hey scala fans, these kinds of gains have been dangling from the fruit tree for years, waiting for the arrival of someone with a clue how to wield a profiler. No such luck, and eventually a blind squirrel found his way to one of the nuts. If you were holding back because you thought all the easy coconuts had already been picked, now you know better. I bet this was the first nut someone who knew something about profiling would have gone for. I'm sure I'll be hammering on it until only brown powder and coconut residue remain.

http://lampsvn.epfl.ch/trac/scala/changeset/24909

Well I missed 2.9, but now we all have an advance reason to move to 2.9.1.
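For reference, the `real` times above can be turned into percentage reductions. This is only a small arithmetic sketch; the object and method names are hypothetical and not part of the changeset.

```scala
object StartupGain {
  // Reduction in wall-clock time, expressed as a percentage of the "before" time.
  def reductionPercent(before: Double, after: Double): Double =
    (before - after) / before * 100.0

  def main(args: Array[String]): Unit = {
    // "real" times in seconds, copied from the timings in the post.
    val scalac = reductionPercent(2.515, 1.584) // rcscalac vs pscalac
    val scala  = reductionPercent(3.161, 2.278) // rcscala vs pscala
    println(f"scalac:   $scalac%.1f%% less wall-clock time")
    println(f"scala -e: $scala%.1f%% less wall-clock time")
  }
}
```

By this measure the compiler run drops by roughly 37% and the `scala -e` run by roughly 28%.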

Sebastien Bocq
Re: shaving nanos by the billion

2011/5/8 Paul Phillips <paulp@improving.org>
> I took 30+% off all startup times tonight.
>
> [...]


Nice! Was there really a significant overhead for drop and dropRight compared to calling substring directly? If I look at how StringOps is implemented (2.8) there are only a few levels of indirection between the two.

Thanks,
Sébastien
fanf
Re: shaving nanos by the billion

On 08/05/2011 09:13, Paul Phillips wrote:
> I took 30+% off all startup times tonight.
>
> [...]

Wow, that's just amazing! Congrats.

> http://lampsvn.epfl.ch/trac/scala/changeset/24909
>
> Well I missed 2.9, but now we all have an advance reason to move to 2.9.1.

That's too bad, but you're right, now we will all be waiting eagerly
for 2.9.1.

(that's just impressive...)

Ruediger Keller 2
Re: shaving nanos by the billion

Paul, very nice!

I thought you had written before that you were using a profiler to
find spots for optimization. But perhaps then you were profiling the
compiler, instead of the "startup infrastructure". Just wondering.

Regarding your changeset's comment, it makes me want to have more
optimizations in Scala. ;-)

> Frequent guest stars in the parade of slowness were: using Lists and ListBuffers
> when any amount of random access is needed,

Ok, not much that can be done about this besides changing the code.
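To illustrate the random-access point, here is a minimal sketch (the names are hypothetical and this is not code from the changeset). Indexing into a `List` walks the cells from the head each time, so an index-based loop over a `List` is quadratic, while the same loop over a `Vector` or `Array` is not.

```scala
object RandomAccess {
  // Sums the elements by index. On a List, each xs(i) traverses i cells,
  // so the whole loop does O(n^2) work; on a Vector, indexing is effectively constant time.
  def sumByIndex(xs: Seq[Int]): Long = {
    val n = xs.length // computed once, since length is itself linear on a List
    var total = 0L
    var i = 0
    while (i < n) {
      total += xs(i)
      i += 1
    }
    total
  }

  // Crude wall-clock timing helper; good enough to see the trend, not a benchmark.
  def timeMillis[A](body: => A): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val n = 20000
    val asList = List.range(0, n)
    val asVector = Vector.range(0, n)
    println("List:   " + timeMillis(sumByIndex(asList)) + " ms")
    println("Vector: " + timeMillis(sumByIndex(asVector)) + " ms")
  }
}
```

The absolute numbers will vary by machine and JVM warm-up; the point is only that the `List` time grows much faster as `n` increases.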

> using Strings as if one shouldn't have to supply 80 characters of .substring
> noise to drop a character here and there, imagining that code can be reused in any way
> shape or form without a savage slowness burn being unleashed upon you
> and everything you have ever loved,

Hmm, perhaps some String specific optimizations to drop and take, etc.
would help, but I see you did just that by optimizing the underlying
slice. What remains is optimizing away the implicit conversion, I
guess.

> String.format,

Hmm, probably not the fastest method to create a String.

> methods which return tuples,

I remember you writing a patch for optimizing this a lot, also
removing the secretly created tuple field when used in a constructor.
AFAIR Martin was against it, so it never went in. Perhaps someone
should convince Martin, that we need this. ;-)

> and any method written with appealing scala features which
> turns out to be called a few orders of magnitude more often than the
> author probably supposed.

I have a feeling that implicit conversions (and the allocations they
cause) might have a rather big part in this. I remember someone
posting an analysis of the optimizer and its (rather severe)
shortcomings. If I remember correctly implicit conversions are
(almost?) never optimized. Is there currently anyone maintaining the
optimizer? Any chances of some enhancements coming in after 2.9?

Btw. up to now I thought the Scala team was already profiling the
compiler and time-critical infrastructure, but if you are not, I can
take a stab at it. Perhaps I can find something...

Ah, and also when I began rewriting the Scaladoc HTML generator, I
noticed that it uses ++ for String concatenation. That seems to be a
whole lot slower than plain String concatenation with +.
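For anyone who wants to check this on their own Scala version, here is a minimal sketch of the three ways of joining strings being compared (hypothetical names, not the Scaladoc code). `++` goes through the collection operations on strings, `+` is plain Java string concatenation, and a `StringBuilder` avoids the intermediate strings altogether. How large the gap is depends on the Scala version.

```scala
object ConcatSketch {
  // Joins via ++, which is routed through the String collection wrapper.
  def viaPlusPlus(parts: Seq[String]): String =
    parts.foldLeft("")((acc, p) => acc ++ p)

  // Joins via +, i.e. ordinary java.lang.String concatenation.
  def viaPlus(parts: Seq[String]): String =
    parts.foldLeft("")((acc, p) => acc + p)

  // Joins via a single StringBuilder, creating no intermediate strings.
  def viaBuilder(parts: Seq[String]): String = {
    val sb = new java.lang.StringBuilder
    parts.foreach(p => sb.append(p))
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val parts = List("<div>", "hello", "</div>")
    // All three produce the same result; only the cost differs.
    println(viaPlusPlus(parts) == viaPlus(parts) && viaPlus(parts) == viaBuilder(parts))
  }
}
```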

Regards,
Rüdiger

PS: Which profiler do you use? I'm using VisualVM at home, because I
don't know a better free alternative. Is there one?

extempore
Re: shaving nanos by the billion

On 5/8/11 1:41 AM, Sébastien Bocq wrote:
> Nice! Was there really a significant overhead for drop and dropRight
> compared to calling substring directly?

Anything I touched in that patch is a direct result of repeatedly going
after the biggest lump under the carpet.

> If I look at how StringOps is implemented (2.8) there are only a few
> levels of indirection between the two.

It looks like less than it is. For one thing all the chars are boxed.
Unnecessary boxing overhead was probably the biggest factor overall.
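A rough sketch of the difference, under the assumption that the generic path behaves like any other type-parameterised collection code. This is not the actual StringOps implementation from 2.8 or from the changeset, and all names are hypothetical. When chars flow through a type parameter `A`, each one is boxed to a `java.lang.Character`; `substring` never looks at individual chars at all.

```scala
object DropSketch {
  // Generic path: elements pass through the erased type parameter A,
  // so every Char is handled as a boxed java.lang.Character.
  def genericDrop[A](xs: Seq[A], n: Int): List[A] = {
    val buf = scala.collection.mutable.ListBuffer.empty[A]
    var i = 0
    for (x <- xs) {
      if (i >= n) buf += x
      i += 1
    }
    buf.toList
  }

  // Drops n chars by going String -> boxed chars -> String.
  def dropViaCollections(s: String, n: Int): String =
    genericDrop(s.toList, n).mkString

  // Direct path: one bounds check and a single substring call, no boxing.
  def directDrop(s: String, n: Int): String =
    if (n <= 0) s
    else if (n >= s.length) ""
    else s.substring(n)
}
```

Both versions return the same string, e.g. dropping 2 from `"hello"` gives `"llo"` either way; the generic one simply allocates far more along the way.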

extempore
Re: shaving nanos by the billion

On 5/8/11 1:50 AM, Ruediger Keller wrote:
> I thought you had written before that you were using a profiler to
> find spots for optimization. But perhaps then you were profiling the
> compiler, instead of the "startup infrastructure". Just wondering.

I've profiled plenty - whole days have vanished into its gaping maw. I
just haven't been very good at translating that effort into anything
useful. I didn't set out to optimize startup last night; all I did was
stop trying to profile anything interesting and focus on profiling the
compilation of one program, which was:

trait Immutable

> PS: Which profiler do you use? I'm using VisualVM at home, because I
> don't know a better free alternative. Is there?

Free, no. I mean, I don't know; I've tried them all at various times,
but none of them compared to YourKit. Some people find the NetBeans
profiler useful.
