2.9 performance regression ?

43 replies
hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.

With this code:

final class Atom {
final int v;

public int square() { return v*v; }
}

Scala 2.7.7 and 2.8.1 with -optimise :

public int square();
Code:
0: aload_0
1: getfield #12; //Field v:I
4: aload_0
5: getfield #12; //Field v:I
8: imul
9: ireturn

Scala 2.9.0-1 with -optimise:

public int square();
Code:
0: aload_0
1: invokevirtual #14; //Method v:()I
4: aload_0
5: invokevirtual #14; //Method v:()I
8: imul
9: ireturn

Is this expected or a regression?

cheers,

hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: 2.9 performance regression ?

Harshad wrote:

> With this code:
>
> final class Atom {
> final int v;
>
> public int square() { return v*v; }
> }
>

Oops, that seems to be Java. Sorry, copy paste error. I meant:

final class AtomTest(vIn:Int) {
final val v = vIn

def square = v*v
}

The disassembly that I posted was for the above Scala code.
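
For readers matching the disassembly to the source: a Scala `val` is reached through a generated accessor method of the same name (the `v:()I` method in the 2.9 output), backed by a private field. Below is a minimal sketch of the same class with one extra method, `squareViaLocal`, which is a hypothetical addition and not part of the original post. It reads `v` only once, whichever way the read is compiled.

```scala
final class AtomTest(vIn: Int) {
  final val v = vIn

  // As posted: `v` is mentioned twice, so the source asks for two reads.
  def square: Int = v * v

  // Hypothetical variant: copy `v` into a local first, so it is read only once.
  def squareViaLocal: Int = {
    val x = v
    x * x
  }
}
```

Both methods return the same value; the only point of the variant is to show where the accessor call sits in the source.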

hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: 2.9 performance regression ?

Harshad wrote:

> Scala 2.9.0-1 with -optimise:
>
> public int square();
> Code:
> 0: aload_0
> 1: invokevirtual #14; //Method v:()I
> 4: aload_0
> 5: invokevirtual #14; //Method v:()I
> 8: imul
> 9: ireturn
>
> Is this expected or a regression?
>

No response yet from anyone? Is it considered a trivial regression because
HotSpot JIT is supposed to optimize it away?

If so, then:

1. There are many JVMs now, esp mobile JVMs. Android Dalvik VM

2. Startup times are higher. Even if the JVM could JIT it, it increases the
startup time.

3. This seems like an easy(*) fix.

(*) By easy, I don't profess to know compiler internals, but just that from a
user POV, all info is available at compile time and older versions of the
compiler already did this optimisation.

Thanks,
Harshad

DaveScala
Joined: 2011-03-18,
User offline. Last seen 1 year 21 weeks ago.
Re: 2.9 performance regression ?

Hi Harshad,
a lot of bytecode calls the field via the accessor method instead of reading
the field directly (in places where it could read the field directly, of
course), which I think is an unnecessary slow-down. I don't know the reason;
maybe there is a good one for doing it that way.
The best thing is to report it in Jira as an enhancement (nothing crashes,
and if it is a regression they will reclassify it as a (major) bug
themselves). Then you can follow your issue in the dashboard and answer
additional questions if they have any. I haven't found this issue filed
in Jira yet.
https://issues.scala-lang.org/secure/Dashboard.jspa
Greetings,
Dave

hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: 2.9 performance regression ?

Thanks Dave,

Have filed an issue in Jira:
https://issues.scala-lang.org/browse/SI-4689


Iulian Dragos
Joined: 2008-12-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?


On Sat, Jun 11, 2011 at 7:03 AM, Harshad <harshad.rj@gmail.com> wrote:
Harshad wrote:

> Scala 2.9.0-1 with -optimise:
>
> public int square();
>   Code:
>    0:   aload_0
>    1:   invokevirtual   #14; //Method v:()I
>    4:   aload_0
>    5:   invokevirtual   #14; //Method v:()I
>    8:   imul
>    9:   ireturn
>
> Is this expected or a regression?
>

No response yet from anyone? Is it considered a trivial regression because
HotSpot JIT is supposed to optimize it away?

Partly, yes. In the code you show, there's little incentive to inline those accessors. Short, straight-line methods are not subject to much static optimization, since the JIT will do it anyway, so it's better not to duplicate the effort.
The main issue with static optimization and the JVM is that you don't have a fixed (or known) cost model. You are asserting that inlining these fields yields faster code, but the JVM and the JIT compiler can change the intuitive notion of what is 'fast'. Imagine the JIT compiler has some threshold on the size of methods it considers for inlining, and simple optimizations like the one you propose bump the method size above the limit. You would lose orders of magnitude more from the method not being jitted than you would gain by avoiding one indirection. It's just an example. And yes, you can always find one more example where you could optimize code better. The question is how much you pessimize the rest.
The heuristics for what makes sense to inline and what's better left out are not at all simple. If the compiler inlined every statically resolved method it could, it would run much, much slower for very little gain. Large methods take an indirect hit by making the JIT less likely to compile them, plus the time required to JIT-compile them (startup time).
The current strategy requires the first inline decision to be a net win (basically, a higher-order method). Afterwards, the bar for inlining is lowered, basically trying to inline as much as possible.
I suggest reading Cliff Click's blog post 'Fixing the inlining problem' (it was discussed on this list recently), and then coming up with some good measurements of non-trivial programs. I used the Scala compiler itself as a benchmark, and I noticed that, many times, more inlining hurts overall performance!
http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
Regarding Dalvik, I never heard of any project that does not use ProGuard. ProGuard can do much more, since it sees the whole-program (while the Scala compiler has to assume an open-world) -- for instance, ProGuard 'knows' when a method is called from only one place, and it can inline and /remove/ the definition. So I don't think Scalac can replace ProGuard, nor should it try to.
Lastly, we are trying to improve the optimizer, and Paul and I have spent some time together during ScalaDays and I believe we got a bit farther. So maybe in 2.9 or 2.10 you'll see some fancier optimizations. I hope. :)
iulian 
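
To make the "higher-order method" case concrete, here is a minimal sketch. The names `Loops`, `times`, `sumViaClosure`, and `sumByHand` are invented for illustration and come neither from the compiler nor from this thread. If `times` and its closure were inlined at the call site, the first method would in principle reduce to something like the second, which is the kind of net win described above. Whether a given scalac version actually performs this transformation is not something the sketch demonstrates.

```scala
object Loops {
  // A small higher-order method. Each call to `f` goes through a function
  // object unless an optimizer inlines both `times` and the closure body.
  @inline final def times(n: Int)(f: Int => Unit): Unit = {
    var i = 0
    while (i < n) { f(i); i += 1 }
  }

  // Written with the higher-order method and a closure that captures `s`.
  def sumViaClosure(n: Int): Int = {
    var s = 0
    times(n) { i => s += i }
    s
  }

  // Roughly what the method above looks like after full inlining.
  def sumByHand(n: Int): Int = {
    var s = 0
    var i = 0
    while (i < n) { s += i; i += 1 }
    s
  }
}
```

Both methods compute the same sum; they differ only in how much indirection remains for the JIT to remove.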

--
"I hate the mountains; they hide the landscape."
Alphonse Allais

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?

On Tue, Jun 14, 2011 at 9:38 AM, iulian dragos <jaguarul@gmail.com> wrote:

> The main issue with static optimization and the JVM is that you don't have a fixed (or known) cost model. You are asserting that inlining these fields yields faster code, but the JVM and the JIT compiler can change the intuitive notion of what is 'fast'. Imagine the JIT compiler has some threshold on the size of methods it considers for inlining, and simple optimizations like the one you propose bump the method size above the limit.

While the general point holds, this is a straw man for the _specific_ case of emitting field accesses instead of getter method calls.  Both take the same amount of space.

Field access vs. method access is an example of an optimization which, as far as I know, you should _always_ take, in that there are no examples where method access is faster--unless you can replace both with a constant, which could be faster still.  (I can't envision a case, barring a bug in the JIT or compiler*, where the compiler could not know that a constant replacement was okay, but the JIT would but only if it was a method not a field.)

Likewise, method aliases should _always_ be inlined.  Everything is improved if you do: number of methods, method size, byte code size, uncompiled speed, amount of work the JIT has to do, etc.  Hopefully the JIT can always do all of this itself, but some JITs are resource limited in ways that the compiler is not (at least with -optimise).

  --Rex

* For example, I consider that object C { val x = 5 } produces different code for x than object C { final val x = 5 } a compiler bug; how is a val in an object not final?

Jason Zaugg
Joined: 2009-05-18,
User offline. Last seen 38 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

On Tue, Jun 14, 2011 at 5:20 PM, Rex Kerr wrote:
> [snip]
> (I can't envision a case, barring a bug in the JIT
> or compiler*, where the compiler could not know that a constant replacement
> was okay, but the JIT would but only if it was a method not a field.)

> * For example, I consider that object C { val x = 5 } produces different
> code for x than object C { final val x = 5 } a compiler bug; how is a val in
> an object not final?

That particular difference is as per the spec. The compiler could go
further, but I wouldn't call it a bug.

-jason

Ismael Juma 2
Joined: 2011-01-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?
On Tue, Jun 14, 2011 at 4:42 PM, Jason Zaugg <jzaugg@gmail.com> wrote:
That particular difference is as per the spec.

Yeah. I don't know the reason, but if I had to guess, it would have to do with separate compilation. You may want to evolve your constants without recompiling clients. By adding "final", you make it clear that you don't care about that.
Best,
Ismael

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
On Tue, Jun 14, 2011 at 11:42 AM, Jason Zaugg <jzaugg@gmail.com> wrote:
On Tue, Jun 14, 2011 at 5:20 PM, Rex Kerr <ichoran@gmail.com> wrote:
> [snip]
> (I can't envision a case, barring a bug in the JIT
> or compiler*, where the compiler could not know that a constant replacement
> was okay, but the JIT would but only if it was a method not a field.)

> * For example, I consider that object C { val x = 5 } produces different
> code for x than object C { final val x = 5 } a compiler bug; how is a val in
> an object not final?

That particular difference is as per the spec. The compiler could go
further, but I wouldn't call it a bug.

-jason

Fine, it's a bug in the spec, then.  Scala knows full well that there is no difference, and the JVM cannot know.  It's a bug in that it is emulating the case where the JVM does know (i.e. static members of a class), and yet it fails to produce corresponding bytecode.  "Annotate every last thing in your objects with 'final' just to be safe" is an absurd requirement.  If, as Ismael suggests, it has something to do with recompilation, "final" is the wrong way to deal with it; that's the job of an annotation.

  --Rex

Ismael Juma 2
Joined: 2011-01-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?
On Tue, Jun 14, 2011 at 4:55 PM, Rex Kerr <ichoran@gmail.com> wrote:
Fine, it's a bug in the spec, then.  Scala knows full well that there is no difference, and the JVM cannot know.  It's a bug in that it is emulating the case where the JVM does know (i.e. static members of a class), and yet it fails to produce corresponding bytecode.  "Annotate every last thing in your objects with 'final' just to be safe" is an absurd requirement.  If, as Ismael suggests, it has something to do with recompilation, "final" is the wrong way to deal with it; that's the job of an annotation.

I think your view is clouded by the type of code you write. For a lot of cases, the freedom to change the variables without recompilation is the "safe" case.
I agree that the approach used currently leads to some surprises and hence could be improved, but I'd not describe it as "absurd".
Best,
Ismael

Jason Zaugg
Joined: 2009-05-18,
User offline. Last seen 38 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

On Tue, Jun 14, 2011 at 5:55 PM, Rex Kerr wrote:
> Fine, it's a bug in the spec, then.  Scala knows full well that there is no
> difference, and the JVM cannot know.  It's a bug in that it is emulating the
> case where the JVM does know (i.e. static members of a class), and yet it
> fails to produce corresponding bytecode.  "Annotate every last thing in your
> objects with 'final' just to be safe" is an absurd requirement.  If, as
> Ismael suggests, it has something to do with recompilation, "final" is the
> wrong way to deal with it; that's the job of an annotation.

Don't forget: "... and be sure not to annotate the type".

I don't have a strong opinion on this one; I haven't had to go to that
level of detail performance tuning (yet).

-jason

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: 2.9 performance regression ?
The spec needs a way to say when something is a constant (for instance, because Java annotations accept only constants). In Java, a field is a constant if it is marked final.
We wanted some syntax in Scala to model that case without spending too much language real estate on it (because it is a rather insignificant side issue). That's why the spec defines that

  final val x = <literal>

will turn `x` into a constant. Any differences in code generation are coincidental and are not the primary focus of this part of the spec.

Cheers

 -- Martin
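
A minimal sketch of the distinction discussed here (the object and member names are invented for illustration). Going by the rule above and by the "be sure not to annotate the type" remark elsewhere in the thread, only the `final val` with a literal right-hand side and no type annotation is treated as a constant. The other two forms are ordinary vals, even though all three read the same at the use site.

```scala
object Consts {
  val plain = 5              // ordinary val
  final val folded = 5       // constant: final, no type annotation, literal right-hand side
  final val typed: Int = 5   // final, but the explicit type keeps it an ordinary val
}

object ConstsDemo {
  // All three evaluate to the same value; any difference is only in the
  // generated code, which this sketch does not inspect.
  def sum: Int = Consts.plain + Consts.folded + Consts.typed
}
```

Comparing the `javap -c` output for `ConstsDemo` is one way to check what a particular compiler version emits for each form.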

Sciss
Joined: 2008-12-17,
User offline. Last seen 28 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

from what i understand, i agree with you. i always assumed scalac is doing its best optimizing `defs` and `vals` in an `object` so i don't need to clutter my code with dozens of `final` keywords.

does this mean, the following also doesn't get optimized? is there a difference in byte code between

def method {
  def helper {
    ...
  }
}

and

def method {
  final def helper {
    ...
  }
}

?

or:

object X {
  def apply : X = new Impl

  private class Impl {
    class Helper {
    }
  }
}
trait X

and

object X {
  def apply : X = new Impl

  private class Impl {
    final class Helper {
    }
  }
}
trait X

?

that would be very disappointing...

best, -sciss-


Jason Zaugg
Joined: 2009-05-18,
User offline. Last seen 38 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

On Tue, Jun 14, 2011 at 6:55 PM, Sciss wrote:
> from what i understand, i agree with you. i always assumed scalac is doing its best optimizing `defs` and `vals` in an `object` so i don't need to clutter my code with dozens of `final` keywords.
>
> does this mean, the following also doesn't get optimized? is there a difference in byte code between

:javap in the 2.9 REPL is your friend :)

-jason

Sciss
Joined: 2008-12-17,
User offline. Last seen 28 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

although it's not possible to peek into the private class of an object ... ?

scala> object A { def b: AT = new Impl; private class Impl extends AT }; trait AT
scala> :javap -c A

nada...


ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
You need to use -private to see private objects.  javap -help to see the command-line arguments.

  --Rex



Sciss
Joined: 2008-12-17,
User offline. Last seen 28 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

but that doesn't nest....

scala> trait T
scala> object A { def b: T = new Impl; private class Impl extends T { class Inner }}
scala> object B { def b: T = new Impl; private class Impl extends T { private final class Inner }}

scala> :javap -c -private A
...

scala> :javap -c -private A.Impl
// error


ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
On Tue, Jun 14, 2011 at 12:00 PM, Ismael Juma <ismael@juma.me.uk> wrote:
On Tue, Jun 14, 2011 at 4:55 PM, Rex Kerr <ichoran@gmail.com> wrote:
Fine, it's a bug in the spec, then.  Scala knows full well that there is no difference, and the JVM cannot know.  It's a bug in that it is emulating the case where the JVM does know (i.e. static members of a class), and yet it fails to produce corresponding bytecode.  "Annotate every last thing in your objects with 'final' just to be safe" is an absurd requirement.  If, as Ismael suggests, it has something to do with recompilation, "final" is the wrong way to deal with it; that's the job of an annotation.

I think your view is clouded by the type of code you write. For a lot of cases, the freedom to change the variables without recompilation is the "safe" case.

Point taken, especially if this recompilation-avoidance optimization is often used.  But how do people do this in Java-land, where the normal way of defining constants is "public static final"?  Do you just recompile because javac is almost infinitely fast?
 
I agree that the approach used currently leads to some surprises and hence could be improved, but I'd not describe it as "absurd".

What I was characterizing as "absurd" was one possible (and fairly natural) reaction to noticing that adding "final", which normally means "cannot be overridden", to something in an object, where nothing can be overridden, results in dramatically better performance.  Having the case where apparently functionally identical code--save for a part of the spec that is unnecessary in this situation--yields significantly different results tends to result in superstitious behavior of the type I mentioned (use "final" everywhere).

Having to _only_ put final on vals is merely silly in this case; it just adds to the list of "random quirks I have to remember", the relative lack of which is one of the most refreshing parts of using Scala.

  --Rex

Ismael Juma 2
Joined: 2011-01-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?
On Tue, Jun 14, 2011 at 7:29 PM, Rex Kerr <ichoran@gmail.com> wrote:
Point taken, especially if this recompilation-avoidance optimization is often used.  But how do people do this in Java-land, where the normal way of defining constants is "public static final"?  Do you just recompile because javac is almost infinitely fast?

Actually, it's not uncommon for libraries to do things to avoid this in Java. I've seen that a few times and a quick search shows:
http://stackoverflow.com/questions/4701203/why-to-avoid-constant-folding-in-java-when
Best, Ismael
roland.kuhn
Joined: 2011-02-21,
User offline. Last seen 35 weeks 3 days ago.
Re: Re: 2.9 performance regression ?

This should probably be a FAQ somewhere:

Welcome to Scala version 2.9.0.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_24).
Type in expressions to have them evaluated.
Type :help for more information.

scala> trait T
defined trait T

scala> object A{private class Impl extends T}
defined module A

scala> println(A.getClass)
class $line2.$read$$iw$$iw$A$

scala> :javap -p $line2.$read$$iw$$iw$A$Impl
Compiled from "<console>"
public class A$Impl extends java.lang.Object implements T,scala.ScalaObject{
public A$Impl();
}
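
The trick in the transcript above is to ask the JVM for the real class name and pass that to `:javap`. A minimal sketch of the same idea for the nested private class, assuming the `b` factory method from the earlier example in this thread (the `ShowName` object is made up here):

```scala
trait T

object A {
  def b: T = new Impl
  private class Impl extends T
}

object ShowName {
  // The runtime class name is the form javap expects: nested classes are
  // joined with '$', and the REPL adds its own wrapper prefix in front.
  def implName: String = A.b.getClass.getName
}
```

Printing `ShowName.implName` gives the string to hand to `:javap -p`, so the wrapper prefix does not have to be guessed.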


dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Re: 2.9 performance regression ?

Ask on Stack Overflow and then answer yourself. :-)


Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: Re: 2.9 performance regression ?


On Jun 14, 2011 11:32 PM, "Daniel Sobral" <dcsobral@gmail.com> wrote:
>
> Ask on Stack Overflow and then answer yourself. :-)
>

All the cool kids are doing it...


hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: Re: 2.9 performance regression ?

iulian dragos wrote:

> Lastly, we are trying to improve the optimizer, and Paul and I have spent
> some time together during ScalaDays and I believe we got a bit farther. So
> maybe in 2.9 or 2.10 you'll see some fancier optimizations. I hope. :)

Thanks Iulian. Looking forward to the new optimisations.

> In the code you show, there's little incentive in inlining
> those accessors.

Wouldn't it lead to faster startup times?

> The heuristics for what makes sense to inline and what's better left out
> are not at all simple.

I agree in general, but in the specific case I mentioned, I can't see any
inlining cost.

> Regarding Dalvik, I never heard of any project that does not use ProGuard.

Ah, yes. Though I used ProGuard only for the final release build. Now I know
better :)

thanks,
Harshad

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: 2.9 performance regression ?


On Tue, Jun 14, 2011 at 8:29 PM, Rex Kerr <ichoran@gmail.com> wrote:
On Tue, Jun 14, 2011 at 12:00 PM, Ismael Juma <ismael@juma.me.uk> wrote:
On Tue, Jun 14, 2011 at 4:55 PM, Rex Kerr <ichoran@gmail.com> wrote:
Fine, it's a bug in the spec, then.  Scala knows full well that there is no difference, and the JVM cannot know.  It's a bug in that it is emulating the case where the JVM does know (i.e. static members of a class), and yet it fails to produce corresponding bytecode.  "Annotate every last thing in your objects with 'final' just to be safe" is an absurd requirement.  If, as Ismael suggests, it has something to do with recompilation, "final" is the wrong way to deal with it; that's the job of an annotation.

I think your view is clouded by the type of code you write. For a lot of cases, the freedom to change the variables without recompilation is the "safe" case.

Point taken, especially if this recompilation-avoidance optimization is often used.  But how do people do this in Java-land, where the normal way of defining constants is "public static final"?  Do you just recompile because javac is almost infinitely fast?
 
I agree that the approach used currently leads to some surprises and hence could be improved, but I'd not describe it as "absurd".

What I was characterizing as "absurd" was one possible (and fairly natural) reaction to noticing that adding "final", which normally means "cannot be overridden", to something in an object, where nothing can be overridden, results in dramatically better performance. 

Do you have data on the performance difference?

 -- Martin
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
On Thu, Jun 16, 2011 at 12:51 PM, martin odersky <martin.odersky@epfl.ch> wrote:


On Tue, Jun 14, 2011 at 8:29 PM, Rex Kerr <ichoran@gmail.com> wrote:

What I was characterizing as "absurd" was one possible (and fairly natural) reaction to noticing that adding "final", which normally means "cannot be overridden", to something in an object, where nothing can be overridden, results in dramatically better performance. 

Do you have data on the performance difference?

 -- Martin

For tight loops, I've generally seen around 50%.  For instance, in the Fasta program of the Computer Language Benchmarks Game, one is required to implement the following linear congruential random number generator:

  object Rng {
    val IM = 139968
    val IA = 3877
    val IC = 29573
    val scale = 1023.toDouble/IM
    var last = 42
    def next = { last = (last*IA + IC) % IM; last*scale }
  }

Or you could stick final on all the constants (or turn them into defs, actually, save for scale):

  object Rng2 {
    final val IM = 139968
    final val IA = 3877
    final val IC = 29573
    final val scale = 1023.toDouble/IM
    var last = 42
    def next = { last = (last*IA + IC) % IM; last*scale }
  }

And then if you benchmark it running:

scala> ptime{ var i=0; var s=0d; while(i<100000000) { s+=Rng.next; i+=1 }; s }
Elapsed: 0.815 s
res6: Double = 5.1149569804410095E10

scala> ptime{ var i=0; var s=0d; while(i<100000000) { s+=Rng2.next; i+=1 }; s }
Elapsed: 0.532 s
res8: Double = 5.1149569804410095E10

so that's a 53% slowdown.  (Your return value may vary; these weren't my first runs.)
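
The `ptime` used in these sessions is not defined anywhere in the thread. A minimal stand-in, assuming it does nothing more than evaluate its argument once, print the elapsed wall-clock time, and return the result, could look like this (the `Timing` wrapper object is a made-up name):

```scala
object Timing {
  // Hypothetical stand-in for `ptime`: evaluates `body` exactly once,
  // prints the elapsed wall-clock time in seconds, and returns the result.
  def ptime[A](body: => A): A = {
    val start = System.nanoTime()
    val result = body
    val elapsed = (System.nanoTime() - start) / 1e9
    println("Elapsed: %.3f s".format(elapsed))
    result
  }
}
```

After `import Timing._` the REPL lines above can be pasted unchanged. Single-shot timings like this include JIT warm-up, so it is worth repeating each measurement a few times before comparing.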

If you drop the double multiplication and the modulus:

  object Rng3 {
    val IA = 3877
    val IC = 29573
    var last = 42
    def next = { last = (last*IA + IC); last }
  }

  object Rng4 {
    final val IA = 3877
    final val IC = 29573
    var last = 42
    def next = { last = (last*IA + IC); last }
  }

and increase the number of iterations to make up for it:

scala> ptime{ var i,s=0; while(i<1000000000) { s+=Rng3.next; i+=1 }; s }
Elapsed: 1.140 s
res26: Int = -827043584

scala> ptime{ var i,s=0; while(i<1000000000) { s+=Rng4.next; i+=1 }; s }
Elapsed: 0.797 s
res27: Int = -827043584

we get a 43% slowdown.

Even more peculiar is if we make everything final:

  object Rng5 {
    final val IA = 3877
    final val IC = 29573
    final var last = 42
    final def next = { last = (last*IA + IC); last }
  }

scala> ptime{ var i,s=0; while(i<1000000000) { s+=Rng5.next; i+=1 }; s }
Elapsed: 2.759 s
res42: Int = -586919680

Now we have a 246% slowdown (about 3.5x the time of Rng4)!  (The "final var" is the culprit--somehow the getter or setter isn't being optimized, I think?)

If we write the code in Java:

public class Rng6 {
  public static final int IA = 3877;
  public static final int IC = 29573;
  static int last = 42;
  public static final int next() { last = (last*IA + IC); return last; }
}

then we get:

scala> ptime{ var i,s=0; while(i<1000000000) { s+=Rng6.next; i+=1 }; s }
Elapsed: 0.786 s
res1: Int = -586919680

which is basically the same as the fast Scala result.

  --Rex

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: 2.9 performance regression ?


  object Rng {
    val IM = 139968
    val IA = 3877
    val IC = 29573
    val scale = 1023.toDouble/IM
    var last = 42
    def next = { last = (last*IA + IC) % IM; last*scale }
  }

Have you tried with 1023.0/IM instead? What does that give? I am asking because I would assume that detecting and folding constants is the first thing a good JIT should do. So I do not 

If we write the code in Java:

public class Rng6 {
  public static final int IA = 3877;
  public static final int IC = 29573;
  static int last = 42;
  public static final int next() { last = (last*IA + IC); return last; }
}

then we get:

scala> ptime{ var i,s=0; while(i<1000000000) { s+=Rng6.next; i+=1 }; s }
Elapsed: 0.786 s
res1: Int = -586919680

which is basically the same as the fast Scala result.

  --Rex


Question: What times do you get if you leave out the final in Java?

 -- Martin
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
On Thu, Jun 16, 2011 at 3:48 PM, martin odersky <martin.odersky@epfl.ch> wrote:


  object Rng {
    val IM = 139968
    val IA = 3877
    val IC = 29573
    val scale = 1023.toDouble/IM
    var last = 42
    def next = { last = (last*IA + IC) % IM; last*scale }
  }

Have you tried with 1023.0/IM instead? What does that give?

Same.  It doesn't matter whether it's 1023.0/IM or 0.007308813443072703.  The integer modulus is slower than the floating point multiplication anyway.
 



Question: What times do you get if you leave out the final in Java?

public class Rng7 {
  public static int IA = 3877;
  public static int IC = 29573;
  static int last = 42;
  public static final int next() { last = (last*IA + IC); return last; }
}

scala> ptime{ var i,s=0; while(i<1000000000) { s+=Rng7.next; i+=1 }; s }
Elapsed: 1.133 s
res2: Int = -555200256

Java without final is exactly the same as Scala without final.

Except in Java, "final" means "you can't modify this", which is what "val" means in Scala, especially in an object or final class.

So conceptually, the Java result makes perfect sense (it can't change--why not optimize further!), while the Scala result is nonintuitive at best.

  --Rex

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: 2.9 performance regression ?


On Thu, Jun 16, 2011 at 10:05 PM, Rex Kerr <ichoran@gmail.com> wrote:

Java without final exactly the same as the Scala without final.

Except in Java, "final" means "you can't modify this", which is what "val" means in Scala, especially in an object or final class.

So conceptually, the Java result makes perfect sense (it can't change--why not optimize further!), while the Scala result is nonintuitive at best.

  --Rex

There are two questions here: Is Scala's syntax for constants workable? I think it is what it is, and probably just needs to be documented better. Threads like this one and questions on stackoverflow are a good start.

Then, should the compiler do more aggressive optimizations for non-final vals. I am more than a little skeptical whether it is worth doing. I think it will give meaningful performance gains only in very particular cases, and, furthermore, these cases can be hand-optimized. Put another way: Since this is as much in javac's reach as in scalac's, and javac does not do it, we should think twice whether this is worth doing at all. In any case there are 100's of more rewarding optimizations not yet done.

 -- Martin


ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
On Fri, Jun 17, 2011 at 4:41 AM, martin odersky <martin.odersky@epfl.ch> wrote:

There are two questions here: Is Scala's syntax for constants workable? I think it is what it is, and probably just needs to be documented better. Threads like this one and questions on stackoverflow are a good start.

Fair enough--without -optimise I agree that the way things stand is perfectly fine.
 

Then, should the compiler do more aggressive optimizations for non-final vals. I am more than a little skeptical whether it is worth doing. I think it will give meaningful performance gains only in very particular cases, and, furthermore, these cases can be hand-optimized.

Granted, but there should be a "performance tuning" document that spells these things out, because it's unlikely that one will realize the problem; when I noticed this, I went back and checked all my high performance numeric code and found that I had in "good" programming practice substituted variables for constants in the code.
 
Put another way: Since this is as much in javac's reach as in scalac's, and javac does not do it, we should think twice whether this is worth doing at all.

Huh?  All javac knows, without "final", is that it's got a _mutable_ int sitting there!  javac can't (without effect tracking) do a thing unless you mark it final.  Once you mark it final, it does the optimization.

Scala knows that it is not a mutable int because it's marked "val".  And furthermore it knows that it won't be overridden because it's in an object or final class.

The JVM as far as I know doesn't have the level of effect tracking needed to optimize things _annotated_ as final; there is not any restriction on bytecode that actually prevents code from writing multiple times to a "final" field (as far as I can tell from the spec).  In fact, Scala uses this ability when overriding vals--it doesn't override the method, it instead overwrites the java-private-final value of the field in the constructor.

So the Scala compiler is the only place where the knowledge exists that this is a safe optimization.

You don't even have to give up the I-really-am-a-constant vs. I-am-a-method-masquerading-as-a-constant thing since the JVM is perfectly well able to elide methods that return a constant.  The problem is that Scala actually stores the value in a field, not that it uses a method.  So if you compare

  object Rng3 {
    val IA = 3877
    val IC = 29573
    var last = 42
    def next = { last = (last*IA + IC); last }
  }

  object Rng4 {
    final val IA = 3877
    final val IC = 29573
    var last = 42
    def next = { last = (last*IA + IC); last }
  }
  object Rng8 {
    def IA = 3877
    def IC = 29573
    var last = 42
    def next = { last = (last*IA + IC); last }
  }

you find that the performance is identical for 4 and 8.  In the Rng4 case the constants are pushed into the next method, but in Rng8 the code for next is bytecode-identical to that for Rng3.  The difference is

// Rng3 -- val
public int IA();
  Code:
   0:    aload_0
   1:    getfield    #20; //Field IA:I
   4:    ireturn

// Rng8 -- def
public int IA();
  Code:
   0:    sipush    3877
   3:    ireturn

and in the former case, the JVM is, as far as I know, powerless to elide the field access.
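
To make the three declaration styles compared above concrete, here is a minimal sketch (the object and member names are made up for illustration). One detail that is an assumption on my part rather than something measured in this thread: as far as I read the language specification, only a `final val` with a constant right-hand side and no type annotation counts as a constant value definition, so adding `: Int` should be enough to lose the inlining.

```scala
object Constants {
  // Plain val: stored in a field and read through a getter.
  val viaField = 3877

  // Constant value definition: uses are replaced by the literal at compile time.
  final val inlined = 3877

  // Same value, but the type annotation means this is (per the spec, as I
  // understand it) no longer a constant value definition.
  final val annotated: Int = 3877

  // No backing field at all: the method body just loads the literal.
  def viaMethod = 3877
}
```

All four members evaluate to the same value; the difference only shows up in the bytecode, which can be checked by running `javap -c` on the compiled classes.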
 
In any case there are 100's of more rewarding optimizations not yet done.

I look forward to seeing them.  So far I've been underwhelmed by what -optimise does, but compile and other things on the horizon are potentially very exciting.

I think hundreds might be an exaggeration, though, unless you're talking about library changes.  It's not very often that you can speed up a significant amount of code that is intended to run fast via a relatively simple change that also makes the internal workings more logical conceptually.

  --Rex

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: 2.9 performance regression ?


On Fri, Jun 17, 2011 at 5:24 PM, Rex Kerr <ichoran@gmail.com> wrote:
On Fri, Jun 17, 2011 at 4:41 AM, martin odersky <martin.odersky@epfl.ch> wrote:

There are two questions here: Is Scala's syntax for constants workable? I think it is what it is, and probably just needs to be documented better. Threads like this one and questions on stackoverflow are a good start.

Fair enough--without -optimise I agree that the way things stand are perfectly fine.
 

Then, should the compiler do more aggressive optimizations for non-final vals. I am more than a little skeptical whether it is worth doing. I think it will give meaningful performance gains only in very particular cases, and, furthermore, these cases can be hand-optimized.

Granted, but there should be a "performance tuning" document that spells these things out, because it's unlikely that one will realize the problem; when I noticed this, I went back and checked all my high performance numeric code and found that I had in "good" programming practice substituted variables for constants in the code.
 
Put another way: Since this is as much in javac's reach as in scalac's, and javac does not do it, we should think twice whether this is worth doing at all.

Huh?  All javac knows, without "final", is that it's got a _mutable_ int sitting there!  javac can't (without effect tracking) do a thing unless you mark it final.  Once you mark it final, it does the optimization.

You are right. I had not realized this distinction before. So, there is a gap between what Java can do and what Scala can do. Unfortunately, a Scala optimizer can inline constants only if definition and use appear in the same source file. Not sure whether that special case is worth it. For common benchmarks, yes. But on the other hand, it would be even less comprehensible than the current scheme (scalac inlines constants in the same source file, but for separate source files, I need `final', why?). So I think the best answer is to document performance issues and implications better.

Cheers

 -- Martin

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?


On Fri, Jun 17, 2011 at 3:24 PM, martin odersky <martin.odersky@epfl.ch> wrote:


You are right. I had not realized this distinction before. So, there is a gap between what Java can do and what Scala can do. Unfortunately a Scala optimizer can inline constants only if definition and use appear in the same sourcefile.

If you write the getter as a method that loads the constant, you have the same good performance (as I showed in my previous email) and it can apply anywhere.  So the only change that needs to be made is that vals of constants in objects and final classes are implemented as a method with no backing field--it just loads the constant in the bytecode.

So you don't even need better documentation, just a change in implementation that would be transparent to everyone yet have optimal performance.

  --Rex

hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: Re: 2.9 performance regression ?

Rex Kerr wrote:

> If you write the getter as a method that loads the constant, you have the
> same good performance (as I showed in my previous email) and it can apply
> anywhere. So the only change that needs to be made is that vals of
> constants in objects and final classes are implemented as a method with no
> backing field--it just loads the constant in the bytecode.
>
> So you don't even need better documentation, just a change in
> implementation that would be transparent to everyone yet have optimal
> performance.

Ditto.

I think Martin's confusion is because of my original post which did complain
about non-inlined accessors for final vals. (I still maintain that stance).

Whereas this sub-thread is about automatically promoting vals in final classes
and objects to final vals.

Iulian Dragos
Joined: 2008-12-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?


On Fri, Jun 17, 2011 at 5:24 PM, Rex Kerr <ichoran@gmail.com> wrote:
Scala knows that it is not a mutable int because it's marked "val".  And furthermore it knows that it won't be overridden because it's in an object or final class.

The JVM as far as I know doesn't have the level of effect tracking needed to optimize things _annotated_ as final; there is not any restriction on bytecode that actually prevents code from writing multiple times to a "final" field (as far as I can tell from the spec).  In fact, Scala uses this ability when overriding vals--it doesn't override the method, it instead overwrites the java-private-final value of the field in the constructor.

How can you override private members? Do you have an example? You get two (private) fields. 

--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais
Ismael Juma 2
Joined: 2011-01-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?
On Fri, Jun 17, 2011 at 11:39 PM, Rex Kerr <ichoran@gmail.com> wrote:
If you write the getter as a method that loads the constant, you have the same good performance (as I showed in my previous email) and it can apply anywhere.  So the only change that needs to be made is that vals of constants in objects and final classes are implemented as a method with no backing field--it just loads the constant in the bytecode.

So you don't even need better documentation, just a change in implementation that would be transparent to everyone yet have optimal performance.

Not in all cases. Consider the case where you use the `constant` with pattern matching and expect a switch to be generated. This is a situation that hasn't been mentioned in this thread, but can also be very important. At the moment, a final val is required for the switch to be generated.
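
A minimal sketch of that situation, with hypothetical names `Opcodes` and `Eval`: the `scala.annotation.switch` annotation asks the compiler to complain when a match cannot be compiled to a tableswitch or lookupswitch, which makes it a convenient check that the constants really are treated as constants.

```scala
import scala.annotation.switch

object Opcodes {
  // Constant value definitions: final, no type annotation, literal right-hand side.
  final val Add = 1
  final val Sub = 2
}

object Eval {
  // The patterns refer to the constants by name; with `final val` the
  // compiler can use them as switch labels.
  def apply(op: Int, a: Int, b: Int): Int = (op: @switch) match {
    case Opcodes.Add => a + b
    case Opcodes.Sub => a - b
    case _           => 0
  }
}
```

Dropping `final` from `Add` and `Sub` should make the compiler reject or warn about the `@switch` annotation, depending on the compiler version.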
Best,
Ismael
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?


On Sat, Jun 18, 2011 at 7:11 AM, iulian dragos <jaguarul@gmail.com> wrote:


On Fri, Jun 17, 2011 at 5:24 PM, Rex Kerr <ichoran@gmail.com> wrote:
All javac knows, without "final", is that it's got a _mutable_ int sitting there!  javac can't (without effect tracking) do a thing unless you mark it final.  Once you mark it final, it does the optimization.

Scala knows that it is not a mutable int because it's marked "val".  And furthermore it knows that it won't be overridden because it's in an object or final class.

The JVM as far as I know doesn't have the level of effect tracking needed to optimize things _annotated_ as final; there is not any restriction on bytecode that actually prevents code from writing multiple times to a "final" field (as far as I can tell from the spec).  In fact, Scala uses this ability when overriding vals--it doesn't override the method, it instead overwrites the java-private-final value of the field in the constructor.

How can you override private members? Do you have an example? You get two (private) fields.

Hm, you're right.  I thought I had come up with an example where it just changed the value of the field, but I can't seem to recreate it with a simple example.  I probably made a mistake.

Still, the spec doesn't say that "final" is enforced in any way by the JVM, so it could only know that a private field is untouched if it examines all the bytecode of the class and determines that only the constructor sets it, and then only to a constant, and then only once (to one value) regardless of the execution paths through the code.  That seems a bit much to expect the JVM to do.

  --Rex
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: Re: 2.9 performance regression ?
On Sat, Jun 18, 2011 at 8:47 AM, Ismael Juma <ismael@juma.me.uk> wrote:
On Fri, Jun 17, 2011 at 11:39 PM, Rex Kerr <ichoran@gmail.com> wrote:
If you write the getter as a method that loads the constant, you have the same good performance (as I showed in my previous email) and it can apply anywhere.  So the only change that needs to be made is that vals of constants in objects and final classes are implemented as a method with no backing field--it just loads the constant in the bytecode.

So you don't even need better documentation, just a change in implementation that would be transparent to everyone yet have optimal performance.

Not in all cases. Consider the case where you use the `constant` with pattern matching and expect a switch to be generated. This is a situation that hasn't been mentioned in this thread, but can also be very important. At the moment, a final val is required for the switch to be generated.
Best,
Ismael

Good point.  I'm not sure how to get around that one without annoying either people aiming for performance or those aiming to avoid recompilation.  Still, the move from getter-on-field to getter-loading-constant is no worse in this case.

  --Rex

Ismael Juma 2
Joined: 2011-01-22,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: 2.9 performance regression ?
On Sat, Jun 18, 2011 at 5:55 PM, Rex Kerr <ichoran@gmail.com> wrote:
Still, the spec doesn't say that "final" is enforced in any way by the JVM, so it could only know that a private field is untouched if it examines all the bytecode of the class and determines that only the constructor sets it, and then only to a constant, and then only once (to one value) regardless of the execution paths through the code.  That seems a bit much to expect the JVM to do.

David Holmes, a HotSpot committer, says:
"So the compiler can optimize away re-loading of final fields, even if reflection (or Unsafe) is mis-used. (Though compilation of deserialization code would have to be handled specially.)"
Check the full explanation here:
http://cs.oswego.edu/pipermail/concurrency-interest/2011-January/007723.html
Best,
Ismael
hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: Re: 2.9 performance regression ?

iulian dragos wrote:

> Regarding Dalvik, I never heard of any project that does not use ProGuard.
> ProGuard can do much more, since it sees the whole-program (while the
> Scala compiler has to assume an open-world) -- for instance, ProGuard
> 'knows' when a method is called from only one place, and it can inline and
> /remove/ the definition. So I don't think Scalac can replace ProGuard, nor
> should it try to.

When I tried it with ProGuard, it didn't inline the calls.

http://bit.ly/jOJ5yD

Am I not using ProGuard correctly?

Iulian Dragos
Joined: 2008-12-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Re: 2.9 performance regression ?


On Wed, Jun 22, 2011 at 7:46 AM, Harshad <harshad.rj@gmail.com> wrote:
iulian dragos wrote:

> Regarding Dalvik, I never heard of any project that does not use ProGuard.
> ProGuard can do much more, since it sees the whole-program (while the
> Scala compiler has to assume an open-world) -- for instance, ProGuard
> 'knows' when a method is called from only one place, and it can inline and
> /remove/ the definition. So I don't think Scalac can replace ProGuard, nor
> should it try to.

When I tried it with ProGuard, it didn't inline the calls.

http://bit.ly/jOJ5yD

Am I not using ProGuard correctly?

I think it's best to ask this question on their mailing list. I just looked at the list of optimizations it can perform, and it says 'inline short methods'. So I assume those getters are short enough, but some other heuristic is preventing it from happening.
iulian 


--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais
hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: Re: Re: 2.9 performance regression ?

iulian dragos wrote:

> On Wed, Jun 22, 2011 at 7:46 AM, Harshad wrote:

>> When I tried it with ProGuard, it didn't inline the calls.
>>
>> http://bit.ly/jOJ5yD
>>
>> Am I not using ProGuard correctly?
>>
>
> I think it's best to ask this question on their mailing list. I just
> looked at the list of optimizations it can perform, and it says 'inline
> short methods'. So I assume those getters are short enough, but some other
> heuristic is preventing it from happening.

I got an answer on their mailing list [1]. The calls to v() were not inlined
because v() was marked as "kept". I think that's not a good decision by
ProGuard, but let's play along for a while.

So, I changed the ProGuard options to not "keep" the accessors, and then it
does inline. But the quality of inlining is rather poor:

public final int square();
Code:
0: aload_0
1: dup
2: astore_1
3: getfield #4; //Field a:I
6: aload_0
7: dup
8: astore_1
9: getfield #4; //Field a:I
12: imul
13: ireturn

Recall that with Scala 2.8.1, we used to get:

public int square();
Code:
0: aload_0
1: getfield #12; //Field v:I
4: aload_0
5: getfield #12; //Field v:I
8: imul
9: ireturn

Iulian, I hope you are still working on the new inliner (in scalac).

hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: Re: Re: 2.9 performance regression ?

Harshad wrote:

> But the quality of inlining is rather poor:

Sorry, I was able to get good quality inlined code by increasing the number
of optimisation passes in ProGuard.

I apologise for the noise.

Iulian Dragos
Joined: 2008-12-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Re: Re: 2.9 performance regression ?


On Thu, Jun 23, 2011 at 11:58 AM, Harshad <harshad.rj@gmail.com> wrote:
I got an answer on their mailing list [1]. The calls to v() were not inlined
because v() was marked as "kept". I think that's not a good decision by
ProGuard, but let's play along for a while.

So, I changed the ProGuard options to not "keep" the accessors, and then it
does inline. But the quality of inlining is rather poor:

public final int square();
 Code:
  0:   aload_0
  1:   dup
  2:   astore_1
  3:   getfield        #4; //Field a:I
  6:   aload_0
  7:   dup
  8:   astore_1
  9:   getfield        #4; //Field a:I
  12:  imul
  13:  ireturn

Recall that with Scala 2.8.1, we used to get:

public int square();
 Code:
  0:   aload_0
  1:   getfield        #12; //Field v:I
  4:   aload_0
  5:   getfield        #12; //Field v:I
  8:   imul
  9:   ireturn


That's probably because scalac does more than just inlining afterwards (dead code elimination and copy propagation). It's easy to enable inlining of accessors, if you feel like giving it a go: the code is in Inliners.scala, method 'shouldInline'. The only thing to be careful about is compilation times; for instance, the Scala compiler's optimized build shouldn't slow down too much. :)

iulian




--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais
hrj
Joined: 2008-09-23,
User offline. Last seen 4 years 3 weeks ago.
Re: Re: Re: Re: 2.9 performance regression ?

iulian dragos wrote:

> It's easy to enable inlining
> of accessors, if you feel like giving it a go, the code is in
> Inliners.scala, method 'shouldInline'. The only thing to be careful is
> compilation times, for instance the scala compiler optimized build
> shouldn't increase too much.
> :)

Conflict of interest between compiler devs and users? :) JK, but it's not
only about my use-case. Scala could get bad PR due to speed problems (and
I care because that's bad for the Scala ecosystem).

For example, I "bumped" into this blog post by a popular Android product about
using Scala (pros and cons):
http://devblog.bu.mp/how-we-use-scala-in-bump-for-android

In other news, ProGuard refuses to inline kept members:
http://sourceforge.net/projects/proguard/forums/forum/182456/topic/4582113

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland