- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Re: home on the range
Tue, 2011-12-13, 12:40
2011/12/13 √iktor Ҡlang :
> You usually want to stay below 35 instructions per method, to get them
> inlined when hot.
Where 35 means 35 distinct opcodes? Here's what the inner loop looks
like right now:
public final void foreachDownIn(scala.Function1);
0: aload_0
1: invokevirtual #151515; //Method start:()I
4: istore_2
5: iload_2
6: aload_0
7: invokevirtual #111111; //Method end:()I
10: if_icmplt 30
13: aload_1
14: iload_2
15: invokeinterface #5f5f5f, 2; //InterfaceMethod
scala/Function1.apply$mcVI$sp:(I)V
20: iload_2
21: aload_0
22: invokevirtual #141414; //Method step:()I
25: iadd
26: istore_2
27: goto 5
30: return
And here's the foreach.
public final void foreach(scala.Function1);
0: aload_0
1: invokevirtual #141414; //Method step:()I
4: iconst_0
5: if_icmpge 31
8: aload_0
9: invokevirtual #1c1c1c; //Method isInclusive:()Z
12: ifeq 23
15: aload_0
16: aload_1
17: invokevirtual #4f4f4f; //Method foreachDownIn:(Lscala/Function1;)V
20: goto 51
23: aload_0
24: aload_1
25: invokevirtual #545454; //Method foreachDownEx:(Lscala/Function1;)V
28: goto 51
31: aload_0
32: invokevirtual #1c1c1c; //Method isInclusive:()Z
35: ifeq 46
38: aload_0
39: aload_1
40: invokevirtual #5a5a5a; //Method foreachUpIn:(Lscala/Function1;)V
43: goto 51
46: aload_0
47: aload_1
48: invokevirtual #575757; //Method foreachUpEx:(Lscala/Function1;)V
51: return
Tue, 2011-12-13, 13:11
#2
Re: home on the range
Have you've tried to pass "end" as a parameter, then you can use the same check for inclusive and exclusive, just add +1 to end?
Also, what happens if you make end on range a final val?
On Tue, Dec 13, 2011 at 12:58 PM, Ismael Juma <ismael@juma.me.uk> wrote:
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Also, what happens if you make end on range a final val?
On Tue, Dec 13, 2011 at 12:58 PM, Ismael Juma <ismael@juma.me.uk> wrote:
First of all, great work Paul. Thanks for spending your time on this.
On Tue, Dec 13, 2011 at 11:40 AM, Paul Phillips <paulp@improving.org> wrote:
2011/12/13 √iktor Ҡlang <viktor.klang@gmail.com>:
> You usually want to stay below 35 instructions per method, to get them
> inlined when hot.
Where 35 means 35 distinct opcodes? Here's what the inner loop looks
like right now:
I would say that the only way to deal with these things sanely is to have a set of benchmarks with different characteristics (e.g. loop with a small number of iterations, loop with a large number of iterations, loop with simple code inside, loop with complex code inside, nested loops with few/many iterations in inner/outer, etc.).
It's too easy to improve some cases and make others worse otherwise.
There are some improvements that make things obviously better, of course (where boxing is avoided, for example).
Best,Ismael
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Tue, 2011-12-13, 13:21
#3
Re: home on the range
On Dec 13, 2011, at 12:40 PM, Paul Phillips wrote:
> 2011/12/13 √iktor Ҡlang :
>> You usually want to stay below 35 instructions per method, to get them
>> inlined when hot.
>
> Where 35 means 35 distinct opcodes?
Is that opcodes or bytecode size? I was under the assumption that -XX:MaxInlineSize is measured in bytes, not opcodes.
- Tiark
Tue, 2011-12-13, 13:21
#4
Re: home on the range
On Tue, Dec 13, 2011 at 12:04 PM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:
Yes. Some inlining parameters:
FreqInlineSizeMaxInlineSizeInlineSmallCodeMaxTrivialSizeNodeCountInliningCutoffMaxNodeLimitInlineThrowMaxSizeMaxTrivialSize
A good resource:
http://wikis.sun.com/display/HotSpotInternals/Inlining
Best,Ismael
Is that opcodes or bytecode size? I was under the assumption that -XX:MaxInlineSize is measured in bytes, not opcodes.
Yes. Some inlining parameters:
FreqInlineSizeMaxInlineSizeInlineSmallCodeMaxTrivialSizeNodeCountInliningCutoffMaxNodeLimitInlineThrowMaxSizeMaxTrivialSize
A good resource:
http://wikis.sun.com/display/HotSpotInternals/Inlining
Best,Ismael
Tue, 2011-12-13, 13:31
#5
Re: home on the range
вторник, 13 декабря 2011 г. 19:04:15 UTC+7 пользователь tiark написал:
On Dec 13, 2011, at 12:40 PM, Paul Phillips wrote:It is bytecode size, of course:
> 2011/12/13 √iktor Ҡlang <viktor...@gmail.com>:
>> You usually want to stay below 35 instructions per method, to get them
>> inlined when hot.
>
> Where 35 means 35 distinct opcodes?Is that opcodes or bytecode size? I was under the assumption that -XX:MaxInlineSize is measured in bytes, not opcodes.
"Method inlining eliminates the overhead of method dispatching. The client compiler uses a static inlining strategy: Methods with a size less than 35 bytes are inlined. This size is decreased for each nested inlining because the number of inlining candidates grows at each level. "
See section 2.2 of this paper:
@article{Kotzmann:2008:DJH:1369396.1370017, author = {Kotzmann, Thomas and Wimmer, Christian and M\"{o}ssenb\"{o}ck, Hanspeter and Rodriguez, Thomas and Russell, Kenneth and Cox, David}, title = {Design of the Java HotSpot\™ client compiler for Java 6}, journal = {ACM Trans. Archit. Code Optim.}, volume = {5}, issue = {1}, month = {May}, year = {2008}, issn = {1544-3566}, pages = {7:1--7:32}, articleno = {7}, numpages = {32}, url = {http://doi.acm.org/10.1145/1369396.1370017}, doi = {http://doi.acm.org/10.1145/1369396.1370017}, acmid = {1370017}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {Java, compiler, deoptimization, intermediate representation, just-in-time compilation, optimization, register allocation}, }
Tue, 2011-12-13, 13:41
#6
Re: home on the range
One other thing that's important: Always test with call sites calling
different methods (i.e. do a Range#foreach with a set of different
closures). If you do it only for a single one, you risk getting
unrealistic inlining performance. Then the micro-benchmark results
will not be repeatable in real code.
Cheers
Tue, 2011-12-13, 14:21
#7
Re: home on the range
On Tue, Dec 13, 2011 at 1:04 PM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:
On Dec 13, 2011, at 12:40 PM, Paul Phillips wrote:
> 2011/12/13 √iktor Ҡlang <viktor.klang@gmail.com>:
>> You usually want to stay below 35 instructions per method, to get them
>> inlined when hot.
>
> Where 35 means 35 distinct opcodes?
Is that opcodes or bytecode size? I was under the assumption that -XX:MaxInlineSize is measured in bytes, not opcodes.
Yes of course, thanks for correcting my poor wording.
Cheers,
√
- Tiark
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Tue, 2011-12-13, 15:41
#8
Re: home on the range
2011/12/13 √iktor Ҡlang :
> Have you've tried to pass "end" as a parameter, then you can use the same
> check for inclusive and exclusive, just add +1 to end?
Ha ha, don't remind me of all the bugs in range I've already either
caused or fixed. It's really a slippery class, one always thinks
"gee, I could simplify it by doing this..." and then walks right into
a boundary condition.
(In this case, Int.MaxValue - 2 to Int.MaxValue.)
> Also, what happens if you make end on range a final val?
Nothing. That was one of the things I had going most of the time on
the which I unrolled as irrelevant. I tried taking things away one at
a time, and what's in there was what was left when taking anything
away also took away the results.
Tue, 2011-12-13, 16:01
#9
Re: home on the range
On Tue, Dec 13, 2011 at 3:35 PM, Paul Phillips <paulp@improving.org> wrote:
2011/12/13 √iktor Ҡlang <viktor.klang@gmail.com>:
> Have you've tried to pass "end" as a parameter, then you can use the same
> check for inclusive and exclusive, just add +1 to end?
Ha ha, don't remind me of all the bugs in range I've already either
caused or fixed. It's really a slippery class, one always thinks
"gee, I could simplify it by doing this..." and then walks right into
a boundary condition.
(In this case, Int.MaxValue - 2 to Int.MaxValue.)
> Also, what happens if you make end on range a final val?
Nothing. That was one of the things I had going most of the time on
the which I unrolled as irrelevant. I tried taking things away one at
a time, and what's in there was what was left when taking anything
away also took away the results.
Alright, and what if you just provide the end as a method parameter instead of calling the end-member?
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Tue, 2011-12-13, 16:11
#10
Re: home on the range
2011/12/13 √iktor Ҡlang :
> Alright, and what if you just provide the end as a method parameter instead
> of calling the end-member?
Yep. Nothing. In fact for a while I had all four implementations
taking (f, start, end, step) as parameters, in the interests of having
no data in the loop which wasn't known local. No difference.
Tue, 2011-12-13, 16:21
#11
Re: home on the range
On Tue, Dec 13, 2011 at 4:10 PM, Paul Phillips <paulp@improving.org> wrote:
2011/12/13 √iktor Ҡlang <viktor.klang@gmail.com>:
> Alright, and what if you just provide the end as a method parameter instead
> of calling the end-member?
Yep. Nothing. In fact for a while I had all four implementations
taking (f, start, end, step) as parameters, in the interests of having
no data in the loop which wasn't known local. No difference.
Sometimes optimization is more like using a divining rod... ;)
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Tue, 2011-12-13, 17:01
#12
Re: home on the range
On Tue, Dec 13, 2011 at 01:30:32PM +0100, martin odersky wrote:
> One other thing that's important: Always test with call sites calling
> different methods (i.e. do a Range#foreach with a set of different
> closures). If you do it only for a single one, you risk getting
> unrealistic inlining performance. Then the micro-benchmark results
> will not be repeatable in real code.
This is a great point, and something I've seen before.
It's always a rude awakening to see a benchmark run x5-10 slower after,
and it's easy to get confused about what's going on in situations like:
test 1: fast
test 2: slow
(test 1: slow, if you happen to run it again)
Also, thanks to Paul, Martin, and everyone else who's been working on
making Range#foreach faster!
On Tue, Dec 13, 2011 at 11:40 AM, Paul Phillips <paulp@improving.org> wrote:
I would say that the only way to deal with these things sanely is to have a set of benchmarks with different characteristics (e.g. loop with a small number of iterations, loop with a large number of iterations, loop with simple code inside, loop with complex code inside, nested loops with few/many iterations in inner/outer, etc.).
It's too easy to improve some cases and make others worse otherwise.
There are some improvements that make things obviously better, of course (where boxing is avoided, for example).
Best,Ismael