Some Inliner findings

4 replies

Fri, 2010-10-15, 10:24

Johannes Rudolph 2

Joined: 2010-02-12,

For quite some time, I've researched the inliner mainly wrt complete
simple for-loop optimizations and pimp-my-library style programming.
I've made several attempts and several times, I stupidly didn't
analyse the most recent version of the code or something changed while
I was at it.

Before I'll try to improve anything or do any new research, I'm just
going to post what I've found out in the past, even if the last time,
I thoroughly checked these observations, is many months ago, a bunch
of those observations are probably still holding. Please correct me,
if something is wrong or outdated, Iulian and Paul.

1. Static method calls are never inlined (through matching for case
CALL_METHOD(..., DYNAMIC))

2. Mixed-in methods are never inlined for two reasons:
* concreteMethod.isEffectivelyFinal is true for the abstract
definition of a mixed-in trait method, so the concrete implementation
in the final mixed-together Template is never looked for
* even if you 'fix' that, because of 1. the delegation call to
MixedIn$class is never inlined

3. isMonadMethod doesn't work with specialized methods (obviously,
because it checks against the unspecialized name)

4. Methods implemented in the superclass of the receiver's class are
never inlined (I don't think this is basically intended because you
wouldn't need lookupImpl's recursion then)
* lookupImpl finds the concrete implementation in the superclass
* shouldLoad and later
"icodes.icode(receiver).get.lookupMethod(concreteMethod)" tries to
find the method in the concrete receiver without looking for the
implementation in super classes and barks with "could not find icode"
--> Looks like this is fixed in trunk by [22818]

5. Methods which access constructor params are never inlined.
* If the param is private this is ok, the inlined code couldn't
access the data.
* If the param is public (val x: Int), then it should work, but it
doesn't because of Scala's private field + accessor scheme, then the
accessor can never be inlined for the same reasons.
* The workaround is to remove all constructor params and use an
anonymous closure, because closed over synthetic parameters' access is
automatically lifted to public.
* But: Making some fields public obviously never works with separate
compilation, since that would mean changing existing class files. This
really looks like the bottom of quite a few problems with the inliner
architecture: Because the inliner runs first and cannot depend on the
results of following phases, it can only do safe (= keeping the
semantics) transformations. So, we are in a kind of deadlock here:
Closure elimination depends on method calls being inlined, which
doesn't work with separate compilation, since often it would be
necessary to change the accessibility of fields to public to leave
code in a sensible state. It's the other way round for the Inliner: It
could inline the method if it would know that the field accesses to
the closure would later be eliminated.

6. Helper class allocations in the common pimp-my-library pattern are
never eliminated if they are not anonymous closures. This is caused by
anonymous closure constructors being defined as pure by declaration
(isSideEffecting in DCE). To get around this problem I already
implemented some simple, conservative pureness checking which seems to
work quite well, but should be limited to have no too big performance
impact itself in the compilation.
See http://github.com/jrudolph/scala-starrless/commit/e4861766a42041824d71f9...

This is no judgement, each of the points probably have some
explanation and reasoning behind. Still, before improving the
state-of-the-art, stating some facts seems useful.

What do you think?

Johannes

Fri, 2010-10-15, 20:37

extempore

Joined: 2008-12-17,

Re: Some Inliner findings

On Fri, Oct 15, 2010 at 11:24:17AM +0200, Johannes Rudolph wrote:
> For quite some time, I've researched the inliner mainly wrt complete
> simple for-loop optimizations and pimp-my-library style programming.
> I've made several attempts and several times, I stupidly didn't
> analyse the most recent version of the code or something changed while
> I was at it.

I've made a bunch of changes to the inliner in recent months but they
are almost exclusively to improve code structure and make it more
apparent what it is doing. I only can remember a couple behavior
modifying fixes, like r22291.

So without confirming or denying any of your list of things which aren't
inlined, of the factual parts (like "this is never inlined") nothing
sounds surprising to me. Only iulian knows real motivations.

Of late while trying things with the mixin implementation I have many
times wondered why there is no inlining done there, because it is very
common for the forwarder method to actually be larger than the method
receiving the forward! The only intentional motivation I can see
involves separate compilation, but if that is the only reason it seems
like a huge lose given that traits break instantly as soon as you make
any interface-modifying change anyway. So we end up having to recompile
all the time yet are penalized as if we didn't.

You know what would really rule? A test case which demonstrates all
these things whose behavior would change if they did. We so need a
simple framework for performing post-compilation tests which verify
properties of the generated bytecode.

> 3. isMonadMethod doesn't work with specialized methods (obviously,
> because it checks against the unspecialized name)

Hmm, that would be easy to fix assuming it's a bug (I think so.) I find
there is a class of often subtle issues which have arisen with
specialization because earlier assumptions were violated: for instance
in r23174 my commit message includes:

* An issue was introduced with specialization in that the implementation
of "isTupleType" in Definitions relied upon sym == TupleClass(elems.length).
This test is untrue for specialized tuples, causing mysterious behavior
because only some tuples are specialized. There is now "isTupleTypeOrSubtype"
although it seems likely the former implementation is unnecessary.
The issue is sidestepped if one uses "getProductArgs" to retrieve the element
types because it sifts through the base types for the Product symbol.

> So, we are in a kind of deadlock here: Closure elimination depends on
> method calls being inlined, which doesn't work with separate
> compilation, since often it would be necessary to change the
> accessibility of fields to public to leave code in a sensible state.
> It's the other way round for the Inliner: It could inline the method
> if it would know that the field accesses to the closure would later be
> eliminated.

I have a growing list of issues coming into focus which involve what
amount to cycles in the phase dependency graph. And there are these
various ways of trying to step outside the cycle or do something more
lazily or pretend one is at a different phase to avoid the problem, but
we really need a more principled approach to the whole thing. It'd
start with actually knowing what has to happen before what and which
transformations are composable and under what conditions. Not running
out of stuff to figure out...

Fri, 2010-10-22, 09:57

Johannes Rudolph 2

Joined: 2010-02-12,

Re: Some Inliner findings

I've got another one:

7. Inlining could do better wrt to erasure: Often you've got some code
involving generic signatures. Erasure removes type information and
adds runtime `checkcast`s instead. But then, after inlining a method
call, the concrete type of the result of a call may again be known but
it gets then discarded by a now useless up-cast, rendering consequent
inlinings impossible.

This may look like one of those many limitations of the inliner but it
actually might prevent quite a bunch of inlining. E.g. generically
calling a function A => B with A <: AnyRef and B <: AnyRef actually
call the `Function1.apply(Object): Object` method and then after the
call a `checkcast` is inserted. If you now inline the call to
Function1.apply, it contains a `checkcast` which prevents further
inlining of calls to the parameter's methods.

Here's a simple example:

def test = {
val func = (x: AnyRef) => () => x.toString.length
func("test")()
}

Although, obviously, all information is available to inline the calls,
inlining won't happen. In this simple case it isn't too obvious, why
the compiler should do the inlining in the first place, but you can
easily expand the scheme to occur when trying to optimize "monadic"
function calls, which the optimization was built for originally.

The solution could be, to remove the useless casts either in a
post-processing move after inlining a piece of code or, otherwise, as
a side-effect of methodTFA when looking for inlining opportunities.

On Fri, Oct 15, 2010 at 11:24 AM, Johannes Rudolph
wrote:
> For quite some time, I've researched the inliner mainly wrt complete
> simple for-loop optimizations and pimp-my-library style programming.
> I've made several attempts and several times, I stupidly didn't
> analyse the most recent version of the code or something changed while
> I was at it.
>
> Before I'll try to improve anything or do any new research, I'm just
> going to post what I've found out in the past, even if the last time,
> I thoroughly checked these observations, is many months ago, a bunch
> of those observations are probably still holding. Please correct me,
> if something is wrong or outdated, Iulian and Paul.
>
> 1. Static method calls are never inlined (through matching for case
> CALL_METHOD(..., DYNAMIC))
>
> 2. Mixed-in methods are never inlined for two reasons:
> * concreteMethod.isEffectivelyFinal is true for the abstract
> definition of a mixed-in trait method, so the concrete implementation
> in the final mixed-together Template is never looked for
> * even if you 'fix' that, because of 1. the delegation call to
> MixedIn$class is never inlined
>
> 3. isMonadMethod doesn't work with specialized methods (obviously,
> because it checks against the unspecialized name)
>
> 4. Methods implemented in the superclass of the receiver's class are
> never inlined (I don't think this is basically intended because you
> wouldn't need lookupImpl's recursion then)
> * lookupImpl finds the concrete implementation in the superclass
> * shouldLoad and later
> "icodes.icode(receiver).get.lookupMethod(concreteMethod)" tries to
> find the method in the concrete receiver without looking for the
> implementation in super classes and barks with "could not find icode"
> --> Looks like this is fixed in trunk by [22818]
>
> 5. Methods which access constructor params are never inlined.
> * If the param is private this is ok, the inlined code couldn't
> access the data.
> * If the param is public (val x: Int), then it should work, but it
> doesn't because of Scala's private field + accessor scheme, then the
> accessor can never be inlined for the same reasons.
> * The workaround is to remove all constructor params and use an
> anonymous closure, because closed over synthetic parameters' access is
> automatically lifted to public.
> * But: Making some fields public obviously never works with separate
> compilation, since that would mean changing existing class files. This
> really looks like the bottom of quite a few problems with the inliner
> architecture: Because the inliner runs first and cannot depend on the
> results of following phases, it can only do safe (= keeping the
> semantics) transformations. So, we are in a kind of deadlock here:
> Closure elimination depends on method calls being inlined, which
> doesn't work with separate compilation, since often it would be
> necessary to change the accessibility of fields to public to leave
> code in a sensible state. It's the other way round for the Inliner: It
> could inline the method if it would know that the field accesses to
> the closure would later be eliminated.
>
> 6. Helper class allocations in the common pimp-my-library pattern are
> never eliminated if they are not anonymous closures. This is caused by
> anonymous closure constructors being defined as pure by declaration
> (isSideEffecting in DCE). To get around this problem I already
> implemented some simple, conservative pureness checking which seems to
> work quite well, but should be limited to have no too big performance
> impact itself in the compilation.
> See http://github.com/jrudolph/scala-starrless/commit/e4861766a42041824d71f9...
>
> This is no judgement, each of the points probably have some
> explanation and reasoning behind. Still, before improving the
> state-of-the-art, stating some facts seems useful.
>
> What do you think?
>
> Johannes
>

Fri, 2010-10-22, 10:17

Johannes Rudolph 2

Joined: 2010-02-12,

Re: Some Inliner findings

On Fri, Oct 15, 2010 at 9:26 PM, Paul Phillips wrote:
> Of late while trying things with the mixin implementation I have many
> times wondered why there is no inlining done there, because it is very
> common for the forwarder method to actually be larger than the method
> receiving the forward! The only intentional motivation I can see
> involves separate compilation, but if that is the only reason it seems
> like a huge lose given that traits break instantly as soon as you make
> any interface-modifying change anyway.

For not inlining static method calls, I can imagine a possible reason:
The immediate benefit of inlining would be little because a static
method call is pretty cheap and probably handled by Hotspot pretty
well. What you might miss, are the indirect opportunities gained from
doing the inlining. (That used to happen with Range.ByOne before it
was removed in r22812).
Technically inlining static method calls should not be harder to do
than virtual calls, just another logic is needed where to find the
implementation.

Wed, 2010-10-27, 08:47

Iulian Dragos 2

Joined: 2009-02-10,

Re: Some Inliner findings

Sorry to take so long to reply, I am pretty swamped these days. I will have a more detailed reply when I have a chance to go through the code and test some of the points you make. However, this list is a really good starting point for work on the backend. If you feel like taking a plunge, I'll review your patches. ;-)
See some more comments inline:

On Fri, Oct 15, 2010 at 11:24 AM, Johannes Rudolph <johannes.rudolph@googlemail.com> wrote:

1. Static method calls are never inlined (through matching for case
CALL_METHOD(..., DYNAMIC))

This can be clearly improved.

2. Mixed-in methods are never inlined for two reasons:
* concreteMethod.isEffectivelyFinal is true for the abstract
definition of a mixed-in trait method, so the concrete implementation
in the final mixed-together Template is never looked for
* even if you 'fix' that, because of 1. the delegation call to
MixedIn$class is never inlined

This should also be fixed, but there an additional catch: implementation classes are not part of the classpath, so the icode reader cannot find them. Can be fixed as well.

3. isMonadMethod doesn't work with specialized methods (obviously,
because it checks against the unspecialized name)

Ups!

4. Methods implemented in the superclass of the receiver's class are
never inlined (I don't think this is basically intended because you
wouldn't need lookupImpl's recursion then)
* lookupImpl finds the concrete implementation in the superclass
* shouldLoad and later
"icodes.icode(receiver).get.lookupMethod(concreteMethod)" tries to
find the method in the concrete receiver without looking for the
implementation in super classes and barks with "could not find icode"
--> Looks like this is fixed in trunk by [22818]

It should be fixed.

5. Methods which access constructor params are never inlined.
* If the param is private this is ok, the inlined code couldn't
access the data.
* If the param is public (val x: Int), then it should work, but it
doesn't because of Scala's private field + accessor scheme, then the
accessor can never be inlined for the same reasons.

You mean that the public accessors are always inlined in the caller method, and prevent further inlining?

* The workaround is to remove all constructor params and use an
anonymous closure, because closed over synthetic parameters' access is
automatically lifted to public.
* But: Making some fields public obviously never works with separate
compilation, since that would mean changing existing class files. This
really looks like the bottom of quite a few problems with the inliner
architecture: Because the inliner runs first and cannot depend on the
results of following phases, it can only do safe (= keeping the
semantics) transformations. So, we are in a kind of deadlock here:
Closure elimination depends on method calls being inlined, which
doesn't work with separate compilation, since often it would be
necessary to change the accessibility of fields to public to leave
code in a sensible state. It's the other way round for the Inliner: It
could inline the method if it would know that the field accesses to
the closure would later be eliminated.

Yes, this is exactly the problem. One way out is to have a more sophisticated analysis, taking into account the calling context(s).

6. Helper class allocations in the common pimp-my-library pattern are
never eliminated if they are not anonymous closures. This is caused by
anonymous closure constructors being defined as pure by declaration
(isSideEffecting in DCE). To get around this problem I already
implemented some simple, conservative pureness checking which seems to
work quite well, but should be limited to have no too big performance
impact itself in the compilation.
See http://github.com/jrudolph/scala-starrless/commit/e4861766a42041824d71f96ac77d31ac3f4fe771

Excellent. That was always on the radar, just nobody had the time to write one.

This is no judgement, each of the points probably have some
explanation and reasoning behind. Still, before improving the
state-of-the-art, stating some facts seems useful.

Yes, even though sometimes the explanation is 'lack of time'.

What do you think?

Great list.

7. Inlining could do better wrt to erasure: Often you've got some code

involving generic signatures. Erasure removes type information and

adds runtime `checkcast`s instead. But then, after inlining a method

call, the concrete type of the result of a call may again be known but

it gets then discarded by a now useless up-cast, rendering consequent

inlinings impossible.

Very good observation. I think this can be done in the TFA analysis and dead code elimination. Since the inliner does not keep track of the types on the stack, it can't safely remove those instructions. During analysis, types are available and a useless cast can be 'skipped' (but again not removed, since analyses are read-only). The dead code elimination phase can mark such casts as being useless and remove them altogether later. I believe this could work.
There are a number of other things that can improve optimizations. Here's some of them:
- Handle ref cells created by closures. A captured var is stored in an Int/Float/..Ref class, so that both the anonfun and the enclosing method can read/write it. Closure elimination should understand those objets (similar to boxing)
- Turn throw/catch to jumps when it is safe. This may appear when using breakable/break, and the block has been inlined.
iulian

Johannes

--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais

Scala Main Menu

Some Inliner findings

Scala Quick Links

Featured News

User login