This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Why type checking is so slow

5 replies
DRMacIver
Joined: 2008-09-02,
User offline. Last seen 42 years 45 weeks ago.

Ok. I think I've understood something which could make a big
difference to type checker performance.

Consider the following classic example:

def fib(n : Int) : Int = n match { case 0|1 => 1; case n => fib(n - 1)
+ fib(n - 2) }

Why is it slow? Well, it accidentally runs in exponential time because
it recursively calls itself twice, thus the number of calls doubles at
each level of recursion, as the results are not shared. Classic first
year compsci.

Now consider the definition of

def isSubType(tp1 : Type, tp2 : Type) : Boolean.

This descends into isSubType0, which calls <:< all over the place,
which... calls isSubType for each time.

With no reuse for specific pairs of types.

Oops.

So we're in exactly the same position as the fibonacci example. We've
created an accidentally exponential algorithm (well, sortof. It's not
clear what "n" is here. But we're potentially failing to reuse a lot
of information).

Ooops.

One solution to the fibonacci problem is to add caching. So I did that
to type checking, adding a cache which caches mutual calls to
isSubtype and isSubtype0.

The particular pathological example I was testing on (an
implementation of peano arithmetic in type system) went from never
finishing compiling to compiling *instantly*.

That's the good news.

The bad news is that all sorts of tests started failing. :-)

After some tinkering I've got stuff a little closer to approximating
working, but it's still broken. I have to go out now or I'll be late,
so I've got to put this down and probably won't be able to pick up
this line of attack until some time later in the coming week. In the
mean time would love suggestions on improving this to make it correct
(or frankly if someone who knows more about the type checker than I do
wants to take this off my hands, go for it!). I've attached a patch
that shows my current work so far.

I'm not sure what all of the latest set of test failures is about.
They're quite varied. Earlier ones were basically caused because
isSameType (which we need for the caching) was looping back into
isSubType in certain cases (e.g. wildcard bounds) and causing things
to blow up.

Basically in order to make this work I need a way of testing pairs of
types (s1, s2) and (t1, t2) for equality that:

- doesn't rely on subtype testing
- has no false positives but may have false negatives
- is relatively fast!

What I've done currently is introduce a "precise" flag to isSameType.
Passing false for precise allows the possibility of false negatives
and shortcuts cases where we'd have to test for subtyping. This seems
to catch the immediate worst cases I found but breaks in other cases.

I also need a hash code to match it. I think the current hash code
implementation is outright wrong (it's stolen verbatim from
SubTypePair. I think that implementation's hashCode is outright wrong
too), but this should only be causing unnecessary cache misses rather
than bugs.

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Why type checking is so slow

Hi David,

Maybe, maybe not. I profiled the type checker extensively. isSubType
did not show up as a particular bottleneck. In fact, run-time cost
seems to be smeared around rather widely; there are no obvious
bottlenecks. This does not mean there are not some pathological
programs that will cause the subtype check to take exponential time.
But one has to weigh these things very, very carefully. In particular
trading time for space is not always the right idea. Memory leaks are
very nightmarish to chase down and do lots of really unpleasant things
in the eclipse plugin. What's most important is to keep things
reasonably clear and predictable.

Finally, I'm thinking of a major redesign of Types.scala. So I would
not spend a lot of effort now in optimizing the current version.

Cheers

DRMacIver
Joined: 2008-09-02,
User offline. Last seen 42 years 45 weeks ago.
Re: Why type checking is so slow

On Mon, Mar 9, 2009 at 2:17 PM, martin odersky wrote:
> Hi David,
>
> Maybe, maybe not. I profiled the type checker extensively. isSubType
> did not show up as a particular bottleneck.

Hm. I've seen it as a bottleneck before when profiling builds. Maybe
this is is just in pathological cases though.

> In fact, run-time cost
> seems to be smeared around rather widely; there are no obvious
> bottlenecks. This does not mean there are not some pathological
> programs that will cause the subtype check to take exponential time.
> But one has to weigh these things very, very carefully. In particular
> trading time for space is not always the right idea. Memory leaks are
> very nightmarish to chase down and do lots of really unpleasant things
> in the eclipse plugin. What's most important is to keep things
> reasonably clear and predictable.

The caching design is such that by design it can't leak memory: It
only caches recursive calls, and the cache is cleared as soon as you
exit a non-recursive call. Essentially what happens is that every time
you enter an isSubType0 call a counter is incremented, every time you
leave it the counter is decremented. When the counter hits 0 the cache
is reset.

> Finally, I'm thinking of a major redesign of Types.scala. So I would
> not spend a lot of effort now in optimizing the current version.

Ok.

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Why type checking is so slow

On Mon, Mar 9, 2009 at 3:23 PM, David MacIver wrote:
> On Mon, Mar 9, 2009 at 2:17 PM, martin odersky wrote:
>> Hi David,
>>
>> Maybe, maybe not. I profiled the type checker extensively. isSubType
>> did not show up as a particular bottleneck.
>
> Hm. I've seen it as a bottleneck before when profiling builds. Maybe
> this is is just in pathological cases though.
>
>> In fact, run-time cost
>> seems to be smeared around rather widely; there are no obvious
>> bottlenecks. This does not mean there are not some pathological
>> programs that will cause the subtype check to take exponential time.
>> But one has to weigh these things very, very carefully. In particular
>> trading time for space is not always the right idea. Memory leaks are
>> very nightmarish to chase down and do lots of really unpleasant things
>> in the eclipse plugin. What's most important is to keep things
>> reasonably clear and predictable.
>
> The caching design is such that by design it can't leak memory: It
> only caches recursive calls, and the cache is cleared as soon as you
> exit a non-recursive call. Essentially what happens is that every time
> you enter an isSubType0 call a counter is incremented, every time you
> leave it the counter is decremented. When the counter hits 0 the cache
> is reset.

I see. Let's talk about this when we meet up.

Cheers

DRMacIver
Joined: 2008-09-02,
User offline. Last seen 42 years 45 weeks ago.
Re: Why type checking is so slow

On Mon, Mar 9, 2009 at 5:42 PM, martin odersky wrote:
> On Mon, Mar 9, 2009 at 3:23 PM, David MacIver wrote:
>> On Mon, Mar 9, 2009 at 2:17 PM, martin odersky wrote:
>>> Hi David,
>>>
>>> Maybe, maybe not. I profiled the type checker extensively. isSubType
>>> did not show up as a particular bottleneck.
>>
>> Hm. I've seen it as a bottleneck before when profiling builds. Maybe
>> this is is just in pathological cases though.
>>
>>> In fact, run-time cost
>>> seems to be smeared around rather widely; there are no obvious
>>> bottlenecks. This does not mean there are not some pathological
>>> programs that will cause the subtype check to take exponential time.
>>> But one has to weigh these things very, very carefully. In particular
>>> trading time for space is not always the right idea. Memory leaks are
>>> very nightmarish to chase down and do lots of really unpleasant things
>>> in the eclipse plugin. What's most important is to keep things
>>> reasonably clear and predictable.
>>
>> The caching design is such that by design it can't leak memory: It
>> only caches recursive calls, and the cache is cleared as soon as you
>> exit a non-recursive call. Essentially what happens is that every time
>> you enter an isSubType0 call a counter is incremented, every time you
>> leave it the counter is decremented. When the counter hits 0 the cache
>> is reset.
>
> I see. Let's talk about this when we meet up.

Sure. Sounds good.

extempore
Joined: 2008-12-17,
User offline. Last seen 35 weeks 3 days ago.
Re: Why type checking is so slow

On Mon, Mar 09, 2009 at 06:04:21PM +0000, David MacIver wrote:
> Sure. Sounds good.

Pretend I'm somewhere nearby convincingly expounding on the particularly
nifty idea I have which requires fast type-level arithmetic with peano
numbers up to 32, but speaking to nobody in particular and in fact
freaking you out more than a little.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland