typerefs and allocations
Fri, 2011-12-30, 17:22
Since I will go nuts if I allow myself to work on this for one more minute, I will just document something now. I have spent the last several days ignoring things to wrestle with a bear. The bear won, but I got in a few weak blows.
Attempted TypeRef creations: 10525999
Surviving uniques: 569797
Doomed allocations avoided: 0
These are the statistics from when I started, for quick.lib. This means that we allocate over ten million typerefs, about 95% of which are discarded after discovering that uniques already holds an equivalent typeref. If that's not already obviously suboptimal, note that we must calculate the hashcode for each of those 9.96M, which means a reasonably expensive murmurhash3 which itself creates yet more garbage.
As of now I have those numbers at:
Attempted TypeRef creations: 10525999
Surviving uniques: 569797
Doomed allocations avoided: 1863767
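For context, the allocate-then-discard pattern in the uniques factory looks roughly like this. This is a simplified sketch with stand-in types, not the actual compiler code; the real uniques set and TypeRef are far richer, but the shape of the waste is the same:

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Stand-in for the compiler's TypeRef; the real one is far richer.
final case class TypeRef(prefix: String, sym: String, args: List[String]) {
  // The hash is computed up front, so every throwaway allocation pays for it.
  override val hashCode: Int = MurmurHash3.productHash(this)
}

object Uniques {
  private val uniques = mutable.HashMap.empty[TypeRef, TypeRef]
  var attempted, survived = 0

  // We must build (and hash) a full TypeRef just to discover that an
  // equivalent one already exists -- hence the huge discard rate above.
  def unique(prefix: String, sym: String, args: List[String]): TypeRef = {
    attempted += 1
    val tp = TypeRef(prefix, sym, args)
    uniques.getOrElseUpdate(tp, { survived += 1; tp })
  }
}
```

Avoiding the doomed allocation means answering "is there already a typeref for this?" without constructing (and hashing) the candidate first.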
I am morally certain that the last number can be much higher, but woe to him who gets too ambitious here. (Or maybe that's "whoa.") The major difficulty is, as usual, the introduction of cycles. If I rework the tpeCache mechanism in Symbol a bit (something I wish I'd pursued before despair enveloped me) I think I can make that part predictable. I also discovered that one does not want to try to cache types separately from symbols unless they are never transformed, and even trivial types like packages are transformed in erasure. So it's a matter of populating the tpeCache of applicable symbols without cycling on the caching mechanism, while still being able to use the cache subsequently.
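The shape of the re-entrancy guard I have in mind is roughly this; hypothetical names, not the real Symbol API:

```scala
// Minimal stand-in for a type; the real compiler's types live elsewhere.
final class TypeRef(val owner: String)

// A cycle-safe per-symbol cache: if computing the cached typeref re-enters
// the cache (because touching a symbol can trigger typeRef again), the
// nested call falls back to direct construction instead of recursing.
final class Sym(val name: String) {
  private var cached: TypeRef = null
  private var filling = false

  def tpe(mk: => TypeRef): TypeRef =
    if (cached != null) cached
    else if (filling) mk // cycle detected: bypass the cache this time
    else {
      filling = true
      try { cached = mk; cached }
      finally filling = false
    }
}
```

The fallback arm means a cyclic population step costs one extra allocation rather than an infinite loop, and every later call gets the cached typeref.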
What would be helpful is for anyone to share what they know about which symbols are *in principle* always mapped to the same typeref (during a single phase). I estimate at least the following:
If args.isEmpty:
- a class without typeParams whose prefix is ThisType(aPackageClass)
- a type parameter without typeParams. (This is probably incomplete.)
If args.nonEmpty (and I didn't do anything with this yet, but some of these come in by the tens of thousands):
- a parameterized top level class which has been parameterized on its own type parameters to stand in as dummies so it can be kind *. (This whole setup "desperately needs cleaning up" as moors put it.)
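My reading of the conditions above, as a predicate over a hypothetical mini-model; these field names mirror but do not reproduce the real Symbol API:

```scala
// Hypothetical stand-in capturing just the properties the list above uses.
final case class SymInfo(
  isClass: Boolean,
  isTypeParameter: Boolean,
  hasTypeParams: Boolean,
  prefixIsPackageThis: Boolean
)

object Stability {
  // True if a TypeRef for this symbol (with empty args) should, in
  // principle, map to the same typeref throughout a single phase.
  def stableWithEmptyArgs(s: SymInfo): Boolean =
    (s.isClass && !s.hasTypeParams && s.prefixIsPackageThis) ||
    (s.isTypeParameter && !s.hasTypeParams)
}
```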
Finally, the most deflating news is that the profiler is fairly impressed by these gains but the wall clock is not. My working theory at this point is that collecting allocation information costs the profiler time out of proportion to the impact in real life, so it heavily exaggerates the significance of short-lived allocations (okay, probably it's the naive profile output reader doing the exaggerating). Obviously HotSpot can't eliminate either the allocation or the calculation of the hashcode, given that the hashcode is computed in the constructor and we use the hashcode to look up the pre-existing object, but the amortized allocation cost seems to be close to zero. The wall clock gain appears to be entirely from eliminating the hashcode calculation.
Since the cached typerefs can key on symbol and symbols use reference equality, I was able to use IdentityHashMap for this, which also lowers the burden on uniques.
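The identity-keyed cache is simple; a sketch with placeholder types (the real code lives in the compiler's Types/Symbols, and these names are mine):

```scala
import java.util.IdentityHashMap

final class TypeRef(val sym: AnyRef)

// Keyed on the symbol's reference identity: hits here need no hashCode on
// types at all, and never touch the uniques set.
object TpeCache {
  private val cache = new IdentityHashMap[AnyRef, TypeRef]

  def typeRefFor(sym: AnyRef): TypeRef = {
    val existing = cache.get(sym)
    if (existing ne null) existing
    else {
      val tp = new TypeRef(sym)
      cache.put(sym, tp)
      tp
    }
  }
}
```

IdentityHashMap compares keys with `eq` and hashes with System.identityHashCode, which is exactly what reference-equal symbols want, and it sidesteps murmur entirely on a hit.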
Fri, 2011-12-30, 18:01
#2
Re: Re: typerefs and allocations
On Fri, Dec 30, 2011 at 5:50 PM, Martin Odersky <odersky@gmail.com> wrote:
Hi Paul,
Can we avoid the redundant allocations by prechecking in the typeRef factory method whether a reference exists? That would leave murmur as the only bottleneck. Do we know that the benefits of murmur outweigh the cost, or should we investigate a cheaper hash?
Switch to Phil's hash?
Cheers
Martin
Sent from my phone
On Dec 30, 2011, at 17:22, Paul Phillips <paulp@improving.org> wrote:
--
Viktor Klang
Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Fri, 2011-12-30, 18:41
#3
Re: typerefs and allocations
On Fri, Dec 30, 2011 at 8:50 AM, Martin Odersky <odersky@gmail.com> wrote:
Can we avoid the redundant allocations by prechecking in the typeRef factory method whether a reference exists?
That's where I'm doing it. The challenge is that it's easy to go right through the factory method again while you're trying to check whether the typeref being constructed can be cached, because calling almost anything on a symbol can lead to its tpeCache being populated, which goes right through the factory again.
That would leave murmur as the only bottleneck. Do we know that the benefits of murmur outweigh the cost, or should we investigate a cheaper hash?
We saw immediate speed benefits to murmur, but we could use it a lot better. We can keep it on the back burner, but for now I think the hash is fine and we should just be smarter about avoiding hashing.