Fwd: collections/buildable

8 replies

Tue, 2009-07-14, 19:11

DRMacIver

Joined: 2008-09-02,

For general information (moving from scala-devel)

---------- Forwarded message ----------
From: David MacIver
Date: 2009/7/14
Subject: Re: collections/buildable
To: martin odersky
Cc: scala-devel@epfl.ch
2009/7/14 martin odersky :
> Hi David,
>
> Did you already investigate collections with Buildables instead of
> Builders? I wanted to ask to get an idea how we should proceed with
> that and with collections in general.
>
> There are also two new concerns that have come up: parallelism and
> performance: Tiark did some performance measurements and the results
> were quite surpising at first: First, without special optimizations,
> iterators turn out to be much faster than foreach. The ONLY exception
> is when the program consists of exactly one foreach (i.e. is a
> microbenchmark). The reason seems to be that the foreach itself is not
> inlined, so the call to the closure becomes megamorphic and therefore
> slow. We probably can circumvent this by agressive inlining of
> foreach. But it is a concern for the Buildable scheme, because it
> relies on even more indirection. Essentially, every indirection is a
> potential inlining failure, with bad performance consequences. I am
> not saying we should not do it but it has to be carefully measured.
>
> The other issue is parallel collections, which will become very
> important soon (we are working with Doug Lea on getting some parallel
> collections into the standard library). Hence, every builder/buildable
> pattern
> should be nice to parallel builds. Right now neither seems to be. I
> believe we need a way to compose
> Builders or Buildables. I.e. something like
>
> trait Builder[T, C] {
> def ++ (other: Builder[T, C])
> }
>
> Maybe that's even simpler with Buildable? But it looks like the simple
> foreach interface won't do because it's sequential.
>
> Cheers
>
> -- Martin
>

Hi Martin,

I think it would be useful if these conversations were carried on in
scala-internals. There are a number of people on the periphery who
know a lot about JVM internals (Ismael Juma comes to mind), so it
would be good to get them involved if there's no compelling reason not
to. Is it ok if we move this thread there?

On Buildable, I investigated it briefly, but it ended up with my
having to change a significant chunk of the way things were working
with companion objects in order to get the type system to behave
without repeating myself endlessly (essentially interactions between
implicitly providing Buildable instances and the BuilderFactory type
system trick) as a precondition to getting anything working and things
spiralled out of the time slot I had allocated for it. Unfortunately
my time is quite limited at the moment, and I'm not actively using
Scala for anything except, well, developing Scala, so for the moment
at least I'm not likely to be able to do much in the way of active
development on things. Sorry. :-(

Regarding foreach peformance, that's quite unfortunate. Apologies if I
lead you astray on that front! I think I probably did just test it
with microbenchmarks, which is what lead me to assume things behaved
sensibly. I find the result quite surprising though, because the calls
to Iterator.next and Iterator.hasNext should also be megamorphic and
thus equally not inlinable so the amount of indirection should
actually *decrease* when using foreach.

I don't think aggressive inlining of foreach can solve the problem
though - the problem is that in the overwhelming majority of cases you
actually can't know what the foreach implementation is calling. e.g.
suppose I have

xs.find(x => stuff)

and find is implemented in terms of foreach then there's actually two
levels of inlining to resolve - one to inline the find, the other to
inline the foreach. This seems impractical.

There's no particular reason this should be a death knell for the
Buildable approach though: You only have one Buildable call per
construction, so the call to it is not a bottleneck, and then its
logic can equally use foreach or iterators. If anything it should
actually be an improvement to it - the calls to Builder.+= are equally
subject to being megamorphic.

As far as parallelism goes, I think Buildable is again a win here: Its
construction logic is effectively black box, and free to run in
parallel if it wants to.

Tue, 2009-07-14, 20:07

ijuma

Joined: 2008-08-20,

Re: Fwd: collections/buildable

On Tue, 2009-07-14 at 19:10 +0100, David MacIver wrote:
> For general information (moving from scala-devel)

Thank you for forwarding this, David, interesting findings. Is the
source code for the benchmark(s) available? It would also be good to
know what environment was used for testing (JDK version, options used,
server or client JIT compiler, 32 or 64 bits, machine type, etc.). These
settings can have a big influence on these types of results.

Ismael

Wed, 2009-07-15, 14:27

Tiark Rompf

Joined: 2009-02-18,

Re: Fwd: collections/buildable

The code for the benchmarks I ran is available from

http://lamp.epfl.ch/~rompf/vector2/

Feel free to test more machines, VM versions, parameters etc.

Regarding the iterator vs. foreach issue, there is a large difference
in behavior between JDK 1.6.13 (=HotSpot 11) and 1.6.14 (=HotSpot 14)
as well as OpenJDK trunk. For newer versions of HotSpot (>= 14), the
difference is indeed much smaller. It might have to do with bimorphic
inlining or the way 'warm' call sites are treated.

- Tiark

On 14.07.2009, at 20:57, Ismael Juma wrote:

> On Tue, 2009-07-14 at 19:10 +0100, David MacIver wrote:
>> For general information (moving from scala-devel)
>
> Thank you for forwarding this, David, interesting findings. Is the
> source code for the benchmark(s) available? It would also be good to
> know what environment was used for testing (JDK version, options used,
> server or client JIT compiler, 32 or 64 bits, machine type, etc.).
> These
> settings can have a big influence on these types of results.
>
> Ismael

On 14.07.2009, at 20:57, Ismael Juma wrote:

Thu, 2009-07-16, 10:57

ijuma

Joined: 2008-08-20,

Re: Fwd: collections/buildable

On Wed, 2009-07-15 at 15:18 +0200, Tiark Rompf wrote:
> The code for the benchmarks I ran is available from
>
> http://lamp.epfl.ch/~rompf/vector2/

Thank you Tiark.

> Feel free to test more machines, VM versions, parameters etc.

I will. Hopefully this weekend.

> Regarding the iterator vs. foreach issue, there is a large difference
> in behavior between JDK 1.6.13 (=HotSpot 11) and 1.6.14 (=HotSpot 14)
> as well as OpenJDK trunk. For newer versions of HotSpot (>= 14), the
> difference is indeed much smaller. It might have to do with bimorphic
> inlining or the way 'warm' call sites are treated.

That's encouraging.

Best,
Ismael

Mon, 2009-10-26, 15:07

ijuma

Joined: 2008-08-20,

Re: Fwd: collections/buildable

Hey Tiark and others,

On Wed, Jul 15, 2009 at 1:18 PM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:

The code for the benchmarks I ran is available from

http://lamp.epfl.ch/~rompf/vector2/

Feel free to test more machines, VM versions, parameters etc.

I finally had a chance to play with this and because I wanted to include charts, I wrote a blog post about it instead:

http://blog.juma.me.uk/2009/10/26/new-jvm-options-and-scala-iteration-performance/

A summary:

- Compressed references provide a huge performance boost in 64-bit JVMs.

- Ticket 1338 (https://lampsvn.epfl.ch/trac/scala/ticket/1338) or some variation is very important if we want arrays to perform well as the inlining of foreach seems to be too fragile in current HotSpot, even with -optimise (see RawArrayForeachMega, for example).

- The code is now in github (http://github.com/ijuma/benchmark) so it's easier for others to play with.

Best,
Ismael

Mon, 2009-10-26, 15:47

Tiark Rompf

Joined: 2009-02-18,

Re: Fwd: collections/buildable

Hi Ismael,
nice blog post and a great analysis! It's good to see our findings confirmed.I'll reply later with a comment on your blog.
Cheers,- Tiark

On 26.10.2009, at 15:04, Ismael Juma wrote:

Hey Tiark and others,

On Wed, Jul 15, 2009 at 1:18 PM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:
The code for the benchmarks I ran is available from

http://lamp.epfl.ch/~rompf/vector2/

Feel free to test more machines, VM versions, parameters etc.

I finally had a chance to play with this and because I wanted to include charts, I wrote a blog post about it instead:

http://blog.juma.me.uk/2009/10/26/new-jvm-options-and-scala-iteration-performance/

A summary:

- Compressed references provide a huge performance boost in 64-bit JVMs.

- Ticket 1338 (https://lampsvn.epfl.ch/trac/scala/ticket/1338) or some variation is very important if we want arrays to perform well as the inlining of foreach seems to be too fragile in current HotSpot, even with -optimise (see RawArrayForeachMega, for example).

- The code is now in github (http://github.com/ijuma/benchmark) so it's easier for others to play with.

Best,
Ismael

Mon, 2009-10-26, 16:37

Iulian Dragos 2

Joined: 2009-02-10,

Re: Fwd: collections/buildable

Hi Ismael,

There are a couple of issues with the optimizer not being able to inline the 'megamorphic' foreach benchmark. One of them is that these calls are inside a constructor, and constructors are not optimized. The second is that, even if the code is moved into a method of its own, 'foreach' cannot be proven final because the implicit in Predef that wraps Java arrays to operations has an explicit return type -- ArrayOps (an abstract class). Even if that is solved, foreach is still not inlined because none of the leaf classes ArrayOps.ofByte/ofShort etc. are marked final.

I am working on making the optimizer a bit smarter about what constitutes a final class (mainly, to recognize that leaf classes defined inside an object are practically final). I hope that will have a positive impact on that benchmark.

Thanks for the benchmarks, they are really helpful in improving our implementation.

cheers,
iulian

On Mon, Oct 26, 2009 at 3:39 PM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:

Hi Ismael,
nice blog post and a great analysis! It's good to see our findings confirmed. I'll reply later with a comment on your blog.
Cheers,- Tiark

On 26.10.2009, at 15:04, Ismael Juma wrote:
Hey Tiark and others,

On Wed, Jul 15, 2009 at 1:18 PM, Tiark Rompf <tiark.rompf@epfl.ch> wrote:

The code for the benchmarks I ran is available from

http://lamp.epfl.ch/~rompf/vector2/

Feel free to test more machines, VM versions, parameters etc.

I finally had a chance to play with this and because I wanted to include charts, I wrote a blog post about it instead:

http://blog.juma.me.uk/2009/10/26/new-jvm-options-and-scala-iteration-performance/

A summary:

- Compressed references provide a huge performance boost in 64-bit JVMs.

- Ticket 1338 (https://lampsvn.epfl.ch/trac/scala/ticket/1338) or some variation is very important if we want arrays to perform well as the inlining of foreach seems to be too fragile in current HotSpot, even with -optimise (see RawArrayForeachMega, for example).

- The code is now in github (http://github.com/ijuma/benchmark) so it's easier for others to play with.

Best,
Ismael

--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais

Mon, 2009-10-26, 17:07

ijuma

Joined: 2008-08-20,

Re: Fwd: collections/buildable

Hey Iulian,

On Mon, 2009-10-26 at 16:27 +0100, Iulian Dragos wrote:
> There are a couple of issues with the optimizer not being able to
> inline the 'megamorphic' foreach benchmark. One of them is that these
> calls are inside a constructor, and constructors are not optimized.

Not sure if you looked at the right place. The calls that are just there
to make the foreach call site megamorphic run in the constructor, but
the measured execution (and the one we actually care about inlining) is
inside the run method and that doesn't run in a constructor (unless I
missed something).

> The second is that, even if the code is moved into a method of its
> own, 'foreach' cannot be proven final because the implicit in Predef
> that wraps Java arrays to operations has an explicit return type --
> ArrayOps (an abstract class). Even if that is solved, foreach is still
> not inlined because none of the leaf classes ArrayOps.ofByte/ofShort
> etc. are marked final.

Indeed, there are many obstacles in having all of this work statically.
I have a feeling the JVM would be better at this if Java made heavy use
of closure-like constructs in their own collections. Oh well.

> I am working on making the optimizer a bit smarter about what
> constitutes a final class (mainly, to recognize that leaf classes
> defined inside an object are practically final). I hope that will have
> a positive impact on that benchmark.

Sounds good, looking forward to it.

> Thanks for the benchmarks, they are really helpful in improving our
> implementation.

Sure. Most of the credit should go to Tiark, anyway. I mostly did some
copy & paste and minor changes.

Best,
Ismael

Mon, 2009-10-26, 17:17

Iulian Dragos 2

Joined: 2009-02-10,

Re: Fwd: collections/buildable

On Mon, Oct 26, 2009 at 4:58 PM, Ismael Juma <mlists@juma.me.uk> wrote:

Hey Iulian,

On Mon, 2009-10-26 at 16:27 +0100, Iulian Dragos wrote:
> There are a couple of issues with the optimizer not being able to
> inline the 'megamorphic' foreach benchmark. One of them is that these
> calls are inside a constructor, and constructors are not optimized.

Not sure if you looked at the right place. The calls that are just there
to make the foreach call site megamorphic run in the constructor, but
the measured execution (and the one we actually care about inlining) is
inside the run method and that doesn't run in a constructor (unless I
missed something).

You're right, I thought at first you're benchmarking everything. Nevertheless, the other points are still valid.

iulian

> The second is that, even if the code is moved into a method of its
> own, 'foreach' cannot be proven final because the implicit in Predef
> that wraps Java arrays to operations has an explicit return type --
> ArrayOps (an abstract class). Even if that is solved, foreach is still
> not inlined because none of the leaf classes ArrayOps.ofByte/ofShort
> etc. are marked final.

Indeed, there are many obstacles in having all of this work statically.
I have a feeling the JVM would be better at this if Java made heavy use
of closure-like constructs in their own collections. Oh well.

> I am working on making the optimizer a bit smarter about what
> constitutes a final class (mainly, to recognize that leaf classes
> defined inside an object are practically final). I hope that will have
> a positive impact on that benchmark.

Sounds good, looking forward to it.

> Thanks for the benchmarks, they are really helpful in improving our
> implementation.

Sure. Most of the credit should go to Tiark, anyway. I mostly did some
copy & paste and minor changes.

Best,
Ismael

--
« Je déteste la montagne, ça cache le paysage »
Alphonse Allais

Scala Main Menu

Fwd: collections/buildable

Scala Quick Links

Featured News

User login