This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Parallel Collections fail

7 replies
Johannes Rudolph 2
Joined: 2010-02-12,
User offline. Last seen 42 years 45 weeks ago.

Hi all,

this weekend, I was stumped that abstracting over parallel and
sequential collections with the generic GenSeq etc types is completely
broken. See this example:

val items = (0 to 10000)
val parItems: collection.GenSeq[Int] = if (true) items.par else items.seq
parItems.map {
_ => Thread.currentThread.getName
}

ParVector(Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Threa...

In summary, an operation is never executed in parallel if the static
type of the collection is not a parallel one which is defeating the
purpose of the generic collections construction completely. This is,
because dynamic dispatch - as I would have expected it - is
circumvented by deciding if something should be run in parallel by the
instance of the CanBuildFrom which is chosen statically instead of at
runtime.

I find this a bit upsetting since we had quite a discussion months ago
about the design of the parallel collections and it seems no one
(including me) did actually test that the resulting implementation did
what was promised.

Another problem here is that you can't write

val parItems = if (true) items.par else items.seq

without the type annotation, because type inference fails badly here
(though that seems to be fixed in trunk already).

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Parallel Collections fail

I believe this is due to the fact that the `map` in `ParIterableLike`
checks whether the builder factory is parallel, instead of
instantiating the builder and checking if the builder is parallel.

I'll fix this.

Cheers,
Alex

On Jul 4, 1:18 pm, Johannes Rudolph
wrote:
> Hi all,
>
> this weekend, I was stumped that abstracting over parallel and
> sequential collections with the generic GenSeq etc types is completely
> broken. See this example:
>
> val items = (0 to 10000)
> val parItems: collection.GenSeq[Int] = if (true) items.par else items.seq
> parItems.map {
>  _ => Thread.currentThread.getName
>
> }
>
> ParVector(Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Threa...
>
> In summary, an operation is never executed in parallel if the static
> type of the collection is not a parallel one which is defeating the
> purpose of the generic collections construction completely. This is,
> because dynamic dispatch - as I would have expected it - is
> circumvented by deciding if something should be run in parallel by the
> instance of the CanBuildFrom which is chosen statically instead of at
> runtime.
>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>
> Another problem here is that you can't write
>
> val parItems = if (true) items.par else items.seq
>
> without the type annotation, because type inference fails badly here
> (though that seems to be fixed in trunk already).
>
> --Johannes
>
> -----------------------------------------------JohannesRudolphhttp://virtual-void.net

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Parallel Collections fail

One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.

Cheers,
Alex

>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>

Viktor Klang
Joined: 2008-12-17,
User offline. Last seen 1 year 27 weeks ago.
Re: Re: Parallel Collections fail


On Tue, Jul 5, 2011 at 12:56 PM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.

Assert on the behavior of the builder you said was wrong?
 

Cheers,
Alex


>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>



--
Viktor Klang

Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang
Tony Morris 2
Joined: 2009-03-20,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Parallel Collections fail

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/07/11 20:56, Aleksandar Prokopec wrote:
> One question is: How would you write a unit test for this?
forall s, t in ExecutionStrategy. s(f) == t(f)

Johannes Rudolph 2
Joined: 2010-02-12,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Parallel Collections fail

Hi Aleksandar,

On Mon, Jul 4, 2011 at 5:25 PM, Aleksandar Prokopec
wrote:
> I believe this is due to the fact that the `map` in `ParIterableLike`
> checks whether the builder factory is parallel, instead of
> instantiating the builder and checking if the builder is parallel.
>
> I'll fix this.

Thanks for taking care of this, should I create a ticket anyways?

Chris Marshall 2
Joined: 2011-05-26,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: Parallel Collections fail
If the underlying pool were pluggable, you could insert a "sequential" pool in the tests which tracks whether it was used or not:
def beginTest = { pool = new TestPool; Parallel.pool = pool }
def runTest {  ...  assert(pool.wasUsed)}
def endTest = { pool.shutdown() }

On Tue, Jul 5, 2011 at 11:56 AM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.

Cheers,
Alex


>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: Parallel Collections fail

Hi!
Sure, you can do that, just in case I've missed something when I was
making changes.

As for the omitting the type annotation, I've just observed in the
REPL that a structural type is inferred, not GenSeq[Int].

The builder that gets resolved is only visible inside the collection
method.
Plugging a custom pool may be the way to go for tests.

Cheers,
Alex

On Jul 5, 1:24 pm, Johannes Rudolph
wrote:
> Hi Aleksandar,
>
> On Mon, Jul 4, 2011 at 5:25 PM, Aleksandar Prokopec
>
> wrote:
> > I believe this is due to the fact that the `map` in `ParIterableLike`
> > checks whether the builder factory is parallel, instead of
> > instantiating the builder and checking if the builder is parallel.
>
> > I'll fix this.
>
> Thanks for taking care of this, should I create a ticket anyways?
>
> --
> Johannes
>
> -----------------------------------------------
> Johannes Rudolphhttp://virtual-void.net

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland