- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Parallel Collections fail
Mon, 2011-07-04, 12:18
Hi all,
this weekend, I was stumped that abstracting over parallel and
sequential collections with the generic GenSeq etc types is completely
broken. See this example:
val items = (0 to 10000)
val parItems: collection.GenSeq[Int] = if (true) items.par else items.seq
parItems.map {
_ => Thread.currentThread.getName
}
ParVector(Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
Thread-107, Threa...
In summary, an operation is never executed in parallel if the static
type of the collection is not a parallel one which is defeating the
purpose of the generic collections construction completely. This is,
because dynamic dispatch - as I would have expected it - is
circumvented by deciding if something should be run in parallel by the
instance of the CanBuildFrom which is chosen statically instead of at
runtime.
I find this a bit upsetting since we had quite a discussion months ago
about the design of the parallel collections and it seems no one
(including me) did actually test that the resulting implementation did
what was promised.
Another problem here is that you can't write
val parItems = if (true) items.par else items.seq
without the type annotation, because type inference fails badly here
(though that seems to be fixed in trunk already).
Tue, 2011-07-05, 11:57
#2
Re: Parallel Collections fail
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.
Cheers,
Alex
>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>
Tue, 2011-07-05, 12:17
#3
Re: Re: Parallel Collections fail
On Tue, Jul 5, 2011 at 12:56 PM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.
Assert on the behavior of the builder you said was wrong?
Cheers,
Alex
>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Tue, 2011-07-05, 12:27
#4
Re: Re: Parallel Collections fail
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 05/07/11 20:56, Aleksandar Prokopec wrote:
> One question is: How would you write a unit test for this?
forall s, t in ExecutionStrategy. s(f) == t(f)
Tue, 2011-07-05, 12:27
#5
Re: Re: Parallel Collections fail
Hi Aleksandar,
On Mon, Jul 4, 2011 at 5:25 PM, Aleksandar Prokopec
wrote:
> I believe this is due to the fact that the `map` in `ParIterableLike`
> checks whether the builder factory is parallel, instead of
> instantiating the builder and checking if the builder is parallel.
>
> I'll fix this.
Thanks for taking care of this, should I create a ticket anyways?
Tue, 2011-07-05, 12:37
#6
Re: Re: Parallel Collections fail
If the underlying pool were pluggable, you could insert a "sequential" pool in the tests which tracks whether it was used or not:
On Tue, Jul 5, 2011 at 11:56 AM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:
def beginTest = { pool = new TestPool; Parallel.pool = pool }
def runTest { ... assert(pool.wasUsed)}
def endTest = { pool.shutdown() }
On Tue, Jul 5, 2011 at 11:56 AM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:
One question is: How would you write a unit test for this?
One possibility I'm thinking of is to compare times with a version
known to be sequential. But since partest runs tests in parallel
already, such a test would fail, since there would not be enough
processors available to speed up execution.
Cheers,
Alex
>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>
Tue, 2011-07-05, 14:47
#7
Re: Parallel Collections fail
Hi!
Sure, you can do that, just in case I've missed something when I was
making changes.
As for the omitting the type annotation, I've just observed in the
REPL that a structural type is inferred, not GenSeq[Int].
The builder that gets resolved is only visible inside the collection
method.
Plugging a custom pool may be the way to go for tests.
Cheers,
Alex
On Jul 5, 1:24 pm, Johannes Rudolph
wrote:
> Hi Aleksandar,
>
> On Mon, Jul 4, 2011 at 5:25 PM, Aleksandar Prokopec
>
> wrote:
> > I believe this is due to the fact that the `map` in `ParIterableLike`
> > checks whether the builder factory is parallel, instead of
> > instantiating the builder and checking if the builder is parallel.
>
> > I'll fix this.
>
> Thanks for taking care of this, should I create a ticket anyways?
>
> --
> Johannes
>
> -----------------------------------------------
> Johannes Rudolphhttp://virtual-void.net
I believe this is due to the fact that the `map` in `ParIterableLike`
checks whether the builder factory is parallel, instead of
instantiating the builder and checking if the builder is parallel.
I'll fix this.
Cheers,
Alex
On Jul 4, 1:18 pm, Johannes Rudolph
wrote:
> Hi all,
>
> this weekend, I was stumped that abstracting over parallel and
> sequential collections with the generic GenSeq etc types is completely
> broken. See this example:
>
> val items = (0 to 10000)
> val parItems: collection.GenSeq[Int] = if (true) items.par else items.seq
> parItems.map {
> _ => Thread.currentThread.getName
>
> }
>
> ParVector(Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Thread-107, Thread-107, Thread-107, Thread-107,
> Thread-107, Threa...
>
> In summary, an operation is never executed in parallel if the static
> type of the collection is not a parallel one which is defeating the
> purpose of the generic collections construction completely. This is,
> because dynamic dispatch - as I would have expected it - is
> circumvented by deciding if something should be run in parallel by the
> instance of the CanBuildFrom which is chosen statically instead of at
> runtime.
>
> I find this a bit upsetting since we had quite a discussion months ago
> about the design of the parallel collections and it seems no one
> (including me) did actually test that the resulting implementation did
> what was promised.
>
> Another problem here is that you can't write
>
> val parItems = if (true) items.par else items.seq
>
> without the type annotation, because type inference fails badly here
> (though that seems to be fixed in trunk already).
>
> --Johannes
>
> -----------------------------------------------JohannesRudolphhttp://virtual-void.net