This page is no longer maintained — Please continue to the home page at www.scala-lang.org

REPL: null-values in parallel ranges?

170 replies

Sun, 2011-04-03, 04:37

#51

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi Again,

In thinking further, I think the way this would have to work is the
extra method idea that was proposed earlier, and Martin said they
considered but rejected. The approach would be:

Tighten the foreach contract to require in-order execution.

Add a pareach method or something that does foreach but in a parallel
way. Maybe this is in trait Parallel so all Parallel things have to
implement it. And this is the method a for expression would translate
to invocations of if the generator type is Parallel, not invocations
of foreach.

Bill

On Sat, Apr 2, 2011 at 8:07 PM, Bill Venners wrote:
> Hi Paul,
>
> Actually I said that wrongly. I meant that if I really want to do a
> parallel foreach on a Seq type that doesn't extend Parallel, I could
> write it like this:
>
> for (e <- ls.par) {
> // some thread-safe side-effecting code
> }
>
> The example I showed was a method that took a Parallel Seq already.
>
> Bill
>
> On Sat, Apr 2, 2011 at 8:03 PM, Bill Venners wrote:
>> Hi Paul,
>>
>> I'm not sure how to implement it, but I wonder if the following
>> behavior would make sense: have for expression rewriting rules for
>> foreach take into account whether the generator mixes in the Parallel
>> marker trait.
>>
>> scala> import scala.collection.Parallel
>> import scala.collection.Parallel
>>
>> scala> val ls = List(1, 2, 3, 4)
>> ls: List[Int] = List(1, 2, 3, 4)
>>
>> scala> ls.isInstanceOf[Parallel]
>> res3: Boolean = false
>>
>> scala> val pls = ls.par
>> pls: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3, 4)
>>
>> scala> pls.isInstanceOf[Parallel]
>> res5: Boolean = true
>>
>> If I do this:
>>
>> for (e <- ls) println(e)
>>
>> This would somehow cause a an in-order execution, because the type of
>> ls doesn't mix in Parallel.
>>
>> But if I do this:
>>
>> for (e <- pls) println(e)
>>
>> It would happen in Parallel, because the type of generator pls mixes
>> in Parallel. That way if people have a method that takes a Seq[T],
>> say, and uses that in a for expression, then because Seq[T] isn't a
>> subtype of Parallel, it would do it in order (somehow). That means all
>> old code should not break.
>>
>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>
>> But new code that really wants parallel execution could just do .par
>> on that Seq[T], which would somehow kick it into parallel execution.
>>
>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>>
>> By saying the method takes a Seq[String] with Parallel, it is
>> announcing that it might do foreaches in parallel.
>>
>> I think it would be fine to let flatMap, filter, withFilter, map be
>> rewritten in the same way, because most times people wouldn't have
>> side effects in those. But foreach is all about the side effects.
>>
>> Bill
>>
>> On Sat, Apr 2, 2011 at 7:49 PM, Paul Phillips wrote:
>>> On 4/2/11 1:00 PM, martin odersky wrote:
>>>>
>>>> But yes, it is a worry that we'll have to think about this issue now
>>>> everywhere. Hopefully, it will translate into a much stronger push to be
>>>> fully functional.
>>>
>>> For the record I don't mind worrying about it everywhere from now on. I'm
>>> all for it. I just don't want to spend the rest of my life fixing bugs in
>>> pre-existing code.
>>>
>>>> If you write
>>>>
>>>> for { x <- pxs } ...
>>>>
>>>> where pxs is a parallel collection you expect the loop to be parallel.
>>>> So foreach needs to be parallel.
>>>
>>> That tells me the foreach at the base of the parallel collections has to be
>>> parallel, but not that the foreach we already have has to be. If that's the
>>> main motivation then we can re-root the parallel collections around a
>>> parallel foreach. I already pushed foreach backward for TraversableOnce,
>>> I'd be glad to push it sideways for this one.
>>>
>>> There's another interesting tie-in here, which I already had on my agenda
>>> for a while, ever since looking too closely at slice. There is a real issue
>>> with having the same trait provide default implementations for both mutable
>>> and immutable collections. The default implementation is either outright
>>> wrong or grossly inefficient half the time. So the side which was not
>>> chosen as the favorite side has to be sure every method implementation is
>>> overridden (and many bugs have been the result of not achieving this.) But
>>> what is the point of having default implementations if they are menaces
>>> which must be guarded against half the time?
>>>
>>> Example. What is the default implementation of tail? There are pretty much
>>> two choices.
>>>
>>> // if this is a mutable seq, you're now sharing mutable cells
>>> def tail = pointer to the second element of this collection
>>> // if this is an immutable seq, you're now copying a billion
>>> // elements for no reason
>>> def tail = copy this entire collection except the first
>>>
>>> For almost every method in TraversableLike this same issue arises. (The
>>> methods where one can safely use the same implementation are generally folds
>>> and other chew-up-the-sequence methods, and those are in TraversableOnce.)
>>> The conclusion I drew is that the mutable and immutable collections should
>>> not share any code past TraversableOnce. They should (according to this
>>> theory) share a pure interface in TraversableLike and each have their own
>>> default implementations.
>>>
>>> So back to the parallel foreach problem: same thing.
>>>
>>> trait MutableTraversableLike extends TraversableMethods
>>> trait ImmutableTraversableLike extends TraversableMethods
>>> trait ParallelTraversableLike extends TraversableMethods
>>>
>>> (I won't speculate on whether it should really be
>>> ParallelImmutableTraversableLike and so on or how these might compose unless
>>> I see a glimmer of hope we could go this way.)
>>>
>>>
>>
>>
>>
>> --
>> Bill Venners
>> Artima, Inc.
>> http://www.artima.com
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Sun, 2011-04-03, 07:57

#52

H-star Development

Joined: 2010-04-14,

Re: REPL: null-values in parallel ranges?

that is convincing that something has to be fixed. the question is what.
making foreach sequencial pretty much makes any parallel collection
sequencial again because many methods are based on foreach.
creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
would make it impossible to use parallel collections where the api
designer wanted to use single threaded ones but on the other hand
sacrifice compatibility with old code. the old collections would all be
single threaded ones, and in old code you could not use parallel ones,
even if you knew it would work.

the solution could be something like:
some collection methods could be annotated with @OnlySeq (like "zipped"
in the broken example), which would add a compile time conversion and a
runtime assertion. in the case of zipped, it's clear you want to keep
the order of both ranges so you annotate the method with @OnlySeq. the
compiler knows using parallel ranges makes no sense here -> compiler
warning and/or compiler inserted conversion to seq if you do it anyway.
if the collection is not explicitly declared as parallel, the compiler
cannot warn -> runtime type assertion

still , you'd have to be careful, but everyone knows that.

Am 03.04.2011 02:12, schrieb Paul Phillips:
> I looked around trunk a little bit for stuff which is broken because
> of this. Here are the first couple examples I found.
>
> // more "dynamic" hashcodes
> scala> val x = (1 to 10).par
> x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
> 5, 6, 7, 8, 9, 10)
>
> scala> x.hashCode
> res0: Int = 1137090302
>
> scala> x.hashCode
> res1: Int = -555212555
>
> scala> x.hashCode
> res2: Int = -1216588130
>
>
> // grab your partner, do-sa-do
> scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + "
> " + y))
> 3 2
> 6 3
> 2 4
> 7 6
> 1 1
> 8 7
> 4 5
> 5 9
> 9 8
> 10 10
>
>

Sun, 2011-04-03, 08:07

#53

Tony Morris 2

Joined: 2009-03-20,

Re: REPL: null-values in parallel ranges?

On 03/04/11 16:55, HamsterofDeath wrote:
> something has to be fixed. the question is what.
> ...
> many methods are based on foreach.

I think I know where the problem is.

Tony Morris
http://tmorris.net/

Sun, 2011-04-03, 08:47

#54

bjohanns

Joined: 2009-10-23,

Re: REPL: null-values in parallel ranges?

Hi,
I am glad we are discussing all this because I am expecting severe
trouble ahead with the current outline.

> creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
> would make it impossible to use parallel collections where the api
> designer wanted to use single threaded ones but on the other hand

I think it's a good and highly appreciable thing to have easy to use
parallel collections which can be easily introduced instead of
sequencial ones.
For what it's worth I don't think that "transparent" parallel
collections instead of sequencial collections is the way to go.
Sneaking in parallel behaviour means sneaking in errors as the code
being forcefully parallelized might never have been intended for
parallelization.

I would try to go for a design where all hitherto existing ordered /
sequencial traits / classes are a special case of unordered tolerant /
parallelizable collections. That is Seq and co. is narrowing down the
degree of freedom to sequencial behaviour. This way existing code
could continue to operate as expected.

A redesigner could review code in question and relax the type
signature of methods to allow for parallel stuff to come in. That way
the contract is clear and there is an easy road to adopt
parallelization. (I'm sure there are drawback to this approach as well
- but at least there is no automagical parallelization which will
cause automagic indeterministic errors).

The main problem I currently see is that we're a bit late with this
discussion as RC1 is already released. But better late than never...

Greetings
Bernd

Sun, 2011-04-03, 08:57

#55

Alex Cruise

Joined: 2008-12-17,

Re: REPL: null-values in parallel ranges?

On Sat, Apr 2, 2011 at 2:39 PM, Rex Kerr <ichoran@gmail.com> wrote:

But if, two years ago, I wrote for {x <- xs } ...
and stuck it in a library, I'm pretty sure I was _not_ expecting it to be parallel.

So you cannot possibly trust that code--unless you've inspected it yourself--to work sensibly in a parallel context. It might break horribly and obviously every time, or it might break subtly on rare occasions, or it might not break at all. The point is that you have no way to know which it is without looking at every use of the offending operations.

This is The Big One as far as I'm concerned. We may have learned not to care too much about binary compatibility, but we should still care about backward compatibility in its other, easier form. Grizzled, battle-scarred APIs are sitting out there in the field expecting that when someone sends them a 2003-vintage bottle of Seq it won't explode into a million pieces when you uncork it.
Toning down the metaphor a bit... It's really, really easy to come up with examples of useful, idiomatic methods that accept collections, foreach over them, and expect that actions will be taken in the order of the elements. I don't think there's much Scala code in the wild that uses iterator instead of foreach specifically to avoid this kind of implementation detail--leaving aside the question of whether that particular distinction is a *good* signal of one's desire for sequentiality.
-0xe1a

Sun, 2011-04-03, 09:27

#56

Jonas Bonér 3

Joined: 2009-11-25,

Re: REPL: null-values in parallel ranges?

I'm all with you Paul. This is clearly where we don't want to end up, with an impl that looks like something familiar that users know they can understand and reason about but then without any further notice turns semantics upside down and inside out.
In my view concurrency is something that needs to be explicit and embraced as it is and not disguised as familiar territory or we end up with a leaky abstraction that leaks so much that it is useless.

On 3 April 2011 04:23, Josh Suereth <joshua.suereth@gmail.com> wrote:

Yeah... you just convinced me.
foreach should stay sequential.. *OR* (crazy thought) Parallel collections should *not* derive from the current non-parallel collections. (I'm looking at you sequential-order-assuming Traversable base class).
Instead there can be some other types of interfaces that make sense in a parallel world that both parallel collections and non-parallel collections can derive from.
If you want parallelism, you're stuck with map/flatMap/filter. You want sequential, you convert back to a regular collection and win.
- Josh

On Sat, Apr 2, 2011 at 10:11 PM, Paul Phillips <paulp@improving.org> wrote:

On 4/2/11 6:18 PM, Josh Suereth wrote:

hashCode seems like a deal breaker...

So I may not have been paying attention. Are parallel collections
"invasive" in the sense that code needs to be sprinkled everywhere
throughout the std library to ensure correctness, or can we limit the
fixes just to the parallel collections themselves?

They are "invasive" in that sense. To use the hashCode example, here's why the .par hashCode is non-deterministic.

// SeqLike.scala
override def hashCode() = {
val h = new util.MurmurHash[A](Seq.hashSeed)
this.foreach(h)
h.hash
}

Easy enough to fix of course. But only with the aforementioned sprinkling. When the method was written one could pretty reasonably expect the same outcome on each run.

I'm having trouble imagining happy outcomes if we go forward with this.

--
Jonas Bonér

Specialist at Large
work: http://scalablesolutions.se
code: http://akka.io
blog: http://jonasboner.com
twtr: @jboner

Sun, 2011-04-03, 09:37

#57

roland.kuhn

Joined: 2011-02-21,

Re: REPL: null-values in parallel ranges?

+1 (I knew you could say it so much more convincingly)

While the old types continue to keep their behavior, the "default" Map etc. could change to parallel, making for a pretty compile time error when passing them to legacy code. Problem solved.

On Apr 3, 2011, at 4:49, Paul Phillips wrote:

> On 4/2/11 1:00 PM, martin odersky wrote:
>> But yes, it is a worry that we'll have to think about this issue now
>> everywhere. Hopefully, it will translate into a much stronger push to be
>> fully functional.
>
> For the record I don't mind worrying about it everywhere from now on. I'm all for it. I just don't want to spend the rest of my life fixing bugs in pre-existing code.
>
>> If you write
>>
>> for { x <- pxs } ...
>>
>> where pxs is a parallel collection you expect the loop to be parallel.
>> So foreach needs to be parallel.
>
> That tells me the foreach at the base of the parallel collections has to be parallel, but not that the foreach we already have has to be. If that's the main motivation then we can re-root the parallel collections around a parallel foreach. I already pushed foreach backward for TraversableOnce, I'd be glad to push it sideways for this one.
>
> There's another interesting tie-in here, which I already had on my agenda for a while, ever since looking too closely at slice. There is a real issue with having the same trait provide default implementations for both mutable and immutable collections. The default implementation is either outright wrong or grossly inefficient half the time. So the side which was not chosen as the favorite side has to be sure every method implementation is overridden (and many bugs have been the result of not achieving this.) But what is the point of having default implementations if they are menaces which must be guarded against half the time?
>
> Example. What is the default implementation of tail? There are pretty much two choices.
>
> // if this is a mutable seq, you're now sharing mutable cells
> def tail = pointer to the second element of this collection
> // if this is an immutable seq, you're now copying a billion
> // elements for no reason
> def tail = copy this entire collection except the first
>
> For almost every method in TraversableLike this same issue arises. (The methods where one can safely use the same implementation are generally folds and other chew-up-the-sequence methods, and those are in TraversableOnce.) The conclusion I drew is that the mutable and immutable collections should not share any code past TraversableOnce. They should (according to this theory) share a pure interface in TraversableLike and each have their own default implementations.
>
> So back to the parallel foreach problem: same thing.
>
> trait MutableTraversableLike extends TraversableMethods
> trait ImmutableTraversableLike extends TraversableMethods
> trait ParallelTraversableLike extends TraversableMethods
>
> (I won't speculate on whether it should really be ParallelImmutableTraversableLike and so on or how these might compose unless I see a glimmer of hope we could go this way.)
>

Sun, 2011-04-03, 10:07

#58

fanf

Joined: 2009-03-17,

Re: REPL: null-values in parallel ranges?

On 03/04/2011 04:49, Paul Phillips wrote:
>
> For the record I don't mind worrying about it everywhere from now on.
> I'm all for it. I just don't want to spend the rest of my life fixing
> bugs in pre-existing code.

I'm all with that. Without the certainty that using parallel collection
won't break existing code, I just won't take the chance. And if other
take it with my code, I see an interesting times coming.

Moreover I think that (at least for me), preservation of order of things
in Seq is not something I test really often... When I want unordered
collection, I try to make it explicit with Sets.

Most of the time, it's not a matter, but you never know where side
effects will come from...

[...]
> That tells me the foreach at the base of the parallel collections has to
> be parallel, but not that the foreach we already have has to be. If
> that's the main motivation then we can re-root the parallel collections
> around a parallel foreach. I already pushed foreach backward for
> TraversableOnce, I'd be glad to push it sideways for this one.
>
> There's another interesting tie-in here, which I already had on my
> agenda for a while, ever since looking too closely at slice. There is a
> real issue with having the same trait provide default implementations
> for both mutable and immutable collections.
[...]

I'm all with you here (for what it matters...).
With slices, you showed that even in a piece of code like the Scala
standard library, developed, looked and tested by lots of competent
people, that strategy leads to big errors - it's just too hard to keep
in mind all the implicit requirements and contracts of a complex code.

Cheers,

Sun, 2011-04-03, 16:27

#59

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi All,

On Sat, Apr 2, 2011 at 11:55 PM, HamsterofDeath wrote:
> that is convincing that something has to be fixed. the question is what.
> making foreach sequencial pretty much makes any parallel collection
> sequencial again because many methods are based on foreach.
>
I think that's OK, because those implementations can be changed so
they don't rely on foreach, either prior to 2.9.0.final or after. The
penalty before that is they wouldn't be as fast as the parallel
foreach versions, but they'd be as fast as they are now in 2.8. I can
live with that.

I slept on this and fleshed this proposal out a bit more in my mind.
Traversable is at the top of the hierarchy, and its abstract method is
foreach. So there's nothing to define the order other than foreach
itself. So there I think it's contract would need to stay the same:

Applies a function f to all elements of this collection.

Or, state that it will run sequentially not in parallel, but without
specifying any order:

Applies a function f to all elements of this collection sequentially
in an undefined order (but not in parallel).

Iterable however adds the iterator method, and that defines an order.
The contract of foreach in both Iterator could be changed from:

Applies a function f to all values produced by this iterator.

to:

Applies a function f to all values produced by this iterator
sequentially in the order returned by repeated invocations of the next
method.

And the contract of Iterable's foreach could be changed from:

Applies a function f to all elements of this iterable collection.

to:

Applies a function f to all elements of this iterable collection
sequentially in the order returned by repeated invocations of the next
method on the iterator returned by the iterator method.

Add a foreachParallel method to Iterator that has the contract:

Applies a function f to all elements of this collection in parallel.

And add a foreachParallel method to Iterable that has the contract:

Applies a function f to all elements of this iterable collection in parallel.

The rules for for expression translation could be left alone, at least
for now. What that would mean is that anytime you don't have a yield,
which would result in a foreach called, you'd get sequential behavior,
at least for the foreach part. That again could mean some things might
run more slowly than if they were parallelized, but I'm OK with that.
They'd run as fast as they are running now. And if someone wants to
really parallelize a side effecting loop, they could just invoke
foreachParallel passing in a function rather than using a for
expression.

I don't have a problem with map, flatMap, filter, withFilter, and any
other "data transformation" style higher order functions on
collections doing things sequentially or in parallel depending on the
type of the object. Most likely some people have written code that
does a map, for example, and passes a transformation function to it
that has a side effect. That code will break and I think we should
allow that to happen. It should be very little code I'd guess, because
people usually just pass data transformers to methods like map. And it
is worth it.

But foreach is not about data transformation. It's about side effects.
Because it doesn't have a result, if it doesn't have a side effect it
has no impact on the program other than heating up the CPU. And to do
that kind of loop in parallel means you have to write thread safe
code. No one has been doing that, and it would be very hard to do in
the future. And it would be non-intuitive because this looks like
sequential-in-order for loops in any other language anyone would be
coming from:

for (arg <- args)
println(arg)

I think this really ought to keep working like it always has, printing
out args in iterator order, no matter the type of args. Yesterday I
was asking how I would get sequential behavior out of a for
expression. Well the above bit of code should be how. That's how
everyone expects to get an in-order execution.

Narrowing the contracts of foreach in Traversable, Iterator, and
Iterable could break some code, but I doubt much. What it in effect
does is change it to the contract that's been in people's heads, which
they've been programming to. I think much more code could break, and
sometimes in hard -to-catch, subtle ways, if a collection you are
passed does foreach in parallel.

I don't know the details of how parallel collections are implemented,
but I'm hopeful that the places where data transformation methods are
implemented in terms of foreach, an invocation of foreachParallel
could easily be substituted. Then you'd get your same parallel
behavior back for those method that we have now in RC1.

What do folks think? Would that address the problem sufficiently?

Bill

> creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
> would make it impossible to use parallel collections where the api
> designer wanted to use single threaded ones but on the other hand
> sacrifice compatibility with old code. the old collections would all be
> single threaded ones, and in old code you could not use parallel ones,
> even if you knew it would work.
>
> the solution could be something like:
> some collection methods could be annotated with @OnlySeq (like "zipped"
> in the broken example), which would add a compile time conversion and a
> runtime assertion. in the case of zipped, it's clear you want to keep
> the order of both ranges so you annotate the method with @OnlySeq. the
> compiler knows using parallel ranges makes no sense here -> compiler
> warning and/or compiler inserted conversion to seq if you do it anyway.
> if the collection is not explicitly declared as parallel, the compiler
> cannot warn -> runtime type assertion
>
> still , you'd have to be careful, but everyone knows that.
>
>
>
>
> Am 03.04.2011 02:12, schrieb Paul Phillips:
>> I looked around trunk a little bit for stuff which is broken because
>> of this. Here are the first couple examples I found.
>>
>> // more "dynamic" hashcodes
>> scala> val x = (1 to 10).par
>> x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
>> 5, 6, 7, 8, 9, 10)
>>
>> scala> x.hashCode
>> res0: Int = 1137090302
>>
>> scala> x.hashCode
>> res1: Int = -555212555
>>
>> scala> x.hashCode
>> res2: Int = -1216588130
>>
>>
>> // grab your partner, do-sa-do
>> scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + "
>> " + y))
>> 3 2
>> 6 3
>> 2 4
>> 7 6
>> 1 1
>> 8 7
>> 4 5
>> 5 9
>> 9 8
>> 10 10
>>
>>
>
>

Sun, 2011-04-03, 16:37

#60

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi All,

Forgot to mention one thing. The scala.collection.Parallel trait would
declare the foreachParallel method, so that code can call foreach or
foreachParallel, depending on whether the collection is parallel one:

coll match {
case pColl: Parallel => pColl.foreachParallel(threadSafeF)
case _ => coll.foreach(threadSafeF)
}

So in side effecting code, it is explicit whether it is sequential or
parallel, but in data transformation code, you'd just say:

col.map(f)

And frankly, most of the time I think people would say:

col.foreach(f)

Because it is hard to write a thread safe f.

Bill

On Sun, Apr 3, 2011 at 8:20 AM, Bill Venners wrote:
> Hi All,
>
> On Sat, Apr 2, 2011 at 11:55 PM, HamsterofDeath wrote:
>> that is convincing that something has to be fixed. the question is what.
>> making foreach sequencial pretty much makes any parallel collection
>> sequencial again because many methods are based on foreach.
>>
> I think that's OK, because those implementations can be changed so
> they don't rely on foreach, either prior to 2.9.0.final or after. The
> penalty before that is they wouldn't be as fast as the parallel
> foreach versions, but they'd be as fast as they are now in 2.8. I can
> live with that.
>
> I slept on this and fleshed this proposal out a bit more in my mind.
> Traversable is at the top of the hierarchy, and its abstract method is
> foreach. So there's nothing to define the order other than foreach
> itself. So there I think it's contract would need to stay the same:
>
> Applies a function f to all elements of this collection.
>
> Or, state that it will run sequentially not in parallel, but without
> specifying any order:
>
> Applies a function f to all elements of this collection sequentially
> in an undefined order (but not in parallel).
>
> Iterable however adds the iterator method, and that defines an order.
> The contract of foreach in both Iterator could be changed from:
>
> Applies a function f to all values produced by this iterator.
>
> to:
>
> Applies a function f to all values produced by this iterator
> sequentially in the order returned by repeated invocations of the next
> method.
>
> And the contract of Iterable's foreach could be changed from:
>
> Applies a function f to all elements of this iterable collection.
>
> to:
>
> Applies a function f to all elements of this iterable collection
> sequentially in the order returned by repeated invocations of the next
> method on the iterator returned by the iterator method.
>
> Add a foreachParallel method to Iterator that has the contract:
>
> Applies a function f to all elements of this collection in parallel.
>
> And add a foreachParallel method to Iterable that has the contract:
>
> Applies a function f to all elements of this iterable collection in parallel.
>
> The rules for for expression translation could be left alone, at least
> for now. What that would mean is that anytime you don't have a yield,
> which would result in a foreach called, you'd get sequential behavior,
> at least for the foreach part. That again could mean some things might
> run more slowly than if they were parallelized, but I'm OK with that.
> They'd run as fast as they are running now. And if someone wants to
> really parallelize a side effecting loop, they could just invoke
> foreachParallel passing in a function rather than using a for
> expression.
>
> I don't have a problem with map, flatMap, filter, withFilter, and any
> other "data transformation" style higher order functions on
> collections doing things sequentially or in parallel depending on the
> type of the object. Most likely some people have written code that
> does a map, for example, and passes a transformation function to it
> that has a side effect. That code will break and I think we should
> allow that to happen. It should be very little code I'd guess, because
> people usually just pass data transformers to methods like map. And it
> is worth it.
>
> But foreach is not about data transformation. It's about side effects.
> Because it doesn't have a result, if it doesn't have a side effect it
> has no impact on the program other than heating up the CPU. And to do
> that kind of loop in parallel means you have to write thread safe
> code. No one has been doing that, and it would be very hard to do in
> the future. And it would be non-intuitive because this looks like
> sequential-in-order for loops in any other language anyone would be
> coming from:
>
> for (arg <- args)
> println(arg)
>
> I think this really ought to keep working like it always has, printing
> out args in iterator order, no matter the type of args. Yesterday I
> was asking how I would get sequential behavior out of a for
> expression. Well the above bit of code should be how. That's how
> everyone expects to get an in-order execution.
>
> Narrowing the contracts of foreach in Traversable, Iterator, and
> Iterable could break some code, but I doubt much. What it in effect
> does is change it to the contract that's been in people's heads, which
> they've been programming to. I think much more code could break, and
> sometimes in hard -to-catch, subtle ways, if a collection you are
> passed does foreach in parallel.
>
> I don't know the details of how parallel collections are implemented,
> but I'm hopeful that the places where data transformation methods are
> implemented in terms of foreach, an invocation of foreachParallel
> could easily be substituted. Then you'd get your same parallel
> behavior back for those method that we have now in RC1.
>
> What do folks think? Would that address the problem sufficiently?
>
> Bill
>
>> creating a branched (singlethreadtravlike, parthreadtravlike) hierachy
>> would make it impossible to use parallel collections where the api
>> designer wanted to use single threaded ones but on the other hand
>> sacrifice compatibility with old code. the old collections would all be
>> single threaded ones, and in old code you could not use parallel ones,
>> even if you knew it would work.
>>
>> the solution could be something like:
>> some collection methods could be annotated with @OnlySeq (like "zipped"
>> in the broken example), which would add a compile time conversion and a
>> runtime assertion. in the case of zipped, it's clear you want to keep
>> the order of both ranges so you annotate the method with @OnlySeq. the
>> compiler knows using parallel ranges makes no sense here -> compiler
>> warning and/or compiler inserted conversion to seq if you do it anyway.
>> if the collection is not explicitly declared as parallel, the compiler
>> cannot warn -> runtime type assertion
>>
>> still , you'd have to be careful, but everyone knows that.
>>
>>
>>
>>
>> Am 03.04.2011 02:12, schrieb Paul Phillips:
>>> I looked around trunk a little bit for stuff which is broken because
>>> of this. Here are the first couple examples I found.
>>>
>>> // more "dynamic" hashcodes
>>> scala> val x = (1 to 10).par
>>> x: scala.collection.parallel.immutable.ParRange = ParRange(1, 2, 3, 4,
>>> 5, 6, 7, 8, 9, 10)
>>>
>>> scala> x.hashCode
>>> res0: Int = 1137090302
>>>
>>> scala> x.hashCode
>>> res1: Int = -555212555
>>>
>>> scala> x.hashCode
>>> res2: Int = -1216588130
>>>
>>>
>>> // grab your partner, do-sa-do
>>> scala> (1 to 10 par, 1 to 10).zipped foreach ((x, y) => println(x + "
>>> " + y))
>>> 3 2
>>> 6 3
>>> 2 4
>>> 7 6
>>> 1 1
>>> 8 7
>>> 4 5
>>> 5 9
>>> 9 8
>>> 10 10
>>>
>>>
>>
>>
>
>
>
> --
> Bill Venners
> Artima, Inc.
> http://www.artima.com
>

Mon, 2011-04-04, 08:37

#61

Chris Marshall

Joined: 2009-06-17,

RE: REPL: null-values in parallel ranges?

Scalaz's foreach is called |>|
This could be intuitively extended to a parallel version ("parallelForeach", "foreachParallel" == UGH!) of ||>||
Is it about time we started adding symbolically-named methods into the core library? Often their intent is just as clear as the English equivalent
Chris

> Date: Sun, 3 Apr 2011 17:00:23 +1000
> From: tonymorris@gmail.com
> To: scala-user@googlegroups.com
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
>
> On 03/04/11 16:55, HamsterofDeath wrote:
> > something has to be fixed. the question is what.
> > ...
> > many methods are based on foreach.
>
> I think I know where the problem is.
>
>
> Tony Morris
> http://tmorris.net/
>
>

Mon, 2011-04-04, 08:47

#62

Jason Zaugg

Joined: 2009-05-18,

Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 9:34 AM, Chris Marshall wrote:
> Scalaz's foreach is called |>|
> This could be intuitively extended to a parallel version ("parallelForeach",
> "foreachParallel" == UGH!) of ||>||
> Is it about time we started adding symbolically-named methods into the core
> library? Often their intent is just as clear as the English equivalent

At this stage, the only interesting question is whether there are one
or two names that encompass the sequential and parallel natures of
foreach. Let's not distract this (important!) discussion with a side
debate over what the names might be and how long they are.

-jason

Mon, 2011-04-04, 08:57

#63

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

On 02/04/11 23:02, Jonas Bonér wrote:
>
> I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads does that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People needs to understand that concurrency and parallelism is here to stay and a lot of the time it still ain't that pretty.
>

I agree that people should get used to the fact that parallelism is
out there. The only downside here is that if one really wants to be
sure the code always executes correctly, a user has to write:

for (x <- xs.seq) ...

all the time.

The problem is that someone might not know he's asking for parallel
execution.

> ...
> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>
> But new code that really wants parallel execution could just do .par
> on that Seq[T], which would somehow kick it into parallel execution.
>
> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>
> By saying the method takes a Seq[String] with Parallel, it is
> announcing that it might do foreaches in parallel.
>
> I think it would be fine to let flatMap, filter, withFilter, map be
> rewritten in the same way, because most times people wouldn't have
> side effects in those. But foreach is all about the side effects.
>
> Bill

I agree something like a compiler rewrite might possibly be a good
idea.
However, adding an additional `pareach` method means for-
comprehensions no longer work.
A couple of alternatives I see are:
- have `foreach` being sequential, and add a method `inPar` to
parallel collections, which returns a thin wrapper around the
collection such that its `foreach` is executed in parallel (forwards
to `pareach`). The compiler could then rewrite the for-comprehension
like this if it detects that the generator isInstanceOf[Parallel] at
compile time:

for (x <- xs) { ... }

<>

--> for (x <- xs.inPar) { ... }

(now it calls `foreach` of the thin wrapper which forwards to
`pareach`)

This does not solve the problem with map, flatMap, filter, and
friends, but it is a start - something like this could work for them.

- have a compiler rewrite rule which adds a `.seq` call to the
generator if it knows it is a collection, but not an instance of
`Parallel` at compile time:

for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
3).seq)

but:

for (x <- Iterable(1, 2, 3).par) { ... } stays what it is

- a non-rewriting approach - have a parallel block which envelops the
for-comprehension, ensuring that everything inside executes in
parallel:

parallel {
for ( ... ) {
// ...
}
}

This requires the user to be more explicit. The downside is that
nested methods called in the for-body might be affected.

Mon, 2011-04-04, 09:07

#64

Chris Marshall

Joined: 2009-06-17,

RE: REPL: null-values in parallel ranges?

But it seems feasible (at least to me) that opinions on that matter are swayed by the possible names that such a secondary method may take!

> Date: Mon, 4 Apr 2011 09:44:50 +0200
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: jzaugg@gmail.com
> To: oxbow_lakes@hotmail.com
> CC: tmorris@tmorris.net; scala-user@googlegroups.com
>
> On Mon, Apr 4, 2011 at 9:34 AM, Chris Marshall <oxbow_lakes@hotmail.com> wrote:
> > Scalaz's foreach is called |>|
> > This could be intuitively extended to a parallel version ("parallelForeach",
> > "foreachParallel" == UGH!) of ||>||
> > Is it about time we started adding symbolically-named methods into the core
> > library? Often their intent is just as clear as the English equivalent
>
> At this stage, the only interesting question is whether there are one
> or two names that encompass the sequential and parallel natures of
> foreach. Let's not distract this (important!) discussion with a side
> debate over what the names might be and how long they are.
>
> -jason

Mon, 2011-04-04, 09:17

#65

Bill Venners

Joined: 2008-12-18,

Re: REPL: null-values in parallel ranges?

Hi Chris,

On Mon, Apr 4, 2011 at 12:34 AM, Chris Marshall wrote:
> Scalaz's foreach is called |>|
> This could be intuitively extended to a parallel version ("parallelForeach",
> "foreachParallel" == UGH!) of ||>||
>
"||>||" == &%*#$@%!

There's the comfy with operators sub-culture and the not-as-comfy with
operators subculture. Anyway, as someone else said, I think the
important question is whether there should be another foreach method
at all. Only if the answer is yes does it matter what its name is.

Bill

> Is it about time we started adding symbolically-named methods into the core
> library? Often their intent is just as clear as the English equivalent
> Chris
>
>> Date: Sun, 3 Apr 2011 17:00:23 +1000
>> From: tonymorris@gmail.com
>> To: scala-user@googlegroups.com
>> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
>>
>> On 03/04/11 16:55, HamsterofDeath wrote:
>> > something has to be fixed. the question is what.
>> > ...
>> > many methods are based on foreach.
>>
>> I think I know where the problem is.
>>
>>
>> Tony Morris
>> http://tmorris.net/
>>
>>
>

Mon, 2011-04-04, 09:27

#66

Bill Venners

Joined: 2008-12-18,

Re: Re: REPL: null-values in parallel ranges?

Hi Aleksandar,

On Mon, Apr 4, 2011 at 12:51 AM, Aleksandar Prokopec
wrote:
> On 02/04/11 23:02, Jonas Bonér wrote:
>>
>> I don't think parallel collections should "lie" and provide a leaky abstraction over concurrency. When someone asks for parallel execution I think it is best that they get what they are asking for. Sure it might break sequential code, but threads does that; they turn determinism into non-determinism. Let's embrace parallelism at its core and reap the benefits of it rather than trying to force it into something that it is not. It is a matter of education; documentation needs to be a lot better. People needs to understand that concurrency and parallelism is here to stay and a lot of the time it still ain't that pretty.
>>
>
> I agree that people should get used to the fact that parallelism is
> out there. The only downside here is that if one really wants to be
> sure the code always executes correctly, a user has to write:
>
> for (x <- xs.seq) ...
>
> all the time.
>
> The problem is that someone might not know he's asking for parallel
> execution.
>
>> ...
>> def oldMethod(seq: Seq[String]) { for (e <- seq) println(e) }
>>
>> But new code that really wants parallel execution could just do .par
>> on that Seq[T], which would somehow kick it into parallel execution.
>>
>> def newMethod(pSeq: Seq[String] with Parallel) { for (e <- pSeq) println(e) }
>>
>> By saying the method takes a Seq[String] with Parallel, it is
>> announcing that it might do foreaches in parallel.
>>
>> I think it would be fine to let flatMap, filter, withFilter, map be
>> rewritten in the same way, because most times people wouldn't have
>> side effects in those. But foreach is all about the side effects.
>>
>> Bill
>
> I agree something like a compiler rewrite might possibly be a good
> idea.
> However, adding an additional `pareach` method means for-
> comprehensions no longer work.
>
Actually after sleeping on it last night, I emailed a different note
this morning in which I changed my tune. I suggested the rules for
translating for expressions to invocations of foreach, withFilter,
map, and flatMap should stay the same as they are now. For expressions
are about more than just collections.

Now to your suggestion that adding a separate pareach method would
make for expressions not work, I'd say they still work like they
always did, but just that foreach is not something that will be done
in parallel in parallel collections, whereas map, flatMap, and
withFilter (and a bunch of other higher order methods on collections)
will be done in parallel on parallel collections. That would mean you
could never get a parallel foreach out of a for expression, and I
don't think that's so bad. A parallel foreach requires a thread-safe
side-effecting function, which is hard to write. So making the method
name long and (to some) ugly, such as parallelForeach or
foreachParallel, discourages its use for frivolous endeavors, and so
does not providing a way to get a parallel foreach out of a for
expression.

Bill

> A couple of alternatives I see are:
> - have `foreach` being sequential, and add a method `inPar` to
> parallel collections, which returns a thin wrapper around the
> collection such that its `foreach` is executed in parallel (forwards
> to `pareach`). The compiler could then rewrite the for-comprehension
> like this if it detects that the generator isInstanceOf[Parallel] at
> compile time:
>
> for (x <- xs) { ... }
>
> <>
>
> --> for (x <- xs.inPar) { ... }
>
> (now it calls `foreach` of the thin wrapper which forwards to
> `pareach`)
>
> This does not solve the problem with map, flatMap, filter, and
> friends, but it is a start - something like this could work for them.
>
> - have a compiler rewrite rule which adds a `.seq` call to the
> generator if it knows it is a collection, but not an instance of
> `Parallel` at compile time:
>
> for (x <- Iterable(1, 2, 3)) { ... } becomes for (x <- Iterable(1, 2,
> 3).seq)
>
> but:
>
> for (x <- Iterable(1, 2, 3).par) { ... } stays what it is
>
> - a non-rewriting approach - have a parallel block which envelops the
> for-comprehension, ensuring that everything inside executes in
> parallel:
>
> parallel {
> for ( ... ) {
> // ...
> }
> }
>
> This requires the user to be more explicit. The downside is that
> nested methods called in the for-body might be affected.
>
>

Mon, 2011-04-04, 09:27

#67

Chris Marshall

Joined: 2009-06-17,

RE: REPL: null-values in parallel ranges?

Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill@artima.com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

Mon, 2011-04-04, 09:37

#68

Tony Morris 2

Joined: 2009-03-20,

Re: REPL: null-values in parallel ranges?

On 04/04/11 18:19, Chris Marshall wrote:
> indication that there is something wrong with the fundamental design
Just smile Chris :)

Mon, 2011-04-04, 09:47

#69

Aleksandar Prokopec

Joined: 2010-04-04,

Re: Re: REPL: null-values in parallel ranges?

Hi, Bill,

> Now to your suggestion that adding a separate pareach method would
> make for expressions not work, I'd say they still work like they
> always did, but just that foreach is not something that will be done
> in parallel in parallel collections, whereas map, flatMap, and
> withFilter (and a bunch of other higher order methods on collections)
> will be done in parallel on parallel collections.

Of course, that's what I've meant - one can still call `pareach`, but
cannot use it in a nice way as before.

> That would mean you
> could never get a parallel foreach out of a for expression, and I
> don't think that's so bad. A parallel foreach requires a thread-safe
> side-effecting function, which is hard to write. So making the method
> name long and (to some) ugly, such as parallelForeach or
> foreachParallel, discourages its use for frivolous endeavors, and so
> does not providing a way to get a parallel foreach out of a for
> expression.
>
> Bill
>

I see merit in this - and I agree. Foreach is about side-effecting, and
most of the time it's sideeffecting in a non-thread-safe way. Just for
the record, the method you mention was already part of the framework,
but was removed some revisions back (used to be called `pforeach`, but I
like `pareach` much more).

But, even changing this, does not mean that the legacy code will be
safe. The `pareach` alternative is still a rule of the thumb - `map`,
`flatMap` and others still only usually do not contain sideeffecting
expressions. In cases where they do, things may still fall apart.

Alex

Mon, 2011-04-04, 09:47

#70

odersky

Joined: 2008-07-29,

Re: Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 10:41 AM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:

> It isn't taking me long trying .par in the repl to see all kinds of
> crazy stuff. The main thing is that foreach seems to return things in a
> random order and iterator doesn't. That doesn't sound like something
> intentional, but I try not to assume too much. But it is likely at the
> heart of what you are seeing too.
>

We used to have `pforeach` being implemented in parallel, and
`foreach` sequential, but this was changed recently.
I'm still not sure if that was the best thing to do, but, yes, it was
intentional.

In fact, it was on my insistence that we changed this. So I take the blame of the storm that unfolded. That's good, however. It's how we learn things and make progress.

Cheers

-- Martin

Mon, 2011-04-04, 09:57

#71

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

> It isn't taking me long trying .par in the repl to see all kinds of
> crazy stuff. The main thing is that foreach seems to return things in a
> random order and iterator doesn't. That doesn't sound like something
> intentional, but I try not to assume too much. But it is likely at the
> heart of what you are seeing too.
>

We used to have `pforeach` being implemented in parallel, and
`foreach` sequential, but this was changed recently.
I'm still not sure if that was the best thing to do, but, yes, it was
intentional.

> scala> def fx = (1 to 5).par
> fx: scala.collection.parallel.immutable.ParRange
>
> scala> fx foreach println
> 1
> 3
> 4
> 2
> 5
>
> scala> fx foreach println
> 2
> 3
> 1
> 5
> 4
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5

Mon, 2011-04-04, 10:07

#72

odersky

Joined: 2008-07-29,

Re: Re: REPL: null-values in parallel ranges?

I see two possible ways out of this:

1) Fiddle with foreach, so that it and for-loops are sequential for parallel collections. You still need a parallel foreach, for instance in situations like this:

(1 until N) pforeach { i => a(i) = a(i) * scale }

2) Change the inheritance hierarchy. The problem is that our current design weakens the contract for Traversable, Iterable, Seq and friends. Their operations might now be executed in parallel. It's most glaring in foreach, but there are other operations where this can make a difference. What's more, there's no way to specify sequential execution in the type system except going down to actual implementation classes. This is a bit having it backwards. First, sequential Seq is conceptionally a subclass of potentially parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful to state that something is sequential than that something is potentially parallel. But right now, we have explicit parallel collections, but no explicit sequential collections.

I see two possible redesigns:

a) Reverse the inheritance order, so that

   Traversable <: ParTraversable
   Iterable <: ParIterable
   Seq <: ParSeq
   ...

This means that all existing code will be safe from surprises, and no existing code can make use of parallel execution. Note that this is conceptually sound, as Par... means "might be run in parallel", not "will be run in parallel". A problem with this approach is that, without further refactorings, sequential collections inherit some unnecessary plumbing operations from parallel collections.

b) Have both implementations side by side. E.g. have a base class collection.Seq with subclasses

collection.immutable.Seq
collection.mutable.Seq
collection.parimmutable.Seq
collection.parmutable.Seq

There is no inheritance relationship between the 4 different kinds of Seq. In this design, code that requires a collection.Seq can be passed a parallel Seq, but code the requires a mutable.Seq or immutable.Seq cannot.

Something to mull over...

Cheers

-- Martin

Mon, 2011-04-04, 10:17

#73

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges - Seq.foreach acting un

I would just like to point out that this is a bit more general - it's
not only about order, it's also about thread-safety. The problem is
not inherent in parallel sequences, but in any kind of a parallel
collection - processing an element may occur concurrently with
processing some other element. An out of order execution is a
consequence of this. This is why the ParRange printing used to show
nulls.

Alex

On Apr 2, 4:50 pm, HamsterofDeath wrote:
> a parallel sequence can replace a sequencial sequence (feels weird
> writing that) for every method except foreach. maybe there should be an
> ordered-foreach-able trait and a foreachable-random-order-trait? then it
> could be added to any collection that can do it.
>

Mon, 2011-04-04, 10:17

#74

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

> Overkill -- only Seq is at issue here.
>

I'm afraid I do not agree here - the problem is with any kind of a
parallel collection the foreach is called for.

Whereas the some of the regular collections may say "order is not
deterministic", they still all implicitly say "while order may not be
deterministic, you have the guarantee that I will not start processing
the next element before I'm done with the previous". Parallel
collection break this rule. Even if you do not require that the order
is specified, having a mutable.HashSet and inserting elements into it
concurrently will break things.

Alex

Mon, 2011-04-04, 10:27

#75

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

I do not believe this should be done. Almost every method takes a
closure, and every closure could be mutable. If you want complete
safety, and not rule-of-the-tumb safety, one would then need to
replace all the methods.

Alex

>
> Parallel collections should add methods when it makes a difference.
>
> There aren't _that_ many that even make sense to parallelize and where
> out-of-order simultaneous execution matters. For seq:
> - collect
> - corresponds
> - count
> - exists
> - filter / filterNot / withFilter
> - find
> - flatMap
> - forall
> - foreach
> - groupBy
> - map / reverseMap
> - partition
> - reduce
> - segmentLength
> - sortBy / sortWith
> - transpose
>
> The relevant methods are easy to identify--if they take a function and they
> do not intrinsically start at one end of the collection and go from there
> (e.g. firstIndexOf), then they need their own method since it is very easy
> to pass in a closure that has mutable state and thereby mess everything up
> in parallel. Not only do you have all the "well, this didn't happen in
> order" problems, you also have the "ack, this underlying structure isn't
> thread safe" problem (which can be solved by introducing the "ack,
> everything is synchronized and it runs so slow" problem).
>
> I'm sorry that I haven't been paying much attention until we're already into
> release candidates, but I really don't see how one can avoid making separate
> parallel methods for these things without mysteriously and unexpectedly
> breaking all sorts of code that's out there.
>
> I strongly advise that these things be named as new methods, or that they
> all go into a trait that is NOT shared by the parallel collections hierarchy
> so that people can realize when they use them that they have to re-think
> what the code might be doing.
>
> --Rex

Mon, 2011-04-04, 10:27

#76

jibal

Joined: 2010-12-01,

Re: REPL: null-values in parallel ranges?

Frankly, I think that's absurd. The second name doesn't "have to" come from anywhere, and the name choice says nothing about the fundamental design.

The reality is that the contract for what is called "foreach", as a matter of expectation and of implementation (prior to parallel collections), even though not explicitly documented, is "forEachInSequence". Had such a name been chosen instead of "foreach", that would not have made the design any more or less wrong, "fundamentally" or otherwise. Ditto for giving what is by contract "forEachWithoutRegardToSequence" an "unwieldy" name or a short one.

If you think there's something wrong with the fundamental design then you should say what, rather than resting on totally bogus inferences.

-- Jim

On Mon, Apr 4, 2011 at 1:19 AM, Chris Marshall <oxbow_lakes@hotmail.com> wrote:

Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill@artima.com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

Mon, 2011-04-04, 10:27

#77

Tommaso Galleri

Joined: 2011-02-04,

RE: Re: REPL: null-values in parallel ranges?

v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);}

If there was a need to change things, out of the three options, I think (b) would probably be the most appropriate.

From: scala-user@googlegroups.com [mailto:scala-user@googlegroups.com] On Behalf Of martin odersky
Sent: 04 April 2011 09:50
To: Aleksandar Prokopec
Cc: Bill Venners; scala-user
Subject: Re: [scala-user] Re: REPL: null-values in parallel ranges?

I see two possible ways out of this:

1) Fiddle with foreach, so that it and for-loops are sequential for parallel collections. You still need a parallel foreach, for instance in situations like this:

(1 until N) pforeach { i => a(i) = a(i) * scale }

2) Change the inheritance hierarchy. The problem is that our current design weakens the contract for Traversable, Iterable, Seq and friends. Their operations might now be executed in parallel. It's most glaring in foreach, but there are other operations where this can make a difference. What's more, there's no way to specify sequential execution in the type system except going down to actual implementation classes. This is a bit having it backwards. First, sequential Seq is conceptionally a subclass of potentially parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful to state that something is sequential than that something is potentially parallel. But right now, we have explicit parallel collections, but no explicit sequential collections.

I see two possible redesigns:

a) Reverse the inheritance order, so that

   Traversable <: ParTraversable
   Iterable <: ParIterable
   Seq <: ParSeq
   ...

This means that all existing code will be safe from surprises, and no existing code can make use of parallel execution. Note that this is conceptually sound, as Par... means "might be run in parallel", not "will be run in parallel". A problem with this approach is that, without further refactorings, sequential collections inherit some unnecessary plumbing operations from parallel collections.

b) Have both implementations side by side. E.g. have a base class collection.Seq with subclasses

collection.immutable.Seq
collection.mutable.Seq
collection.parimmutable.Seq
collection.parmutable.Seq

There is no inheritance relationship between the 4 different kinds of Seq. In this design, code that requires a collection.Seq can be passed a parallel Seq, but code the requires a mutable.Seq or immutable.Seq cannot.

Something to mull over...

Cheers

-- Martin

The information included in this email and any files transmitted with it may contain information that is confidential and it must not be used by, or its contents or attachments copied or disclosed, to persons other than the intended addressee. If you have received this email in error, please notify BJSS. In the absence of written agreement to the contrary BJSS' relevant standard terms of contract for any work to be undertaken will apply. Please carry out virus or such other checks as you consider appropriate in respect of this email. BJSS do not accept responsibility for any adverse effect upon your system or data in relation to this email or any files transmitted with it. BJSS Limited, a company registered in England and Wales (Company Number 2777575), VAT Registration Number 613295452, Registered Office Address, First Floor, Coronet House, Queen Street, Leeds, LS1 2TW

Mon, 2011-04-04, 10:37

#78

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

Right, I should have been more explicit - it was.
Parallel collection iterators themselves never had their operation
implemented in parallel, but could be split into subsets efficiently
(method `split`), enabling the elements to be processed in parallel.

Alex

On Apr 4, 11:23 am, Jim Balter wrote:
> I think the question was whether the difference between foreach and iterator
> is intentional.
>

Mon, 2011-04-04, 10:37

#79

wilfredspringer

Joined: 2010-04-21,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

It's interesting that you're saying that foreach should be sequential in all cases. The foreach method is defined on Traversable. That class does not make *any* assumptions on ordering of the collection whatsoever. In fact, this is what it says:
A traversable class might or might not have two properties: strictness and orderedness. Neither is represented as a type.
If Traversable does not make any guarantees on strictness and orderedness, then consequently its methods cannot assume anything about that either. That includes the foreach method. Right?

2011/4/2 HamsterofDeath <h-star@gmx.de>

a parallel sequence can replace a sequencial sequence (feels weird
writing that) for every method except foreach. maybe there should be an
ordered-foreach-able trait and a foreachable-random-order-trait? then it
could be added to any collection that can do it.

Am 02.04.2011 16:28, schrieb Roland Kuhn:
> … from which follows that parallel collections cannot be subtypes of Seq.
>
> It has been said several times and is obviously true, so why not acknowledge the fact?
>
> (pesky old me)
>
> On Apr 2, 2011, at 16:00, HamsterofDeath <h-star@gmx.de> wrote:
>
>> if the order matters, you cannot use multiple threads. every
>> multithreading coding knows that. or am i wrong?
>>
>> Am 02.04.2011 15:34, schrieb Bernd Johannes:
>>> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>>>> if you need a sequencial foreach, make your parallel collection a
>>>> sequential one. according to the documentation, this conversion is
>>>> basically free.
>>>> for side effect free code, the order doesn't matter at all. i don't see
>>>> a big problem here.
>>> It's true - it's just about operations where the order of sequencial traversal
>>> matters (either globally or locally). That's why I don't see big issues with
>>> map() and the like.
>>>
>>> Maybe I am naiv - or the average coding quality has improved a lot beyond that
>>> what I used to see during my testing years (that's been a while in the past
>>> from now).
>>> With the obvious and simple example below I want to illustrate where I fear
>>> that parallelized foreach will wreak havoc. It's not obvious during
>>> compilation - maybe it goes unnoticed on small test systems... but it will
>>> fail in production environment at some point in time.
>>>
>>> And that's what I am concerned about. It's not obvious, not fast failing and
>>> may be introduced almost "silently" many many lines of code away...
>>>
>>> Just my 5 cents...
>>> Greetings
>>> Bernd
>>>
>>> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
>>> | // normally does things like derive filename, cleanup old versions
>>> | // opens streams, error handling and the like and then writes
>>> | // the content
>>> | res.foreach((s) => println(s))
>>> | }
>>> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>>>
>>> // just a make up result (and almost always in result files order matters)
>>> scala> val a = List("a", "b", "c", "d", "e")
>>> a: List[java.lang.String] = List(a, b, c, d, e)
>>>
>>> // that one is ok
>>> scala> writeResult(a)
>>> a
>>> b
>>> c
>>> d
>>> e
>>>
>>> // that one might be introduced because the Seq is a bigger one
>>> // and there are many cores awaiting work...
>>> scala> val pa = a.par
>>> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
>>> ParVector(a, b, c, d, e)
>>>
>>> // after some processing the Seq is to be saved...
>>> scala> writeResult(pa)
>>> a
>>> b
>>> d
>>> e
>>> c
>>>
>>> // an unexpected behaviour that spreads from the point of par
>>> // introduction into
>>> // formerly proper working pieces of code which silently assumed that
>>> // foreach is acting faithfully on the sequence of the Seq
>>>

Mon, 2011-04-04, 10:47

#80

jibal

Joined: 2010-12-01,

Re: Re: REPL: null-values in parallel ranges?

I think the question was whether the difference between foreach and iterator is intentional.

-- Jim

On Mon, Apr 4, 2011 at 1:41 AM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:

> It isn't taking me long trying .par in the repl to see all kinds of
> crazy stuff. The main thing is that foreach seems to return things in a
> random order and iterator doesn't. That doesn't sound like something
> intentional, but I try not to assume too much. But it is likely at the
> heart of what you are seeing too.
>

We used to have `pforeach` being implemented in parallel, and
`foreach` sequential, but this was changed recently.
I'm still not sure if that was the best thing to do, but, yes, it was
intentional.

> scala> def fx = (1 to 5).par
> fx: scala.collection.parallel.immutable.ParRange
>
> scala> fx foreach println
> 1
> 3
> 4
> 2
> 5
>
> scala> fx foreach println
> 2
> 3
> 1
> 5
> 4
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5
>
> scala> fx.iterator foreach println
> 1
> 2
> 3
> 4
> 5

Mon, 2011-04-04, 10:47

#81

Chris Marshall

Joined: 2009-06-17,

RE: REPL: null-values in parallel ranges?

My point was that, when we have to distinguish two almost-identical things by clumsy names (foreachParallel is clumsy, I confidently assert!), we might take a step back and ask ourselves why one entity can be asked to do two such almost-identical things in the first place. The problem here is quite that we have ended up with a design whereby we might intuitively expect that a for-comprehension is parallel and yet that we have thus broken (in a non-obvious and non-detectable way) some legacy code (because a parallel collection is also a sequential one by design).
That is, our parallel iterable is doing two things at the same time (it's a separation of concerns problem). I think this is *nicely demonstrated* (or illustrated) by the naming issue of some putative second method (and hence not absurd in the least).
Chris

From: Jim@Balter.name
Date: Mon, 4 Apr 2011 02:17:02 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes@hotmail.com
CC: scala-user@googlegroups.com

Frankly, I think that's absurd. The second name doesn't "have to" come from anywhere, and the name choice says nothing about the fundamental design.

The reality is that the contract for what is called "foreach", as a matter of expectation and of implementation (prior to parallel collections), even though not explicitly documented, is "forEachInSequence". Had such a name been chosen instead of "foreach", that would not have made the design any more or less wrong, "fundamentally" or otherwise. Ditto for giving what is by contract "forEachWithoutRegardToSequence" an "unwieldy" name or a short one.

If you think there's something wrong with the fundamental design then you should say what, rather than resting on totally bogus inferences.

-- Jim

On Mon, Apr 4, 2011 at 1:19 AM, Chris Marshall <oxbow_lakes@hotmail.com> wrote:

Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill@artima.com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

Mon, 2011-04-04, 10:57

#82

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

Perhaps the 2)a) approach would be the most correct thing to do, as it
would eliminate the possibility of errors in legacy code.

This may require a great deal of refactorings, though.

Alex

On Apr 4, 10:50 am, martin odersky wrote:
> I see two possible ways out of this:
>
> 1) Fiddle with foreach, so that it and for-loops are sequential for parallel
> collections. You still need a parallel foreach, for instance in situations
> like this:
>
> (1 until N) pforeach { i => a(i) = a(i) * scale }
>
> 2) Change the inheritance hierarchy. The problem is that our current design
> weakens the contract for Traversable, Iterable, Seq and friends. Their
> operations might now be executed in parallel. It's most glaring in foreach,
> but there are other operations where this can make a difference. What's
> more, there's no way to specify sequential execution in the type system
> except going down to actual implementation classes. This is a bit having it
> backwards. First, sequential Seq is conceptionally a subclass of potentially
> parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful
> to state that something is sequential than that something is potentially
> parallel. But right now, we have explicit parallel collections, but no
> explicit sequential collections.
>
> I see two possible redesigns:
>
> a) Reverse the inheritance order, so that
>
> Traversable <: ParTraversable
> Iterable <: ParIterable
> Seq <: ParSeq
> ...
>
> This means that all existing code will be safe from surprises, and no
> existing code can make use of parallel execution. Note that this is
> conceptually sound, as Par... means "might be run in parallel", not "will be
> run in parallel". A problem with this approach is that, without further
> refactorings, sequential collections inherit some unnecessary plumbing
> operations from parallel collections.
>
> b) Have both implementations side by side. E.g. have a base class
> collection.Seq with subclasses
>
> collection.immutable.Seq
> collection.mutable.Seq
> collection.parimmutable.Seq
> collection.parmutable.Seq
>
> There is no inheritance relationship between the 4 different kinds of Seq.
> In this design, code that requires a collection.Seq can be passed a parallel
> Seq, but code the requires a mutable.Seq or immutable.Seq cannot.
>
> Something to mull over...
>
> Cheers
>
> -- Martin

Mon, 2011-04-04, 11:07

#83

Tommaso Galleri

Joined: 2011-02-04,

RE: Re: REPL: null-values in parallel ranges?

v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);}

Looks like I have not read it properly:

> “In this design, code that requires a collection.Seq can be passed a parallel Seq, but code the requires a mutable.Seq or immutable.Seq cannot.”

This is not ideal.

From: Tommaso Galleri
Sent: 04 April 2011 10:26
To: 'martin odersky'; Aleksandar Prokopec
Cc: Bill Venners; scala-user
Subject: RE: [scala-user] Re: REPL: null-values in parallel ranges?

If there was a need to change things, out of the three options, I think (b) would probably be the most appropriate.

From: scala-user@googlegroups.com [mailto:scala-user@googlegroups.com] On Behalf Of martin odersky
Sent: 04 April 2011 09:50
To: Aleksandar Prokopec
Cc: Bill Venners; scala-user
Subject: Re: [scala-user] Re: REPL: null-values in parallel ranges?

I see two possible ways out of this:

1) Fiddle with foreach, so that it and for-loops are sequential for parallel collections. You still need a parallel foreach, for instance in situations like this:

(1 until N) pforeach { i => a(i) = a(i) * scale }

2) Change the inheritance hierarchy. The problem is that our current design weakens the contract for Traversable, Iterable, Seq and friends. Their operations might now be executed in parallel. It's most glaring in foreach, but there are other operations where this can make a difference. What's more, there's no way to specify sequential execution in the type system except going down to actual implementation classes. This is a bit having it backwards. First, sequential Seq is conceptionally a subclass of potentially parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful to state that something is sequential than that something is potentially parallel. But right now, we have explicit parallel collections, but no explicit sequential collections.

I see two possible redesigns:

a) Reverse the inheritance order, so that

   Traversable <: ParTraversable
   Iterable <: ParIterable
   Seq <: ParSeq
   ...

This means that all existing code will be safe from surprises, and no existing code can make use of parallel execution. Note that this is conceptually sound, as Par... means "might be run in parallel", not "will be run in parallel". A problem with this approach is that, without further refactorings, sequential collections inherit some unnecessary plumbing operations from parallel collections.

b) Have both implementations side by side. E.g. have a base class collection.Seq with subclasses

collection.immutable.Seq
collection.mutable.Seq
collection.parimmutable.Seq
collection.parmutable.Seq

There is no inheritance relationship between the 4 different kinds of Seq. In this design, code that requires a collection.Seq can be passed a parallel Seq, but code the requires a mutable.Seq or immutable.Seq cannot.

Something to mull over...

Cheers

-- Martin

The information included in this email and any files transmitted with it may contain information that is confidential and it must not be used by, or its contents or attachments copied or disclosed, to persons other than the intended addressee. If you have received this email in error, please notify BJSS. In the absence of written agreement to the contrary BJSS' relevant standard terms of contract for any work to be undertaken will apply. Please carry out virus or such other checks as you consider appropriate in respect of this email. BJSS do not accept responsibility for any adverse effect upon your system or data in relation to this email or any files transmitted with it. BJSS Limited, a company registered in England and Wales (Company Number 2777575), VAT Registration Number 613295452, Registered Office Address, First Floor, Coronet House, Queen Street, Leeds, LS1 2TW

Mon, 2011-04-04, 11:17

#84

jibal

Joined: 2010-12-01,

Re: REPL: null-values in parallel ranges?

I already addressed all of this handwaving with specifics."forEachInSequence" and "forEachWithoutRegardToSequence" are non-identical in exactly the way that their different names say.

In a side-effect-free, time-order-free universe, we would not have invented necessarily-sequential collections first and then only later realized that we wanted not-necessarily-sequential collections ... we would have invented only the latter and would have defined necessarily-sequential operations for them. But the fact is that we don't live in such a universe so there isn't anything fundamentally wrong with the design of having both sorts of collections -- certainly not inferable from grossly misrepresentative nonsense as you have written.

-- Jim

On Mon, Apr 4, 2011 at 2:32 AM, Chris Marshall <oxbow_lakes@hotmail.com> wrote:

My point was that, when we have to distinguish two almost-identical things by clumsy names (foreachParallel is clumsy, I confidently assert!), we might take a step back and ask ourselves why one entity can be asked to do two such almost-identical things in the first place. The problem here is quite that we have ended up with a design whereby we might intuitively expect that a for-comprehension is parallel and yet that we have thus broken (in a non-obvious and non-detectable way) some legacy code (because a parallel collection is also a sequential one by design).
That is, our parallel iterable is doing two things at the same time (it's a separation of concerns problem). I think this is *nicely demonstrated* (or illustrated) by the naming issue of some putative second method (and hence not absurd in the least).
Chris

From: Jim@Balter.name
Date: Mon, 4 Apr 2011 02:17:02 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes@hotmail.com
CC: scala-user@googlegroups.com

Frankly, I think that's absurd. The second name doesn't "have to" come from anywhere, and the name choice says nothing about the fundamental design.

The reality is that the contract for what is called "foreach", as a matter of expectation and of implementation (prior to parallel collections), even though not explicitly documented, is "forEachInSequence". Had such a name been chosen instead of "foreach", that would not have made the design any more or less wrong, "fundamentally" or otherwise. Ditto for giving what is by contract "forEachWithoutRegardToSequence" an "unwieldy" name or a short one.

If you think there's something wrong with the fundamental design then you should say what, rather than resting on totally bogus inferences.

-- Jim

On Mon, Apr 4, 2011 at 1:19 AM, Chris Marshall <oxbow_lakes@hotmail.com> wrote:

Frankly, I think that a second method's name (probably) having to come from the unwieldy foreachParallel/parallelForeach is a strong indication that there is something wrong with the fundamental design (i.e. if the path of a second method is chosen).

> Date: Mon, 4 Apr 2011 01:03:18 -0700
> Subject: Re: [scala-user] REPL: null-values in parallel ranges?
> From: bill@artima.com
>
> Anyway, as someone else said, I think the
> important question is whether there should be another foreach method
> at all. Only if the answer is yes does it matter what its name is.
>
> Bill

Mon, 2011-04-04, 11:37

#85

jibal

Joined: 2010-12-01,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

Wrong:

"If a collection is an instance of an ordered collection class, traversing
its elements with `foreach` will always visit elements in the
same order, even for different runs of the program."

-- Jim

On Mon, Apr 4, 2011 at 2:31 AM, Wilfred Springer <wilfredspringer@gmail.com> wrote:

It's interesting that you're saying that foreach should be sequential in all cases. The foreach method is defined on Traversable. That class does not make *any* assumptions on ordering of the collection whatsoever. In fact, this is what it says:
A traversable class might or might not have two properties: strictness and orderedness. Neither is represented as a type.
If Traversable does not make any guarantees on strictness and orderedness, then consequently its methods cannot assume anything about that either. That includes the foreach method. Right?

2011/4/2 HamsterofDeath <h-star@gmx.de>
a parallel sequence can replace a sequencial sequence (feels weird
writing that) for every method except foreach. maybe there should be an
ordered-foreach-able trait and a foreachable-random-order-trait? then it
could be added to any collection that can do it.

Am 02.04.2011 16:28, schrieb Roland Kuhn:
> … from which follows that parallel collections cannot be subtypes of Seq.
>
> It has been said several times and is obviously true, so why not acknowledge the fact?
>
> (pesky old me)
>
> On Apr 2, 2011, at 16:00, HamsterofDeath <h-star@gmx.de> wrote:
>
>> if the order matters, you cannot use multiple threads. every
>> multithreading coding knows that. or am i wrong?
>>
>> Am 02.04.2011 15:34, schrieb Bernd Johannes:
>>> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>>>> if you need a sequencial foreach, make your parallel collection a
>>>> sequential one. according to the documentation, this conversion is
>>>> basically free.
>>>> for side effect free code, the order doesn't matter at all. i don't see
>>>> a big problem here.
>>> It's true - it's just about operations where the order of sequencial traversal
>>> matters (either globally or locally). That's why I don't see big issues with
>>> map() and the like.
>>>
>>> Maybe I am naiv - or the average coding quality has improved a lot beyond that
>>> what I used to see during my testing years (that's been a while in the past
>>> from now).
>>> With the obvious and simple example below I want to illustrate where I fear
>>> that parallelized foreach will wreak havoc. It's not obvious during
>>> compilation - maybe it goes unnoticed on small test systems... but it will
>>> fail in production environment at some point in time.
>>>
>>> And that's what I am concerned about. It's not obvious, not fast failing and
>>> may be introduced almost "silently" many many lines of code away...
>>>
>>> Just my 5 cents...
>>> Greetings
>>> Bernd
>>>
>>> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
>>> | // normally does things like derive filename, cleanup old versions
>>> | // opens streams, error handling and the like and then writes
>>> | // the content
>>> | res.foreach((s) => println(s))
>>> | }
>>> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>>>
>>> // just a make up result (and almost always in result files order matters)
>>> scala> val a = List("a", "b", "c", "d", "e")
>>> a: List[java.lang.String] = List(a, b, c, d, e)
>>>
>>> // that one is ok
>>> scala> writeResult(a)
>>> a
>>> b
>>> c
>>> d
>>> e
>>>
>>> // that one might be introduced because the Seq is a bigger one
>>> // and there are many cores awaiting work...
>>> scala> val pa = a.par
>>> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
>>> ParVector(a, b, c, d, e)
>>>
>>> // after some processing the Seq is to be saved...
>>> scala> writeResult(pa)
>>> a
>>> b
>>> d
>>> e
>>> c
>>>
>>> // an unexpected behaviour that spreads from the point of par
>>> // introduction into
>>> // formerly proper working pieces of code which silently assumed that
>>> // foreach is acting faithfully on the sequence of the Seq
>>>

Mon, 2011-04-04, 11:37

#86

jibal

Joined: 2010-12-01,

Re: REPL: null-values in parallel ranges - Seq.foreach acting u

And I should have included the rest of it:

"If the class is not ordered, foreach can visit elements in different orders for different runs (but it will keep the same order in the same run)."

So `foreach` promises to be deterministic within a run of a program, even when no order is defined on the collection.

-- Jim

On Mon, Apr 4, 2011 at 3:31 AM, Jim Balter <Jim@balter.name> wrote:

Wrong:

"If a collection is an instance of an ordered collection class, traversing
its elements with `foreach` will always visit elements in the
same order, even for different runs of the program."

-- Jim

On Mon, Apr 4, 2011 at 2:31 AM, Wilfred Springer <wilfredspringer@gmail.com> wrote:
It's interesting that you're saying that foreach should be sequential in all cases. The foreach method is defined on Traversable. That class does not make *any* assumptions on ordering of the collection whatsoever. In fact, this is what it says:
A traversable class might or might not have two properties: strictness and orderedness. Neither is represented as a type.
If Traversable does not make any guarantees on strictness and orderedness, then consequently its methods cannot assume anything about that either. That includes the foreach method. Right?

2011/4/2 HamsterofDeath <h-star@gmx.de>
a parallel sequence can replace a sequencial sequence (feels weird
writing that) for every method except foreach. maybe there should be an
ordered-foreach-able trait and a foreachable-random-order-trait? then it
could be added to any collection that can do it.

Am 02.04.2011 16:28, schrieb Roland Kuhn:
> … from which follows that parallel collections cannot be subtypes of Seq.
>
> It has been said several times and is obviously true, so why not acknowledge the fact?
>
> (pesky old me)
>
> On Apr 2, 2011, at 16:00, HamsterofDeath <h-star@gmx.de> wrote:
>
>> if the order matters, you cannot use multiple threads. every
>> multithreading coding knows that. or am i wrong?
>>
>> Am 02.04.2011 15:34, schrieb Bernd Johannes:
>>> Am Samstag, 2. April 2011, 13:14:11 schrieb HamsterofDeath:
>>>> if you need a sequencial foreach, make your parallel collection a
>>>> sequential one. according to the documentation, this conversion is
>>>> basically free.
>>>> for side effect free code, the order doesn't matter at all. i don't see
>>>> a big problem here.
>>> It's true - it's just about operations where the order of sequencial traversal
>>> matters (either globally or locally). That's why I don't see big issues with
>>> map() and the like.
>>>
>>> Maybe I am naiv - or the average coding quality has improved a lot beyond that
>>> what I used to see during my testing years (that's been a while in the past
>>> from now).
>>> With the obvious and simple example below I want to illustrate where I fear
>>> that parallelized foreach will wreak havoc. It's not obvious during
>>> compilation - maybe it goes unnoticed on small test systems... but it will
>>> fail in production environment at some point in time.
>>>
>>> And that's what I am concerned about. It's not obvious, not fast failing and
>>> may be introduced almost "silently" many many lines of code away...
>>>
>>> Just my 5 cents...
>>> Greetings
>>> Bernd
>>>
>>> scala> def writeResult(res: scala.collection.immutable.Seq[String]): Unit = {
>>> | // normally does things like derive filename, cleanup old versions
>>> | // opens streams, error handling and the like and then writes
>>> | // the content
>>> | res.foreach((s) => println(s))
>>> | }
>>> writeResult: (res: scala.collection.immutable.Seq[String])Unit
>>>
>>> // just a make up result (and almost always in result files order matters)
>>> scala> val a = List("a", "b", "c", "d", "e")
>>> a: List[java.lang.String] = List(a, b, c, d, e)
>>>
>>> // that one is ok
>>> scala> writeResult(a)
>>> a
>>> b
>>> c
>>> d
>>> e
>>>
>>> // that one might be introduced because the Seq is a bigger one
>>> // and there are many cores awaiting work...
>>> scala> val pa = a.par
>>> pa: scala.collection.parallel.immutable.ParSeq[java.lang.String] =
>>> ParVector(a, b, c, d, e)
>>>
>>> // after some processing the Seq is to be saved...
>>> scala> writeResult(pa)
>>> a
>>> b
>>> d
>>> e
>>> c
>>>
>>> // an unexpected behaviour that spreads from the point of par
>>> // introduction into
>>> // formerly proper working pieces of code which silently assumed that
>>> // foreach is acting faithfully on the sequence of the Seq
>>>

Mon, 2011-04-04, 11:47

#87

Chris Marshall

Joined: 2009-06-17,

RE: REPL: null-values in parallel ranges?

I'm bagging "grossly misrepresentative nonsense" for both my new motto and my epitaph.

From: Jim@Balter.name
Date: Mon, 4 Apr 2011 02:55:44 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes@hotmail.com
CC: scala-user@googlegroups.com

I already addressed all of this handwaving with specifics."forEachInSequence" and "forEachWithoutRegardToSequence" are non-identical in exactly the way that their different names say.

In a side-effect-free, time-order-free universe, we would not have invented necessarily-sequential collections first and then only later realized that we wanted not-necessarily-sequential collections ... we would have invented only the latter and would have defined necessarily-sequential operations for them. But the fact is that we don't live in such a universe so there isn't anything fundamentally wrong with the design of having both sorts of collections -- certainly not inferable from grossly misrepresentative nonsense as you have written.

-- Jim

Mon, 2011-04-04, 11:47

#88

jibal

Joined: 2010-12-01,

Re: REPL: null-values in parallel ranges?

Seems appropriate to me. Your argument is like saying that the fact that "contents of address register" and "contents of data register" are nearly identical in name and operation demonstrates a fundamental design error in LISP -- a cons cell should not have two such similar operations; separation of concerns and all that. Since you're so focused on the length of names and how "unwieldy" they are, you should be satisfied if Scala goes with feio and feuo.

-- Jim

On Mon, Apr 4, 2011 at 3:32 AM, Chris Marshall <oxbow_lakes@hotmail.com> wrote:

I'm bagging "grossly misrepresentative nonsense" for both my new motto and my epitaph.

From: Jim@Balter.name
Date: Mon, 4 Apr 2011 02:55:44 -0700
Subject: Re: [scala-user] REPL: null-values in parallel ranges?
To: oxbow_lakes@hotmail.com
CC: scala-user@googlegroups.com

I already addressed all of this handwaving with specifics."forEachInSequence" and "forEachWithoutRegardToSequence" are non-identical in exactly the way that their different names say.

In a side-effect-free, time-order-free universe, we would not have invented necessarily-sequential collections first and then only later realized that we wanted not-necessarily-sequential collections ... we would have invented only the latter and would have defined necessarily-sequential operations for them. But the fact is that we don't live in such a universe so there isn't anything fundamentally wrong with the design of having both sorts of collections -- certainly not inferable from grossly misrepresentative nonsense as you have written.

-- Jim

Mon, 2011-04-04, 11:57

#89

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges - Seq.foreach acting un

Just to recap:

Also, it additionally implicitly promises that 2 elements will not be
traversed at the same time.

ParSeq's are special in the sense that while they have an order
implied by element indices, they process them in no prespecified
order, and concurrently.
Various methods, such as firstIndexOf, take that ordering into account
(the first element in the ordering is found).

Alex

On Apr 4, 12:35 pm, Jim Balter wrote:
> And I should have included the rest of it:
>
> "If the class is not ordered, foreach can visit elements in different orders
> for different runs (but it will keep the same order in the same run)."
>
> So `foreach` promises to be deterministic within a run of a program, even
> when no order is defined on the collection.
>

Mon, 2011-04-04, 12:27

#90

H-star Development

Joined: 2010-04-14,

Re: Re: REPL: null-values in parallel ranges?

the more i think about it, the more i think it should be:
trait Collection
trait CollectionWithOrderedData extends Collection
trait CollectionWithOrderedForeach extends CollectionWithOrderedData

ordered foreach is a special case of i-don't-care-how-foreach

then, if someone relies on an ordered foreach, he can enforce it statically by only accepting collections implementing the orderedforeach-trait aka the 2.8 collections

----
another thought:
what exactly was the reason to make foreach parallel and not add pforeach? any parallel method (map, collect and so on) could rely on an ordered foreach to get the elements, but still execute the functions in parallel
a for-comprehension could rely on a method "ppforeach" (possibly parallel foreach) which relies on foreach OR pforeach, depending on the concrete collection type.

-------- Original-Nachricht --------
> Datum: Mon, 4 Apr 2011 10:50:16 +0200
> Von: martin odersky
> An: Aleksandar Prokopec
> CC: Bill Venners , scala-user
> Betreff: Re: [scala-user] Re: REPL: null-values in parallel ranges?

> I see two possible ways out of this:
>
> 1) Fiddle with foreach, so that it and for-loops are sequential for
> parallel
> collections. You still need a parallel foreach, for instance in situations
> like this:
>
> (1 until N) pforeach { i => a(i) = a(i) * scale }
>
> 2) Change the inheritance hierarchy. The problem is that our current
> design
> weakens the contract for Traversable, Iterable, Seq and friends. Their
> operations might now be executed in parallel. It's most glaring in
> foreach,
> but there are other operations where this can make a difference. What's
> more, there's no way to specify sequential execution in the type system
> except going down to actual implementation classes. This is a bit having
> it
> backwards. First, sequential Seq is conceptionally a subclass of
> potentially
> parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful
> to state that something is sequential than that something is potentially
> parallel. But right now, we have explicit parallel collections, but no
> explicit sequential collections.
>
> I see two possible redesigns:
>
> a) Reverse the inheritance order, so that
>
> Traversable <: ParTraversable
> Iterable <: ParIterable
> Seq <: ParSeq
> ...
>
> This means that all existing code will be safe from surprises, and no
> existing code can make use of parallel execution. Note that this is
> conceptually sound, as Par... means "might be run in parallel", not "will
> be
> run in parallel". A problem with this approach is that, without further
> refactorings, sequential collections inherit some unnecessary plumbing
> operations from parallel collections.
>
> b) Have both implementations side by side. E.g. have a base class
> collection.Seq with subclasses
>
> collection.immutable.Seq
> collection.mutable.Seq
> collection.parimmutable.Seq
> collection.parmutable.Seq
>
> There is no inheritance relationship between the 4 different kinds of Seq.
> In this design, code that requires a collection.Seq can be passed a
> parallel
> Seq, but code the requires a mutable.Seq or immutable.Seq cannot.
>
> Something to mull over...
>
> Cheers
>

Mon, 2011-04-04, 13:07

#91

Matthew Pocock 3

Joined: 2010-07-30,

Re: REPL: null-values in parallel ranges?

I'm starting to loose track of what the actual issue is in all the proposals for fixing it. To my mind, the problem is two-fold:
1. writing parallel code that touches mutable state and talks to IO is hard
2. lots of existing code exists that assumes (implicitly or explicitly) that a. foreach will proceed in series b. foreach will process each element in the same order each time it is called
Now, 1 is one of those things that are fundamental truths about programming. We can do things with a side-effects system, encourage monadic computation and so on, but ultimately we have to put some of the responsibility for getting this right on the coder.
As for 2, it is potentially very serious (silent/pernicious bugs, no obvious cause, faults in legacy code that used to work) but also within our gift to address these through how we structure the APIs. My preferred option would be to inject an extra trait above the existing collections, so that we have a parallel and serial hierarchy, and to keep the two using the exact same method names for map/foreach/... . If this is put into the right place, all existing code not only gets serial instances but the types enforce that they must get serial instances. If we mix in the 'what kind of collection it is' traits correctly, we allow code to be polymorphic over serial/parallel implementations, if that's what the coder asks for. Similarly, the ordering issue should be flagged by a trait - I can well imagine mutable implementations of set/tree where every lookup potentially causes the foreach order to alter as commonly-accessed keys are brought towards the root.
Matthew

-- Matthew Pocockmailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.uk irc.freenode.net: drdozer(0191) 2566550

Mon, 2011-04-04, 13:47

#92

roland.kuhn

Joined: 2011-02-21,

Re: Re: REPL: null-values in parallel ranges?

Scala's strength is in the type system, so I vote for using it.

2a) has the nice property that it encodes the capabilities of your use of the collection: expect a Seq and you are not prepared to handle parallelism, expect a ParSeq and you say that your code is safe. Guaranteeing parallelism is not possible anyway in general (e.g. single-core, do they still exist?), so leaving that out of the type system seems adequate to me.

That doesn't say anything about the complexity of the change, though, since I am completely out of my depth on that one.

[hope my 2¢ don't bother you too much]

PS: I hope that the bike-shedding ends once one variant of 2) is chosen ;-)

On Apr 4, 2011, at 10:50 , martin odersky wrote:

> I see two possible ways out of this:
>
> 1) Fiddle with foreach, so that it and for-loops are sequential for parallel collections. You still need a parallel foreach, for instance in situations like this:
>
> (1 until N) pforeach { i => a(i) = a(i) * scale }
>
> 2) Change the inheritance hierarchy. The problem is that our current design weakens the contract for Traversable, Iterable, Seq and friends. Their operations might now be executed in parallel. It's most glaring in foreach, but there are other operations where this can make a difference. What's more, there's no way to specify sequential execution in the type system except going down to actual implementation classes. This is a bit having it backwards. First, sequential Seq is conceptionally a subclass of potentially parallel ParSeq. But right now ParSeq <: Seq. Second, it's far more useful to state that something is sequential than that something is potentially parallel. But right now, we have explicit parallel collections, but no explicit sequential collections.
>
> I see two possible redesigns:
>
> a) Reverse the inheritance order, so that
>
> Traversable <: ParTraversable
> Iterable <: ParIterable
> Seq <: ParSeq
> ...
>
> This means that all existing code will be safe from surprises, and no existing code can make use of parallel execution. Note that this is conceptually sound, as Par... means "might be run in parallel", not "will be run in parallel". A problem with this approach is that, without further refactorings, sequential collections inherit some unnecessary plumbing operations from parallel collections.
>
> b) Have both implementations side by side. E.g. have a base class collection.Seq with subclasses
>
> collection.immutable.Seq
> collection.mutable.Seq
> collection.parimmutable.Seq
> collection.parmutable.Seq
>
> There is no inheritance relationship between the 4 different kinds of Seq. In this design, code that requires a collection.Seq can be passed a parallel Seq, but code the requires a mutable.Seq or immutable.Seq cannot.
>
> Something to mull over...
>
> Cheers
>

Mon, 2011-04-04, 13:57

#93

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

We're in the process of solving this.
We're considering the following and will begin to experiment with the
necessary refactorings.

This will be the new hierarchy (can't draw 3d in ascii yet, but I'm sure
you'll figure this out):

TraversableOnce -------------> TraversableOnceLike
^ ^
| |
Traversable ---> ParTraversable ---> TraversableLike
^ ^ ^
| | |
Iterable ---> ParIterable ---> IterableLike
^ ^ ^
| | |
Seq ---> ParSeq ---> SeqLike
^ ^
| |
List, ParArray, ---------\
Vector, v
ArrayBuffer, ParVector, ---> ParSeqImpl ---> ParIterableImpl
Stream, ^
Range, ParRange ---------/
...

(only Seqs shown above, but Sets and Maps follow the same pattern).
(ParIterableImpl also extends IterableLike, and ParSeqImpl extends
SeqLike, which is not shown in the figure)

And separately from that:

TraversableOnce -------------> TraversableOnceLike
^
|
Iterator
^
|
Splitter <----------- IterableSplitter
^ ^
| |
PreciseSplitter <----- SeqSplitter

Note that:

There are no more ParIterators nor Par*Iterators, only splitters.
There are no more Par*Like classes, only (^Par)*Like classes. The
parallel implementations come from Par*Impl classes which get mixed in
to concrete parallel collection classes (such as ParArray).

Explanation:
- On the top of it all sits the TraversableOnceLike class. It has
implementations for all the basic accessors (reduceLeft, find, count,
...) based on foreach.
- TraversableOnce has already been used in some of the contracts, so it
can no longer be a supertype of TraversableLike, which could be
parallel. Therefore, it extends TraversableOnceLike, and is only
extended by collections known to be sequential.
- Traversable, Iterable, and Seq are the same things they were before
- ParTraversable, ParIterable and ParSeq are collections with methods
possibly executing in parallel (note that this does not only mean
invoking closures out-of-order, but also invoking them at the same time)
- TraversableLike, IterableLike and SeqLike are also collections with
methods possibly executing in parallel, but they are parametrized on the
representation type of the collection.
- the parallelIterator method is now called splitter and can only be
found in the Par*Impl traits declared abstract and defined in concrete
parallel collections
- there are some other minor refactorings involved we are aware this
might involve, not shown above

If anyone sees a fault with this design, or any serious showstoppers,
please say so.

Cheers,
Alex

Mon, 2011-04-04, 14:47

#94

Peter C. Chapin 2

Joined: 2011-01-07,

Re: Re: REPL: null-values in parallel ranges?

On Mon, 4 Apr 2011, Roland Kuhn wrote:

> Scala's strength is in the type system, so I vote for using it.
>
> 2a) has the nice property that it encodes the capabilities of your use of
> the collection: expect a Seq and you are not prepared to handle
> parallelism, expect a ParSeq and you say that your code is safe.
> Guaranteeing parallelism is not possible anyway in general (e.g.
> single-core, do they still exist?), so leaving that out of the type system
> seems adequate to me.

I'm going to add my thoughts by making a pitch for the original design (that
is, the design in 2.9 RC1).

As I understand it, right now (in 2.9 RC1) you need to explicitly ask for
sequential execution if you need it. Thus

def someMethod[T](someArray: Array[T]) = {
...
for (item <- someArray seq) { ... }
...
}

Otherwise, without the 'seq' you might get parallel execution with
nondeterministic ordering.

For new code this seems fine to me. It's an educational issue: 1) write code
in a pure style for which ordering doesn't matter or 2) if the first option
isn't possible ask for sequential ordering where necessary. Such a policy
favors parallel thinking and that's a good place to be headed, it seems to
me.

The problem is regarding existing code that isn't expecting to receive
parallel containers. However... Scala is a relatively new language. I
realize that there are major Scala projects and, in fact, that many people
on this list are maintaining such projects. But in the grand scheme of
things the amount of existing Scala code is tiny compared to, for example,
C++ or Java. I'm more concerned about the legions of future Scala
programmers, and the code they haven't yet written, than I am about the
existing Scala code base. I understand others might disagree. However, from
my point of view the real question is: what's right for future code?

I am also a believer in using the type system to encode design choices so
the compiler can catch logical inconsistencies in my program. This seems
like a good area to use that approach. My main concern is that I don't want
to relegate the parallel case to second class status. If the pundits are
right and parallel programming is going to be a new paradigm driving the
development of software in the future, it will be an irritation to have to
constantly remember to declare my willingness to accept parallel containers
(via the use of a specific type). In a few years that might look like an
anachronism... a throwback to the days when most programmers wrote
sequential code by default.

I don't know the best answer but those are a few thoughts to consider.

Peter

Mon, 2011-04-04, 15:07

#95

Johannes Rudolph 2

Joined: 2010-02-12,

Re: Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 2:46 PM, Aleksandar Prokopec
wrote:
> We're in the process of solving this.
> We're considering the following and will begin to experiment with the
> necessary refactorings.

This looks like the proposal 2a) from Martin from above. Perhaps one
should talk a bit more what this design implicates than how it is
actually implemented (I don't understand the details fully nor do I
want to have to):

I'll try an overview:
A sequential collection now inherits from a parallel collection, this means
0.) a parallel collection is not a sequential one any more
1.) therefore, old libraries by design can't handle parallel
collections as input per default.
2.) a library writer has to "enable" the use of parallel collections
at each method parameter by requiring "only" the more general parallel
collection type variant if it supports parallel collections
3.) a library writer should inform the users if he changes a return
type from e.g. Seq to ParSeq because it can be that the user's program
will still be compiling, but now lacks the guarantees the sequential
collection variants give

open questions:
* foreach stays parallel?
* how to write a library method that can be used both by users
understanding and using parallel collections and others who are
completely agnostic of parallelism concerns without putting the burden
on the user to use `.seq` everywhere to be on the safe side?

IMO the choice to use this design is a conservative one. Users are on
the safe side and old code shouldn't break (even if HOFs like map,
collect, etc. are safe even when they use side-effects) What I still
don't like is that the default `foreach` is parallel. In my
experience, executing side-effects concurrently will produce problems
in more cases than not. For me the parallel foreach is the special
case not the sequential one.

Contrast that to the current solution where the onus is on the user to
make sure to don't use side-effecting closures with parallel
collections.

Johannes

Mon, 2011-04-04, 15:07

#96

roland.kuhn

Joined: 2011-02-21,

Re: Re: REPL: null-values in parallel ranges?

On Apr 4, 2011, at 15:50 , Johannes Rudolph wrote:

> On Mon, Apr 4, 2011 at 2:46 PM, Aleksandar Prokopec
> wrote:
>> We're in the process of solving this.
>> We're considering the following and will begin to experiment with the
>> necessary refactorings.
>
[snip]
> Contrast that to the current solution where the onus is on the user to
> make sure to don't use side-effecting closures with parallel
> collections.
>
Having parallel foreach on the parallel collections can come in quite handy if you know what you're doing, especially in space-constrained environments (i.e. when building a complete result collection would blow up).

Regards,

Roland

Mon, 2011-04-04, 15:17

#97

Matthew Pocock 3

Joined: 2010-07-30,

Re: Re: REPL: null-values in parallel ranges?

This looks good to me. It will need clearly documenting - users may assume that Par* is parallel, but with this hierarchy, Par* has the option of being parallel, and it's the implementations that decide. I'd personally have introduced another trait for things which definitely are going to be parallel with the Par prefix, and named these 'may or may not be parallel' things something else. However, I see good modelling reasons to keep things as you have them. Either way, I think your hierarchy is a reasonable way to provide parallelism without breaking every single line of existing collections code in the wild :D
Matthew

On 4 April 2011 13:46, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:

We're in the process of solving this.
We're considering the following and will begin to experiment with the necessary refactorings.

This will be the new hierarchy (can't draw 3d in ascii yet, but I'm sure you'll figure this out):

TraversableOnce -------------> TraversableOnceLike
^ ^
| |
Traversable ---> ParTraversable ---> TraversableLike
^ ^ ^
| | |
Iterable ---> ParIterable ---> IterableLike
^ ^ ^
| | |
Seq ---> ParSeq ---> SeqLike
^ ^
| |
List, ParArray, ---------\
Vector, v
ArrayBuffer, ParVector, ---> ParSeqImpl ---> ParIterableImpl
Stream, ^
Range, ParRange ---------/

...
Cheers,
Alex

--
Matthew Pocockmailto: turingatemyhamster@gmail.com gchat: turingatemyhamster@gmail.commsn: matthew_pocock@yahoo.co.uk irc.freenode.net: drdozer(0191) 2566550

Mon, 2011-04-04, 15:37

#98

Aleksandar Prokopec

Joined: 2010-04-04,

Re: REPL: null-values in parallel ranges?

>
> I'll try an overview:
> A sequential collection now inherits from a parallel collection, this means
> 0.) a parallel collection is not a sequential one any more

True. A sequential collection is now a parallel collection.

> 1.) therefore, old libraries by design can't handle parallel
> collections as input per default.

That's the idea with this redesign, yes. Since the (valid) complaint (it
seems to me from most people) is that old libraries would be
compromised, this would disallow old code to implicitly use them.

_Perhaps_ a solution to this requirement could be to provide a Iterable
wrapper around a ParIterable. If a client wants to violate things in
lust for performance at his own responsibility, he would be allowed to
do so. A risk would then be explicit in the code.

legacyCallImSureDoesntHaveSideeffects(someParallelCollection.wrapSeq)

> 2.) a library writer has to "enable" the use of parallel collections
> at each method parameter by requiring "only" the more general parallel
> collection type variant if it supports parallel collections

True. If the code within a method can use the parameter in parallel
ways, this is expressed through a weaker precondition in the type of the
method parameter.

> 3.) a library writer should inform the users if he changes a return
> type from e.g. Seq to ParSeq because it can be that the user's program
> will still be compiling, but now lacks the guarantees the sequential
> collection variants give

This is a good point.

If he uses a ParSeq as a return type, the code calling a method may
still compile due to type inference. Note, however, that this does not
hurt existing libraries, since there aren't any yet. Furthermore, we've
had issues like this and we've used migration warnings to make them more
detectable.

> open questions:
> * foreach stays parallel?

Yes, foreach stays parallel. In ParTraversable, ParIterable and ParSeq
its possibly parallel, in a concrete collection class like ParArray it's
surely parallel.

> * how to write a library method that can be used both by users
> understanding and using parallel collections and others who are
> completely agnostic of parallelism concerns without putting the burden
> on the user to use `.seq` everywhere to be on the safe side?
>

In the current proposal, if the library writer knows his method won't do
side-effects:

Library writer writes:
def libraryMethod(sq: ParSeq[Int]) = {
sq.map(_ * 2)
}

User writes:
libraryMethod(ParArray(1, 2, 3))

If the library writer knows his method will do sideeffects in closure
passed to the collection argument:

def libraryMethod(sq: Seq[Int]) = {
sq foreach {
x => println(x)
}
}

If the library writer writes a method which takes a closure argument, he
would have to assume the closure is side-effecting just to be on the
safe side:

def libraryMethod(sq: Seq[Int])(f: Int => Int) = sq.map(f)

> IMO the choice to use this design is a conservative one. Users are on
> the safe side and old code shouldn't break (even if HOFs like map,
> collect, etc. are safe even when they use side-effects) What I still
> don't like is that the default `foreach` is parallel. In my
> experience, executing side-effects concurrently will produce problems
> in more cases than not. For me the parallel foreach is the special
> case not the sequential one.

I agree with you that it's easier to make a mistake if a foreach
executes in parallel, especially in the presence of type inference.
However, if a reference is known at compile time to be executed
sequentially, then there are no risks whatsoever.
A ".seq" or ": Traversable[A]" can always be used to fight type inference.

Although an always sequential `foreach` and an always parallel `pareach`
would solve the `foreach` issue, this is just a rule of the thumb. 90%
of collection methods take closures and any closure can contain a
sideeffect. To be completely safe, all other HOFs would require parallel
counterparts as well.

> Contrast that to the current solution where the onus is on the user to
> make sure to don't use side-effecting closures with parallel
> collections.

In the current solution, the user must make sure closures are pure for
any collection.
In the proposed solution, the user must make sure closure are pure
unless he knows at compile time that his collections is a sequential one.
This seems like an improvement to me..

Alex

Mon, 2011-04-04, 15:37

#99

Viktor Klang

Joined: 2008-12-17,

Re: Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 4:22 PM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:

I'll try an overview:
A sequential collection now inherits from a parallel collection, this means
0.) a parallel collection is not a sequential one any more

True. A sequential collection is now a parallel collection.

1.) therefore, old libraries by design can't handle parallel
collections as input per default.

That's the idea with this redesign, yes. Since the (valid) complaint (it seems to me from most people) is that old libraries would be compromised, this would disallow old code to implicitly use them.

_Perhaps_ a solution to this requirement could be to provide a Iterable wrapper around a ParIterable. If a client wants to violate things in lust for performance at his own responsibility, he would be allowed to do so. A risk would then be explicit in the code.

legacyCallImSureDoesntHaveSideeffects(someParallelCollection.wrapSeq)

2.) a library writer has to "enable" the use of parallel collections
at each method parameter by requiring "only" the more general parallel
collection type variant if it supports parallel collections

True. If the code within a method can use the parameter in parallel ways, this is expressed through a weaker precondition in the type of the method parameter.

3.) a library writer should inform the users if he changes a return
type from e.g. Seq to ParSeq because it can be that the user's program
will still be compiling, but now lacks the guarantees the sequential
collection variants give

This is a good point.

If he uses a ParSeq as a return type, the code calling a method may still compile due to type inference. Note, however, that this does not hurt existing libraries, since there aren't any yet. Furthermore, we've had issues like this and we've used migration warnings to make them more detectable.

open questions:
* foreach stays parallel?

Yes, foreach stays parallel. In ParTraversable, ParIterable and ParSeq its possibly parallel, in a concrete collection class like ParArray it's surely parallel.

* how to write a library method that can be used both by users
understanding and using parallel collections and others who are
completely agnostic of parallelism concerns without putting the burden
on the user to use `.seq` everywhere to be on the safe side?

In the current proposal, if the library writer knows his method won't do side-effects:

Library writer writes:
def libraryMethod(sq: ParSeq[Int]) = {
sq.map(_ * 2)
}

User writes:
libraryMethod(ParArray(1, 2, 3))

If the library writer knows his method will do sideeffects in closure passed to the collection argument:

def libraryMethod(sq: Seq[Int]) = {
sq foreach {
x => println(x)
}
}

If the library writer writes a method which takes a closure argument, he would have to assume the closure is side-effecting just to be on the safe side:

def libraryMethod(sq: Seq[Int])(f: Int => Int) = sq.map(f)

IMO the choice to use this design is a conservative one. Users are on
the safe side and old code shouldn't break (even if HOFs like map,
collect, etc. are safe even when they use side-effects) What I still
don't like is that the default `foreach` is parallel. In my
experience, executing side-effects concurrently will produce problems
in more cases than not. For me the parallel foreach is the special
case not the sequential one.

I agree with you that it's easier to make a mistake if a foreach executes in parallel, especially in the presence of type inference.
However, if a reference is known at compile time to be executed sequentially, then there are no risks whatsoever.
A ".seq" or ": Traversable[A]" can always be used to fight type inference.

Although an always sequential `foreach` and an always parallel `pareach` would solve the `foreach` issue, this is just a rule of the thumb. 90% of collection methods take closures and any closure can contain a sideeffect. To be completely safe, all other HOFs would require parallel counterparts as well.

I'm more worried about visibility, races etc.

Contrast that to the current solution where the onus is on the user to
make sure to don't use side-effecting closures with parallel
collections.

In the current solution, the user must make sure closures are pure for any collection.
In the proposed solution, the user must make sure closure are pure unless he knows at compile time that his collections is a sequential one.
This seems like an improvement to me..

Alex

--
Viktor Klang,
Relentless is more
Work:   Scalable Solutions
Code:   github.com/viktorklang
Follow: twitter.com/viktorklang
Read:   klangism.tumblr.com

Mon, 2011-04-04, 15:47

#555

Ismael Juma 2

Joined: 2011-01-22,

Re: Re: REPL: null-values in parallel ranges?

On Mon, Apr 4, 2011 at 1:46 PM, Aleksandar Prokopec
wrote:
> (only Seqs shown above, but Sets and Maps follow the same pattern).

I have a question. If I do seq.par.map(identity), will the resulting
Seq be == to the source Seq (i.e. the order of the Seq will be
maintained even if multiple operations are executed in parallel)?

Best,
Ismael

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland