- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
fold/reduce vs foldLeft/reduceLeft?
Fri, 2012-02-10, 19:37
Hello Everyone,
I've been learning scala recently (already know java/c#/python) and
been thoroughly enjoying the higher order collections functions.
filter/map is pretty obvious, and I've more or less figured out
foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
use foldRight/reduceRight because of O(n^2) complexity, but how about
the non-specific fold/reduce? The scaladoc for reduce says:
Reduces the elements of this sequence using the specified associative
binary operator.
The order in which operations are performed on elements is unspecified
and may be nondeterministic.
Which sounds very unsettling, particularly the "unspecified and
nondeterministic" part! Is there any case where I should use a plain
"reduce", rather than reduceLeft? I can't think of a case where i'd
prefer a unspecified and non-determistic execution order, unless it's
something to do with letting parallel collections work properly in
parallel. Or is the unspecified and nondeterministic part a non-issue,
and I can shorten all my ".reduceLeft"s to ".reduce"s?
Thanks!
-Haoyi
Fri, 2012-02-10, 20:01
#2
Re: fold/reduce vs foldLeft/reduceLeft?
On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li <haoyi.sg@gmail.com> wrote:
That would be a strange implementation. The typical implementation requires O(n) stack space, which is the real problem--for many collections of practical size, you throw stack overflow exceptions.
The type of the function is also different. Fold left is [B,A] => B, while fold is [A,A] => A (just like reduce and reduceLeft).
Whenever it doesn't matter, which is exactly when the operator is transitive. That is, if your operator is O and
(a O b) O c == a O (b O c)
then all the rearrangements that fold and reduce are allowed to perform are inconsequential. But, by doing them in possibly random order, it can be parallelized. So you can just drop in a .par and suddenly it will run much faster.
Transitive operations include +, *, min, max, &, |, ^, set union, set intersection. They're usually the ones you would think of using in a fold anyway.
Subtraction, division, and set difference are not transitive. For example:
(1 - 2) - 3 == -1 -3 == -4
1 - (2 - 3) == 1 - (-1) == 2
Although you could use these in a fold, you're less likely to be tempted anyway, because
xs.foldLeft(x0)(_ - _)
is equal to either of
x0 - xs.sum
x0 - xs.foldLeft(0)(_ + _)
and the latter two are easier to understand. So you'd probably do the latter anyway.
--Rex
Hello Everyone,
I've been learning scala recently (already know java/c#/python) and
been thoroughly enjoying the higher order collections functions.
filter/map is pretty obvious, and I've more or less figured out
foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
use foldRight/reduceRight because of O(n^2) complexity
That would be a strange implementation. The typical implementation requires O(n) stack space, which is the real problem--for many collections of practical size, you throw stack overflow exceptions.
, but how about
the non-specific fold/reduce? The scaladoc for reduce says:
Reduces the elements of this sequence using the specified associative
binary operator.
The type of the function is also different. Fold left is [B,A] => B, while fold is [A,A] => A (just like reduce and reduceLeft).
The order in which operations are performed on elements is unspecified
and may be nondeterministic.
Which sounds very unsettling, particularly the "unspecified and
nondeterministic" part! Is there any case where I should use a plain
"reduce", rather than reduceLeft?
Whenever it doesn't matter, which is exactly when the operator is transitive. That is, if your operator is O and
(a O b) O c == a O (b O c)
then all the rearrangements that fold and reduce are allowed to perform are inconsequential. But, by doing them in possibly random order, it can be parallelized. So you can just drop in a .par and suddenly it will run much faster.
Transitive operations include +, *, min, max, &, |, ^, set union, set intersection. They're usually the ones you would think of using in a fold anyway.
Subtraction, division, and set difference are not transitive. For example:
(1 - 2) - 3 == -1 -3 == -4
1 - (2 - 3) == 1 - (-1) == 2
Although you could use these in a fold, you're less likely to be tempted anyway, because
xs.foldLeft(x0)(_ - _)
is equal to either of
x0 - xs.sum
x0 - xs.foldLeft(0)(_ + _)
and the latter two are easier to understand. So you'd probably do the latter anyway.
--Rex
I can't think of a case where i'd
prefer a unspecified and non-determistic execution order, unless it's
something to do with letting parallel collections work properly in
parallel. Or is the unspecified and nondeterministic part a non-issue,
and I can shorten all my ".reduceLeft"s to ".reduce"s?
Thanks!
-Haoyi
Sun, 2012-02-12, 14:51
#3
Re: fold/reduce vs foldLeft/reduceLeft?
Thanks for the replies!
The use case I'm looking at is using the | operator in the parser
combinator library: I want to parametrize a parser by a list of tokens
it can accept. i.e. instead of hard coding it into the parser:
object MyParser extends RegexParsers{
def operator = "+" | "-" | "*" | "/"
}
I do something like this
class MyParser(opChars: List[String] = List("+", "-", "*", "/")
extends RegexParsers{
def operator = opChars.reduce(_|_)
}
Which would then let the person instantiating the parser to override
the default choice of operators if need be. My worry is that if I have
something like
A | B | C
I could end up with either
A | (B | C)
or
(A | B) | C
Which are technically different parsers: one has the subtree on the
left and one has the subtree on the right. However, both of them
should parse the same input in the same way. Is there any reason I
shouldn't use the non-deterministic reduce in this case? I'm
completely new to formal language theory/parser generators/parser
combinators, so it's also possible that what i'm trying to do is
completely off base.
Thanks!
-Haoyi
On Fri, Feb 10, 2012 at 1:57 PM, Rex Kerr wrote:
> On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li wrote:
>>
>> Hello Everyone,
>>
>> I've been learning scala recently (already know java/c#/python) and
>> been thoroughly enjoying the higher order collections functions.
>> filter/map is pretty obvious, and I've more or less figured out
>> foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
>> use foldRight/reduceRight because of O(n^2) complexity
>
>
> That would be a strange implementation. The typical implementation requires
> O(n) stack space, which is the real problem--for many collections of
> practical size, you throw stack overflow exceptions.
>
>>
>> , but how about
>> the non-specific fold/reduce? The scaladoc for reduce says:
>>
>> Reduces the elements of this sequence using the specified associative
>> binary operator.
>
>
> The type of the function is also different. Fold left is [B,A] => B, while
> fold is [A,A] => A (just like reduce and reduceLeft).
>
>>
>>
>> The order in which operations are performed on elements is unspecified
>> and may be nondeterministic.
>>
>> Which sounds very unsettling, particularly the "unspecified and
>> nondeterministic" part! Is there any case where I should use a plain
>> "reduce", rather than reduceLeft?
>
>
> Whenever it doesn't matter, which is exactly when the operator is
> transitive. That is, if your operator is O and
> (a O b) O c == a O (b O c)
> then all the rearrangements that fold and reduce are allowed to perform are
> inconsequential. But, by doing them in possibly random order, it can be
> parallelized. So you can just drop in a .par and suddenly it will run much
> faster.
>
> Transitive operations include +, *, min, max, &, |, ^, set union, set
> intersection. They're usually the ones you would think of using in a fold
> anyway.
>
> Subtraction, division, and set difference are not transitive. For example:
> (1 - 2) - 3 == -1 -3 == -4
> 1 - (2 - 3) == 1 - (-1) == 2
>
> Although you could use these in a fold, you're less likely to be tempted
> anyway, because
> xs.foldLeft(x0)(_ - _)
> is equal to either of
> x0 - xs.sum
> x0 - xs.foldLeft(0)(_ + _)
> and the latter two are easier to understand. So you'd probably do the
> latter anyway.
>
> --Rex
>
>
>>
>> I can't think of a case where i'd
>> prefer a unspecified and non-determistic execution order, unless it's
>> something to do with letting parallel collections work properly in
>> parallel. Or is the unspecified and nondeterministic part a non-issue,
>> and I can shorten all my ".reduceLeft"s to ".reduce"s?
>>
>> Thanks!
>> -Haoyi
>
>
Sun, 2012-02-12, 15:01
#4
Re: fold/reduce vs foldLeft/reduceLeft?
2012/2/10 Rex Kerr <ichoran@gmail.com>
On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li <haoyi.sg@gmail.com> wrote:
Hello Everyone,
I've been learning scala recently (already know java/c#/python) and
been thoroughly enjoying the higher order collections functions.
filter/map is pretty obvious, and I've more or less figured out
foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
use foldRight/reduceRight because of O(n^2) complexity
That would be a strange implementation. The typical implementation requires O(n) stack space, which is the real problem--for many collections of practical size, you throw stack overflow exceptions.
Does it have to be on the stack though ? Can't the function be tail recursive on the reversed collection ?
, but how about
the non-specific fold/reduce? The scaladoc for reduce says:
Reduces the elements of this sequence using the specified associative
binary operator.
The type of the function is also different. Fold left is [B,A] => B, while fold is [A,A] => A (just like reduce and reduceLeft).
The order in which operations are performed on elements is unspecified
and may be nondeterministic.
Which sounds very unsettling, particularly the "unspecified and
nondeterministic" part! Is there any case where I should use a plain
"reduce", rather than reduceLeft?
Whenever it doesn't matter, which is exactly when the operator is transitive. That is, if your operator is O and
(a O b) O c == a O (b O c)
then all the rearrangements that fold and reduce are allowed to perform are inconsequential. But, by doing them in possibly random order, it can be parallelized. So you can just drop in a .par and suddenly it will run much faster.
Transitive operations include +, *, min, max, &, |, ^, set union, set intersection. They're usually the ones you would think of using in a fold anyway.
Subtraction, division, and set difference are not transitive. For example:
(1 - 2) - 3 == -1 -3 == -4
1 - (2 - 3) == 1 - (-1) == 2
Although you could use these in a fold, you're less likely to be tempted anyway, because
xs.foldLeft(x0)(_ - _)
is equal to either of
x0 - xs.sum
x0 - xs.foldLeft(0)(_ + _)
and the latter two are easier to understand. So you'd probably do the latter anyway.
--Rex
I can't think of a case where i'd
prefer a unspecified and non-determistic execution order, unless it's
something to do with letting parallel collections work properly in
parallel. Or is the unspecified and nondeterministic part a non-issue,
and I can shorten all my ".reduceLeft"s to ".reduce"s?
Thanks!
-Haoyi
Sun, 2012-02-12, 18:21
#5
Re: fold/reduce vs foldLeft/reduceLeft?
On 2012-02-10 20:25, Haoyi Li wrote:
> The use case I'm looking at is using the | operator in the parser
> combinator library: I want to parametrize a parser by a list of tokens
> it can accept. i.e. instead of hard coding it into the parser:
>
> object MyParser extends RegexParsers{
> def operator = "+" | "-" | "*" | "/"
> }
>
> I do something like this
>
> class MyParser(opChars: List[String] = List("+", "-", "*", "/")
> extends RegexParsers{
> def operator = opChars.reduce(_|_)
> }
>
> Which would then let the person instantiating the parser to override
> the default choice of operators if need be. My worry is that if I have
> something like
>
> A | B | C
>
> I could end up with either
>
> A | (B | C)
>
> or
>
> (A | B) | C
>
> Which are technically different parsers: one has the subtree on the
> left and one has the subtree on the right. However, both of them
> should parse the same input in the same way. Is there any reason I
> shouldn't use the non-deterministic reduce in this case? I'm
> completely new to formal language theory/parser generators/parser
> combinators, so it's also possible that what i'm trying to do is
> completely off base.
Hi Haoyi,
| combinator is technically left-associative, so that A | B | C is interpreted as (A | B) | C.
But semantically it is (fully) associative, i.e. (A | B) | C and A | (B | C) have exactly the
same behavior: first A is tried, then B, then C. That is these variants will always produce the same
result.
This is how it works:
1) (A | B) | C is an alternation parser as a whole, so first its left-hand-side is tried (that is
the ´A|B´ parser), then the right-hand-side (that is the ´C´ parser). The ´A|B´ parser is again an
alternation parser, so first ´A´ is tried, then ´B´. The overall order of applying parsers to the
input is A, B, C.
2) in ´A | (B | C)´, first ´A´ is tried, then ´B|C´. When ´B|C´ comes into play, then first ´B´,
then ´C´. The overall order is again A, B, C.
So, no difference in results regardless of folding direction.
But as for the runtime behavior, the right-folded version could be more effective, since the call to
the first parser goes directly to it from the outer parser. In the left-folded version it have to go
through (several) intermediate alternation combinators. But if the optimizer did a good job and
these calls got inlined, then there should be no difference at runtime, too.
(What can also make a difference is the overall order of alternatives depending on their
probabilities, but it's not related to the folding question.)
For an example of folding operator-parsers you might take a look at implementation in StdLexical [1]
and usage example in Json-Parser [2].
--
EL
[1]
https://github.com/scala/scala/blob/v2.9.1/src/library/scala/util/parsin...
[2] https://github.com/scala/scala/blob/v2.9.1/src/library/scala/util/parsin...
Sun, 2012-02-12, 18:31
#6
Re: fold/reduce vs foldLeft/reduceLeft?
On 2012-02-12 18:14, Eugen Labun wrote:
> But semantically it is (fully) associative, i.e. (A | B) | C and A | (B | C) have exactly the
> same behavior: first A is tried, then B, then C. That is these variants will always produce the same
> result.
should be better read as
But semantically it is (fully) associative, i.e. (A | B) | C and A | (B | C) always produce the
same result, since the order of applying involved parsers is always the same: first A is tried, then
B, then C.
Sun, 2012-02-12, 18:41
#7
Re: fold/reduce vs foldLeft/reduceLeft?
While the semantics are associative, typically you would want to avoid as much computation as possible in a parser. An execution strategy that applied A, B and C to the input in parallel would be costly unless there was a way to terminate the 'right-hand' parses once any to the 'left' succeeded. I can imagine lots of things that could be sped up with speculate-and-kill execution (e.g. takeWhile), but I don't know if it is one of the models supported by the parallel collections framework.
Matthew
On 12 February 2012 17:21, Eugen Labun <labun@gmx.net> wrote:
--
Dr Matthew PocockIntegrative Bioinformatics Group, School of Computing Science, Newcastle Universitymailto: turingatemyhamster@gmail.com gchat: turingatemyhamster@gmail.commsn: matthew_pocock@yahoo.co.uk irc.freenode.net: drdozerskype: matthew.pococktel: (0191) 2566550mob: +447535664143
Matthew
On 12 February 2012 17:21, Eugen Labun <labun@gmx.net> wrote:
On 2012-02-12 18:14, Eugen Labun wrote:
But semantically it is (fully) associative, i.e. (A | B) | C and A | (B | C) have exactly the
same behavior: first A is tried, then B, then C. That is these variants will always produce the same
result.
should be better read as
But semantically it is (fully) associative, i.e. (A | B) | C and A | (B | C) always produce the same result, since the order of applying involved parsers is always the same: first A is tried, then B, then C.
--
Dr Matthew PocockIntegrative Bioinformatics Group, School of Computing Science, Newcastle Universitymailto: turingatemyhamster@gmail.com gchat: turingatemyhamster@gmail.commsn: matthew_pocock@yahoo.co.uk irc.freenode.net: drdozerskype: matthew.pococktel: (0191) 2566550mob: +447535664143
On Fri, Feb 10, 2012 at 7:36 PM, Haoyi Li wrote:
> Hello Everyone,
>
> I've been learning scala recently (already know java/c#/python) and
> been thoroughly enjoying the higher order collections functions.
> filter/map is pretty obvious, and I've more or less figured out
> foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
> use foldRight/reduceRight because of O(n^2) complexity, but how about
> the non-specific fold/reduce? The scaladoc for reduce says:
>
> Reduces the elements of this sequence using the specified associative
> binary operator.
>
> The order in which operations are performed on elements is unspecified
> and may be nondeterministic.
>
> Which sounds very unsettling, particularly the "unspecified and
> nondeterministic" part! Is there any case where I should use a plain
> "reduce", rather than reduceLeft? I can't think of a case where i'd
> prefer a unspecified and non-determistic execution order, unless it's
> something to do with letting parallel collections work properly in
> parallel. Or is the unspecified and nondeterministic part a non-issue,
> and I can shorten all my ".reduceLeft"s to ".reduce"s?
_ + _
_ - _
_ * _
....
Cheers,
√
>
> Thanks!
> -Haoyi