This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Re: fold/reduce vs foldLeft/reduceLeft?

8 replies

Sat, 2012-02-11, 16:03

Haoyi Li

Joined: 2012-02-10,

Why isn't it, though? It seems to me that having O(n) heap space (for
the reversed collection) would be superior to having O(n) stack space,
which may blow up your stack. And they're both O(n) runtime, so
risking a stack overflow seems much more dangerous than running a
constant factor slower.

-Haoyi

On Sat, Feb 11, 2012 at 7:32 AM, Rex Kerr wrote:
> On Sat, Feb 11, 2012 at 5:59 AM, Alex Repain wrote:
>>
>>
>>
>> 2012/2/10 Rex Kerr
>>>
>>> On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li wrote:
>>>>
>>>> Hello Everyone,
>>>>
>>>> I've been learning scala recently (already know java/c#/python) and
>>>> been thoroughly enjoying the higher order collections functions.
>>>> filter/map is pretty obvious, and I've more or less figured out
>>>> foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
>>>> use foldRight/reduceRight because of O(n^2) complexity
>>>
>>>
>>> That would be a strange implementation. The typical implementation
>>> requires O(n) stack space, which is the real problem--for many collections
>>> of practical size, you throw stack overflow exceptions.
>>
>>
>> Does it have to be on the stack though ? Can't the function be tail
>> recursive on the reversed collection ?
>
>
> Of course. xs.foldRight(z)(f) is the same as xs.reverse.foldLeft(z)((a,b)
> => f(b,a)). It's often not implemented that way in the library, however.
>
> --Rex

Sun, 2012-02-12, 11:11

#1

Alex Repain

Joined: 2010-07-27,

Re: fold/reduce vs foldLeft/reduceLeft?

2012/2/12 Tony Morris <tonymorris@gmail.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/12/2012 08:24 AM, Rex Kerr wrote:
> For example, if you have something that is lazily computed (e.g. a Stream
> or a view), you have just committed yourself to computing the entire
> collection (potentially). Depending on your fold, you might not need to.

More importantly than algorithmic complexity, for non-strict data
structures, a foldRight written using foldLeft is a different function
altogether, since for some values that would typically terminate, they
no longer do. That is to say, your program will no longer work.

Interesting, could you provide an example for that case, please ?

- --
Tony Morris
http://tmorris.net/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJPN3whAAoJEPxHMY3rBz0PgdEH+waEDVG2gveEjFrT0waONK6k
TFlh2wJistsBE3/CrOAXlszRddad/RORpvHmlEL7j1hMFeE1iTlYg4BW9SBS1O2I
NlhCBOKFZmwGOok+A+t/OyDmjs7Bg+Mdo4xnu1BZQxQkfaedVYsGCitCtqvppeZU
HIsVlUYAmx6Dr9E3rl15zcyGQNb9INPFXIcSvBNjYCBUsGkudO+NfeoyH6P7H+ho
nEwPzazVWuL5dH0GPBQrAwe4ASO9rSqE8tSl9hf2W8hNEMKdY46KIZtB8LZmUD2G
KL1SdUBq728nszqAkuQE4u9nPFWOn8xkhCHKRAtL+uuJ8AvFsNzitN7PLNKqrUI=
=L9ZZ
-----END PGP SIGNATURE-----

Sun, 2012-02-12, 12:11

#2

Jason Zaugg

Joined: 2009-05-18,

Re: fold/reduce vs foldLeft/reduceLeft?

On Sun, Feb 12, 2012 at 11:06 AM, Alex Repain <alex.repain@gmail.com> wrote:

2012/2/12 Tony Morris <tonymorris@gmail.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/12/2012 08:24 AM, Rex Kerr wrote:
> For example, if you have something that is lazily computed (e.g. a Stream
> or a view), you have just committed yourself to computing the entire
> collection (potentially). Depending on your fold, you might not need to.

More importantly than algorithmic complexity, for non-strict data
structures, a foldRight written using foldLeft is a different function
altogether, since for some values that would typically terminate, they
no longer do. That is to say, your program will no longer work.

Interesting, could you provide an example for that case, please ?

scala> def foldRight[A, B](fa: Stream[A], z: => B)(f: (A, => B) => B): B = | if (fa.isEmpty) z else f(fa.head, foldRight(fa.tail, z)(f))foldRight: [A, B](fa: Stream[A], z: => B)(f: (A, => B) => B)B
scala> def contains[A](as: Stream[A], a: A) = foldRight(as, false){(aa, z) => | a == aa || z | }contains: [A](as: Stream[A], a: A)Boolean
scala> contains(Stream.continually(2), 2)res6: Boolean = true
-jason

Sun, 2012-02-12, 14:51

#3

ichoran

Joined: 2009-08-14,

Re: fold/reduce vs foldLeft/reduceLeft?

On Sat, Feb 11, 2012 at 4:50 PM, Tony Morris <tmorris@tmorris.net> wrote:

On Feb 12, 2012 1:03 AM, "Haoyi Li" <haoyi.sg@gmail.com> wrote:
>
> Why isn't it, though? It seems to me that having O(n) heap space (for
> the reversed collection) would be superior to having O(n) stack space,
> which may blow up your stack. And they're both O(n) runtime, so
> risking a stack overflow seems much more dangerous than running a
> constant factor slower.

It is not true that for all values of xs then foldRight can be written with foldLeft. I should imagine this is why it is not implemented that way. Indeed, it is only true for very few heap-based structures like List.

To expand on Tony's point, while it is true that anything traversable (once) or any iterator can be packed into another data structure on the heap in opposite order (a List is a reasonable choice), after which a tail-recursive foldLeft can be used, there are a number of data structures for which that operation is questionable.

For example, if you have something that is lazily computed (e.g. a Stream or a view), you have just committed yourself to computing the entire collection (potentially). Depending on your fold, you might not need to.

Also, if you have an infinite collection _but_ for some values of input your function doesn't care what would have been computed so far, you can perform a right fold like so:
f(x0, f(x1 , f(x2, f(x3, whoCares) ) ) )
(that is, f(x3,y) is the same for any value of y, so you don't need to fold any deeper even if it's _infinitely_ deep).

There is a symmetric case for foldLeft that also works for infinite collections, namely
f( f( f( f(z, x0), x1), x2), whoCares)
where for some values of the carried term, f(carry, x) is the same for all x. Then you need not fold left any more, even if it's infinitely deep.

So left and right folds are interesting creatures and not theoretically trivially interconvertible. For cases of strictly evaluated finite collections, however, the reverse version is the way to go. But you can do that yourself when you need it, so it doesn't desperately need to be in the library.

--Rex

Sun, 2012-02-12, 14:51

#4

Tony Morris 2

Joined: 2009-03-20,

Re: fold/reduce vs foldLeft/reduceLeft?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/12/2012 08:06 PM, Alex Repain wrote:
> 2012/2/12 Tony Morris
>
> On 02/12/2012 08:24 AM, Rex Kerr wrote:
>>>> For example, if you have something that is lazily computed (e.g. a Stream
>>>> or a view), you have just committed yourself to computing the entire
>>>> collection (potentially). Depending on your fold, you might not need to.
>
> More importantly than algorithmic complexity, for non-strict data
> structures, a foldRight written using foldLeft is a different function
> altogether, since for some values that would typically terminate, they
> no longer do. That is to say, your program will no longer work.
>
>
>> Interesting, could you provide an example for that case, please ?

Yepper. http://paste.pocoo.org/show/549682/

case class EStream[A](x: Option[(() => A, () => EStream[A])]) {
import EStream._

def foldRight[B](b: => B)(f: (=> A, => B) => B): B =
x match {
case Some((h,t)) => f(h(), t().foldRight(b)(f))
case None => b
}

@annotation.tailrec
final def foldLeft[B](b: B)(f: (B, A) => B): B =
x match {
case Some((h,t)) => t().foldLeft(f(b,h()))(f)
case None => b
}

def reverse: EStream[A] =
foldLeft(nil[A])((b, a) => cons(() => a, () => b))

def foldRightBroken[B](b: => B)(f: (=> A, => B) => B): B =
foldLeft(b)((a, b) => f(b, a))
}

object EStream {
def rep[A](a: => A): EStream[A] =
cons(() => a, () => rep(a))

def cons[A](h: () => A, t: () => EStream[A]): EStream[A] =
EStream(Some(h, t))

def nil[A]: EStream[A] =
EStream(None)
}

object M {
import EStream._

def main(args: Array[String]) {
val ones = rep(1)
val twos = ones.foldRight(nil[Int])((a, b) => cons(() => a+1, () => b))

println(twos.x match {
case None => "hi!"
case Some((h, _)) => "lo" + h()
})

val oopsie = ones.foldRightBroken(nil[Int])((a, b) => cons(() =>
a+1, () => b))
println("you ain't never gonna be seeing this")
}
}

Sun, 2012-02-12, 14:51

#5

ichoran

Joined: 2009-08-14,

Re: fold/reduce vs foldLeft/reduceLeft?

On Sat, Feb 11, 2012 at 5:59 AM, Alex Repain <alex.repain@gmail.com> wrote:

2012/2/10 Rex Kerr <ichoran@gmail.com>
On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li <haoyi.sg@gmail.com> wrote:

Hello Everyone,

I've been learning scala recently (already know java/c#/python) and
been thoroughly enjoying the higher order collections functions.
filter/map is pretty obvious, and I've more or less figured out
foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
use foldRight/reduceRight because of O(n^2) complexity

That would be a strange implementation. The typical implementation requires O(n) stack space, which is the real problem--for many collections of practical size, you throw stack overflow exceptions.

Does it have to be on the stack though ? Can't the function be tail recursive on the reversed collection ?

Of course. xs.foldRight(z)(f) is the same as xs.reverse.foldLeft(z)((a,b) => f(b,a)). It's often not implemented that way in the library, however.

--Rex

Sun, 2012-02-12, 15:01

#6

Viktor Klang

Joined: 2008-12-17,

Re: fold/reduce vs foldLeft/reduceLeft?

Blowing the heap (OOME) kills your entire application, blowing your
stack only ruins one thread.

Cheers,
√

On Sat, Feb 11, 2012 at 4:03 PM, Haoyi Li wrote:
> Why isn't it, though? It seems to me that having O(n) heap space (for
> the reversed collection) would be superior to having O(n) stack space,
> which may blow up your stack. And they're both O(n) runtime, so
> risking a stack overflow seems much more dangerous than running a
> constant factor slower.
>
> -Haoyi
>
> On Sat, Feb 11, 2012 at 7:32 AM, Rex Kerr wrote:
>> On Sat, Feb 11, 2012 at 5:59 AM, Alex Repain wrote:
>>>
>>>
>>>
>>> 2012/2/10 Rex Kerr
>>>>
>>>> On Fri, Feb 10, 2012 at 1:36 PM, Haoyi Li wrote:
>>>>>
>>>>> Hello Everyone,
>>>>>
>>>>> I've been learning scala recently (already know java/c#/python) and
>>>>> been thoroughly enjoying the higher order collections functions.
>>>>> filter/map is pretty obvious, and I've more or less figured out
>>>>> foldLeft/reduceLeft. I've heard somewhere that you probably shouldn't
>>>>> use foldRight/reduceRight because of O(n^2) complexity
>>>>
>>>>
>>>> That would be a strange implementation. The typical implementation
>>>> requires O(n) stack space, which is the real problem--for many collections
>>>> of practical size, you throw stack overflow exceptions.
>>>
>>>
>>> Does it have to be on the stack though ? Can't the function be tail
>>> recursive on the reversed collection ?
>>
>>
>> Of course. xs.foldRight(z)(f) is the same as xs.reverse.foldLeft(z)((a,b)
>> => f(b,a)). It's often not implemented that way in the library, however.
>>
>> --Rex

Sun, 2012-02-12, 15:11

#7

Tony Morris 2

Joined: 2009-03-20,

Re: fold/reduce vs foldLeft/reduceLeft?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/12/2012 08:24 AM, Rex Kerr wrote:
> For example, if you have something that is lazily computed (e.g. a Stream
> or a view), you have just committed yourself to computing the entire
> collection (potentially). Depending on your fold, you might not need to.

More importantly than algorithmic complexity, for non-strict data
structures, a foldRight written using foldLeft is a different function
altogether, since for some values that would typically terminate, they
no longer do. That is to say, your program will no longer work.

Sun, 2012-02-12, 15:21

#8

Tony Morris

Joined: 2008-12-19,

Re: fold/reduce vs foldLeft/reduceLeft?

On Feb 12, 2012 1:03 AM, "Haoyi Li" <haoyi.sg@gmail.com> wrote:
>
> Why isn't it, though? It seems to me that having O(n) heap space (for
> the reversed collection) would be superior to having O(n) stack space,
> which may blow up your stack. And they're both O(n) runtime, so
> risking a stack overflow seems much more dangerous than running a
> constant factor slower.

It is not true that for all values of xs then foldRight can be written with foldLeft. I should imagine this is why it is not implemented that way. Indeed, it is only true for very few heap-based structures like List.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland