Introducing Scala using streams is perilous [Was: For a balanced scala article]

4 replies

Mon, 2011-12-26, 15:08

Sebastien Bocq

Joined: 2008-12-18,

2011/12/26 Julio Faerman :
> Once again, thanks everyone for the input. I have considered each
> opinion and the first draft of the article is ready for review:
>
> https://docs.google.com/document/d/1VHEImTA8poNg6o56eoSDL0CwOtJ2kBxHftDd...
>
> The link is open for editing and commenting. Please forgive (or
> correct) the non-native english.
>
> Comments and suggestions are a favor to me and to future readers, the
> article will be published on the web shortly.
>
> Thanks in advance,
> Julio

Hi Julio,

The functional version of Fibonacci looks nice and clean, however it
is not space-efficient conversely to the Java version.

Here are three versions of Fibonacci based on stream, which one is
space-efficient?

object FiboFunc1 {
val fibs:Stream[BigInt] =
0 #:: 1 #:: (fibs zip fibs.tail).map{ case (a,b) => a+b }
}

object FiboFunc2 {
def fibs:Stream[BigInt] =
0 #:: 1 #:: (fibs zip fibs.tail).map{ case (a,b) => a+b }
}

object FiboFunc3 {
def fibs:Stream[BigInt] = {
def _fibs(a:BigInt, b:BigInt):Stream[BigInt] = {
a #:: _fibs(b, a + b)
}

_fibs(0, 1)
}
}

Let's see (2.9.1.final).

scala> FiboFunc1.fibs(100000)
java.lang.OutOfMemoryError: GC overhead limit exceeded

scala> FiboFunc2.fibs(100000)
java.lang.OutOfMemoryError: Java heap space

scala> FiboFunc3.fibs(100000)
java.lang.OutOfMemoryError: Java heap space

Hmm, strange. Let's insist a bit:

scala> FiboFunc3.fibs.drop(100000).head
res2: BigInt = 25974069347221724166155034021275915414880485386517696584724770703952534543511273686265556772836716744754637587223074432111638399473875091030965697382188304493052287638531334921353026792789567010512765782716356080730505322002432331143839865161378272381247774537783372999162146340500546698603908627509966393664092118901252719601721050603003505868940285581036751176582513683774386849364134573388343651587754253719124105003321959913300622043630352137565254218239986908485563740801792517616293917549634585586163007628199160811098365263529954406942842065710460449038056471363460330005208522777075544467947237090309790190148604328468198579610159510018506082649192345873133991501339199323631023018641725364771362664750801339824312317034314529641817900511879573167668349799016820118499077566...

I have no time right now to figure out why the last two tests do not
yield the same result, but as you can see this is a perilous exercise.

Cheers,
Sébastien

Mon, 2011-12-26, 16:51

H-star Development

Joined: 2010-04-14,

Re: Introducing Scala using streams is perilous [Was: For a bal

i checked:

object FiboFunc2 {
def fibs:Stream[BigInt] =
0 #:: 1 #:: (fibs zip fibs.tail).map{ case (a,b) => a+b }
}

this one is the worst. it doesn't create one stream, it creates an army
of streams
FiboFunc1 is the working version of func2, here the stream re-uses itself.
Fibofunc 3 is worse than 1, but a lot better than 2. it's tail-function
holds 2 references to bigints which are not necessary in func2.

a memory problem is introduced by the fact that every single stream
element references not only the rest of the stream and the cached value,
but also the generation function which is never needed again..
a more efficient stream would be mutable and hold *one* reference to the
function and one to an arraybuffer of results.

if you want to show off streams, use streams holding a few big objects
that are costly to create so no one notices the amount of memory used by
the stream.

Am 26.12.2011 15:08, schrieb Sébastien Bocq:
> 2011/12/26 Julio Faerman :
>> Once again, thanks everyone for the input. I have considered each
>> opinion and the first draft of the article is ready for review:
>>
>> https://docs.google.com/document/d/1VHEImTA8poNg6o56eoSDL0CwOtJ2kBxHftDd...
>>
>> The link is open for editing and commenting. Please forgive (or
>> correct) the non-native english.
>>
>> Comments and suggestions are a favor to me and to future readers, the
>> article will be published on the web shortly.
>>
>> Thanks in advance,
>> Julio
> Hi Julio,
>
> The functional version of Fibonacci looks nice and clean, however it
> is not space-efficient conversely to the Java version.
>
> Here are three versions of Fibonacci based on stream, which one is
> space-efficient?
>
> object FiboFunc1 {
> val fibs:Stream[BigInt] =
> 0 #:: 1 #:: (fibs zip fibs.tail).map{ case (a,b) => a+b }
> }
>
> object FiboFunc2 {
> def fibs:Stream[BigInt] =
> 0 #:: 1 #:: (fibs zip fibs.tail).map{ case (a,b) => a+b }
> }
>
> object FiboFunc3 {
> def fibs:Stream[BigInt] = {
> def _fibs(a:BigInt, b:BigInt):Stream[BigInt] = {
> a #:: _fibs(b, a + b)
> }
>
> _fibs(0, 1)
> }
> }
>
> Let's see (2.9.1.final).
>
> scala> FiboFunc1.fibs(100000)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> scala> FiboFunc2.fibs(100000)
> java.lang.OutOfMemoryError: Java heap space
>
> scala> FiboFunc3.fibs(100000)
> java.lang.OutOfMemoryError: Java heap space
>
> Hmm, strange. Let's insist a bit:
>
> scala> FiboFunc3.fibs.drop(100000).head
> res2: BigInt = 25974069347221724166155034021275915414880485386517696584724770703952534543511273686265556772836716744754637587223074432111638399473875091030965697382188304493052287638531334921353026792789567010512765782716356080730505322002432331143839865161378272381247774537783372999162146340500546698603908627509966393664092118901252719601721050603003505868940285581036751176582513683774386849364134573388343651587754253719124105003321959913300622043630352137565254218239986908485563740801792517616293917549634585586163007628199160811098365263529954406942842065710460449038056471363460330005208522777075544467947237090309790190148604328468198579610159510018506082649192345873133991501339199323631023018641725364771362664750801339824312317034314529641817900511879573167668349799016820118499077566...
>
> I have no time right now to figure out why the last two tests do not
> yield the same result, but as you can see this is a perilous exercise.
>
> Cheers,
> Sébastien
>

Mon, 2011-12-26, 22:41

Derek Williams 3

Joined: 2011-08-12,

Re: Introducing Scala using streams is perilous [Was: For a bal

On Mon, Dec 26, 2011 at 7:08 AM, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:

The functional version of Fibonacci looks nice and clean, however it
is not space-efficient conversely to the Java version.

I agree that the Stream version isn't a great alternative to the Java version, mostly because they serve different purposes. The Stream version is good if you need previous values to be cached, but as you are finding out those values can start taking up a bit of memory.
Not an answer to your question (HamserofDeath already answered that one), but a better functional interpretation of the Java version would probably be something like this:
def fib(idx: Int): BigInt = { def run(i: Int, a: BigInt, b: BigInt): BigInt = if (i == idx) b else run(i + 1, b, a + b)
if (idx == 0) 0 else run(idx, 0, 1)}
Of course this isn't as clean and compact as the stream version, although it can be easier to follow for someone not familiar with functional programming.
--
Derek Williams

Tue, 2011-12-27, 00:11

Sebastien Bocq

Joined: 2008-12-18,

Re: Introducing Scala using streams is perilous [Was: For a bal

2011/12/26 Derek Williams :
> On Mon, Dec 26, 2011 at 7:08 AM, Sébastien Bocq
> wrote:
>>
>> The functional version of Fibonacci looks nice and clean, however it
>> is not space-efficient conversely to the Java version.
>
>
> I agree that the Stream version isn't a great alternative to the Java
> version, mostly because they serve different purposes. The Stream version is
> good if you need previous values to be cached, but as you are finding out
> those values can start taking up a bit of memory.
>
> Not an answer to your question (HamserofDeath already answered that one),
> but a better functional interpretation of the Java version would probably be
> something like this:
>
> def fib(idx: Int): BigInt = {
> def run(i: Int, a: BigInt, b: BigInt): BigInt =
> if (i == idx) b else run(i + 1, b, a + b)
>
> if (idx == 0) 0 else run(idx, 0, 1)
> }
>
> Of course this isn't as clean and compact as the stream version, although it
> can be easier to follow for someone not familiar with functional
> programming.

What is counter intuitive in the third example is that

FiboFunc3.fibs(100000)

caches intermediate values although they cannot be referenced any more.

I looked it up and the reason is that the apply method is defined in
LinearSeqOptimized as:

def apply(n: Int): A = {
val rest = drop(n)
if (n < 0 || rest.isEmpty) throw new IndexOutOfBoundsException("" + n)
rest.head
}

which keeps a reference to the head of the stream via 'this', hence
the caching behaviour.

This is roughly equivalent to this:

scala> FiboFunc3.fibs
res0: Stream[BigInt] = Stream(0, ?)

scala> res0.drop(100000).head
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead
limit exceeded

While the following works fine:

scala> FiboFunc3.fibs.drop(100000).head
res1: BigInt = 25974...

See the catch?

Tue, 2011-12-27, 15:21

jfaerman

Joined: 2011-11-29,

Re: Introducing Scala using streams is perilous [Was: For a bala

wow, that was interesting :)

I agree, but would like to make clear that this example was not meant
to introduce scala, but to show how different the functional version
can look.

That is important because one opinion i hear a lot is " i have 10+
years of programming in 10+ languages but can't read scala. Then scala
is hard to read". This is because they are very experienced in OO but
not in functional programming, and i would like to show this point in
the article.

On Dec 26, 9:01 pm, Sébastien Bocq wrote:
> 2011/12/26 Derek Williams :
>
>
>
>
>
>
>
>
>
> > On Mon, Dec 26, 2011 at 7:08 AM, Sébastien Bocq
> > wrote:
>
> >> The functional version of Fibonacci looks nice and clean, however it
> >> is not space-efficient conversely to the Java version.
>
> > I agree that the Stream version isn't a great alternative to the Java
> > version, mostly because they serve different purposes. The Stream version is
> > good if you need previous values to be cached, but as you are finding out
> > those values can start taking up a bit of memory.
>
> > Not an answer to your question (HamserofDeath already answered that one),
> > but a better functional interpretation of the Java version would probably be
> > something like this:
>
> > def fib(idx: Int): BigInt = {
> > def run(i: Int, a: BigInt, b: BigInt): BigInt =
> > if (i == idx) b else run(i + 1, b, a + b)
>
> > if (idx == 0) 0 else run(idx, 0, 1)
> > }
>
> > Of course this isn't as clean and compact as the stream version, although it
> > can be easier to follow for someone not familiar with functional
> > programming.
>
> What is counter intuitive in the third example is that
>
> FiboFunc3.fibs(100000)
>
> caches intermediate values although they cannot be referenced any more.
>
> I looked it up and the reason is that the apply method is defined in
> LinearSeqOptimized as:
>
> def apply(n: Int): A = {
> val rest = drop(n)
> if (n < 0 || rest.isEmpty) throw new IndexOutOfBoundsException("" + n)
> rest.head
>
> }
>
> which keeps a reference to the head of the stream via 'this', hence
> the caching behaviour.
>
> This is roughly equivalent to this:
>
> scala> FiboFunc3.fibs
> res0: Stream[BigInt] = Stream(0, ?)
>
> scala> res0.drop(100000).head
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead
> limit exceeded
>
> While the following works fine:
>
> scala> FiboFunc3.fibs.drop(100000).head
> res1: BigInt = 25974...
>
> See the catch?
>
> --
> Sébastien

Scala Main Menu

Introducing Scala using streams is perilous [Was: For a balanced scala article]

Scala Quick Links

Featured News

User login