This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Re: foreachWithIndex?

38 replies

Wed, 2010-05-19, 10:44

Sciss

Joined: 2008-12-17,

you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc. then you iterate over instances of Tuple2 carrying your original collection's item along with an index Int.

Am 19.05.2010 um 10:41 schrieb 谢非:

> sometimes we need to get current index while iterating an indexed sequence, but I cannot find methods like foreachWithIndex, foldLeftWithIndex, etc....
>
>
> Any suggestions?
>
>
> it is not convenient to check about method existence in scala, with Java I have mighty CHM documents.
>
>
>
>
>
>
>
>
>
> 使用新一代 Windows Live Messenger 轻松交流和共享！立刻下载！

Wed, 2010-05-19, 10:57

#1

XieFei

Joined: 2010-04-26,

foreachWithIndex?

sometimes we need to get current index while iterating an indexed sequence, but I cannot find methods like foreachWithIndex, foldLeftWithIndex, etc....

Any suggestions?

it is not convenient to check about method existence in scala, with Java I have mighty CHM documents.

使用新一代 Windows Live Messenger 轻松交流和共享！立刻下载！

Wed, 2010-05-19, 11:07

#2

XieFei

Joined: 2010-04-26,

Re: foreachWithIndex?

.ExternalClass .ecxhmmessage P {padding:0px;} .ExternalClass body.ecxhmmessage {font-size:10pt;font-family:Verdana;} Oh, that will take double memory and one more iteration, feels bad

> you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc. then you iterate over instances of Tuple2 carrying your original collection's item along with an index Int.

使用Messenger保护盾2.0，支持多账号登录！现在就下载！

Wed, 2010-05-19, 11:17

#3

H-star Development

Joined: 2010-04-14,

Re: foreachWithIndex?

similar problem here, i want to fill an n-dimensional array with a function like this:

((a,b,c,d,e.... /*a tuple*/)):ArrayType => whateverShouldBeInTheArrayAt _

but scala doesn't seem to provide a method to do this

-------- Original-Nachricht --------
> Datum: Wed, 19 May 2010 17:51:45 +0800
> Von: "谢非"
> An: scala-user@listes.epfl.ch
> Betreff: Re: [scala-user] foreachWithIndex?

>
>
>
>
>
>
>
>
>
> Oh, that will take double memory and one more iteration, feels bad
>
> > you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc.
> then you iterate over instances of Tuple2 carrying your original
> collection's item along with an index Int.
>
>
>
> _________________________________________________________________
> 一张照片的自白――Windows Live照片的可爱视频介绍
> http://windowslivesky.spaces.live.com/blog/cns!5892B6048E2498BD!889.entry

Wed, 2010-05-19, 11:27

#4

Kevin Wright

Joined: 2009-06-09,

Re: foreachWithIndex?

Tuples inherit the Product trait
Take a look at the productArity and productElement methods there, you could easily use these to implicitly convert a tuple to an Array/Seq/Stream/Map/etc.

On 19 May 2010 11:02, Dennis Haupt <h-star@gmx.de> wrote:

similar problem here, i want to fill an n-dimensional array with a function like this:

((a,b,c,d,e.... /*a tuple*/)):ArrayType => whateverShouldBeInTheArrayAt _

but scala doesn't seem to provide a method to do this

-------- Original-Nachricht --------
> Datum: Wed, 19 May 2010 17:51:45 +0800
> Von: "谢非" <xie___fei@hotmail.com>
> An: scala-user@listes.epfl.ch
> Betreff: Re: [scala-user] foreachWithIndex?

>
>
>
>
>
>
>
>
>
> Oh, that will take double memory and one more iteration, feels bad
>
> > you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc.
> then you iterate over instances of Tuple2 carrying your original
> collection's item along with an index Int.
>
>
>
> _________________________________________________________________
> 一张照片的自白――Windows Live照片的可爱视频介绍
> http://windowslivesky.spaces.live.com/blog/cns!5892B6048E2498BD!889.entry

--
GRATIS für alle GMX-Mitglieder: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01

--
Kevin Wright

mail/google talk: kev.lee.wright@googlemail.com
wave: kev.lee.wright@googlewave.com
skype: kev.lee.wright
twitter: @thecoda

Wed, 2010-05-19, 11:47

#5

H-star Development

Joined: 2010-04-14,

Re: foreachWithIndex?

no, that's not what i want.

i have this array:

val array = Array.ofDim[String](1,2,3,4,5)

and want to do this:

array.fillEach((a,b,c,d,e) => "row "+a+", col+" b+" whatever "+c*d/e)
i want to give it an initialization-function

-------- Original-Nachricht --------
> Datum: Wed, 19 May 2010 11:26:16 +0100
> Von: Kevin Wright
> An: Dennis Haupt
> CC: "è°¢é ž" , scala-user@listes.epfl.ch
> Betreff: Re: [scala-user] foreachWithIndex?

> Tuples inherit the Product trait
>
> Take a look at the productArity and productElement methods there, you
> could
> easily use these to implicitly convert a tuple to an
> Array/Seq/Stream/Map/etc.
>
> On 19 May 2010 11:02, Dennis Haupt wrote:
>
> > similar problem here, i want to fill an n-dimensional array with a
> function
> > like this:
> >
> > ((a,b,c,d,e.... /*a tuple*/)):ArrayType => whateverShouldBeInTheArrayAt
> _
> >
> > but scala doesn't seem to provide a method to do this
> >
> > -------- Original-Nachricht --------
> > > Datum: Wed, 19 May 2010 17:51:45 +0800
> > > Von: "谢非"
> > > An: scala-user@listes.epfl.ch
> > > Betreff: Re: [scala-user] foreachWithIndex?
> >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Oh, that will take double memory and one more iteration, feels bad
> > >
> > > > you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft
> etc.
> > > then you iterate over instances of Tuple2 carrying your original
> > > collection's item along with an index Int.
> > >
> > >
> > >
> > > _________________________________________________________________
> > > 一张照片的自白――Windows Live照片的可爱视频介绍
> > >
> >
> http://windowslivesky.spaces.live.com/blog/cns!5892B6048E2498BD!889.entry
> >
> > --
> > GRATIS für alle GMX-Mitglieder: Die maxdome Movie-FLAT!
> > Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01
> >
>
>
>

Wed, 2010-05-19, 12:17

#6

Jason Zaugg

Joined: 2009-05-18,

Re: foreachWithIndex?

scala> Array.tabulate[Int](3, 3)(_ * _)
res3: Array[Array[Int]] = Array(Array(0, 0, 0), Array(0, 1, 2), Array(0, 2, 4))

On Wed, May 19, 2010 at 12:40 PM, Dennis Haupt wrote:
> no, that's not what i want.
>
> i have this array:
>
> val array = Array.ofDim[String](1,2,3,4,5)
>
> and want to do this:
>
> array.fillEach((a,b,c,d,e) => "row "+a+", col+" b+" whatever "+c*d/e)
> i want to give it an initialization-function

Wed, 2010-05-19, 12:27

#7

H-star Development

Joined: 2010-04-14,

Re: foreachWithIndex?

this mailing list is an infinite source of valuable wisdom

-------- Original-Nachricht --------
> Datum: Wed, 19 May 2010 13:10:10 +0200
> Von: Jason Zaugg
> An: Dennis Haupt
> CC: Kevin Wright , scala-user@listes.epfl.ch, xie___fei@hotmail.com
> Betreff: Re: [scala-user] foreachWithIndex?

> scala> Array.tabulate[Int](3, 3)(_ * _)
> res3: Array[Array[Int]] = Array(Array(0, 0, 0), Array(0, 1, 2), Array(0,
> 2, 4))
>
> On Wed, May 19, 2010 at 12:40 PM, Dennis Haupt wrote:
> > no, that's not what i want.
> >
> > i have this array:
> >
> > val array = Array.ofDim[String](1,2,3,4,5)
> >
> > and want to do this:
> >
> > array.fillEach((a,b,c,d,e) => "row "+a+", col+" b+" whatever "+c*d/e)
> > i want to give it an initialization-function

Wed, 2010-05-19, 14:37

#8

dcsobral

Joined: 2009-04-23,

Re: foreachWithIndex?

You can apply it to a view, then it doesn't take more memory.

2010/5/19 谢非 <xie___fei@hotmail.com>

Oh, that will take double memory and one more iteration, feels bad

> you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc. then you iterate over instances of Tuple2 carrying your original collection's item along with an index Int.

使用Messenger保护盾2.0，支持多账号登录！现在就下载！

--
Daniel C. Sobral

I travel to the future all the time.

Wed, 2010-05-19, 17:57

#9

Garrett Rowe 2

Joined: 2009-03-23,

Re: foreachWithIndex?

Or you can cook up something yourself.

class IndexedArrayOps[A](array: Array[A]) {
def foreachWithIndex(f: (A, Int) => Unit): Unit = {
var currIdx = 0
while (currIdx < array.length) {
f(array(currIdx), currIdx)
currIdx += 1
}
}
}

2010/5/19 Daniel Sobral :
> You can apply it to a view, then it doesn't take more memory.
>
> 2010/5/19 谢非
>>
>> Oh, that will take double memory and one more iteration, feels bad
>>
>> > you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc.
>> > then you iterate over instances of Tuple2 carrying your original
>> > collection's item along with an index Int.
>>
>>
>>
>> ________________________________
>> 使用Messenger保护盾2.0，支持多账号登录！现在就下载！
>
>
> --
> Daniel C. Sobral
>
> I travel to the future all the time.
>

Wed, 2010-05-19, 18:37

#10

MonkeeSage

Joined: 2010-05-07,

Re: foreachWithIndex?

On 05/19/2010 11:33 AM, Garrett Rowe wrote:
> Or you can cook up something yourself.
>
> class IndexedArrayOps[A](array: Array[A]) {
> def foreachWithIndex(f: (A, Int) => Unit): Unit = {
> var currIdx = 0
> while (currIdx< array.length) {
> f(array(currIdx), currIdx)
> currIdx += 1
> }
> }
> }
>

someSeq.foldLeft(0) { (idx, x) =>
printf("item %d == %s\n", idx, x)
idx+1
}

> 2010/5/19 Daniel Sobral:
>
>> You can apply it to a view, then it doesn't take more memory.
>>
>> 2010/5/19 谢非
>>
>>> Oh, that will take double memory and one more iteration, feels bad
>>>
>>>
>>>> you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft etc.
>>>> then you iterate over instances of Tuple2 carrying your original
>>>> collection's item along with an index Int.
>>>>
>>>
>>>
>>> ________________________________
>>> 使用Messenger保护盾2.0，支持多账号登录！现在就下载！
>>>
>>
>> --
>> Daniel C. Sobral
>>
>> I travel to the future all the time.
>>
>>
>

Thu, 2010-05-20, 02:37

#11

MonkeeSage

Joined: 2010-05-07,

Re: foreachWithIndex?

On 05/19/2010 12:23 PM, MonkeeSage wrote:
> On 05/19/2010 11:33 AM, Garrett Rowe wrote:
>> Or you can cook up something yourself.
>>
>> class IndexedArrayOps[A](array: Array[A]) {
>> def foreachWithIndex(f: (A, Int) => Unit): Unit = {
>> var currIdx = 0
>> while (currIdx< array.length) {
>> f(array(currIdx), currIdx)
>> currIdx += 1
>> }
>> }
>> }
>
> someSeq.foldLeft(0) { (idx, x) =>
> printf("item %d == %s\n", idx, x)
> idx+1
> }

And of course, to walk backwards

someSeq.foldRight(someSeq.length) { (idx, x) =>
printf("item %d == %s\n", idx, x)
idx-1
}

Folds tend to be thought of a just a way to sum or reduce (and you can
certainly implement sum and reduce using them), but they are really a
general pattern for when you want to iterate a sequence and pass some
kind of state with each iteration. You need not be calculating a return
value, or even changing the state on each iteration.

Some examples (contrived, I know):

// filter
List(1,2,3,4).foldLeft(List[Int]()) { (odds, x) =>
if (x % 2 == 1) x :: odds
else odds
}.reverse

// map
List(1,2,3,4).foldLeft(List[Int]()) { (res, x) =>
doStuffToX(x) :: res
}.reverse

// always print the first item...
List(1,2,3,4).foldLeft(true) { (shouldPrint, x) =>
if (shouldPrint) println(x)
checkIfWeStillWantToBeVerbose()
}

>> 2010/5/19 Daniel Sobral:
>>> You can apply it to a view, then it doesn't take more memory.
>>>
>>> 2010/5/19 谢非
>>>> Oh, that will take double memory and one more iteration, feels bad
>>>>
>>>>> you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft
>>>>> etc.
>>>>> then you iterate over instances of Tuple2 carrying your original
>>>>> collection's item along with an index Int.
>>>>
>>>>
>>>> ________________________________
>>>> 使用Messenger保护盾2.0，支持多账号登录！现在就下载！
>>>
>>> --
>>> Daniel C. Sobral
>>>
>>> I travel to the future all the time.
>>>
>
>

Sat, 2010-05-22, 06:37

#12

MonkeeSage

Joined: 2010-05-07,

Re: foreachWithIndex?

On 05/19/2010 08:29 PM, MonkeeSage wrote:
> On 05/19/2010 12:23 PM, MonkeeSage wrote:
>> On 05/19/2010 11:33 AM, Garrett Rowe wrote:
>>> Or you can cook up something yourself.
>>>
>>> class IndexedArrayOps[A](array: Array[A]) {
>>> def foreachWithIndex(f: (A, Int) => Unit): Unit = {
>>> var currIdx = 0
>>> while (currIdx< array.length) {
>>> f(array(currIdx), currIdx)
>>> currIdx += 1
>>> }
>>> }
>>> }
>>
>> someSeq.foldLeft(0) { (idx, x) =>
>> printf("item %d == %s\n", idx, x)
>> idx+1
>> }
>
> And of course, to walk backwards
>
> someSeq.foldRight(someSeq.length) { (idx, x) =>
> printf("item %d == %s\n", idx, x)
> idx-1
> }

Except, that's wrong. Let's try again...(notice the order reversal of
the item/accumulator)...

// walking backwards with index
someSeq.foldRight(someSeq.length-1) { (x, idx) =>
printf("item %d == %s\n", idx, x)
idx - 1
}

>
> Folds tend to be thought of a just a way to sum or reduce (and you can
> certainly implement sum and reduce using them), but they are really a
> general pattern for when you want to iterate a sequence and pass some
> kind of state with each iteration. You need not be calculating a
> return value, or even changing the state on each iteration.
>
> Some examples (contrived, I know):
>
> // filter
> List(1,2,3,4).foldLeft(List[Int]()) { (odds, x) =>
> if (x % 2 == 1) x :: odds
> else odds
> }.reverse
>
> // map
> List(1,2,3,4).foldLeft(List[Int]()) { (res, x) =>
> doStuffToX(x) :: res
> }.reverse
>
> // always print the first item...
> List(1,2,3,4).foldLeft(true) { (shouldPrint, x) =>
> if (shouldPrint) println(x)
> checkIfWeStillWantToBeVerbose()
> }
>
>>> 2010/5/19 Daniel Sobral:
>>>> You can apply it to a view, then it doesn't take more memory.
>>>>
>>>> 2010/5/19 谢非
>>>>> Oh, that will take double memory and one more iteration, feels bad
>>>>>
>>>>>> you can do aSeq.zipWithIndex.foreach or
>>>>>> aSeq.zipWithIndex.foldLeft etc.
>>>>>> then you iterate over instances of Tuple2 carrying your original
>>>>>> collection's item along with an index Int.
>>>>>
>>>>>
>>>>> ________________________________
>>>>> 使用Messenger保护盾2.0，支持多账号登录！现在就下载！
>>>>
>>>> --
>>>> Daniel C. Sobral
>>>>
>>>> I travel to the future all the time.
>>>>
>>
>>
>
>
>

Sat, 2010-05-22, 06:57

#13

MonkeeSage

Joined: 2010-05-07,

Re: foreachWithIndex?

On 05/19/2010 12:23 PM, MonkeeSage wrote:
> On 05/19/2010 11:33 AM, Garrett Rowe wrote:
>> Or you can cook up something yourself.
>>
>> class IndexedArrayOps[A](array: Array[A]) {
>> def foreachWithIndex(f: (A, Int) => Unit): Unit = {
>> var currIdx = 0
>> while (currIdx< array.length) {
>> f(array(currIdx), currIdx)
>> currIdx += 1
>> }
>> }
>> }
>
> someSeq.foldLeft(0) { (idx, x) =>
> printf("item %d == %s\n", idx, x)
> idx+1
> }

Here's...

// foldLeftWithIndex
someSeq.foldLeft((0, 1)) { (idx_acc, x) =>
printf("item %d == %s, with accum %d\n", idx_acc._1, x, idx_acc._2)
(idx_acc._1 + 1, idx_acc._2 * 2)
}

A bit terse, but still pretty readable, IMO.

>
>> 2010/5/19 Daniel Sobral:
>>> You can apply it to a view, then it doesn't take more memory.
>>>
>>> 2010/5/19 谢非
>>>> Oh, that will take double memory and one more iteration, feels bad
>>>>
>>>>> you can do aSeq.zipWithIndex.foreach or aSeq.zipWithIndex.foldLeft
>>>>> etc.
>>>>> then you iterate over instances of Tuple2 carrying your original
>>>>> collection's item along with an index Int.
>>>>
>>>>
>>>> ________________________________
>>>> 使用Messenger保护盾2.0，支持多账号登录！现在就下载！
>>>
>>> --
>>> Daniel C. Sobral
>>>
>>> I travel to the future all the time.
>>>
>
>

Sun, 2010-11-21, 22:37

#14

H-star Development

Joined: 2010-04-14,

foreachwithindex?

let's say i have a traversable and i want to iterate over all elements
but i need to know the index of the current element as well.
is there a faster way than traversable.zipWithIndex.foreach? i mean, i
don't really want to create a huge collection filled with new tuples
just because of a single int that could be a simple counter var hidden
in some method called foreachWithIndex which is given to the function i
give to forEachWithIndex.

of course i could do
var i = 0
trav.foreach(e => logic(i, e); i+=1;)
but i don't like these "outside the actual block"-variables.

what's a good and clean way to achieve this?

Sun, 2010-11-21, 22:47

#15

Sciss

Joined: 2008-12-17,

Re: foreachwithindex?

if you use an iterator first, zipWithIndex won't create "a huge collection". so if you do

aSeq.iterator.zipWithIndex.foreach

you don't waste space. this is what is done for you:

def zipWithIndex = new Iterator[(A, Int)] {
var idx = 0
def hasNext = self.hasNext
def next = {
val ret = (self.next, idx)
idx += 1
ret
}
}

not sure what happens if you do aSeq.view.zipWithIndex ... maybe someone competent can tell whether this is space-efficient, too...

best, -sciss-

Am 21.11.2010 um 21:33 schrieb HamsterofDeath:

> let's say i have a traversable and i want to iterate over all elements
> but i need to know the index of the current element as well.
> is there a faster way than traversable.zipWithIndex.foreach? i mean, i
> don't really want to create a huge collection filled with new tuples
> just because of a single int that could be a simple counter var hidden
> in some method called foreachWithIndex which is given to the function i
> give to forEachWithIndex.
>
> of course i could do
> var i = 0
> trav.foreach(e => logic(i, e); i+=1;)
> but i don't like these "outside the actual block"-variables.
>
> what's a good and clean way to achieve this?

Sun, 2010-11-21, 23:37

#16

Luc Duponcheel

Joined: 2008-12-19,

Re: foreachwithindex?

just a minor remark about

> let's say i have a traversable ...
and
> aSeq.iterator.zipWithIndex.foreach ...

on the one hand it is not sufficient to work with a traversable
and
on the other hand, it is not necessary to work with a seq

iterables are. I think, the right abstraction

scala> Iterable('a', 'b', 'c').iterator.zipWithIndex.foreach { case (c, i) => println(c + " at index " + i) }
a at index 0
b at index 1
c at index 2

On Sun, Nov 21, 2010 at 10:44 PM, Sciss <contact@sciss.de> wrote:

if you use an iterator first, zipWithIndex won't create "a huge collection". so if you do

aSeq.iterator.zipWithIndex.foreach

you don't waste space. this is what is done for you:

def zipWithIndex = new Iterator[(A, Int)] {
var idx = 0
def hasNext = self.hasNext
def next = {
val ret = (self.next, idx)
idx += 1
ret
}
}

not sure what happens if you do aSeq.view.zipWithIndex ... maybe someone competent can tell whether this is space-efficient, too...

best, -sciss-

Am 21.11.2010 um 21:33 schrieb HamsterofDeath:

> let's say i have a traversable and i want to iterate over all elements
> but i need to know the index of the current element as well.
> is there a faster way than traversable.zipWithIndex.foreach? i mean, i
> don't really want to create a huge collection filled with new tuples
> just because of a single int that could be a simple counter var hidden
> in some method called foreachWithIndex which is given to the function i
> give to forEachWithIndex.
>
> of course i could do
> var i = 0
> trav.foreach(e => logic(i, e); i+=1;)
> but i don't like these "outside the actual block"-variables.
>
> what's a good and clean way to achieve this?

--
__~O
-\ <,
(*)/ (*)

reality goes far beyond imagination

Mon, 2010-11-22, 04:17

#17

ichoran

Joined: 2009-08-14,

Re: foreachwithindex?

There used to be iterator.counted, which incurred almost no performance penalty. As it is, zipWithIndex is almost unusable in any performance-intensive task since it requires at least one extra object creation per item* (but then again, do you ever really want to do performance-intensive stuff on anything but primitive arrays?). I keep meaning to add a "zippedWithIndex" wrapper (like "zipped") to my personal library to get around this problem, but I haven't yet so I can't share it. I think this is probably the best way to go.

--Rex

* Maybe JVMs will eventually get to the point where the object is (routinely, not just in special best-case-scenarios) not created at all unless absolutely required. But this is a pretty hefty requirement for the JVM if you then use tuples in expected ways (e.g. pass them as tuples, match on them, etc.).

On Sun, Nov 21, 2010 at 4:33 PM, HamsterofDeath <h-star@gmx.de> wrote:

let's say i have a traversable and i want to iterate over all elements
but i need to know the index of the current element as well.
is there a faster way than traversable.zipWithIndex.foreach? i mean, i
don't really want to create a huge collection filled with new tuples
just because of a single int that could be a simple counter var hidden
in some method called foreachWithIndex which is given to the function i
give to forEachWithIndex.

of course i could do
var i = 0
trav.foreach(e => logic(i, e); i+=1;)
but i don't like these "outside the actual block"-variables.

what's a good and clean way to achieve this?

Mon, 2010-11-22, 08:37

#18

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

better, but that still creates a large number of tuples.

Am 21.11.2010 23:24, schrieb Luc Duponcheel:

qSckRhfdv [at] mail [dot] gmail [dot] com" type="cite">just a minor remark about

> let's say i have a traversable ...
and
> aSeq.iterator.zipWithIndex.foreach ...

on the one hand it is not sufficient to work with a traversable
and
on the other hand, it is not necessary to work with a seq

iterables are. I think, the right abstraction

scala> Iterable('a', 'b', 'c').iterator.zipWithIndex.foreach { case (c, i) => println(c + " at index " + i) }
a at index 0
b at index 1
c at index 2

On Sun, Nov 21, 2010 at 10:44 PM, Sciss <contact [at] sciss [dot] de" rel="nofollow">contact@sciss.de> wrote:

if you use an iterator first, zipWithIndex won't create "a huge collection". so if you do

aSeq.iterator.zipWithIndex.foreach

you don't waste space. this is what is done for you:

def zipWithIndex = new Iterator[(A, Int)] {
var idx = 0
def hasNext = self.hasNext
def next = {
val ret = (self.next, idx)
idx += 1
ret
}
}

not sure what happens if you do aSeq.view.zipWithIndex ... maybe someone competent can tell whether this is space-efficient, too...

best, -sciss-

Am 21.11.2010 um 21:33 schrieb HamsterofDeath:

> let's say i have a traversable and i want to iterate over all elements
> but i need to know the index of the current element as well.
> is there a faster way than traversable.zipWithIndex.foreach? i mean, i
> don't really want to create a huge collection filled with new tuples
> just because of a single int that could be a simple counter var hidden
> in some method called foreachWithIndex which is given to the function i
> give to forEachWithIndex.
>
> of course i could do
> var i = 0
> trav.foreach(e => logic(i, e); i+=1;)
> but i don't like these "outside the actual block"-variables.
>
> what's a good and clean way to achieve this?

--
__~O
-\ <,
(*)/ (*)

reality goes far beyond imagination

Mon, 2010-11-22, 08:47

#19

Sciss

Joined: 2008-12-17,

Re: foreachwithindex?

you could still roll your own version that doesn't create tuples....

class RichIterable[ A ]( i: Iterable[ A ]) {
def foreachWithIndex( fun: (A, Int) => Unit ) {
var idx = 0
val iter = i.iterator
while( iter.hasNext ) {
fun( iter.next, idx )
idx += 1
}
}
}

implicit def enrichIterable[ A ]( i: Iterable[ A ]) = new RichIterable( i )

Vector.fill( 20 )( math.random ).foreachWithIndex( (num, idx) => println( "at " + idx + " : " + num ))

Am 22.11.2010 um 07:24 schrieb HamsterofDeath:

> better, but that still creates a large number of tuples.
>
> Am 21.11.2010 23:24, schrieb Luc Duponcheel:
>> just a minor remark about
>>
>> > let's say i have a traversable ...
>> and
>> > aSeq.iterator.zipWithIndex.foreach ...
>>
>> on the one hand it is not sufficient to work with a traversable
>> and
>> on the other hand, it is not necessary to work with a seq
>>
>> iterables are. I think, the right abstraction
>>
>> scala> Iterable('a', 'b', 'c').iterator.zipWithIndex.foreach { case (c, i) => println(c + " at index " + i) }
>> a at index 0
>> b at index 1
>> c at index 2
>>
>>
>> On Sun, Nov 21, 2010 at 10:44 PM, Sciss wrote:
>> if you use an iterator first, zipWithIndex won't create "a huge collection". so if you do
>>
>> aSeq.iterator.zipWithIndex.foreach
>>
>> you don't waste space. this is what is done for you:
>>
>> def zipWithIndex = new Iterator[(A, Int)] {
>> var idx = 0
>> def hasNext = self.hasNext
>> def next = {
>> val ret = (self.next, idx)
>> idx += 1
>> ret
>> }
>> }
>>
>> not sure what happens if you do aSeq.view.zipWithIndex ... maybe someone competent can tell whether this is space-efficient, too...
>>
>> best, -sciss-
>>
>>
>> Am 21.11.2010 um 21:33 schrieb HamsterofDeath:
>>
>> > let's say i have a traversable and i want to iterate over all elements
>> > but i need to know the index of the current element as well.
>> > is there a faster way than traversable.zipWithIndex.foreach? i mean, i
>> > don't really want to create a huge collection filled with new tuples
>> > just because of a single int that could be a simple counter var hidden
>> > in some method called foreachWithIndex which is given to the function i
>> > give to forEachWithIndex.
>> >
>> > of course i could do
>> > var i = 0
>> > trav.foreach(e => logic(i, e); i+=1;)
>> > but i don't like these "outside the actual block"-variables.
>> >
>> > what's a good and clean way to achieve this?
>>
>>
>>
>>

Mon, 2010-11-22, 10:07

#20

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

this is probably what i will do.

-------- Original-Nachricht --------
> Datum: Mon, 22 Nov 2010 07:31:21 +0000
> Von: Sciss
> An: HamsterofDeath
> CC: Luc Duponcheel , "scala-user@listes.epfl.ch"
> Betreff: Re: [scala-user] foreachwithindex?

> you could still roll your own version that doesn't create tuples....
>
> class RichIterable[ A ]( i: Iterable[ A ]) {
> def foreachWithIndex( fun: (A, Int) => Unit ) {
> var idx = 0
> val iter = i.iterator
> while( iter.hasNext ) {
> fun( iter.next, idx )
> idx += 1
> }
> }
> }
>
> implicit def enrichIterable[ A ]( i: Iterable[ A ]) = new RichIterable( i
> )
>
> Vector.fill( 20 )( math.random ).foreachWithIndex( (num, idx) => println(
> "at " + idx + " : " + num ))
>
>
> Am 22.11.2010 um 07:24 schrieb HamsterofDeath:
>
> > better, but that still creates a large number of tuples.
> >
> > Am 21.11.2010 23:24, schrieb Luc Duponcheel:
> >> just a minor remark about
> >>
> >> > let's say i have a traversable ...
> >> and
> >> > aSeq.iterator.zipWithIndex.foreach ...
> >>
> >> on the one hand it is not sufficient to work with a traversable
> >> and
> >> on the other hand, it is not necessary to work with a seq
> >>
> >> iterables are. I think, the right abstraction
> >>
> >> scala> Iterable('a', 'b', 'c').iterator.zipWithIndex.foreach { case (c,
> i) => println(c + " at index " + i) }
> >> a at index 0
> >> b at index 1
> >> c at index 2
> >>
> >>
> >> On Sun, Nov 21, 2010 at 10:44 PM, Sciss wrote:
> >> if you use an iterator first, zipWithIndex won't create "a huge
> collection". so if you do
> >>
> >> aSeq.iterator.zipWithIndex.foreach
> >>
> >> you don't waste space. this is what is done for you:
> >>
> >> def zipWithIndex = new Iterator[(A, Int)] {
> >> var idx = 0
> >> def hasNext = self.hasNext
> >> def next = {
> >> val ret = (self.next, idx)
> >> idx += 1
> >> ret
> >> }
> >> }
> >>
> >> not sure what happens if you do aSeq.view.zipWithIndex ... maybe
> someone competent can tell whether this is space-efficient, too...
> >>
> >> best, -sciss-
> >>
> >>
> >> Am 21.11.2010 um 21:33 schrieb HamsterofDeath:
> >>
> >> > let's say i have a traversable and i want to iterate over all
> elements
> >> > but i need to know the index of the current element as well.
> >> > is there a faster way than traversable.zipWithIndex.foreach? i mean,
> i
> >> > don't really want to create a huge collection filled with new tuples
> >> > just because of a single int that could be a simple counter var
> hidden
> >> > in some method called foreachWithIndex which is given to the function
> i
> >> > give to forEachWithIndex.
> >> >
> >> > of course i could do
> >> > var i = 0
> >> > trav.foreach(e => logic(i, e); i+=1;)
> >> > but i don't like these "outside the actual block"-variables.
> >> >
> >> > what's a good and clean way to achieve this?
> >>
> >>
> >>
> >>
> >> --
> >> __~O
> >> -\ <,
> >> (*)/ (*)
> >>
> >> reality goes far beyond imagination
> >>
> >
>

Mon, 2010-11-22, 22:07

#21

Rüdiger Klaehn

Joined: 2009-06-02,

Re: foreachwithindex?

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Mon, 2010-11-22, 23:57

#22

ichoran

Joined: 2009-08-14,

Re: foreachwithindex?

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Tue, 2010-11-23, 02:47

#23

Aaron Novstrup

Joined: 2010-08-17,

Re: foreachwithindex?

I'm probably missing something, but couldn't zippedWithIndex be written as:

def zippedWithIndex[El, Repr](c: TraversableLike[El, Repr]) =
(c, LazyIterable.from(0)).zipped

where LazyIterable is defined as

class LazyIterable[T](iter: => Iterator[T]) extends Iterable[T] {
def iterator = iter
}
object LazyIterable {
def apply[T](iter: => Iterator[T]) = new LazyIterable(iter)
def from(i: Int) = LazyIterable(Iterator.from(i))
}

Rex's example becomes

zippedWithIndex(a).filter(_.startsWith("1") && _ % 3 == 0)._1

A couple things stand out to me here.

The first is that the standard library lacks a non-strict collection
that does not consume memory. Stream is non-strict but it stores its
elements after they are evaluated. Collection views do not consume
_extra_ memory, but their elements are stored in a backing collection.
My LazyIterable may be sufficient but can also be dangerous (e.g.
`LazyIterable.from(0).map(_.toString).take(5)` blows up).

Second, Tuple2.Zipped isn't without its kinks. In particular, note the
`_1` at the end of the expression -- operations on a Zipped cannot be
chained easily because they return a Tuple2 rather than another Zipped.
Perhaps this is only surprising because of the jarring inconsistency
when compared to Scala collections, but it certainly threw me off.

On 11/22/2010 02:52 PM, Rex Kerr wrote:
> I can't speak for anyone else's use cases, but in my case, profiling,
> microbenchmarking, and benchmarking production code written both ways have
> shown that this is a real problem for almost every non-IO/UI task that I
> do. For example, in
>
> val a = Array.range(1,1000000).map(_.toString)
> val b = a.zipWithIndex.filter(x => x._1.startsWith("1") &&
> (x._2%3==0)).map(_._1)
> val c = { var i=-1; a.filter(_.startsWith("1") &) }
>
> the computation of c is 10-20x faster _and_ is considerably more compact to
> write (though, sadly, more error prone--look at the trick I had to use where
> I started at -1). (Same deal with List instead of Array.)
>
> Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn.
> zipped is almost always faster _and_ more convenient, and zippedWithIndex
> would be too.
>
> --Rex

Sat, 2010-11-27, 09:57

#24

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

i made some benchmarks. i let this version:
def fastForeachIndexed4D[U](f: (Int, Int, Int, Int, T) => U) {
var x = array.length - 1;
while (x >= 0) {
var y = array(x).length - 1
while (y >= 0) {
var z = array(x)(y).length - 1
while (z >= 0) {
var a = array(x)(y)(z).length - 1
while (a >= 0) {
f(x, y, z, a, array(x)(y)(z)(a))
a -= 1
}
z -= 1
}
y -= 1
}
x -= 1
}
}

fight against this:

def foreachIndexed4D[U](f: (Int, Int, Int, Int, T) => U) {
array.foreachIndexed((index, nested3darray) => {
nested3darray.foreachIndexed3D((index2, index3, index4, element)
=> f(index, index2, index3, index4, element))
})
def foreachIndexed3D[U](f: (Int, Int, Int, T) => U) {
array.foreachIndexed((index, nested2darray) => {
nested2darray.foreachIndexed2D((index2, index3, element) =>
f(index, index2, index3, element))
})
}
def foreachIndexed2D[U](f: (Int, Int, T) => U) {
array.foreachIndexed((index: Int, nested1darray: Array[T]) => {
nested1darray.foreachIndexed((index2, element) => f(index, index2,
element))
})
}
def foreachIndexed[U](f: (Int, T) => U) {
var counter = 0
val until = array.length
while (counter < until) {
f(counter, array(counter))
counter += 1
}
}

and made another implemention without the index parameters:

def fastForeach4D[U](f: T => U) {
var x = array.length - 1;
while (x >= 0) {
var y = array(x).length - 1
while (y >= 0) {
var z = array(x)(y).length - 1
while (z >= 0) {
var a = array(x)(y)(z).length - 1
while (a >= 0) {
f(array(x)(y)(z)(a))
a -= 1
}
z -= 1
}
y -= 1
}
x -= 1
}
}

and

def foreach4D[U](f: T => U) {
array4d.foreach(_.foreach3D(f))
}
def foreach3D[U](f: T => U) {
array3d.foreach(_.foreach2D(f))
}
def foreach2D[U](f: T => U) {
array2d.foreach(_.foreach(f))
}

on a java 7 vm, server and optimize, after enough executions, the
while-looped methods where the fastest, but by a really small margin.
the difference was at most 380 to 410 million operations per second, in
some runs both versions were equal. when ignoring random slowdowns,
probably caused by minor garbage collections or background stuff, i'd
say the vm has gotten really good at optimizing and one shouldn't worry
about problems like simple local objects or nested method calls. they
are optimized away.
the bigger problem are escaping objects which can not be optimized like
this.

Am 23.11.2010 02:38, schrieb Aaron Novstrup:
> I'm probably missing something, but couldn't zippedWithIndex be written as:
>
> def zippedWithIndex[El, Repr](c: TraversableLike[El, Repr]) =
> (c, LazyIterable.from(0)).zipped
>
> where LazyIterable is defined as
>
> class LazyIterable[T](iter: => Iterator[T]) extends Iterable[T] {
> def iterator = iter
> }
> object LazyIterable {
> def apply[T](iter: => Iterator[T]) = new LazyIterable(iter)
> def from(i: Int) = LazyIterable(Iterator.from(i))
> }
>
> Rex's example becomes
>
> zippedWithIndex(a).filter(_.startsWith("1") && _ % 3 == 0)._1
>
> A couple things stand out to me here.
>
> The first is that the standard library lacks a non-strict collection
> that does not consume memory. Stream is non-strict but it stores its
> elements after they are evaluated. Collection views do not consume
> _extra_ memory, but their elements are stored in a backing collection.
> My LazyIterable may be sufficient but can also be dangerous (e.g.
> `LazyIterable.from(0).map(_.toString).take(5)` blows up).
>
> Second, Tuple2.Zipped isn't without its kinks. In particular, note the
> `_1` at the end of the expression -- operations on a Zipped cannot be
> chained easily because they return a Tuple2 rather than another Zipped.
> Perhaps this is only surprising because of the jarring inconsistency
> when compared to Scala collections, but it certainly threw me off.
>
> On 11/22/2010 02:52 PM, Rex Kerr wrote:
>> I can't speak for anyone else's use cases, but in my case, profiling,
>> microbenchmarking, and benchmarking production code written both ways have
>> shown that this is a real problem for almost every non-IO/UI task that I
>> do. For example, in
>>
>> val a = Array.range(1,1000000).map(_.toString)
>> val b = a.zipWithIndex.filter(x => x._1.startsWith("1") &&
>> (x._2%3==0)).map(_._1)
>> val c = { var i=-1; a.filter(_.startsWith("1") &) }
>>
>> the computation of c is 10-20x faster _and_ is considerably more compact to
>> write (though, sadly, more error prone--look at the trick I had to use where
>> I started at -1). (Same deal with List instead of Array.)
>>
>> Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn.
>> zipped is almost always faster _and_ more convenient, and zippedWithIndex
>> would be too.
>>
>> --Rex

Sat, 2010-11-27, 13:07

#25

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

i found a small bug in my benchmark which made it ... not quite right.
the correct results are a bit disturbing.
this code:

def fastForeach4D[U](f: (Int, Int, Int, Int, T) => U) {
var x = array.length - 1;
while (x >= 0) {
var y = array(x).length - 1
while (y >= 0) {
var z = array(x)(y).length - 1
while (z >= 0) {
var a = array(x)(y)(z).length - 1
while (a >= 0) {
f(array(x)(y)(z)(a))
a -= 1
}
z -= 1
}
y -= 1
}
x -= 1
}
}

is actually SLOWER than the implicit wrapping object oriented approach.
optimizing the array access by hand into

def fastForeach4D[U](f: T => U) {
var x = array.length - 1;
while (x >= 0) {
val ax = array(x)
var y = ax.length - 1
while (y >= 0) {
val ay = ax(y)
var z = ay.length - 1
while (z >= 0) {
val az = ay(z)
var a = az.length - 1
while (a >= 0) {
f(az(a))
a -= 1
}
z -= 1
}
y -= 1
}
x -= 1
}
}

fixed that, but i tested such things a while ago - the server vm should
be capable of optimizing thing like these.
new results, summarized:
while loop in case of foreach is about 5% faster than the
oo/functional-approach
while loop in case of indexed foreach with a 4d array is about twice as
fast as the oo/functional approach, but i'm sure i could tweak that one
to match the while one.

Am 27.11.2010 09:55, schrieb HamsterofDeath:
> i made some benchmarks. i let this version:
> def fastForeachIndexed4D[U](f: (Int, Int, Int, Int, T) => U) {
> var x = array.length - 1;
> while (x >= 0) {
> var y = array(x).length - 1
> while (y >= 0) {
> var z = array(x)(y).length - 1
> while (z >= 0) {
> var a = array(x)(y)(z).length - 1
> while (a >= 0) {
> f(x, y, z, a, array(x)(y)(z)(a))
> a -= 1
> }
> z -= 1
> }
> y -= 1
> }
> x -= 1
> }
> }
>
> fight against this:
>
> def foreachIndexed4D[U](f: (Int, Int, Int, Int, T) => U) {
> array.foreachIndexed((index, nested3darray) => {
> nested3darray.foreachIndexed3D((index2, index3, index4, element)
> => f(index, index2, index3, index4, element))
> })
> def foreachIndexed3D[U](f: (Int, Int, Int, T) => U) {
> array.foreachIndexed((index, nested2darray) => {
> nested2darray.foreachIndexed2D((index2, index3, element) =>
> f(index, index2, index3, element))
> })
> }
> def foreachIndexed2D[U](f: (Int, Int, T) => U) {
> array.foreachIndexed((index: Int, nested1darray: Array[T]) => {
> nested1darray.foreachIndexed((index2, element) => f(index, index2,
> element))
> })
> }
> def foreachIndexed[U](f: (Int, T) => U) {
> var counter = 0
> val until = array.length
> while (counter < until) {
> f(counter, array(counter))
> counter += 1
> }
> }
>
> and made another implemention without the index parameters:
>
> def fastForeach4D[U](f: T => U) {
> var x = array.length - 1;
> while (x >= 0) {
> var y = array(x).length - 1
> while (y >= 0) {
> var z = array(x)(y).length - 1
> while (z >= 0) {
> var a = array(x)(y)(z).length - 1
> while (a >= 0) {
> f(array(x)(y)(z)(a))
> a -= 1
> }
> z -= 1
> }
> y -= 1
> }
> x -= 1
> }
> }
>
> and
>
> def foreach4D[U](f: T => U) {
> array4d.foreach(_.foreach3D(f))
> }
> def foreach3D[U](f: T => U) {
> array3d.foreach(_.foreach2D(f))
> }
> def foreach2D[U](f: T => U) {
> array2d.foreach(_.foreach(f))
> }
>
> on a java 7 vm, server and optimize, after enough executions, the
> while-looped methods where the fastest, but by a really small margin.
> the difference was at most 380 to 410 million operations per second, in
> some runs both versions were equal. when ignoring random slowdowns,
> probably caused by minor garbage collections or background stuff, i'd
> say the vm has gotten really good at optimizing and one shouldn't worry
> about problems like simple local objects or nested method calls. they
> are optimized away.
> the bigger problem are escaping objects which can not be optimized like
> this.
>
> Am 23.11.2010 02:38, schrieb Aaron Novstrup:
>> I'm probably missing something, but couldn't zippedWithIndex be written as:
>>
>> def zippedWithIndex[El, Repr](c: TraversableLike[El, Repr]) =
>> (c, LazyIterable.from(0)).zipped
>>
>> where LazyIterable is defined as
>>
>> class LazyIterable[T](iter: => Iterator[T]) extends Iterable[T] {
>> def iterator = iter
>> }
>> object LazyIterable {
>> def apply[T](iter: => Iterator[T]) = new LazyIterable(iter)
>> def from(i: Int) = LazyIterable(Iterator.from(i))
>> }
>>
>> Rex's example becomes
>>
>> zippedWithIndex(a).filter(_.startsWith("1") && _ % 3 == 0)._1
>>
>> A couple things stand out to me here.
>>
>> The first is that the standard library lacks a non-strict collection
>> that does not consume memory. Stream is non-strict but it stores its
>> elements after they are evaluated. Collection views do not consume
>> _extra_ memory, but their elements are stored in a backing collection.
>> My LazyIterable may be sufficient but can also be dangerous (e.g.
>> `LazyIterable.from(0).map(_.toString).take(5)` blows up).
>>
>> Second, Tuple2.Zipped isn't without its kinks. In particular, note the
>> `_1` at the end of the expression -- operations on a Zipped cannot be
>> chained easily because they return a Tuple2 rather than another Zipped.
>> Perhaps this is only surprising because of the jarring inconsistency
>> when compared to Scala collections, but it certainly threw me off.
>>
>> On 11/22/2010 02:52 PM, Rex Kerr wrote:
>>> I can't speak for anyone else's use cases, but in my case, profiling,
>>> microbenchmarking, and benchmarking production code written both ways have
>>> shown that this is a real problem for almost every non-IO/UI task that I
>>> do. For example, in
>>>
>>> val a = Array.range(1,1000000).map(_.toString)
>>> val b = a.zipWithIndex.filter(x => x._1.startsWith("1") &&
>>> (x._2%3==0)).map(_._1)
>>> val c = { var i=-1; a.filter(_.startsWith("1") &) }
>>>
>>> the computation of c is 10-20x faster _and_ is considerably more compact to
>>> write (though, sadly, more error prone--look at the trick I had to use where
>>> I started at -1). (Same deal with List instead of Array.)
>>>
>>> Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn.
>>> zipped is almost always faster _and_ more convenient, and zippedWithIndex
>>> would be too.
>>>
>>> --Rex
>

Sat, 2010-11-27, 17:57

#26

Joshua.Suereth

Joined: 2008-09-02,

Re: foreachwithindex?

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sat, 2010-11-27, 20:07

#27

ichoran

Joined: 2009-08-14,

Re: foreachwithindex?

With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sat, 2010-11-27, 20:27

#28

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:

AANLkTinxjopSnfdJ7ZL3sKfxCMUMtOrbGRDhrw4Gk9xJ [at] mail [dot] gmail [dot] com" type="cite">With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" rel="nofollow">joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran [at] gmail [dot] com" target="_blank" rel="nofollow">ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn [at] googlemail [dot] com" target="_blank" rel="nofollow">rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sat, 2010-11-27, 22:17

#29

Joshua.Suereth

Joined: 2008-09-02,

Re: foreachwithindex?

Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sat, 2010-11-27, 23:17

#30

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:

Tj3DfMUDYZq1e4eNN_MkpmJn [at] mail [dot] gmail [dot] com" type="cite">Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star [at] gmx [dot] de" rel="nofollow">h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" target="_blank" rel="nofollow">joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran [at] gmail [dot] com" target="_blank" rel="nofollow">ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn [at] googlemail [dot] com" target="_blank" rel="nofollow">rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sat, 2010-11-27, 23:27

#31

Luc Duponcheel

Joined: 2008-12-19,

Re: foreachwithindex?

hi,

... at the risk of saying something stupid ...

could it be that lazy evaluation is inherently space-consuming and
as a consequence (because of garbage collection) could also be
more time consuming ?

luc

On Sat, Nov 27, 2010 at 10:07 PM, Josh Suereth <joshua.suereth@gmail.com> wrote:

Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star@gmx.de> wrote:
random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

--
__~O
-\ <,
(*)/ (*)

reality goes far beyond imagination

Sun, 2010-11-28, 01:57

#32

Joshua.Suereth

Joined: 2008-09-02,

Re: foreachwithindex?

Yes, anytime you use the "take" methods, the view will eventually be faster. Here's an sperformance output: https://github.com/jsuereth/sperformance/wiki/Scala-Collection-Views
I'm investigating the problem a bit more, but it appears like the zipWithIndex implementation on views has a large overhead. I think this can be overcome, but for the short-term you should probably avoid views and zipWithIndex.

On Sat, Nov 27, 2010 at 5:08 PM, HamsterofDeath <h-star@gmx.de> wrote:

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:
Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sun, 2010-11-28, 08:17

#33

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

afaik only if you foreach the collection more than once AND if new objects are created on the way through

Am 27.11.2010 23:11, schrieb Luc Duponcheel:

z9x7GEd4Ktmd850SVqa6WE4 [at] mail [dot] gmail [dot] com" type="cite">hi,

... at the risk of saying something stupid ...

could it be that lazy evaluation is inherently space-consuming and
as a consequence (because of garbage collection) could also be
more time consuming ?

luc

On Sat, Nov 27, 2010 at 10:07 PM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" rel="nofollow">joshua.suereth@gmail.com> wrote:

Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" target="_blank" rel="nofollow">joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran [at] gmail [dot] com" target="_blank" rel="nofollow">ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn [at] googlemail [dot] com" target="_blank" rel="nofollow">rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

--
__~O
-\ <,
(*)/ (*)

reality goes far beyond imagination

Sun, 2010-11-28, 08:27

#34

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

makes sense since filter is applied to the whole collection in the eager case and then the first x elements are taken. but then again, there is "takeWhile" which could avoid that:
val tricky = {
    var taken = 0;
    (e:T) => {
      val take = if (taken < 100) myCondition(e) else false
      if (take) taken += 1
      take
   }
}
col.takeWhile(tricky)

Am 28.11.2010 01:49, schrieb Josh Suereth:

AANLkTim8k6nK4t-z7qdF3KHpkcgHf+0L5Ppaur--8Lmw [at] mail [dot] gmail [dot] com" type="cite">Yes, anytime you use the "take" methods, the view will eventually be faster. Here's an sperformance output: https://github.com/jsuereth/sperformance/wiki/Scala-Collection-Views
I'm investigating the problem a bit more, but it appears like the zipWithIndex implementation on views has a large overhead. I think this can be overcome, but for the short-term you should probably avoid views and zipWithIndex.

On Sat, Nov 27, 2010 at 5:08 PM, HamsterofDeath <h-star [at] gmx [dot] de" rel="nofollow">h-star@gmx.de> wrote:

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:
Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" target="_blank" rel="nofollow">joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran [at] gmail [dot] com" target="_blank" rel="nofollow">ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn [at] googlemail [dot] com" target="_blank" rel="nofollow">rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sun, 2010-11-28, 15:47

#35

Joshua.Suereth

Joined: 2008-09-02,

Re: foreachwithindex?

Unfortunately that algorithm has a logic flaw. I think you meant to call .filter instead of .takeWhile (otherwise your implementation is equivalent to col.takeWhile(myCondition).take(100) rather than col.filter(myCondition).take(100).
I've run your example (using filter) through sperformance with 100,000 warm-up runs and best-of 10 measurements. Although your method outperforms standard collection usage, views still outperform this 'anti-functional-pattern' as labeled in the graph below:
<a href="http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png"> Here's the image: <img src="http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png" /></a>

So, like I said: views are great when using any take method. They help keep the code readable and have good performance. I think the zip/zipWithIndex piece needs to be optimise, but other than that, views help maintain readability and laziness over using state-y tricks on closures.
- Josh

On Sun, Nov 28, 2010 at 2:21 AM, HamsterofDeath <h-star@gmx.de> wrote:

makes sense since filter is applied to the whole collection in the eager case and then the first x elements are taken. but then again, there is "takeWhile" which could avoid that:
val tricky = {
    var taken = 0;
    (e:T) => {
      val take = if (taken < 100) myCondition(e) else false
      if (take) taken += 1
      take
   }
}
col.takeWhile(tricky)

Am 28.11.2010 01:49, schrieb Josh Suereth:
Yes, anytime you use the "take" methods, the view will eventually be faster. Here's an sperformance output: https://github.com/jsuereth/sperformance/wiki/Scala-Collection-Views
I'm investigating the problem a bit more, but it appears like the zipWithIndex implementation on views has a large overhead. I think this can be overcome, but for the short-term you should probably avoid views and zipWithIndex.

On Sat, Nov 27, 2010 at 5:08 PM, HamsterofDeath <h-star@gmx.de> wrote:

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:
Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sun, 2010-11-28, 15:57

#36

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

yes, you are right, my method has a bug.
what is was supposed to do is to foreach over a collection, and stop iterating once it collected 100 valid elements - the most important part being "stop iterating", which it doesn't when applying it to filter and does too soon when using takeWhile.

Am 28.11.2010 15:36, schrieb Josh Suereth:

AANLkTik0j6OUaWgV3XtGHnkk1HH+jQzKZEwo_HERTj8D [at] mail [dot] gmail [dot] com" type="cite">Unfortunately that algorithm has a logic flaw. I think you meant to call .filter instead of .takeWhile (otherwise your implementation is equivalent to col.takeWhile(myCondition).take(100) rather than col.filter(myCondition).take(100).
I've run your example (using filter) through sperformance with 100,000 warm-up runs and best-of 10 measurements. Although your method outperforms standard collection usage, views still outperform this 'anti-functional-pattern' as labeled in the graph below:
<a href="http://www.scala-lang.org/http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png"> Here's the image: <img src="http://www.scala-lang.org/http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png" /></a>

So, like I said: views are great when using any take method. They help keep the code readable and have good performance. I think the zip/zipWithIndex piece needs to be optimise, but other than that, views help maintain readability and laziness over using state-y tricks on closures.
- Josh

On Sun, Nov 28, 2010 at 2:21 AM, HamsterofDeath <h-star [at] gmx [dot] de" rel="nofollow">h-star@gmx.de> wrote:

makes sense since filter is applied to the whole collection in the eager case and then the first x elements are taken. but then again, there is "takeWhile" which could avoid that:
val tricky = {
    var taken = 0;
    (e:T) => {
      val take = if (taken < 100) myCondition(e) else false
      if (take) taken += 1
      take
   }
}
col.takeWhile(tricky)

Am 28.11.2010 01:49, schrieb Josh Suereth:
Yes, anytime you use the "take" methods, the view will eventually be faster. Here's an sperformance output: https://github.com/jsuereth/sperformance/wiki/Scala-Collection-Views
I'm investigating the problem a bit more, but it appears like the zipWithIndex implementation on views has a large overhead. I think this can be overcome, but for the short-term you should probably avoid views and zipWithIndex.

On Sat, Nov 27, 2010 at 5:08 PM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:
Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" target="_blank" rel="nofollow">joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran [at] gmail [dot] com" target="_blank" rel="nofollow">ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn [at] googlemail [dot] com" target="_blank" rel="nofollow">rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sun, 2010-11-28, 20:27

#37

Joshua.Suereth

Joined: 2008-09-02,

Re: foreachwithindex?

Yes, this is exactly the situation that views were designed for. Did you notice that the view version out-performs your by-hand version *before* it hits the early-out on iteration cap (colleciton size between 40 and 200) as well? This is most likely due to inefficiencies in the filter method, but please don't just throw out the functional style of collections when worried about performance. That's my basic point. Views should get us close to the performance of hand-rolled optimizations to the point of making hand-rolled optimizations unnecessary for a majority of code.
On Sun, Nov 28, 2010 at 9:47 AM, HamsterofDeath <h-star@gmx.de> wrote:

yes, you are right, my method has a bug.
what is was supposed to do is to foreach over a collection, and stop iterating once it collected 100 valid elements - the most important part being "stop iterating", which it doesn't when applying it to filter and does too soon when using takeWhile.

Am 28.11.2010 15:36, schrieb Josh Suereth:
Unfortunately that algorithm has a logic flaw. I think you meant to call .filter instead of .takeWhile (otherwise your implementation is equivalent to col.takeWhile(myCondition).take(100) rather than col.filter(myCondition).take(100).
I've run your example (using filter) through sperformance with 100,000 warm-up runs and best-of 10 measurements. Although your method outperforms standard collection usage, views still outperform this 'anti-functional-pattern' as labeled in the graph below:
<a href="http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png"> Here's the image: <img src="http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png" /></a>

So, like I said: views are great when using any take method. They help keep the code readable and have good performance. I think the zip/zipWithIndex piece needs to be optimise, but other than that, views help maintain readability and laziness over using state-y tricks on closures.
- Josh

On Sun, Nov 28, 2010 at 2:21 AM, HamsterofDeath <h-star@gmx.de> wrote:

makes sense since filter is applied to the whole collection in the eager case and then the first x elements are taken. but then again, there is "takeWhile" which could avoid that:
val tricky = {
    var taken = 0;
    (e:T) => {
      val take = if (taken < 100) myCondition(e) else false
      if (take) taken += 1
      take
   }
}
col.takeWhile(tricky)

Am 28.11.2010 01:49, schrieb Josh Suereth:
Yes, anytime you use the "take" methods, the view will eventually be faster. Here's an sperformance output: https://github.com/jsuereth/sperformance/wiki/Scala-Collection-Views
I'm investigating the problem a bit more, but it appears like the zipWithIndex implementation on views has a large overhead. I think this can be overcome, but for the short-term you should probably avoid views and zipWithIndex.

On Sat, Nov 27, 2010 at 5:08 PM, HamsterofDeath <h-star@gmx.de> wrote:

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:
Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Sun, 2010-11-28, 21:57

#38

H-star Development

Joined: 2010-04-14,

Re: foreachwithindex?

i know, i know. i never optimize serious code before profiling the application. but playing around like i did here always adds experience and deeper understanding, that's why i do it. now i know *why*, not only *that*

Am 28.11.2010 20:22, schrieb Josh Suereth:

AANLkTikZfK1vFe+3TzcwnAma_pC_mq07A1i8dBWmV3GM [at] mail [dot] gmail [dot] com" type="cite">Yes, this is exactly the situation that views were designed for. Did you notice that the view version out-performs your by-hand version *before* it hits the early-out on iteration cap (colleciton size between 40 and 200) as well? This is most likely due to inefficiencies in the filter method, but please don't just throw out the functional style of collections when worried about performance. That's my basic point. Views should get us close to the performance of hand-rolled optimizations to the point of making hand-rolled optimizations unnecessary for a majority of code.
On Sun, Nov 28, 2010 at 9:47 AM, HamsterofDeath <h-star [at] gmx [dot] de" rel="nofollow">h-star@gmx.de> wrote:

yes, you are right, my method has a bug.
what is was supposed to do is to foreach over a collection, and stop iterating once it collected 100 valid elements - the most important part being "stop iterating", which it doesn't when applying it to filter and does too soon when using takeWhile.

Am 28.11.2010 15:36, schrieb Josh Suereth:
Unfortunately that algorithm has a logic flaw. I think you meant to call .filter instead of .takeWhile (otherwise your implementation is equivalent to col.takeWhile(myCondition).take(100) rather than col.filter(myCondition).take(100).
I've run your example (using filter) through sperformance with 100,000 warm-up runs and best-of 10 measurements. Although your method outperforms standard collection usage, views still outperform this 'anti-functional-pattern' as labeled in the graph below:
<a href="http://www.scala-lang.org/http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png"> Here's the image: <img src="http://www.scala-lang.org/http://lh4.ggpht.com/_zY6AYwdFmgA/TPJnuyQSM8I/AAAAAAAAA2o/JJFbI0GEv0s/method%20-%3E%20takeWithFilter%20By%20Size.png" /></a>

So, like I said: views are great when using any take method. They help keep the code readable and have good performance. I think the zip/zipWithIndex piece needs to be optimise, but other than that, views help maintain readability and laziness over using state-y tricks on closures.
- Josh

On Sun, Nov 28, 2010 at 2:21 AM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

makes sense since filter is applied to the whole collection in the eager case and then the first x elements are taken. but then again, there is "takeWhile" which could avoid that:
val tricky = {
    var taken = 0;
    (e:T) => {
      val take = if (taken < 100) myCondition(e) else false
      if (take) taken += 1
      take
   }
}
col.takeWhile(tricky)

Am 28.11.2010 01:49, schrieb Josh Suereth:
Yes, anytime you use the "take" methods, the view will eventually be faster. Here's an sperformance output: https://github.com/jsuereth/sperformance/wiki/Scala-Collection-Views
I'm investigating the problem a bit more, but it appears like the zipWithIndex implementation on views has a large overhead. I think this can be overcome, but for the short-term you should probably avoid views and zipWithIndex.

On Sat, Nov 27, 2010 at 5:08 PM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

do you have an example of a view being significantly faster than a non-viewed collection?

Am 27.11.2010 22:07, schrieb Josh Suereth:
Essentially, that's what view is trying to do. When you call "view" all manipulations you are performing are deferred and saved until they must be executed. Calling .force returns a 'normal' collection and therefore executes the deferred work. With optimal inlining, it *should* perform well.
I ran a few of my benchmarks against it and found that the view case is an order of magnitude slower than the non-functional use case. Each of the algorithms grows linearly with collection size, but views have a higher slope.
It seems right here we may be suffering from the following: The zippedWithIndex call on an IndexedSeqView can be dangerous. Particularly because the index operator on Stream (used for the index part of the zip) is O(N) and the "IndexSeq" classes all assume that indexing is fast, and prefer it to other means of accessing members of the class. In fact, IndexedSeqView.Filtered creates a lazy index map to use for the filtered list so that it only has to lookup data as needed. This unfortunately means you iterate over the entire set of data as many times as you do in the non-view case, as well as having the overhead of indexing through this 'lookup array'. All-in-all, views seem sub-optimal for this use case which is very unfortunate!
I'm going to spend some time profiling the innards of the collections library and see if I can improve the performance of views. These are meant to be the correct solution to your problem, it's unfortunate they don't perform ideally right now.
On Sat, Nov 27, 2010 at 2:12 PM, HamsterofDeath <h-star [at] gmx [dot] de" target="_blank" rel="nofollow">h-star@gmx.de> wrote:

random thrown in question regarding views/map/filter-chaining:
is there even a reason to chain collection transforming method calls?
why not put all the logic together into one function and do it like this:
orgTraversable.collect(list of cases, if and some code blocks) // <- minimized overhead, no intermediate results, no accidently repeated evaluation

Am 27.11.2010 20:00, schrieb Rex Kerr:
With my test, it takes twice as long again to use the view with List (i.e. 20-40x longer than with the var instead of 10-20). With arrays, you can't .force because it ends up typed as a sequence after the map, and when you .toArray something goes horribly wrong and there's apparently O(n^2) performance. (I don't have time to look into it right now, but this should be fixed; there is no sensible reason for it to be so slow.)

--Rex

On Sat, Nov 27, 2010 at 11:55 AM, Josh Suereth <joshua [dot] suereth [at] gmail [dot] com" target="_blank" rel="nofollow">joshua.suereth@gmail.com> wrote:

just for posterity, try to call .view on the array prior to zipWithIndex and see what the performance difference is. Remember that methods on 2.8 collections are by default 'eager' and using the view method can make them lazy, hopefully amortizing/negating the cost of complex collection manipulations. So, for b, try:
val b = a.view.zipWithIndex.filter(x = > x._1.startsWith("1") && (x._2 % 3==0)).map(_._1).force
That should remove the intermediate object creation so that only the resulting collection is allocated. It's a wash at times whether this improves performance, but you should try it out.
- Josh
On Mon, Nov 22, 2010 at 5:52 PM, Rex Kerr <ichoran [at] gmail [dot] com" target="_blank" rel="nofollow">ichoran@gmail.com> wrote:

I can't speak for anyone else's use cases, but in my case, profiling, microbenchmarking, and benchmarking production code written both ways have shown that this is a real problem for almost every non-IO/UI task that I do. For example, in

val a = Array.range(1,1000000).map(_.toString)
val b = a.zipWithIndex.filter(x => x._1.startsWith("1") && (x._2%3==0)).map(_._1)
val c = { var i=-1; a.filter(_.startsWith("1") && { i += 1; i%3==0 }) }

the computation of c is 10-20x faster _and_ is considerably more compact to write (though, sadly, more error prone--look at the trick I had to use where I started at -1). (Same deal with List instead of Array.)

Personally, I think the emphasis on zip/zipWithIndex thing is a wrong turn. zipped is almost always faster _and_ more convenient, and zippedWithIndex would be too.

--Rex

On Mon, Nov 22, 2010 at 3:59 PM, Rüdiger Klaehn <rklaehn [at] googlemail [dot] com" target="_blank" rel="nofollow">rklaehn@googlemail.com> wrote:

Are you sure that this is a problem? Even without escape analysis, a
modern JVM can create and collect an incredible number of objects per
second as long as they are a) small and b) short-lived. The tuples in
this case are both small and short-lived.

Escape analysis and stack allocation (already present as an
experimental feature in the current JVM, and soon to be available in a
production JVM) will probably completely avoid allocating the object.

So I would not worry about this unless profiling has actually shown
that it is a real problem.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland