- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
lessons learned optimizing Range.foreach, and yes, another (promising) attempt
Fri, 2011-12-16, 15:53
For those who haven't been following, here's a summary of factors heavily influencing Range.foreach performance (as far as I can remember):
(0) in all cases, we use -optimise.
(1) accessing a lazy val (e.g., Range.length) is a performance killer (may be prevents JIT compilation?)
(2) invoking the closure arg can be done with or without casting, the cast neither hurts nor improves performance significantly:
(f.asInstanceOf[Int => Unit]).apply(i)
vs
f(i)
(3) it's desirable to perform that invocation only once, otherwise each occurrence can lead to inlining the loop body.
(4) keeping Range.foreach small helps, which means in practice pre-processing beforehand, not forgetting "corner cases" such as:
Int.MinValue to Int.MaxValue by (Int.MinValue / 2).abs
1 to 1 by Int.MaxValue
0 to Int.MinValue by Int.MinValue
(5) Daniel Sobral has made available a very complete benchmarking harness.
And now an attempt based on the above (binary compatible at that). The number of iterations is precomputed at Range construction time, using the same algorithm as in NumericRange.count() (thus I believe all corner cases are handled properly). The value -1 signals an exception should be thrown at Range.foreach() time. Range.foreach() itself just performs that check, and then moves on to a while loop. Performance seems competitive, more benchmarking is welcome. Having said this, the main goal of this attempt was gathering evidence as to whether the factors mentioned above fully explain Range.foreach() performance. Regarding "few iterations" (ie < 500), that hypothesis seems corroborated. In case you have evidence to the contrary, a summary of the design criteria at play would also be welcome.
Miguel
http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
class Range(val start: Int, val end: Int, val step: Int)
extends collection.AbstractSeq[Int]
with IndexedSeq[Int]
with collection.CustomParallelizable[Int, ParRange]
with Serializable
{
// has to be public, due to access in foreach (if private then foreach can't be inlined)
val numSteps = preproc(start, end, step, isInclusive)
private[this] def preproc(start: Long, end: Long, step: Long, isInclusive: Boolean): Int = {
val upward = (start < end)
if (step == 0) -1
else if (start == end) if (isInclusive) 1 else 0
else if (upward != (step > 0)) 0
else {
val diff = end - start
val jumps = diff / step
val remainder = diff % step
val longCount = jumps + (
if (!isInclusive && 0 == remainder) 0 else 1
)
/** The edge cases keep coming. Since e.g.
* Long.MaxValue + 1 == Long.MinValue
* we do some more improbable seeming checks lest
* overflow turn up as an empty range.
*/
// The second condition contradicts an empty result.
val isOverflow = (longCount == 0) && (((start + step) < end) == upward)
if (longCount > scala.Int.MaxValue || longCount < 0L || isOverflow) -1
else longCount.toInt
}
}
/** @note Making foreach run as fast as a while loop is a challenge.
*/
@inline final override def foreach[U](f: Int => U) {
var numIters = numSteps
if(numIters == -1) {
if (step == 0) throw new IllegalArgumentException("step cannot be 0.")
else {
val word = if (isInclusive) "to" else "until"
val descr = List(start, word, end, "by", step) mkString " "
throw new IllegalArgumentException(descr + ": seqs cannot contain more than Int.MaxValue elements.")
}
}
var i = start
while ( numIters > 0 ) {
(f.asInstanceOf[Int => Unit]).apply(i)
i += step
numIters -= 1
}
}
. . .
Fri, 2011-12-16, 16:41
#2
Re: lessons learned optimizing Range.foreach, and yes, another
Ismael,
Right, and an existing method can be used (as shown below). There's a slight performance degradation however (accessing lazy vals, even in a cold path, apparently makes the JITer less aggressive, for some reason).
@inline final override def foreach[U](f: Int => U) {
var numIters = numSteps
if(numIters == -1) this.length // invoked for side-effect (throw IllegalArgumentException)
var i = start
while ( numIters > 0 ) {
(f.asInstanceOf[Int => Unit]).apply(i)
i += step
numIters -= 1
}
}
Miguel
http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
Fri, 2011-12-16, 16:41
#3
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 06:53:12AM -0800, Miguel Garcia said
> // has to be public, due to access in foreach (if private then foreach
> can't be inlined)
> val numSteps = preproc(start, end, step, isInclusive)
Should we automatically synthesize a public accessor for private members
referenced from an @inline method? I think it'd have to have a mangled
name though since subclasses can introduce a different private field
with the same name.
Fri, 2011-12-16, 16:51
#4
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 3:31 PM, Miguel Garcia <mgarcia512@yahoo.com> wrote:
Are you accessing the lazy val in the benchmark? If not, that sounds suspicious.
Is `this.length` a method that is not inlined by scalac?
Best,Ismael
Right, and an existing method can be used (as shown below). There's a slight performance degradation however (accessing lazy vals, even in a cold path, apparently makes the JITer less aggressive, for some reason).
Are you accessing the lazy val in the benchmark? If not, that sounds suspicious.
@inline final override def foreach[U](f: Int => U) {
var numIters = numSteps
if(numIters == -1) this.length // invoked for side-effect (throw IllegalArgumentException)
Is `this.length` a method that is not inlined by scalac?
Best,Ismael
Fri, 2011-12-16, 17:01
#5
Re: lessons learned optimizing Range.foreach, and yes, another
> Are you accessing the lazy val in the benchmark? If not, that sounds suspicious.
Range.length accesses the lazy val Range.numRangeElements . I'm using the range-bench micro-benchmark.
The `this.length` callsite in my version of Range.foreach isn't inlined, ICode after dce:
9:
16 LOAD_LOCAL(variable $inlThis1)
16 CALL_METHOD scala.collection.immutable.Range.length (dynamic)
16 DROP INT
16 JUMP 4
And, for those interested, full ICode after dce under -optimise for range-bench's foreachSum() (shown below) can be found at the end of this message.
def foreachSum(max: Int): Int = {
var sum = 0
1 to max foreach (sum += _)
sum
}
Miguel
http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
def foreachSum(max: Int (INT)): Int {
locals: value max, variable $inlThis4, variable par13, variable par14, variable par21, value sum$1, variable v11, variable $inlThis1, variable loc31, variable loc21
startBlock: 1
blocks: [1,7,9,4,8,5,6]
1:
15 NEW REF(class IntRef)
15 DUP(REF(class IntRef))
15 CONSTANT(0)
15 CALL_METHOD scala.runtime.IntRef.<init> (static-instance)
15 STORE_LOCAL(value sum$1)
15 SCOPE_ENTER value sum$1
16 NEW REF(class RichInt)
16 DUP(REF(class Object))
16 CONSTANT(1)
16 CALL_METHOD scala.runtime.RichInt.<init> (static-instance)
16 LOAD_LOCAL(value max)
16 STORE_LOCAL(variable par13)
16 CALL_METHOD scala.runtime.RichInt.self (dynamic)
16 LOAD_LOCAL(variable par13)
16 STORE_LOCAL(variable par21)
16 STORE_LOCAL(variable par14)
16 NEW REF(class Range$Inclusive)
16 DUP(REF(class Object))
16 LOAD_LOCAL(variable par14)
16 LOAD_LOCAL(variable par21)
16 CONSTANT(1)
16 CALL_METHOD scala.collection.immutable.Range$Inclusive.<init> (static-instance)
? DUP(REF(class Object))
? STORE_LOCAL(variable $inlThis1)
16 CALL_METHOD scala.collection.immutable.Range.numSteps (dynamic)
? DUP(INT)
? STORE_LOCAL(variable loc21)
16 CONSTANT(-1)
16 CJUMP (INT)NE ? 7 : 9
7:
16 JUMP 4
9:
16 LOAD_LOCAL(variable $inlThis1)
16 CALL_METHOD scala.collection.immutable.Range.length (dynamic)
16 DROP INT
16 JUMP 4
4:
16 LOAD_LOCAL(variable $inlThis1)
16 CALL_METHOD scala.collection.immutable.Range.start (dynamic)
16 STORE_LOCAL(variable loc31)
16 JUMP 8
8:
16 LOAD_LOCAL(variable loc21)
16 CONSTANT(0)
16 CJUMP (INT)LE ? 5 : 6
5:
17 LOAD_LOCAL(value sum$1)
17 LOAD_FIELD variable elem
17 SCOPE_EXIT value sum$1
17 RETURN(INT)
6:
16 LOAD_LOCAL(variable loc31)
16 STORE_LOCAL(variable v11)
? LOAD_LOCAL(value sum$1)
? LOAD_LOCAL(value sum$1)
16 LOAD_FIELD variable elem
16 LOAD_LOCAL(variable v11)
16 CALL_PRIMITIVE(Arithmetic(ADD,INT))
16 STORE_FIELD variable elem (dynamic)
16 LOAD_LOCAL(variable loc31)
16 LOAD_LOCAL(variable $inlThis1)
16 CALL_METHOD scala.collection.immutable.Range.step (dynamic)
16 CALL_PRIMITIVE(Arithmetic(ADD,INT))
16 STORE_LOCAL(variable loc31)
16 LOAD_LOCAL(variable loc21)
16 CONSTANT(1)
16 CALL_PRIMITIVE(Arithmetic(SUB,INT))
16 STORE_LOCAL(variable loc21)
16 JUMP 8
}
Fri, 2011-12-16, 17:11
#6
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 7:37 AM, Ismael Juma wrote:
> Are you accessing the lazy val in the benchmark? If not, that sounds
> suspicious.
I have observed the same thing. If there is a lazy val anywhere, good
luck not seeing that minimum 2x difference between while and foreach.
I don't know about everyone else, but I find there are only two
execution profiles among all plausible implementations: foreach is
about the same speed as while, or foreach is half the speed of while.
(OK, there are a bunch that are slower yet.) I would like to
understand that, because the consistency of the 2x factor tells me
it's some very specific, happens-or-doesn't-happen aspect. And the
lazy val is in my experimentation a lock to move the needle to
"happens".
Fri, 2011-12-16, 17:11
#7
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 3:51 PM, Paul Phillips <paulp@improving.org> wrote:
Interesting. Seems like we have to look at the assembler generated to understand the reason. Meanwhile, Miguel can just move the same cold path code that he had inlined into a separate method and there should be no performance difference in the micro-benchmark with less fragility in real-life code.
Best,Ismael
I have observed the same thing. If there is a lazy val anywhere, good luck not seeing that minimum 2x difference between while and foreach.
Interesting. Seems like we have to look at the assembler generated to understand the reason. Meanwhile, Miguel can just move the same cold path code that he had inlined into a separate method and there should be no performance difference in the micro-benchmark with less fragility in real-life code.
Best,Ismael
Fri, 2011-12-16, 17:21
#8
Re: lessons learned optimizing Range.foreach, and yes, another
Geoff,
Inliner tries to make public in some cases, but not when the callee is a library method (`inc` below is the callee)
def makePublic(f: Symbol): Boolean =
inc.hasSourceFile && (f.isSynthetic || f.isParamAccessor) && {
debuglog("Making not-private symbol out of synthetic: " + f)
f setNotFlag Flags.PRIVATE
true
}
Help to improve the inliner is always welcome:
http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/2011Q4/Inliner.pdf
Miguel
Fri, 2011-12-16, 18:21
#9
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 12:53, Miguel Garcia wrote:
>
> For those who haven't been following, here's a summary of factors heavily
> influencing Range.foreach performance (as far as I can remember):
>
> (0) in all cases, we use -optimise.
Actually, I got while-like performance without -optimise in the ByOne
case. Better than -optimise as often as not, in fact.
> (1) accessing a lazy val (e.g., Range.length) is a performance killer (may
> be prevents JIT compilation?)
>
> (2) invoking the closure arg can be done with or without casting, the cast
> neither hurts nor improves performance significantly:
>
> (f.asInstanceOf[Int => Unit]).apply(i)
> vs
> f(i)
>
> (3) it's desirable to perform that invocation only once, otherwise each
> occurrence can lead to inlining the loop body.
>
> (4) keeping Range.foreach small helps, which means in practice
> pre-processing beforehand, not forgetting "corner cases" such as:
>
> Int.MinValue to Int.MaxValue by (Int.MinValue / 2).abs
> 1 to 1 by Int.MaxValue
> 0 to Int.MinValue by Int.MinValue
Paul has also raised the hypothesis that using a literal increment
instead of adding step may also contribute to speed diferences. I can
see where this might make a difference to JIT: constant-increment
loops are so common it might special-case it.
I have not tested this hypothesis. I'll look into it.
On the other hand, it seems that subclassing Range gives benefits even
in the absence of changes to RichInt's type signature.
> (5) Daniel Sobral has made available a very complete benchmarking harness.
I'm taking pull requests to improve it further too. I have renamed the
repository, though. The new name is
https://github.com/dcsobral/scala-foreach-benchmark. Paul sent a pull
request to remove the hardcoded path I had in it, which made it really
easy to use.
Fri, 2011-12-16, 18:51
#10
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 9:14 AM, Daniel Sobral wrote:
> Actually, I got while-like performance without -optimise in the ByOne
> case. Better than -optimise as often as not, in fact.
Me too.
Fri, 2011-12-16, 19:01
#11
Re: lessons learned optimizing Range.foreach, and yes, another
On Fri, Dec 16, 2011 at 9:14 AM, Daniel Sobral wrote:
> Paul has also raised the hypothesis that using a literal increment
> instead of adding step may also contribute to speed diferences. I can
> see where this might make a difference to JIT: constant-increment
> loops are so common it might special-case it.
It does.
http://wikis.sun.com/display/HotSpotInternals/PerformanceTechniques
Constants
Use constants when you can.
It's OK to store them in local variables; the SSA representation tends
to keep them visible.
It's OK to store them in static final fields, too.
Static final fields can hold non-primitive, non-string constants.
The class of a static final field value is also constant, as is the
array length (if it's an array).
[Ed: that's what pushed me toward implementing public static final fields.]
Loops
The server compiler likes a loop with an int counter (int i = 0), a
constant stride (i++), and loop-invariant limit (i <= n).
^-----
Loops over arrays work especially well when the compiler can relate
the counter limit to the length of the array(s).
For long loops over arrays, the majority of iterations are free of
individual range checks.
Loops are typically peeled by one iteration, to "shake out" tests
which are loop invariant but execute only on a non-zero tripcount.
Null checks are the key example.
If a loop contains a call, it is best if that call is inlined, so that
loop can be optimized as a whole.
A loop can have multiple exits. Any deoptimization point counts as a loop exit.
If your loop has a rare exceptional condition, consider exiting to
another (slower) loop when it happens.
Fri, 2011-12-16, 19:31
#12
Re: lessons learned optimizing Range.foreach, and yes, another
By the way, here's for foreach compares between 2.8.1 and 2.9.1:
http://microbenchmarks.appspot.com/run/dcsobral@gmail.com/org.example.Benchmark/1195018.
On Fri, Dec 16, 2011 at 15:14, Daniel Sobral wrote:
> On Fri, Dec 16, 2011 at 12:53, Miguel Garcia wrote:
>>
>> For those who haven't been following, here's a summary of factors heavily
>> influencing Range.foreach performance (as far as I can remember):
>>
>> (0) in all cases, we use -optimise.
>
> Actually, I got while-like performance without -optimise in the ByOne
> case. Better than -optimise as often as not, in fact.
>
>> (1) accessing a lazy val (e.g., Range.length) is a performance killer (may
>> be prevents JIT compilation?)
>>
>> (2) invoking the closure arg can be done with or without casting, the cast
>> neither hurts nor improves performance significantly:
>>
>> (f.asInstanceOf[Int => Unit]).apply(i)
>> vs
>> f(i)
>>
>> (3) it's desirable to perform that invocation only once, otherwise each
>> occurrence can lead to inlining the loop body.
>>
>> (4) keeping Range.foreach small helps, which means in practice
>> pre-processing beforehand, not forgetting "corner cases" such as:
>>
>> Int.MinValue to Int.MaxValue by (Int.MinValue / 2).abs
>> 1 to 1 by Int.MaxValue
>> 0 to Int.MinValue by Int.MinValue
>
> Paul has also raised the hypothesis that using a literal increment
> instead of adding step may also contribute to speed diferences. I can
> see where this might make a difference to JIT: constant-increment
> loops are so common it might special-case it.
>
> I have not tested this hypothesis. I'll look into it.
>
> On the other hand, it seems that subclassing Range gives benefits even
> in the absence of changes to RichInt's type signature.
>
>> (5) Daniel Sobral has made available a very complete benchmarking harness.
>
> I'm taking pull requests to improve it further too. I have renamed the
> repository, though. The new name is
> https://github.com/dcsobral/scala-foreach-benchmark. Paul sent a pull
> request to remove the hardcoded path I had in it, which made it really
> easy to use.
>
> --
> Daniel C. Sobral
>
> I travel to the future all the time.
Sun, 2011-12-18, 18:21
#13
Re: lessons learned optimizing Range.foreach, and yes, another
2011/12/16 Miguel Garcia <mgarcia512@yahoo.com>
For those who haven't been following, here's a summary of factors heavily influencing Range.foreach performance (as far as I can remember):
(0) in all cases, we use -optimise.
(1) accessing a lazy val (e.g., Range.length) is a performance killer (may be prevents JIT compilation?)
(2) invoking the closure arg can be done with or without casting, the cast neither hurts nor improves performance significantly:
(f.asInstanceOf[Int => Unit]).apply(i)
vs
f(i)
(3) it's desirable to perform that invocation only once, otherwise each occurrence can lead to inlining the loop body.
(4) keeping Range.foreach small helps, which means in practice pre-processing beforehand, not forgetting "corner cases" such as:
Int.MinValue to Int.MaxValue by (Int.MinValue / 2).abs
1 to 1 by Int.MaxValue
0 to Int.MinValue by Int.MinValue
(5) Daniel Sobral has made available a very complete benchmarking harness.
And now an attempt based on the above (binary compatible at that). The number of iterations is precomputed at Range construction time, using the same algorithm as in NumericRange.count() (thus I believe all corner cases are handled properly). The value -1 signals an exception should be thrown at Range.foreach() time. Range.foreach() itself just performs that check, and then moves on to a while loop. Performance seems competitive, more benchmarking is welcome. Having said this, the main goal of this attempt was gathering evidence as to whether the factors mentioned above fully explain Range.foreach() performance. Regarding "few iterations" (ie < 500), that hypothesis seems corroborated. In case you have evidence to the contrary, a summary of the design criteria at play would also be welcome.
Miguel
http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
class Range(val start: Int, val end: Int, val step: Int)
extends collection.AbstractSeq[Int]
with IndexedSeq[Int]
with collection.CustomParallelizable[Int, ParRange]
with Serializable
{
// has to be public, due to access in foreach (if private then foreach can't be inlined)
val numSteps = preproc(start, end, step, isInclusive)
private[this] def preproc(start: Long, end: Long, step: Long, isInclusive: Boolean): Int = {
val upward = (start < end)
if (step == 0) -1
else if (start == end) if (isInclusive) 1 else 0
else if (upward != (step > 0)) 0
else {
val diff = end - start
val jumps = diff / step
val remainder = diff % step
val longCount = jumps + (
if (!isInclusive && 0 == remainder) 0 else 1
)
/** The edge cases keep coming. Since e.g.
* Long.MaxValue + 1 == Long.MinValue
* we do some more improbable seeming checks lest
* overflow turn up as an empty range.
*/
// The second condition contradicts an empty result.
val isOverflow = (longCount == 0) && (((start + step) < end) == upward)
if (longCount > scala.Int.MaxValue || longCount < 0L || isOverflow) -1
else longCount.toInt
}
}
/** @note Making foreach run as fast as a while loop is a challenge.
*/
@inline final override def foreach[U](f: Int => U) {
var numIters = numSteps
if(numIters == -1) {
if (step == 0) throw new IllegalArgumentException("step cannot be 0.")
else {
val word = if (isInclusive) "to" else "until"
val descr = List(start, word, end, "by", step) mkString " "
throw new IllegalArgumentException(descr + ": seqs cannot contain more than Int.MaxValue elements.")
}
}
var i = start
while ( numIters > 0 ) {
(f.asInstanceOf[Int => Unit]).apply(i)
i += step
numIters -= 1
}
}
. . .
Hi,
I played with these benchmarks and I get some strange results. Here is a slightly modified version of Miguel's Range:
class RangeOpt(val start: Int, val end: Int, val step: Int) extends IndexedSeq[Int]
with collection.CustomParallelizable[Int, ParRange]
with Serializable{
...
final def numSteps:Int = {
val numSteps = preproc(start, end, step, isInclusive)
if(numSteps == -1) {
if (step == 0) throw new IllegalArgumentException("step cannot be 0.")
else {
val word = if (isInclusive) "to" else "until"
val descr = List(start, word, end, "by", step) mkString " "
throw new IllegalArgumentException(descr + ": seqs cannot contain more than Int.MaxValue elements.")
}
}
numSteps
}
@inline
final override def foreach[U](f: Int => U) {
RangeOpt.applyEach(start, step, numSteps, f.asInstanceOf[Int => Unit])
}
}
object RangeOpt {
final def applyEach(start:Int, step:Int, numSteps:Int, f:Int => Unit) {
if (numSteps > 0) {
f.apply(start)
applyEach(start + step, step, numSteps -1, f)
}
}
...
}
Note that this version does not perform as well if "applyEach" is renamed to "foreach", which brings up factor number 6:
(6) don't overload methods, even in companion object.
So, I modified Daniel's benchmarking harness to experiment with both implementations (the source code is in attachment). What is intriguing is that the improvements with -optimize are not consistent. See the results below for instance (the benchmark names prefixed with "Scala" correspond to Miguel's version.)
----------------------------
with -optimize:
[info] 10000 ScalaForeachUnit 4473,26 =
[info] 10000 OptForeachUnit 4387,00 =
[info] 10000 WhileUnit 4757,30 =
without:
[info] 10000 ScalaForeachUnit 4563,64 =
[info] 10000 OptForeachUnit 6634,02 =
[info] 10000 WhileUnit 4760,56 =
----------------------------
with -optimize:
[info] 10000 ScalaForeachArray 12547,58 =
[info] 10000 OptForeachArray 6300,88 =
[info] 10000 WhileArray 4872,60 =
without:
[info] 10000 ScalaForeachArray 4906,10 =
[info] 10000 OptForeachArray 13547,56 =
[info] 10000 WhileArray 4889,45 =
----------------------------
with -optimize:
[info] 10000 ScalaForeachDumbPrime 192690649,50 =============================
[info] 10000 OptForeachDumbPrime 193040203,86 ==============================
[info] 10000 WhileDumbPrime 152969618,67 =======================
without:
[info] 10000 ScalaForeachDumbPrime 185219927,00 ==============================
[info] 10000 OptForeachDumbPrime 134526519,29 =====================
[info] 10000 WhileDumbPrime 153043985,67 ========================
Could someone double check my benchmarks please? Maybe I did something wrong. Right now, it looks like its rather complex to reason about optimizations and I'm not sure what to conclude about -optimize.
Cheers,
Sébastien
$ java -version
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.4) (Gentoo build 1.6.0_22-b22)
OpenJDK Server VM (build 20.0-b11, mixed mode)
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Duo CPU T9400 @ 2.53GHz
sbt> run
...
with -optimize:
[info] length benchmark ns linear runtime
[info] 10 ScalaForeachUnit 16,37 =
[info] 10 OptForeachUnit 17,03 =
[info] 10 WhileUnit 5,54 =
[info] 10 ScalaForeachInt 64,66 =
[info] 10 OptForeachInt 66,45 =
[info] 10 WhileInt 5,54 =
[info] 10 ScalaForeachDec 13,13 =
[info] 10 OptForeachDec 23,88 =
[info] 10 ScalaForeachReverse 180,43 =
[info] 10 OptForeachReverse 197,42 =
[info] 10 WhileDec 9,63 =
[info] 10 ScalaForeachOpen 13,63 =
[info] 10 OptForeachOpen 12,65 =
[info] 10 WhileOpen 5,22 =
[info] 10 ScalaForeachStep2 15,61 =
[info] 10 OptForeachStep2 16,28 =
[info] 10 WhileStep2 4,35 =
[info] 10 ScalaForeachStepM2 21,00 =
[info] 10 OptForeachStepM2 19,97 =
[info] 10 WhileStepM2 8,24 =
[info] 10 ScalaForeachArray 13,51 =
[info] 10 OptForeachArray 13,48 =
[info] 10 WhileArray 6,13 =
[info] 10 ScalaForeachHeron 309,12 =
[info] 10 OptForeachHeron 311,35 =
[info] 10 WhileHeron 304,49 =
[info] 10 ScalaForeachDumbPrime 249,94 =
[info] 10 OptForeachDumbPrime 566,38 =
[info] 10 WhileDumbPrime 167,32 =
[info] 10 ScalaForeachEratosthenes 213,62 =
[info] 10 OptForeachEratosthenes 572,48 =
[info] 10 WhileEratosthenes 82,38 =
[info] 100 ScalaForeachUnit 55,75 =
[info] 100 OptForeachUnit 58,41 =
[info] 100 WhileUnit 52,36 =
[info] 100 ScalaForeachInt 525,09 =
[info] 100 OptForeachInt 517,02 =
[info] 100 WhileInt 52,36 =
[info] 100 ScalaForeachDec 56,08 =
[info] 100 OptForeachDec 50,44 =
[info] 100 ScalaForeachReverse 655,78 =
[info] 100 OptForeachReverse 672,30 =
[info] 100 WhileDec 52,83 =
[info] 100 ScalaForeachOpen 89,59 =
[info] 100 OptForeachOpen 54,16 =
[info] 100 WhileOpen 51,69 =
[info] 100 ScalaForeachStep2 37,99 =
[info] 100 OptForeachStep2 37,59 =
[info] 100 WhileStep2 23,27 =
[info] 100 ScalaForeachStepM2 41,80 =
[info] 100 OptForeachStepM2 40,78 =
[info] 100 WhileStepM2 27,89 =
[info] 100 ScalaForeachArray 52,92 =
[info] 100 OptForeachArray 52,76 =
[info] 100 WhileArray 41,95 =
[info] 100 ScalaForeachHeron 7042,35 =
[info] 100 OptForeachHeron 7011,71 =
[info] 100 WhileHeron 6962,55 =
[info] 100 ScalaForeachDumbPrime 22754,37 =
[info] 100 OptForeachDumbPrime 23438,61 =
[info] 100 WhileDumbPrime 17499,66 =
[info] 100 ScalaForeachEratosthenes 9095,57 =
[info] 100 OptForeachEratosthenes 10405,51 =
[info] 100 WhileEratosthenes 4555,72 =
[info] 1000 ScalaForeachUnit 455,48 =
[info] 1000 OptForeachUnit 460,48 =
[info] 1000 WhileUnit 468,61 =
[info] 1000 ScalaForeachInt 7969,47 =
[info] 1000 OptForeachInt 7967,97 =
[info] 1000 WhileInt 469,89 =
[info] 1000 ScalaForeachDec 746,35 =
[info] 1000 OptForeachDec 478,50 =
[info] 1000 ScalaForeachReverse 8621,86 =
[info] 1000 OptForeachReverse 8734,51 =
[info] 1000 WhileDec 467,75 =
[info] 1000 ScalaForeachOpen 696,64 =
[info] 1000 OptForeachOpen 445,77 =
[info] 1000 WhileOpen 467,34 =
[info] 1000 ScalaForeachStep2 401,41 =
[info] 1000 OptForeachStep2 242,81 =
[info] 1000 WhileStep2 260,99 =
[info] 1000 ScalaForeachStepM2 412,05 =
[info] 1000 OptForeachStepM2 244,01 =
[info] 1000 WhileStepM2 236,66 =
[info] 1000 ScalaForeachArray 1230,64 =
[info] 1000 OptForeachArray 548,91 =
[info] 1000 WhileArray 418,03 =
[info] 1000 ScalaForeachHeron 96283,41 =
[info] 1000 OptForeachHeron 96275,78 =
[info] 1000 WhileHeron 95684,05 =
[info] 1000 ScalaForeachDumbPrime 2160548,03 =
[info] 1000 OptForeachDumbPrime 1632831,97 =
[info] 1000 WhileDumbPrime 1793613,34 =
[info] 1000 ScalaForeachEratosthenes 655508,99 =
[info] 1000 OptForeachEratosthenes 498866,13 =
[info] 1000 WhileEratosthenes 347508,13 =
[info] 10000 ScalaForeachUnit 4473,26 =
[info] 10000 OptForeachUnit 4387,00 =
[info] 10000 WhileUnit 4757,30 =
[info] 10000 ScalaForeachInt 89291,52 =
[info] 10000 OptForeachInt 88926,04 =
[info] 10000 WhileInt 4754,33 =
[info] 10000 ScalaForeachDec 6664,22 =
[info] 10000 OptForeachDec 4695,55 =
[info] 10000 ScalaForeachReverse 92604,12 =
[info] 10000 OptForeachReverse 95456,05 =
[info] 10000 WhileDec 4498,38 =
[info] 10000 ScalaForeachOpen 6618,01 =
[info] 10000 OptForeachOpen 4374,62 =
[info] 10000 WhileOpen 4656,96 =
[info] 10000 ScalaForeachStep2 3356,29 =
[info] 10000 OptForeachStep2 2284,11 =
[info] 10000 WhileStep2 2633,47 =
[info] 10000 ScalaForeachStepM2 3376,44 =
[info] 10000 OptForeachStepM2 2278,21 =
[info] 10000 WhileStepM2 2364,73 =
[info] 10000 ScalaForeachArray 12547,58 =
[info] 10000 OptForeachArray 6300,88 =
[info] 10000 WhileArray 4872,60 =
[info] 10000 ScalaForeachHeron 1237844,10 =
[info] 10000 OptForeachHeron 1243683,07 =
[info] 10000 WhileHeron 1232865,41 =
[info] 10000 ScalaForeachDumbPrime 192690649,50 =============================
[info] 10000 OptForeachDumbPrime 193040203,86 ==============================
[info] 10000 WhileDumbPrime 152969618,67 =======================
[info] 10000 ScalaForeachEratosthenes 48321987,98 =======
[info] 10000 OptForeachEratosthenes 57257370,50 ========
[info] 10000 WhileEratosthenes 23461142,43 ===
without:
[info]
[info] length benchmark ns linear runtime
[info] 10 ScalaForeachUnit 14,72 =
[info] 10 OptForeachUnit 17,03 =
[info] 10 WhileUnit 5,54 =
[info] 10 ScalaForeachInt 65,75 =
[info] 10 OptForeachInt 66,18 =
[info] 10 WhileInt 5,54 =
[info] 10 ScalaForeachDec 13,19 =
[info] 10 OptForeachDec 24,68 =
[info] 10 ScalaForeachReverse 195,60 =
[info] 10 OptForeachReverse 254,37 =
[info] 10 WhileDec 9,63 =
[info] 10 ScalaForeachOpen 12,10 =
[info] 10 OptForeachOpen 12,65 =
[info] 10 WhileOpen 4,75 =
[info] 10 ScalaForeachStep2 15,61 =
[info] 10 OptForeachStep2 16,28 =
[info] 10 WhileStep2 4,36 =
[info] 10 ScalaForeachStepM2 18,09 =
[info] 10 OptForeachStepM2 19,97 =
[info] 10 WhileStepM2 8,24 =
[info] 10 ScalaForeachArray 13,49 =
[info] 10 OptForeachArray 81,70 =
[info] 10 WhileArray 6,12 =
[info] 10 ScalaForeachHeron 312,21 =
[info] 10 OptForeachHeron 308,31 =
[info] 10 WhileHeron 302,71 =
[info] 10 ScalaForeachDumbPrime 388,85 =
[info] 10 OptForeachDumbPrime 565,15 =
[info] 10 WhileDumbPrime 168,61 =
[info] 10 ScalaForeachEratosthenes 597,38 =
[info] 10 OptForeachEratosthenes 581,73 =
[info] 10 WhileEratosthenes 82,73 =
[info] 100 ScalaForeachUnit 55,28 =
[info] 100 OptForeachUnit 58,40 =
[info] 100 WhileUnit 52,36 =
[info] 100 ScalaForeachInt 491,82 =
[info] 100 OptForeachInt 1302,71 =
[info] 100 WhileInt 52,38 =
[info] 100 ScalaForeachDec 56,37 =
[info] 100 OptForeachDec 50,47 =
[info] 100 ScalaForeachReverse 670,21 =
[info] 100 OptForeachReverse 963,09 =
[info] 100 WhileDec 52,83 =
[info] 100 ScalaForeachOpen 53,07 =
[info] 100 OptForeachOpen 155,84 =
[info] 100 WhileOpen 51,67 =
[info] 100 ScalaForeachStep2 37,51 =
[info] 100 OptForeachStep2 37,61 =
[info] 100 WhileStep2 23,27 =
[info] 100 ScalaForeachStepM2 38,72 =
[info] 100 OptForeachStepM2 40,81 =
[info] 100 WhileStepM2 27,89 =
[info] 100 ScalaForeachArray 52,93 =
[info] 100 OptForeachArray 195,54 =
[info] 100 WhileArray 41,94 =
[info] 100 ScalaForeachHeron 7062,11 =
[info] 100 OptForeachHeron 7100,62 =
[info] 100 WhileHeron 6942,28 =
[info] 100 ScalaForeachDumbPrime 23718,27 =
[info] 100 OptForeachDumbPrime 17714,59 =
[info] 100 WhileDumbPrime 17425,67 =
[info] 100 ScalaForeachEratosthenes 9250,65 =
[info] 100 OptForeachEratosthenes 9221,41 =
[info] 100 WhileEratosthenes 4559,93 =
[info] 1000 ScalaForeachUnit 451,84 =
[info] 1000 OptForeachUnit 742,61 =
[info] 1000 WhileUnit 470,36 =
[info] 1000 ScalaForeachInt 10139,48 =
[info] 1000 OptForeachInt 21375,80 =
[info] 1000 WhileInt 469,45 =
[info] 1000 ScalaForeachDec 479,39 =
[info] 1000 OptForeachDec 750,48 =
[info] 1000 ScalaForeachReverse 11174,16 =
[info] 1000 OptForeachReverse 10500,74 =
[info] 1000 WhileDec 467,81 =
[info] 1000 ScalaForeachOpen 451,03 =
[info] 1000 OptForeachOpen 746,24 =
[info] 1000 WhileOpen 466,38 =
[info] 1000 ScalaForeachStep2 242,44 =
[info] 1000 OptForeachStep2 404,69 =
[info] 1000 WhileStep2 260,99 =
[info] 1000 ScalaForeachStepM2 241,94 =
[info] 1000 OptForeachStepM2 409,62 =
[info] 1000 WhileStepM2 236,79 =
[info] 1000 ScalaForeachArray 430,59 =
[info] 1000 OptForeachArray 1369,34 =
[info] 1000 WhileArray 417,23 =
[info] 1000 ScalaForeachHeron 96250,19 =
[info] 1000 OptForeachHeron 96819,11 =
[info] 1000 WhileHeron 95683,17 =
[info] 1000 ScalaForeachDumbPrime 2079398,86 =
[info] 1000 OptForeachDumbPrime 1600272,21 =
[info] 1000 WhileDumbPrime 1798492,10 =
[info] 1000 ScalaForeachEratosthenes 602405,93 =
[info] 1000 OptForeachEratosthenes 556142,71 =
[info] 1000 WhileEratosthenes 349392,10 =
[info] 10000 ScalaForeachUnit 4563,64 =
[info] 10000 OptForeachUnit 6634,02 =
[info] 10000 WhileUnit 4760,56 =
[info] 10000 ScalaForeachInt 107232,19 =
[info] 10000 OptForeachInt 210767,17 =
[info] 10000 WhileInt 4757,47 =
[info] 10000 ScalaForeachDec 4707,63 =
[info] 10000 OptForeachDec 6663,45 =
[info] 10000 ScalaForeachReverse 117261,60 =
[info] 10000 OptForeachReverse 108942,68 =
[info] 10000 WhileDec 4499,97 =
[info] 10000 ScalaForeachOpen 4484,43 =
[info] 10000 OptForeachOpen 6667,60 =
[info] 10000 WhileOpen 4659,20 =
[info] 10000 ScalaForeachStep2 2288,47 =
[info] 10000 OptForeachStep2 3367,85 =
[info] 10000 WhileStep2 2632,56 =
[info] 10000 ScalaForeachStepM2 2287,49 =
[info] 10000 OptForeachStepM2 2267,90 =
[info] 10000 WhileStepM2 2365,25 =
[info] 10000 ScalaForeachArray 4906,10 =
[info] 10000 OptForeachArray 13547,56 =
[info] 10000 WhileArray 4889,45 =
[info] 10000 ScalaForeachHeron 1236872,18 =
[info] 10000 OptForeachHeron 1244323,60 =
[info] 10000 WhileHeron 1233106,32 =
[info] 10000 ScalaForeachDumbPrime 185219927,00 ==============================
[info] 10000 OptForeachDumbPrime 134526519,29 =====================
[info] 10000 WhileDumbPrime 153043985,67 ========================
[info] 10000 ScalaForeachEratosthenes 35652739,80 =====
[info] 10000 OptForeachEratosthenes 52890626,64 ========
[info] 10000 WhileEratosthenes 23476727,32 ===
Mon, 2011-12-19, 12:11
#14
Re: lessons learned optimizing Range.foreach, and yes, another
Sébastien,
Your benchmarks point at something interesting, at this point I can only guess that (unsurprisingly) the time when jit-ing kicks in explains performance differences (as shown by -XX:+PrintCompilation, [1] ). That however does not explain why jit-ing kicks in earlier for different (functionally equivalent) versions of the same method.
Guessing some more, the inliner is right when inlining "inside closures" (i.e. the "loop body" for Range.foreach) but inlining in the prolog to such loop does not seem profitable (on the other hand, for the few experiments where I looked at this, jit-ing kicks in at about the same time for 'whileSum' and 'foreachSum' in the range-bench micro-benchmark).
What's clear is that when one goes for aggressive jit-ing of loops (via -XX:+AdvancedOpts ) then 'whileSum' wins whatever else one does.
Miguel
[1] http://q-redux.blogspot.com/2011/01/inspecting-hotspot-jvm-options.html
Your benchmarks point at something interesting, at this point I can only guess that (unsurprisingly) the time when jit-ing kicks in explains performance differences (as shown by -XX:+PrintCompilation, [1] ). That however does not explain why jit-ing kicks in earlier for different (functionally equivalent) versions of the same method.
Guessing some more, the inliner is right when inlining "inside closures" (i.e. the "loop body" for Range.foreach) but inlining in the prolog to such loop does not seem profitable (on the other hand, for the few experiments where I looked at this, jit-ing kicks in at about the same time for 'whileSum' and 'foreachSum' in the range-bench micro-benchmark).
What's clear is that when one goes for aggressive jit-ing of loops (via -XX:+AdvancedOpts ) then 'whileSum' wins whatever else one does.
Miguel
[1] http://q-redux.blogspot.com/2011/01/inspecting-hotspot-jvm-options.html
Tue, 2011-12-20, 00:11
#15
Re: lessons learned optimizing Range.foreach, and yes, another
2011/12/19 Miguel Garcia <mgarcia512@yahoo.com>
Sébastien,
Your benchmarks point at something interesting, at this point I can only guess that (unsurprisingly) the time when jit-ing kicks in explains performance differences (as shown by -XX:+PrintCompilation, [1] ). That however does not explain why jit-ing kicks in earlier for different (functionally equivalent) versions of the same method.
Guessing some more, the inliner is right when inlining "inside closures" (i.e. the "loop body" for Range.foreach) but inlining in the prolog to such loop does not seem profitable (on the other hand, for the few experiments where I looked at this, jit-ing kicks in at about the same time for 'whileSum' and 'foreachSum' in the range-bench micro-benchmark).
What's clear is that when one goes for aggressive jit-ing of loops (via -XX:+AdvancedOpts ) then 'whileSum' wins whatever else one does.
Miguel
[1] http://q-redux.blogspot.com/2011/01/inspecting-hotspot-jvm-options.html
Hi Miguel,
Thanks for the insight. It is possible to approach more consistently the performance of "while" but not with "foreach". The original benchmark is contrived in some sense because it is misusing "foreach" for folding over a range by mutating a variable outside the scope of the "closure body". This might be a reason why optimizations get messed up. All it takes is to specialize "foldLeft" and I think it is worth it (complete benchmark is in attachment):
class RangeOpt extends ... {
...
final def foldLeftS[@specialized(Int) B](start:Int, step:Int, numSteps:Int)(z: B)(op: (B, Int) ⇒ B): B =
// Inner method does not get specialized (bug SI-3457?) => moved it to companion
RangeOpt.foldEachLeftS(start, step, numSteps)(z)(op)
}
object RangeOpt {
...
// Method does not get specialized if marked private (new bug?)
final def foldEachLeftS[@specialized(Int) B](start:Int, step:Int, numSteps:Int)(z: B)(op: (B, Int) ⇒ B): B =
if (numSteps > 0)
foldEachLeftS(start + step, step, numSteps - 1)(op(z, start))(op)
else
z
}
The timings below show that:
- Conversely to previous optimizations, "foldLeft" performs consistently with or without -optimise.
- Sometimes the JVM cannot make up its mind whether to optimise or not.
Note that -XX:+AggressivedOpts did not make a difference in these benchmarks.
- with -optimise (10000):
[info] benchmark trial us linear runtime
[info] ScalaForeachInt 0 100,55 ==============
[info] ScalaForeachInt 1 100,76 ==============
[info] ScalaForeachInt 2 100,53 ==============
[info] OptForeachInt 0 100,76 ==============
[info] OptForeachInt 1 100,94 ==============
[info] OptForeachInt 2 100,66 ==============
[info] FoldLeftInt 0 198,38 ============================ // Not specialized
[info] FoldLeftInt 1 198,01 ============================
[info] FoldLeftInt 2 198,22 ============================
[info] FoldLeftSInt 0 6,61 = // Specialized
[info] FoldLeftSInt 1 6,62 =
[info] FoldLeftSInt 2 6,61 =
[info] WhileInt 0 4,75 =
[info] WhileInt 1 4,75 =
[info] WhileInt 2 4,76 =
[info] ScalaForeachStep2 0 3,36 =
[info] ScalaForeachStep2 1 3,36 =
[info] ScalaForeachStep2 2 3,36 =
[info] OptForeachStep2 0 2,28 =
[info] OptForeachStep2 1 2,29 =
[info] OptForeachStep2 2 2,28 =
[info] FoldLeftStep2 0 99,82 ==============
[info] FoldLeftStep2 1 99,65 ==============
[info] FoldLeftStep2 2 99,69 ==============
[info] FoldLeftSStep2 0 3,36 =
[info] FoldLeftSStep2 1 3,36 =
[info] FoldLeftSStep2 2 3,36 =
[info] WhileStep2 0 2,63 =
[info] WhileStep2 1 2,63 =
[info] WhileStep2 2 2,63 =
[info] ScalaForeachArray 0 13,54 =
[info] ScalaForeachArray 1 13,50 =
[info] ScalaForeachArray 2 13,52 =
[info] OptForeachArray 0 6,04 =
[info] OptForeachArray 1 6,02 =
[info] OptForeachArray 2 6,06 =
[info] FoldLeftArray 0 202,92 =============================
[info] FoldLeftArray 1 207,59 ==============================
[info] FoldLeftArray 2 206,65 =============================
[info] FoldLeftSArray 0 4,89 =
[info] FoldLeftSArray 1 23,88 ===
[info] FoldLeftSArray 2 23,89 ===
[info] WhileArray 0 4,87 =
[info] WhileArray 1 4,87 =
[info] WhileArray 2 4,86 =
[info]
- without:
[info] benchmark trial us linear runtime
[info] ScalaForeachInt 0 101,05 ==============
[info] ScalaForeachInt 1 100,41 =============
[info] ScalaForeachInt 2 100,08 =============
[info] OptForeachInt 0 206,24 ============================
[info] OptForeachInt 1 208,59 =============================
[info] OptForeachInt 2 213,98 =============================
[info] FoldLeftInt 0 200,83 ===========================
[info] FoldLeftInt 1 197,82 ===========================
[info] FoldLeftInt 2 200,25 ===========================
[info] FoldLeftSInt 0 6,62 =
[info] FoldLeftSInt 1 4,57 =
[info] FoldLeftSInt 2 4,56 =
[info] WhileInt 0 4,75 =
[info] WhileInt 1 4,75 =
[info] WhileInt 2 4,76 =
[info] ScalaForeachStep2 0 2,30 =
[info] ScalaForeachStep2 1 2,30 =
[info] ScalaForeachStep2 2 2,30 =
[info] OptForeachStep2 0 3,36 =
[info] OptForeachStep2 1 3,37 =
[info] OptForeachStep2 2 3,37 =
[info] FoldLeftStep2 0 101,62 ==============
[info] FoldLeftStep2 1 100,37 =============
[info] FoldLeftStep2 2 99,08 =============
[info] FoldLeftSStep2 0 3,36 =
[info] FoldLeftSStep2 1 3,36 =
[info] FoldLeftSStep2 2 3,36 =
[info] WhileStep2 0 2,63 =
[info] WhileStep2 1 2,63 =
[info] WhileStep2 2 2,63 =
[info] ScalaForeachArray 0 4,92 =
[info] ScalaForeachArray 1 4,90 =
[info] ScalaForeachArray 2 4,90 =
[info] OptForeachArray 0 13,54 =
[info] OptForeachArray 1 13,54 =
[info] OptForeachArray 2 13,54 =
[info] FoldLeftArray 0 207,35 ============================
[info] FoldLeftArray 1 215,38 ==============================
[info] FoldLeftArray 2 206,75 ============================
[info] FoldLeftSArray 0 23,88 ===
[info] FoldLeftSArray 1 4,89 =
[info] FoldLeftSArray 2 23,88 ===
[info] WhileArray 0 4,87 =
[info] WhileArray 1 4,86 =
[info] WhileArray 2 4,86 =
I don't think it is worth investing more effort in optimizing "foreach" because it must perform I/O if it is used correctly, which will consume most of the time anyway. In some ways, it sounds natural that one can't beat imperative style on its own ground by abusing functional style. I haven't tried to convert the other tests but, for the greater good, optimizing idiomatic code paths against "while" loops encourages best coding practices.
Cheers,
Sébastien
Tue, 2011-12-20, 17:31
#16
Re: lessons learned optimizing Range.foreach, and yes, another
On 19 December 2011 23:09, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:
I don't think it is worth investing more effort in optimizing "foreach" because it must perform I/O if it is used correctly, which will consume most of the time anyway. In some ways, it sounds natural that one can't beat imperative style on its own ground by abusing functional style. I haven't tried to convert the other tests but, for the greater good, optimizing idiomatic code paths against "while" loops encourages best coding practices.
I may have misunderstood you, but if you are saying that there's no point optimizing foreach in the cases where people should be coding in a functionally-pure way but chose to use mutable state that the loop updates, and that having this case run slowly is how we should encourage them towards the FP-pure coding style, then I could not disagree more. If in the formative time with scala someone finds that this case is slow, their first inclination will be to switch to a language where it is fast, not find the 'right' way to do it in scala. If this becomes policy, then at the very least the compiler needs to spit out a warning pushing them towards the 'right' way. I do hope that I got the wrong end of the stick.
Matthew
Cheers,
Sébastien
--
Dr Matthew PocockIntegrative Bioinformatics Group, School of Computing Science, Newcastle Universitymailto: turingatemyhamster@gmail.com gchat: turingatemyhamster@gmail.commsn: matthew_pocock@yahoo.co.uk irc.freenode.net: drdozerskype: matthew.pococktel: (0191) 2566550mob: +447535664143
Tue, 2011-12-20, 17:41
#17
Re: lessons learned optimizing Range.foreach, and yes, another
Matthew,
There's value in making the compiler ever smarter about optimizing functional-style code,
and
there's value in writing C-style code for performance-critical sections.
Miguel
http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
Tue, 2011-12-20, 17:51
#18
Re: lessons learned optimizing Range.foreach, and yes, another
On Mon, Dec 19, 2011 at 11:09 PM, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:
This is false.
Best,Ismael
I don't think it is worth investing more effort in optimizing "foreach" because it must perform I/O if it is used correctly
This is false.
Best,Ismael
Tue, 2011-12-20, 18:01
#19
Re: lessons learned optimizing Range.foreach, and yes, another
On Tue, Dec 20, 2011 at 10:35 AM, Ismael Juma <ismael@juma.me.uk> wrote:
Tony?
On Mon, Dec 19, 2011 at 11:09 PM, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:I don't think it is worth investing more effort in optimizing "foreach" because it must perform I/O if it is used correctly
This is false.
Tony?
Tue, 2011-12-20, 18:21
#20
Re: lessons learned optimizing Range.foreach, and yes, another
On Tue, Dec 20, 2011 at 4:55 PM, Nils Kilden-Pedersen <nilskp@gmail.com> wrote:
Best,Ismael
On Tue, Dec 20, 2011 at 10:35 AM, Ismael Juma <ismael@juma.me.uk> wrote:To elaborate, there are many scenarios in performance critical code where using foreach without performing I/O is desirable. For example, the inner loop of a machine learning algorithm where feature vectors are updated.On Mon, Dec 19, 2011 at 11:09 PM, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:I don't think it is worth investing more effort in optimizing "foreach" because it must perform I/O if it is used correctly
This is false.
Best,Ismael
Tue, 2011-12-20, 20:31
#21
Re: lessons learned optimizing Range.foreach, and yes, another
On Tue, Dec 20, 2011 at 9:14 AM, Ismael Juma wrote:
> To elaborate, there are many scenarios in performance critical code where
> using foreach without performing I/O is desirable. For example, the inner
> loop of a machine learning algorithm where feature vectors are updated.
Or, look at every method implementation in TraversableLike.
Wed, 2011-12-21, 02:51
#22
Re: lessons learned optimizing Range.foreach, and yes, another
2011/12/20 Paul Phillips
>
> On Tue, Dec 20, 2011 at 9:14 AM, Ismael Juma wrote:
> > To elaborate, there are many scenarios in performance critical code where
> > using foreach without performing I/O is desirable. For example, the inner
> > loop of a machine learning algorithm where feature vectors are updated.
>
> Or, look at every method implementation in TraversableLike.
Here is my grouped answer:
@Matthew To rephrase it another way, I think there has been enough
time sinked in optimizing foreach (at least I'll stop trying it) and,
as shown in my latest benchmarks, idiomatic code deserves some love
too. I was not trying to win a language popularity contest here, I
just tried to help and then share my conclusions in all honesty. See
also paragraph below.
@Ismael Unless the syntactic-sugar for for-comprehensions changes in
the future, I'd use "while" in Scala to program imperatively at memory
level and mutate in place arrays or vectors in performance critical
code. It has a less nicer syntax than the "for" loop in Java but
hopefully the scope can be limited. The reason is that I don't
entertain the thought that "foreach" is equivalent to "for" in Java.
For example, using "for" in Java to compute the sum of the elements in
an array is fast and good style. In Scala, "foreach" is a higher-order
function tailored for I/O. You can do the same as in Java, by mutating
a data structure outside the scope of the effectful function closure,
but it is less efficient than its imperative counterpart at that job.
Fair enough. "foldLeft" is another higher-function like "foreach", but
this one is fit for the job and it can be easily optimized to perform
comparably to the imperative loop. Thinking this way, one can use the
right tool for the right job and make the most out of Scala.
@Paul Ah ah, it doesn't leave much choices, does it? Well, I'll answer
that implementing a collection in terms of "foreach" is a design
choice that is questionable at best. See this one for example:
def isEmpty: Boolean = {
var result = true
breakable {
for (x <- this) {
result = false
break
}
}
result
}
It is much too complex for what it is and most of these methods are
overriden in subclasses anyway. Look at every overriden method
implementation in IterableLike for example. So, why bother?
Cheers!
Sébastien
Wed, 2011-12-21, 09:41
#23
Re: lessons learned optimizing Range.foreach, and yes, another
Hi Sebastien,
On Wed, Dec 21, 2011 at 1:42 AM, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:
As someone who has to do this a lot, it's less than satisfactory.
You keep saying this. What is your source for such claim? I have been following the Scala community closely for years now and I don't remember Martin ever saying that. Best,Ismael
On Wed, Dec 21, 2011 at 1:42 AM, Sébastien Bocq <sebastien.bocq@gmail.com> wrote:
@Ismael Unless the syntactic-sugar for for-comprehensions changes in the future, I'd use "while" in Scala to program imperatively at memory
level and mutate in place arrays or vectors in performance critical
code. It has a less nicer syntax than the "for" loop in Java but
hopefully the scope can be limited.
As someone who has to do this a lot, it's less than satisfactory.
In Scala, "foreach" is a higher-order function tailored for I/O.
You keep saying this. What is your source for such claim? I have been following the Scala community closely for years now and I don't remember Martin ever saying that. Best,Ismael
Wed, 2011-12-21, 16:21
#24
Re: lessons learned optimizing Range.foreach, and yes, another
On Wed, Dec 21, 2011 at 08:38:52AM +0000, Ismael Juma wrote:
> On Wed, Dec 21, 2011 at 1:42 AM, Sébastien Bocq wrote:
> > @Ismael Unless the syntactic-sugar for for-comprehensions changes in
> > the future, I'd use "while" in Scala to program imperatively at memory
> > level and mutate in place arrays or vectors in performance critical
> > code. It has a less nicer syntax than the "for" loop in Java but
> > hopefully the scope can be limited.
>
> As someone who has to do this a lot, it's less than satisfactory.
+100
Thanks for writing a summary of the findings so far. A comment below.
On Fri, Dec 16, 2011 at 2:53 PM, Miguel Garcia <mgarcia512@yahoo.com> wrote:
You want to move all of this to a separate method as we don't want to inline the cold path and we don't want it to inflate the size of foreach.
Best,Ismael