Scala 2.9 slower?
Sun, 2011-05-22, 19:40
I'm a bit concerned. After seeing that -optimize really does optimize
for comprehensions on 2.9.0, I then benchmarked the code, and
discovered it was about 2.5 times slower than 2.8.1. Then I compiled
without optimization, and got code almost 3 times slower than 2.8.1,
which, at least, shows that the optimization is working.
The code I used to benchmark is below. I tried some variations, some of
which improved the 2.9.0 numbers, but none that made it as fast as 2.8.1.
I know this is a pure micro-benchmark and doesn't mean much in a larger
codebase, but _something_ seems to have gone the wrong way.
object OptTest {
  def main(args: Array[String]) {
    println("Time: " + bench)
  }

  def bench: Long = {
    val testSet = Array.fill(15000)(Array.fill(1000)(util.Random.nextInt))
    val optex = new OptEx

    // Warm-up
    var result = 0
    for (array <- testSet) result += optex.sum(array)

    // Benchmark
    val start = java.lang.System.currentTimeMillis
    for (array <- testSet) result += optex.sum(array)
    val stop = java.lang.System.currentTimeMillis
    println("Ensure nothing gets optimized away: " + result)

    stop - start
  }
}

class OptEx {
  def sum(l: Array[Int]) = {
    var acc = 0
    for (i <- 0 until l.length) acc += l(i)
    acc
  }
}
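For comparison (an editorial sketch, not code from the thread), the hand-written while loop that later replies benchmark this against would look like:

```scala
// Same sum as OptEx, written as a manual while loop: no Range, no closure.
class OptExWhile {
  def sum(l: Array[Int]): Int = {
    var acc = 0
    var i = 0
    while (i < l.length) {
      acc += l(i)   // direct array indexing, no function call per element
      i += 1
    }
    acc
  }
}
```

This is the form the for comprehension must effectively be optimized down to for the two variants to perform the same.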
Sun, 2011-05-22, 22:28
#2
Re: Scala 2.9 slower?
For what it's worth,
When I run benchmarks (against recent sun Java on Linux) iterating over an array, my anecdotal observation is that substantial jvm hotspot optimization (I *think* escape analysis) kicks in between 1.5 and 2 million elements. I just mention that because I wouldn't expect different sized arrays under a certain threshold to have the same performance characteristics -- and also I think sometimes people (e.g. me) expect optimizations like escape analysis to kick in way sooner than the JVM actually applies them.
--j
On May 22, 2011 4:41 PM, "Paul Phillips" <paulp@improving.org> wrote:
> On 5/22/11 11:40 AM, Daniel Sobral wrote:
>> I'm a bit concerned. After seeing that -optimize really does optimize
>> for comprehensions on 2.9.0, I then benchmarked the code, and
>> discovered it was about 2.5 times slower than 2.8.1. Then I compiled
>> without optimization, and got code almost 3 times slower than 2.8.1,
>> which, at least, shows that the optimization is working.
>
> My numbers are nothing like yours. What they said pretty unambiguously is that the optimization is not working. That it's madly pessimizing. This is with 150000 where you have 15000.
>
> Times for unoptimised, 281 / 29 / trunk
> Time: 113
> Time: 204
> Time: 182
> Times for unoptimised, 281 / 29 / trunk
> Time: 116
> Time: 207
> Time: 180
> Times for unoptimised, 281 / 29 / trunk
> Time: 122
> Time: 206
> Time: 182
>
> I can only think you mixed up your outputs, because if it's twice as slow under -optimise for me I have trouble imagining what confounding variables would make it faster for you.
>
> Times for -optimise, 281 / 29 / trunk
> Time: 118
> Time: 380
> Time: 353
> Times for -optimise, 281 / 29 / trunk
> Time: 127
> Time: 382
> Time: 347
> Times for -optimise, 281 / 29 / trunk
> Time: 116
> Time: 379
> Time: 353
>
> I'm on osx.
>
> % java -version
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3326)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
Mon, 2011-05-23, 02:27
#3
Re: Scala 2.9 slower?
On 5/22/11 2:11 PM, Josh Marcus wrote:
> When I run benchmarks (against recent sun Java on Linux) iterating over
> an array, my anecdotal observation is that substantial jvm hotspot
> optimization (I *think* escape analysis) kicks in between 1.5 and 2
> million elements. I just mention that because I wouldn't expect
> different sized arrays under a certain threshold to have the same
> performance characteristics -- and also I think sometimes people (e.g.
> me) expect optimizations like escape analysis to kick in way sooner than
> the JVM actually applies them.
The numbers I posted used a higher number than he did, but my results
never varied, including running it as given.
My current supposition is that 2.8.1 is faster because it had
Range.ByOne and hotspot liked it better.
// this was in 2.8.1, is no longer with us
trait ByOne extends Range {
  override final def foreach[@specialized(Unit) U](f: Int => U) {
    if (length > 0) {
      val last = this.last
      var i = start
      while (i != last) {
        f(i)
        i += 1
      }
      f(i)
    }
  }
}

// 2.8.1 RichInt
def until(end: Int): Range with Range.ByOne = Range(start, end)
def until(end: Int, step: Int): Range = Range(start, end, step)
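For reference (an editorial sketch, not from the thread), this matters because the for comprehension in OptEx.sum desugars to a function value passed to Range.foreach, which is exactly the method the specialized ByOne trait used to make cheap:

```scala
// Sketch: the for comprehension in OptEx.sum,
//   for (i <- 0 until l.length) acc += l(i)
// desugars to roughly this closure passed to Range.foreach:
def sumDesugared(l: Array[Int]): Int = {
  var acc = 0
  (0 until l.length).foreach(i => acc += l(i))
  acc
}
```

Performance therefore hinges on how well HotSpot inlines Range.foreach and the closure, which is why removing the specialized ByOne.foreach could plausibly change the numbers between 2.8.1 and 2.9.0.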
Mon, 2011-05-23, 02:47
#4
Re: Scala 2.9 slower?
Here's what I got using Java 1.6.0_24 on Windows 7 32-bit (3GB):
2.8.1-standard: 120
2.8.1-optimize: 124
2.9.0-standard: 137
2.9.0-optimize: 53
On 22 May 2011 20:40, Daniel Sobral <dcsobral@gmail.com> wrote:
I'm a bit concerned. After seeing that -optimize really does optimize
for comprehensions on 2.9.0, I then benchmarked the code, and
discovered it was about 2.5 times slower than 2.8.1. Then I compiled
without optimization, and got code almost 3 times slower than 2.8.1,
which, at least, shows that the optimization is working.
The code I used to benchmark is below. I tried some variations some
which improved 2.9.0 numbers, but none that made it as fast as 2.8.1.
I know this is pure micro-benchmark, and doesn't mean much in a larger
codebase, but _something_ seems to have gone the wrong way.
object OptTest {
  def main(args: Array[String]) {
    println("Time: " + bench)
  }

  def bench: Long = {
    val testSet = Array.fill(15000)(Array.fill(1000)(util.Random.nextInt))
    val optex = new OptEx

    // Warm-up
    var result = 0
    for (array <- testSet) result += optex.sum(array)

    // Benchmark
    val start = java.lang.System.currentTimeMillis
    for (array <- testSet) result += optex.sum(array)
    val stop = java.lang.System.currentTimeMillis
    println("Ensure nothing gets optimized away: " + result)

    stop - start
  }
}

class OptEx {
  def sum(l: Array[Int]) = {
    var acc = 0
    for (i <- 0 until l.length) acc += l(i)
    acc
  }
}
--
Daniel C. Sobral
I travel to the future all the time.
Mon, 2011-05-23, 03:47
#5
Re: Scala 2.9 slower?
On 5/22/11 6:37 PM, Trond Olsen wrote:
> Here's what I got using Java 1.6.24 on Windows 7 32-bit (3GB):
>
> 2.8.1-standard: 120
> 2.8.1-optimize: 124
> 2.9.0-standard: 137
> 2.9.0-optimize: 53
And it is all kinds of possible dcsobral was on windows too. Windows,
turning things on their head since 1882. Not that your results match
his OR mine. Perhaps we're running some kind of jvm implementation
rorschach test... "what does this image make you want to inline?"
Mon, 2011-05-23, 05:07
#6
Re: Scala 2.9 slower?
To add another data point, I get performance numbers on Linux that are similar to Daniel's. The code runs more quickly when compiled with "-optimise" under 2.9.0, but significantly more slowly than 2.8.1.
Running:
-- 150000 instead of 15000 (like Paul)
-- Linux (ubuntu 11.04)
-- Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Time for unoptimised 281: 212
Time for optimised 281: 211
Time for unoptimised 290: 406
Time for optimized 290: 351
--j
On Sun, May 22, 2011 at 4:41 PM, Paul Phillips <paulp@improving.org> wrote:
On 5/22/11 11:40 AM, Daniel Sobral wrote:
> I'm a bit concerned. After seeing that -optimize really does optimize
> for comprehensions on 2.9.0, I then benchmarked the code, and
> discovered it was about 2.5 times slower than 2.8.1. Then I compiled
> without optimization, and got code almost 3 times slower than 2.8.1,
> which, at least, shows that the optimization is working.
My numbers are nothing like yours. What they said pretty unambiguously is that the optimization is not working. That it's madly pessimizing. This is with 150000 where you have 15000.
Times for unoptimised, 281 / 29 / trunk
Time: 113
Time: 204
Time: 182
Times for unoptimised, 281 / 29 / trunk
Time: 116
Time: 207
Time: 180
Times for unoptimised, 281 / 29 / trunk
Time: 122
Time: 206
Time: 182
I can only think you mixed up your outputs, because if it's twice as slow under -optimise for me I have trouble imagining what confounding variables would make it faster for you.
Times for -optimise, 281 / 29 / trunk
Time: 118
Time: 380
Time: 353
Times for -optimise, 281 / 29 / trunk
Time: 127
Time: 382
Time: 347
Times for -optimise, 281 / 29 / trunk
Time: 116
Time: 379
Time: 353
I'm on osx.
% java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3326)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
Mon, 2011-05-23, 05:27
#7
Re: Scala 2.9 slower?
Okay, so has anyone tried a long-running test then?
Something which will take an hour to complete?
On 23 May 2011 06:03, Josh Marcus wrote:
> To add another data point, I get performance on Linux that are similar to
> Daniel's. The code runs more quickly when compiled with "-optimise" under
> 2.9.0, but significantly more slowly than 2.8.1.
> Running:
> -- 150000 instead of 15000 (like Paul)
> -- Linux (ubuntu 11.04)
> -- Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> Time for unoptimised 281: 212
> Time for optimised 281: 211
> Time for unoptimised 290: 406
> Time for optimized 290: 351
> --j
> On Sun, May 22, 2011 at 4:41 PM, Paul Phillips wrote:
>>
>> On 5/22/11 11:40 AM, Daniel Sobral wrote:
>> > I'm a bit concerned. After seeing that -optimize really does optimize
>> > for comprehensions on 2.9.0, I then benchmarked the code, and
>> > discovered it was about 2.5 times slower than 2.8.1. Then I compiled
>> > without optimization, and got code almost 3 times slower than 2.8.1,
>> > which, at least, shows that the optimization is working.
>>
>> My numbers are nothing like yours. What they said pretty unambiguously is
>> that the optimization is not working. That it's madly pessimizing. This is
>> with 150000 where you have 15000.
>>
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 113
>> Time: 204
>> Time: 182
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 116
>> Time: 207
>> Time: 180
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 122
>> Time: 206
>> Time: 182
>>
>> I can only think you mixed up your outputs, because if it's twice as slow
>> under -optimise for me I have trouble imagining what confounding variables
>> would make it faster for you.
>>
>> Times for -optimise, 281 / 29 / trunk
>> Time: 118
>> Time: 380
>> Time: 353
>> Times for -optimise, 281 / 29 / trunk
>> Time: 127
>> Time: 382
>> Time: 347
>> Times for -optimise, 281 / 29 / trunk
>> Time: 116
>> Time: 379
>> Time: 353
>>
>> I'm on osx.
>>
>> % java -version
>> java version "1.6.0_22"
>> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3326)
>> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>
>
Mon, 2011-05-23, 06:37
#8
Re: Scala 2.9 slower?
I couldn't be bothered to run this thing for ages, but some 100 iterations turned out these stats (while loop and for loop compared):
// val testSet = Array.fill(50000)(Array.fill(1000)(util.Random.nextInt))
// 2.9.0 stats
scala benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 73ms/100 72ms 194mb/300 195mb while-test
ok 416ms/100 411ms 202mb/999 210mb for-test
scala -optimise benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 74ms/100 72ms 194mb/299 195mb while-test
ok 170ms/100 167ms 202mb/499 209mb for-test
// 2.8.1 stats
scala benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 74ms/100 72ms 202mb/300 202mb while-test
ok 383ms/100 378ms 202mb/900 202mb for-test
scala -optimise benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 74ms/100 72ms 198mb/300 199mb while-test
ok 387ms/100 379ms 199mb/901 199mb for-test
The benchmark is available at http://pastebin.com/Z6JM5EzE if anyone wants to try it out. Note: I wrote it only recently and it may contain a bug or two. I can't guarantee that the measurement process doesn't affect the results either.
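As an aside (an editorial sketch, not the pastebin code), a minimal warmup-then-measure harness along these lines could look like:

```scala
// Minimal benchmarking harness sketch: run the body `warmup` times untimed,
// then `runs` timed iterations, returning (average ms, minimum ms).
def time(warmup: Int, runs: Int)(body: => Unit): (Double, Double) = {
  var i = 0
  while (i < warmup) { body; i += 1 }  // untimed warmup iterations
  val samples = Array.fill(runs) {
    val t0 = System.nanoTime
    body
    (System.nanoTime - t0) / 1e6       // nanoseconds -> milliseconds
  }
  (samples.sum / runs, samples.min)
}
```

Reporting the minimum alongside the average helps separate steady-state cost from GC or JIT-compilation spikes.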
On 23 May 2011 06:19, Adam Jorgensen <adam.jorgensen.za@gmail.com> wrote:
Okay, so has anyone tried a long-running test then?
Something which will take an hour to complete?
On 23 May 2011 06:03, Josh Marcus <jmarcus@azavea.com> wrote:
> To add another data point, I get performance on Linux that are similar to
> Daniel's. The code runs more quickly when compiled with "-optimise" under
> 2.9.0, but significantly more slowly than 2.8.1.
> Running:
> -- 150000 instead of 15000 (like Paul)
> -- Linux (ubuntu 11.04)
> -- Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> Time for unoptimised 281: 212
> Time for optimised 281: 211
> Time for unoptimised 290: 406
> Time for optimized 290: 351
> --j
> On Sun, May 22, 2011 at 4:41 PM, Paul Phillips <paulp@improving.org> wrote:
>>
>> On 5/22/11 11:40 AM, Daniel Sobral wrote:
>> > I'm a bit concerned. After seeing that -optimize really does optimize
>> > for comprehensions on 2.9.0, I then benchmarked the code, and
>> > discovered it was about 2.5 times slower than 2.8.1. Then I compiled
>> > without optimization, and got code almost 3 times slower than 2.8.1,
>> > which, at least, shows that the optimization is working.
>>
>> My numbers are nothing like yours. What they said pretty unambiguously is
>> that the optimization is not working. That it's madly pessimizing. This is
>> with 150000 where you have 15000.
>>
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 113
>> Time: 204
>> Time: 182
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 116
>> Time: 207
>> Time: 180
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 122
>> Time: 206
>> Time: 182
>>
>> I can only think you mixed up your outputs, because if it's twice as slow
>> under -optimise for me I have trouble imagining what confounding variables
>> would make it faster for you.
>>
>> Times for -optimise, 281 / 29 / trunk
>> Time: 118
>> Time: 380
>> Time: 353
>> Times for -optimise, 281 / 29 / trunk
>> Time: 127
>> Time: 382
>> Time: 347
>> Times for -optimise, 281 / 29 / trunk
>> Time: 116
>> Time: 379
>> Time: 353
>>
>> I'm on osx.
>>
>> % java -version
>> java version "1.6.0_22"
>> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3326)
>> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>
>
Mon, 2011-05-23, 06:47
#9
Re: Scala 2.9 slower?
Bork! I forgot to use a for loop in the outer loop of the for-test. It didn't affect the results much anyway, just some 20ms extra. Updated benchmark at http://pastebin.com/wEfNtBkq.
On 23 May 2011 07:30, Trond Olsen <trond@steinbit.org> wrote:
I couldn't bother to run this thing for ages, but some 100 iterations turned out these stats (while and for-loop compared):
// val testSet = Array.fill(50000)(Array.fill(1000)(util.Random.nextInt))
// 2.9.0 stats
scala benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 73ms/100 72ms 194mb/300 195mb while-test
ok 416ms/100 411ms 202mb/999 210mb for-test
scala -optimise benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 74ms/100 72ms 194mb/299 195mb while-test
ok 170ms/100 167ms 202mb/499 209mb for-test
// 2.8.1 stats
scala benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 74ms/100 72ms 202mb/300 202mb while-test
ok 383ms/100 378ms 202mb/900 202mb for-test
scala -optimise benchmark.scala
avgtime/no mintime avgmem/no maxmem id
ok 74ms/100 72ms 198mb/300 199mb while-test
ok 387ms/100 379ms 199mb/901 199mb for-test
The benchmark is available at http://pastebin.com/Z6JM5EzE if anyone want to try it out. Note: I just recently wrote it and may contain a bug or two. Can't make any guarantees that the measurement process doesn't affect the result either.
On 23 May 2011 06:19, Adam Jorgensen <adam.jorgensen.za@gmail.com> wrote:
Okay, so has anyone tried a long-running test then?
Something which will take an hour to complete?
On 23 May 2011 06:03, Josh Marcus <jmarcus@azavea.com> wrote:
> To add another data point, I get performance on Linux that are similar to
> Daniel's. The code runs more quickly when compiled with "-optimise" under
> 2.9.0, but significantly more slowly than 2.8.1.
> Running:
> -- 150000 instead of 15000 (like Paul)
> -- Linux (ubuntu 11.04)
> -- Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
> Time for unoptimised 281: 212
> Time for optimised 281: 211
> Time for unoptimised 290: 406
> Time for optimized 290: 351
> --j
> On Sun, May 22, 2011 at 4:41 PM, Paul Phillips <paulp@improving.org> wrote:
>>
>> On 5/22/11 11:40 AM, Daniel Sobral wrote:
>> > I'm a bit concerned. After seeing that -optimize really does optimize
>> > for comprehensions on 2.9.0, I then benchmarked the code, and
>> > discovered it was about 2.5 times slower than 2.8.1. Then I compiled
>> > without optimization, and got code almost 3 times slower than 2.8.1,
>> > which, at least, shows that the optimization is working.
>>
>> My numbers are nothing like yours. What they said pretty unambiguously is
>> that the optimization is not working. That it's madly pessimizing. This is
>> with 150000 where you have 15000.
>>
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 113
>> Time: 204
>> Time: 182
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 116
>> Time: 207
>> Time: 180
>> Times for unoptimised, 281 / 29 / trunk
>> Time: 122
>> Time: 206
>> Time: 182
>>
>> I can only think you mixed up your outputs, because if it's twice as slow
>> under -optimise for me I have trouble imagining what confounding variables
>> would make it faster for you.
>>
>> Times for -optimise, 281 / 29 / trunk
>> Time: 118
>> Time: 380
>> Time: 353
>> Times for -optimise, 281 / 29 / trunk
>> Time: 127
>> Time: 382
>> Time: 347
>> Times for -optimise, 281 / 29 / trunk
>> Time: 116
>> Time: 379
>> Time: 353
>>
>> I'm on osx.
>>
>> % java -version
>> java version "1.6.0_22"
>> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3326)
>> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>
>
Mon, 2011-05-23, 14:17
#10
Re: Scala 2.9 slower?
Sorry for not having mentioned it, but I was running on Windows Vista
64 with 64-bit Java, no flags. I'm at work right now, so I can't get the
exact version of the JVM (I pasted it on IRC at one time, in case
anyone keeps a window logging everything there). I was definitely
seeing -optimize yield better results. I suspected ByOne as well, and
so also tried using "by 2" inside OptEx. It helped, but did not
overcome the difference.
On Sun, May 22, 2011 at 15:40, Daniel Sobral wrote:
> I'm a bit concerned. After seeing that -optimize really does optimize
> for comprehensions on 2.9.0, I then benchmarked the code, and
> discovered it was about 2.5 times slower than 2.8.1. Then I compiled
> without optimization, and got code almost 3 times slower than 2.8.1,
> which, at least, shows that the optimization is working.
>
> The code I used to benchmark is below. I tried some variations some
> which improved 2.9.0 numbers, but none that made it as fast as 2.8.1.
> I know this is pure micro-benchmark, and doesn't mean much in a larger
> codebase, but _something_ seems to have gone the wrong way.
>
> object OptTest {
> def main(args: Array[String]) {
> println("Time: "+bench)
> }
>
> def bench: Long = {
> val testSet = Array.fill(15000)(Array.fill(1000)(util.Random.nextInt))
> val optex = new OptEx
>
> // Warm-up
> var result = 0
> for (array <- testSet) result += optex.sum(array)
>
> // benchmark
> val start = java.lang.System.currentTimeMillis
> for (array <- testSet) result += optex.sum(array)
> val stop = java.lang.System.currentTimeMillis
> println("Ensure nothing gets optimized away: "+result)
>
> stop - start
> }
>
> }
>
> class OptEx {
> def sum(l: Array[Int]) = {
> var acc = 0
> for (i <- 0 until l.length) acc += l(i)
> acc
> }
> }
>
> --
> Daniel C. Sobral
>
> I travel to the future all the time.
>
Mon, 2011-05-23, 19:47
#11
2.8 vs. 2.9: mr. microbenchmark returns
These numbers may only serve to confuse things further, but they do nicely illustrate that numbers which aren't accompanied by at least all the factors covered here, plus the platform, jvm version, and who knows what else, are useless. Also, I took additional measures to stabilize the numbers and they still vary by nearly a factor of two in consecutive calls to the same method. A spread like this:
Times: 24.401 / 41.13 / 21.543
doesn't fill me with confidence that I have a clue what is really happening. Sometimes I think all we ever measure are hotspot quirks. Maybe there's some major flaw in the code (see below).
The difference with 32-bit is so big I'm thinking something is being emulated in the osx jvm.
-client -d32 gets nearly 3x faster under -optimise.
-client -d64 gets nearly 2x slower under -optimise.
-server gets about 1.5x slower under -optimise.
The numbers are basically stable across multiple runs, including the weird example above: here are a couple more of those from different runs.
Times: 26.103 / 41.483 / 21.169
Times: 23.68 / 42.071 / 21.27
All of them:
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/281/bin/scala -cp ./target/281/client Test (compiled with )
Times: 147.247 / 146.642 / 150.934
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/281/bin/scala -cp ./target/281/client-opt Test (compiled with -optimise)
Times: 146.652 / 146.314 / 150.41
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/281/bin/scala -cp ./target/281/client64 Test (compiled with )
Times: 23.413 / 21.758 / 20.948
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/281/bin/scala -cp ./target/281/client64-opt Test (compiled with -optimise)
Times: 24.401 / 41.13 / 21.543
JAVA_OPTS="-Xmx1024m -server" ./scalas/281/bin/scala -cp ./target/281/server Test (compiled with )
Times: 24.435 / 25.972 / 22.151
JAVA_OPTS="-Xmx1024m -server" ./scalas/281/bin/scala -cp ./target/281/server-opt Test (compiled with -optimise)
Times: 24.427 / 21.995 / 22.372
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/29/bin/scala -cp ./target/29/client Test (compiled with )
Times: 169.916 / 186.567 / 187.333
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/29/bin/scala -cp ./target/29/client-opt Test (compiled with -optimise)
Times: 72.994 / 90.613 / 78.926
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/29/bin/scala -cp ./target/29/client64 Test (compiled with )
Times: 39.039 / 34.789 / 34.964
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/29/bin/scala -cp ./target/29/client64-opt Test (compiled with -optimise)
Times: 67.416 / 52.48 / 52.525
JAVA_OPTS="-Xmx1024m -server" ./scalas/29/bin/scala -cp ./target/29/server Test (compiled with )
Times: 42.018 / 41.98 / 33.762
JAVA_OPTS="-Xmx1024m -server" ./scalas/29/bin/scala -cp ./target/29/server-opt Test (compiled with -optimise)
Times: 70.827 / 76.681 / 69.961
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/trunk/bin/scala -cp ./target/trunk/client Test (compiled with )
Times: 171.41 / 179.602 / 171.946
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/trunk/bin/scala -cp ./target/trunk/client-opt Test (compiled with -optimise)
Times: 69.005 / 68.011 / 67.853
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/trunk/bin/scala -cp ./target/trunk/client64 Test (compiled with )
Times: 34.801 / 35.017 / 51.299
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/trunk/bin/scala -cp ./target/trunk/client64-opt Test (compiled with -optimise)
Times: 62.01 / 67.581 / 50.296
JAVA_OPTS="-Xmx1024m -server" ./scalas/trunk/bin/scala -cp ./target/trunk/server Test (compiled with )
Times: 44.47 / 32.772 / 32.562
JAVA_OPTS="-Xmx1024m -server" ./scalas/trunk/bin/scala -cp ./target/trunk/server-opt Test (compiled with -optimise)
Times: 67.066 / 67.597 / 50.858
Numbers collected with the following code; the spirit is the same as the original.
object Test {
  def main(args: Array[String]) {
    val s1 = if (args.length > 0) args(0).toInt else 15000
    val s2 = if (args.length > 1) args(1).toInt else 2000

    // warmup
    bench(100, 1000)
    bench(100, 1000)
    bench(100, 1000)
    bench(100, 1000)
    bench(100, 1000)

    val t1 = bench(s1, s2)
    val t2 = bench(s1, s2)
    val t3 = bench(s1, s2)
    println("Times: " + t1 + " / " + t2 + " / " + t3)
  }

  def bench(s1: Int, s2: Int): Double = {
    val testSet = new Array[Array[Int]](s1)
    var i = 0
    while (i < s1) {
      val arr = new Array[Int](s2)
      testSet(i) = arr
      var j = 0
      while (j < s2) {
        arr(j) = util.Random.nextInt()
        j += 1
      }
      i += 1
    }
    val optex = new OptEx

    // Warm-up
    var result = 0
    for (array <- testSet) result += optex.sum(array)

    // benchmark
    val start = System.nanoTime
    for (array <- testSet) result += optex.sum(array)
    println("Ensure nothing gets optimized away: " + result)
    val stop = System.nanoTime
    (stop - start).toDouble / 1000000
  }
}

class OptEx {
  def sum(l: Array[Int]): Int = {
    var acc = 0
    for (i <- 0 until l.length)
      acc += l(i)
    acc
  }
}
Mon, 2011-05-23, 20:17
#12
Re: 2.8 vs. 2.9: mr. microbenchmark returns
Your benchmark is too small. 20 ms is pretty much nothing--almost certainly a garbage collection or a compilation or something. You can catch those with -XX:+PrintCompilation -verbose:gc but the solution is really to make the test harder. You should loop over your arrays several times (e.g. 10).
I usually run tests with this level of inner-loop complexity at least a billion times (unless it proves arduously slow). The inner loop here is only run 30M times by default.
Also, util.Random.nextInt() is an awfully expensive thing to do for a test like this; it's ~30x slower than a simple addition. This means that you're spending almost all your time creating the test and almost none of it testing.
--Rex
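The setup-cost point can be sidestepped with cheap deterministic test data; a hypothetical helper (makeTestSet is my name, not from the thread) might be:

```scala
// Fill the test set with a cheap, reproducible pattern instead of
// util.Random.nextInt, so setup does not dominate the run.
def makeTestSet(outer: Int, inner: Int): Array[Array[Int]] =
  Array.tabulate(outer, inner)((i, j) => i * 31 + j)
```

Deterministic data also makes runs directly comparable across compiler versions, since every run sums exactly the same values.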
On Mon, May 23, 2011 at 2:46 PM, Paul Phillips <paulp@improving.org> wrote:
These numbers may only serve to confuse things further, but they do nicely illustrate that numbers which aren't accompanied by at least all the factors covered here, plus the platform, jvm version, and who knows what else, are useless. Also, I took additional measures to stabilize the numbers and they still vary by nearly a factor of two in consecutive calls to the same method. A spread like this:
Times: 24.401 / 41.13 / 21.543
doesn't fill me with confidence I have a clue what is really happening. Sometimes I think all we ever measure are hotspot quirks. Maybe there's some major flaw in the code (see below.)
The difference with 32-bit is so big I'm thinking something is being emulated in the osx jvm.
-client -d32 gets nearly 3x faster under -optimise.
-client -d64 gets nearly 2x slower under -optimise.
-server gets about 1.5x slower under -optimise.
The numbers are basically stable across multiple runs, including the weird example above: here are a couple more of those from different runs.
Times: 26.103 / 41.483 / 21.169
Times: 23.68 / 42.071 / 21.27
All of them:
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/281/bin/scala -cp ./target/281/client Test (compiled with )
Times: 147.247 / 146.642 / 150.934
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/281/bin/scala -cp ./target/281/client-opt Test (compiled with -optimise)
Times: 146.652 / 146.314 / 150.41
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/281/bin/scala -cp ./target/281/client64 Test (compiled with )
Times: 23.413 / 21.758 / 20.948
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/281/bin/scala -cp ./target/281/client64-opt Test (compiled with -optimise)
Times: 24.401 / 41.13 / 21.543
JAVA_OPTS="-Xmx1024m -server" ./scalas/281/bin/scala -cp ./target/281/server Test (compiled with )
Times: 24.435 / 25.972 / 22.151
JAVA_OPTS="-Xmx1024m -server" ./scalas/281/bin/scala -cp ./target/281/server-opt Test (compiled with -optimise)
Times: 24.427 / 21.995 / 22.372
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/29/bin/scala -cp ./target/29/client Test (compiled with )
Times: 169.916 / 186.567 / 187.333
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/29/bin/scala -cp ./target/29/client-opt Test (compiled with -optimise)
Times: 72.994 / 90.613 / 78.926
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/29/bin/scala -cp ./target/29/client64 Test (compiled with )
Times: 39.039 / 34.789 / 34.964
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/29/bin/scala -cp ./target/29/client64-opt Test (compiled with -optimise)
Times: 67.416 / 52.48 / 52.525
JAVA_OPTS="-Xmx1024m -server" ./scalas/29/bin/scala -cp ./target/29/server Test (compiled with )
Times: 42.018 / 41.98 / 33.762
JAVA_OPTS="-Xmx1024m -server" ./scalas/29/bin/scala -cp ./target/29/server-opt Test (compiled with -optimise)
Times: 70.827 / 76.681 / 69.961
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/trunk/bin/scala -cp ./target/trunk/client Test (compiled with )
Times: 171.41 / 179.602 / 171.946
JAVA_OPTS="-Xmx1024m -client -d32" ./scalas/trunk/bin/scala -cp ./target/trunk/client-opt Test (compiled with -optimise)
Times: 69.005 / 68.011 / 67.853
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/trunk/bin/scala -cp ./target/trunk/client64 Test (compiled with )
Times: 34.801 / 35.017 / 51.299
JAVA_OPTS="-Xmx1024m -client -d64" ./scalas/trunk/bin/scala -cp ./target/trunk/client64-opt Test (compiled with -optimise)
Times: 62.01 / 67.581 / 50.296
JAVA_OPTS="-Xmx1024m -server" ./scalas/trunk/bin/scala -cp ./target/trunk/server Test (compiled with )
Times: 44.47 / 32.772 / 32.562
JAVA_OPTS="-Xmx1024m -server" ./scalas/trunk/bin/scala -cp ./target/trunk/server-opt Test (compiled with -optimise)
Times: 67.066 / 67.597 / 50.858
Numbers collected with the following code; the spirit is the same as the original.
object Test {
  def main(args: Array[String]) {
    val s1 = if (args.length > 0) args(0).toInt else 15000
    val s2 = if (args.length > 1) args(1).toInt else 2000

    // warmup
    bench(100, 1000)
    bench(100, 1000)
    bench(100, 1000)
    bench(100, 1000)
    bench(100, 1000)

    val t1 = bench(s1, s2)
    val t2 = bench(s1, s2)
    val t3 = bench(s1, s2)
    println("Times: " + t1 + " / " + t2 + " / " + t3)
  }

  def bench(s1: Int, s2: Int): Double = {
    val testSet = new Array[Array[Int]](s1)
    var i = 0
    while (i < s1) {
      val arr = new Array[Int](s2)
      testSet(i) = arr
      var j = 0
      while (j < s2) {
        arr(j) = util.Random.nextInt()
        j += 1
      }
      i += 1
    }
    val optex = new OptEx

    // Warm-up
    var result = 0
    for (array <- testSet) result += optex.sum(array)

    // benchmark
    val start = System.nanoTime
    for (array <- testSet) result += optex.sum(array)
    println("Ensure nothing gets optimized away: " + result)
    val stop = System.nanoTime
    (stop - start).toDouble / 1000000
  }
}

class OptEx {
  def sum(l: Array[Int]): Int = {
    var acc = 0
    for (i <- 0 until l.length)
      acc += l(i)
    acc
  }
}
Mon, 2011-05-23, 22:37
#13
Re: 2.8 vs. 2.9: mr. microbenchmark returns
On 23 May 2011 11:46, Paul Phillips wrote:
:
:
> object Test {
> def main(args: Array[String]) {
> val s1 = if (args.length > 0) args(0).toInt else 15000
> val s2 = if (args.length > 1) args(1).toInt else 2000
>
> // warmup
> bench(100, 1000)
> bench(100, 1000)
> bench(100, 1000)
> bench(100, 1000)
> bench(100, 1000)
Are you sure this is enough to warm up the JVM? I would just run it a
few hundred times to be sure.
> val t1 = bench(s1, s2)
> val t2 = bench(s1, s2)
> val t3 = bench(s1, s2)
This has no statistical significance. Why don't you run it a few
hundred times and determine the average and standard deviation? Then
you can create a 95% (or 99%) confidence interval and make a
statistically meaningful statement about your scenario.
> println("Times: " + t1 + " / " + t2 + " / " + t3)
> }
>
> def bench(s1: Int, s2: Int): Double = {
:
:
> // benchmark
> val start = System.nanoTime
> for (array <- testSet) result += optex.sum(array)
> println("Ensure nothing gets optimized away: "+result)
> val stop = System.nanoTime
Why do you include the println statement in your test? That introduces
all kinds of IO, kernel and thread interactions to your test. I would
swap the last two statements so you only measure the for loop.
> (stop - start).toDouble / 1000000
> }
> }
:
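The suggestion above (run the benchmark many times and report a confidence interval rather than three raw numbers) can be sketched as follows. The `bench` stub here is a hypothetical stand-in for the real benchmark:

```scala
object BenchStats {
  // Hypothetical stand-in for the real benchmark run (returns elapsed ms).
  def bench(): Double = 42.0

  // Mean and 95% confidence-interval half-width (z = 1.96) over samples.
  def meanAndCI(samples: Seq[Double], z: Double = 1.96): (Double, Double) = {
    val n = samples.length
    val mean = samples.sum / n
    // Sample variance: n - 1 in the denominator.
    val variance = samples.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    (mean, z * math.sqrt(variance / n))
  }

  def main(args: Array[String]): Unit = {
    val runs = Seq.fill(200)(bench())
    val (mean, hw) = meanAndCI(runs)
    println(f"mean = $mean%.2f ms, 95%% CI = ${mean - hw}%.2f .. ${mean + hw}%.2f")
  }
}
```

With a few hundred samples this lets you say whether 2.8.1 and 2.9.0 intervals actually overlap, instead of eyeballing three runs.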
Mon, 2011-05-23, 22:57
#14
Re: 2.8 vs. 2.9: mr. microbenchmark returns
I was kicking someone else's can here, it's really not my scenario. As I
have often noted, measuring performance is not my thing: all signs point
toward this being a fine opportunity for someone for whom that is their
thing to make a contribution by offering concrete recommendations. (By
which I mean: recommendations about the finer points of benchmark
writing are less desirable than recommendations about why 2.9 is
whatever amount slower than 2.8 and what might be done about it.)
Tue, 2011-05-24, 01:47
#15
Re: 2.8 vs. 2.9: mr. microbenchmark returns
On Mon, May 23, 2011 at 18:29, Hilco Wijbenga wrote:
> On 23 May 2011 11:46, Paul Phillips wrote:
> :
>
> :
>> object Test {
>> def main(args: Array[String]) {
>> val s1 = if (args.length > 0) args(0).toInt else 15000
>> val s2 = if (args.length > 1) args(1).toInt else 2000
>>
>> // warmup
>> bench(100, 1000)
>> bench(100, 1000)
>> bench(100, 1000)
>> bench(100, 1000)
>> bench(100, 1000)
>
> Are you sure this is enough to warm up the JVM? I would just run it a
> few hundred times to be sure.
Note that OptEx is run 15000 times each time bench is called.
However, unless you can show the observed slowdown from 2.8.1 to 2.9.0
is solely due to inadequacies of the benchmark code -- by showing
benchmark code that measures the same thing and shows no such slowdown
-- the concern stands.
Tue, 2011-05-24, 11:37
#16
Re: 2.8 vs. 2.9: mr. microbenchmark returns
On Mon, May 23, 2011 at 8:16 PM, Rex Kerr wrote:
> Also, util.Random.nextInt() is an awfully expensive thing to do for a test
> like this; it's ~30x slower than a simple addition. This means that you're
> spending almost all your time creating the test and almost none of it
> testing.
It's worth making it clear that Random.nextInt is not timed, so it's a
minor issue. A major issue with that benchmark is that println was
called before stop was measured (as was mentioned in another message).
Those lines should be swapped. Adding System.gc before timings is also
a good idea to reduce noise.
Best,
Ismael
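A sketch of the timed section with both of those fixes applied: `System.gc()` runs before the clock starts, and `stop` is read before any I/O. Names here are illustrative, not from the thread:

```scala
object FixedBench {
  def sum(l: Array[Int]): Int = {
    var acc = 0
    var i = 0
    while (i < l.length) { acc += l(i); i += 1 }
    acc
  }

  // Returns elapsed milliseconds for summing every array in testSet.
  def time(testSet: Array[Array[Int]]): Double = {
    var result = 0
    System.gc()                 // reduce GC noise inside the timed region
    val start = System.nanoTime
    for (array <- testSet) result += sum(array)
    val stop = System.nanoTime  // read the clock before any I/O
    println("Ensure nothing gets optimized away: " + result)
    (stop - start).toDouble / 1e6
  }
}
```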
Tue, 2011-05-24, 11:47
#17
Re: 2.8 vs. 2.9: mr. microbenchmark returns
There's a whole Java microbenchmarking framework available from Google:
http://code.google.com/p/caliper/
Perhaps it would make sense to setup an instance of that tool for
Scala (however, it seems to use AppEngine, so it's probably not
particularly light-weight).
On Tue, May 24, 2011 at 12:12 PM, Ismael Juma wrote:
> On Mon, May 23, 2011 at 8:16 PM, Rex Kerr wrote:
>> Also, util.Random.nextInt() is an awfully expensive thing to do for a test
>> like this; it's ~30x slower than a simple addition. This means that you're
>> spending almost all your time creating the test and almost none of it
>> testing.
>
> It's worth making it clear that Random.nextInt is not timed, so it's a
> minor issue. A major issue with that benchmark is that println was
> called before stop was measured (as was mentioned in another message).
> Those lines should be swapped. Adding System.gc before timings is also
> a good idea to reduce noise.
>
> Best,
> Ismael
>
Tue, 2011-05-24, 12:27
#18
Re: 2.8 vs. 2.9: mr. microbenchmark returns
On Tue, May 24, 2011 at 12:32 PM, Johannes Rudolph
wrote:
> There's a whole Java microbenchmarking framework available from Google:
>
> http://code.google.com/p/caliper/
>
> Perhaps it would make sense to setup an instance of that tool for
> Scala (however, it seems to use AppEngine, so it's probably not
> particularly light-weight).
Stupid me: Mathias already got that ready to use for Scala, so it
works without AppEngine as well, of course:
Tue, 2011-05-24, 22:07
#19
Re: 2.8 vs. 2.9: mr. microbenchmark returns
On Tue, May 24, 2011 at 12:12 PM, Ismael Juma <ismael@juma.me.uk> wrote:
> minor issue. A major issue with that benchmark is that println was
> called before stop was measured (as was mentioned in another message).
> Those lines should be swapped.
And stop should probably be included in the println too. Otherwise some optimizing entity between the scala code and the metal is allowed to reorder the operations unless I'm mistaken.
Cheers,
Daniel
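That ordering concern can be addressed by printing the timestamp itself. A minimal sketch (illustrative names, not the thread's code):

```scala
object OrderedTiming {
  // Returns (sum, elapsed ms). Printing both `acc` and `stop` makes the
  // timestamp part of the observable output, so the read of `stop` cannot
  // be reordered past the timed region or eliminated as dead code.
  def timedSum(data: Array[Int]): (Int, Double) = {
    val start = System.nanoTime
    var acc = 0
    var i = 0
    while (i < data.length) { acc += data(i); i += 1 }
    val stop = System.nanoTime
    println(s"result=$acc stop=$stop")
    (acc, (stop - start) / 1e6)
  }
}
```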
Tue, 2011-06-07, 18:57
#20
Re: 2.8 vs. 2.9: mr. microbenchmark returns
On May 24, 3:32 am, Johannes Rudolph
wrote:
> There's a whole Java microbenchmarking framework available from Google:
>
> http://code.google.com/p/caliper/
Also - JavaStats 'Statistically Rigorous Java Performance Evaluation'
http://www.elis.ugent.be/JavaStats
"JavaStats is a tool, written in Python that allows to run Java
benchmarks automatically until a sufficiently narrow confidence
interval is obtained for a given performance metric."
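The JavaStats idea of sampling until the confidence interval is sufficiently narrow can be sketched as a loop; `run` is a caller-supplied thunk and all names and thresholds here are assumptions for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

object RunUntilStable {
  // Sample `run` until the 95% CI half-width falls within `relHalfWidth`
  // of the mean, or `maxRuns` is reached.
  def sample(run: () => Double,
             minRuns: Int = 30,
             maxRuns: Int = 300,
             relHalfWidth: Double = 0.02): Seq[Double] = {
    val samples = ArrayBuffer[Double]()
    while (samples.length < maxRuns) {
      samples += run()
      val n = samples.length
      if (n >= minRuns) {
        val mean = samples.sum / n
        val sd = math.sqrt(samples.map(x => (x - mean) * (x - mean)).sum / (n - 1))
        val hw = 1.96 * sd / math.sqrt(n)  // 95% CI half-width
        if (hw <= relHalfWidth * mean) return samples.toSeq
      }
    }
    samples.toSeq
  }
}
```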
On 5/22/11 11:40 AM, Daniel Sobral wrote:
> I'm a bit concerned. After seeing that -optimize really does optimize
> for comprehensions on 2.9.0, I then benchmarked the code, and
> discovered it was about 2.5 times slower than 2.8.1. Then I compiled
> without optimization, and got code almost 3 times slower than 2.8.1,
> which, at least, shows that the optimization is working.
My numbers are nothing like yours. What they said pretty unambiguously is that the optimization is not working. That it's madly pessimizing. This is with 150000 where you have 15000.
Times for unoptimised, 281 / 29 / trunk
Time: 113
Time: 204
Time: 182
Times for unoptimised, 281 / 29 / trunk
Time: 116
Time: 207
Time: 180
Times for unoptimised, 281 / 29 / trunk
Time: 122
Time: 206
Time: 182
I can only think you mixed up your outputs, because if it's twice as slow under -optimise for me I have trouble imagining what confounding variables would make it faster for you.
Times for -optimise, 281 / 29 / trunk
Time: 118
Time: 380
Time: 353
Times for -optimise, 281 / 29 / trunk
Time: 127
Time: 382
Time: 347
Times for -optimise, 281 / 29 / trunk
Time: 116
Time: 379
Time: 353
I'm on osx.
% java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3326)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)