
2.8+ and while versus for-comprehensions

22 replies
Silvio Bierman
Joined: 2009-02-16,
User offline. Last seen 1 year 16 weeks ago.

Hello all,

I have read a lot of posts lately about the speed gained by replacing
for (i <- low until high) constructs with their plain-and-ugly but much
faster while-loop equivalents.

Today I rewrote a genetic algorithm from Java to Scala. I purposely used
for-comprehensions instead of while loops to see how big the difference
would be. Because most loops in the algorithm are pair-wise operations on
two arrays, foreach is not an efficient alternative.

As could be expected, the Scala version was about 4-5 times slower than the
Java version. Then I replaced the for-comprehensions with equivalent while
loops and that brought the Scala version within 2-3% of the Java version.
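To make the comparison concrete, here is a minimal sketch (hypothetical code, not Silvio's actual algorithm) of the kind of pair-wise array loop being described, first as a for-comprehension and then as the hand-written while equivalent:

```scala
object LoopRewrite {
  // for-comprehension version: in Scala 2.8 this desugars to
  // (0 until a.length).foreach { i => ... }, allocating a Range
  // object and a closure.
  def pairwiseSumFor(a: Array[Int], b: Array[Int]): Array[Int] = {
    val out = new Array[Int](a.length)
    for (i <- 0 until a.length) out(i) = a(i) + b(i)
    out
  }

  // while version: a plain integer counter, no extra allocations.
  def pairwiseSumWhile(a: Array[Int], b: Array[Int]): Array[Int] = {
    val out = new Array[Int](a.length)
    var i = 0
    while (i < a.length) {
      out(i) = a(i) + b(i)
      i += 1
    }
    out
  }
}
```

Both produce the same result; the difference is only in the byte-code the compiler emits for the loop.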

Although I understand what causes this, I am curious whether future Scala
versions might be able to close this gap. The while-based code is much less
readable than the for-version, and arguably even less pretty than the original
Java version, which is in total contrast with all my previous experiences
rewriting Java stuff in Scala.

Any chance future Scala versions may be able to specialize plain integer
versions of for-comprehensions?

David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions


On Tue, Feb 9, 2010 at 3:09 PM, Silvio Bierman <sbierman@jambo-software.com> wrote:

Hello all,

I have read a lot of posts lately about the speed gained by replacing
for (i <- low until high) constructs with their plain-and-ugly but much
faster while-loop equivalents.

Today I rewrote a genetic algorithm from Java to Scala. I purposely used
for-comprehensions instead of while loops to see how big the difference
would be. Because most loops in the algorithm are pair-wise operations on
two arrays, foreach is not an efficient alternative.

As could be expected, the Scala version was about 4-5 times slower than the
Java version. Then I replaced the for-comprehensions with equivalent while
loops and that brought the Scala version within 2-3% of the Java version.

Although I understand what causes this, I am curious whether future Scala
versions might be able to close this gap.

I for one would prefer to see the syntactic difference remain.

A bit of background, I wrote the fastest spreadsheet in the world... and it ran on the JVM (how? multi-threaded recalculation, and the dependencies that enable multi-threaded calculation are non-trivial to calculate in O(n log n) time using O(n) space).  So, I've got some experience writing high performance JVM code.

I think it's sub-optimal for high level constructs and performance oriented constructs to share syntax.  Why?  Because it should be very, very obvious to your team and the folks maintaining your code that a particular piece of code is performance sensitive.  Having while-looped, imperative code is a very, very stark indicator that code is performance oriented.  But...

It's been my experience, even with Java code, that my very high performance code (the inner most loop), looks a lot like C code anyway.  When I'm pre-allocating buffers, making sure I'm not entering any monitors, only calling final methods on concrete classes (rather than interfaces/traits), etc., the difference between a discrete while {} construct vs. a for loop (not a Scala for comprehension) matters little.

So, I'd like to see things stay the way they are or have an additional syntactic construct that clearly communicates "I mean this to be high performance, close to the metal byte-code" to every developer who touches it.


 
The while-based code is much less readable
than the for-version, and arguably even less pretty than the original Java
version, which is in total contrast with all my previous experiences rewriting
Java stuff in Scala.

Any chance future Scala versions may be able to specialize plain integer
versions of for-comprehensions?


--
View this message in context: http://old.nabble.com/2.8%2B-and-while-versus-for-comprehensions-tp27523757p27523757.html
Sent from the Scala - User mailing list archive at Nabble.com.




--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
ewilligers
Joined: 2008-08-20,
User offline. Last seen 3 years 17 weeks ago.
Re: 2.8+ and while versus for-comprehensions

Silvio Bierman wrote:
> Any chance future Scala versions may be able to specialize plain integer
> versions of for-comprehensions?

+1

Option 1: the compiler could special case for (x <- a to b)

Option 2: the compiler could avoid allocating anonymous function objects
when passing by-name arguments to inlined functions. Annotate Range
map and foreach as @inline.

I think option 2 is more consistent with Scala's philosophy, is
approximately the same amount of work, and side-steps the "we don't want
to complicate the compiler with lots of special cases" argument.
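As a sketch of what the two options amount to (hypothetical code): today `for (x <- a to b) body` goes through a Range object and a Function1, while option 2 would, after inlining, reduce to a counted loop:

```scala
object ForDesugaring {
  // What the compiler produces today: a Range object plus a closure
  // passed to its foreach method.
  def asCompiledToday(a: Int, b: Int)(body: Int => Unit): Unit =
    (a to b).foreach(body)

  // What option 2 would effectively yield once Range.foreach is
  // @inline'd and the closure allocation is optimised away.
  def afterInlining(a: Int, b: Int)(body: Int => Unit): Unit = {
    var i = a
    while (i <= b) {
      body(i)
      i += 1
    }
  }
}
```

The two are observably equivalent; the question in this thread is only which one the compiler emits.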

Silvio Bierman
Joined: 2009-02-16,
User offline. Last seen 1 year 16 weeks ago.
Re: 2.8+ and while versus for-comprehensions

Although most code I and my team write is not very performance-focused, I
disagree with this. One of the big attractions of Scala to me is the fact
that it combines the features of a high-abstraction-level language with
performance similar to Java's (which I consider about as fast as generally
possible, even compared with non-JVM code).

Execution speed is not of main importance only in tight-loop situations; in
my opinion that is a very widespread misconception. Only code that
involves really slow operations (usually some form of IO or thread-incurred
wait-states) is relatively insensitive to raw language execution speed.

Using multiple layers of abstraction can cause even tiny actions at higher
levels to execute enormous amounts of code in lower abstraction layers. On
server-based systems, where a single process may be serving thousands of
concurrent users (which is the area I mainly code in), execution speed is of
paramount importance.
We experimented with Groovy and found that a Groovy code base
structured quite similarly to its Java equivalent was able to handle only
about 10% of the concurrent users the Java version could carry. The only
thing resembling a tight loop in there was a piece of code that parsed an XML
DOM into a linear executable (actually interpretable) structure which was
shared across multiple users. The rest was mainly HTML generation, albeit
very complex HTML.

For loops express what the code does a lot more explicitly than while loops
and are less error-prone (no forgotten or misplaced increments), which is
why I prefer them. In Java they are just as fast; in Scala they are not.
That is a pity, any way you look at it. Additionally, most
for-comprehensions in my Scala code are already in imperative code involving
arrays; otherwise I would probably use foreach/map/etc. So I consider them
low-level most of the time, even if they are very powerful and high-level in
general.

I am not asking to degrade for-comprehensions to allow a faster
implementation. I am just asking for the compiler to recognize the pattern
and generate the while code in cases where it is equivalent.

Silvio Bierman
Joined: 2009-02-16,
User offline. Last seen 1 year 16 weeks ago.
Re: 2.8+ and while versus for-comprehensions

And please please please WITHOUT the use of annotations!!!

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: 2.8+ and while versus for-comprehensions

On Tue, 2010-02-09 at 15:52 -0800, Silvio Bierman wrote:
> Execution speed is not only of main importance in tight loop situations. In
> my opinion this is a very widely spread misconception. Only code that
> involves really slow operations (usually some form of IO or thread incurred
> wait-states) is relatively insensitive to raw language execution speed.

Out of curiosity, do you have any profiling information that shows Scala
for comprehensions as a hot spot in your server-side application?

If execution speed were so important in the cases where Scala does badly
(which is not every use of a for comprehension!), everyone would be using
the Trove primitive collections or arrays instead of Java collections when
dealing with numbers (after all, these get boxed in Java too), and people
would be very careful about using reflection.

Now, there are certainly cases where this is warranted, but for your
typical server-side application, it is generally a few spots that can be
found with a profiler.

Best,
Ismael

Jonathan Shore
Joined: 2009-04-10,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions

On Feb 9, 2010, at 6:21 PM, David Pollak wrote:




I think it's sub-optimal for high level constructs and performance oriented constructs to share syntax.  Why?  Because it should be very, very obvious to your team and the folks maintaining your code that a particular piece of code is performance sensitive.  Having while-looped, imperative code is a very, very stark indicator that code is performance oriented.  But...


Aside from the elegance of FP, one of the main reasons for someone to use FP is to be able to express ideas concisely and increase productivity.    Forcing one to use something *worse* (i.e. while loops rather than, say, Java-style for loops or, better, for-comprehensions) does not make sense or win converts.
I think what you may be alluding to is the concern that people will use constructs without understanding their performance characteristics, or the scenarios under which they can be optimised.    Anyone who writes performance-sensitive applications quickly becomes aware of these issues.

It's been my experience, even with Java code, that my very high performance code (the inner most loop), looks a lot like C code anyway.  When I'm pre-allocating buffers, making sure I'm not entering any monitors, only calling final methods on concrete classes (rather than interfaces/traits), etc., the difference between a discrete while {} construct vs. a for loop (not a Scala for comprehension) matters little.


The problem is that there *is no* for-loop equivalent in Scala.    While loops introduce much more verbose code, reduce readability, and lead to more coding errors.

So, I'd like to see things stay the way they are or have an additional syntactic construct that clearly communicates "I mean this to be high performance, close to the metal byte-code" to every developer who touches it.


It would be a shame to have to add in the Java "for" construct when optimising cases of the for comprehension is surely doable.   I would lean towards optimising the latter, but one would certainly get mileage out of the former.
Jonathan
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: 2.8+ and while versus for-comprehensions
On Tue, Feb 9, 2010 at 6:52 PM, Silvio Bierman <sbierman@jambo-software.com> wrote:

For loops express what the code does a lot more explicitly than while loops
and are less error prone (forgetting or misplacing the increments), which is
why I prefer them. In Java they are just as fast, in Scala they are not.
That is a pity, any way you look at it.

Agreed, it is a pity.

I wonder, though, how much of the overhead is boxing and how much is the function call.  I would expect about 5x more overhead from object creation than a non-inlined function call, so for longer runs I'd expect @specialize to make for-comprehensions competitive in many more situations.  For short loops, the initial range object creation will probably dominate, but otherwise the non-inlining is probably less of a problem than the boxing.
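To separate the two costs being weighed here, a small sketch (hypothetical names) of how @specialized is meant to remove the boxing while the call through the abstraction remains:

```scala
// @specialized(Int) asks the compiler to also generate an
// apply(Int): Int method, so calling through the trait with an Int
// does not box it into a java.lang.Integer; the (virtual) call
// itself still happens unless the method is additionally inlined.
trait Step[@specialized(Int) A] {
  def apply(x: A): A
}

object SpecializationSketch {
  val increment: Step[Int] = new Step[Int] {
    def apply(x: Int): Int = x + 1
  }

  // A tight loop over the abstraction: with specialization the
  // remaining per-iteration overhead is the call, not allocation.
  def iterate(f: Step[Int], start: Int, times: Int): Int = {
    var acc = start
    var i = 0
    while (i < times) {
      acc = f(acc)
      i += 1
    }
    acc
  }
}
```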

Anyway, although I agree with David that the while construct is a nice reminder that you're dealing with high-performance code, having to work so hard to write high performance code is frustrating given that there is no particularly good reason for it (except possibly to make it so awkward to write high-performance imperative code that it encourages people to write immutable, functional code which they normally should be doing anyway).

  --Rex

Silvio Bierman
Joined: 2009-02-16,
User offline. Last seen 1 year 16 weeks ago.
Re: 2.8+ and while versus for-comprehensions

Ismael Juma wrote:
>
> Out of curiosity, do you have any profiling information that shows Scala
> for comprehensions as a hot spot in your server-side application?
>

No profiling info, just typical problem-solving cases (with fixed random
seeds, so exactly the same execution path) that take 140s with
for-comprehensions and 24s with equivalent while loops.
This is a genetic algorithm and therefore so little code that replacing the
loops is a two-minute job. Since the for-comprehensions were the obvious
suspects, that is what I tried first.

Ismael Juma wrote:
>
> If execution speed is so important in the cases that Scala does badly
> (which is not any for comprehension!), everyone would be using the Trove
> primitive collections or arrays instead of Java collections when dealing
> with numbers (after all these get boxed in Java too) and people would be
> very careful about using reflection.
>

As I stated and demonstrated, for-comprehensions are notably slow. And I am
not the only one who has noticed. This has little or nothing to do with
boxing, by the way. I think I did make it clear that Java is fast and Scala
is slow in this area.

Ismael Juma wrote:
>
> Now, there are certainly cases where this is warranted, but for your
> typical server-side application, it is generally a few spots that can be
> found with a profiler.
>

Usually it is. But I am talking about a very small piece of code (less than
150 lines) that is all about tight loops. You'd be surprised how
disappointing the profiling accuracy of VisualVM was in this case.

There is no such thing as a typical server-side application. I am not
talking about a basic data entry system here; this is a system where people
compose complex nested problems visually and want the application to find an
optimal solution. That requires a computation that takes a lot of processor
cycles, typically somewhere between several seconds and several minutes. I do
not need a profiler to guess where I should look to speed things up.

Silvio Bierman
Joined: 2009-02-16,
User offline. Last seen 1 year 16 weeks ago.
Re: 2.8+ and while versus for-comprehensions

Rex Kerr-2 wrote:
>
> Agreed, it is a pity.
>
> I wonder, though, how much of the overhead is boxing and how much is the
> function call. I would expect about 5x more overhead from object creation
> than a non-inlined function call, so for longer runs I'd expect
> @specialize to make for-comprehensions competitive in many more
> situations. For short loops, the initial range object creation will
> probably dominate, but otherwise the non-inlining is probably less of a
> problem than the boxing.
>

It is either the creation of the range object or the closure. In this case I
don't care since both of them are superfluous.

Rex Kerr-2 wrote:
>
> Anyway, although I agree with David that the while construct is a nice
> reminder that you're dealing with high-performance code, having to work so
> hard to write high performance code is frustrating given that there is no
> particularly good reason for it (except possibly to make it so awkward to
> write high-performance imperative code that it encourages people to write
> immutable, functional code which they normally should be doing anyway).
>
> --Rex
>
>

Yes, I agree in general. Unfortunately immutable functional code is usually
a good (better) replacement for larger code constructs but rarely for small
time-critical loops. That is where the vars and whiles get pulled out.

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: 2.8+ and while versus for-comprehensions

Hi Silvio,

On Wed, 2010-02-10 at 00:16 -0800, Silvio Bierman wrote:
> As I stated and demonstrated, for-comprehensions are notably slow. And I am
> not the only one who has noticed. This has little or nothing to do with
> boxing, by the way.

Really?

> I think I did make it clear that Java is fast and Scala is slow in this area.

Let's just say that a lot has been written about the topic and it's much
more nuanced than you may think. Aside from the various posts in this
mailing list, I'll add a few links from my blog:

http://blog.juma.me.uk/2008/09/15/efficient-scala-with-primitive-collect...
http://blog.juma.me.uk/2008/12/17/objects-with-no-allocation-overhead/
http://blog.juma.me.uk/2009/10/26/new-jvm-options-and-scala-iteration-pe...

Some of the entries are quite old (2008), but they are mostly still
relevant today. The one area that needs more investigation is
@specialized.

>
> Ismael Juma wrote:
> >
> > Now, there are certainly cases where this is warranted, but for your
> > typical server-side application, it is generally a few spots that can be
> > found with a profiler.
> >
>
> Usually it is. But I am talking about a very small piece of code (less than
> 150 lines) that is all about tight loops. You'd be surprised how
> disappointing the profiling accuracy of VisualVM in this case was.

I think you're actually agreeing with me. 150 lines is the sort of thing
that I meant by a few spots that can be optimised with uglier code.
Certainly, I agree that it would be nice if the uglier code were not
needed, but it's a much smaller predicament to fix a small amount of
code than to have to do it for tens of thousands of lines.

> There is no such thing as a typical server side application. I am not
> talking about a basic data entry system here, this is a system where people
> compose complex nested problems visually and want the application to find an
> optimal solution for their problems. That takes a computation that takes a
> lot of processor cycles and typically takes somewhere between several
> seconds to several minutes. I do not need a profiler to guess where I should
> look at for speeding up things.

If you know where to look and it's a small piece of code (certainly 150
lines would qualify), then that sounds like what people tend to call
"tight loops" and "performance oriented". Your previous statements
sounded like you meant something else.

Best,
Ismael

Matthew Pocock 2
Joined: 2010-02-10,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions
My personal view is that this is exactly the sort of issue that should be handled (silently) by the compiler. Making choices between while or for loops, and generating function literals vs anonymous classes are the sort of thing that should never be part of an exposed public API, so the compiler should be free to implement them in a situation-applicable manner.

In this situation, I strongly disagree with annotating the source code. Loops are so pervasive that the compiler should be doing the best thing for me without my intervention. The default behaviour for a numeric loop should be to assume that @specialize is being applied.

Just my 2c

Matthew
ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: 2.8+ and while versus for-comprehensions

Hi Matthew,

On Wed, 2010-02-10 at 09:50 +0000, Matthew Pocock wrote:
> My personal view is that this is exactly the sort of issue that should
> be handled (silently) by the compiler.

What does "this" mean exactly?

> Making choices between while or for loops, and generating function
> literals vs anonymous classes are the sort of thing that should never
> be part of an exposed public API, so the compiler should be free to
> implement them in a situation-applicable manner.

The compiler may be free to implement in a situation-applicable manner,
but it doesn't mean that it will make the best choice.

> In this situation, I strongly disagree with annotating the source
> code.

Again, what do you mean here?

> Loops are so pervasive that the compiler should be doing the best
> thing for me without my intervention. The default behaviour for a
> numeric loop should be to assume that @specialize is being applied.

Yes, it should be. Once @specialize is mature enough, the plan is to
annotate the standard library so you get that by default. Annotating the
standard library seems like a better approach than hard-coding stuff
like that in the compiler. In any case, it may be that @specialize is
not good enough, we just don't know yet.

I see a lot of talk and claims and less detailed analysis. In any case,
I think I'll stay away from these discussions as I feel like I am
repeating myself.

Best,
Ismael

Matthew Pocock 2
Joined: 2010-02-10,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions


On 10 February 2010 10:02, Ismael Juma <mlists@juma.me.uk> wrote:
Hi Matthew,

On Wed, 2010-02-10 at 09:50 +0000, Matthew Pocock wrote:
> My personal view is that this is exactly the sort of issue that should
> be handled (silently) by the compiler.

What does "this" mean exactly?

Specifically, optimising for comprehensions over integer bounds to get rid of the in-practice performance overhead.
 

> Making choices between while or for loops, ...

The compiler may be free to implement in a situation-applicable manner,
but it doesn't mean that it will make the best choice.

Right now, it doesn't need to make the 'best' choice. It simply needs to do better than the current default, which everyone agrees performs badly. It's hard to see how any special-casing here could make things worse.
 

> In this situation, I strongly disagree with annotating the source
> code.

Again, what do you mean here?

I don't want to have to annotate my code with @annotations to get numeric loops to perform adequately. It should 'just work'.
 
I see a lot of talk and claims and less detailed analysis. In any case,
I think I'll stay away from these discussions as I feel like I am
repeating myself.

Others have said this more clearly than me, but there are multiple groups with different interests. I (in this case) only care about the common case performing adequately. I don't care if it is achieved through a compiler hack, or a temporary fix that is later replaced by @specialise and @inline when that is stable enough, or some 3rd alternative. Those who care a lot about the compiler probably have a different take on this.
 
Best,
Ismael

Matthew
ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: 2.8+ and while versus for-comprehensions

On Wed, Feb 10, 2010 at 12:23 PM, Matthew Pocock
wrote:
> Others have said this more clearly than me, but there are multiple groups
> with different interests. I (in this case) only care about the common case
> performing adequately. I don't care if it is achieved through a compiler
> hack, or a temporary fix that is later replaced by @specialise and @inline
> when that is stable enough, or some 3rd alternative. Those who care a lot
> about the compiler probably have a different take on this.

It's not that I care about the compiler, I just understand that there
is limited manpower and time spent on hacks (this includes adding
them, removing them and fixing bugs as a result of them) is time that
is not spent on something else (that could be more useful).

Best,
Ismael

Bradley Buchsbaum
Joined: 2009-04-15,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions
Just as an aside.  In high-level, dynamic languages such as Python, R or Matlab, when someone complains about performance the answer is invariably: "well, if you want to tune for speed, write the critical code in C and link it in". No one has actually suggested that here; no one has said "if you care about performance, use C via JNI".  But the form of the response is similar: they're saying "if you care about performance so much, use while loops". The appeal of Scala for numeric computing is precisely that one does not have to occasionally "dip into C".  The language is as high-level as or higher than these specialized computing platforms and is at the same time (for the most part) much faster.  So nobody who has chosen Scala as a language for numerical computing wants to "write C code" or "write a while loop" (actually, the while loops that one must write in Scala are worse than "dipping into C").

The other issue one must contend with is the sheer psychic pain that a person used to writing high-performance code feels on contemplating that, while looping through 10 million elements in a multidimensional array, such things as "boxing" are occurring. The programmer will actually picture each of these elements being wrapped up in a box, one by one, and then extracted from the box.  This is painful to imagine, even if in many cases irrational.

cheers,

Brad Buchsbaum







ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: 2.8+ and while versus for-comprehensions

On Wed, Feb 10, 2010 at 1:14 PM, Bradley Buchsbaum
wrote:
> Just as an aside.  In high-level, dynamic languages such as Python, R or
> Matlab, when someone complains about performance the answer is invariably:
> "well, if you want to tune for speed, write the critical code in C and link
> it in". No one has actually suggested that here. No one has said "if you
> care about performance, use C via JNI".  But the form of the response is
> similar: they're saying "if you care about performance so much, use while
> loops".

I don't know if anyone is saying that; what I am saying is: use
while loops _for now_. Certainly, the idea is not to ask people to do
that forever. It's just a temporary solution.

Best,
Ismael

Luc Duponcheel
Joined: 2008-12-19,
User offline. Last seen 34 weeks 3 days ago.
Re: 2.8+ and while versus for-comprehensions
Bradley,

For some categories of problems, high-level languages like Scala
or Haskell tend to perform better 'out-of-the-box' than C
(in terms of memory (heap and/or stack) usage and/or CPU usage).

A week ago, when solving a puzzle
(essentially equivalent to calculating the number of surjections from m to n),
my twin brother wrote a C program that behaved as follows

$ ./puzzle 32 4
s(32,4)=18439332018723948024
$ ./puzzle 32 5
s(32,5)=3291874524447596688 <-- 64bit overflow (max=18446744073709551615)

but my (simple) Haskell solutions had no problems

*Main> recursiveSolution 32 4
18439332018723948024
*Main> recursiveSolution 32 5
23190849175177353978000
*Main> solution 32 5
23190849175177353978000

Luc
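For reference, the same count can be sketched in Scala (not Luc's actual code): the number of surjections from an m-set onto an n-set by inclusion-exclusion, with BigInt avoiding the 64-bit overflow the C program hit:

```scala
object Surjections {
  // Binomial coefficient C(n, k); each partial product is itself a
  // binomial coefficient, so the integer division is always exact.
  def choose(n: Int, k: Int): BigInt =
    (1 to k).foldLeft(BigInt(1))((acc, i) => acc * (n - i + 1) / i)

  // s(m, n) = sum over k = 0..n of (-1)^k * C(n, k) * (n - k)^m
  def s(m: Int, n: Int): BigInt =
    (0 to n).map { k =>
      val sign = if (k % 2 == 0) BigInt(1) else BigInt(-1)
      sign * choose(n, k) * BigInt(n - k).pow(m)
    }.sum
}
```

s(32, 4) still fits in 64 bits, which is why the C program agreed with it there; s(32, 5) does not.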



--
  __~O
 -\ <,
(*)/ (*)

reality goes far beyond imagination

Kevin Wright
Joined: 2009-06-09,
User offline. Last seen 49 weeks 3 days ago.
Re: 2.8+ and while versus for-comprehensions


On 10 February 2010 13:14, Bradley Buchsbaum <brad.buchsbaum@gmail.com> wrote:
Just as an aside.  In high-level, dynamic languages such as Python, R or Matlab, when someone complains about performance the answer is invariably: "well, if you want to tune for speed, write the critical code in C and link it in". No one has actually suggested that here; no one has said "if you care about performance, use C via JNI".  But the form of the response is similar: they're saying "if you care about performance so much, use while loops". The appeal of Scala for numeric computing is precisely that one does not have to occasionally "dip into C".  The language is as high-level as or higher than these specialized computing platforms and is at the same time (for the most part) much faster.  So nobody who has chosen Scala as a language for numerical computing wants to "write C code" or "write a while loop" (actually, the while loops that one must write in Scala are worse than "dipping into C").

The other issue one must contend with is the sheer psychic pain that a person used to writing high-performance code feels on contemplating that, while looping through 10 million elements in a multidimensional array, such things as "boxing" are occurring. The programmer will actually picture each of these elements being wrapped up in a box, one by one, and then extracted from the box.  This is painful to imagine, even if in many cases irrational.


I suspect that, more and more, the modern-day equivalent of this will be "write it in OpenCL"
 



Burkhard Ludwig
Joined: 2009-04-18,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions

<off-topic but instructive>
"Premature optimization is the root of all evil": this sentence is approximately half as old as time. [1]
It is usually translated into an instruction along these lines:
"Write your code in the most beautiful/clearest/simplest form you can, then do performance measurements, identify hot spots and then fix them and keep all the beauty in the rest of your code." [2]
Not that this isn't good advice, but in a language-oriented discussion it is a bit off-topic. And it doesn't tell the whole story. My experience comes in three episodes.
<nested detour>Once upon a time I spec'ed a rather complicated data structure which modeled some aspect of electronic circuits in a hierarchical fashion that closely followed the hierarchical structure of the given circuits, alas with complicated cross-hierarchy interactions.
A colleague coded that (in C), and after careful testing and profiling it went to the customer.
The time after that we spent re-profiling with new, extraordinary test cases (from users) and replacing a slow quadratic algorithm (usually caused by linked lists) with a faster one (usually with the aid of yet another hash table). We were fast at fixing and kept the users happy.
After a year or so we got sick of it, replaced all lists with hash tables, and added enough features to the next release to have an excuse for the overall (modest) slow-down in the places where the overhead didn't pay. (Remember, it was C-time back in those days.) </nested detour>
What this tells us is no more than that a measurement can't trivially be better than the test case used for it. What it does not tell is how tedious it was to make these measurements.
<nested detour>Once upon another time we used, for something similar, software from a renowned vendor. And we spent more than a year in support cycles like this:
"the tool is slooow" – "send test case" – "here it is" – "next version fixes it" – "yes it does".
Until the next ride on the round robin. New data, new bottleneck.
After some time my company got sick of it and switched vendors.</nested detour>
What this tells us, again, is no more than that a measurement can't trivially be better than the test case used for it, and that it is absolutely nontrivial to get the really meaningful test data. And life is unjust: nobody cares why you don't get it.
<nested detour>All really fast code that I have read used advanced structuring at the high level and careful coding at the low level (e.g. doing all indexing with unsigned integers instead of signed ones), and did this coding all over the place.</nested detour>
What this tells us is that measurement is good, but only occasionally does it show some hundred lines of code to be rewritten in a faster idiom.
Most measurements show either that there are fast and slow idioms (in our Scala case: for is slow, while is fast) or that there is an entire building block to be rewritten from scratch, giving a better overall structure.
</off-topic-but-instructive>
So what do experienced programmers do? They acquire habits and idioms which defend them against the worst blunders, and stick to them.
My off-topic detours and the course of this thread tell me that the idioms and judgements are almost ready and frozen as follows (in increasing harm)
- for loops are slow, use while loops
- scala is slow for numerics, break out to java.
- scala is slow for numerics, use python and c/c++/fortran
All this is unjust and merely a legend [3] – but nobody cares.
Maybe time has indeed come to solve the problem [4]
[1] http://en.wikipedia.org/wiki/Program_optimization
[2] I believe this advice has been used in this thread already.
[3] http://en.wikipedia.org/wiki/The_Man_Who_Shot_Liberty_Valance
[4] Did somebody already suggest a new looping construct which makes the breakout visible for both compiler and reader? E.g.
    for (i <<- "now only selected constructs, no more <-") "no yield here" {
      "this is inlined in any case"
    }

Burkhard

2010/2/10 Ismael Juma <mlists@juma.me.uk>
On Wed, Feb 10, 2010 at 1:14 PM, Bradley Buchsbaum
<brad.buchsbaum@gmail.com> wrote:
> Just as an aside. In high-level, dynamic languages such as Python, R, or
> Matlab, when someone complains about performance the answer is invariably:
> "well, if you want to tune for speed, write the critical code in C and link
> it in". No one has actually suggested that here. No one has said "if you
> care about performance, use C via JNI". But the form of the response is
> similar: they're saying "if you care about performance so much, use while
> loops".

I don't know if someone is saying that; what I am saying is: use
while loops _for now_. Certainly, the idea is not to ask people to do
that forever. It's just a temporary solution.

Best,
Ismael

Jonathan Shore
Joined: 2009-04-10,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions

On Feb 10, 2010, at 10:10 AM, Burkhard Ludwig wrote:

My off-topic detours and the course of this thread tell me that the idioms and judgements are almost ready and frozen as follows (in increasing harm)
- for loops are slow, use while loops

for-comprehensions are demonstrably slow; I also don't agree that we should be using while loops. "for comprehension" performance should be "fixed" (i.e. optimised by the compiler for cases where this is easily done)
- scala is slow for numerics, break out to java.

It need not be, and I would not want to break out to Java. F# is able to produce high-performance code in many cases; Scala should be able to as well. Scala on the JVM has the disadvantage of not being able to make use of value types (which could reduce the overhead substantially).
That said, the JVM / Scala does surprisingly well in some instances. Access to a Scala Array[Double], for instance, is about as fast as in Java, though one might expect otherwise. I think the first issue I and others have encountered is that for-comprehensions are too slow for the trivial cases.
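One reason for that parity, for what it's worth: Scala's Array[Double] erases to the JVM's primitive double[] rather than to an array of boxed values, so element reads and writes need no unwrapping. A quick check (an illustrative snippet, not from the thread):

```scala
val xs: Array[Double] = Array(1.0, 2.0, 3.0)

// An Array[Double] really is a JVM double[] ...
assert(xs.getClass == classOf[Array[Double]])
// ... whose component type is the primitive double, not boxed java.lang.Double
assert(xs.getClass.getComponentType == java.lang.Double.TYPE)
```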
- scala is slow for numerics, use python and c/c++/fortran

I would like to see Scala be both a great functional language *and* include optimisations that get its performance closer to the metal. Hybrid environments "suck".
Jonathan
Lachlan Deck
Joined: 2009-10-25,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8+ and while versus for-comprehensions

Sorry to bang this drum a little more...

On 10/02/2010, at 9:02 PM, Ismael Juma wrote:

> On Wed, 2010-02-10 at 09:50 +0000, Matthew Pocock wrote:
>> My personal view is that this is exactly the sort of issue that should
>> be handled (silently) by the compiler.
>
> <...>
>> In this situation, I strongly disagree with annotating the source
>> code.
>
> Again, what do you mean here?

That it makes no sense to annotate with something that effectively means: "make this faster". i.e., when would you ever want the alternative? :) Or have I misunderstood what this annotation does?

>> Loops are so pervasive that the compiler should be doing the best
>> thing for me without my intervention. The default behaviour for a
>> numeric loop should be to assume that @specialize is being applied.
>
> Yes, it should be. Once @specialize is mature enough, the plan is to
> annotate the standard library so you get that by default.

But what's the difference between code with and without the annotation? None, apart from the annotation itself, as far as I can tell. Or better: if @specialize can be applied to a section of code to make it more optimized (and the compiler therefore has the capability to do so regardless of whether there's an annotation), then the compiler/JVM seems to me the most sensible place to make further optimisations going forward, so you don't have to re-release / re-annotate software to benefit from future enhancements.

Essentially, I don't quite understand your reasoning. i.e., you're suggesting that the compiler wouldn't need stuff hard-coded into it to perform these optimisations. I don't quite follow your logic here. It still needs to deal with it, only optionally according to the annotation. So why not always do it?

> Annotating the
> standard library seems like a better approach than hard-coding stuff
> like that in the compiler.

I would have thought the reverse is true. Does that mean that older code can't benefit from advances to compiler algorithms unless you go back and re-annotate older code?

> In any case, it may be that @specialize is
> not good enough, we just don't know yet.
>
> I see a lot of talk and claims and less detailed analysis. In any case,
> I think I'll stay away from these discussions as I feel like I am
> repeating myself.
>
> Best,
> Ismael
>

with regards,
--

Lachlan Deck

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: 2.8+ and while versus for-comprehensions

On Thu, 2010-02-11 at 12:50 +1100, Lachlan Deck wrote:
> That it makes no sense to annotate with something that effectively
> means: "make this faster". i.e., when would you ever want the
> alternative? :) Or have I misunderstood what this annotation does?

Yes, you have misunderstood what it means. Optimization often involves a
set of trade-offs and @specialized is one of those cases as it increases
code size. As such, it should not be applied everywhere.
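The trade-off is easy to see in a minimal sketch (the class below is illustrative, not from the standard library): marking a type parameter @specialized makes the compiler emit extra variants of the class for the listed primitive types, so those instantiations work on unboxed values, at the cost of one generated class per listed type.

```scala
// With the annotation, the compiler generates Int and Double variants of
// Cell (e.g. Cell$mcI$sp) whose field is a raw primitive, so new Cell(21)
// never boxes its argument. Without it, T erases to Object and every
// primitive value is boxed on the way in and out.
class Cell[@specialized(Int, Double) T](val value: T) {
  def map[@specialized(Int, Double) U](f: T => U): Cell[U] = new Cell(f(value))
}
```

Applying this everywhere would multiply bytecode size, which is why it is reserved for hot generic code such as the collections and function types.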

> Essentially, I don't quite understand your reasoning. i.e., you're
> suggesting that the compiler wouldn't need stuff hard-coded into it to
> perform these optimisations. I don't quite follow your logic here. It
> still needs to deal with it, only optionally according to the
> annotation. So why not always do it?

I think you missed out on many details of the conversation. The
optimization that people are asking for is _not_ @specialized. It's
something much more specific while @specialized is more general and
applicable to more scenarios (which may include the one that the more
specific optimization would handle).

> > Annotating the
> > standard library seems like a better approach than hard-coding stuff
> > like that in the compiler.
>
> I would have thought the reverse is true. Does that mean that older
> code can't benefit from advances to compiler algorithms unless you go
> back and re-annotate older code?

This makes no sense since you can't use a newer compiler without a newer
standard library in Scala.

Best,
Ismael

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland