
Re: optimizing simple fors

13 replies
ichoran
On Thu, Oct 14, 2010 at 6:14 AM, iulian dragos <iulian.dragos@epfl.ch> wrote:
On Sun, Sep 19, 2010 at 3:33 PM, martin odersky <martin.odersky@epfl.ch> wrote:
I have dreamed for about 5 years now that for loops should be just as fast as while loops, just using normal optimizations that are applicable everywhere.

Without aggressively grabbing and inlining bytecode,  I'm not sure how one could attain that goal.  The JVM has to pay the cost of analyzing every run, during runtime, so it is necessarily going to apply carefully targeted and somewhat limited optimizations.  But the compiler only has to pay the cost of optimizing once, so in the more-difficult-to-analyze cases, the compiler should take the lead, relying on the JVM to tidy up the details.

That's the underlying assumption of the scalac optimizer. Inlining is in fact a necessary step, but performance is gained only when boxing and closures are eliminated, which rely on having a good side-effect analysis (which we don't have yet). Maybe the future effect type system will help us there.

Why not optimize for into while now, and then when this future effect type system proves to help enough to get good side-effect analysis and renders that obsolete, remove it again?
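For concreteness, a range-based for loop desugars into a closure passed to Range.foreach, and the transformation being asked about would rewrite it as an index-based while loop. A schematic sketch, not the exact trees scalac emits:

    // What the programmer writes; the for loop desugars to
    // (0 until n).foreach(i => s += i), i.e. a closure applied per element.
    def sumFor(n: Int): Int = {
      var s = 0
      for (i <- 0 until n) s += i
      s
    }

    // Roughly what a for-to-while optimization would emit instead:
    // no Range traversal, no closure, no polymorphic foreach dispatch.
    def sumWhile(n: Int): Int = {
      var s = 0
      var i = 0
      while (i < n) { s += i; i += 1 }
      s
    }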
 
 
Take a look at these benchmark times (after my signature) comparing loops that are rather cruel to the JVM--the comparison is between while loops, for loops over a Range ("Limit"), and the specialized utility loop function "cfor" (which emulates the C for loop).  Code is attached.  The bottom line is that while loops are fast, whether they are a single loop (While1), nested multiple times (While3, While5), or a mishmash of a bunch of different loops (While?).  Relying upon the JVM to optimize generic specialized code is occasionally--well, just look at the timings, especially the "Limit1" and "Cfor1" times in the second block.
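The attachment isn't reproduced in this archive; a "cfor" utility emulating the C for loop is usually written along the following lines (a minimal sketch under that assumption, not necessarily the attached version, which is presumably @specialized to avoid boxing):

    // C-style loop helper: initial value, condition, step and body are all functions.
    def cfor[A](init: A)(cond: A => Boolean, next: A => A)(body: A => Unit): Unit = {
      var a = init
      while (cond(a)) { body(a); a = next(a) }
    }

    // Usage, analogous to `for (int i = 0; i < n; i++) s += i;` in C:
    def sum(n: Int): Int = {
      var s = 0
      cfor(0)(_ < n, _ + 1) { i => s += i }
      s
    }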

I ran your tests, and after a few changes to your setup (especially taking IO out of the timing loop), they look more reasonable:

Could you please share the code?  It is hard to evaluate whether the improvement is something that can be generally applied to any code, or is a specific set of tweaks you've made that happen to keep the JVM happy but which cannot be applied to a wide range of problems.
 

While1 Elapsed: 0.452 s    While3 Elapsed: 0.556 s
While5 Elapsed: 0.695 s    While? Elapsed: 1.580 s
Limit1 Elapsed: 1.066 s    Limit3 Elapsed: 1.359 s
Limit5 Elapsed: 2.406 s    Limit? Elapsed: 2.108 s
Cfor1 Elapsed: 0.527 s     Cfor2 Elapsed: 1.033 s
Cfor4 Elapsed: 1.533 s     Cfor?? Elapsed: 10.792 s
I also added a warmup method (just run the benchmark once before measuring). Now the times increase monotonically, as one would expect from the source code. Also, Limit1 is 2-3 times slower, while cfor has one outlier, Cfor??. So there are no 10x slowdowns, which I believe were due to IO being measured together with the benchmarks.
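The harness under discussion isn't shown here, but the pattern being described--an untimed warm-up pass followed by timed runs, with printing kept out of the timed region--amounts to something like this sketch (workload() is only a stand-in for the attached While*/Limit*/Cfor* loops):

    object Harness {
      // Stand-in workload for the attached benchmarks.
      def workload(): Long = {
        var s = 0L
        var i = 0
        while (i < 10000000) { s += i; i += 1 }
        s
      }

      // Times a block; the println happens after timing, so IO is not measured.
      def time(label: String)(block: => Unit): Unit = {
        val t0 = System.nanoTime
        block
        println(label + " Elapsed: " + (System.nanoTime - t0) / 1e9 + " s")
      }

      def main(args: Array[String]): Unit = {
        workload()                    // untimed warm-up: class loading + JIT compilation
        time("While1") { workload() } // steady-state measurement
      }
    }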

I am rather skeptical that a handful of small IO operations would take many seconds, and also that the same amount of I/O would generate delays ranging from 3s-7s, and that those delays would only appear when one didn't use a while loop.  I think you are observing outliers due to differing (but deterministic) levels of JVM optimization; in your test, it was only Cfor?? that got stuck with the poorly-optimized code, but in mine it expressed itself in a more confusing pattern--possibly due to IO, not because IO itself is slow, but because IO involves a large number of different method calls, which may have made the JVM forget some of its optimizations.  But in production code, you can't always tell the JVM to ignore the vast majority of the codebase and focus in on a small handful of methods.

And I am skeptical that the "warmup method" is a fair tactic.  Yes, you can apply that pattern in production code if it is really, really important, but that's hardly the normal use case.  Usually, warmup methods are in microbenchmarks in order to compensate for you running a microbenchmark instead of your true long-running program.  Here, though, it seems like cheating: you know you have heavy lifting to do with many many calls to the same method, but you know that the JVM is biting you when you don't run the warmup, so you run warmup code first.  Although this is occasionally necessary, having to do it routinely to get good performance is a major burden placed on the coder who would probably rather be doing other things (like coding the performance-critical parts as while loops so they don't have to write specialized warmup code).

  --Rex

P.S. I am quite delighted by the work on the ScalaCL plugin, but unless I become a ScalaCL developer, I'd be in danger of having all of my performance-critical code break (i.e. not perform properly) when a new version of Scala came out.  (I am willing to accept that I might have to wait for a while, depending on Chafik's schedule, for code that really does need OpenCL to make it work acceptably fast.)

ijuma
Re: optimizing simple fors

> P.S. I am quite delighted by the work on the ScalaCL plugin, but unless I become a ScalaCL developer, I'd be in danger of having all of my performance-critical code break (i.e. not perform properly) when a new version of Scala came out.  (I am willing to accept that I might have to wait for a while, depending on Chafik's schedule, for code that really does need OpenCL to make it work acceptably fast.)

It would be nice if there were two plugins:

- One for the while transformations (note that it does much more than
transform ranges now)
- One for the OpenCL stuff

This would reduce the chances that a bug in the OpenCL stuff would
cause issues for people who don't use it.

Regarding the issue of possibly having to wait for changes to the
plugin before upgrading to newer releases, that is certainly not
ideal, but it seems like it's the price one will have to pay for
getting the fastest iteration performance given EPFL's current
position. If the plugin becomes mature enough, enough people use it
and talk about it, maybe EPFL will consider including it with the
distribution like the continuations plugin.

Best,
Ismael

extempore
Re: optimizing simple fors

On Thu, Oct 14, 2010 at 12:15:18PM -0400, Rex Kerr wrote:
> P.S. I am quite delighted by the work on the ScalaCL plugin, but
> unless I become a ScalaCL developer, I'd be in danger of having all of
> my performance-critical code break (i.e. not perform properly) when a
> new version of Scala came out. (I am willing to accept that I might
> have to wait for a while, depending on Chafik's schedule, for code
> that really does need OpenCL to make it work acceptably fast.)

At the rate he's going you won't have to worry about it, because his
plugin will be the standard scala compiler, and scalac will be an add-on
offering but a few obscure features.

Iulian Dragos
Re: optimizing simple fors

On Thu, Oct 14, 2010 at 6:15 PM, Rex Kerr <ichoran@gmail.com> wrote:
On Thu, Oct 14, 2010 at 6:14 AM, iulian dragos <iulian.dragos@epfl.ch> wrote:
On Sun, Sep 19, 2010 at 3:33 PM, martin odersky <martin.odersky@epfl.ch> wrote:
I have dreamed for about 5 years now that for loops should be just as fast as while loops, just using normal optimizations that are applicable everywhere.

Without aggressively grabbing and inlining bytecode,  I'm not sure how one could attain that goal.  The JVM has to pay the cost of analyzing every run, during runtime, so it is necessarily going to apply carefully targeted and somewhat limited optimizations.  But the compiler only has to pay the cost of optimizing once, so in the more-difficult-to-analyze cases, the compiler should take the lead, relying on the JVM to tidy up the details.

That's the underlying assumption of the scalac optimizer. Inlining is in fact a necessary step, but performance is gained only when boxing and closures are eliminated, which rely on having a good side-effect analysis (which we don't have yet). Maybe the future effect type system will help us there.

Why not optimize for into while now, and then when this future effect type system proves to help enough to get good side-effect analysis and renders that obsolete, remove it again?

I'm not against having simple for loops in the language (that's more Martin's view), but I am against special casing scala.Range, as this would tie them too tightly together.  
 
 
Take a look at these benchmark times (after my signature) comparing loops that are rather cruel to the JVM--the comparison is between while loops, for loops over a Range ("Limit"), and the specialized utility loop function "cfor" (which emulates the C for loop).  Code is attached.  The bottom line is that while loops are fast, whether they are a single loop (While1), nested multiple times (While3, While5), or a mishmash of a bunch of different loops (While?).  Relying upon the JVM to optimize generic specialized code is occasionally--well, just look at the timings, especially the "Limit1" and "Cfor1" times in the second block.

I ran your tests, and after a few changes to your setup (especially taking IO out of the timing loop), they look more reasonable:

Could you please share the code?  It is hard to evaluate whether the improvement is something that can be generally applied to any code, or is a specific set of tweaks you've made that happen to keep the JVM happy but which cannot be applied to a wide range of problems.

The only change I made was to move the call to print outside of the timing loop. For warm-up I ran the tests once before. The test is attached.  
  I am rather skeptical that a handful of small IO operations would take many seconds, and also that the same amount of I/O would generate delays ranging from 3s-7s, and that those delays would only appear when one didn't use a while loop.  I think you are observing outliers due to differing (but deterministic) levels of JVM optimization;

It's all a question of what you are measuring: for loop performance, I/O in the OS, or both? What is the proportion of the two? In my experience IO (logging to the console) can considerably slow down the compiler, so it seems only fair, if we are talking about for loops, to measure only the for loops. How do you explain the outliers? Why don't they happen when there's no IO?  
in your test, it was only Cfor?? that got stuck with the poorly-optimized code, but in mine it expressed itself in a more confusing pattern--possibly due to IO, not because IO itself is slow, but because IO involves a large number of different method calls, which may have made the JVM forget some of its optimizations.  But in production code, you can't always tell the JVM to ignore the vast majority of the codebase and focus in on a small handful of methods.

I don't understand your argument. I agree micro benchmarks are not always realistic; all I'm saying is that the outliers don't appear when you don't measure I/O. Speculations about forgetting optimizations need more substance.  
And I am skeptical that the "warmup method" is a fair tactic.  Yes, you can apply that pattern in production code if it is really, really important, but that's hardly the normal use case.  Usually, warmup methods are in microbenchmarks in order to compensate for you running a microbenchmark instead of your true long-running program.  Here, though, it seems like cheating: you know you have heavy lifting to do with many many calls to the same method, but you know that the JVM is biting you when you don't run the warmup, so you run warmup code first.  Although this is occasionally necessary, having to do it routinely to get good performance is a major burden placed on the coder who would probably rather be doing other things (like coding the performance-critical parts as while loops so they don't have to write specialized warmup code).

Wow, heavy wording!
Funny thing is, I actually forgot to run the warmup! I added the method, but I never called it. So I don't cheat (even though I disagree with you on what cheating is). I doubt it would make a large difference in this case, but it is again common practice when running benchmarks. If you are interested in how well some piece of code runs, you don't want to measure the time spent in the JIT compiler. You can argue there are two measures of performance: startup performance, which takes into account class loading, JIT compilation, cache warmup and what-not; and steady-state performance, where you try to minimize all factors other than the code under inspection. I was aiming for the second.
Sorry for confusing you. I found the following papers very helpful when benchmarking or trying to understand JVM performance:
http://wikis.sun.com/display/HotSpotInternals/MicroBenchmarks
http://www.ibm.com/developerworks/java/library/j-jtp02225.html?ca=drs-j0805
iulian

 
ichoran
Re: optimizing simple fors
On Thu, Oct 14, 2010 at 5:27 PM, iulian dragos <iulian.dragos@epfl.ch> wrote:

I'm not against having simple for loops in the language (that's more Martin's view), but I am against special casing scala.Range, as this would tie them too tightly together.

I'm more interested in special casing raw arrays than scala.Range--but I'd be happy with anything that could be special-cased without causing mass breakage (up to the limit of what people have time/interest to implement).  This is one of the top recurring complaints remaining about Scala on these boards (generally third, I think, after "I want better IDE support" and "Collections types are scary"), so one could argue that it would be time well spent.
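To illustrate what special-casing raw arrays would buy (a sketch, not a description of any existing scalac transformation):

    // Today this goes through an implicit ArrayOps wrapper, foreach and a closure...
    def countPositive(xs: Array[Int]): Int = {
      var n = 0
      for (x <- xs) if (x > 0) n += 1
      n
    }

    // ...whereas a special case for arrays could emit the indexed while loop directly.
    def countPositiveWhile(xs: Array[Int]): Int = {
      var n = 0
      var i = 0
      while (i < xs.length) { if (xs(i) > 0) n += 1; i += 1 }
      n
    }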

(Then again, perhaps the ScalaCL plugin renders this discussion obsolete.)
 
Could you please share the code?  It is hard to evaluate whether the improvement is something that can be generally applied to any code, or is a specific set of tweaks you've made that happen to keep the JVM happy but which cannot be applied to a wide range of problems.

The only change I made was to move the call to print outside of the timing loop. For warm-up I ran the tests once before. The test is attached.

Ah, now I understand what happened.  You took out the loop that made the benchmark run multiple times.  I ran the benchmark four times, but you only ran it once.  The weirdness shows up on the second and third iterations through the benchmark--it would in your case too.  Just wrap a "for (i <- 0 to 3) {...}" around everything in the main method and I expect you'll see it.
 
 
 
I am rather skeptical that a handful of small IO operations would take many seconds, and also that the same amount of I/O would generate delays ranging from 3s-7s, and that those delays would only appear when one didn't use a while loop.  I think you are observing outliers due to differing (but deterministic) levels of JVM optimization;

It's all a question of what you are measuring: for loop performance, I/O in the OS, or both? What is the proportion of the two? In my experience IO (logging to the console) can considerably slow down the compiler, so it seems only fair, if we are talking about for loops, to measure only the for loops.

Indeed, but the IO time is negligible here.  I switched to code that times the print statements, and they eat about 2 milliseconds per run of the benchmark (plus about 15x that the first time through, presumably because of JIT compilation of the IO routines).
 
How do you explain the outliers? Why don't they happen when there's no IO?

Well, that's easily solved now: they do happen when there's no IO.

This leaves the puzzle of why they do _not_ happen the first time through the benchmark.  I am not highly familiar with the optimization strategy used by the JVM compiler, but it apparently changes its mind not infrequently when large code blocks are re-run.  I've observed in many instances that the first pass is actually the fast one, and that later passes confuse it (sometimes on the 2nd run through, sometimes on the 3rd).  Although -XX:+PrintCompilation can help identify these cases, it doesn't help to solve them.  In contrast, writing a while loop usually does solve the problem.
 

And I am skeptical that the "warmup method" is a fair tactic.  [Snip.]

Wow, heavy wording!

Sorry--I do a lot more performance tuning than I'd like to have to, and sometimes the annoyance shows through.
 
Funny thing is, I actually forgot to run the warmup! I added the method, but I never called it. So I don't cheat (even though I disagree with you on what cheating is).

Without the code, I had misunderstood what you were doing.  I agree that what you were intending to do--running the whole benchmark once without timing--is not cheating.  (If you had called that warmup, you would have seen the strange behavior on the one iteration you ran thereafter.)
 
If you are interested in how well some piece of code runs, you don't want to measure the time spent in the JIT compiler. You can argue there are two measures of performance: startup performance, which takes into account class loading, JIT compilation, cache warmup and what-not; and steady-state performance, where you try to minimize all factors other than the code under inspection. I was aiming for the second.

This is true to a first approximation.  But as my test shows, when you are mean to the JVM (with things like multicasting and re-entrant methods), the JIT compiler does not simply compile things on the first pass through and then leave you in peace thereafter.  Although my code was crafted to be something like a minimal example (okay, multiple minimal examples) of this behavior, this is relevant because real code often looks more like this--especially when using Scala, where deeply nested foreaches and maps and whatnot are so easy to write--than like a simple and clean synthetic benchmark.
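The kind of nesting meant here is, for instance, the following (illustrative only):

    // Easy to write in Scala: three nested Range.foreach calls, each taking a
    // closure, all of which the JIT must inline and keep inlined to match a
    // hand-written triple while loop.
    def sumOfProducts(n: Int): Long = {
      var s = 0L
      for {
        i <- 0 until n
        j <- 0 until n
        k <- 0 until n
      } s += i.toLong * j * k
      s
    }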

One might argue that this shows that Oracle (or Google or whoever) needs more people on the compiler team.  (I should try the JRockit or IBM JVMs.)  But it also argues that Scala is asking for the JVM to take some rather heroic steps to get good optimization, and in this case the compiler could really help out.

  --Rex

Olivier Chafik
Re: optimizing simple fors
Hello,

2010/10/14 Ismael Juma <mlists@juma.me.uk>
> P.S. I am quite delighted by the work on the ScalaCL plugin,

I'm delighted that you're delighted :-) 
but unless I become a ScalaCL developer, I'd be in danger of having all of my performance-critical code break (i.e. not perform properly) when a new version of Scala came out.  (I am willing to accept that I might have to wait for a while, depending on Chafik's schedule, for code that really does need OpenCL to make it work acceptably fast.)

Paul already has commit rights on my SVN repo and I'm open to getting other contributors, so keeping up with core Scala breaking changes might not be entirely up to me ;-)
Besides, I'm currently starting a new venture in which we decided to have our 20% time for pet projects, "à-la-google", so my schedule is not gonna depend solely on how little sleep I agree to get :-)  
It would be nice if there were two plugins:

- One for the while transformations (note that it does much more than
transform ranges now) 
- One for the OpenCL stuff

I intend to keep the two families of features separated, both in terms of internal classes and in terms of workflow: the OpenCL stuff will only be activated if ScalaCL data structures are explicitly used in the code being compiled, or if the user explicitly asks for automatic conversion of regular data structures to ScalaCL ones.
I'd rather not have to split the two, as there will be much synergy between them (same matchers, same planned usage- and escape-analysis, maybe even same autovectorization with different backends if I try to introduce .par parallel collections). Promoting ScalaCL is of course a side-effect you might not care about, but hey, it's free (BSD), cross-platform and scala-ble :-)
This would reduce the chances that a bug in the OpenCL stuff would
cause issues for people who don't use it.

I'll make this unlikely :-) Also, I've always tried to be very responsive to bug reports on my various projects, so I will do my best not to hold regular Scala users hostage for too long (if ever).
Regarding the issue of possibly having to wait for changes to the
plugin before upgrading to newer releases, that is certainly not
ideal, but it seems like it's the price one will have to pay for
getting the fastest iteration performance given EPFL's current
position. If the plugin becomes mature enough, enough people use it
and talk about it, maybe EPFL will consider including it with the
distribution like the continuations plugin.

Let's first wait for it to become mature enough to help compile scala itself :-)
Btw, any help spotting bugs is welcome : http://code.google.com/p/nativelibs4java/issues/entry
Cheers,
--Olivier
ijuma
Re: optimizing simple fors

On Fri, Oct 15, 2010 at 9:44 AM, Olivier Chafik
wrote:
> I intend to keep the two families of features separated, both in terms of
> internal classes and in terms of workflow: the OpenCL stuff will only be
> activated if ScalaCL data structures are explicitly used in the code being
> compiled, or if the user explicitly asks for automatic conversion of
> regular data structures to ScalaCL ones.
> I'd rather not have to split the two, as there will be much synergy between
> them (same matchers, same planned usage- and escape-analysis, maybe even
> same autovectorization with different backends if I try to introduce .par
> parallel collections).

Fair enough.

> Promoting ScalaCL is of course a side-effect you
> might not care about, but hey, it's free (BSD), cross-platform and scala-ble
> :-)

I actually think ScalaCL is very interesting and I intend to
experiment with the OpenCL stuff too in the future.

> Let's first wait for it to become mature enough to help compile scala itself
> :-)

Yes, that will be a good milestone.

> Btw, any help spotting bugs is welcome :
> http://code.google.com/p/nativelibs4java/issues/entry

After you fixed issue #33[1], it seems like the following is the only
type of error I get when compiling my project:

http://pastebin.com/21dErh6D

I haven't filed an issue yet as I still need to narrow it down.
Suggestions on how to get an idea of what file is causing it are
welcome.

Best,
Ismael

[1] http://code.google.com/p/nativelibs4java/issues/detail?id=33

Olivier Chafik
Re: optimizing simple fors
2010/10/15 <mlists@juma.me.uk>
After you fixed issue #33[1], it seems like the following is the only
type of error I get when compiling my project:

http://pastebin.com/21dErh6D

Ugh, I get the same one... really wish this error was better reported... maybe there's some not-so-hidden compiler option (-Ydebug... ?) that could show the tree stack trace in case of exception? And of course, I only get it with big codebases (scala and my "real-world" project).
I suspect it has to do with the way I make functions disappear without properly removing references to their symbols, but I've had no luck transplanting symbols so far (for instance, the functions I make disappear can be the owners of some inner functions to-be-lifted; I tried using Symbol.cloneWithOwner to replace the owner of all symbols, but for some reason it led me nowhere).  
I haven't filed an issue yet as I still need to narrow it down.
Suggestions on how to get an idea of what file is causing it are
welcome.

The (painful) way I proceed is:
- run with the plugin, pipe the output to some file keeping only the [scalacl] lines
- open that file, regexp it to only get the filenames and lines that were optimized
- build yourself a skip list and set it as ScalaCL's skip environment variable:
      SCALACL_SKIP=File1,File2:line2,File3,...
  (the .scala suffix is optional, as is the line number)
- check that it compiles with the skip env var: if not, I'm in big trouble :-)
- remove one file from the skip list at a time, and recompile until the bug occurs
- once you've found an offending file, narrow it down to the line using per-line skip items
Of course, recompiling scalac with proper diagnostic output for that "no-symbol does not have owner" error would be cleaner... maybe someone will help ?
Cheers,
--Olivier
Jason Zaugg
Re: optimizing simple fors

I usually set a breakpoint in the IntelliJ debugger at the error
message and then work up the stack to see the context of the error.
You can connect it to scalac: set JAVA_OPTS=-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005 in the
shell, and set up a remote debugger run configuration in IntelliJ from
any project with scala-compiler sources attached.

On Fri, Oct 15, 2010 at 11:21 AM, Olivier Chafik
wrote:
> Of course, recompiling scalac with proper diagnostic output for that
> "no-symbol does not have owner" error would be cleaner... maybe someone will
> help ?

Johannes Rudolph 2
Re: optimizing simple fors

On Fri, Oct 15, 2010 at 11:21 AM, Olivier Chafik
wrote:
> Of course, recompiling scalac with proper diagnostic output for that
> "no-symbol does not have owner" error would be cleaner... maybe someone will
> help ?

Oh yes, that's one of my favorites as well. I got that quite often
when experimenting with compiler plugins, but never really got to the
bottom of the exact reason. It seems like the compiler quietly assumes
some invariants, and if they don't hold, it runs into this error
message at some place or other.

Kevin Wright 2
Re: optimizing simple fors
My usual trick is to just call into the compiler directly from a unit test:
scala.tools.nsc.Main.main(args)
Saves all that tedious mucking about with debugger params.
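For example (a sketch; the plugin jar and source paths are placeholders, and scala-compiler needs to be on the test classpath):

    // Runs scalac in-process, e.g. from a test or a small main, so breakpoints
    // can be set anywhere in the compiler or in the plugin.
    object CompileDriver {
      def main(args: Array[String]): Unit = {
        scala.tools.nsc.Main.main(Array(
          "-Xplugin:/path/to/scalacl-compiler-plugin.jar", // placeholder path
          "-d", "target/classes",
          "src/main/scala/Crash.scala"                     // a file that triggers the error
        ))
      }
    }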

On 15 October 2010 10:35, Jason Zaugg <jzaugg@gmail.com> wrote:
I usually set a breakpoint in the IntelliJ debugger at the error
message and then work up the stack to see the context of the error.
You can connect it to scalac: set JAVA_OPTS=-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005 in the
shell, and set up a remote debugger run configuration in IntelliJ from
any project with scala-compiler sources attached.

On Fri, Oct 15, 2010 at 11:21 AM, Olivier Chafik
<olivier.chafik@gmail.com> wrote:
> Of course, recompiling scalac with proper diagnostic output for that
> "no-symbol does not have owner" error would be cleaner... maybe someone will
> help ?



--
Kevin Wright

mail / gtalk / msn : kev.lee.wright@gmail.com
pulse / skype: kev.lee.wright
twitter: @thecoda

Olivier Chafik
Re: optimizing simple fors
2010/10/15 <jzaugg@gmail.com>
I usually set a breakpoint in the IntelliJ debugger at the error
message and then work up the stack to see the context of the error.
You can connect it to scalac: set JAVA_OPTS=-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005

Excellent, thanks for the advice ! 
shell, and set up a remote debugger run configuration in IntelliJ from
any project with scala-compiler sources attached.

NetBeans appears to handle it very well too, that's cool! (I exhausted my IntelliJ credit quite a while ago.)
2010/10/15 <kev.lee.wright@gmail.com>
My usual trick is to just call into the compiler directly from a unit test:
scala.tools.nsc.Main.main(args)

(or scalacl.Compile.main(args) ;-))
Will try calling sbt directly, to avoid the classpath hell... (I'm currently trying to compile scalala, which fails on the same pattern as scala but with a much smaller codebase and only 29 optimizations taking place)
Cheers,
--Olivier
Olivier Chafik
Re: optimizing simple fors
2010/10/15 <mlists@juma.me.uk>
After you fixed issue #33[1], it seems like the following is the only
type of error I get when compiling my project:

http://pastebin.com/21dErh6D

Ok, I've identified a crash case that produces the same error trace: http://code.google.com/p/nativelibs4java/issues/detail?id=34
Good news is scalala compiles with the plugin with only one file to skip:

    SCALACL_SKIP=Statistics sbt clean compile

(you just need to transform the sbt project definition's first lines into:

    class Project(info: ProjectInfo) extends ProguardProject(info) with AutoCompilerPlugins {
      val nativelibs4javaRepo = "NativeLibs4Java Repository" at "http://nativelibs4java.sourceforge.net/maven/"
      val scalacl = compilerPlugin("com.nativelibs4java" % "scalacl-compiler-plugin" % "1.0-SNAPSHOT")
    }
)
Cheers,
--Olivier
Olivier Chafik
Re: optimizing simple fors
2010/10/15 <mlists@juma.me.uk>
> Let's first wait for it to become mature enough to help compile scala itself
> :-)

Yes, that will be a good milestone.

The time has come: Scala's 2.8.x branch compiles fine with the ScalaCL Compiler Plugin.
I think this is a good reason to release its first 'stable' version: 0.1 (there's still a long way to go till version 1.0, which has many more features planned).
Beware: version 0.1 is labelled as stable because there's no known bug, but I'm pretty sure there are tons of them waiting to be discovered.
The plugin's wiki page has a detailed list of supported optimizations, along with instructions on how to use it (and on how to compile Scala 2.8.x with it): http://code.google.com/p/scalacl/wiki/ScalaCLPlugin
Many thanks to:
- Paul Phillips for his precious advice + insightful SVN commits
- Ismael Juma and Rex Kerr for their bug reports
- Jason Zaugg for his suggestions
Looking forward to getting more feedback (in particular, please help find bugs!)
--
Olivier Chafik
http://ochafik.free.fr/blog/
http://twitter.com/ochafik
