- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Help with performance for a Scala app
Fri, 2011-04-15, 20:40
Hello Scala masters,
I'm looking for help with an implementation that I've done for an open
source project that I've been fiddling with. The issue that I'm
facing is that the Scala implementation is *painfully* slow compared
to the Java and Clojure implementations, and it's likely due to my
weak Scala Kung Fu. The code for Scala can be found here:
https://github.com/jsvazic/GAHelloWorld/tree/master/scala
And the main project can be found here:
https://github.com/jsvazic/GAHelloWorld
Can anyone assist with a quick review and offer any suggestions? TIA
Fri, 2011-04-15, 20:57
#2
Re: Help with performance for a Scala app
You're using a List, which has O(n) index access time, to do random access lookups. (List is a singly-linked list.)
If you switch to Vector, performance should improve dramatically and you will still have immutable collections. You probably, if performance is a big issue, want to actually use Array or ArrayBuffer instead.
--Rex
On Fri, Apr 15, 2011 at 3:40 PM, John <jsvazic@gmail.com> wrote:
If you switch to Vector, performance should improve dramatically and you will still have immutable collections. You probably, if performance is a big issue, want to actually use Array or ArrayBuffer instead.
--Rex
On Fri, Apr 15, 2011 at 3:40 PM, John <jsvazic@gmail.com> wrote:
Hello Scala masters,
I'm looking for help with an implementation that I've done for an open
source project that I've been fiddling with. The issue that I'm
facing is that the Scala implementation is *painfully* slow compared
to the Java and Clojure implementations, and it's likely due to my
weak Scala Kung Fu. The code for Scala can be found here:
https://github.com/jsvazic/GAHelloWorld/tree/master/scala
And the main project can be found here:
https://github.com/jsvazic/GAHelloWorld
Can anyone assist with a quick review and offer any suggestions? TIA
Fri, 2011-04-15, 21:07
#3
Re: Help with performance for a Scala app
On 4/15/11 12:40 PM, John wrote:
> I'm looking for help with an implementation that I've done for an open
> source project that I've been fiddling with. The issue that I'm
> facing is that the Scala implementation is *painfully* slow compared
> to the Java and Clojure implementations, and it's likely due to my
> weak Scala Kung Fu.
I was pretty sure before I looked that you'd be indexing into a List,
and sure enough.
_population: List[Chromosome]
All those places you're doing population(i) are traversing the entire
list up to i. Make it a Vector or something.
I don't know that that's going to be your whole problem, but it can have
that kind of effect.
Fri, 2011-04-15, 21:17
#4
Re: Help with performance for a Scala app
If you want speed, use an Array or something that wraps an array
(ArrayBuffer in scala, ArrayList in java). Linked lists require
careful handling, common pitfalls are iterating with index, calling
size, and appending instead of prepending. All of these operations are
O(n) vs O(1) for arrays-based solutions.
On Fri, Apr 15, 2011 at 3:40 PM, John wrote:
> Hello Scala masters,
>
> I'm looking for help with an implementation that I've done for an open
> source project that I've been fiddling with. The issue that I'm
> facing is that the Scala implementation is *painfully* slow compared
> to the Java and Clojure implementations, and it's likely due to my
> weak Scala Kung Fu. The code for Scala can be found here:
>
> https://github.com/jsvazic/GAHelloWorld/tree/master/scala
>
> And the main project can be found here:
>
> https://github.com/jsvazic/GAHelloWorld
>
> Can anyone assist with a quick review and offer any suggestions? TIA
Fri, 2011-04-15, 21:27
#5
Re: Help with performance for a Scala app
Thanks Sciss (and everyone else), switching to a Vector from a List
increased speed dramatically! Thanks all.
On Apr 15, 3:48 pm, Sciss wrote:
> _population.takeRight(popSize - elitismCount)
> _population(Random.nextInt(_population.length))
> (_population(idx).fitness < parents(i).fitness)
>
> takeRight and apply have really bad performance characteristics for List. you should definitely use another collection type such as Vector.
>
> http://www.scala-lang.org/docu/files/collections-api/collections_40.html
>
> since your implementation is single-threaded, you may also try to use mutable collections instead.
>
> best, -sciss-
>
> On 15 Apr 2011, at 20:40, John wrote:
>
>
>
>
>
>
>
> > Hello Scala masters,
>
> > I'm looking for help with an implementation that I've done for an open
> > source project that I've been fiddling with. The issue that I'm
> > facing is that the Scala implementation is *painfully* slow compared
> > to the Java and Clojure implementations, and it's likely due to my
> > weak Scala Kung Fu. The code for Scala can be found here:
>
> >https://github.com/jsvazic/GAHelloWorld/tree/master/scala
>
> > And the main project can be found here:
>
> >https://github.com/jsvazic/GAHelloWorld
>
> > Can anyone assist with a quick review and offer any suggestions? TIA
Sun, 2011-04-17, 11:47
#6
Re: Re: Help with performance for a Scala app
On Fri, Apr 15, 2011 at 10:08 PM, John <jsvazic@gmail.com> wrote:
Thanks Sciss (and everyone else), switching to a Vector from a List
increased speed dramatically! Thanks all.
This kind of problem is becoming so common that we should investigate the feasibility of a performance troubleshooting tool. Or at least a performance troubleshooting FAQ, where this would be close to #1.
Cheers
-- Martin
Sun, 2011-04-17, 11:57
#7
Re: Re: Help with performance for a Scala app
For many things like this, an annotation @slow would do the trick--if you turn it on, it'd warn you that you're using slow operations (e.g. O(n) where O(1) or at least O(log n) would be expected).
I'm not sure how to keep people from mis-using their data structures, however, unless we have generic structures that reconfigure themselves at runtime to something reasonable given the usage.
--Rex
On Sun, Apr 17, 2011 at 6:40 AM, martin odersky <martin.odersky@epfl.ch> wrote:
I'm not sure how to keep people from mis-using their data structures, however, unless we have generic structures that reconfigure themselves at runtime to something reasonable given the usage.
--Rex
On Sun, Apr 17, 2011 at 6:40 AM, martin odersky <martin.odersky@epfl.ch> wrote:
On Fri, Apr 15, 2011 at 10:08 PM, John <jsvazic@gmail.com> wrote:Thanks Sciss (and everyone else), switching to a Vector from a List
increased speed dramatically! Thanks all.
This kind of problem is becoming so common that we should investigate the feasibility of a performance troubleshooting tool. Or at least a performance troubleshooting FAQ, where this would be close to #1.
Cheers
-- Martin
Sun, 2011-04-17, 12:07
#8
Re: Re: Help with performance for a Scala app
I've been thinking of performance/complexity annotations that a compiler plugin could use to generate warnings.
Just a thought...
On Apr 17, 2011 12:40 PM, "martin odersky" <martin.odersky@epfl.ch> wrote:> On Fri, Apr 15, 2011 at 10:08 PM, John <jsvazic@gmail.com> wrote:
>
>> Thanks Sciss (and everyone else), switching to a Vector from a List
>> increased speed dramatically! Thanks all.
>>
>>
> This kind of problem is becoming so common that we should investigate the
> feasibility of a performance troubleshooting tool. Or at least a performance
> troubleshooting FAQ, where this would be close to #1.
>
> Cheers
>
> -- Martin
Sun, 2011-04-17, 12:47
#9
Re: Re: Help with performance for a Scala app
Is there any good reason for *not* making Vector the default implementation of Seq?
All we then have to do is encourage people to program against the Seq interface by default, instead of prematurely specialising for List - as happens in so many examples right now.
2011/4/17 √iktor Ҡlang <viktor.klang@gmail.com>
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
All we then have to do is encourage people to program against the Seq interface by default, instead of prematurely specialising for List - as happens in so many examples right now.
2011/4/17 √iktor Ҡlang <viktor.klang@gmail.com>
I've been thinking of performance/complexity annotations that a compiler plugin could use to generate warnings.
Just a thought...
On Apr 17, 2011 12:40 PM, "martin odersky" <martin.odersky@epfl.ch> wrote:
> On Fri, Apr 15, 2011 at 10:08 PM, John <jsvazic@gmail.com> wrote:
>
>> Thanks Sciss (and everyone else), switching to a Vector from a List
>> increased speed dramatically! Thanks all.
>>
>>
> This kind of problem is becoming so common that we should investigate the
> feasibility of a performance troubleshooting tool. Or at least a performance
> troubleshooting FAQ, where this would be close to #1.
>
> Cheers
>
> -- Martin
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Sun, 2011-04-17, 13:27
#10
Re: Re: Help with performance for a Scala app
But then for algorithmic stuff List has the :: and Nil unapplys. I keep meaning to use Seq in my code but then end up relying on these in match/case blocks so have to roll everything back to List after all. If Vector or some other collection became the default Seq implementation, it would perhaps make sense to be able to easily case-match over (head,tail/empty) through Seq, or if you can already, document it clearly in the Seq docs. Otherwise, you'll just have a tendency for people to use list anyway, or even worse, construct lists from other seqs all over the place.
ps I'd love to see complexity annotations, ideally giving some sort of bounds, rather than just @slow/@fast - @O(1), @O('n'), @O(Log('n')), and so on - initially purely as user documentation, but perhaps later with the possibility of some compiler plugin e.g. to check that an implementation of an @O(1) isn't annotated with @O('n'), and perhaps some rudimentary complexity estimation by analysis of loops and case expressions.
class List[T] ... { @O('this.length') def length: Int ...
@O(1) def head: T}
Matthew
2011/4/17 Kevin Wright <kev.lee.wright@gmail.com>
--
Matthew Pocockmailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.ukirc.freenode.net: drdozer(0191) 2566550
ps I'd love to see complexity annotations, ideally giving some sort of bounds, rather than just @slow/@fast - @O(1), @O('n'), @O(Log('n')), and so on - initially purely as user documentation, but perhaps later with the possibility of some compiler plugin e.g. to check that an implementation of an @O(1) isn't annotated with @O('n'), and perhaps some rudimentary complexity estimation by analysis of loops and case expressions.
class List[T] ... { @O('this.length') def length: Int ...
@O(1) def head: T}
Matthew
2011/4/17 Kevin Wright <kev.lee.wright@gmail.com>
Is there any good reason for *not* making Vector the default implementation of Seq?
All we then have to do is encourage people to program against the Seq interface by default, instead of prematurely specialising for List - as happens in so many examples right now.
2011/4/17 √iktor Ҡlang <viktor.klang@gmail.com>I've been thinking of performance/complexity annotations that a compiler plugin could use to generate warnings.
Just a thought...
On Apr 17, 2011 12:40 PM, "martin odersky" <martin.odersky@epfl.ch> wrote:
> On Fri, Apr 15, 2011 at 10:08 PM, John <jsvazic@gmail.com> wrote:
>
>> Thanks Sciss (and everyone else), switching to a Vector from a List
>> increased speed dramatically! Thanks all.
>>
>>
> This kind of problem is becoming so common that we should investigate the
> feasibility of a performance troubleshooting tool. Or at least a performance
> troubleshooting FAQ, where this would be close to #1.
>
> Cheers
>
> -- Martin
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
--
Matthew Pocockmailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.ukirc.freenode.net: drdozer(0191) 2566550
Sun, 2011-04-17, 13:47
#11
Re: Help with performance for a Scala app
Kevin Wright skrev 2011-04-17 13:37:
> Is there any good reason for *not* making Vector the default
> implementation of Seq?
>
> All we then have to do is encourage people to program against the Seq
> interface by default, instead of prematurely specialising for List - as
> happens in so many examples right now.
Indeed this is the problem. Encouraged by "functional" code examples
(which might be excused due to its simplicity), people choose List as
the default collection type when it's almost never the best choice. Not
only is its single threaded performance bad for most operations, its
performance in multi-core systems is horrible.
/Jesper Nordenberg
Sun, 2011-04-17, 13:57
#12
Re: Re: Help with performance for a Scala app
Which is fine... *if* you're writing an algorithm that's clearly based around head/tail matching, or folds, *then* you'd make the explicit and conscious decision to use Lists; otherwise, Seq (or even Iterable/Traversable) should be favoured.
List has the unfortunate legacy of sharing a name with Java's default sequence type, and I'm absolutely seeing it being over-used a lot for this reason alone. Interestingly, singly-linked lists are the default collection type for many coming from functional programming, it's what you'll get if you use collection literals in either LISP or Haskell, so the influence to use Lists seems to be coming from multiple directions.
Personally, I think we should be moving more in the direction of Prolog, and make Scala as fully declarative as possible. Consider something like Seq.sum over a list of integers:
In an imperative style this might be written:
In a functional style:
But with the trend towards parallelisation/virtualisation/etc. even this isn't enough. The functional version (as it stands) has considerable advantages, but it's still essentially serial, you're still saying *how* you're performing the operation (e.g. start with 0, then add each value in turn).
One solution is to pull up a level of abstraction, and work directly with monoids. This ties the '0' and '+' together into an atomic entity, allowing the AST to be rewritten or converted into OpenCL calls, or any other optimisation you care for:
Which is common enough that we just add it as an operation on all Sequences
We absolutely need more of this stuff in the higher levels of the language. It's going to make parallelization much easier in the future, it's also going to remove a lot of use-cases for specifying List instead of Seq, where forcing a singly-linked list by default is actively going to harm the potential for performance in concurrency.
On 17 April 2011 13:17, Matthew Pocock <turingatemyhamster@gmail.com> wrote:
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
List has the unfortunate legacy of sharing a name with Java's default sequence type, and I'm absolutely seeing it being over-used a lot for this reason alone. Interestingly, singly-linked lists are the default collection type for many coming from functional programming, it's what you'll get if you use collection literals in either LISP or Haskell, so the influence to use Lists seems to be coming from multiple directions.
Personally, I think we should be moving more in the direction of Prolog, and make Scala as fully declarative as possible. Consider something like Seq.sum over a list of integers:
val xs: List[Int] = List(...)
In an imperative style this might be written:
var sum = 0for(x <- xs) { sum = sum + x}
In a functional style:
val sum = xs.foldLeft(0)(_ + _)
But with the trend towards parallelisation/virtualisation/etc. even this isn't enough. The functional version (as it stands) has considerable advantages, but it's still essentially serial, you're still saying *how* you're performing the operation (e.g. start with 0, then add each value in turn).
One solution is to pull up a level of abstraction, and work directly with monoids. This ties the '0' and '+' together into an atomic entity, allowing the AST to be rewritten or converted into OpenCL calls, or any other optimisation you care for:
val sum = xs.fold(IntSumMonoid)
Which is common enough that we just add it as an operation on all Sequences
val sum = xs.sum
We absolutely need more of this stuff in the higher levels of the language. It's going to make parallelization much easier in the future, it's also going to remove a lot of use-cases for specifying List instead of Seq, where forcing a singly-linked list by default is actively going to harm the potential for performance in concurrency.
On 17 April 2011 13:17, Matthew Pocock <turingatemyhamster@gmail.com> wrote:
But then for algorithmic stuff List has the :: and Nil unapplys. I keep meaning to use Seq in my code but then end up relying on these in match/case blocks so have to roll everything back to List after all. If Vector or some other collection became the default Seq implementation, it would perhaps make sense to be able to easily case-match over (head,tail/empty) through Seq, or if you can already, document it clearly in the Seq docs. Otherwise, you'll just have a tendency for people to use list anyway, or even worse, construct lists from other seqs all over the place.
ps I'd love to see complexity annotations, ideally giving some sort of bounds, rather than just @slow/@fast - @O(1), @O('n'), @O(Log('n')), and so on - initially purely as user documentation, but perhaps later with the possibility of some compiler plugin e.g. to check that an implementation of an @O(1) isn't annotated with @O('n'), and perhaps some rudimentary complexity estimation by analysis of loops and case expressions.
class List[T] ... { @O('this.length') def length: Int ...
@O(1) def head: T}
Matthew
2011/4/17 Kevin Wright <kev.lee.wright@gmail.com>Is there any good reason for *not* making Vector the default implementation of Seq?
All we then have to do is encourage people to program against the Seq interface by default, instead of prematurely specialising for List - as happens in so many examples right now.
2011/4/17 √iktor Ҡlang <viktor.klang@gmail.com>I've been thinking of performance/complexity annotations that a compiler plugin could use to generate warnings.
Just a thought...
On Apr 17, 2011 12:40 PM, "martin odersky" <martin.odersky@epfl.ch> wrote:
> On Fri, Apr 15, 2011 at 10:08 PM, John <jsvazic@gmail.com> wrote:
>
>> Thanks Sciss (and everyone else), switching to a Vector from a List
>> increased speed dramatically! Thanks all.
>>
>>
> This kind of problem is becoming so common that we should investigate the
> feasibility of a performance troubleshooting tool. Or at least a performance
> troubleshooting FAQ, where this would be close to #1.
>
> Cheers
>
> -- Martin
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
--
Matthew Pocockmailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.ukirc.freenode.net: drdozer(0191) 2566550
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Sun, 2011-04-17, 14:17
#13
Re: Re: Help with performance for a Scala app
On 17 April 2011 13:45, Kevin Wright <kev.lee.wright@gmail.com> wrote:
One solution is to pull up a level of abstraction, and work directly with monoids. This ties the '0' and '+' together into an atomic entity, allowing the AST to be rewritten or converted into OpenCL calls, or any other optimisation you care for:val sum = xs.fold(IntSumMonoid)
Which is common enough that we just add it as an operation on all Sequencesval sum = xs.sum
The more general case that is parallelism friendly would be something like:
val agr = xs.mapReduce[M, MM : Monoid[M](f: T => M, mm: MM): M
Sprinkle implicits to taste. We could implement average as:
T : IntM : (Double, Int) f = (x => (x.toDouble, 1))zero = (0, 0)mplus = ((sum1, n1), (sum2, n2)) => ((sum1+ sum2), (n1, n2))
Of course, you have to remember to divide _1 by _2.toDouble at the end. Having SymmetricMonoid extends Monoid would give even more leg-room for undert-the-hood optimization, and it can be safely used with unsorted collections.
Coming up with the monoid version of your favourite operation is not always this easy, and certainly making it a requirement to work with collections will put off any casual coder looking to see if scala may be useful.
Matthew
--
Matthew Pocockmailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.ukirc.freenode.net: drdozer(0191) 2566550
Sun, 2011-04-17, 15:47
#14
Re: Re: Help with performance for a Scala app
i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
best, -sciss-
On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
> Kevin Wright skrev 2011-04-17 13:37:
>> Is there any good reason for *not* making Vector the default
>> implementation of Seq?
>>
>> All we then have to do is encourage people to program against the Seq
>> interface by default, instead of prematurely specialising for List - as
>> happens in so many examples right now.
>
> Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
>
> /Jesper Nordenberg
>
Sun, 2011-04-17, 16:17
#15
Re: Re: Help with performance for a Scala app
On Sun, Apr 17, 2011 at 2:45 PM, Kevin Wright wrote:
> val sum = xs.foldLeft(0)(_ + _)
>
> But with the trend towards parallelisation/virtualisation/etc. even this
> isn't enough.
Here's an interesting talk (from IPFL '09) from Guy Steele about that:
http://www.vimeo.com/6624203
Sun, 2011-04-17, 16:37
#16
Re: Re: Help with performance for a Scala app
On 17 April 2011 15:45, Sciss <contact@sciss.de> wrote:
i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
And therein lies the problem!
As soon as you mention performance, you're talking about optimisation, and once optimisation is under discussion then the age-old quote that "premature optimisation is the root of all evil" just *has* to be mentioned.
Starting a design that uses Lists is making a decision around a concrete collection with particular performance characteristics, characteristics that may not be suited for the final solution. It's a premature optimisation and should be treated as such with all due caution.
best, -sciss-
On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
> Kevin Wright skrev 2011-04-17 13:37:
>> Is there any good reason for *not* making Vector the default
>> implementation of Seq?
>>
>> All we then have to do is encourage people to program against the Seq
>> interface by default, instead of prematurely specialising for List - as
>> happens in so many examples right now.
>
> Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
>
> /Jesper Nordenberg
>
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Sun, 2011-04-17, 16:57
#17
Re: Re: Help with performance for a Scala app
agreed, but often you will find yourself having a very broad method argument type, e.g. `Seq`, and then you do all sorts of stuff internally, and if you need O(1) for apply and length, you start off with
def myMethod( xs: Seq[ Int ]) {
val ixs = xs.toIndexedSeq
...
}
and if the caller also provided an `IndexedSeq` by chance, the `toIndexedSeq` operation is for free.
so i think it's more about isolation than about deferring decisions about collection types to the end of your project.
true though that `List` might be less advantageous when parallelism comes in...
best, -sciss-
On 17 Apr 2011, at 16:30, Kevin Wright wrote:
>
>
> On 17 April 2011 15:45, Sciss wrote:
> i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
>
>
> And therein lies the problem!
>
> As soon as you mention performance, you're talking about optimisation, and once optimisation is under discussion then the age-old quote that "premature optimisation is the root of all evil" just *has* to be mentioned.
>
> Starting a design that uses Lists is making a decision around a concrete collection with particular performance characteristics, characteristics that may not be suited for the final solution. It's a premature optimisation and should be treated as such with all due caution.
>
>
>
>
>
> best, -sciss-
>
>
> On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
>
> > Kevin Wright skrev 2011-04-17 13:37:
> >> Is there any good reason for *not* making Vector the default
> >> implementation of Seq?
> >>
> >> All we then have to do is encourage people to program against the Seq
> >> interface by default, instead of prematurely specialising for List - as
> >> happens in so many examples right now.
> >
> > Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
> >
> > /Jesper Nordenberg
> >
>
>
>
>
Sun, 2011-04-17, 17:07
#18
Re: Re: Help with performance for a Scala app
i just had another thought. if you're library or framework really crucially relies on the speed of some user provided collection, and furthermore, if that speed might depend on runtime parameters, for instance number of processors when you use parallel execution, couldn't one think of some type of collection specification, like (very quickly sketched, so don't take this super serious):
object CollectionSpec {
def apply : CollectionSpec = ....
}
trait CollectionSpec {
def withFastAppend: CollectionSpec
def withFastPrepend : CollectionSpec
def withFastApply : CollectionSpec
...
def build[ V ] : SeqBuilder[ V ]
}
and then
object MyLibrary {
def builder: SeqBuilder[ Int ] = CollectionSpec().withFastApply.build[ Int ]
}
class MyLibrary {
def perform( xs: Seq[ Int ]) { ... }
}
and
object UserCode {
val b = MyLibrary.builder
val myColl = b( 1, 2, 3, 4 )
myLib.perform( myColl )
}
something along these lines. CollectionSpec could then use internal knowledge as to which collections perform best under the given constraints and runtime environment, and provide an optimized collection builder
best, -sciss-
On 17 Apr 2011, at 16:48, Sciss wrote:
> agreed, but often you will find yourself having a very broad method argument type, e.g. `Seq`, and then you do all sorts of stuff internally, and if you need O(1) for apply and length, you start off with
>
> def myMethod( xs: Seq[ Int ]) {
> val ixs = xs.toIndexedSeq
> ...
> }
>
> and if the caller also provided an `IndexedSeq` by chance, the `toIndexedSeq` operation is for free.
>
> so i think it's more about isolation than about deferring decisions about collection types to the end of your project.
>
> true though that `List` might be less advantageous when parallelism comes in...
>
> best, -sciss-
>
>
>
> On 17 Apr 2011, at 16:30, Kevin Wright wrote:
>
>>
>>
>> On 17 April 2011 15:45, Sciss wrote:
>> i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
>>
>>
>> And therein lies the problem!
>>
>> As soon as you mention performance, you're talking about optimisation, and once optimisation is under discussion then the age-old quote that "premature optimisation is the root of all evil" just *has* to be mentioned.
>>
>> Starting a design that uses Lists is making a decision around a concrete collection with particular performance characteristics, characteristics that may not be suited for the final solution. It's a premature optimisation and should be treated as such with all due caution.
>>
>>
>>
>>
>>
>> best, -sciss-
>>
>>
>> On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
>>
>>> Kevin Wright skrev 2011-04-17 13:37:
>>>> Is there any good reason for *not* making Vector the default
>>>> implementation of Seq?
>>>>
>>>> All we then have to do is encourage people to program against the Seq
>>>> interface by default, instead of prematurely specialising for List - as
>>>> happens in so many examples right now.
>>>
>>> Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
>>>
>>> /Jesper Nordenberg
>>>
>>
>>
>>
>>
Sun, 2011-04-17, 17:07
#19
Re: Re: Help with performance for a Scala app
...although this may fall into the trap of regressing to less performative types when using operations such as `map` which rely on an implicit `CanBuildFrom` and not on the underlying type:
http://stackoverflow.com/questions/5678682/should-scalas-map-behave-diff...
On 17 Apr 2011, at 16:59, Sciss wrote:
> i just had another thought. if you're library or framework really crucially relies on the speed of some user provided collection, and furthermore, if that speed might depend on runtime parameters, for instance number of processors when you use parallel execution, couldn't one think of some type of collection specification, like (very quickly sketched, so don't take this super serious):
>
> object CollectionSpec {
> def apply : CollectionSpec = ....
> }
> trait CollectionSpec {
> def withFastAppend: CollectionSpec
> def withFastPrepend : CollectionSpec
> def withFastApply : CollectionSpec
> ...
> def build[ V ] : SeqBuilder[ V ]
> }
>
> and then
>
> object MyLibrary {
> def builder: SeqBuilder[ Int ] = CollectionSpec().withFastApply.build[ Int ]
> }
> class MyLibrary {
> def perform( xs: Seq[ Int ]) { ... }
> }
>
> and
>
> object UserCode {
> val b = MyLibrary.builder
> val myColl = b( 1, 2, 3, 4 )
> myLib.perform( myColl )
> }
>
> something along these lines. CollectionSpec could then use internal knowledge as to which collections perform best under the given constraints and runtime environment, and provide an optimized collection builder
>
> best, -sciss-
>
>
>
>
> On 17 Apr 2011, at 16:48, Sciss wrote:
>
>> agreed, but often you will find yourself having a very broad method argument type, e.g. `Seq`, and then you do all sorts of stuff internally, and if you need O(1) for apply and length, you start off with
>>
>> def myMethod( xs: Seq[ Int ]) {
>> val ixs = xs.toIndexedSeq
>> ...
>> }
>>
>> and if the caller also provided an `IndexedSeq` by chance, the `toIndexedSeq` operation is for free.
>>
>> so i think it's more about isolation than about deferring decisions about collection types to the end of your project.
>>
>> true though that `List` might be less advantageous when parallelism comes in...
>>
>> best, -sciss-
>>
>>
>>
>> On 17 Apr 2011, at 16:30, Kevin Wright wrote:
>>
>>>
>>>
>>> On 17 April 2011 15:45, Sciss wrote:
>>> i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
>>>
>>>
>>> And therein lies the problem!
>>>
>>> As soon as you mention performance, you're talking about optimisation, and once optimisation is under discussion then the age-old quote that "premature optimisation is the root of all evil" just *has* to be mentioned.
>>>
>>> Starting a design that uses Lists is making a decision around a concrete collection with particular performance characteristics, characteristics that may not be suited for the final solution. It's a premature optimisation and should be treated as such with all due caution.
>>>
>>>
>>>
>>>
>>>
>>> best, -sciss-
>>>
>>>
>>> On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
>>>
>>>> Kevin Wright skrev 2011-04-17 13:37:
>>>>> Is there any good reason for *not* making Vector the default
>>>>> implementation of Seq?
>>>>>
>>>>> All we then have to do is encourage people to program against the Seq
>>>>> interface by default, instead of prematurely specialising for List - as
>>>>> happens in so many examples right now.
>>>>
>>>> Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
>>>>
>>>> /Jesper Nordenberg
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Kevin Wright
>>>
>>> gtalk / msn : kev.lee.wright@gmail.com
>>> mail: kevin.wright@scalatechnology.com
>>> vibe / skype: kev.lee.wright
>>> quora: http://www.quora.com/Kevin-Wright
>>> twitter: @thecoda
>>>
>>> "My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
>>>
>>
>
Sun, 2011-04-17, 17:17
#20
Re: Re: Help with performance for a Scala app
On 17 April 2011 16:48, Sciss <contact@sciss.de> wrote:
agreed, but often you will find yourself having a very broad method argument type, e.g. `Seq`, and then you do all sorts of stuff internally, and if you need O(1) for apply and length, you start off with
def myMethod( xs: Seq[ Int ]) {
val ixs = xs.toIndexedSeq
...
}
and if the caller also provided an `IndexedSeq` by chance, the `toIndexedSeq` operation is for free.
so i think it's more about isolation than about deferring decisions about collection types to the end of your project.
true though that `List` might be less advantageous when parallelism comes in...
best, -sciss-
IndexedSeq specifies some desired characteristic of the supplied collection, but not the exact concrete implementation. As such, it's a good abstraction and is a far better choice than List in almost any conceivable scenario. Both List and Vector would be valid matching implementations here, but by forcing List you've ruled out the possibility of using a Vector - which also has the particular characteristics that you need.
I also wouldn't have any problems making IndexedSeq a requirement via the method signature in this particular example. It's trivial enough for callers to invoke Seq#toIndexedSeq if necessary, and it also gives an important hint that callers can choose to optimise by arranging in advance for the supplied collection to be an IndexedSeq.
On 17 Apr 2011, at 16:30, Kevin Wright wrote:
>
>
> On 17 April 2011 15:45, Sciss <contact@sciss.de> wrote:
> i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
>
>
> And therein lies the problem!
>
> As soon as you mention performance, you're talking about optimisation, and once optimisation is under discussion then the age-old quote that "premature optimisation is the root of all evil" just *has* to be mentioned.
>
> Starting a design that uses Lists is making a decision around a concrete collection with particular performance characteristics, characteristics that may not be suited for the final solution. It's a premature optimisation and should be treated as such with all due caution.
>
>
>
>
>
> best, -sciss-
>
>
> On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
>
> > Kevin Wright skrev 2011-04-17 13:37:
> >> Is there any good reason for *not* making Vector the default
> >> implementation of Seq?
> >>
> >> All we then have to do is encourage people to program against the Seq
> >> interface by default, instead of prematurely specialising for List - as
> >> happens in so many examples right now.
> >
> > Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
> >
> > /Jesper Nordenberg
> >
>
>
>
>
> --
> Kevin Wright
>
> gtalk / msn : kev.lee.wright@gmail.com
> mail: kevin.wright@scalatechnology.com
> vibe / skype: kev.lee.wright
> quora: http://www.quora.com/Kevin-Wright
> twitter: @thecoda
>
> "My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
>
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Sun, 2011-04-17, 19:47
#21
Re: Re: Help with performance for a Scala app
2011/4/17 Sciss <contact@sciss.de>
...although this may fall into the trap of regressing to less performative types when using operations such as `map` which rely on an implicit `CanBuildFrom` and not on the underlying type:
http://stackoverflow.com/questions/5678682/should-scalas-map-behave-differently-when-mapping-to-the-same-type
Bugs aside, you can typically rely on CanBuildFrom to do the right thing based on the concrete type of collection instances (as opposed to the declared type of references):
Example 1:
//xs is declared a Seq, and contains a Vector scala> val xs: Seq[Int] = Vector(1,2,3,4,5,6,7,8,9)xs: Seq[Int] = Vector(1, 2, 3, 4, 5, 6, 7, 8, 9)
scala> xs map (_ + 2)res0: Seq[Int] = Vector(3, 4, 5, 6, 7, 8, 9, 10, 11)
//the result of the mapping is also a Vector
Example 2:
//xs is inferred as a Seq, with the (default) concrete type 'List' scala> val xs = Seq(1,2,3,4,5,5,6,7,8,9)xs: Seq[Int] = List(1, 2, 3, 4, 5, 5, 6, 7, 8, 9)
scala> xs map (_ + 2)res1: Seq[Int] = List(3, 4, 5, 6, 7, 7, 8, 9, 10, 11) //the result of the mapping is also a List
Of course, some operations will result in a "widening" of the collection type (after all, nobody could reasonably expect to see a BitSet of Strings), but that's a different consideration.
The only way you'd "regress to a less performant type" is by mapping to some element type for which the "more performant" collection is invalid.
On 17 Apr 2011, at 16:59, Sciss wrote:
> i just had another thought. if you're library or framework really crucially relies on the speed of some user provided collection, and furthermore, if that speed might depend on runtime parameters, for instance number of processors when you use parallel execution, couldn't one think of some type of collection specification, like (very quickly sketched, so don't take this super serious):
>
> object CollectionSpec {
> def apply : CollectionSpec = ....
> }
> trait CollectionSpec {
> def withFastAppend: CollectionSpec
> def withFastPrepend : CollectionSpec
> def withFastApply : CollectionSpec
> ...
> def build[ V ] : SeqBuilder[ V ]
> }
>
> and then
>
> object MyLibrary {
> def builder: SeqBuilder[ Int ] = CollectionSpec().withFastApply.build[ Int ]
> }
> class MyLibrary {
> def perform( xs: Seq[ Int ]) { ... }
> }
>
> and
>
> object UserCode {
> val b = MyLibrary.builder
> val myColl = b( 1, 2, 3, 4 )
> myLib.perform( myColl )
> }
>
> something along these lines. CollectionSpec could then use internal knowledge as to which collections perform best under the given constraints and runtime environment, and provide an optimized collection builder
>
> best, -sciss-
>
>
>
>
> On 17 Apr 2011, at 16:48, Sciss wrote:
>
>> agreed, but often you will find yourself having a very broad method argument type, e.g. `Seq`, and then you do all sorts of stuff internally, and if you need O(1) for apply and length, you start off with
>>
>> def myMethod( xs: Seq[ Int ]) {
>> val ixs = xs.toIndexedSeq
>> ...
>> }
>>
>> and if the caller also provided an `IndexedSeq` by chance, the `toIndexedSeq` operation is for free.
>>
>> so i think it's more about isolation than about deferring decisions about collection types to the end of your project.
>>
>> true though that `List` might be less advantageous when parallelism comes in...
>>
>> best, -sciss-
>>
>>
>>
>> On 17 Apr 2011, at 16:30, Kevin Wright wrote:
>>
>>>
>>>
>>> On 17 April 2011 15:45, Sciss <contact@sciss.de> wrote:
>>> i don't think that's true. `List` is a good choice in many cases. It is simple, good in pattern matching, and for prepending-based build up and traversal probably faster and less space consuming than `Vector`. I think you need to encourage people to at least basically familiarise them with the different performance options, so they are confident about making the right choice.
>>>
>>>
>>> And therein lies the problem!
>>>
>>> As soon as you mention performance, you're talking about optimisation, and once optimisation is under discussion then the age-old quote that "premature optimisation is the root of all evil" just *has* to be mentioned.
>>>
>>> Starting a design that uses Lists is making a decision around a concrete collection with particular performance characteristics, characteristics that may not be suited for the final solution. It's a premature optimisation and should be treated as such with all due caution.
>>>
>>>
>>>
>>>
>>>
>>> best, -sciss-
>>>
>>>
>>> On 17 Apr 2011, at 13:34, Jesper Nordenberg wrote:
>>>
>>>> Kevin Wright skrev 2011-04-17 13:37:
>>>>> Is there any good reason for *not* making Vector the default
>>>>> implementation of Seq?
>>>>>
>>>>> All we then have to do is encourage people to program against the Seq
>>>>> interface by default, instead of prematurely specialising for List - as
>>>>> happens in so many examples right now.
>>>>
>>>> Indeed this is the problem. Encouraged by "functional" code examples (which might be excused due to its simplicity), people choose List as the default collection type when it's almost never the best choice. Not only is its single threaded performance bad for most operations, its performance in multi-core systems is horrible.
>>>>
>>>> /Jesper Nordenberg
>>>>
>>>
>>>
>>>
>>>
>>> --
>>> Kevin Wright
>>>
>>> gtalk / msn : kev.lee.wright@gmail.com
>>> mail: kevin.wright@scalatechnology.com
>>> vibe / skype: kev.lee.wright
>>> quora: http://www.quora.com/Kevin-Wright
>>> twitter: @thecoda
>>>
>>> "My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
>>>
>>
>
--
Kevin Wright
gtalk / msn : kev.lee.wright@gmail.comkev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda
"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra
Sun, 2011-04-17, 21:47
#22
Re: Help with performance for a Scala app
Sciss skrev 2011-04-17 16:45:
> i don't think that's true. `List` is a good choice in many cases. It is simple,
Not performance related.
> good in pattern matching,
Not performance related.
> and for prepending-based build up
Yes, this is the one of the few operations for which List is fast.
> and traversal probably faster and less space consuming than `Vector`.
I seriously doubt List traversal is faster than Vector traversal due to
better data locality in Vector. I also think Vector is more space
efficient for large collections due to additional references in List.
But feel free to prove me wrong.
/Jesper Nordenberg
Sun, 2011-04-17, 22:37
#23
Re: Re: Help with performance for a Scala app
ok, i ran three iterations from coldstart scala REPL
n = 500000
List : prepending n random integers to an empty list:
778 ms
742 ms
884 ms
Vector : prepending n random integers to an empty vector:
657 ms
468 ms
452 ms
List : performing a foreach on a list of size n, foreach body is empty:
43 ms
11 ms
9 ms
Vector : performing a foreach on a vector of size n, foreach body is empty:
61 ms
18 ms
14 ms
So they have pretty much the same speed performance (List a bit slower in prepend, a bit faster in traversal)
I don't know how to measure the space consumption, but a simple test would be to iteratively prepend elements until an out of memory error occurs.
I did that once for each of them (again running from cold start):
List prepended 4140448 elements before hitting out-of-memory
Vector prepended 7010080 elements before hitting out-of-memory
So Vector seems to be indeed more space efficient.
best, -sciss-
( disclaimer: Yes I know benchmarks are a science and micro-benchmarks in particular are problematic )
On 17 Apr 2011, at 21:40, Jesper Nordenberg wrote:
> Sciss skrev 2011-04-17 16:45:
>> i don't think that's true. `List` is a good choice in many cases. It is simple,
>
> Not performance related.
>
>> good in pattern matching,
>
> Not performance related.
>
> > and for prepending-based build up
>
> Yes, this is the one of the few operations for which List is fast.
>
>> and traversal probably faster and less space consuming than `Vector`.
>
> I seriously doubt List traversal is faster than Vector traversal due to better data locality in Vector. I also think Vector is more space efficient for large collections due to additional references in List. But feel free to prove me wrong.
>
> /Jesper Nordenberg
>
Tue, 2011-04-19, 21:57
#24
Re: Re: Help with performance for a Scala app
You should compare a while loop for lists and vectors. Claiming that
lists are "a bit faster in traversal" is premature.
On Sun, Apr 17, 2011 at 5:31 PM, Sciss wrote:
> ok, i ran three iterations from coldstart scala REPL
>
> n = 500000
>
> List : prepending n random integers to an empty list:
>
> 778 ms
> 742 ms
> 884 ms
>
> Vector : prepending n random integers to an empty vector:
>
> 657 ms
> 468 ms
> 452 ms
>
> List : performing a foreach on a list of size n, foreach body is empty:
>
> 43 ms
> 11 ms
> 9 ms
>
> Vector : performing a foreach on a vector of size n, foreach body is empty:
>
> 61 ms
> 18 ms
> 14 ms
>
> So they have pretty much the same speed performance (List a bit slower in prepend, a bit faster in traversal)
>
> I don't know how to measure the space consumption, but a simple test would be to iteratively prepend elements until an out of memory error occurs.
> I did that once for each of them (again running from cold start):
>
> List prepended 4140448 elements before hitting out-of-memory
> Vector prepended 7010080 elements before hitting out-of-memory
>
> So Vector seems to be indeed more space efficient.
>
> best, -sciss-
Tue, 2011-04-19, 22:17
#25
Re: Re: Help with performance for a Scala app
i don't claim it's faster, i say it's practically same speed
i don't use list in a while loop
best, -sciss-
On 19 Apr 2011, at 21:47, Lex wrote:
> You should compare a while loop for lists and vectors. Claiming that
> lists are "a bit faster in traversal" is premature.
>
>
> On Sun, Apr 17, 2011 at 5:31 PM, Sciss wrote:
>> ok, i ran three iterations from coldstart scala REPL
>>
>> n = 500000
>>
>> List : prepending n random integers to an empty list:
>>
>> 778 ms
>> 742 ms
>> 884 ms
>>
>> Vector : prepending n random integers to an empty vector:
>>
>> 657 ms
>> 468 ms
>> 452 ms
>>
>> List : performing a foreach on a list of size n, foreach body is empty:
>>
>> 43 ms
>> 11 ms
>> 9 ms
>>
>> Vector : performing a foreach on a vector of size n, foreach body is empty:
>>
>> 61 ms
>> 18 ms
>> 14 ms
>>
>> So they have pretty much the same speed performance (List a bit slower in prepend, a bit faster in traversal)
>>
>> I don't know how to measure the space consumption, but a simple test would be to iteratively prepend elements until an out of memory error occurs.
>> I did that once for each of them (again running from cold start):
>>
>> List prepended 4140448 elements before hitting out-of-memory
>> Vector prepended 7010080 elements before hitting out-of-memory
>>
>> So Vector seems to be indeed more space efficient.
>>
>> best, -sciss-
Tue, 2011-04-19, 22:37
#26
Re: Re: Help with performance for a Scala app
each collection comes with a foreach method, and that should be used for
traversing it. a while loop can never be faster. it can be as fast as
foreach, or slower. it doesn't make sense to use it.
the results are what i expected. vector has a get-overhead because of
it's tree structure, and list needs more memory because it wraps each
element in a ::
Am 19.04.2011 22:47, schrieb Lex:
> You should compare a while loop for lists and vectors. Claiming that
> lists are "a bit faster in traversal" is premature.
>
>
> On Sun, Apr 17, 2011 at 5:31 PM, Sciss wrote:
>> ok, i ran three iterations from coldstart scala REPL
>>
>> n = 500000
>>
>> List : prepending n random integers to an empty list:
>>
>> 778 ms
>> 742 ms
>> 884 ms
>>
>> Vector : prepending n random integers to an empty vector:
>>
>> 657 ms
>> 468 ms
>> 452 ms
>>
>> List : performing a foreach on a list of size n, foreach body is empty:
>>
>> 43 ms
>> 11 ms
>> 9 ms
>>
>> Vector : performing a foreach on a vector of size n, foreach body is empty:
>>
>> 61 ms
>> 18 ms
>> 14 ms
>>
>> So they have pretty much the same speed performance (List a bit slower in prepend, a bit faster in traversal)
>>
>> I don't know how to measure the space consumption, but a simple test would be to iteratively prepend elements until an out of memory error occurs.
>> I did that once for each of them (again running from cold start):
>>
>> List prepended 4140448 elements before hitting out-of-memory
>> Vector prepended 7010080 elements before hitting out-of-memory
>>
>> So Vector seems to be indeed more space efficient.
>>
>> best, -sciss-
Wed, 2011-04-20, 06:17
#27
Re: Re: Help with performance for a Scala app
On 4/19/11 2:28 PM, HamsterofDeath wrote:
> each collection comes with a foreach method, and that should be used for
> traversing it. a while loop can never be faster. it can be as fast as
> foreach, or slower. it doesn't make sense to use it.
It must take years of dogged effort to assemble misinformation of this
caliber.
Wed, 2011-04-20, 06:57
#28
Re: Re: Help with performance for a Scala app
On 20/04/11 15:15, Paul Phillips wrote:
> On 4/19/11 2:28 PM, HamsterofDeath wrote:
>> each collection comes with a foreach method, and that should be used for
>> traversing it. a while loop can never be faster. it can be as fast as
>> foreach, or slower. it doesn't make sense to use it.
> It must take years of dogged effort to assemble misinformation of this
> caliber.
each effort is dogged, and that should be used for assembling it. an
undogged effort can never be misinformed. it can be as informed as
dogged, or doggier. it doesn't make sense to use it.
Wed, 2011-04-20, 07:07
#29
Re: Re: Help with performance for a Scala app
i meant:
val list = List(1,2,3,4,5.....100000)
var index = 0;
while (index < list.size) {
doIt(list(index))
index+=1
}
is really, really bad. foreach knows how to traverse a collection
efficiently. a while loop does not. it might or might not work. of
course you can look into a collections' foreach-method. if a while loop
is there, you can use a while loop directly to traverse the collection,
avoiding function call overhead. but it's not the while loop that is
faster, it's the non existent overhead that does.
Am 20.04.2011 07:15, schrieb Paul Phillips:
> On 4/19/11 2:28 PM, HamsterofDeath wrote:
>> each collection comes with a foreach method, and that should be used for
>> traversing it. a while loop can never be faster. it can be as fast as
>> foreach, or slower. it doesn't make sense to use it.
> It must take years of dogged effort to assemble misinformation of this
> caliber.
>
Wed, 2011-04-20, 07:07
#30
Re: Re: Help with performance for a Scala app
furthermore, the compiler/vm could at some point become able to remove
that overhead completely.
Am 20.04.2011 07:54, schrieb HamsterofDeath:
> i meant:
>
> val list = List(1,2,3,4,5.....100000)
> var index = 0;
> while (index < list.size) {
> doIt(list(index))
> index+=1
> }
>
> is really, really bad. foreach knows how to traverse a collection
> efficiently. a while loop does not. it might or might not work. of
> course you can look into a collections' foreach-method. if a while loop
> is there, you can use a while loop directly to traverse the collection,
> avoiding function call overhead. but it's not the while loop that is
> faster, it's the non existent overhead that does.
>
> Am 20.04.2011 07:15, schrieb Paul Phillips:
>> On 4/19/11 2:28 PM, HamsterofDeath wrote:
>>> each collection comes with a foreach method, and that should be used for
>>> traversing it. a while loop can never be faster. it can be as fast as
>>> foreach, or slower. it doesn't make sense to use it.
>> It must take years of dogged effort to assemble misinformation of this
>> caliber.
>>
>
Wed, 2011-04-20, 07:17
#31
Re: Re: Help with performance for a Scala app
It is really, really bad [for performance], but that's because you
called List.apply(Int).
val list = List(1,2,3,4,5.....100000)
var x = list
while(!x.isEmpty) {
doIt(x.head)
x = x.tail
}
On 20/04/11 15:54, HamsterofDeath wrote:
> i meant:
>
> val list = List(1,2,3,4,5.....100000)
> var index = 0;
> while (index < list.size) {
> doIt(list(index))
> index+=1
> }
>
> is really, really bad. foreach knows how to traverse a collection
> efficiently. a while loop does not. it might or might not work. of
> course you can look into a collections' foreach-method. if a while loop
> is there, you can use a while loop directly to traverse the collection,
> avoiding function call overhead. but it's not the while loop that is
> faster, it's the non existent overhead that does.
>
> Am 20.04.2011 07:15, schrieb Paul Phillips:
>> On 4/19/11 2:28 PM, HamsterofDeath wrote:
>>> each collection comes with a foreach method, and that should be used for
>>> traversing it. a while loop can never be faster. it can be as fast as
>>> foreach, or slower. it doesn't make sense to use it.
>> It must take years of dogged effort to assemble misinformation of this
>> caliber.
>>
Wed, 2011-04-20, 07:27
#32
Re: Re: Help with performance for a Scala app
What overhead?
On 20/04/11 15:59, HamsterofDeath wrote:
> furthermore, the compiler/vm could at some point become able to remove
> that overhead completely.
>
> Am 20.04.2011 07:54, schrieb HamsterofDeath:
>> i meant:
>>
>> val list = List(1,2,3,4,5.....100000)
>> var index = 0;
>> while (index < list.size) {
>> doIt(list(index))
>> index+=1
>> }
>>
>> is really, really bad. foreach knows how to traverse a collection
>> efficiently. a while loop does not. it might or might not work. of
>> course you can look into a collections' foreach-method. if a while loop
>> is there, you can use a while loop directly to traverse the collection,
>> avoiding function call overhead. but it's not the while loop that is
>> faster, it's the non existent overhead that does.
>>
>> Am 20.04.2011 07:15, schrieb Paul Phillips:
>>> On 4/19/11 2:28 PM, HamsterofDeath wrote:
>>>> each collection comes with a foreach method, and that should be used for
>>>> traversing it. a while loop can never be faster. it can be as fast as
>>>> foreach, or slower. it doesn't make sense to use it.
>>> It must take years of dogged effort to assemble misinformation of this
>>> caliber.
>>>
Wed, 2011-04-20, 07:57
#33
Re: Re: Help with performance for a Scala app
I do a lot of aircraft trajectory prediction, and I tend to use the following pattern in several places:
val dt = Config.PredictionTimeStep
var time = prevmult(this.time, dt)
val endTime = time + maxPredTime
var points = Vector[Position]()
while (time < endTime) {
time += dt
// compute predicted position
points :+= position
}
The time step (dt) is essentially a global parameter (6 seconds), and the prevmult function keeps all predictions synchronous (although they could have different starting and ending times). Is there a better pattern for this? Perhaps one that avoids the vars? Thanks.
--Russ P.
--
http://RussP.us
val dt = Config.PredictionTimeStep
var time = prevmult(this.time, dt)
val endTime = time + maxPredTime
var points = Vector[Position]()
while (time < endTime) {
time += dt
// compute predicted position
points :+= position
}
The time step (dt) is essentially a global parameter (6 seconds), and the prevmult function keeps all predictions synchronous (although they could have different starting and ending times). Is there a better pattern for this? Perhaps one that avoids the vars? Thanks.
--Russ P.
--
http://RussP.us
Wed, 2011-04-20, 08:07
#34
Re: Re: Help with performance for a Scala app
foreach uses a function to "do the stuff". calling apply on that function is the overhead i mean.
-------- Original-Nachricht --------
> Datum: Wed, 20 Apr 2011 16:01:17 +1000
> Von: Tony Morris
> An: scala-user@googlegroups.com
> Betreff: Re: [scala-user] Re: Help with performance for a Scala app
> What overhead?
>
> On 20/04/11 15:59, HamsterofDeath wrote:
> > furthermore, the compiler/vm could at some point become able to remove
> > that overhead completely.
> >
> > Am 20.04.2011 07:54, schrieb HamsterofDeath:
> >> i meant:
> >>
> >> val list = List(1,2,3,4,5.....100000)
> >> var index = 0;
> >> while (index < list.size) {
> >> doIt(list(index))
> >> index+=1
> >> }
> >>
> >> is really, really bad. foreach knows how to traverse a collection
> >> efficiently. a while loop does not. it might or might not work. of
> >> course you can look into a collections' foreach-method. if a while loop
> >> is there, you can use a while loop directly to traverse the collection,
> >> avoiding function call overhead. but it's not the while loop that is
> >> faster, it's the non existent overhead that does.
> >>
> >> Am 20.04.2011 07:15, schrieb Paul Phillips:
> >>> On 4/19/11 2:28 PM, HamsterofDeath wrote:
> >>>> each collection comes with a foreach method, and that should be used
> for
> >>>> traversing it. a while loop can never be faster. it can be as fast as
> >>>> foreach, or slower. it doesn't make sense to use it.
> >>> It must take years of dogged effort to assemble misinformation of this
> >>> caliber.
> >>>
>
>
Wed, 2011-04-20, 08:17
#35
Re: Re: Help with performance for a Scala app
and here, the fiddling starts. i don't want to care about *how* it's done. i just want to write code that says *what* has to be done.
-------- Original-Nachricht --------
> Datum: Wed, 20 Apr 2011 15:56:40 +1000
> Von: Tony Morris
> An: scala-user@googlegroups.com
> Betreff: Re: [scala-user] Re: Help with performance for a Scala app
> It is really, really bad [for performance], but that's because you
> called List.apply(Int).
>
> val list = List(1,2,3,4,5.....100000)
> var x = list
> while(!x.isEmpty) {
> doIt(x.head)
> x = x.tail
> }
>
> On 20/04/11 15:54, HamsterofDeath wrote:
> > i meant:
> >
> > val list = List(1,2,3,4,5.....100000)
> > var index = 0;
> > while (index < list.size) {
> > doIt(list(index))
> > index+=1
> > }
> >
> > is really, really bad. foreach knows how to traverse a collection
> > efficiently. a while loop does not. it might or might not work. of
> > course you can look into a collections' foreach-method. if a while loop
> > is there, you can use a while loop directly to traverse the collection,
> > avoiding function call overhead. but it's not the while loop that is
> > faster, it's the non existent overhead that does.
> >
> > Am 20.04.2011 07:15, schrieb Paul Phillips:
> >> On 4/19/11 2:28 PM, HamsterofDeath wrote:
> >>> each collection comes with a foreach method, and that should be used
> for
> >>> traversing it. a while loop can never be faster. it can be as fast as
> >>> foreach, or slower. it doesn't make sense to use it.
> >> It must take years of dogged effort to assemble misinformation of this
> >> caliber.
> >>
>
>
Wed, 2011-04-20, 08:27
#36
Re: Re: Help with performance for a Scala app
On 20/04/11 16:47, Russ Paielli wrote:
> Is there a better pattern for this?
Anamorphism/unfold.
Wed, 2011-04-20, 08:57
#37
Re: Re: Help with performance for a Scala app
On Wed, Apr 20, 2011 at 12:10 AM, Tony Morris <tonymorris@gmail.com> wrote:
I took a look at the Wikipedia article on this, and it looks interesting, but unfortunately the examples are in Haskell and I don't understand them. Can you point me to some tutorial examples in Scala?
Also, am I correct in assuming that to use this pattern I will need a function for a single time step?
--Russ P.
--
http://RussP.us
On 20/04/11 16:47, Russ Paielli wrote:
> Is there a better pattern for this?
Anamorphism/unfold.
I took a look at the Wikipedia article on this, and it looks interesting, but unfortunately the examples are in Haskell and I don't understand them. Can you point me to some tutorial examples in Scala?
Also, am I correct in assuming that to use this pattern I will need a function for a single time step?
--Russ P.
--
http://RussP.us
Wed, 2011-04-20, 09:47
#38
Re: Re: Help with performance for a Scala app
On Wed, Apr 20, 2011 at 12:53 AM, Russ Paielli wrote:
> On Wed, Apr 20, 2011 at 12:10 AM, Tony Morris wrote:
>>
>> On 20/04/11 16:47, Russ Paielli wrote:
>> > Is there a better pattern for this?
>> Anamorphism/unfold.
>>
>
> I took a look at the Wikipedia article on this, and it looks interesting,
> but unfortunately the examples are in Haskell and I don't understand them.
> Can you point me to some tutorial examples in Scala?
>
> Also, am I correct in assuming that to use this pattern I will need a
> function for a single time step?
val times = Iterator.iterate(time)(_ +dt).takeWhile(_ <
maxPredTime).toIndexedSeq
val points = times.map{ time => compute position }
HTH,
David
>
> --Russ P.
>
> --
> http://RussP.us
>
Wed, 2011-04-20, 09:57
#39
Re: Re: Help with performance for a Scala app
On 20/04/11 18:43, David Hall wrote:
> On Wed, Apr 20, 2011 at 12:53 AM, Russ Paielli wrote:
>> On Wed, Apr 20, 2011 at 12:10 AM, Tony Morris wrote:
>>> On 20/04/11 16:47, Russ Paielli wrote:
>>>> Is there a better pattern for this?
>>> Anamorphism/unfold.
>>>
>> I took a look at the Wikipedia article on this, and it looks interesting,
>> but unfortunately the examples are in Haskell and I don't understand them.
>> Can you point me to some tutorial examples in Scala?
>>
>> Also, am I correct in assuming that to use this pattern I will need a
>> function for a single time step?
> val times = Iterator.iterate(time)(_ +dt).takeWhile(_ <
> maxPredTime).toIndexedSeq
> val points = times.map{ time => compute position }
>
> HTH,
> David
>
>> --Russ P.
>>
>> --
>> http://RussP.us
>>
scala> def unfold[A, B](k: B => Option[(A, B)])(z: B): List[A] = k(z)
match { case None => Nil; case Some((a, b)) => a :: unfold(k)(b) }
Note that iterate is a specialisation of this function.
scala> def iterate[A](z: A => A) = unfold((b: A) => Some(b, z(b))) _ //
does not terminate for F=List
Wed, 2011-04-20, 10:17
#40
Re: Re: Help with performance for a Scala app
On Wed, Apr 20, 2011 at 10:53 AM, Tony Morris <tonymorris@gmail.com> wrote:
On 20/04/11 18:43, David Hall wrote:
> On Wed, Apr 20, 2011 at 12:53 AM, Russ Paielli <russ.paielli@gmail.com> wrote:
>> On Wed, Apr 20, 2011 at 12:10 AM, Tony Morris <tonymorris@gmail.com> wrote:
>>> On 20/04/11 16:47, Russ Paielli wrote:
>>>> Is there a better pattern for this?
>>> Anamorphism/unfold.
>>>
>> I took a look at the Wikipedia article on this, and it looks interesting,
>> but unfortunately the examples are in Haskell and I don't understand them.
>> Can you point me to some tutorial examples in Scala?
>>
>> Also, am I correct in assuming that to use this pattern I will need a
>> function for a single time step?
> val times = Iterator.iterate(time)(_ +dt).takeWhile(_ <
> maxPredTime).toIndexedSeq
> val points = times.map{ time => compute position }
>
> HTH,
> David
>
>> --Russ P.
>>
>> --
>> http://RussP.us
>>
scala> def unfold[A, B](k: B => Option[(A, B)])(z: B): List[A] = k(z)
match { case None => Nil; case Some((a, b)) => a :: unfold(k)(b) }
A bit too specialized? Unfoldable?
Note that iterate is a specialisation of this function.
scala> def iterate[A](z: A => A) = unfold((b: A) => Some(b, z(b))) _ //
does not terminate for F=List
What F are you referring to?
--
Tony Morris
http://tmorris.net/
--
Viktor Klang,
Director of Research and Development
Scalable Solutions
Code: github.com/viktorklang
Follow: twitter.com/viktorklang
Read: klangism.tumblr.com
Wed, 2011-04-20, 10:27
#41
Re: Re: Help with performance for a Scala app
On 20/04/11 19:07, √iktor Ҡlang wrote:
> On Wed, Apr 20, 2011 at 10:53 AM, Tony Morris wrote:
>
>> On 20/04/11 18:43, David Hall wrote:
>>> On Wed, Apr 20, 2011 at 12:53 AM, Russ Paielli
>> wrote:
>>>> On Wed, Apr 20, 2011 at 12:10 AM, Tony Morris
>> wrote:
>>>>> On 20/04/11 16:47, Russ Paielli wrote:
>>>>>> Is there a better pattern for this?
>>>>> Anamorphism/unfold.
>>>>>
>>>> I took a look at the Wikipedia article on this, and it looks
>> interesting,
>>>> but unfortunately the examples are in Haskell and I don't understand
>> them.
>>>> Can you point me to some tutorial examples in Scala?
>>>>
>>>> Also, am I correct in assuming that to use this pattern I will need a
>>>> function for a single time step?
>>> val times = Iterator.iterate(time)(_ +dt).takeWhile(_ <
>>> maxPredTime).toIndexedSeq
>>> val points = times.map{ time => compute position }
>>>
>>> HTH,
>>> David
>>>
>>>> --Russ P.
>>>>
>>>> --
>>>> http://RussP.us
>>>>
>> scala> def unfold[A, B](k: B => Option[(A, B)])(z: B): List[A] = k(z)
>> match { case None => Nil; case Some((a, b)) => a :: unfold(k)(b) }
>>
> A bit too specialized? Unfoldable?
>
>
>> Note that iterate is a specialisation of this function.
>>
>> scala> def iterate[A](z: A => A) = unfold((b: A) => Some(b, z(b))) _ //
>> does not terminate for F=List
>>
> What F are you referring to?
>
>
>> --
>> Tony Morris
>> http://tmorris.net/
>>
>>
>>
>
Yes unfold can be generalised, and is done so in some crazy library out
there. F is a type constructor on which you can generalise for a
polymorphic unfold.
Wed, 2011-04-20, 13:07
#42
Re: Re: Help with performance for a Scala app
On Wed, Apr 20, 2011 at 1:15 AM, Paul Phillips <paulp@improving.org> wrote:
--
Jim Powers
On 4/19/11 2:28 PM, HamsterofDeath wrote:HamsterofDeath, for your consideration: http://vimeo.com/20290504
> each collection comes with a foreach method, and that should be used for
> traversing it. a while loop can never be faster. it can be as fast as
> foreach, or slower. it doesn't make sense to use it.
--
Jim Powers
Wed, 2011-04-20, 15:37
#43
Re: Re: Help with performance for a Scala app
Nathan's first two examples are broken.
First, the Java for loop is optimized by flattening the inner loop--it doesn't even _run_ the innermost loop (and maybe not the others either) since it knows that += 1 2000 times is the same as += 2000.
Second, the follow-up while loop doesn't increment s, so it doesn't even do the same thing.
Looking through quickly, I think the other things made more sense, but making two basic mistakes at the very beginning of microbenchmarking doesn't give one a lot of confidence.
--Rex
On Wed, Apr 20, 2011 at 8:06 AM, Jim Powers <jim@casapowers.com> wrote:
First, the Java for loop is optimized by flattening the inner loop--it doesn't even _run_ the innermost loop (and maybe not the others either) since it knows that += 1 2000 times is the same as += 2000.
Second, the follow-up while loop doesn't increment s, so it doesn't even do the same thing.
Looking through quickly, I think the other things made more sense, but making two basic mistakes at the very beginning of microbenchmarking doesn't give one a lot of confidence.
--Rex
On Wed, Apr 20, 2011 at 8:06 AM, Jim Powers <jim@casapowers.com> wrote:
On Wed, Apr 20, 2011 at 1:15 AM, Paul Phillips <paulp@improving.org> wrote:On 4/19/11 2:28 PM, HamsterofDeath wrote:HamsterofDeath, for your consideration: http://vimeo.com/20290504
> each collection comes with a foreach method, and that should be used for
> traversing it. a while loop can never be faster. it can be as fast as
> foreach, or slower. it doesn't make sense to use it.
--
Jim Powers
Wed, 2011-04-20, 15:57
#44
Re: Re: Help with performance for a Scala app
>>>>> "Rex" == Rex Kerr writes:
Rex> Nathan's first two examples are broken. [...]
(Nathan uploaded the video, but the speaker is Nermin.)
_population.takeRight(popSize - elitismCount)
_population(Random.nextInt(_population.length))
(_population(idx).fitness < parents(i).fitness)
takeRight and apply have really bad performance characteristics for List. you should definitely use another collection type such as Vector.
http://www.scala-lang.org/docu/files/collections-api/collections_40.html
since your implementation is single-threaded, you may also try to use mutable collections instead.
best, -sciss-
On 15 Apr 2011, at 20:40, John wrote:
> Hello Scala masters,
>
> I'm looking for help with an implementation that I've done for an open
> source project that I've been fiddling with. The issue that I'm
> facing is that the Scala implementation is *painfully* slow compared
> to the Java and Clojure implementations, and it's likely due to my
> weak Scala Kung Fu. The code for Scala can be found here:
>
> https://github.com/jsvazic/GAHelloWorld/tree/master/scala
>
> And the main project can be found here:
>
> https://github.com/jsvazic/GAHelloWorld
>
> Can anyone assist with a quick review and offer any suggestions? TIA