This page is no longer maintained — Please continue to the home page at www.scala-lang.org

collections sortBy() method

1 reply
Ivan Todoroski
Joined: 2009-05-12,
User offline. Last seen 42 years 45 weeks ago.

Hello Scala developers,

I hope this is the right list for requests like this.

Since you are already redesigning the Collections framework, would you
mind adding something like the following method right next to wherever
the "sort((A,A) => Boolean)" method is defined?

def sortBy[X <% Ordered[X](key: A => X) =
sort((x, y) => key(x) < key(y))

The motivation is that it would prevent code duplication and unnecessary
verbosity in many common uses of sorting.

You can probably see where I'm going with this, but for completness
let's take for example a simple Employee class with fields such as id,
firstname, lastname, department, salary, etc. Let's further assume that
there is no single canonical ordering for employees i.e. we can't just
implement Ordered because we need to sort lists of employees differently
throughout the application.

Sorting by salary:

val employees: List[Employee] = ...
employees sort {(e1,e2) => e1.salary < e2.salary}
employees sortBy {e => e.salary}

To me, the sortBy() version seems clearer and more straightforward, you
simply say what you mean: sort by salary. It becomes especially clearer
when you need to sort by multiple keys, e.g. sorting by last name
followed by first name.

employees sortBy {e => (e.lastname, e.firstname)}

Contrast this with the sort() version:

employees sort {(e1,e2) =>
if (e1.lastname == e2.lastname)
e1.firstname < e2.firstname
else
e1.lastname < e2.lastname
}

or even just borrowing the Tuple trick from sortBy:

employees sort {(e1,e2) => (e1.lastname, e1.firstname) < (e2.lastname,
e2.firstname)}

The sort() version is less clear and more error prone when the time
comes to change the sorting keys, you can easily forget to make the same
changes on both sides of the comparison operator and end up with bugs
like "e1.firstname < e2.lastname" and the like.

Regarding the implementation of sortBy(), if you want to get fancy there
is something called the Schwartzian transform which supposedly helps
performance by extracting the key to sort by only once per object, you
can read more about it here:

http://joelneely.wordpress.com/2008/03/29/sorting-by-schwartzian-transfo...

Personally I never needed to use such a complicated key function that
invoking it only once per object became important. I only care about the
added expressiveness and clarity provided by the sortBy() method, the
internal implementation is up to you.

Also, if you decide to incorporate this method, you might want to add
sortByInverse() for completeness, although even without it it wouldn't
be hard to come up with an adapter function Ordered[X] => Ordered[X]
that inverts the comparison, e.g.

employees sortBy {e => inverse(e.salary)}

or even combine different sort orders in compound keys:

employees sortBy {e => (e.lastname, inverse(e.firstname))}

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: collections sortBy() method
I much like this suggestion, and I see that moving sort to IterableTemplate is still TODO. The Schwartzian sort seems definitely like the thing to do in my opinion. Its definition is pretty straight-forward:   def sortBy[V, K <% Ordered[K]] (f: V => K) (vs: List[V]): List[V] = {
    (vs map f zip vs) sort (_._1 < _._1) map (_._2)
 }
As a method, of couse, "vs" wouldn't be required -- it's "this". IterableTemplate does not have "zip", but "vs map (v => (f(v), v))" would do just as fine. I'd take the opportunity and pass an Option[Array[Boolean]] with default value None and arity K.productArity it K is a tuple, 1 if not, to optionally swap tuple members between the elements being compared before comparing, to achieve selective reverse ordering by component.

  On Fri, Jun 19, 2009 at 10:43 AM, Ivan Todoroski <grnch_lists@gmx.net> wrote:
Hello Scala developers,

I hope this is the right list for requests like this.

Since you are already redesigning the Collections framework, would you mind adding something like the following method right next to wherever the "sort((A,A) => Boolean)" method is defined?

def sortBy[X <% Ordered[X](key: A => X) =
 sort((x, y) => key(x) < key(y))


The motivation is that it would prevent code duplication and unnecessary verbosity in many common uses of sorting.

You can probably see where I'm going with this, but for completness let's take for example a simple Employee class with fields such as id, firstname, lastname, department, salary, etc. Let's further assume that there is no single canonical ordering for employees i.e. we can't just implement Ordered because we need to sort lists of employees differently throughout the application.


Sorting by salary:

val employees: List[Employee] = ...
employees sort {(e1,e2) => e1.salary <  e2.salary}
employees sortBy {e => e.salary}

To me, the sortBy() version seems clearer and more straightforward, you simply say what you mean: sort by salary. It becomes especially clearer when you need to sort by multiple keys, e.g. sorting by last name followed by first name.

employees sortBy {e => (e.lastname, e.firstname)}

Contrast this with the sort() version:

employees sort {(e1,e2) =>
   if (e1.lastname == e2.lastname)
       e1.firstname < e2.firstname
   else
       e1.lastname < e2.lastname
}

or even just borrowing the Tuple trick from sortBy:

employees sort {(e1,e2) => (e1.lastname, e1.firstname) < (e2.lastname, e2.firstname)}

The sort() version is less clear and more error prone when the time comes to change the sorting keys, you can easily forget to make the same changes on both sides of the comparison operator and end up with bugs like "e1.firstname < e2.lastname" and the like.


Regarding the implementation of sortBy(), if you want to get fancy there is something called the Schwartzian transform which supposedly helps performance by extracting the key to sort by only once per object, you can read more about it here:

http://joelneely.wordpress.com/2008/03/29/sorting-by-schwartzian-transform-in-scala/

Personally I never needed to use such a complicated key function that invoking it only once per object became important. I only care about the added expressiveness and clarity provided by the sortBy() method, the internal implementation is up to you.

Also, if you decide to incorporate this method, you might want to add sortByInverse() for completeness, although even without it it wouldn't be hard to come up with an adapter function Ordered[X] => Ordered[X] that inverts the comparison, e.g.

employees sortBy {e => inverse(e.salary)}

or even combine different sort orders in compound keys:

employees sortBy {e => (e.lastname, inverse(e.firstname))}



--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland