
LOC ratios wrt java

28 replies
Miles Egan
Joined: 2010-07-05,

I've only done a handful of fairly trivial things in Scala and I'm
sure I'm not writing the most idiomatic and concise Scala code
possible.

That said though, I've been surprised at how little code compression
I've gotten compared to the equivalent Java versions. The kind of Java
code that can be translated to pipelines of sequences with anonymous
functions in Scala seems to be the biggest win but even in fairly
algorithm-heavy apps I'm only seeing a reduction of about 10-20% LOC.

Is this typical or are people seeing much bigger gains?

Kevin Wright 2
Joined: 2010-05-30,
Re: LOC ratios wrt java
Coming from Java, this is normal. There's definitely a limit to how far you can drop your LOC if you're still structuring/architecting programs in the same way.
You'll probably go through 4 or 5 "ah-hah" moments before you're reaping the full benefit of a more functional style. This covers everything from familiarity with the collections framework, all the way through to grokking typeclasses.




bmaso
Joined: 2009-10-04,
Re: LOC ratios wrt java

I see tons of compassion, especially where genetics would be used, for
comprehensions, implicits can be taken advantage of.

Also, strong typing and Option[_]/Either[_] is lowering need for unit
tests, which saves a lot of code as well.

Brian Maso


Miles Egan
Joined: 2010-07-05,
Re: LOC ratios wrt java

I'd appreciate it very much if anybody would be willing to give me
some advice on a particular case. I'm writing a simple recommendation
engine based on the GroupLens data for movie ratings. The java version
is here:
http://github.com/cageface/brains/blob/0ee0bfce5bda549e7a3f778ba6a6ef82a...

And the scala version is here:
http://github.com/cageface/brains/blob/plain_java/recommend/movielens.scala

Surprisingly they seem to be about the same length. I could probably
compress the pearsonCorrelation function into shorter, more
declarative code. Maybe this example is just too trivial.

On Wed, Aug 4, 2010 at 10:54 AM, Kevin Wright wrote:
> You'll probably go through 4 or 5 "ah-hah" moments before you're reaping the
> full benefit of a more functional style.
> This covers everything from familiarity with the collections framework, all
> the way through to grokking typeclasses

nilskp
Joined: 2009-01-30,
Re: LOC ratios wrt java
On Wed, Aug 4, 2010 at 12:43 PM, Miles Egan <milesegan@gmail.com> wrote:
> I've only done a handful of fairly trivial things in Scala and I'm
> sure I'm not writing the most idiomatic and concise Scala code
> possible.
>
> That said though, I've been surprised at how little code compression
> I've gotten compared to the equivalent Java versions. The kind of Java
> code that can be translated to pipelines of sequences with anonymous
> functions in Scala seems to be the biggest win but even in fairly
> algorithm-heavy apps I'm only seeing a reduction of about 10-20% LOC.

This sounds about what I would expect, particularly if you still keep your code readable. You can almost go Perl with Scala, but you probably wouldn't want that.

Eric Leman
Joined: 2010-05-25,
Re: LOC ratios wrt java

Once you start using libraries, tools and frameworks written for Scala,
you'll see more boilerplate reduction. Swap your favorite Java build tool for
SBT, for example, and you'll see a lot of shrinkage right there.
Scala DSLs are another example where you can get a lot of boilerplate reduction.


--
Eric

H-star Development
Joined: 2010-04-14,
Re: LOC ratios wrt java

i rely heavily on scala's collections, and compared to what i had to
write in java, i'm at 50% or even less of code.


Kevin Wright 2
Joined: 2010-05-30,
Re: LOC ratios wrt java
One sure sign that you're still "thinking in Java" is lots of vars and mutable collections. So definite room for improvement here.

Do you have the 3 input files that you're using to test with? They'd help anyone looking to suggest alternatives :)



channingwalton
Joined: 2008-09-27,
Re: LOC ratios wrt java

I did a port of a Java library to Scala as my first Scala project. The design
of the library is identical to the Java version so all the compression is at
a class or method level. Overall I got the scala to 60% of the java, and
some parts were under 50%. The library is also doing a lot of IO which
didn't compress well.

The Java and Scala versions are here: http://www.flyobjectspace.com/

I am very sure that writing the Scala library from scratch will result in
smaller code.

Dan Kang
Joined: 2010-08-03,
Re: LOC ratios wrt java
Hi Miles,

First of all, use of collections in your scala code is neither idiomatic, nor pragmatic.  If performance is important, don't rely on Seq[A]() to return a collection of the desired performance characteristic.  I'm not sure what it returns in 2.8 but you're using it to replace both List and Array in your java example.

Also, a pattern like below is almost certainly not what you intended:

    var commonA = Seq[Double]()
    ...
    commonA = commonA :+ (kv._2 - meanAllA)

No matter what collection object Seq[Double]() returns, this is disastrous.  If it's a list, you don't want to add elements at the end.  If it's a vector, you don't want to keep reallocating a new vector.  If you want mutability, get a mutable object (Buffer) and update it.

Second, this code block in your pearsonCorrelation is just not right.  If standard deviation for both samples is zero (I don't know what technically correlation should be in that case, as you have effectively a single data point), none of the rest of the calculation you perform here is variant.  indA and indB (by the way, you're simply calculating the difference between the first and the last values in each collection) are always going to be zero, there's no need to recheck the value of devA, etc, etc.

    if (devA == 0.0 || devB == 0.0) {
      var indA = 0.0
      var indB = 0.0
      (1 until a.size).foreach { i =>
        indA += a(i - 1) - a(i)
        indB += b(i - 1) - b(i)
      }

      if (indA == 0.0 && indB == 0.0) {
        // degenerate correlation, all points the same
        return 1.0
      }
      else if (devA == 0) {
        // otherwise either a or b vary
        devA = devB
      }
      else {
        devB = devA
      }
    }

Overall, the scala code is pretty much a direct translation of the java code with limited replacement of some loop constructs.  In that case, any perceived improvement is just syntax - it would largely mean that scala is more syntactically terse (which it is) not that it offers tangible improvements in reduction of code complexity.  As others have said, you'd have to learn to decompose the problem differently to fully take advantage of functional constructs in Scala.

With that said, idiomatic scala isn't yet a great fit for this particular domain (number crunching) - I find that I often have to avoid functional constructs to get decent performance when iterating over large amounts of numbers - but that's another topic altogether.

Dan

On Wed, Aug 4, 2010 at 1:58 PM, Miles Egan <milesegan@gmail.com> wrote:
I'd appreciate it very much if anybody would be willing to give me
some advice on a particular case. I'm writing a simple recommendation
engine based on the GroupLens data for movie ratings. The java version
is here:
http://github.com/cageface/brains/blob/0ee0bfce5bda549e7a3f778ba6a6ef82a441339e/recommend/MovieLens.java

And the scala version is here:
http://github.com/cageface/brains/blob/plain_java/recommend/movielens.scala

Surprisingly they seem to be about the same length. I could probably
compress the pearsonCorrelation function into shorter, more
declarative code. Maybe this example is just too trivial.

On Wed, Aug 4, 2010 at 10:54 AM, Kevin Wright <kev.lee.wright@gmail.com> wrote:
> You'll probably go through 4 or 5 "ah-hah" moments before you're reaping the
> full benefit of a more functional style.
> This covers everything from familiarity with the collections framework, all
> the way through to grokking typeclasses

--
miles

Miles Egan
Joined: 2010-07-05,
Re: LOC ratios wrt java

I'm using the grouplens data from this site:
http://grouplens.org/node/73#attachments

But I preprocessed it slightly to make it easier to deal with and
posted it here:
http://www.burgerkone.com/tmp/100k.zip

The format is simple. It's three text files, with lines of items
delimited by either a pipe or whitespace. The important one is
ratings.dat, which contains lines with an integer user id, an integer
movie id, and an integer 1-5 rating.

Thanks for taking a look.

On Wed, Aug 4, 2010 at 12:13 PM, Kevin Wright wrote:
> Do you have the 3 input files that you're using to test with? they'd help
> anyone looking to suggest alternatives :)

richard emberson
Joined: 2010-03-22,
Re: LOC ratios wrt java

I've always told friends who are still doing Java that when
comparing the standard Scala and Java libraries, what takes
10 lines of Java only takes 2 lines of Scala ... and that
Scala compensates by having far less in-code documentation :-)

Of course, to be fair, Scala is still undergoing rapid
development and does not have a big corporation behind it
funding the production of scaladoc comments. It is my guess
that the Scala community would rather have the Scala team
evolve Scala rather than produce documentation.... though,
still, it is too bad that there's not some corporation
stepping up to fund comment generation.

Richard


Miles Egan
Joined: 2010-07-05,
Re: LOC ratios wrt java

Thanks very much for the feedback. Comments below.

On Wed, Aug 4, 2010 at 12:28 PM, Dan Kang wrote:
> First of all, use of collections in your scala code is neither idiomatic,
> nor pragmatic.  If performance is important, don't rely on Seq[A]() to
> return a collection of the desired performance characteristic.  I'm not sure
> what it returns in 2.8 but you're using it to replace both List and Array in
> your java example.

Good point. I'm focusing mainly on just getting this working because
the subject matter is new but I'm sure you're right.

> Also, a pattern like below is almost certainly not what you intended
> var commonA = Seq[Double]()
> ...
> commonA = commonA :+ (kv._2 - meanAllA)
> No matter what collection object Seq[Double]() returns, this is disastrous.
>  If it's a list, you don't want to add elements at the end.  If it's a
> vector, you don't want to keep reallocating a new vector.

I'm a little confused on this point:

scala> val s = collection.mutable.Buffer[Int]()
s: scala.collection.mutable.Buffer[Int] = ArrayBuffer()

scala> s :+ 1
res7: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1)

scala> s
res8: scala.collection.mutable.Buffer[Int] = ArrayBuffer()

What's the correct way to append to a mutable collection?

> Second, this code block in your pearsonCorrelation is just not right.  If
> standard deviation for both samples is zero (I don't know what technically
> correlation should be in that case, as you have effectively a single data
> point), none of the rest of the calculation you perform here is variant.

Statistics is definitely not my forté - this is just something I
cribbed from a book. I'll revisit this.

> As others have said, you'd
> have to learn to decompose the problem differently to fully take advantage
> of functional constructs in Scala.

I'm still figuring out how to approach these things in Scala. For more
complex functional iterations in Clojure I use loop/recur if I need to
accumulate multiple values across each loop. I'm not sure what the
equivalent is in Scala. Maybe a recursive function or a for
comprehension?

> With that said, idiomatic scala isn't yet a great fit for this particular
> domain (number crunching) - I find that I often have to avoid functional
> constructs to get decent performance when iterating over large amounts of
> numbers - but that's another topic altogether.

Yeah I'm sure that to really make this efficient I'd probably have to
do this in a predominantly imperative style. Unlike the ruby version
it replaced though, it's fast enough to explore the ideas.

Randall R Schulz
Joined: 2008-12-16,
Re: LOC ratios wrt java

On Wednesday August 4 2010, Brian Maso wrote:
> I see tons of compassion, especially where genetics would be used,
> for comprehensions, implicits can be taken advantage of.

Compassion? Genetics?

Empathetic eugenics??

> ...
>
> Brian Maso

RRS

mneeley
Joined: 2010-05-06,
Re: LOC ratios wrt java

Here are a few thoughts on some ways to make your scala code a bit
more idiomatic (at least according to my still-developing understanding
of scala idioms :-) ).  These are essentially statement- or
expression-level changes, not architectural changes, which probably
provide bigger opportunities for major LOC savings. Also, as others
have mentioned, the compute-intensive parts are probably best left in
imperative style (for example, the map in standardDeviation is
probably quite wasteful because it then just gets summed over, and
Array is probably better than Seq in many cases, if marginally less
flexible).

<<<
userFile.getLines.foreach { i =>
 val parts = i.split(separator)
 userRatings.getOrElseUpdate(parts(0).toInt, mut.Map[Int,Int]())
}

movieFile.getLines.foreach { i =>
val parts = i.split(separator)
movies(parts(0).toInt) = parts(1)
movieRatings.getOrElseUpdate(parts(0).toInt, mut.Map[Int,Int]())
}

ratingFile.getLines.foreach { i =>
val parts = i.split("""\s""").map(_.toInt)
movieRatings(parts(1))(parts(0)) = parts(2)
userRatings(parts(0))(parts(1)) = parts(2)
}
>>>
for (line <- userFile.getLines) {
  val num :: rest = line.split(separator).toList
  userRatings.getOrElseUpdate(num.toInt, mut.Map[Int, Int]())
}

for (line <- movieFile.getLines) {
  // match only the first two fields; any remaining fields are ignored
  val num :: name :: _ = line.split(separator).toList
  movies(num.toInt) = name
  movieRatings.getOrElseUpdate(num.toInt, mut.Map[Int, Int]())
}

for (line <- ratingFile.getLines) {
  val user :: movie :: rating :: Nil = line.split("""\s""").map(_.toInt).toList
  movieRatings(movie)(user) = rating
  userRatings(user)(movie) = rating
}

IMHO the for comprehension is often cleaner and more readable than the
equivalent expression using foreach and sometimes also map or flatMap
(but it's just syntactic sugar, so no less performant). Also note the
pattern match assignments, which make the code nice and
self-documenting. No LOC savings, though.

<<<
var commonA = Seq[Double]()
var commonB = Seq[Double]()

movieRatings(a).foreach { kv =>
 movieRatings(b).get(kv._1) match {
   case Some(v) => {
     commonA = commonA :+ (kv._2 - meanAllA)
     commonB = commonB :+ (v - meanAllB)
   }
   case None => ()
 }
}
>>>
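val commons = for {
  // toSeq so the result is a sequence of pairs; yielding pairs straight from
  // a Map would build another Map and silently drop duplicate first elements
  (user, aRating) <- movieRatings(a).toSeq
  bRating <- movieRatings(b).get(user)
} yield (aRating - meanAllA, bRating - meanAllB)
val (commonA, commonB) = commons.unzip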

Note the awesomeness of pattern matching in the assignment of each
iteration variable.  Also note that because the second nested
iteration over movieRatings(b).get(user) does a flatMap on the
Option[Int] returned by the get, it does the right thing with the None
case and basically ignores it. The for-comprehension creates a
sequence of tuples, which is then unzipped to get the individual sequences.
This is potentially a place where an imperative solution will give
better performance, but you can't beat the for-comprehension for
conciseness. I'm also not sure if this is the best way to indent code
like this, but it's one possibility.
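
To make the Option behaviour concrete, here is a small self-contained sketch with made-up ratings (the object name, map names and numbers are purely illustrative, not taken from the movielens code). The toSeq is there so the comprehension yields a sequence of pairs rather than a Map:

```scala
object CommonRatingsSketch {
  // user id -> rating for two hypothetical movies
  val ratingsA = Map(1 -> 5, 2 -> 3, 3 -> 4)
  val ratingsB = Map(2 -> 4, 3 -> 4, 4 -> 1)

  // Only users present in both maps survive: a None from get(user)
  // contributes nothing to the result.
  val common: Seq[(Int, Int)] = for {
    (user, a) <- ratingsA.toSeq
    b <- ratingsB.get(user)
  } yield (a, b)

  // Split the pairs back into two parallel sequences.
  val (commonA, commonB) = common.unzip
}
```

Users 1 and 4 appear in only one map, so only the pairs for users 2 and 3 end up in common.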

<<<
val meanA = mean(a)
var devA = standardDeviation(meanA, a)
val meanB = mean(b)
var devB = standardDeviation(meanB, b)
>>>
val (meanA, devA) = meanAndStdDev(a)
val (meanB, devB) = meanAndStdDev(b)

Small savings here in LOC, and obviously you have to write the method
meanAndStdDev, but you don't have to pass the mean into the std dev.
function as a separate parameter (which is just begging for a typo
error) and you still get the advantage of not recomputing the mean to
find the std. deviation. The lesson, though, is don't forget that
it's easy to return multiple values from a function in scala!
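
A possible shape for meanAndStdDev, as a rough sketch only: it assumes the input is a non-empty Seq[Double] and that the population standard deviation (dividing by n) is what's wanted, which may not match the formula in the original code.

```scala
object StatsSketch {
  // Returns (mean, population standard deviation) as a tuple.
  // Assumes xs is non-empty.
  def meanAndStdDev(xs: Seq[Double]): (Double, Double) = {
    val n = xs.size
    val mean = xs.sum / n
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / n
    (mean, math.sqrt(variance))
  }
}
```

The caller then unpacks both values in one line, e.g. val (m, d) = StatsSketch.meanAndStdDev(values).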

<<<
var sims = Seq[Tuple2[Double,String]]()
movies.foreach { m =>
  val sim = set.similarity(86, m)
  sims = sims :+ (sim, set.movies(m))
}
sims.sorted.foreach { i =>
//printf("%3f %s\n", i._1, i._2)
}
>>>
val sims = for (movie <- movies) yield (set.similarity(86, movie), set.movies(movie))
for ((sim, name) <- sims.toList.sorted) {
  // do your thing
}

Once again, for-comprehension FTW.

This code is probably not typo-free, but you get the idea :-)

-Matthew


Dan Kang
Joined: 2010-08-03,
Re: LOC ratios wrt java
On Wed, Aug 4, 2010 at 4:12 PM, Miles Egan <milesegan@gmail.com> wrote:
> I'm a little confused on this point:
>
> scala> val s = collection.mutable.Buffer[Int]()
> s: scala.collection.mutable.Buffer[Int] = ArrayBuffer()
>
> scala> s :+ 1
> res7: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1)
>
> scala> s
> res8: scala.collection.mutable.Buffer[Int] = ArrayBuffer()
>
> What's the correct way to append to a mutable collection?

+= to append, +=: to prepend.  Generally speaking, if you don't see = somewhere in the operator, the collection itself doesn't get additional elements.  
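
A minimal sketch of that rule of thumb, using an ArrayBuffer (the object and value names are just for illustration):

```scala
import scala.collection.mutable.ArrayBuffer

object AppendSketch {
  val buf = ArrayBuffer[Int]()

  // :+ has no '=' in it: it builds a new collection and leaves buf untouched.
  val copy = buf :+ 1

  // += appends in place, +=: prepends in place.
  buf += 2
  0 +=: buf
}
```

After this runs, copy holds only the 1, while buf itself holds 0 followed by 2.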

> I'm still figuring out how to approach these things in Scala. For more
> complex functional iterations in Clojure I use loop/recur if I need to
> accumulate multiple values across each loop. I'm not sure what the
> equivalent is in Scala. Maybe a recursive function or a for
> comprehension?

I'm not sure what you mean here - any looping construct (for, while, etc) would allow you to accumulate any number of values across each loop.  Do you mean syntactic support for accumulators in a more functional style loop?  Syntactic sugar is nice, but I'd focus more on semantics.  Loops are a solved problem - you can syntactically transform simple recursive functions into a loop or vice versa easily.  The reason introductory literature on functional programming shows how to transform simple iterations into recursive or higher-order function counterparts is that iterations are easy to understand.  To get real savings in terms of complexity, you have to migrate from merely using existing higher-order functions to being able to define your own.
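
For what it's worth, here is a rough sketch of two common ways to carry several accumulators through an iteration: a foldLeft over a tuple, and a tail-recursive helper, which is probably the closest thing in shape to loop/recur. The function names are made up for the example.

```scala
import scala.annotation.tailrec

object AccumulatorSketch {
  // foldLeft with a tuple accumulator: sum and sum of squares in one pass.
  def sums(xs: Seq[Double]): (Double, Double) =
    xs.foldLeft((0.0, 0.0)) { case ((s, sq), x) => (s + x, sq + x * x) }

  // The same computation as a tail-recursive helper; @tailrec makes the
  // compiler verify that it compiles down to a loop.
  def sumsRec(xs: List[Double]): (Double, Double) = {
    @tailrec
    def loop(rest: List[Double], s: Double, sq: Double): (Double, Double) =
      rest match {
        case Nil     => (s, sq)
        case x :: tl => loop(tl, s + x, sq + x * x)
      }
    loop(xs, 0.0, 0.0)
  }
}
```

Both versions return the same pair; the fold is shorter, the recursive helper makes the accumulators explicit as parameters.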
Dan

Miles Egan
Joined: 2010-07-05,
Re: LOC ratios wrt java

Thanks very much. That's not a huge LOC decrease but it's definitely
much more readable. The tuple unpacking syntax and for comprehensions
definitely make things clearer. I've shied away from for
comprehensions for no good reason.

One thing I still don't understand about them. This doesn't work:
scala> for (i <- 1 to 10
| j <- 1 to 10) yield (i, j)
:2: error: ')' expected but '<-' found.
j <- 1 to 10) yield (i, j)
^

but with brackets it does:
scala> for { i <- 1 to 10
| j <- 1 to 10 } yield (i, j)

It's still not entirely clear to me in which cases brackets and parens
are equivalent in Scala.
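
For reference, a sketch of the two spellings side by side. As far as the for syntax goes, newlines are not treated as separators inside parentheses, so the generators need an explicit semicolon there, while inside braces a line break is enough (the object and value names are only for illustration):

```scala
object ForSyntaxSketch {
  // Parentheses: generators separated by a semicolon.
  val withParens = for (i <- 1 to 3; j <- 1 to 2) yield (i, j)

  // Braces: one generator per line, no semicolons needed.
  val withBraces = for {
    i <- 1 to 3
    j <- 1 to 2
  } yield (i, j)
}
```

Both forms produce the same sequence of pairs.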

On Wed, Aug 4, 2010 at 2:16 PM, Matthew Neeley wrote:
> Here are a few thoughts on some ways to make your scala code a bit
> more idiomatic (at least according to my still-developing understanding
> of scala idioms :-) ).

Miles Egan
Joined: 2010-07-05,
Re: LOC ratios wrt java

On Wed, Aug 4, 2010 at 2:27 PM, Dan Kang wrote:
>> What's the correct way to append to a mutable collection?
>
> += to append, +=: to prepend.  Generally speaking, if you don't see =
> somewhere in the operator, the collection itself doesn't get additional
> elements.

I couldn't find those operators on mutable.Seq but I guess the right
way to do this is to use a buffer?

> I'm not sure what you mean here - any looping construct (for, while, etc)
> would allow you to accumulate any number of values across each loop.  Do you
> mean syntactic support for accumulators in a more functional style loop?

Yeah, the same way you'd use a recursive function with multiple args
to accumulate across recursions. I guess you can do most of the same
things with a for comp, right?

Danielk
Joined: 2009-06-08,
Re: LOC ratios wrt java
It is interesting to see Scala and Java code doing the same thing side by side!

(btw there is a typo in the java code for similarity - && should be || on the first row)

> The kind of Java
> code that can be translated to pipelines of sequences with anonymous
> functions in Scala seems to be the biggest win

Another big win is that you can remove boilerplate. A simple example from your code is that mean can be turned into a one-liner:

def mean[T <% Double](values:Seq[T]) = values.foldLeft(0d)(_ + _) / values.size

A LOC reduction of 60 % for this particular method :)
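
An alternative sketch of the same method using the standard library's Numeric type class instead of a view bound. This is just another way to write it, not what the posted code does, and like the one-liner it assumes a non-empty input:

```scala
object MeanSketch {
  // Works for any element type with a Numeric instance (Int, Double, ...).
  def mean[T](values: Seq[T])(implicit num: Numeric[T]): Double =
    values.map(num.toDouble).sum / values.size
}
```

It is called the same way, e.g. MeanSketch.mean(Seq(1, 2, 3)).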


The gains become huge for simple data container classes:

First, a simple immutable container with correct equals and hashCode in Java (to be fair I should mention that Eclipse generates practically all this code for me, and the hideous equals and hashCode can look a lot nicer if manually written):

public class Car {
    private final String make;
    private final String model;
       
    public Car(String make, String model) {
        this.make = make;
        this.model = model;
    }

    public String getMake () {
        return make;
    }

    public String getModel () {
        return model;
    }

    @Override public int hashCode () {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((make == null) ? 0 : make.hashCode ());
        result = prime * result + ((model == null) ? 0 : model.hashCode ());
        return result;
    }

    @Override public boolean equals (Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass () != obj.getClass ())
            return false;
        Car other = (Car) obj;
        if (make == null) {
            if (other.make != null)
                return false;
        } else if (!make.equals (other.make))
            return false;
        if (model == null) {
            if (other.model != null)
                return false;
        } else if (!model.equals (other.model))
            return false;
        return true;
    }

   @Override public String toString () {
        return "Car [make=" + make + ", model=" + model + "]";
    }
}

BAM! 50 LOC for a trivial, immutable class with only two fields.

The scala equivalent:

case class Car (make: String, model: String)

Is not only about 98 % shorter (1 line instead of 50), but also much nicer to work with because of copy methods (with named arguments), pattern matching and so on. And the difference gets even larger of course when there are more fields.
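
A short sketch of what the one-line case class gives you for free; the car values are of course made up:

```scala
object CaseClassSketch {
  case class Car(make: String, model: String)

  val a = Car("Saab", "900")
  val b = Car("Saab", "900")

  // Structural equals and hashCode are generated by the compiler.
  val same = a == b && a.hashCode == b.hashCode

  // copy with a named argument changes one field and keeps the rest.
  val c = a.copy(model = "9-3")

  // Pattern matching works through the generated extractor.
  val description = c match {
    case Car("Saab", model) => "a Saab " + model
    case Car(make, model)   => make + " " + model
  }
}
```

Here same is true even though a and b are distinct instances, and description comes out as "a Saab 9-3".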

Another boilerplate-reducing feature is when you can use mixins instead of delegation.









Miles Egan
Joined: 2010-07-05,
Re: LOC ratios wrt java

On Wed, Aug 4, 2010 at 4:38 PM, Daniel Kristensen
wrote:
> It is interesting to see Scala and Java code doing the same thing side by
> side!

Probably not very good scala code but I agree it's interesting to try
to solve the same problem in both languages.

> (btw there is a typo in the java code for similarity - && should be || on
> the first row)

Good catch. Thanks.

> Another big win is that you can remove boilerplate. A simple example from
> your code is that mean can be turned into a one-liner:
>
> def mean[T <% Double](values:Seq[T]) = values.foldLeft(0d)(_ + _) /
> values.size

Yeah that's a good example.

> The gains become huge for simple data container classes:

Also a good example. It's horrific how much boilerplate you have to
write for some simple things in Java.

Kevin Wright 2
Joined: 2010-05-30,
Re: LOC ratios wrt java
First, some small stuff.
Functions with a single statement don't need enclosing braces, you can also use type inference, so
def openFile(path:String):io.BufferedSource = {
  new io.BufferedSource(new java.io.FileInputStream(path))
}
becomes
def openFile(path:String) = new io.BufferedSource(new java.io.FileInputStream(path))

That's a 66% reduction in LOC right there!

Syntactic sugar, there are nicer ways to express tuples
Seq[Tuple2[Double,String]]
is better written
Seq[(Double,String)]

Learn your libraries: StringOps has a format method, and collections have a mkString method.
val simLines = sims.sorted map { case (a, b) => "%3f %s".format(a, b) }
println(simLines.mkString("\n"))
Not so much reduction in this example, but arguably cleaner and more debuggable.

Getting more advanced...

Think Purity, avoid side effects. Instead of iterating a collection then updating another as a side affect, write an expression that takes a collection as input and yields another as output.Make them both immutable if you can.
This:
val movies = mut.Map[Int, String]()
movieFile.getLines.foreach { i =>
  val parts = i.split(separator)
  movies(parts(0).toInt) = parts(1)
}

becomes:
val movies = for (line <- movieFile.getLines) yield {
  val parts = i.split(separator)
  parts(0).toInt -> parts(1)
}.toMap //the toMap converts a List of pairs into a Map


Raise your abstractions! You don't need separate userRatings and movieRatings, they hold the same information, a simple sequence of Tuple3's would do the job. Even better, make a case class to keep it type-safe; named params make it more readable still (warning: some of this begins to get performance sensitive, depending on the size of your data sets)
case class Rating(user:Int, movie:Int, score:Int)
...
val ratings = for (line <- ratingFile.getLines) yield {
  val parts = line.split("""\s""").map(_.toInt)
  Rating(user = parts(0), movie = parts(1), score = parts(2))
}
If you need to, you can then:
def userRatings(userId:Int) = ratings.filter(_.user == userId)

Writing in this style, I'd have 3 collections corresponding more closely to the input files: a Map of users, a Map of movies, and a Seq[Rating] of ratings
You also want to be careful of the difference between operations that mutate a collection in place and those that yield a modified copy, as well as the difference between a val holding a mutable collection and a var holding an immutable one.
In most cases, it's considered good practice to stick with immutable collections in vals, using some of the techniques mentioned above
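
To make the val/var distinction above concrete, here is a minimal sketch. The object name and the sample movie data are made up for illustration; only the three styles themselves come from the post.

```scala
import scala.collection.mutable

object MutationStyles {
  // A val holding a mutable collection: the reference is fixed, the contents change in place.
  val inPlace: mutable.Map[Int, String] = mutable.Map(1 -> "Toy Story")
  inPlace(2) = "GoldenEye" // mutates the existing map

  // A var holding an immutable collection: each update builds a new map and rebinds the name.
  var rebound: Map[Int, String] = Map(1 -> "Toy Story")
  val before: Map[Int, String] = rebound
  rebound += (2 -> "GoldenEye") // `before` still has exactly one entry

  // An immutable collection in a val, produced by a single expression.
  val lines: Seq[String] = Seq("1|Toy Story", "2|GoldenEye") // hypothetical input lines
  val movies: Map[Int, String] = lines.map { line =>
    val parts = line.split("""\|""")
    parts(0).toInt -> parts(1)
  }.toMap
}
```

Anyone holding a reference to `inPlace` sees the update, whereas `before` is unaffected by the later rebinding of `rebound`. That is the main reason the third style is easier to reason about.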



Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: LOC ratios wrt java


On 5 August 2010 01:22, Miles Egan <milesegan@gmail.com> wrote:
On Wed, Aug 4, 2010 at 4:38 PM, Daniel Kristensen
<daniel.kristensen@gmail.com> wrote:
> It is interesting to see Scala and Java code doing the same thing side by
> side!

Probably not very good scala code but I agree it's interesting to try
to solve the same problem in both languages.

> (btw there is a typo in the java code for similarity - && should be || on
> the first row)

Good catch. Thanks.

> Another big win is that you can remove boilerplate. A simple example from
> your code is that mean can be turned into a one-liner:
>
> def mean[T <% Double](values:Seq[T]) = values.foldLeft(0d)(_ + _) /
> values.size
 
Yeah that's a good example.


even better, in 2.8 you can write:
def mean[T <% Double](values:Seq[T]) = values.sum / values.size
 

Miles Egan
Joined: 2010-07-05,
User offline. Last seen 42 years 45 weeks ago.
Re: LOC ratios wrt java

On Wed, Aug 4, 2010 at 5:51 PM, Kevin Wright wrote:
> val movies = for (line <- movieFile.getLines) yield {
>   val parts = i.split(separator)
>   parts(0).toInt -> parts(1)
> }.toMap //the toMap converts a List of pairs into a Map

This doesn't seem to compile here (compiler wants to apply .toMap to
each tuple, not the seq, I think), but this does:
val movies = movieFile.getLines.map { line =>
  val id :: title :: rest = line.split(separator).toList
  id.toInt -> title
}.toMap

Which I agree is nicer.
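
One caveat with the `val id :: title :: rest = ...` pattern: a line with fewer than two fields throws a MatchError, and a non-numeric id throws a NumberFormatException. The following is a sketch of a more defensive variant, assuming Scala 2.13 or later. `MovieParsing`, `parseMovie` and `parseMovies` are hypothetical names, not part of the original program.

```scala
import scala.util.Try

object MovieParsing {
  val separator = """\|"""

  // Returns None for lines with fewer than two fields or an id that is not an Int,
  // instead of throwing.
  def parseMovie(line: String): Option[(Int, String)] =
    line.split(separator).toList match {
      case id :: title :: _ => Try(id.toInt).toOption.map(n => n -> title)
      case _                => None
    }

  // Malformed lines are silently skipped; whether that is acceptable depends on the data.
  def parseMovies(lines: Iterator[String]): Map[Int, String] =
    lines.flatMap(parseMovie).toMap
}
```

With a `scala.io.Source`, this would be called as `MovieParsing.parseMovies(movieFile.getLines())`.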

> You don't need separate userRatings and movieRatings, they hold the same
> information, a simple sequence of Tuple3's would do the job.

Actually this I don't think I can do. I need to be able to index into
the ratings quickly by an integer ID. In theory there could be many
millions of these.
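
For what it's worth, the two requirements need not conflict: a single sequence of ratings can be kept as the source of truth and indexed once with `groupBy`. The sketch below uses made-up sample data and hypothetical names (`RatingIndex`, `byUser`, `byMovie`). With many millions of ratings, the memory overhead of the extra maps and boxed case-class instances would need measuring.

```scala
object RatingIndex {
  final case class Rating(user: Int, movie: Int, score: Int)

  // Hypothetical sample data; in the real program this would come from the ratings file.
  val ratings: Seq[Rating] = Seq(
    Rating(user = 1, movie = 10, score = 5),
    Rating(user = 1, movie = 20, score = 3),
    Rating(user = 2, movie = 10, score = 4)
  )

  // Built once; each lookup afterwards is a hash lookup rather than a scan of the sequence.
  val byUser: Map[Int, Seq[Rating]] = ratings.groupBy(_.user)
  val byMovie: Map[Int, Seq[Rating]] = ratings.groupBy(_.movie)

  def userRatings(userId: Int): Seq[Rating] = byUser.getOrElse(userId, Seq.empty)
  def movieRatings(movieId: Int): Seq[Rating] = byMovie.getOrElse(movieId, Seq.empty)
}
```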

Thanks for the pointers. Incorporating all the suggestions from this
thread the scala version is now about 40% shorter than the java
version and much more readable, which is pretty impressive:
http://github.com/cageface/brains/blob/plain_java/recommend/movielens.scala

mneeley
Joined: 2010-05-06,
User offline. Last seen 2 years 23 weeks ago.
Re: LOC ratios wrt java
We're getting pretty close to the point of diminishing returns, but I made a few more tweaks:


import collection.mutable.{ Map => MMap }

class MovieLens(movieFile:io.BufferedSource, ratingFile:io.BufferedSource) {
 
 val separator = """\|"""

 val movies = movieFile.getLines.map { line =>
   val id :: title :: rest = line.split(separator).toList
   id.toInt -> title
 }.toMap

 // initializer block.  Allows us to use mutable collections while reading data, then make them immutable
 val (movieRatings, userRatings) = {
   val userBuilder = MMap[Int, MMap[Int,Double]]()
   val movieBuilder = MMap[Int, MMap[Int,Double]]()
   for (line <- ratingFile.getLines) {
     val user :: movie :: rating :: rest = line.split("""\s""").map(_.toInt).toList
     movieBuilder.getOrElseUpdate(movie, MMap())(user) = rating
     userBuilder.getOrElseUpdate(user, MMap())(movie) = rating
   }
   // make immutable versions
   (movieBuilder.mapValues(_.toMap).toMap, userBuilder.mapValues(_.toMap).toMap)
 }

 def similarity(a:Int, b:Int):Double = {
   // immediately unpack Options, rather than keeping them around
   val ratingsA = movieRatings.get(a) getOrElse { return 0.0 }
   val ratingsB = movieRatings.get(b) getOrElse { return 0.0 }

   val meanA = mean(ratingsA.values.toArray)
   val meanB = mean(ratingsB.values.toArray)

   // find ratings of both movies by the same users, subtract mean and store deltas
   val commonUsers = ratingsA.keySet & ratingsB.keySet
   if (commonUsers.size < 3) return 0.0
   val commonRatings = commonUsers.toSeq.map(u =>
     (ratingsA(u) - meanA, ratingsB(u) - meanB))
   val (commonA, commonB) = commonRatings.unzip

   pearsonCorrelation(commonA.toArray, commonB.toArray)
 }

 def pearsonCorrelation(a:Array[Double], b:Array[Double]):Double = {
   if (a.isEmpty) return 0.0 // no overlapping ratings

   val (meanA, devA) = meanAndDev(a)
   val (meanB, devB) = meanAndDev(b)
   // note the use of a case structure, which is just a partial function (requires braces)
   val xy = (a zip b).map{ case (a, b) => (a - meanA) * (b - meanB) }.sum

   // this pattern match is clearer than the conditional logic, and no vars anywhere
   (devA, devB) match {
     case (0.0, 0.0) => 1.0
     case (devA, 0.0) => xy / (a.size * devA * devA)
     case (0.0, devB) => xy / (a.size * devB * devB)
     case (devA, devB) => xy / (a.size * devA * devB)
   }
 }

 def mean(values:Array[Double]) = values.sum / values.size

 def meanAndDev(values:Array[Double]) = {
   val meanV = mean(values)
   val squares = values.map(v => (v - meanV) * (v - meanV)).sum
   (meanV, math.sqrt(squares / values.size))
 }
}
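
A side note on `getOrElse { return 0.0 }`: the block is passed by name, so the `return` is a non-local return, which newer Scala 3 releases deprecate as far as I know. A for-comprehension over the two Options gives the same "bail out if either is missing" behaviour without `return`. This is only a sketch: the sample data and `commonUserCount` are hypothetical stand-ins for the first few lines of `similarity`.

```scala
object OptionLookup {
  // Hypothetical sample data standing in for movieRatings: movie id -> (user id -> rating).
  val movieRatings: Map[Int, Map[Int, Double]] = Map(
    1 -> Map(100 -> 4.0, 101 -> 3.0, 102 -> 5.0),
    2 -> Map(100 -> 2.0, 101 -> 4.0, 102 -> 1.0)
  )

  // Number of users who rated both movies; 0 when either movie is unknown.
  def commonUserCount(a: Int, b: Int): Int = {
    val common = for {
      ratingsA <- movieRatings.get(a)
      ratingsB <- movieRatings.get(b)
    } yield (ratingsA.keySet & ratingsB.keySet).size
    common.getOrElse(0)
  }
}
```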


Cheers,
Matthew


Miles Egan
Joined: 2010-07-05,
User offline. Last seen 42 years 45 weeks ago.
Re: LOC ratios wrt java

It's come a long way from the original, hasn't it? It's great that
Scala allows us to write code this succinct and expressive that is
still at least an order of magnitude (and perhaps more) faster than it
would be in something like Python or Ruby.

On Wed, Aug 4, 2010 at 7:46 PM, Matthew Neeley wrote:
> We're getting pretty close to the point of diminishing returns, but I made a
> few more tweaks:

Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: LOC ratios wrt java


On 5 August 2010 02:43, Miles Egan <milesegan@gmail.com> wrote:
On Wed, Aug 4, 2010 at 5:51 PM, Kevin Wright <kev.lee.wright@gmail.com> wrote:
> val movies = for (line <- movieFile.getLines) yield {
>   val parts = i.split(separator)
>   parts(0).toInt -> parts(1)
> }.toMap //the toMap converts a List of pairs into a Map

This doesn't seem to compile here (compiler wants to apply .toMap to
each tuple, not the seq, I think), but this does:
 val movies = movieFile.getLines.map { line =>
   val id :: title :: rest = line.split(separator).toList
   id.toInt -> title
 }.toMap


Heh heh, you're right.  I originally conceived this as a mapping operation then translated back to a for-loop, thinking that would be more readable.
It's always a good buzz to see Scala newcomers picking up concepts so quickly :)

nilskp
Joined: 2009-01-30,
User offline. Last seen 1 year 27 weeks ago.
Re: LOC ratios wrt java
On Wed, Aug 4, 2010 at 7:55 PM, Kevin Wright <kev.lee.wright@gmail.com> wrote:

even better, in 2.8 you can write:
def mean[T <% Double](values:Seq[T]) = values.sum / values.size

Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_21).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def mean[T <% Double](values:Seq[T]) = values.sum / values.size
<console>:5: error: could not find implicit value for parameter num: Numeric[T]
       def mean[T <% Double](values:Seq[T]) = values.sum / values.size

Kevin Wright 2
Joined: 2010-05-30,
User offline. Last seen 26 weeks 4 days ago.
Re: LOC ratios wrt java
oh, alright then:
scala> def mean[T <% Double : Numeric](values:Seq[T]) = values.sum / values.size
mean: [T](values: Seq[T])(implicit evidence$1: (T) => Double,implicit evidence$2: Numeric[T])Double
scala> mean(Seq(1,2,3,4))
res1: Double = 2.5
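
View bounds (`<%`) are deprecated in later Scala 2 releases and gone in Scala 3, so here is a sketch of the same function using only the `Numeric` instance, which provides both `sum` and the conversion to Double. The object name `MeanNumeric` is just a wrapper for illustration.

```scala
object MeanNumeric {
  // Numeric[T] supplies both `sum` and `toDouble`, so no view bound is needed.
  // For an empty Seq this evaluates 0.0 / 0 and returns NaN.
  def mean[T](values: Seq[T])(implicit num: Numeric[T]): Double =
    num.toDouble(values.sum) / values.size
}
```

`MeanNumeric.mean(Seq(1, 2, 3, 4))` gives 2.5, as in the REPL session above.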



Erik Engbrecht
Joined: 2008-12-19,
User offline. Last seen 3 years 18 weeks ago.
Re: LOC ratios wrt java
Wouldn't it be better to use Fractional rather than Double?  If you're not averaging doubles then you probably don't want to receive a Double as the result.
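
A sketch of what the Fractional version might look like (the wrapper object `MeanFractional` is only for illustration). The result type follows the element type, and the division comes from the `Fractional` instance.

```scala
object MeanFractional {
  // The result has the same type as the elements: Double in, Double out; BigDecimal in, BigDecimal out.
  def mean[T](values: Seq[T])(implicit frac: Fractional[T]): T =
    frac.div(values.sum, frac.fromInt(values.size))
}
```

The trade-off is that `MeanFractional.mean(Seq(1, 2, 3, 4))` no longer compiles, since there is no `Fractional[Int]`. Integer inputs would have to be converted first.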

--
http://erikengbrecht.blogspot.com/

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland