When You Absolutely Positively Have to Have it ASAP...

12 replies

Sun, 2009-07-12, 18:03

Randall R Schulz

Joined: 2008-12-16,

Hi,

Having just replaced this method in my code:

/** Create a new clause from the literals produced by
applying xform to the literals of this clause
*/
def
mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
FOL_Clause(literals.map(xform): _*)

...with this abomination, motivated by a
desire to avoid unnecessary allocation:

def
mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
{
val newLiterals = new Array[FOL_Literal](arity)
var altered = false
for (i <- 0 until arity) {
val lit = literals(i)
val repl = xform(lit)
newLiterals(i) = repl
if ((repl ne lit) && repl != lit)
altered = true
}
if (altered)
FOL_Clause(newLiterals)
else
this
}

...I'm left wondering: when one has decided to commit this sort of
atrocity (leaving aside the determination of when it's justified), what
are the Scala language constructs that have minimal overhead? How do
you recognize when you've done something with an overhead that could,
given some tradeoff of code readability and elegance, afford a
performance edge if changed?

E.g., in the example above, I believe (but am not sure) that the loop
body in the long-form replacement is not compiled into a function
literal, but rather to in-line code. Is that correct? I know that "0
until arity" involves the construction of a scala.Range (and that if
I'd use "to" instead of "until" I'd have gotten a scala.Range$Inclusive
—inner class Inclusive in object Range).

Anyway, to generalize the question, is there a compendium of such
replacements or constructs or some set of rules to use to optimize code
in places where performance is particularly critical?

Randall Schulz

Sun, 2009-07-12, 20:17

Ricky Clarkson

Joined: 2008-12-19,

Re: When You Absolutely Positively Have to Have it ASAP...

No. You want while loops or tail recursion (same thing) for ultimate
performance.

2009/7/12 Randall R Schulz :
> Hi,
>
> Having just replaced this method in my code:
>
> /** Create a new clause from the literals produced by
> applying xform to the literals of this clause
> */
> def
> mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
> FOL_Clause(literals.map(xform): _*)
>
>
> ...with this abomination, motivated by a
> desire to avoid unnecessary allocation:
>
> def
> mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
> {
> val newLiterals = new Array[FOL_Literal](arity)
> var altered = false
> for (i <- 0 until arity) {
> val lit = literals(i)
> val repl = xform(lit)
> newLiterals(i) = repl
> if ((repl ne lit) && repl != lit)
> altered = true
> }
> if (altered)
> FOL_Clause(newLiterals)
> else
> this
> }
>
>
> ...I'm left wondering: when one has decided to commit this sort of
> atrocity (leaving aside the determination of when it's justified), what
> are the Scala language constructs that have minimal overhead? How do
> you recognize when you've done something with an overhead that could,
> given some tradeoff of code readability and elegance, afford a
> performance edge if changed?
>
> E.g., in the example above, I believe (but am not sure) that the loop
> body in the long-form replacement is not compiled into a function
> literal, but rather to in-line code. Is that correct? I know that "0
> until arity" involves the construction of a scala.Range (and that if
> I'd use "to" instead of "until" I'd have gotten a scala.Range$Inclusive
> —inner class Inclusive in object Range).
>
>
> Anyway, to generalize the question, is there a compendium of such
> replacements or constructs or some set of rules to use to optimize code
> in places where performance is particularly critical?
>
>
> Randall Schulz
>

Sun, 2009-07-12, 22:17

Randall R Schulz

Joined: 2008-12-16,

Re: When You Absolutely Positively Have to Have it ASAP...

On Sunday July 12 2009, Ricky Clarkson wrote:
> No. You want while loops or tail recursion
> (same thing) for ultimate performance.

OK. That leads to:

Clean old way:

def
mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
FOL_Clause(literals.map(xform): _*)

Hideous new way (would that the speed-up was of a
magnitude comparable to the bloating of the code!):

def
mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
{
val newLiterals = new Array[FOL_Literal](arity)
var altered = false
val nLits = arity

def
applyXform(i: Int): Unit =
if (i < nLits) {
val lit = literals(i)
val repl = xform(lit)
newLiterals(i) = repl
if (!altered && (repl ne lit) && repl != lit)
altered = true
applyXform(i + 1)
}

applyXform(0)
if (altered)
FOL_Clause(newLiterals)
else
this
}

As I think about this more, this putative optimization probably isn't
justified (it may not be an optimization at all, for one thing). This
is an inner loop that will be executed heavily, but given that it's
common for only a few of the iterations to yield a changed literal and
the fact that cost of the equality test can be high (and despite the
fact that it's short-circuited in this version), I'm not at all sure it
would be a benefit for most programs and / or invocations thereof.

And if we didn't know before, it's should now be obvious why these rules
were coined [1]:

“The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.”
— Michael A. Jackson [2]

Are the techniques mentioned in "Optimizing Higher-Order Functions in
Scala" [3] implemented in the current Scala compiler?

> > Anyway, to generalize the question, is there a compendium of such
> > replacements or constructs or some set of rules to use to optimize
> > code in places where performance is particularly critical?

Do you know of anything of this sort? I could find nothing on the Web.

[1]
[2]
[3]

Randall Schulz

Sun, 2009-07-12, 22:57

Ricky Clarkson

Joined: 2008-12-19,

Re: When You Absolutely Positively Have to Have it ASAP...

Generally I'd suggest looking at the generated bytecode. Consider
reflection a showstopper, boxing something to eliminate if possible,
and extra method calls a possible issue (many are inlined by the JIT,
so they're not a definite issue). Or just profile two or more
approaches.

If you understand that for comprehensions always translate to foreach,
map, or flatMap at least, then you understand that they're typically
more expensive than while loops/tail recursion.

2009/7/12 Randall R Schulz :
> On Sunday July 12 2009, Ricky Clarkson wrote:
>> No. You want while loops or tail recursion
>> (same thing) for ultimate performance.
>
> OK. That leads to:
>
> Clean old way:
>
> def
> mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
> FOL_Clause(literals.map(xform): _*)
>
>
> Hideous new way (would that the speed-up was of a
> magnitude comparable to the bloating of the code!):
>
> def
> mapped(xform: FOL_Literal => FOL_Literal): FOL_Clause =
> {
> val newLiterals = new Array[FOL_Literal](arity)
> var altered = false
> val nLits = arity
>
> def
> applyXform(i: Int): Unit =
> if (i < nLits) {
> val lit = literals(i)
> val repl = xform(lit)
> newLiterals(i) = repl
> if (!altered && (repl ne lit) && repl != lit)
> altered = true
> applyXform(i + 1)
> }
>
> applyXform(0)
> if (altered)
> FOL_Clause(newLiterals)
> else
> this
> }
>
> As I think about this more, this putative optimization probably isn't
> justified (it may not be an optimization at all, for one thing). This
> is an inner loop that will be executed heavily, but given that it's
> common for only a few of the iterations to yield a changed literal and
> the fact that cost of the equality test can be high (and despite the
> fact that it's short-circuited in this version), I'm not at all sure it
> would be a benefit for most programs and / or invocations thereof.
>
>
> And if we didn't know before, it's should now be obvious why these rules
> were coined [1]:
>
> “The First Rule of Program Optimization: Don't do it.
> The Second Rule of Program Optimization (for experts only!): Don't do it yet.”
> — Michael A. Jackson [2]
>
>
> Are the techniques mentioned in "Optimizing Higher-Order Functions in
> Scala" [3] implemented in the current Scala compiler?
>
>
>> > Anyway, to generalize the question, is there a compendium of such
>> > replacements or constructs or some set of rules to use to optimize
>> > code in places where performance is particularly critical?
>
> Do you know of anything of this sort? I could find nothing on the Web.
>
>
> [1]
> [2]
> [3]
>
>
> Randall Schulz
>

Sun, 2009-07-12, 23:17

Randall R Schulz

Joined: 2008-12-16,

Re: When You Absolutely Positively Have to Have it ASAP...

On Sunday July 12 2009, Ricky Clarkson wrote:
> Generally I'd suggest looking at the generated bytecode. Consider
> reflection a showstopper, boxing something to eliminate if possible,
> and extra method calls a possible issue (many are inlined by the JIT,
> so they're not a definite issue). Or just profile two or more
> approaches.

I can't wait to look at profiler displays from Scala code. All those
inner classes! Still, I'm sure it's enlightening.

> If you understand that for comprehensions always translate to
> foreach, map, or flatMap at least, then you understand that they're
> typically more expensive than while loops/tail recursion.

I guess it all keeps coming back to that, eh?

> ...

RRS

Mon, 2009-07-13, 15:37

ijuma

Joined: 2008-08-20,

Re: When You Absolutely Positively Have to Have it ASAP...

On Sun, 2009-07-12 at 22:47 +0100, Ricky Clarkson wrote:
> Or just profile two or more
> approaches.

If the code is a real hotspot, then you can also just benchmark the
application with a realistic load and check the wall clock time when
each approach is used.

Profilers are very useful to find hotspots, but if you have one (real or
imagined), then the simple benchmark approach also works.

Best,
Ismael

Mon, 2009-07-13, 15:47

Randall R Schulz

Joined: 2008-12-16,

Re: When You Absolutely Positively Have to Have it ASAP...

On Monday July 13 2009, Ismael Juma wrote:
> On Sun, 2009-07-12 at 22:47 +0100, Ricky Clarkson wrote:
> > Or just profile two or more
> > approaches.
>
> If the code is a real hotspot, then you can also just benchmark the
> application with a realistic load and check the wall clock time when
> each approach is used.
>
> Profilers are very useful to find hotspots, but if you have one (real
> or imagined), then the simple benchmark approach also works.

Of course.

The real question is about which Scala language constructs carry which
overheads and which are minimal in terms of "hidden" overheads like
object creation (ranges, function literals, etc.) method calls (for
loop blocks, e.g.) and so on.

Another one I wonder about is which uses of a case block generate inline
code and which generate function objects.

> Best,
> Ismael

Randall Schulz

Mon, 2009-07-13, 15:57

Ricky Clarkson

Joined: 2008-12-19,

Re: When You Absolutely Positively Have to Have it ASAP...

Then I suggest using javap a lot. What case generates depends on what
the code's doing. I checked once, and using match..case as a Scala
'let' was only one bytecode instruction away from using a val. I.e.:

val x = Math.random
println(x * x)

was the same cost as:

Math.random match { case x => println(x * x) }

Which makes sense, but one can imagine a more naive implementation.

2009/7/13 Randall R Schulz :
> On Monday July 13 2009, Ismael Juma wrote:
>> On Sun, 2009-07-12 at 22:47 +0100, Ricky Clarkson wrote:
>> > Or just profile two or more
>> > approaches.
>>
>> If the code is a real hotspot, then you can also just benchmark the
>> application with a realistic load and check the wall clock time when
>> each approach is used.
>>
>> Profilers are very useful to find hotspots, but if you have one (real
>> or imagined), then the simple benchmark approach also works.
>
> Of course.
>
> The real question is about which Scala language constructs carry which
> overheads and which are minimal in terms of "hidden" overheads like
> object creation (ranges, function literals, etc.) method calls (for
> loop blocks, e.g.) and so on.
>
> Another one I wonder about is which uses of a case block generate inline
> code and which generate function objects.
>
>
>> Best,
>> Ismael
>
>
> Randall Schulz
>

Mon, 2009-07-13, 15:57

ijuma

Joined: 2008-08-20,

Re: When You Absolutely Positively Have to Have it ASAP...

On Mon, 2009-07-13 at 07:41 -0700, Randall R Schulz wrote:
> On Monday July 13 2009, Ismael Juma wrote:
> > On Sun, 2009-07-12 at 22:47 +0100, Ricky Clarkson wrote:
> > > Or just profile two or more
> > > approaches.
> >
> > If the code is a real hotspot, then you can also just benchmark the
> > application with a realistic load and check the wall clock time when
> > each approach is used.
> >
> > Profilers are very useful to find hotspots, but if you have one (real
> > or imagined), then the simple benchmark approach also works.
>
> Of course.
>
> The real question is about which Scala language constructs carry which
> overheads and which are minimal in terms of "hidden" overheads like
> object creation (ranges, function literals, etc.) method calls (for
> loop blocks, e.g.) and so on.

I didn't answer that because Ricky had already done so:

"Consider reflection a showstopper, boxing something to eliminate if
possible, and extra method calls a possible issue (many are inlined by
the JIT, so they're not a definite issue)"

I'd even narrow that down to reflection and boxing in the vast majority
of the cases. Reflection is used by structural types and boxing in cases
where you use primitives in generic code (very generally, it might be
possible to define this better).

Ismael

Mon, 2009-07-13, 16:07

Randall R Schulz

Joined: 2008-12-16,

Re: When You Absolutely Positively Have to Have it ASAP...

On Monday July 13 2009, Ricky Clarkson wrote:
> Then I suggest using javap a lot. What case generates depends on
> what the code's doing. I checked once, and using match..case as a
> Scala 'let' was only one bytecode instruction away from using a val.
> ...

It's been so very many years since I've looked at anything that could be
considered assembly. Are there tools to aid comprehension of JVM
bytecodes?

> 2009/7/13 Randall R Schulz :
> > ...
> >
> > The real question is about which Scala language constructs carry
> > which overheads and which are minimal in terms of "hidden"
> > overheads like object creation (ranges, function literals, etc.)
> > method calls (for loop blocks, e.g.) and so on.
> >
> > Another one I wonder about is which uses of a case block generate
> > inline code and which generate function objects.

Randall Schulz

Mon, 2009-07-13, 16:07

#10

Johannes Rudolph

Joined: 2008-12-17,

Re: When You Absolutely Positively Have to Have it ASAP...

On Mon, Jul 13, 2009 at 4:50 PM, Randall R Schulz wrote:
> It's been so very many years since I've looked at anything that could be
> considered assembly. Are there tools to aid comprehension of JVM
> bytecodes?
A list of bytecodes and their function:

http://homepages.inf.ed.ac.uk/kwxm/JVM/jvm.html

And of course the JVM specs themselves:
http://java.sun.com/docs/books/jvms/second_edition/html/Instructions.doc...

And there is a bytecode outline eclipse plugin which can show the
state of stack and local variables at each point in a method.
http://andrei.gmxhome.de/bytecode/index.html

What other tools do you think of?

Mon, 2009-07-13, 16:47

#11

Randall R Schulz

Joined: 2008-12-16,

Re: When You Absolutely Positively Have to Have it ASAP...

On Monday July 13 2009, Johannes Rudolph wrote:
> On Mon, Jul 13, 2009 at 4:50 PM, Randall R Schulz wrote:
> > It's been so very many years since I've looked at anything that
> > could be considered assembly. Are there tools to aid comprehension
> > of JVM bytecodes?
>
> A list of bytecodes and their function:
>
> http://homepages.inf.ed.ac.uk/kwxm/JVM/jvm.html
>
> And of course the JVM specs themselves:
> http://java.sun.com/docs/books/jvms/second_edition/html/Instructions.
>doc.html

These are good information resources, of course. I have a copy of the
JVM Spec 2nd Ed.

> And there is a bytecode outline eclipse plugin which can show the
> state of stack and local variables at each point in a method.
> http://andrei.gmxhome.de/bytecode/index.html
>
> What other tools do you think of?

I don't use Eclipse and generally won't unless there's some really
compelling functionality available only there.

I did find a CafeBabe plug-in for IDEA and that seems to give some basic
disassembly and display capabilities. I'll see what I can learn about
my compiled code using that.

Randall Schulz

Mon, 2009-07-13, 20:37

#12

Randall R Schulz

Joined: 2008-12-16,

Re: When You Absolutely Positively Have to Have it ASAP...

On Monday July 13 2009, Randall R Schulz wrote:
> On Monday July 13 2009, Johannes Rudolph wrote:
> > ...
> >
> > What other tools do you think of?
>
> ...
>
> I did find a CafeBabe plug-in for IDEA and that seems to give some
> basic disassembly and display capabilities. I'll see what I can learn
> about my compiled code using that.

That plug-in seems to be unreliable. It displayed one class file and
after that hung every time I tried to open another, even after quitting
and restarting IDEA.

I found a stand-alone tool called "jclasslib" [1] from ej-technologies
[2] (the people who make JProfiler) and while it has not been updated
since early 2005 (and I was concerned it wouldn't accept version 49
class files, as Jad does not) it seems to work fine for me.

[1]
[2]

Randall Schulz

Scala Main Menu

When You Absolutely Positively Have to Have it ASAP...

Scala Quick Links

Featured News

User login