improve performance on multi-core machine

Fri, 2010-10-22, 23:57

#1

Randall R Schulz

Joined: 2008-12-16,

Re: improve performance on multi-core machine

On Friday October 22 2010, Jiansen He wrote:
> Hi all,
>
> I wrote a small program to see how Scala improves the performance
> of parallel computing on a multi-core machine. To my surprise, I
> find that only 110-120% CPU is used. I am running a dual-core iMac
> with hyper-threading technology.
>
> In Haskell, I can explicitly specify how many cores my machine has,
> and I find my program consumes about 380% CPU.
>
> How can I let Scala use more CPU? Can I tell Scala that my machine
> could run 4 processes at the same time?
>
>
> Jiansen

Parallelizing your algorithms is up to you.

The 110-120% you're seeing represents parallel execution of the garbage
collector, most likely.

Randall Schulz

Sat, 2010-10-23, 00:27

#2

Jiansen He

Joined: 2010-10-12,

Re: improve performance on multi-core machine

I'm not sure what the 110-120% represents.

I included a timer method in my program. I also timed the execution myself.

My Scala program is almost identical to the Haskell program.

In Scala, I write:
val (sqs_up_to_n1, sqs_len_n) = par(gen((len-1), l), gen_ (0, len, l))
merge(sqs_len_n, sqs_up_to_n1)

In Haskell, I write:
sqs_len_n `par` sqs_up_to_n1 `pseq` (merge sqs_len_n sqs_up_to_n1)
  where sqs_up_to_n1 = (gen (n-1) l)
        sqs_len_n = (gen' 0 n l)

Any ideas?

Jiansen

Sun, 2010-10-24, 05:37

#3

Naftoli Gugenheim

Joined: 2008-12-17,

Re: improve performance on multi-core machine

There is work in progress to add parallelization built in to the core collections. In the meantime you have to code your own parallelization. You can use actors, scala.concurrent, or plain old Java concurrency.
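As a rough illustration of "coding your own parallelization", here is a minimal sketch of a two-way `par` helper built on standard-library futures. The object name `ParSketch` and this particular `par` signature are assumptions made for the example; this is not necessarily the `par` used in the program earlier in the thread.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParSketch {
  // Hypothetical helper: evaluate both by-name arguments concurrently and
  // return their results as a pair once both have finished.
  def par[A, B](a: => A, b: => B): (A, B) = {
    val fa = Future(a) // starts running on the global thread pool
    val fb = Future(b) // starts running alongside fa
    Await.result(fa.zip(fb), Duration.Inf)
  }
}
```

Something like `val (x, y) = ParSketch.par(gen(len - 1, l), gen_(0, len, l))` would then evaluate both sides at the same time, provided each side is expensive enough to outweigh the scheduling overhead.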

Sun, 2010-10-24, 18:57

#4

Jiansen He

Joined: 2010-10-12,

Re: improve performance on multi-core machine

What are the differences between my Scala code and Haskell code? I thought they were semantically identical: sqs_len_n and sqs_up_to_n1 can be evaluated in parallel and, once both values are available, merged. Why is the performance of Scala and Haskell so different?
Is there any way to get the number of CPU cores that my Scala platform is aware of? And how many processes are alive at a particular time?
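One way to answer the core-count question is to ask the JVM directly. The sketch below uses only standard JDK calls; the object name `CoreInfo` is made up for the example. Note that the JVM schedules threads rather than processes, and the count it reports is of logical processors, so a dual-core machine with hyper-threading would be expected to report 4.

```scala
object CoreInfo {
  // Number of logical processors the JVM reports (hyper-threads count).
  def cores: Int = Runtime.getRuntime.availableProcessors()

  // Estimate of the active threads in the current thread's group and subgroups.
  def liveThreads: Int = Thread.activeCount()

  def main(args: Array[String]): Unit = {
    println("logical processors: " + cores)
    println("live threads (estimate): " + liveThreads)
  }
}
```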
Jiansen

Mon, 2010-10-25, 04:48

#5

Naftoli Gugenheim

Joined: 2008-12-17,

Re: improve performance on multi-core machine

I'm sure the snippets mean the same thing, but apparently Haskell has automatic parallelization built in, while Scala doesn't. The difference is not in your code, but in the platforms. Did you read anywhere something that implied Scala has such a feature? Although as I said, a similar feature is planned.

Mon, 2010-10-25, 05:47

#6

Tony Morris 2

Joined: 2009-03-20,

Re: improve performance on multi-core machine

Scala has had such a feature for quite a while and the implementation has a tad more features than Haskell's (and a lot more than Scala-library actors). The performance of the implementation also exceeds Scala-library actors.

http://code.google.com/p/scalaz

Look for the scalaz.concurrent package. Unlike Haskell/GHC where you specify "the number of cores" as a compiler option, you instead specify it as a function argument to a library function. This function also generalises to be a computable value (see Strategy).

The actors implementation is a bit different to many others in that it is "the right way around" with regard to mutability. This provides significant benefit.

-- 
Tony Morris
http://tmorris.net/

Mon, 2010-10-25, 13:57

#7

Randall R Schulz

Joined: 2008-12-16,

Re: improve performance on multi-core machine

On Sunday October 24 2010, Tony Morris wrote:
> Scala has had such a feature for quite a while ...

Presumably you mean "ScalaZ has had such a feature ..."

> ... and the implementation
> has a tad more features than Haskell's (and a lot more than
> Scala-library actors). The performance of the implementation also
> exceeds Scala-library actors.
>
> http://code.google.com/p/scalaz
>
> ...

Randall Schulz

Mon, 2010-10-25, 23:37

#8

Tony Morris 2

Joined: 2009-03-20,

Re: improve performance on multi-core machine

On 25/10/10 22:55, Randall R Schulz wrote:
> On Sunday October 24 2010, Tony Morris wrote:
>> Scala has had such a feature for quite a while ...
>
> Presumably you mean "ScalaZ has had such a feature ..."
Yes. I tend to play on the perverted desire for a language to have
features that are more appropriately implemented as libraries. Please
excuse the indulgence.

>
>> ... and the implementation has a tad more features than Haskell's
>> (and a lot more than Scala-library actors). The performance of
>> the implementation also exceeds Scala-library actors.
>>
>> http://code.google.com/p/scalaz
>>
>> ...
>
>
> Randall Schulz

Tue, 2010-10-26, 00:57

#9

William Uther

Joined: 2010-09-13,

Re: improve performance on multi-core machine

On 25/10/2010, at 3:40 PM, Tony Morris wrote:

> Scala has had such a feature for quite a while and the implementation has a tad more features than Haskell's (and a lot more than Scala-library actors). The performance of the implementation also exceeds Scala-library actors.
>
> http://code.google.com/p/scalaz
>
> Look for the scalaz.concurrent package. Unlike Haskell/GHC where you specify "the number of cores" as a compiler option, you instead specify it as a function argument to a library function. This function also generalises to be a computable value (see Strategy).
>
> The actors implementation is a bit different to many others in that it is "the right way around" with regard to mutability. This provides significant benefit.

Forgive me for asking a potentially stupid question, but can you supply a little more detail here?

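I looked at the scalaz.concurrent package here: <http://scalaz.googlecode.com/svn/continuous/latest/doc/scalaz/example/concurrent/package.html> and also the example code for Fib here: <http://scalaz.googlecode.com/svn/continuous/latest/browse.sxr/scalaz/example/concurrent/Fibs.scala.html>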

and it doesn't look like auto-parallelisation to me. It looks quite manual. What am I missing?

(Jiansen: I'm not sure the Haskell looks auto-parallelised either - If I'm guessing at the meaning of that code correctly, there appears to be an explicit annotation to evaluate sqs_len_n in parallel. Oh and what is the 'par' function in the scala code earlier in this thread?)

Cheers,

Will :-}

Tue, 2010-10-26, 02:07

#10

Tony Morris 2

Joined: 2009-03-20,

Re: improve performance on multi-core machine

On 26/10/10 09:52, William Uther wrote:
> and it doesn't look like auto-parallelisation to me. It looks quite manual. What am I missing?
What would you expect a non-manual example to look like?

Tue, 2010-10-26, 02:47

#11

William Uther

Joined: 2010-09-13,

Re: improve performance on multi-core machine

On 26/10/2010, at 11:57 AM, Tony Morris wrote:

> What would you expect a non-manual example to look like?

Well, using your example, I guess something like this:

def seqFib(n: Int): Int = if (n < 2) n else seqFib(n - 1) + seqFib(n - 2)

then you might have a single threaded calculation: seqFib(10), and an auto-parallelised calculation: autoPar(seqFib, 5)(10) where autoPar takes a function and a number of cores and returns a new function designed for that number of cores, which is then applied with argument 10.

In particular, your example does stuff that I don't follow:

def fib(n: Int): Promise[Int] = if (n < 2) n else fib(n - 1).<**>(fib(n - 2))(_ + _)

It is clearly related to the sequential version, but it isn't something I could write without learning more about scalaz and how its concurrency stuff works.
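For comparison, roughly the same shape can be written with standard-library futures instead of scalaz's Promise. This is only a sketch under assumptions: `FutureFib`, the `cutoff` value, and the use of the global execution context are choices made for the example, and `zip` followed by `map` stands in for the `<**>` combinator; it is not the scalaz implementation.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FutureFib {
  def seqFib(n: Int): Int = if (n < 2) n else seqFib(n - 1) + seqFib(n - 2)

  // Below this size the work is too small to be worth its own task
  // (an arbitrary threshold chosen for the example).
  val cutoff = 20

  // Both recursive calls return immediately with a running Future;
  // zip pairs their results and map adds them, without blocking a thread.
  def fib(n: Int): Future[Int] =
    if (n < cutoff) Future(seqFib(n))
    else fib(n - 1).zip(fib(n - 2)).map { case (a, b) => a + b }
}
```

The caller decides when to wait, for example with `Await.result(FutureFib.fib(30), Duration.Inf)` from `scala.concurrent`.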

I could also imagine an approach that used a modified map function in the collection classes. The modified map call takes an implicit parameter containing a threadPool and automatically farms the function calls out over that threadPool. The default implicit would be a 'null' threadpool that uses the single-threaded behaviour.

e.g. (and I'm somewhat new to scala and writing this off the top of my head so there will be errors :)

def double(l: List[Int]): List[Int] = l.map(_ * 2)

which could then be parallelised using something like:

implicit val collectionPool = new threadPool(5);

But I haven't given this a huge amount of thought - I'm just answering your question about the sort of thing I was looking for.
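The implicit-thread-pool idea can be sketched with plain Java concurrency. Everything named here (`PoolMap`, `parMap`, `demo`) is hypothetical and only illustrates the shape of such an API; it is not an existing collections method.

```scala
import java.util.concurrent.{Callable, ExecutorService, Executors}

object PoolMap {
  // Hypothetical helper: run f on every element as a separate pool task,
  // then collect the results in the original order.
  def parMap[A, B](xs: List[A])(f: A => B)(implicit pool: ExecutorService): List[B] = {
    // List.map is strict, so every task is submitted before any result is awaited.
    val pending = xs.map { x =>
      pool.submit(new Callable[B] { def call(): B = f(x) })
    }
    pending.map(_.get()) // blocks until each task has finished
  }

  def demo(): List[Int] = {
    // The pool size (5) mirrors the number in the example above.
    implicit val pool: ExecutorService = Executors.newFixedThreadPool(5)
    try parMap(List(1, 2, 3))(_ * 2)
    finally pool.shutdown()
  }
}
```

Submitting one task per element keeps the sketch short; a real implementation would probably split the list into chunks so that the per-task overhead does not dominate cheap functions such as doubling.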

Cheers,

Will :-}

Tue, 2010-10-26, 05:17

#12

Jiansen He

Joined: 2010-10-12,

Re: improve performance on multi-core machine

On Tue, Oct 26, 2010 at 2:41 AM, William Uther <willu.mailingLists@cse.unsw.edu.au> wrote:

> On 26/10/2010, at 11:57 AM, Tony Morris wrote:
>
>> On 26/10/10 09:52, William Uther wrote:
>>> On 25/10/2010, at 3:40 PM, Tony Morris wrote:
>>>
>>>> Scala has had such a feature for quite a while and the implementation has a tad more features than Haskell's (and a lot more than Scala-library actors). The performance of the implementation also exceeds Scala-library actors.
>>>>
>>>> http://code.google.com/p/scalaz
>>>>
>>>> Look for the scalaz.concurrent package. Unlike Haskell/GHC where you specify "the number of cores" as a compiler option, you instead specify it as a function argument to a library function. This function also generalises to be a computable value (see Strategy).

Just a picky correction: with Haskell/GHC you specify "the number of cores" at run time, not at compile time. At compile time, all we need to do is pass the -threaded option to indicate that the program may use parallel features.

>>>> The actors implementation is a bit different to many others in that it is "the right way around" with regard to mutability. This provides significant benefit.
>>>
>>> Forgive me for asking a potentially stupid question, but can you supply a little more detail here?
>>>
>>> I looked at the scalaz.concurrent package here: <http://scalaz.googlecode.com/svn/continuous/latest/doc/scalaz/example/concurrent/package.html> and also the example code for Fib here: <http://scalaz.googlecode.com/svn/continuous/latest/browse.sxr/scalaz/example/concurrent/Fibs.scala.html>
>>>
>>> and it doesn't look like auto-parallelisation to me. It looks quite manual. What am I missing?
>>>
>>> (Jiansen: I'm not sure the Haskell looks auto-parallelised either - If I'm guessing at the meaning of that code correctly, there appears to be an explicit annotation to evaluate sqs_len_n in parallel. Oh and what is the 'par' function in the scala code earlier in this thread?)
>>>
>>> Cheers,
>>>
>>> Will :-}
>>
>> What would you expect a non-manual example to look like?
>
> Well, using your example, I guess something like this:
>
> def seqFib(n: Int): Int = if (n < 2) n else seqFib(n - 1) + seqFib(n - 2)
>
> then you might have a single threaded calculation: seqFib(10), and an auto-parallelised calculation: autoPar(seqFib, 5)(10) where autoPar takes a function and a number of cores and returns a new function designed for that number of cores, which is then applied with argument 10.

In Haskell, using 'par' and 'pseq' is categorized as "semi-implicit parallelism". The following code is quoted from Haskell's official site: http://www.haskell.org/ghc/docs/6.12.2/html/users_guide/lang-parallel.html

import Control.Parallel

nfib :: Int -> Int
nfib n | n <= 1 = 1
       | otherwise = par n1 (pseq n2 (n1 + n2 + 1))
                     where n1 = nfib (n-1)
                           n2 = nfib (n-2)

From my point of view, you specified the number of cores in your code. What if I don't have 10 cores, or I have more than 10 cores, on my machine?

By contrast, the Haskell code can be compiled and run on any machine. The only time users need to specify how many cores can (or should) be used is when the program is invoked.
Isn't Haskell more implicit than yours?

For me, this level of "manual" is acceptable.

Your magic autoPar is very attractive. A fundamental question in any parallel decomposition is whether two operations depend on each other. If you had a general algorithm that could analyse the dependencies between operations, you could probably also omit the number of cores from your code.

> In particular, your example does stuff that I don't follow:
>
> def fib(n: Int): Promise[Int] = if (n < 2) n else fib(n - 1).<**>(fib(n - 2))(_ + _)
>
> It is clearly related to the sequential version, but it isn't something I could write without learning more about scalaz and how its concurrency stuff works.
>
> I could also imagine an approach that used a modified map function in the collection classes. The modified map call takes an implicit parameter containing a threadPool and automatically farms the function calls out over that threadPool. The default implicit would be a 'null' threadpool that uses the single-threaded behaviour.
>
> e.g. (and I'm somewhat new to scala and writing this off the top of my head so there will be errors :)
>
> def double(l: List[Int]): List[Int] = l.map(_ * 2)
>
> which could then be parallelised using something like:
>
> implicit val collectionPool = new threadPool(5);
>
> But I haven't given this a huge amount of thought - I'm just answering your question about the sort of thing I was looking for.
>
> Cheers,
>
> Will :-}

If I read your double function correctly, you want to double all elements in a list, right? If so, map is in fact a typical pattern for data parallelism, which is totally implicit. Parallel map has been well implemented in many languages, although it might be called mapP or something else.

I may have posted too much Haskell stuff in this thread :)

I was thinking of using Actors to explicitly express what I want my computer to do, but I suspect that would lead to verbose code and low efficiency.

Cheers,

Jiansen

Tue, 2010-10-26, 08:17

#13

William Uther

Joined: 2010-09-13,

Re: improve performance on multi-core machine

> = Jiansen He
>> = Tony Morris

>> What would you expect a non-manual example to look like?

[snip - my examples - two potential APIs. The first would probably be tricky to implement. The second more achievable.]

> Your magic autoPar is very attractive.

I'm not sure who you are referring to with 'your'. Tony's examples exist. Mine are imaginary :).

> If I read your double function correctly, you want to double all elements in a list, right? If so, map is in fact a typical pattern for data parallelism, which is totally implicit. Parallel map has been well implemented in many languages, although it might be called mapP or something else.

You read my code correctly. I'm not surprised that it has been parallelised in many languages - It seems the obvious case.

I may go back to lurking as I don't really follow the other examples, and I don't really have the time to figure them out right now. I just reacted to Tony's strong claims and didn't think Scala was quite as automatic as he suggested. I may still be misunderstanding something though.

Cheers,

Will :-}

Wed, 2010-10-27, 16:07

#14

Razvan Cojocaru 3

Joined: 2010-07-28,

Re: improve performance on multi-core machine

Not really answering the original question, but I have a (rather stupid)
parallel implementation of map in my Scala workflow DSL. You pass it the
number of branches you want, while the number of actual threads (or
actors) is configured in the engine itself:

http://github.com/razie/gremlins/blob/master/src/main/scala/razie/wfs.scala
look for def wsmap[A, B](branches: Int)(f: A => B)

it's used like this:
http://github.com/razie/gremlins/blob/master/src/test/scala/razie/wfstes...

def wsmap1 =
  seq {
    wsmap[Int,Int] (3) { x:Int => x + 1 }
  }
def testwsmap1 = expect (List(2,3,4)) { prun (wsmap1, List(1,2,3)) }
def testwsmap1s = expect (List(2,3,4)) { prun (wfs strict wsmap1, List(1,2,3)) }

read more seq/par goodness at:
http://github.com/razie/gremlins/blob/master/ScalaWorkflows.markdown

I'm now looking for the seq/par monad...anyone seen it?

cheers,
Razvan

Wed, 2010-10-27, 16:17

#15

Donald McLean

Joined: 2009-11-11,

Re: improve performance on multi-core machine

For really good Fibonacci sequence performance, just use the formula -
no parallelism needed. :-)

Donald

P.S. Yes, there really is a formula. No, it isn't pretty. Yes, the
derivation is some ugly advanced math.
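
The formula referred to here is presumably Binet's closed form, which for non-negative n reduces to rounding phi^n / sqrt(5) to the nearest integer, where phi is the golden ratio. A minimal sketch follows, assuming double precision is acceptable; the object name `ClosedFormFib` is made up for the example.

```scala
object ClosedFormFib {
  private val sqrt5 = math.sqrt(5.0)
  private val phi = (1.0 + sqrt5) / 2.0 // the golden ratio

  // Binet's formula rounded to the nearest integer. Floating-point error
  // limits this to moderate n (up to around n = 70 with Doubles).
  def fib(n: Int): Long = math.round(math.pow(phi, n) / sqrt5)
}
```

For larger n an exact integer method (iteration or fast doubling with BigInt) would be needed, since the Double result stops being exact.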

On Mon, Oct 25, 2010 at 9:41 PM, William Uther
wrote:
> def seqFib(n: Int): Int = if (n < 2) n else seqFib(n - 1) + seqFib(n - 2)
