- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Actor receiveWithin too slow
Sat, 2009-07-25, 00:26
I have an actor that gets messages,
A B C D
and needs these messages processed in FIFO order, but for some
messages it needs to break that message down into a sub-list of
messages:
A becomes A' B' C' D', then the desired order of messages becomes:
A' B' C' D' B C D
maintaining the FIFO order for messages in the mailbox. This would be wrong:
B C D A' B' C' D'
but this is the order you get if the actor sends itself the subdivided
messages; they get appended to the end.
I'd love to have a way to insert messages at the front of an actor's
mailbox, but right now I'm making due by draining the mailbox with
receiveWithin.
I'm pulling all the messages out, sending the new messages, and then
re-sending the older messages.
The profiler is showing a huge amount of time spent in
scala.runtime.BoxesRunTime.equals(Object, Object), inside
MessageQueue.extractFirst, inside receiveWithin.
The mailbox can get pretty big, such as 50k messages.
Any thoughts on a better strategy for getting the new messages at the
front of the mailbox? Is using receiveWithin in a loop to grab all the
messages as bad an idea as it appears to be?
Sat, 2009-07-25, 01:07
#2
Re: Actor receiveWithin too slow
On Fri, Jul 24, 2009 at 4:34 PM, Josh Stratton wrote:
>> A B C D
>>
>> and needs these messages processed in FIFO order, but for some
>> messages it needs to break that message down into a sub-list of
>> messages:
>>
>> A becomes A' B' C' D', then the desired order of messages becomes:
>>
>> A' B' C' D' B C D
>
> Just out of curiousity, why? If you want to break your messages into
What I'm doing is very similar to recursively rmdir-ing a directory hierarchy.
/foo/bar/baz
An actor receives a message to 'rmdir foo', but it can't because it
has subdirectories.
So it must turn 'rmdir foo' into 'rmdir foo/bar, rmdir foo'. But then
'rmdir foo/bar' has to be further divided.
> separate ones, couldn't you keep a separate queue you process? Like
> receive a message and if it's a message you want to break up into
> other messages, do it and put them on the end of a different list.
How do I maintain FIFO order across different lists (actors)? When a
message is broken up, the pieces may need to also be broken up, so now
I need a 3rd list, and so on. All these events need to maintain their
order.
>
>> I'd love to have a way to insert messages at the front of an actor's
>> mailbox, but right now I'm making due by draining the mailbox with
>> receiveWithin.
>>
>> I'm pulling all the messages out, sending the new messages, and then
>> re-sending the older messages.
>>
>> The profiler is showing a huge amount of time spent in
>> scala.runtime.BoxesRunTime.equals(Object, Object), inside
>> MessageQueue.extractFirst, inside receiveWithin.
>>
>> The mailbox can get pretty big, such as 50k messages.
>>
>> Any thoughts on a better strategy for getting the new messages at the
>> front of the mailbox? Is using receiveWithin in a loop to grab all the
>> messages as bad an idea as it appears to be?
>
> Wouldn't you get race conditions this way when you clear the box, and
> your class receives items before you refill it?
There is a race of sorts here, yes. In my case, messages sent to the
actor from the outside are "top level" events that have no relative
order. Only when the events are broken up into pieces does the order
start to matter. The pieces are only broken up from within the actor,
so there is a guarantee that only events that were already in the
mailbox when the actor runs have to happen before any of the new
events created.
That may not be clear but yes I understand that new messages could be
added to the mailbox from the outside between the time I "drain" it
and add my new events. This isn't a problem.
Sat, 2009-07-25, 02:27
#3
Re: Actor receiveWithin too slow
I have changed my approach to avoid having to reorder the mailbox.
When I have a message that needs to be subdivided, I recursively
subdivide it up front so the messages are sent to the actor in their
final "unit" size and order.
Where the old code spent over an hour subdividing an event into 50,000
messages, this new code has been running for about 30 minutes and
mailbox has grown to 8,000,000 messages.
On Fri, Jul 24, 2009 at 4:25 PM, J Robert Ray wrote:
> I have an actor that gets messages,
>
> A B C D
>
> and needs these messages processed in FIFO order, but for some
> messages it needs to break that message down into a sub-list of
> messages:
>
> A becomes A' B' C' D', then the desired order of messages becomes:
>
> A' B' C' D' B C D
>
> maintaining the FIFO order for messages in the mailbox. This would be wrong:
>
> B C D A' B' C' D'
>
> but this is the order you get if the actor sends itself the subdivided
> messages; they get appended to the end.
>
> I'd love to have a way to insert messages at the front of an actor's
> mailbox, but right now I'm making due by draining the mailbox with
> receiveWithin.
>
> I'm pulling all the messages out, sending the new messages, and then
> re-sending the older messages.
>
> The profiler is showing a huge amount of time spent in
> scala.runtime.BoxesRunTime.equals(Object, Object), inside
> MessageQueue.extractFirst, inside receiveWithin.
>
> The mailbox can get pretty big, such as 50k messages.
>
> Any thoughts on a better strategy for getting the new messages at the
> front of the mailbox? Is using receiveWithin in a loop to grab all the
> messages as bad an idea as it appears to be?
>
Sat, 2009-07-25, 08:57
#4
Re: Re: Actor receiveWithin too slow
Why don' you have one actor that potentially splits messages and send them to another actor that processess them?
Sat, 2009-07-25, 10:57
#5
Re: Re: Actor receiveWithin too slow
On Sat, Jul 25, 2009 at 12:47 AM, Viktor Klang wrote:
>
> Why don' you have one actor that potentially splits messages and send them to another actor that processess them?
Because I don't see a way to maintain the proper order of events. I
reached out to the list in the hopes that someone would smack me
upside the head and get me to see a way to do it.
I have two message types:
case class Splittable(path: String)
case class Unsplittable(path: String)
The Splittable message may be split up into multiple sub-Splittable
messages, plus one Unsplittable post-process that represents the same
work as the original Splittable.
Only Splittable messages come into the actor from the outside. A
Splittable message can elect to perform the work, or split itself up.
Kick off the work:
act ! Splittable("/A") - msg #1
act decides #1 is too big to process and wants to subdivide the work,
so it carves it up into:
Splittable("/A/A") #2
Splittable("/A/B") #3
Splittable("/A/C") #4
Unsplittable("/A") #5
#5 must be run after #2-#4 otherwise splitting #1 up was pointless.
Consider these messages to represent "rm -rf" jobs.
So, do I send #2-#4 back to act, and send #5 to another actor? I can't
control the order anymore and #5 is likely to be run before #2-#4.
There is an implicit relationship between #2-#4 and #5, and the only
thing respecting that relationship is the order of the messages in a
single actor's mailbox.
When the actor processes #2, it is not aware that #5 exists. Nor is it
practical to make #2 aware of #5, or the other way around, because...
Messages #2-#4 are all splittable and may be further split. Say #2 gets split:
Splittable("/A/A/A") #6
Splittable("/A/A/B") #7
Splittable("/A/A/C") #8
Unsplittable("/A/A") #9
#9 must be run after #6-#8, and #5 must be run after #9.
When an event is split, the new set of messages need to take the place
of the split event in the overall order.
The actor can't just split up the messages and send them to itself,
because then #6-#9 would run after #5, meaning "rm -rf /A" would run
before "rm -rf /A/A".
Here's the progression of the mailbox:
Splittable("/A") #1
-split-
Splittable("/A/A") #2
Splittable("/A/B") #3
Splittable("/A/C") #4
Unsplittable("/A") #5
-split-
Splittable("/A/A/A") #6
Splittable("/A/A/B") #7
Splittable("/A/A/C") #8
Unsplittable("/A/A") #9
Splittable("/A/B") #3
Splittable("/A/C") #4
Unsplittable("/A") #5
One level of splitting is easy to deal with, but multiple levels of
splitting becomes tricky. I don't see a way to manage this with
multiple actors. I see it as needing a way to efficiently insert
messages at the front of the mailbox.
>
Sat, 2009-07-25, 11:37
#6
Re: Re: Actor receiveWithin too slow
On Sat, Jul 25, 2009 at 11:50 AM, J Robert Ray <jrobertray@gmail.com> wrote:
On Sat, Jul 25, 2009 at 12:47 AM, Viktor Klang <viktor.klang@gmail.com> wrote:
>
> Why don' you have one actor that potentially splits messages and send them to another actor that processess them?
Because I don't see a way to maintain the proper order of events. I
reached out to the list in the hopes that someone would smack me
upside the head and get me to see a way to do it.
Ah, so you're saying that the events are recursively splittable. So there's not just one pass at taking a Splittable and turning it into x Unsplittables?
Wouldn't it be possible to process it like this:
1. recieve task
2. recursively generate a sequence of Unsplittable tasks from that task
3. either process the sequence in the actor, or send it to another actor for processing ( that actor recieves on case s : Seq[Task] )
I have two message types:
case class Splittable(path: String)
case class Unsplittable(path: String)
The Splittable message may be split up into multiple sub-Splittable
messages, plus one Unsplittable post-process that represents the same
work as the original Splittable.
Only Splittable messages come into the actor from the outside. A
Splittable message can elect to perform the work, or split itself up.
Kick off the work:
act ! Splittable("/A") - msg #1
act decides #1 is too big to process and wants to subdivide the work,
so it carves it up into:
Splittable("/A/A") #2
Splittable("/A/B") #3
Splittable("/A/C") #4
Unsplittable("/A") #5
#5 must be run after #2-#4 otherwise splitting #1 up was pointless.
Consider these messages to represent "rm -rf" jobs.
So, do I send #2-#4 back to act, and send #5 to another actor? I can't
control the order anymore and #5 is likely to be run before #2-#4.
There is an implicit relationship between #2-#4 and #5, and the only
thing respecting that relationship is the order of the messages in a
single actor's mailbox.
When the actor processes #2, it is not aware that #5 exists. Nor is it
practical to make #2 aware of #5, or the other way around, because...
Messages #2-#4 are all splittable and may be further split. Say #2 gets split:
Splittable("/A/A/A") #6
Splittable("/A/A/B") #7
Splittable("/A/A/C") #8
Unsplittable("/A/A") #9
#9 must be run after #6-#8, and #5 must be run after #9.
When an event is split, the new set of messages need to take the place
of the split event in the overall order.
The actor can't just split up the messages and send them to itself,
because then #6-#9 would run after #5, meaning "rm -rf /A" would run
before "rm -rf /A/A".
Here's the progression of the mailbox:
Splittable("/A") #1
-split-
Splittable("/A/A") #2
Splittable("/A/B") #3
Splittable("/A/C") #4
Unsplittable("/A") #5
-split-
Splittable("/A/A/A") #6
Splittable("/A/A/B") #7
Splittable("/A/A/C") #8
Unsplittable("/A/A") #9
Splittable("/A/B") #3
Splittable("/A/C") #4
Unsplittable("/A") #5
One level of splitting is easy to deal with, but multiple levels of
splitting becomes tricky. I don't see a way to manage this with
multiple actors. I see it as needing a way to efficiently insert
messages at the front of the mailbox.
>
> -- Viktor
>
> On Jul 25, 2009 3:23 AM, "J Robert Ray" <jrobertray@gmail.com> wrote:
>
> I have changed my approach to avoid having to reorder the mailbox.
> When I have a message that needs to be subdivided, I recursively
> subdivide it up front so the messages are sent to the actor in their
> final "unit" size and order.
>
> Where the old code spent over an hour subdividing an event into 50,000
> messages, this new code has been running for about 30 minutes and
> mailbox has grown to 8,000,000 messages.
>
> On Fri, Jul 24, 2009 at 4:25 PM, J Robert Ray<jrobertray@gmail.com> wrote: > I have an actor that g...
--
Viktor Klang
Rogue Scala-head
Blog: klangism.blogspot.com
Twttr: viktorklang
> A B C D
>
> and needs these messages processed in FIFO order, but for some
> messages it needs to break that message down into a sub-list of
> messages:
>
> A becomes A' B' C' D', then the desired order of messages becomes:
>
> A' B' C' D' B C D
Just out of curiousity, why? If you want to break your messages into
separate ones, couldn't you keep a separate queue you process? Like
receive a message and if it's a message you want to break up into
other messages, do it and put them on the end of a different list.
> I'd love to have a way to insert messages at the front of an actor's
> mailbox, but right now I'm making due by draining the mailbox with
> receiveWithin.
>
> I'm pulling all the messages out, sending the new messages, and then
> re-sending the older messages.
>
> The profiler is showing a huge amount of time spent in
> scala.runtime.BoxesRunTime.equals(Object, Object), inside
> MessageQueue.extractFirst, inside receiveWithin.
>
> The mailbox can get pretty big, such as 50k messages.
>
> Any thoughts on a better strategy for getting the new messages at the
> front of the mailbox? Is using receiveWithin in a loop to grab all the
> messages as bad an idea as it appears to be?
Wouldn't you get race conditions this way when you clear the box, and
your class receives items before you refill it?
Josh