Re: Time to take a long hard look at Actor design and implementation

Mon, 2009-05-18, 10:37

#1

David Pollak

Joined: 2008-12-16,

Time to take a long hard look at Actor design and implementation

Folks,

Scala's Actors are a very important to the language, both as a means of showing off Scala's amazing syntactic flexibility and as an excellent was to do concurrency. I started using Actors in Lift more than 2 years ago and am a fan of the concept. Lately, however, the Actor code base has gotten buggy. More generally, with a couple of years of experience with Actors, the slavish adherence to the Erlang Actor model, particularly in light of no OTP-style libary, just doesn't make sense. I think it's time to take a long hard look at the Actor libraries, both in terms of implementation and in terms of design.

After more than 6 months of chasing down Actor-related memory retention problems, I added the following patch to Lift today:
object PointlessActorToWorkAroundBug extends Actor {
import scala.collection.mutable.HashSet
import java.lang.ref.Reference

def act = loop {
    react {
      case "ActorBug" =>
        try {
          import scala.collection.mutable.HashSet
          import java.lang.ref.Reference

          val agc = ActorGC
          agc.synchronized {
            val rsf = agc.getClass.getDeclaredField("refSet")
            rsf.setAccessible(true)
            rsf.get(agc) match {
              case h: HashSet[Reference[Object]] =>
                Log.trace("[MEMDEBUG] got the actor refSet... length: "+h.size)

                val nullRefs = h.elements.filter(f => f.get eq null).toList

                nullRefs.foreach(r => h -= r)

                val nonNull = h.elements.filter(f => f.get ne null).toList

                Log.trace("[MEMDEBUG] got the actor refSet... non null elems: "+
                          nonNull.size)

                nonNull.foreach{r =>
                  val a = r.get.getClass.getDeclaredField("exiting")
                  a.setAccessible(true)
                  if (a.getBoolean(r.get)) {
                    h -= r
                    r.clear
                  }
                }

                Log.trace("[MEMDEBUG] (again) got the actor refSet... length: "+h.size)

              case _ =>
            }
          }
        } catch {
          case e => Log.error("[MEMDEBUG] failure", e)
        }
        ping()

      case _ =>
    }
}

private def ctor() {
    this.start
    ping()
}

private def ping() {
    ActorPing.schedule(this, "ActorBug", 1 minute)
}

ctor()
}

This patch was necessary for a bunch of reasons:

The 2.7.x branch of Actors relies on WeakReference to clean up Actors. Well... in production that does not work. I've got megabytes of retained references pointed to by Weak References that are never garbage collected (or not GCed soon enough) and ultimately cause OOME.
Actors that are not linked do not mark themselves are exiting and thus hang out forever, creating another memory retention issue.

I have spent the last 2 days in the bowels of the Actor code and it seems that the state transitions are undocumented and indeterminate. Sometimes state transitions are caused by method invocation. Sometimes state transition is caused by catching particular magic exceptions. The ActorGC object retains a lot of references, but it's unclear why the ActorGC exists other than to ref-count the number of Actors in the system (wouldn't a finalizer be better for this?)

So, at a code level, it seems to me that it's time to do a major-league refactoring of the Actor code so that all the state transitions are documented and handled correctly and places where there is possible memory retention (any references from an object) are justified in the documentation.

At the design level, the current Actor library slavishly copies the Erlang Actor design. The most glaring problem is the lack of exception handling in Actors, except via "linking". The design ignores the fact that exceptions are much more common in JVM libraries than they are in Erlang. All exceptions are unchecked in Scala, yet many Java libraries use exceptions to convey information (e.g., Integer.parseInt). This leads to exceptions being thrown, Actors getting shut down, and no indication to the user. It's a very common problem in Lift-related Actor code to have exceptions cause Actors to terminate without any warning to the developer. So, as a design criteria, I strongly recommend that the Actor libraries support both Erlang and JVM styles of Exception management.

Scala's Actors have a bunch of nifty features including "!?" for send and wait for reply as well as "!!" to send and receive a Future. I'm sure that these features are useful in some applications. However, in order to implement these features, there's a lot of bloat in the Actor library. Each message includes return channels, etc. This machinery is unnecesary for "Simple Actors." I submit that these features should be separate and optional. If someone wants a "basic actor (on that only supports "!")", they should not be saddled with the heavy machinery required for the !?, !!, sender, etc. stuff.

So, to make my design recommendations concrete:

All Actors should have a method that returns a PartialFunction[Throwable, Unit] and that partial function should be consulted in the event of an unhandled exception from a react/receive. If the handler handles the exception, the Actor should continue to run.
An ActorLight class that's got an inbox and supports the "!" method. Messages are fast-tracked into the ActorLight's inbox and the Actor is scheduled. Nothing more.

I will file defects based on my review of the Actor code, but I wanted to let folks know about my thoughts. One final thought is that I am seriously considering writing an Actor library for Lift and dropping the use of Scala's Actor library. I'd prefer to build on top of Scala's libraries, but the Actor libraries really need to improve for 2.8.

Thanks,

David

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Mon, 2009-05-18, 10:37

#2

Philipp Haller

Joined: 2009-01-13,

Re: Time to take a long hard look at Actor design and implement

Hi David,

The issues you raise are very important, and I agree that we should
fix them as soon as possible.

David Pollak wrote:
> This patch was necessary for a bunch of reasons:
>
> * The 2.7.x branch of Actors relies on WeakReference to clean up
> Actors. Well... in production that does not work. I've got
> megabytes of retained references pointed to by Weak References
> that are never garbage collected (or not GCed soon enough) and
> ultimately cause OOME.
> * Actors that are not linked do not mark themselves are exiting and
> thus hang out forever, creating another memory retention issue.
>
> I have spent the last 2 days in the bowels of the Actor code and it
> seems that the state transitions are undocumented and indeterminate.
> Sometimes state transitions are caused by method invocation.
> Sometimes
> state transition is caused by catching particular magic exceptions. The
> ActorGC object retains a lot of references, but it's unclear why the
> ActorGC exists other than to ref-count the number of Actors in the
> system (wouldn't a finalizer be better for this?)

The ActorGC was added so that actors that become garbage are treated
as terminated even if they were waiting for a message. However, it
seems that WeakReferences are not a good way to do the ref-counting.
On the other hand, I recall reading in Java Concurrency in Practice
that finalizers are not that reliable either.

I suppose this means we have to make the usage of ActorGC optional. Or
come up with a solution that really works reliably.

> So, at a code level, it seems to me that it's time to do a major-league
> refactoring of the Actor code so that all the state transitions are
> documented and handled correctly and places where there is possible
> memory retention (any references from an object) are justified in the
> documentation.

I agree. In fact, I have been talking to Erik Engbrecht earlier this
year and proposed to work together to merge some of his refactorings,
which address precisely this issue, into trunk.
I would really like to do some refactorings in this direction before 2.8.

> So, to make my design recommendations concrete:
>
> * All Actors should have a method that returns a
> PartialFunction[Throwable, Unit] and that partial function should
> be consulted in the event of an unhandled exception from a
> react/receive. If the handler handles the exception, the Actor
> should continue to run.> * An ActorLight class that's got
> an inbox and supports the "!"
> method. Messages are fast-tracked into the ActorLight's inbox and
> the Actor is scheduled. Nothing more.

I agree that handling exceptions only through linking is not ideal.

About your proposal: after the PartialFunction[Throwable, Unit] has
been applied to the unhandled exception, from which point should the
actor resume its execution? One possibility is to run its act method
from the beginning. On the other hand, the exception handler could
invoke act itself if that's the desired behaviour. What do you think?

An ActorLight could indeed be slightly more efficient by dropping the
reply channels. AFAICS it should be no problem to send a message from
an ActorLight to a normal Actor and vice versa.

> I will file defects based on my review of the Actor code, but I wanted
> to let folks know about my thoughts. One final thought is that I am
> seriously considering writing an Actor library for Lift and dropping the
> use of Scala's Actor library. I'd prefer to build on top of Scala's
> libraries, but the Actor libraries really need to improve for 2.8.

Your feedback is very much appreciated, David. I am working on
improved actors for 2.8. Your mail is a great summary of the most
important issues that we have to address.

Cheers,
Philipp

Mon, 2009-05-18, 10:57

#3

Rich Dougherty

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implemen

On Sat, May 16, 2009 at 5:52 AM, David Pollak
wrote:
> So, at a code level, it seems to me that it's time to do a major-league
> refactoring of the Actor code so that all the state transitions are
> documented and handled correctly and places where there is possible memory
> retention (any references from an object) are justified in the
> documentation.

Hi David

You may be interested in Erik Engbrecht's work at documenting actor states.

http://erikengbrecht.blogspot.com/2009/01/refactoring-scala-actors.html

I like the idea of a simpler actor implementation, supporting only
mailboxes, and acting as a kernel for a more complicated
implementation. I wonder also if there's functionality that can be
reused from Java 1.5's concurrency utils.

Cheers
Rich

--
http://www.richdougherty.com/

Mon, 2009-05-18, 23:47

#4

David Pollak

Joined: 2008-12-16,

Re: Time to take a long hard look at Actor design and implemen

On Sat, May 16, 2009 at 12:39 PM, <philipp.haller@epfl.ch> wrote:

Hi David,

The issues you raise are very important, and I agree that we should fix them as soon as possible.

David Pollak wrote:

This patch was necessary for a bunch of reasons:

* The 2.7.x branch of Actors relies on WeakReference to clean up
Actors. Well... in production that does not work. I've got
megabytes of retained references pointed to by Weak References
that are never garbage collected (or not GCed soon enough) and
ultimately cause OOME.
* Actors that are not linked do not mark themselves are exiting and
thus hang out forever, creating another memory retention issue.

I have spent the last 2 days in the bowels of the Actor code and it
seems that the state transitions are undocumented and indeterminate. Sometimes state transitions are caused by method invocation. Sometimes
state transition is caused by catching particular magic exceptions. The
ActorGC object retains a lot of references, but it's unclear why the
ActorGC exists other than to ref-count the number of Actors in the
system (wouldn't a finalizer be better for this?)

The ActorGC was added so that actors that become garbage are treated as terminated even if they were waiting for a message. However, it seems that WeakReferences are not a good way to do the ref-counting. On the other hand, I recall reading in Java Concurrency in Practice that finalizers are not that reliable either.

I suppose this means we have to make the usage of ActorGC optional. Or come up with a solution that really works reliably.

Yes. I would opt for optional. The lighter weight we can make Actors, the better.

So, at a code level, it seems to me that it's time to do a major-league
refactoring of the Actor code so that all the state transitions are
documented and handled correctly and places where there is possible
memory retention (any references from an object) are justified in the
documentation.

I agree. In fact, I have been talking to Erik Engbrecht earlier this year and proposed to work together to merge some of his refactorings, which address precisely this issue, into trunk.
I would really like to do some refactorings in this direction before 2.8.

That'd be great. I'd also like to see this stuff in 2.8.

So, to make my design recommendations concrete:

* All Actors should have a method that returns a
PartialFunction[Throwable, Unit] and that partial function should
be consulted in the event of an unhandled exception from a
react/receive. If the handler handles the exception, the Actor
should continue to run.> * An ActorLight class that's got an inbox and supports the "!"
method. Messages are fast-tracked into the ActorLight's inbox and
the Actor is scheduled. Nothing more.

I agree that handling exceptions only through linking is not ideal.

About your proposal: after the PartialFunction[Throwable, Unit] has been applied to the unhandled exception, from which point should the actor resume its execution?

One possibility is to run its act method from the beginning. On the other hand, the exception handler could invoke act itself if that's the desired behaviour. What do you think?

I think the handler should invoke act. But if the Actor is running in loop{}. it should just run the next item in the mailbox.

An ActorLight could indeed be slightly more efficient by dropping the reply channels. AFAICS it should be no problem to send a message from an ActorLight to a normal Actor and vice versa.

Right... the target of the Actor is the thing that choose the "weight" of the message send. So, if you're sending to an Actor, then the Actor has "sender" and "reply" in scope. If you're sending to an ActorLight, the ActorLight doesn't get a "sender" or a "reply" method in its handler.

I will file defects based on my review of the Actor code, but I wanted
to let folks know about my thoughts. One final thought is that I am
seriously considering writing an Actor library for Lift and dropping the
use of Scala's Actor library. I'd prefer to build on top of Scala's
libraries, but the Actor libraries really need to improve for 2.8.

Your feedback is very much appreciated, David. I am working on improved actors for 2.8. Your mail is a great summary of the most important issues that we have to address.

Cool.

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

Thanks,

David

Cheers,
Philipp

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Tue, 2009-05-19, 12:07

#5

Erik Engbrecht

Joined: 2008-12-19,

Re: Time to take a long hard look at Actor design and implemen

David, In trunk there is one ActorGC per scheduler rather than it being a singleton. For lift, I think what you need is a custom scheduler that is started up and shut down by the servlet container on application load/unload (I forget the name of the hooks) rather than being magical about counting actors.
There's a ticket you might want to follow about making Schedulers more "pluggable:"http://lampsvn.epfl.ch/trac/scala/ticket/1405
Most of what's in the patch is in trunk.
If you look at the actor_state_machine branch here:http://bitbucket.org/eengbrec/scala-enhancements/
I'm trying to make it so actors are more modular and the user can replace components by having two Actor implementations (the current one in scala.actors and my state-based one in scalax.actors). The default implementations need to be sensible for an unknown environment, but a framework like lift could provide it's own that generally reduce the magic and make it more suitable for its use cases. I'm working on that, along with breaking things down into a more gradual patch series for Philipp. I haven't had much time lately, so progress has been slow, but I expect it to pickup.
-Erik
On Mon, May 18, 2009 at 6:30 PM, David Pollak <feeder.of.the.bears@gmail.com> wrote:

On Sat, May 16, 2009 at 12:39 PM, <philipp.haller@epfl.ch> wrote:

Hi David,

The issues you raise are very important, and I agree that we should fix them as soon as possible.

David Pollak wrote:

This patch was necessary for a bunch of reasons:

* The 2.7.x branch of Actors relies on WeakReference to clean up
Actors. Well... in production that does not work. I've got
megabytes of retained references pointed to by Weak References
that are never garbage collected (or not GCed soon enough) and
ultimately cause OOME.
* Actors that are not linked do not mark themselves are exiting and
thus hang out forever, creating another memory retention issue.

I have spent the last 2 days in the bowels of the Actor code and it
seems that the state transitions are undocumented and indeterminate. Sometimes state transitions are caused by method invocation. Sometimes
state transition is caused by catching particular magic exceptions. The
ActorGC object retains a lot of references, but it's unclear why the
ActorGC exists other than to ref-count the number of Actors in the
system (wouldn't a finalizer be better for this?)

The ActorGC was added so that actors that become garbage are treated as terminated even if they were waiting for a message. However, it seems that WeakReferences are not a good way to do the ref-counting. On the other hand, I recall reading in Java Concurrency in Practice that finalizers are not that reliable either.

I suppose this means we have to make the usage of ActorGC optional. Or come up with a solution that really works reliably.

Yes. I would opt for optional. The lighter weight we can make Actors, the better.

So, at a code level, it seems to me that it's time to do a major-league
refactoring of the Actor code so that all the state transitions are
documented and handled correctly and places where there is possible
memory retention (any references from an object) are justified in the
documentation.

I agree. In fact, I have been talking to Erik Engbrecht earlier this year and proposed to work together to merge some of his refactorings, which address precisely this issue, into trunk.
I would really like to do some refactorings in this direction before 2.8.

That'd be great. I'd also like to see this stuff in 2.8.

So, to make my design recommendations concrete:

* All Actors should have a method that returns a
PartialFunction[Throwable, Unit] and that partial function should
be consulted in the event of an unhandled exception from a
react/receive. If the handler handles the exception, the Actor
should continue to run.> * An ActorLight class that's got an inbox and supports the "!"
method. Messages are fast-tracked into the ActorLight's inbox and
the Actor is scheduled. Nothing more.

I agree that handling exceptions only through linking is not ideal.

About your proposal: after the PartialFunction[Throwable, Unit] has been applied to the unhandled exception, from which point should the actor resume its execution?

One possibility is to run its act method from the beginning. On the other hand, the exception handler could invoke act itself if that's the desired behaviour. What do you think?

I think the handler should invoke act. But if the Actor is running in loop{}. it should just run the next item in the mailbox.

An ActorLight could indeed be slightly more efficient by dropping the reply channels. AFAICS it should be no problem to send a message from an ActorLight to a normal Actor and vice versa.

Right... the target of the Actor is the thing that choose the "weight" of the message send. So, if you're sending to an Actor, then the Actor has "sender" and "reply" in scope. If you're sending to an ActorLight, the ActorLight doesn't get a "sender" or a "reply" method in its handler.

I will file defects based on my review of the Actor code, but I wanted
to let folks know about my thoughts. One final thought is that I am
seriously considering writing an Actor library for Lift and dropping the
use of Scala's Actor library. I'd prefer to build on top of Scala's
libraries, but the Actor libraries really need to improve for 2.8.

Your feedback is very much appreciated, David. I am working on improved actors for 2.8. Your mail is a great summary of the most important issues that we have to address.

Cool.

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

Thanks,

David

Cheers,
Philipp

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

--
http://erikengbrecht.blogspot.com/

Tue, 2009-05-19, 14:27

#6

David Pollak

Joined: 2008-12-16,

Re: Time to take a long hard look at Actor design and implemen

Erik,

I appreciate making stuff plugable and generally refectoring.

On Tue, May 19, 2009 at 4:01 AM, Erik Engbrecht <erik.engbrecht@gmail.com> wrote:

David, In trunk there is one ActorGC per scheduler rather than it being a singleton. For lift, I think what you need is a custom scheduler that is started up and shut down by the servlet container on application load/unload (I forget the name of the hooks) rather than being magical about counting actors.
There's a ticket you might want to follow about making Schedulers more "pluggable:"http://lampsvn.epfl.ch/trac/scala/ticket/1405
Most of what's in the patch is in trunk.
If you look at the actor_state_machine branch here:http://bitbucket.org/eengbrec/scala-enhancements/
I'm trying to make it so actors are more modular and the user can replace components by having two Actor implementations (the current one in scala.actors and my state-based one in scalax.actors). The default implementations need to be sensible for an unknown environment, but a framework like lift could provide it's own that generally reduce the magic and make it more suitable for its use cases. I'm working on that, along with breaking things down into a more gradual patch series for Philipp. I haven't had much time lately, so progress has been slow, but I expect it to pickup.

I'm actually not looking for anything special other than for the Actors to not have memory leaks. The only reason that I've plugged in my own scheduler is that the current scheduler has memory leaks and gets into some nasty CPU consumption patterns.

So, my goal is not flexibility, but is to have reliable Actors.

I'll look through your stuff and I'll see if I can plug some of it into Lift and see if the default settings make the memory leaks and CPU consumption problems go away.

Thanks,

David

-Erik
On Mon, May 18, 2009 at 6:30 PM, David Pollak <feeder.of.the.bears@gmail.com> wrote:

On Sat, May 16, 2009 at 12:39 PM, <philipp.haller@epfl.ch> wrote:

Hi David,

The issues you raise are very important, and I agree that we should fix them as soon as possible.

David Pollak wrote:

This patch was necessary for a bunch of reasons:

* The 2.7.x branch of Actors relies on WeakReference to clean up
Actors. Well... in production that does not work. I've got
megabytes of retained references pointed to by Weak References
that are never garbage collected (or not GCed soon enough) and
ultimately cause OOME.
* Actors that are not linked do not mark themselves are exiting and
thus hang out forever, creating another memory retention issue.

I have spent the last 2 days in the bowels of the Actor code and it
seems that the state transitions are undocumented and indeterminate. Sometimes state transitions are caused by method invocation. Sometimes
state transition is caused by catching particular magic exceptions. The
ActorGC object retains a lot of references, but it's unclear why the
ActorGC exists other than to ref-count the number of Actors in the
system (wouldn't a finalizer be better for this?)

The ActorGC was added so that actors that become garbage are treated as terminated even if they were waiting for a message. However, it seems that WeakReferences are not a good way to do the ref-counting. On the other hand, I recall reading in Java Concurrency in Practice that finalizers are not that reliable either.

I suppose this means we have to make the usage of ActorGC optional. Or come up with a solution that really works reliably.

Yes. I would opt for optional. The lighter weight we can make Actors, the better.

So, at a code level, it seems to me that it's time to do a major-league
refactoring of the Actor code so that all the state transitions are
documented and handled correctly and places where there is possible
memory retention (any references from an object) are justified in the
documentation.

I agree. In fact, I have been talking to Erik Engbrecht earlier this year and proposed to work together to merge some of his refactorings, which address precisely this issue, into trunk.
I would really like to do some refactorings in this direction before 2.8.

That'd be great. I'd also like to see this stuff in 2.8.

So, to make my design recommendations concrete:

* All Actors should have a method that returns a
PartialFunction[Throwable, Unit] and that partial function should
be consulted in the event of an unhandled exception from a
react/receive. If the handler handles the exception, the Actor
should continue to run.> * An ActorLight class that's got an inbox and supports the "!"
method. Messages are fast-tracked into the ActorLight's inbox and
the Actor is scheduled. Nothing more.

I agree that handling exceptions only through linking is not ideal.

About your proposal: after the PartialFunction[Throwable, Unit] has been applied to the unhandled exception, from which point should the actor resume its execution?

One possibility is to run its act method from the beginning. On the other hand, the exception handler could invoke act itself if that's the desired behaviour. What do you think?

I think the handler should invoke act. But if the Actor is running in loop{}. it should just run the next item in the mailbox.

An ActorLight could indeed be slightly more efficient by dropping the reply channels. AFAICS it should be no problem to send a message from an ActorLight to a normal Actor and vice versa.

Right... the target of the Actor is the thing that choose the "weight" of the message send. So, if you're sending to an Actor, then the Actor has "sender" and "reply" in scope. If you're sending to an ActorLight, the ActorLight doesn't get a "sender" or a "reply" method in its handler.

I will file defects based on my review of the Actor code, but I wanted
to let folks know about my thoughts. One final thought is that I am
seriously considering writing an Actor library for Lift and dropping the
use of Scala's Actor library. I'd prefer to build on top of Scala's
libraries, but the Actor libraries really need to improve for 2.8.

Your feedback is very much appreciated, David. I am working on improved actors for 2.8. Your mail is a great summary of the most important issues that we have to address.

Cool.

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

Thanks,

David

Cheers,
Philipp

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

--
http://erikengbrecht.blogspot.com/

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Tue, 2009-05-19, 15:07

#7

Colin Bullock

Joined: 2009-01-23,

Re: Time to take a long hard look at Actor design and implemen

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

I also found this to be the case a while back (if I remember correctly, FJTaskRunner.VolatileTaskRef was the cuplrit). At some point I had a copy of scala with the Scheduler implementation rewritten using the latest JSR166-y jars, and it seemed to vastly improve both performance and memory usage. Unfortunately, this depends on a number of JDK6 features, and relies heavily on sun.misc.Unsafe as it is currently implemented.

I am far (very far) from an expert in the area of concurrency, but I would be more than happy to share a patch if anyone is interested (and assuming I can dig up wherever I left it).

- Colin

Tue, 2009-05-19, 15:17

#8

Erik Engbrecht

Joined: 2008-12-19,

Re: Time to take a long hard look at Actor design and implemen

David,
In order for actors to be reliable they have to play well in their environment without knowing hardly anything about it. For example, ActorGC exists so that the scheduler doesn't need to be explicitly shut down. I think it needs to have all the memory leaks driven out of it, but even when that happens I think it still may introduce too much delay in the garbage collection of actors and/or have other weird edge conditions. ActorGC is necessary for actor to enable the implicit startup/shutdown of the scheduler and worker threads, but not all environments require an implicit startup/shutdown. Given that ActorGC relies on the garbage collector, and the gc is inherently rather magical, ActorGC itself becomes rather non-deterministic.

Consequently, if you remove ActorGC and replace it with an explicit startup/shutdown of the scheduler the resulting behavior becomes much more deterministic. That makes it less likely to introduce problems and it also makes it easier to debug when problems occur, and thus leads to more robust software.

Basically in order to make things like ActorGC optional w/o introducing lots of spaghetti or code duplicate the library needs to be more componentized, and it is headed in that direction.

-Erik

On Tue, May 19, 2009 at 9:24 AM, David Pollak <feeder.of.the.bears@gmail.com> wrote:

Erik,

I appreciate making stuff plugable and generally refectoring.

On Tue, May 19, 2009 at 4:01 AM, Erik Engbrecht <erik.engbrecht@gmail.com> wrote:

David, In trunk there is one ActorGC per scheduler rather than it being a singleton. For lift, I think what you need is a custom scheduler that is started up and shut down by the servlet container on application load/unload (I forget the name of the hooks) rather than being magical about counting actors.
There's a ticket you might want to follow about making Schedulers more "pluggable:"http://lampsvn.epfl.ch/trac/scala/ticket/1405
Most of what's in the patch is in trunk.
If you look at the actor_state_machine branch here:http://bitbucket.org/eengbrec/scala-enhancements/
I'm trying to make it so actors are more modular and the user can replace components by having two Actor implementations (the current one in scala.actors and my state-based one in scalax.actors). The default implementations need to be sensible for an unknown environment, but a framework like lift could provide it's own that generally reduce the magic and make it more suitable for its use cases. I'm working on that, along with breaking things down into a more gradual patch series for Philipp. I haven't had much time lately, so progress has been slow, but I expect it to pickup.

I'm actually not looking for anything special other than for the Actors to not have memory leaks. The only reason that I've plugged in my own scheduler is that the current scheduler has memory leaks and gets into some nasty CPU consumption patterns.

So, my goal is not flexibility, but is to have reliable Actors.

I'll look through your stuff and I'll see if I can plug some of it into Lift and see if the default settings make the memory leaks and CPU consumption problems go away.

Thanks,

David

-Erik
On Mon, May 18, 2009 at 6:30 PM, David Pollak <feeder.of.the.bears@gmail.com> wrote:

On Sat, May 16, 2009 at 12:39 PM, <philipp.haller@epfl.ch> wrote:

Hi David,

The issues you raise are very important, and I agree that we should fix them as soon as possible.

David Pollak wrote:

This patch was necessary for a bunch of reasons:

* The 2.7.x branch of Actors relies on WeakReference to clean up
Actors. Well... in production that does not work. I've got
megabytes of retained references pointed to by Weak References
that are never garbage collected (or not GCed soon enough) and
ultimately cause OOME.
* Actors that are not linked do not mark themselves are exiting and
thus hang out forever, creating another memory retention issue.

I have spent the last 2 days in the bowels of the Actor code and it
seems that the state transitions are undocumented and indeterminate. Sometimes state transitions are caused by method invocation. Sometimes
state transition is caused by catching particular magic exceptions. The
ActorGC object retains a lot of references, but it's unclear why the
ActorGC exists other than to ref-count the number of Actors in the
system (wouldn't a finalizer be better for this?)

The ActorGC was added so that actors that become garbage are treated as terminated even if they were waiting for a message. However, it seems that WeakReferences are not a good way to do the ref-counting. On the other hand, I recall reading in Java Concurrency in Practice that finalizers are not that reliable either.

I suppose this means we have to make the usage of ActorGC optional. Or come up with a solution that really works reliably.

Yes. I would opt for optional. The lighter weight we can make Actors, the better.

So, at a code level, it seems to me that it's time to do a major-league
refactoring of the Actor code so that all the state transitions are
documented and handled correctly and places where there is possible
memory retention (any references from an object) are justified in the
documentation.

I agree. In fact, I have been talking to Erik Engbrecht earlier this year and proposed to work together to merge some of his refactorings, which address precisely this issue, into trunk.
I would really like to do some refactorings in this direction before 2.8.

That'd be great. I'd also like to see this stuff in 2.8.

So, to make my design recommendations concrete:

* All Actors should have a method that returns a
PartialFunction[Throwable, Unit] and that partial function should
be consulted in the event of an unhandled exception from a
react/receive. If the handler handles the exception, the Actor
should continue to run.> * An ActorLight class that's got an inbox and supports the "!"
method. Messages are fast-tracked into the ActorLight's inbox and
the Actor is scheduled. Nothing more.

I agree that handling exceptions only through linking is not ideal.

About your proposal: after the PartialFunction[Throwable, Unit] has been applied to the unhandled exception, from which point should the actor resume its execution?

One possibility is to run its act method from the beginning. On the other hand, the exception handler could invoke act itself if that's the desired behaviour. What do you think?

I think the handler should invoke act. But if the Actor is running in loop{}. it should just run the next item in the mailbox.

An ActorLight could indeed be slightly more efficient by dropping the reply channels. AFAICS it should be no problem to send a message from an ActorLight to a normal Actor and vice versa.

Right... the target of the Actor is the thing that choose the "weight" of the message send. So, if you're sending to an Actor, then the Actor has "sender" and "reply" in scope. If you're sending to an ActorLight, the ActorLight doesn't get a "sender" or a "reply" method in its handler.

I will file defects based on my review of the Actor code, but I wanted
to let folks know about my thoughts. One final thought is that I am
seriously considering writing an Actor library for Lift and dropping the
use of Scala's Actor library. I'd prefer to build on top of Scala's
libraries, but the Actor libraries really need to improve for 2.8.

Your feedback is very much appreciated, David. I am working on improved actors for 2.8. Your mail is a great summary of the most important issues that we have to address.

Cool.

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

Thanks,

David

Cheers,
Philipp

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

--
http://erikengbrecht.blogspot.com/

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

--
http://erikengbrecht.blogspot.com/

Tue, 2009-05-19, 15:27

#9

Erik Engbrecht

Joined: 2008-12-19,

Re: Time to take a long hard look at Actor design and implemen

Colin,
Are the dependencies on JDK6 or JDK5 features? A lot of the java.util.concurrent stuff came with JDK5, which is OK to use.

I think it would be useful if we had a set of benchmarks to compare alternative implementations on various use cases. My observation is that the methods that perform best of for one usage pattern tend to fall down on others. If we had a good collection of them then we could at least make solid empirical statements about the effects of changes. If the current FJ* implementation isn't providing any advantages over stock JDK5 stuff then I think it should be stripped out, because it contains a lot of complex code. But I'm not willing to say that is doesn't have advantages without a broad set of benchmarks for comparison.

-Erik

On Tue, May 19, 2009 at 10:03 AM, Colin Bullock <cmbullock@gmail.com> wrote:

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

I also found this to be the case a while back (if I remember correctly, FJTaskRunner.VolatileTaskRef was the cuplrit). At some point I had a copy of scala with the Scheduler implementation rewritten using the latest JSR166-y jars, and it seemed to vastly improve both performance and memory usage. Unfortunately, this depends on a number of JDK6 features, and relies heavily on sun.misc.Unsafe as it is currently implemented.

I am far (very far) from an expert in the area of concurrency, but I would be more than happy to share a patch if anyone is interested (and assuming I can dig up wherever I left it).

- Colin

--
http://erikengbrecht.blogspot.com/

Tue, 2009-05-19, 15:27

#10

Colin Bullock

Joined: 2009-01-23,

Re: Time to take a long hard look at Actor design and implemen

On Tue, May 19, 2009 at 9:15 AM, Erik Engbrecht <erik.engbrecht@gmail.com> wrote:

Colin,
Are the dependencies on JDK6 or JDK5 features? A lot of the java.util.concurrent stuff came with JDK5, which is OK to use.

JDK6. There aren't a great number of dependencies (though I can't remember specifically what off the top of my head), but I ran up against the wall that is my limited knowledge at the point where it was using Unsafe to access a private field in java.lang.Thread that appeared not to exist in JDK5.

I think it would be useful if we had a set of benchmarks to compare alternative implementations on various use cases. My observation is that the methods that perform best of for one usage pattern tend to fall down on others. If we had a good collection of them then we could at least make solid empirical statements about the effects of changes. If the current FJ* implementation isn't providing any advantages over stock JDK5 stuff then I think it should be stripped out, because it contains a lot of complex code. But I'm not willing to say that is doesn't have advantages without a broad set of benchmarks for comparison.

I completely agree. I'm a bit swamped with day-job stuff at the moment, but I'll see if I can dig up my code and post a patch or branch at some point in the next week.

- Colin

Tue, 2009-05-19, 15:37

#11

loverdos

Joined: 2008-11-18,

Re: Time to take a long hard look at Actor design and implemen

There is high probability that sun.misc.Unsafe exists anywhere.
The reason I say so, is that we recently had a discussion on the simple-build-tool list regarding sun.misc.Signal{Handler} and I did some researching. I found that JVMs of: IBM (on AIX), Apple (on Macs), HP (on HP-UX, although I am not sure HP maintains a VM of its own, I just tested for the presence of the feature) and the soylatte & openJDK VMs (on Macs) did support the feature. Since Sun provides VMs for Windows/Solaris/Linux, I could bet it is there too. The only place I have not searched is BSDs, but that is because a friend of mine keeps ignoring my emails ( :) )...
I just thought to share my (related?) findings :)
Christos

On Tue, May 19, 2009 at 17:03, Colin Bullock <cmbullock@gmail.com> wrote:

I also found out today that the ForkJoin library was a source of memory retention. I've written a simple scheduler that runs on top of the java.util.concurrent code. It seems to work just fine.

I also found this to be the case a while back (if I remember correctly, FJTaskRunner.VolatileTaskRef was the cuplrit). At some point I had a copy of scala with the Scheduler implementation rewritten using the latest JSR166-y jars, and it seemed to vastly improve both performance and memory usage. Unfortunately, this depends on a number of JDK6 features, and relies heavily on sun.misc.Unsafe as it is currently implemented.

I am far (very far) from an expert in the area of concurrency, but I would be more than happy to share a patch if anyone is interested (and assuming I can dig up wherever I left it).

- Colin

--
__~O
-\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com

Tue, 2009-05-19, 15:47

#12

ijuma

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implement

On Tue, 2009-05-19 at 17:19 +0300, Christos KK Loverdos wrote:
> There is high probability that sun.misc.Unsafe exists anywhere.

But not all of its methods. Part of an email from Doug Lea to the
concurrency mailing list about Unsafe:

"Even though not officially spec'ed anywhere,
these Unsafe methods are for the most part supported in
the same way across Sun, IBM, etc JVMs, because in most
cases there aren't any other good choices for how to do it.
However, for some versions of IBM J9, they apparently did find
some other way to support the lazySet methods (I believe this
on x86 only, PowerPC J9 seems to have them)."

Best,
Ismael

Tue, 2009-05-19, 15:57

#13

Colin Bullock

Joined: 2009-01-23,

Re: Time to take a long hard look at Actor design and implemen

In the JSR166 library, Unsafe is mostly used for ordered puts and CAS operations. I can personally confirm that it works fine on the Apple JVM, in addition to Sun's.

- Colin

Tue, 2009-05-19, 16:07

#14

Erik Engbrecht

Joined: 2008-12-19,

Re: Time to take a long hard look at Actor design and implemen

Shouldn't we just use the public, standardized, high level APIs like Executer*, Atomic*, etc. instead of trying to do something super-low-level?

On Tue, May 19, 2009 at 10:39 AM, Colin Bullock <cmbullock@gmail.com> wrote:

In the JSR166 library, Unsafe is mostly used for ordered puts and CAS operations. I can personally confirm that it works fine on the Apple JVM, in addition to Sun's.

- Colin

--
http://erikengbrecht.blogspot.com/

Tue, 2009-05-19, 16:07

#15

ijuma

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implement

On Tue, 2009-05-19 at 09:39 -0500, Colin Bullock wrote:
> In the JSR166 library, Unsafe is mostly used for ordered puts and CAS
> operations. I can personally confirm that it works fine on the Apple
> JVM, in addition to Sun's.

Right, the message from Doug Lea I pasted was a reply to someone who had
issues running the Fork Join library in JSR166y with J9 on Linux.

Best,
Ismael

Tue, 2009-05-19, 17:07

#16

Erik Engbrecht

Joined: 2008-12-19,

Re: Time to take a long hard look at Actor design and implemen

Ahh...I missed the y at the end of JSR166y.

So we could have a scheduler based on:
(1) The JDK5 executor
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html

(2) JSR166y Fork-join extensions for JDK7 that require at least Sun's JDK6 (or a derivative) to work:
http://gee.cs.oswego.edu/dl/concurrency-interest/

(3) The existing scheduler based on an enhanced version of Doug Lea's FJ library for JDK4.

It seems that #2 should be done as a separate library, and perhaps the Scheduler object should parse a command line parameter specifying the class to use as the backing scheduler. As for the default scheduler, it seems like it has to be either #1 or an evolution of #3.

On Tue, May 19, 2009 at 10:47 AM, Erik Engbrecht <erik.engbrecht@gmail.com> wrote:

Shouldn't we just use the public, standardized, high level APIs like Executer*, Atomic*, etc. instead of trying to do something super-low-level?

On Tue, May 19, 2009 at 10:39 AM, Colin Bullock <cmbullock@gmail.com> wrote:

In the JSR166 library, Unsafe is mostly used for ordered puts and CAS operations. I can personally confirm that it works fine on the Apple JVM, in addition to Sun's.

- Colin

--
http://erikengbrecht.blogspot.com/

--
http://erikengbrecht.blogspot.com/

Wed, 2009-05-20, 09:57

#17

Rich Dougherty

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implemen

On Wed, May 20, 2009 at 1:24 AM, David Pollak
wrote:
> I'm actually not looking for anything special other than for the Actors to
> not have memory leaks. The only reason that I've plugged in my own
> scheduler is that the current scheduler has memory leaks and gets into some
> nasty CPU consumption patterns.
>
> So, my goal is not flexibility, but is to have reliable Actors.

Another potential source of leaks is the way that actors are stored in
ThreadLocals. This isn't such a problem for actors created with "actor
{ ... }", because the ThreadLocal is set and cleared by the scheduler,
but it is sometimes a problem for ActorProxies that are bound to
already-existing threads. The trouble is that these proxies stay bound
to the thread even beyond the scope where they are used. And threads
can stay alive for a long time in a servlet container.

The automatic binding is probably helpful for a user-friendly
high-level library. But if we're aiming for a lower-level library that
provides better control over resource use, it might make sense to make
the binding operation more explicit somehow.

For example, we could create a method "actorProxy" that binds a new
ActorProxy, runs an anonymous function with the bound proxy, then
unbinds the ActorProxy when the function exits. Calls to this method
could potentially be nested.

actorProxy {
// can use actor operations
}

There are actually already some methods for managing the ThreadLocal
('resetProxy' and 'clearSelf'), but they seem a bit ad hoc at the
moment.

Cheers
Rich
--
http://www.richdougherty.com/

Wed, 2009-05-20, 10:07

#18

Philipp Haller

Joined: 2009-01-13,

Re: Time to take a long hard look at Actor design and implemen

David Pollak wrote:
> This patch was necessary for a bunch of reasons:
>
> * The 2.7.x branch of Actors relies on WeakReference to clean up
> Actors. Well... in production that does not work. I've got
> megabytes of retained references pointed to by Weak References
> that are never garbage collected (or not GCed soon enough) and
> ultimately cause OOME.
> * Actors that are not linked do not mark themselves are
> exiting and
> thus hang out forever, creating another memory retention issue.

I just committed a fix for the second problem:

https://lampsvn.epfl.ch/trac/scala/ticket/2000

About the first problem: as Erik found out it is possible that the OOME
is due to an optimization when executing tasks by FJTaskRunners:

https://lampsvn.epfl.ch/trac/scala/ticket/1999

Therefore, we either have to disable the optimization, make it optional,
or make it less aggressive.

For 2.8 the best default may be to use a JDK 5 ThreadPoolExecutor, since
the JDK 7 ForkJoinPool does not run as-is on JDK 5 (a back port to JDK 5
would be very useful--provided it does not cripple performance).

> An ActorLight could indeed be slightly more efficient by dropping
> the reply channels. AFAICS it should be no problem to send a message
> from an ActorLight to a normal Actor and vice versa.
>
>
> Right... the target of the Actor is the thing that choose the "weight"
> of the message send. So, if you're sending to an Actor, then the Actor
> has "sender" and "reply" in scope. If you're sending to an ActorLight,
> the ActorLight doesn't get a "sender" or a "reply" method in its handler.

Exactly.

> I also found out today that the ForkJoin library was a source of memory
> retention. I've written a simple scheduler that runs on top of the
> java.util.concurrent code. It seems to work just fine.

OK, good.

Cheers,
Philipp

Wed, 2009-05-20, 10:17

#19

Philipp Haller

Joined: 2009-01-13,

Re: Time to take a long hard look at Actor design and implemen

Erik Engbrecht wrote:
> Ahh...I missed the y at the end of JSR166y.
>
> So we could have a scheduler based on:
> (1) The JDK5 executor
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolE...
>
> (2) JSR166y Fork-join extensions for JDK7 that require at least Sun's
> JDK6 (or a derivative) to work:
> http://gee.cs.oswego.edu/dl/concurrency-interest/
>
> (3) The existing scheduler based on an enhanced version of Doug Lea's FJ
> library for JDK4.
>
>
> It seems that #2 should be done as a separate library, and perhaps the
> Scheduler object should parse a command line parameter specifying the
> class to use as the backing scheduler. As for the default scheduler, it
> seems like it has to be either #1 or an evolution of #3.

That's also my current understanding.

Cheers,
Philipp

Wed, 2009-05-20, 13:17

#20

Erik Engbrecht

Joined: 2008-12-19,

Re: Time to take a long hard look at Actor design and implemen

Have you seen retained ActorProxies cause memory problems? It seems that they are pretty small relative to the memory footprint of the thread to which they are bound (if they weren't, event-based actors wouldn't be any where near as appealing).
The only thing I can think of is if you send the ActorProxy a lot of messages, most or all of which are ignored. You'd like those messages to go away. Now that I think about it, if you're doing that in, especially in a servlet container and assuming you process some of them, having those messages hanging around could interfere with the correctness of your program because you might pull a message that was received in a different logical scope..
I'll ponder this a bit...

On Wed, May 20, 2009 at 4:51 AM, Rich Dougherty <rich@rd.gen.nz> wrote:

On Wed, May 20, 2009 at 1:24 AM, David Pollak
<feeder.of.the.bears@gmail.com> wrote:
> I'm actually not looking for anything special other than for the Actors to
> not have memory leaks. The only reason that I've plugged in my own
> scheduler is that the current scheduler has memory leaks and gets into some
> nasty CPU consumption patterns.
>
> So, my goal is not flexibility, but is to have reliable Actors.

Another potential source of leaks is the way that actors are stored in
ThreadLocals. This isn't such a problem for actors created with "actor
{ ... }", because the ThreadLocal is set and cleared by the scheduler,
but it is sometimes a problem for ActorProxies that are bound to
already-existing threads. The trouble is that these proxies stay bound
to the thread even beyond the scope where they are used. And threads
can stay alive for a long time in a servlet container.

The automatic binding is probably helpful for a user-friendly
high-level library. But if we're aiming for a lower-level library that
provides better control over resource use, it might make sense to make
the binding operation more explicit somehow.

For example, we could create a method "actorProxy" that binds a new
ActorProxy, runs an anonymous function with the bound proxy, then
unbinds the ActorProxy when the function exits. Calls to this method
could potentially be nested.

actorProxy {
// can use actor operations
}

There are actually already some methods for managing the ThreadLocal
('resetProxy' and 'clearSelf'), but they seem a bit ad hoc at the
moment.

Cheers
Rich
--
http://www.richdougherty.com/

--
http://erikengbrecht.blogspot.com/

Wed, 2009-05-20, 17:47

#21

Rich Dougherty

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implemen

On Thu, May 21, 2009 at 12:07 AM, Erik Engbrecht
wrote:
> Have you seen retained ActorProxies cause memory problems? It seems that
> they are pretty small relative to the memory footprint of the thread to
> which they are bound (if they weren't, event-based actors wouldn't be any
> where near as appealing).

I think the problems I've had (which I can't remember exactly now)
were more to do with the ActorGC not running because of ActorProxies
hanging around. But here is a bug report of a memory leak:
http://lampsvn.epfl.ch/trac/scala/ticket/103

> The only thing I can think of is if you send the ActorProxy a lot of
> messages, most or all of which are ignored. You'd like those messages to go
> away. Now that I think about it, if you're doing that in, especially in a
> servlet container and assuming you process some of them, having those
> messages hanging around could interfere with the correctness of your program
> because you might pull a message that was received in a different logical
> scope..

Yes, that's another potential issue.

Cheers
Rich
--
http://blog.richdougherty.com/

Wed, 2009-05-20, 17:57

#22

David Pollak

Joined: 2008-12-16,

Re: Time to take a long hard look at Actor design and implemen

The whole ActorProxy issue goes back to remote systems and leases. I'm not sure one can solve the problem without leases and communication from the remote system.

On Wed, May 20, 2009 at 9:40 AM, Rich Dougherty <rich@rd.gen.nz> wrote:

On Thu, May 21, 2009 at 12:07 AM, Erik Engbrecht
<erik.engbrecht@gmail.com> wrote:
> Have you seen retained ActorProxies cause memory problems? It seems that
> they are pretty small relative to the memory footprint of the thread to
> which they are bound (if they weren't, event-based actors wouldn't be any
> where near as appealing).

I think the problems I've had (which I can't remember exactly now)
were more to do with the ActorGC not running because of ActorProxies
hanging around. But here is a bug report of a memory leak:
http://lampsvn.epfl.ch/trac/scala/ticket/103

> The only thing I can think of is if you send the ActorProxy a lot of
> messages, most or all of which are ignored. You'd like those messages to go
> away. Now that I think about it, if you're doing that in, especially in a
> servlet container and assuming you process some of them, having those
> messages hanging around could interfere with the correctness of your program
> because you might pull a message that was received in a different logical
> scope..

Yes, that's another potential issue.

Cheers
Rich
--
http://blog.richdougherty.com/

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Wed, 2009-05-20, 18:07

#23

Rich Dougherty

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implemen

On Thu, May 21, 2009 at 4:50 AM, David Pollak
wrote:
> The whole ActorProxy issue goes back to remote systems and leases. I'm not
> sure one can solve the problem without leases and communication from the
> remote system.

Hi David

There are two types of proxy in the actors library. The one I was
talking about was scala.actors.ActorProxy, which binds to local
threads. Are you thinking of scala.actors.remote.Proxy?

Cheers
RIch

Wed, 2009-05-20, 18:17

#24

David Pollak

Joined: 2008-12-16,

Re: Time to take a long hard look at Actor design and implemen

On Wed, May 20, 2009 at 10:00 AM, Rich Dougherty <rich@rd.gen.nz> wrote:

On Thu, May 21, 2009 at 4:50 AM, David Pollak
<feeder.of.the.bears@gmail.com> wrote:
> The whole ActorProxy issue goes back to remote systems and leases. I'm not
> sure one can solve the problem without leases and communication from the
> remote system.

Hi David

There are two types of proxy in the actors library. The one I was
talking about was scala.actors.ActorProxy, which binds to local
threads. Are you thinking of scala.actors.remote.Proxy?

Yeah... sorry.

My take on the ThreadLocal actors is that it's suboptimal. In the event of doing a ?!, you should get an Actor with a Receive block that exits when the receive is satisfied. The cost of creating a destroying an Actor is so low that it seems like a real waste to have them hang around on a ThreadLocal.

Cheers
RIch

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Wed, 2009-05-20, 18:27

#25

Rich Dougherty

Joined: 2008-08-20,

Re: Time to take a long hard look at Actor design and implemen

On Thu, May 21, 2009 at 4:40 AM, Rich Dougherty wrote:
> I think the problems I've had (which I can't remember exactly now)
> were more to do with the ActorGC not running because of ActorProxies
> hanging around.

Here's one of the problems I was thinking of:
http://lampsvn.epfl.ch/trac/scala/ticket/1652. I've just re-opened it,
after realising that the solution I proposed didn't work. (Sorry
Philipp.)

The problem is that ActorGC needs to run a timer thread (to manage
TIMEOUTs). To allow the VM to exit the timer thread either needs to be
a daemon thread, or it needs to be shut down when not used. The daemon
thread approach will not work, I've just realised. The shutting down
approach requires accurate tracking of actor lifetimes, which would be
difficult for ActorProxies. i.e. It would probably have to rely on the
threads exiting, and their ThreadLocal ActorProxies being garbage
collected. This would probably result in the VM occasionally failing
to shutdown. (Another option would be to start and stop the thread
based on a count of outstanding TIMEOUT messages, but this might be
expensive.)

Cheers
Rich
--
http://twitter.com/richdougherty

Thu, 2009-05-28, 12:07

#26

Philipp Haller

Joined: 2009-01-13,

Re: OutputChannelActor trait

Hi all,

Following up on the recent discussion on lighter-weight actors and
refactoring the Actor <: AbstractActor <: OutputChannel hierarchy, I
came up with a new OutputChannelActor trait that I think is useful for
several reasons.

What it is:

- It implements the OutputChannel trait, that is, it provides
implementations for `!`, `send`, `forward`, and `receiver`, but not
more than that, hence its name. If you need linking, you have to use
a full-blown Actor.

- It implements pure event-based actors, that is, it only supports
`react` for receiving messages. The usual control-flow operators
(`loop`, `andThen`, etc.) are also available.

- Optionally, it ignores senders. If the `ignoreSender` fields is set
to true, `send` ignores its second argument. Moreover, the `!` and
`forward` method do not require Actor.self to be available.

This keeps both the overhead and the complexity of OutputChannelActor
very low. Also, I made Actor inherit from OutputChannelActor, which
nicely factors out the common functionality. I believe this also makes
the Actor trait easier to understand.

OutputChannelActors still store themselves in a ThreadLocal, so that the
operations in the Actor object are usable. However, those operations
that are not available in OutputChannelActor will throw a
ClassCastException if the ThreadLocal does not contain a full Actor. An
alternative could be to move the control-flow operations from the Actor
object into the OutputChannelActor trait. Then, those operations would
not require an OutputChannelActor in a ThreadLocal.

You can browse the code using the following changeset

http://lampsvn.epfl.ch/trac/scala/changeset/17871

Following David's suggestion, I also added a def exceptionHandler:
PartialFunction[Exception, Unit] (inherited by Actor), which is used to
handle exceptions that occur during message processing. It turns out
that it works very well together with control-flow combinators such as
`loop`: after the exception is handled, the actor simply continues
executing with the closure that the combinator has registered as
continuation. This means that in a `loop`, the actor simply continues
with the next iteration after handling the exception. (For Actors,
exceptions not handled by `exceptionHandler` are propagated using links
as usual.)

Suggestions and comments on that are very welcome.

Thanks,
Philipp

David Pollak wrote:
> Folks,
>
> Scala's Actors are a very important to the language, both as a means of
> showing off Scala's amazing syntactic flexibility and as an excellent
> was to do concurrency. I started using Actors in Lift more than 2 years
> ago and am a fan of the concept. Lately, however, the Actor code base
> has gotten buggy. More generally, with a couple of years of experience
> with Actors, the slavish adherence to the Erlang Actor model,
> particularly in light of no OTP-style libary, just doesn't make sense.
> I think it's time to take a long hard look at the Actor libraries, both
> in terms of implementation and in terms of design.
>
> After more than 6 months of chasing down Actor-related memory retention
> problems, I added the following patch to Lift today:
> object PointlessActorToWorkAroundBug extends Actor {
> import scala.collection.mutable.HashSet
> import java.lang.ref.Reference
>
> def act = loop {
> react {
> case "ActorBug" =>
> try {
> import scala.collection.mutable.HashSet
> import java.lang.ref.Reference
>
> val agc = ActorGC
> agc.synchronized {
> val rsf = agc.getClass.getDeclaredField("refSet")
> rsf.setAccessible(true)
> rsf.get(agc) match {
> case h: HashSet[Reference[Object]] =>
> Log.trace("[MEMDEBUG] got the actor refSet... length:
> "+h.size)
>
> val nullRefs = h.elements.filter(f => f.get eq null).toList
>
> nullRefs.foreach(r => h -= r)
>
> val nonNull = h.elements.filter(f => f.get ne null).toList
>
> Log.trace("[MEMDEBUG] got the actor refSet... non null
> elems: "+
> nonNull.size)
>
> nonNull.foreach{r =>
> val a = r.get.getClass.getDeclaredField("exiting")
> a.setAccessible(true)
> if (a.getBoolean(r.get)) {
> h -= r
> r.clear
> }
> }
>
> Log.trace("[MEMDEBUG] (again) got the actor refSet...
> length: "+h.size)
>
> case _ =>
> }
> }
> } catch {
> case e => Log.error("[MEMDEBUG] failure", e)
> }
> ping()
>
> case _ =>
> }
> }
>
> private def ctor() {
> this.start
> ping()
> }
>
> private def ping() {
> ActorPing.schedule(this, "ActorBug", 1 minute)
> }
>
> ctor()
> }
>
> This patch was necessary for a bunch of reasons:
>
> * The 2.7.x branch of Actors relies on WeakReference to clean up
> Actors. Well... in production that does not work. I've got
> megabytes of retained references pointed to by Weak References
> that are never garbage collected (or not GCed soon enough) and
> ultimately cause OOME.
> * Actors that are not linked do not mark themselves are exiting and
> thus hang out forever, creating another memory retention issue.
>
> I have spent the last 2 days in the bowels of the Actor code and it
> seems that the state transitions are undocumented and indeterminate.
> Sometimes state transitions are caused by method invocation. Sometimes
> state transition is caused by catching particular magic exceptions. The
> ActorGC object retains a lot of references, but it's unclear why the
> ActorGC exists other than to ref-count the number of Actors in the
> system (wouldn't a finalizer be better for this?)
>
> So, at a code level, it seems to me that it's time to do a major-league
> refactoring of the Actor code so that all the state transitions are
> documented and handled correctly and places where there is possible
> memory retention (any references from an object) are justified in the
> documentation.
>
> At the design level, the current Actor library slavishly copies the
> Erlang Actor design. The most glaring problem is the lack of exception
> handling in Actors, except via "linking". The design ignores the fact
> that exceptions are much more common in JVM libraries than they are in
> Erlang. All exceptions are unchecked in Scala, yet many Java libraries
> use exceptions to convey information (e.g., Integer.parseInt). This
> leads to exceptions being thrown, Actors getting shut down, and no
> indication to the user. It's a very common problem in Lift-related Actor
> code to have exceptions cause Actors to terminate without any warning to
> the developer. So, as a design criteria, I strongly recommend that the
> Actor libraries support both Erlang and JVM styles of Exception management.
>
> Scala's Actors have a bunch of nifty features including "!?" for send
> and wait for reply as well as "!!" to send and receive a Future. I'm
> sure that these features are useful in some applications. However, in
> order to implement these features, there's a lot of bloat in the Actor
> library. Each message includes return channels, etc. This machinery is
> unnecesary for "Simple Actors." I submit that these features should be
> separate and optional. If someone wants a "basic actor (on that only
> supports "!")", they should not be saddled with the heavy machinery
> required for the !?, !!, sender, etc. stuff.
>
> So, to make my design recommendations concrete:
>
> * All Actors should have a method that returns a
> PartialFunction[Throwable, Unit] and that partial function should
> be consulted in the event of an unhandled exception from a
> react/receive. If the handler handles the exception, the Actor
> should continue to run.
> * An ActorLight class that's got an inbox and supports the "!"
> method. Messages are fast-tracked into the ActorLight's inbox and
> the Actor is scheduled. Nothing more.
>
> I will file defects based on my review of the Actor code, but I wanted
> to let folks know about my thoughts. One final thought is that I am
> seriously considering writing an Actor library for Lift and dropping the
> use of Scala's Actor library. I'd prefer to build on top of Scala's
> libraries, but the Actor libraries really need to improve for 2.8.
>
> Thanks,
>
> David
>

Thu, 2009-05-28, 13:47

#27

odersky

Joined: 2008-07-29,

Re: OutputChannelActor trait

This looks prettty good to me. One suggestion: Can we rename
OutputChannelActor (long and unwieldy) to Reactor?

Cheers

Thu, 2009-05-28, 14:07

#28

Joshua.Suereth

Joined: 2008-09-02,

Re: OutputChannelActor trait

All,

I've also been toying with Unifying the Actors model with scala.swing's Reaction model. It appears like the Reactor trait would be a nice place to inject my code. I set up some sample test code on github. It consists of the following:

1) A SwingScheduler object that just defers all processing to the AWTThread (technically placing it in the EventQueue)
2) A trait that extends actor and defaults the scheduler to SwingEventQueueScheduler
3) A mixn trait (ActorReactorBridge) that Extends scala.swing.Reactor + SwingActor. It passes messages to reactions, optionally wrapping them if desired.

I'm curious with all the other refactorings of actors going on, if this functionality would be useful. I've merely done some prototyping, but I was hoping the experts could let me know if further effort in this direction would be useful.

Here's the "core" classes and the silly sample code.

Thanks! I look forward to the new actor changes! I haven't gone beyond actors basics at work yet, but I can tell you they definitely made my life easier, and hopefully will gain attention from my coworkers.

-Josh

On Thu, May 28, 2009 at 8:46 AM, martin odersky <martin.odersky@epfl.ch> wrote:

This looks prettty good to me. One suggestion: Can we rename
OutputChannelActor (long and unwieldy) to Reactor?

Cheers

Sun, 2009-05-31, 14:37

#29

Philipp Haller

Joined: 2009-01-13,

Re: OutputChannelActor trait

martin odersky wrote:
> This looks prettty good to me. One suggestion: Can we rename
> OutputChannelActor (long and unwieldy) to Reactor?

Good idea. I renamed it in r17933.

Cheers,
Philipp

> On Thu, May 28, 2009 at 1:00 PM, Philipp Haller wrote:
>> Hi all,
>>
>> Following up on the recent discussion on lighter-weight actors and
>> refactoring the Actor <: AbstractActor <: OutputChannel hierarchy, I
>> came up with a new OutputChannelActor trait that I think is useful for
>> several reasons.
>>
>> What it is:
>>
>> - It implements the OutputChannel trait, that is, it provides
>> implementations for `!`, `send`, `forward`, and `receiver`, but not
>> more than that, hence its name. If you need linking, you have to use
>> a full-blown Actor.
>>
>> - It implements pure event-based actors, that is, it only supports
>> `react` for receiving messages. The usual control-flow operators
>> (`loop`, `andThen`, etc.) are also available.
>>
>> - Optionally, it ignores senders. If the `ignoreSender` fields is set
>> to true, `send` ignores its second argument. Moreover, the `!` and
>> `forward` method do not require Actor.self to be available.
>>
>> This keeps both the overhead and the complexity of OutputChannelActor
>> very low. Also, I made Actor inherit from OutputChannelActor, which
>> nicely factors out the common functionality. I believe this also makes
>> the Actor trait easier to understand.
>>
>> OutputChannelActors still store themselves in a ThreadLocal, so that the
>> operations in the Actor object are usable. However, those operations
>> that are not available in OutputChannelActor will throw a
>> ClassCastException if the ThreadLocal does not contain a full Actor. An
>> alternative could be to move the control-flow operations from the Actor
>> object into the OutputChannelActor trait. Then, those operations would
>> not require an OutputChannelActor in a ThreadLocal.
>>
>> You can browse the code using the following changeset
>>
>> http://lampsvn.epfl.ch/trac/scala/changeset/17871
>>
>> Following David's suggestion, I also added a def exceptionHandler:
>> PartialFunction[Exception, Unit] (inherited by Actor), which is used to
>> handle exceptions that occur during message processing. It turns out
>> that it works very well together with control-flow combinators such as
>> `loop`: after the exception is handled, the actor simply continues
>> executing with the closure that the combinator has registered as
>> continuation. This means that in a `loop`, the actor simply continues
>> with the next iteration after handling the exception. (For Actors,
>> exceptions not handled by `exceptionHandler` are propagated using links
>> as usual.)
>>
>> Suggestions and comments on that are very welcome.
>>
>> Thanks,
>> Philipp
>>
>>
>> David Pollak wrote:
>>> Folks,
>>>
>>> Scala's Actors are a very important to the language, both as a means of
>>> showing off Scala's amazing syntactic flexibility and as an excellent
>>> was to do concurrency. I started using Actors in Lift more than 2 years
>>> ago and am a fan of the concept. Lately, however, the Actor code base
>>> has gotten buggy. More generally, with a couple of years of experience
>>> with Actors, the slavish adherence to the Erlang Actor model,
>>> particularly in light of no OTP-style libary, just doesn't make sense.
>>> I think it's time to take a long hard look at the Actor libraries, both
>>> in terms of implementation and in terms of design.
>>>
>>> After more than 6 months of chasing down Actor-related memory retention
>>> problems, I added the following patch to Lift today:
>>> object PointlessActorToWorkAroundBug extends Actor {
>>> import scala.collection.mutable.HashSet
>>> import java.lang.ref.Reference
>>>
>>> def act = loop {
>>> react {
>>> case "ActorBug" =>
>>> try {
>>> import scala.collection.mutable.HashSet
>>> import java.lang.ref.Reference
>>>
>>> val agc = ActorGC
>>> agc.synchronized {
>>> val rsf = agc.getClass.getDeclaredField("refSet")
>>> rsf.setAccessible(true)
>>> rsf.get(agc) match {
>>> case h: HashSet[Reference[Object]] =>
>>> Log.trace("[MEMDEBUG] got the actor refSet... length:
>>> "+h.size)
>>>
>>> val nullRefs = h.elements.filter(f => f.get eq null).toList
>>>
>>> nullRefs.foreach(r => h -= r)
>>>
>>> val nonNull = h.elements.filter(f => f.get ne null).toList
>>>
>>> Log.trace("[MEMDEBUG] got the actor refSet... non null
>>> elems: "+
>>> nonNull.size)
>>>
>>> nonNull.foreach{r =>
>>> val a = r.get.getClass.getDeclaredField("exiting")
>>> a.setAccessible(true)
>>> if (a.getBoolean(r.get)) {
>>> h -= r
>>> r.clear
>>> }
>>> }
>>>
>>> Log.trace("[MEMDEBUG] (again) got the actor refSet...
>>> length: "+h.size)
>>>
>>> case _ =>
>>> }
>>> }
>>> } catch {
>>> case e => Log.error("[MEMDEBUG] failure", e)
>>> }
>>> ping()
>>>
>>> case _ =>
>>> }
>>> }
>>>
>>> private def ctor() {
>>> this.start
>>> ping()
>>> }
>>>
>>> private def ping() {
>>> ActorPing.schedule(this, "ActorBug", 1 minute)
>>> }
>>>
>>> ctor()
>>> }
>>>
>>> This patch was necessary for a bunch of reasons:
>>>
>>> * The 2.7.x branch of Actors relies on WeakReference to clean up
>>> Actors. Well... in production that does not work. I've got
>>> megabytes of retained references pointed to by Weak References
>>> that are never garbage collected (or not GCed soon enough) and
>>> ultimately cause OOME.
>>> * Actors that are not linked do not mark themselves are exiting and
>>> thus hang out forever, creating another memory retention issue.
>>>
>>> I have spent the last 2 days in the bowels of the Actor code and it
>>> seems that the state transitions are undocumented and indeterminate.
>>> Sometimes state transitions are caused by method invocation. Sometimes
>>> state transition is caused by catching particular magic exceptions. The
>>> ActorGC object retains a lot of references, but it's unclear why the
>>> ActorGC exists other than to ref-count the number of Actors in the
>>> system (wouldn't a finalizer be better for this?)
>>>
>>> So, at a code level, it seems to me that it's time to do a major-league
>>> refactoring of the Actor code so that all the state transitions are
>>> documented and handled correctly and places where there is possible
>>> memory retention (any references from an object) are justified in the
>>> documentation.
>>>
>>> At the design level, the current Actor library slavishly copies the
>>> Erlang Actor design. The most glaring problem is the lack of exception
>>> handling in Actors, except via "linking". The design ignores the fact
>>> that exceptions are much more common in JVM libraries than they are in
>>> Erlang. All exceptions are unchecked in Scala, yet many Java libraries
>>> use exceptions to convey information (e.g., Integer.parseInt). This
>>> leads to exceptions being thrown, Actors getting shut down, and no
>>> indication to the user. It's a very common problem in Lift-related Actor
>>> code to have exceptions cause Actors to terminate without any warning to
>>> the developer. So, as a design criteria, I strongly recommend that the
>>> Actor libraries support both Erlang and JVM styles of Exception management.
>>>
>>> Scala's Actors have a bunch of nifty features including "!?" for send
>>> and wait for reply as well as "!!" to send and receive a Future. I'm
>>> sure that these features are useful in some applications. However, in
>>> order to implement these features, there's a lot of bloat in the Actor
>>> library. Each message includes return channels, etc. This machinery is
>>> unnecesary for "Simple Actors." I submit that these features should be
>>> separate and optional. If someone wants a "basic actor (on that only
>>> supports "!")", they should not be saddled with the heavy machinery
>>> required for the !?, !!, sender, etc. stuff.
>>>
>>> So, to make my design recommendations concrete:
>>>
>>> * All Actors should have a method that returns a
>>> PartialFunction[Throwable, Unit] and that partial function should
>>> be consulted in the event of an unhandled exception from a
>>> react/receive. If the handler handles the exception, the Actor
>>> should continue to run.
>>> * An ActorLight class that's got an inbox and supports the "!"
>>> method. Messages are fast-tracked into the ActorLight's inbox and
>>> the Actor is scheduled. Nothing more.
>>>
>>> I will file defects based on my review of the Actor code, but I wanted
>>> to let folks know about my thoughts. One final thought is that I am
>>> seriously considering writing an Actor library for Lift and dropping the
>>> use of Scala's Actor library. I'd prefer to build on top of Scala's
>>> libraries, but the Actor libraries really need to improve for 2.8.
>>>
>>> Thanks,
>>>
>>> David
>>>
>>> --
>>> Lift, the simply functional web framework http://liftweb.net
>>> Beginning Scala http://www.apress.com/book/view/1430219890
>>> Follow me: http://twitter.com/dpp
>>> Git some: http://github.com/dpp
>>

Scala Main Menu

Re: Time to take a long hard look at Actor design and implementation

Scala Quick Links

Featured News

User login