This page is no longer maintained — Please continue to the home page at www.scala-lang.org

How to do an efficient update of an immutable map

30 replies
Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.

I have an immutable map which occasionally needs updating and replacing
with a new version. The update will add some new keys, change the
contents of some others and occasionally delete keys as well. I could
do this with ++ for the addition/updates and -- for the removals but
that would mean creating an intermediate immutable hash. Is there a way
of doing a set of additions, updates and removals all in one go?

Ken McDonald
Joined: 2011-02-13,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
I believe there should be operators for adding one map to another (in a rush right now so can't check, sorry). So you could create a new map, add your new elements to it which would be fast since it's small (in fact, you could even make it mutable), and then add the new map to the old map to get the final map. I'm assuming that the underlying implementations are efficient (which probably means they use transient mutable structures).
Cheers,Ken
Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

> I believe there should be operators for adding one map to another (in a
> rush right now so can't check, sorry). So you could create a new map, add
> your new elements to it which would be fast since it's small (in fact, you
> could even make it mutable), and then add the new map to the old map to get
> the final map. I'm assuming that the underlying implementations are
> efficient (which probably means they use transient mutable structures).

That's more or less what I was intending to to for the additions and
updates, but it won't cater for removals - that's the problem. Also
you don't have to use a map for the add/update you can use a list of
key/value pairs.

Ken McDonald
Joined: 2011-02-13,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
Oh, I see what you mean; you'll get an intermediate map between doing the additions and the removals. I've never seen a method for doing additions and deletions at the same time, and I imagine you've looked as well. I doubt there is a way to improve on that. But I don't see it being a problem--if the creation of the intermediate map causes a performance problem, then even the creation of a single new map per update is suspect, and you should probably be using mutable maps.
Generally speaking, I've found that premature optimization is at the heart of a lot of problems--I've done it many times myself, so I know :-)
Ken
Bernd Johannes
Joined: 2011-01-28,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
p, li { white-space: pre-wrap; }

Am Mittwoch, 21. Dezember 2011, 00:10:11 schrieb Alan Burlison:

> I have an immutable map which occasionally needs updating and replacing

> with a new version. The update will add some new keys, change the

> contents of some others and occasionally delete keys as well. I could

> do this with ++ for the addition/updates and -- for the removals but

> that would mean creating an intermediate immutable hash. Is there a way

> of doing a set of additions, updates and removals all in one go?


What I did in a similar situation was:

- gather the changes qualified in a mutable collection (addition/update versus deletions)

- optimize the changes (e.g. throw out additions/updates when a deletion follows, keep only the "latest" update)

- look at the size of the resulting changes

- when "small" - perform changes on the immutable collection directly

- when "big" - process everything in a mutable collection and transform into an immutable result


That was fast enough (for me) but it requires that you are able to gather the changes during your processing. If they have to be applied on the spot then this model isn't an option.

(In my situation it was a simulation over time so each immutable collection represented a point in time and the mutable buffers were calculation intermediates which got integrated into the t0 collection to yield the t0+1 collection).


Greetings

Bernd



H-star Development
Joined: 2010-04-14,
User offline. Last seen 2 years 26 weeks ago.
Re: How to do an efficient update of an immutable map

get your hands on the newbuilder.

-------- Original-Nachricht --------
> Datum: Tue, 20 Dec 2011 23:10:11 +0000
> Von: Alan Burlison
> An: scala-user@googlegroups.com
> Betreff: [scala-user] How to do an efficient update of an immutable map

> I have an immutable map which occasionally needs updating and replacing
> with a new version. The update will add some new keys, change the
> contents of some others and occasionally delete keys as well. I could
> do this with ++ for the addition/updates and -- for the removals but
> that would mean creating an intermediate immutable hash. Is there a way
> of doing a set of additions, updates and removals all in one go?
>

Rüdiger Klaehn
Joined: 2009-06-02,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
As far as I know a map builder just contains a map. Bulk operations on maps are not implemented. I guess since map uses a (very flat ) tree internally, you could do mapa ++ mapb by merging the trees instead of just adding all elements of mapb to mapa sequentially.
A data structure that supports bulk union is IntMap/LongMap in scala.collection.immutable. But of course you can only have ints or longs as keys, and it is not as memory efficient as a normal map since it is basically a binary trie instead of a ~32way trie like the new maps.

That said, Map is pretty efficient as it is. To the OP: Did you check if this is an actual performance problem?

On Wed, Dec 21, 2011 at 9:45 AM, Dennis Haupt <h-star@gmx.de> wrote:
get your hands on the newbuilder.

-------- Original-Nachricht --------
> Datum: Tue, 20 Dec 2011 23:10:11 +0000
> Von: Alan Burlison <alan.burlison@gmail.com>
> An: scala-user@googlegroups.com
> Betreff: [scala-user] How to do an efficient update of an immutable map

> I have an immutable map which occasionally needs updating and replacing
> with a new version.  The update will add some new keys, change the
> contents of some others and occasionally delete keys as well.  I could
> do this with ++ for the addition/updates and -- for the removals but
> that would mean creating an intermediate immutable hash.  Is there a way
> of doing a set of additions, updates and removals all in one go?
>
> --
> Alan Burlison
> --

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

On 21/12/2011 08:45, Dennis Haupt wrote:

> get your hands on the newbuilder.

What is that?

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map

On 21/12/2011 03:44, Ken McDonald wrote:

> Oh, I see what you mean; you'll get an intermediate map between doing the
> additions and the removals. I've never seen a method for doing additions
> and deletions at the same time, and I imagine you've looked as well.

Yes, I've looked and I thought I might be missing something obvious,
which is why I asked :-) Thanks for the confirmation. Deletions will
probably be rare, so I can just gather them up in a list and avoid
creating the intermediate hash if there aren't any deletions to do.

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

On 21/12/2011 09:01, Rüdiger Klaehn wrote:

> As far as I know a map builder just contains a map. Bulk operations on maps
> are not implemented. I guess since map uses a (very flat ) tree internally,
> you could do mapa ++ mapb by merging the trees instead of just adding all
> elements of mapb to mapa sequentially.

I've been blundering around trying to find the code that actually
implements ++ and -- and got pretty lost. From what I can tell it's in
effect a copy & append operation, is that correct? For large maps with
a low number of changes being added that seems rather inefficient.

> A data structure that supports bulk union is IntMap/LongMap in
> scala.collection.immutable. But of course you can only have ints or longs
> as keys, and it is not as memory efficient as a normal map since it is
> basically a binary trie instead of a ~32way trie like the new maps.
>
> That said, Map is pretty efficient as it is. To the OP: Did you check if
> this is an actual performance problem?

Not yet, and in practice I can probably get away with a slightly
inefficient implementation. However I'm trying to understand the
tradeoffs between using a mutable map + locking or using an immutable
map that gets replaced on update - the use case is multithreaded access,
mostly read-only.

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: How to do an efficient update of an immutable map

On Wed, Dec 21, 2011 at 09:53, Alan Burlison wrote:
> On 21/12/2011 09:01, Rüdiger Klaehn wrote:
>
>> As far as I know a map builder just contains a map. Bulk operations on
>> maps
>> are not implemented. I guess since map uses a (very flat ) tree
>> internally,
>> you could do mapa ++ mapb by merging the trees instead of just adding all
>> elements of mapb to mapa sequentially.
>
>
> I've been blundering around trying to find the code that actually implements
> ++ and -- and got pretty lost.  From what I can tell it's in effect a copy &
> append operation, is that correct?  For large maps with a low number of
> changes being added that seems rather inefficient.
>
>
>> A data structure that supports bulk union is IntMap/LongMap in
>> scala.collection.immutable. But of course you can only have ints or longs
>> as keys, and it is not as memory efficient as a normal map since it is
>> basically a binary trie instead of a ~32way trie like the new maps.
>>
>> That said, Map is pretty efficient as it is. To the OP: Did you check if
>> this is an actual performance problem?
>
>
> Not yet, and in practice I can probably get away with a slightly inefficient
> implementation.  However I'm trying to understand the tradeoffs between
> using a mutable map + locking or using an immutable map that gets replaced
> on update - the use case is multithreaded access, mostly read-only.

An immutable map doesn't get replaced wholesale. Most of the new map
uses the same data as the old map, with just a sprinkling of new data
structures adjusted for the key that was inserted or removed.

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

On 21/12/2011 12:17, Daniel Sobral wrote:

> An immutable map doesn't get replaced wholesale. Most of the new map
> uses the same data as the old map, with just a sprinkling of new data
> structures adjusted for the key that was inserted or removed.

Neat, thanks for the info. If I wanted to go get my head around the
implementation, where would be the best place to start looking?

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: How to do an efficient update of an immutable map

On Wed, Dec 21, 2011 at 10:32, Alan Burlison wrote:
> On 21/12/2011 12:17, Daniel Sobral wrote:
>
>> An immutable map doesn't get replaced wholesale. Most of the new map
>> uses the same data as the old map, with just a sprinkling of new data
>> structures adjusted for the key that was inserted or removed.
>
>
> Neat, thanks for the info.  If I wanted to go get my head around the
> implementation, where would be the best place to start looking?

From what I saw, it uses a TrieMap based on the hash (HashTrieMap),
plus some optimizations to provide good parallelism. This is a rather
complex structure, and I advise you to get a general grounding on
functional tries before tackling the implementation. Daniel Spiewak's
Extreme Cleverness presentation
(http://www.infoq.com/presentations/Functional-Data-Structures-in-Scala)
might get you started. It's very interesting anyway, and explains very
well the basic concepts of data structure reuse (persistent data
structures).

The whole implementation seems to be inside HashMap.scala, so you can
just look up the immutable HashMap class on Scaladoc, and then click
on the link to the source code.

Tony Morris 2
Joined: 2009-03-20,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map

On 12/21/2011 09:42 AM, Alan Burlison wrote:
>> I believe there should be operators for adding one map to another (in a
>> rush right now so can't check, sorry). So you could create a new map, add
>> your new elements to it which would be fast since it's small (in fact, you
>> could even make it mutable), and then add the new map to the old map to get
>> the final map. I'm assuming that the underlying implementations are
>> efficient (which probably means they use transient mutable structures).
> That's more or less what I was intending to to for the additions and
> updates, but it won't cater for removals - that's the problem. Also
> you don't have to use a map for the add/update you can use a list of
> key/value pairs.

You want a zipper. Unfortunately there isn't one in the standard Map
implementations. I hear there is one on the way in a third party
library, or you could write one yourself pretty easily. You want to
avoid mutability for efficiency reasons.

Tony Morris
Joined: 2008-12-19,
User offline. Last seen 30 weeks 4 days ago.
Re: How to do an efficient update of an immutable map

I recommend writing a (immutable) self-balancing binary tree map, with accompanying zipper to achieve your operations efficiently.

On Dec 21, 2011 10:34 PM, "Alan Burlison" <alan.burlison@gmail.com> wrote:
On 21/12/2011 12:17, Daniel Sobral wrote:

An immutable map doesn't get replaced wholesale. Most of the new map
uses the same data as the old map, with just a sprinkling of new data
structures adjusted for the key that was inserted or removed.

Neat, thanks for the info.  If I wanted to go get my head around the implementation, where would be the best place to start looking?

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

That's exactly what I was looking for, thanks very much Daniel :-)

Ken McDonald
Joined: 2011-02-13,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
Also, what is your case for using an immutable data structure? If it is simply to prevent unintended mutations, then you might do the following:
1) Use a mutable map, but that map is never passed out into the world; it is used only for the batch updates.2) Once a batch update has been done, create an immutable map from the mutable map, and use that for all the data access.
You could actually wrap a mutable map inside a simple class that would force you to use it in a fairly functional manner. Call the class IMap, it has 3 operations; add, delete, and return immutable map. The first two return a new IMap containing the old mutable map, and set a flag marking the original IMap as invalid; after that, any operation on the original IMap throws an error. Of course, this isn't functional, what it really is is an ad hoc way of enforcing a uniqueness constraint such as provided by Concurrent Clean (I have to blow my own horn here--that was my idea, the CC folks were nice enough to like it and implement it), so in this case you're doing at runtime what CC can do at compile time. This isn't ideal, but it's simple, and should sharply reduce imperative-type errors.
Of course, if you need different versions of the map active at the same time, this won't help.
Cheers,Ken
Rüdiger Klaehn
Joined: 2009-06-02,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
Sometimes the scala collections seem a bit unfinished. 
I looked at the code, and it seems that map does in fact support bulk operations. There is a function called merge in scala.collection.immutable.HashMap. But it is not being used when merging two maps with ++. 
Somebody went through all the trouble of implementing a bulk merge, but since 99.5% of all users of scala.collection.immutable.Map will use ++ to merge two maps, they will never benefit from it. 
Instead ++ just adds all elements of the second map sequentially to the first map. Now, that makes sense when the second argument of ++ is not actually a map, but if both arguments of ++ are maps the bulk merge function should be used!
Here is a microbenchmark showing the difference of using merge vs. using ++:
code:
package mapmerge
import scala.collection.immutable.HashMap import com.google.caliper.{Runner, Param, SimpleBenchmark}
class MapMergeTest extends SimpleBenchmark {
  @Param var size = 0
  var a: HashMap[Int, Int] = _
  var b: HashMap[Int, Int] = _
  override def setUp() {    val odd = (0 until size by 2).map(x => x -> x)
    val even = (1 until size by 2).map(x => x -> x)
    a = HashMap(even: _*)    b = HashMap(odd: _*)  }
  def timeMerge(reps: Int) = {
    var result = 0    var i = 0     while (i < reps) {      val c = a.merge(b)      result += c.size      i += 1    }    result  }
  def timePlusPlus(reps: Int) = {
    var result = 0    var i = 0    while (i < reps) {      val c = a ++ b      result += c.size      i += 1    }    result   }}
object MapMergeTest {  def main(args:Array[String]) {    Runner.main(classOf[MapMergeTest], args :_*)  }}
results:
java mapmerge.MapMergeTest -Dsize=10,100,1000,10000 0% Scenario{vm=java, trial=0, benchmark=Merge, size=10} 156,73 ns; σ=0,35 ns @ 3 trials13% Scenario{vm=java, trial=0, benchmark=PlusPlus, size=10} 2107,29 ns; σ=19,64 ns @ 5 trials 25% Scenario{vm=java, trial=0, benchmark=Merge, size=100} 3625,39 ns; σ=8,85 ns @ 3 trials38% Scenario{vm=java, trial=0, benchmark=PlusPlus, size=100} 14030,30 ns; σ=100,49 ns @ 3 trials50% Scenario{vm=java, trial=0, benchmark=Merge, size=1000} 29422,37 ns; σ=175,16 ns @ 3 trials 63% Scenario{vm=java, trial=0, benchmark=PlusPlus, size=1000} 156201,21 ns; σ=426,59 ns @ 3 trials75% Scenario{vm=java, trial=0, benchmark=Merge, size=10000} 423233,09 ns; σ=9805,64 ns @ 10 trials 88% Scenario{vm=java, trial=0, benchmark=PlusPlus, size=10000} 1974952,39 ns; σ=14975,68 ns @ 3 trials
 size benchmark      ns linear runtime   10     Merge     157 =   10  PlusPlus    2107 =   100     Merge    3625 =  100  PlusPlus   14030 = 1000     Merge   29422 = 1000  PlusPlus  156201 ==10000     Merge  423233 ======10000  PlusPlus 1974952 ==============================
vm: javatrial: 0
Process finished with exit code 0
On Wed, Dec 21, 2011 at 10:01 AM, Rüdiger Klaehn <rklaehn@googlemail.com> wrote:
As far as I know a map builder just contains a map. Bulk operations on maps are not implemented. I guess since map uses a (very flat ) tree internally, you could do mapa ++ mapb by merging the trees instead of just adding all elements of mapb to mapa sequentially.
A data structure that supports bulk union is IntMap/LongMap in scala.collection.immutable. But of course you can only have ints or longs as keys, and it is not as memory efficient as a normal map since it is basically a binary trie instead of a ~32way trie like the new maps.

That said, Map is pretty efficient as it is. To the OP: Did you check if this is an actual performance problem?

On Wed, Dec 21, 2011 at 9:45 AM, Dennis Haupt <h-star@gmx.de> wrote:
get your hands on the newbuilder.

-------- Original-Nachricht --------
> Datum: Tue, 20 Dec 2011 23:10:11 +0000
> Von: Alan Burlison <alan.burlison@gmail.com>
> An: scala-user@googlegroups.com
> Betreff: [scala-user] How to do an efficient update of an immutable map

> I have an immutable map which occasionally needs updating and replacing
> with a new version.  The update will add some new keys, change the
> contents of some others and occasionally delete keys as well.  I could
> do this with ++ for the addition/updates and -- for the removals but
> that would mean creating an intermediate immutable hash.  Is there a way
> of doing a set of additions, updates and removals all in one go?
>
> --
> Alan Burlison
> --


Tony Morris 2
Joined: 2009-03-20,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
To be clear, I recommend this exercise for gaining an introduction to functional programming and associated data structures -- to "be the best place to start looking to get your head around the implementation."

Happy to start it off if you like.


On 12/21/2011 11:43 PM, Tony Morris wrote:
CAJf6Usj5oEutFq14YH1_DBRNhZVrhDgD++-bD9SiqT7rNcRTxQ [at] mail [dot] gmail [dot] com" type="cite">

I recommend writing a (immutable) self-balancing binary tree map, with accompanying zipper to achieve your operations efficiently.

On Dec 21, 2011 10:34 PM, "Alan Burlison" <alan [dot] burlison [at] gmail [dot] com" rel="nofollow">alan.burlison@gmail.com> wrote:
On 21/12/2011 12:17, Daniel Sobral wrote:

An immutable map doesn't get replaced wholesale. Most of the new map
uses the same data as the old map, with just a sprinkling of new data
structures adjusted for the key that was inserted or removed.

Neat, thanks for the info.  If I wanted to go get my head around the implementation, where would be the best place to start looking?

--
Alan Burlison
--


-- 
Tony Morris
http://tmorris.net/

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

On 22/12/2011 00:14, Rüdiger Klaehn wrote:

> Here is a microbenchmark showing the difference of using merge vs. using ++:

Wow, that's a huge difference even with low numbers of additions.
Thanks for the info, and for a useful example of the SimpleBenchmark
stuff as well :-)

Perhaps a bug should be logged against this?

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map

On 21/12/2011 22:07, Ken McDonald wrote:

> Also, what is your case for using an immutable data structure? If it is
> simply to prevent unintended mutations, then you might do the following:

It's for multithreaded access. It's OK for threads to have a slightly
out-of-date version, but updates should be atomic. From reading the
Scala literature that would seem to be the classic case where you'd use
an immutable map, hence the question.

> 1) Use a mutable map, but that map is never passed out into the world; it
> is used only for the batch updates.
> 2) Once a batch update has been done, create an immutable map from the
> mutable map, and use that for all the data access.

The map will only have a low percentage of entries modified when it is
modified, so converting between immutable and mutable maps each time
seemed a little clumsy.

> You could actually wrap a mutable map inside a simple class that would
> force you to use it in a fairly functional manner. Call the class IMap, it
> has 3 operations; add, delete, and return immutable map. The first two
> return a new IMap containing the old mutable map, and set a flag marking
> the original IMap as invalid; after that, any operation on the original
> IMap throws an error. Of course, this isn't functional, what it really is
> is an ad hoc way of enforcing a uniqueness constraint such as provided by
> Concurrent Clean (I have to blow my own horn here--that was my idea, the CC
> folks were nice enough to like it and implement it), so in this case you're
> doing at runtime what CC can do at compile time. This isn't ideal, but it's
> simple, and should sharply reduce imperative-type errors.
>
> Of course, if you need different versions of the map active at the same
> time, this won't help.

Threads that have grabbed the old version can carry on using it, so yes
potentially there may be multiple versions active.

Rüdiger Klaehn
Joined: 2009-06-02,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map
On Thu, Dec 22, 2011 at 11:21 AM, Alan Burlison <alan.burlison@gmail.com> wrote:
On 21/12/2011 22:07, Ken McDonald wrote:

Also, what is your case for using an immutable data structure? If it is
simply to prevent unintended mutations, then you might do the following:

It's for multithreaded access.  It's OK for threads to have a slightly out-of-date version, but updates should be atomic.  From reading the Scala literature that would seem to be the classic case where you'd use an immutable map, hence the question.
Yes, this is the classic case where using an immutable map makes sense, and dressing up a mutable data structure as immutable won't help. 

1) Use a mutable map, but that map is never passed out into the world; it
is used only for the batch updates.
2) Once a batch update has been done, create an immutable map from the
mutable map, and use that for all the data access.

The map will only have a low percentage of entries modified when it is modified, so converting between immutable and mutable maps each time seemed a little clumsy.
If the updates are a small number of key->value tuples, just use ++. It will be pretty fast. If the updates are already a s.c.i.HashMap, use merge. That will be even faster. It is probably not worth it to convert a Seq of Pairs into a map just so you can use merge. If in doubt, just benchmark it.
The downside of using merge is that you have to explicitly declare your map as scala.collection.immutable.HashMap, because for whatever reason ++ does not use the fast bulk merge when both sides are HashMaps. 
nilskp
Joined: 2009-01-30,
User offline. Last seen 1 year 27 weeks ago.
Re: Re: How to do an efficient update of an immutable map
On Thu, Dec 22, 2011 at 4:21 AM, Alan Burlison <alan.burlison@gmail.com> wrote:
On 21/12/2011 22:07, Ken McDonald wrote:

Also, what is your case for using an immutable data structure? If it is
simply to prevent unintended mutations, then you might do the following:

It's for multithreaded access.  It's OK for threads to have a slightly out-of-date version, but updates should be atomic.  From reading the Scala literature that would seem to be the classic case where you'd use an immutable map, hence the question.

If you have multithreaded mutation, remember to use an AtomicReference to hold the map, and do the necessary steps (CAS looping etc.).
Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map

On 22/12/2011 14:01, Nils Kilden-Pedersen wrote:

> If you have multithreaded mutation, remember to use an AtomicReference to
> hold the map, and do the necessary steps (CAS looping etc.).

The mutation will only be done by one thread, but I'll still probably
need to lock around the reads/writes of the hash reference.

nilskp
Joined: 2009-01-30,
User offline. Last seen 1 year 27 weeks ago.
Re: Re: How to do an efficient update of an immutable map
On Thu, Dec 22, 2011 at 8:38 AM, Alan Burlison <alan.burlison@gmail.com> wrote:
On 22/12/2011 14:01, Nils Kilden-Pedersen wrote:

If you have multithreaded mutation, remember to use an AtomicReference to
hold the map, and do the necessary steps (CAS looping etc.).

The mutation will only be done by one thread, but I'll still probably need to lock around the reads/writes of the hash reference.

Why?
Viktor Klang
Joined: 2008-12-17,
User offline. Last seen 1 year 27 weeks ago.
Re: Re: How to do an efficient update of an immutable map

Then you only need to use @volatile

On Dec 22, 2011 3:39 PM, "Alan Burlison" <alan.burlison@gmail.com> wrote:
On 22/12/2011 14:01, Nils Kilden-Pedersen wrote:

If you have multithreaded mutation, remember to use an AtomicReference to
hold the map, and do the necessary steps (CAS looping etc.).

The mutation will only be done by one thread, but I'll still probably need to lock around the reads/writes of the hash reference.

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map

On 22/12/2011 14:56, Nils Kilden-Pedersen wrote:

>> The mutation will only be done by one thread, but I'll still probably need
>> to lock around the reads/writes of the hash reference.

Caching.

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: How to do an efficient update of an immutable map

On 22/12/2011 14:57, √iktor Ҡlang wrote:

> Then you only need to use @volatile

Yeah, volatile would do. I keep forgetting that they fixed that in Java5.

Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

On 22/12/2011 00:14, Rüdiger Klaehn wrote:

> Sometimes the scala collections seem a bit unfinished.
>
> I looked at the code, and it seems that map does in fact support bulk
> operations. There is a function called merge in
> scala.collection.immutable.HashMap. But it is not being used when merging
> two maps with ++.
>
> Somebody went through all the trouble of implementing a bulk merge, but
> since 99.5% of all users of scala.collection.immutable.Map will use ++ to
> merge two maps, they will never benefit from it.
>
> Instead ++ just adds all elements of the second map sequentially to the
> first map. Now, that makes sense when the second argument of ++ is not
> actually a map, but if both arguments of ++ are maps the bulk merge
> function should be used!

In practice merge doesn't seem to be all that useful. It's only defined
on immutable.HashMap. If you build up the changes you want to merge in
a mutable.HashHap there's no obvious way to convert that to a
immutable.HashMap so you can pass it to merge - the closest method is
toMap but that gives you back a Map rather than a HashMap.

I have to echo the comment above about the collections feeling
unfinished - what I'm trying to do seems like an obvious use case, but
apparently there's no way to do it.

Rüdiger Klaehn
Joined: 2009-06-02,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map
On Sat, Dec 24, 2011 at 12:22 PM, Alan Burlison <alan.burlison@gmail.com> wrote:
On 22/12/2011 00:14, Rüdiger Klaehn wrote:

Sometimes the scala collections seem a bit unfinished.

I looked at the code, and it seems that map does in fact support bulk
operations. There is a function called merge in
scala.collection.immutable.HashMap. But it is not being used when merging
two maps with ++.

Somebody went through all the trouble of implementing a bulk merge, but
since 99.5% of all users of scala.collection.immutable.Map will use ++ to
merge two maps, they will never benefit from it.

Instead ++ just adds all elements of the second map sequentially to the
first map. Now, that makes sense when the second argument of ++ is not
actually a map, but if both arguments of ++ are maps the bulk merge
function should be used!

In practice merge doesn't seem to be all that useful.  It's only defined on immutable.HashMap.  If you build up the changes you want to merge in a mutable.HashHap there's no obvious way to convert that to a immutable.HashMap so you can pass it to merge - the closest method is toMap but that gives you back a Map rather than a HashMap.

This is just a hunch, but I think that building an immutable hash map containing changes just so you can use merge instead of adding them one by one is probably not worth it. The time you save using merge instead of ++ will instead be spent building the changes map. The only situation where this would make sense is if you want the time spent in the actual merge to be as short as possible because you have to hold a lock during the merge, but not during the building of the changes map.  
I have to echo the comment above about the collections feeling unfinished - what I'm trying to do seems like an obvious use case, but apparently there's no way to do it.

Well, calling them unfinished is probably a bit harsh. They are still vastly better (more consistent, faster when using them naively) than any other collection library I am aware of. Especially the immutable data structures like Vector and Map are amazingly clever. Even if you just use them in a naive way they are very fast for immutable data structures.
They just need some more polish. In particular: you should always get the best possible algorithm if you have the same type on both sides of an operator. So e.g. x union y, when both x and y are a Set, should use a bulk merge or at least check check the sizes of both arguments and add the smaller to the larger. a ++ b should use merge if both a and b are s.c.i.HashSet. The problem is that there is such a vast number methods and traits being mixed in that it can be quite confusing for an outsider to try to add such special cases.
Alan Burlison
Joined: 2011-08-26,
User offline. Last seen 42 years 45 weeks ago.
Re: How to do an efficient update of an immutable map

On 24 December 2011 12:10, Rüdiger Klaehn wrote:

> This is just a hunch, but I think that building an immutable hash map
> containing changes just so you can use merge instead of adding them one by
> one is probably not worth it. The time you save using merge instead of ++
> will instead be spent building the changes map. The only situation where
> this would make sense is if you want the time spent in the actual merge to
> be as short as possible because you have to hold a lock during the merge,
> but not during the building of the changes map.

I think you are almost certainly right. And I can build the new map
without locking and then just switch in the reference whilst holding
the lock.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland