This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Are Map/Set1..4 too clever?

9 replies

Mon, 2011-05-09, 14:02

Peter 2

Joined: 2011-02-16,

We are often playing around in REPL with brief examples and tend to
draw our conclusions from what we experience there. Yesterday, the
following intelligent behaviour of Map with respect to mutual objects
astonished me:

case class C(var s:String) // ugly but good to simulate a mutual class
val (c1, c2) = (C("A"), C("B"))
val m = Map(c1->1,c2->2)
c2.s = "X"
m contains c2 // true – what a magic lookup!

After some time I realized that the same was false when using
mutable.Map, instead. Then I realized that this positive magic was due
to the classes Map1 to Map4 in object immutable.Map. So Map looses its
intelligence and returns false as soon as its size grows to 5 or more.

Should not Map/Set1..4 be better deprived of this behavior by adding
hashCode checks to == checks? Thus we could avoid some user
misunderstanding…Or is that intentional?

Mon, 2011-05-09, 14:47

#1

Kevin Wright 2

Joined: 2010-05-30,

Re: Are Map/Set1..4 too clever?

Don't use mutable objects for hash keys.(at least, don't use them if the hash is based on the mutable part)
Break that cardinal rule and you're in for a whole world of pain. If you have to use mutable objects as keys, then you *must* be explicit about using a subclass of Map that guarantees not to use hashing, the default implementation doesn't make this guarantee.

I'm all for defensive coding, but don't feel that this issue is best resolved by slowing down equality checks in small maps that have been specifically optimised for speed. What we really need is effect tracking so that mutability can be encoded as part af a type, and the compiler can warn or throw an error when you break this contract.

On 9 May 2011 14:02, Peter Empen <peter.empen@arcor.de> wrote:

We are often playing around in REPL with brief examples and tend to
draw our conclusions from what we experience there. Yesterday, the
following intelligent behaviour of Map with respect to mutual objects
astonished me:

case class C(var s:String) // ugly but good to simulate a mutual class
val (c1, c2) = (C("A"), C("B"))
val m = Map(c1->1,c2->2)
c2.s = "X"
m contains c2 // true – what a magic lookup!

After some time I realized that the same was false when using
mutable.Map, instead. Then I realized that this positive magic was due
to the classes Map1 to Map4 in object immutable.Map. So Map looses its
intelligence and returns false as soon as its size grows to 5 or more.

Should not Map/Set1..4 be better deprived of this behavior by adding
hashCode checks to == checks? Thus we could avoid some user
misunderstanding…Or is that intentional?

--
Kevin Wright

gtalk / msn : kev.lee.wright@gmail.com kev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Mon, 2011-05-09, 14:57

#2

Peter 2

Joined: 2011-02-16,

Re: Are Map/Set1..4 too clever?

Hi Kevin,

I am not a user in this context. The default implementation of my
contribution scalax.collection.Graph uses mutable.Map internally to
store graph nodes (key part) along with their adjacents (value part).
Therefore I had to remember users to beware of mutual node types with
inproper implementation of hashCode / equals...

Mon, 2011-05-09, 15:07

#3

Kevin Wright 2

Joined: 2010-05-30,

Re: Re: Are Map/Set1..4 too clever?

That's your problem, there's no such thing as a correct implementation of hashCode when it's going in a bucket and might be mutated.
If you need mutability, then you *can't* use a HashMap here.

On 9 May 2011 14:53, Peter Empen <peter.empen@arcor.de> wrote:

Hi Kevin,

I am not a user in this context. The default implementation of my
contribution scalax.collection.Graph uses mutable.Map internally to
store graph nodes (key part) along with their adjacents (value part).
Therefore I had to remember users to beware of mutual node types with
inproper implementation of hashCode / equals...

--
Kevin Wright

gtalk / msn : kev.lee.wright@gmail.com kev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Mon, 2011-05-09, 15:17

#4

Peter 2

Joined: 2011-02-16,

Re: Are Map/Set1..4 too clever?

Maybe you're right: consistent behavior in case of ill-designed keys
should be less important than performance.

Mon, 2011-05-09, 15:27

#5

Kevin Wright 2

Joined: 2010-05-30,

Re: Re: Are Map/Set1..4 too clever?

It's not just about consistency, you're asking to intentionally force something to be broken that only happens to work in the first place through a lucky coincidence. This won't just slow it down, it'll add an (albeit small) maintenance tax to every future release of Scala hereafter.
I agree that there's potential to help programmers avoid this mistake, but think it's best achieved more globally through effect tracking and the type system, which would also make the cost of the check a one-time affair at compilation, instead of affecting every single key lookup in a small map.

On 9 May 2011 15:05, Peter Empen <peter.empen@arcor.de> wrote:

Maybe you're right: consistent behavior in case of ill-designed keys
should be less important than performance.

--
Kevin Wright

gtalk / msn : kev.lee.wright@gmail.com kev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Mon, 2011-05-09, 15:37

#6

Peter 2

Joined: 2011-02-16,

Re: Are Map/Set1..4 too clever?

I fully agree with you that its much more desirable to let the
compiler to discover such ill-design.
Though I must admit I didn't know this was possible. What do you mean
by "effect tracking"?

Mon, 2011-05-09, 15:47

#7

Kevin Wright 2

Joined: 2010-05-30,

Re: Re: Are Map/Set1..4 too clever?

It just refers to the ability of the compiler to keep track of which methods/functions are pure (no side effects, hence the "effect tracking") and which types are immutable. This can then be used to enforce contracts on generics and method signatures, such as "The type of keys in a hashmap must be immutable"
I imagine this would be done through marker interfaces, allowing definitions such as:
class HashMap[K <: Immutable, V] ...

On 9 May 2011 15:29, Peter Empen <peter.empen@arcor.de> wrote:

I fully agree with you that its much more desirable to let the
compiler to discover such ill-design.
Though I must admit I didn't know this was possible. What do you mean
by "effect tracking"?

--
Kevin Wright

gtalk / msn : kev.lee.wright@gmail.com kev.lee.wright@gmail.commail: kevin.wright@scalatechnology.com
vibe / skype: kev.lee.wrightquora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Mon, 2011-05-09, 16:17

#8

dcsobral

Joined: 2009-04-23,

Re: Are Map/Set1..4 too clever?

Kevin is concentrating on what could be done and what shouldn't be
done, so I'll explain what is happening and why it is bad.

On Mon, May 9, 2011 at 10:02, Peter Empen wrote:
> We are often playing around in REPL with brief examples and tend to
> draw our conclusions from what we experience there. Yesterday, the
> following intelligent behaviour of Map with respect to mutual objects
> astonished me:
>
> case class C(var s:String) // ugly but good to simulate a mutual class
> val (c1, c2) = (C("A"), C("B"))
> val m = Map(c1->1,c2->2)

Roughly speaking, at this point, m is formed by an Array[List[C,
Int]]. There are one or two lists in this array. At position c1.## %
array.size, there's a list with (c1, 1), and at position c2.## %
array.size, there's a list with (c2, 2). They may be the same list or
not.

> c2.s = "X"

This changes c2. Since m contains a reference to c2, not a copy of it,
then that's changed there to.

> m contains c2 // true – what a magic lookup!

And if you tried some other examples, it would yield false. You CANNOT
predict what this will return.

What happens here is this:

Compute c2.## % array.size, which may or may not be the same as it was
before. For small maps (small array size), the chance of coincidence
is relatively high. For huge maps, it is very small.

Then it will search on that list for a tuple whose first element is
equal to c2. If the previous result was the same, it will find it. If
it was not, it will not find it.

This is how hash tables work -- and how they work doesn't work in the
presence of mutable keys: you can neither guarantee it will find them,
nor guarantee it will not find them.

There are three things you can do:

1. Do not use hash maps.
2. Do not use mutable keys.
3. Make sure the mutable part of the keys are not used for equality or
hash code. If _either_ operation takes a mutable part into account,
won't work with hash maps.

>
> After some time I realized that the same was false when using
> mutable.Map, instead. Then I realized that this positive magic was due
> to the classes Map1 to Map4 in object immutable.Map. So Map looses its
> intelligence and returns false as soon as its size grows to 5 or more.

It doesn't become false once size grows to 5. It becomes undefined --
it may return either true or false.

> Should not Map/Set1..4 be better deprived of this behavior by adding
> hashCode checks to == checks? Thus we could avoid some user
> misunderstanding…Or is that intentional?

Any consistent behavior here would be at odds with larger maps
inconsistent behavior, so it would never be a gain.

But ignore that argument, and... check *what* hash code? Store initial
hash code with the key, and check whether the hash code changed
afterwards?

What behavior, exactly, do you expect? Do you expect "c2" to disappear
from the map? To stay in the map but return false on any check for it?
Would it appear when iterated? Even if the hash code was stored along
with the key and checked for, a change in the element could still
result in the same hash code.

There's nothing wrong with hash maps, you just shouldn't use mutable
elements as keys. What Kevin is saying is that the only fix is to
somehow be able to prevent mutable keys from being used, which is not
possible in the absence of an effect system.

Mon, 2011-05-09, 20:07

#9

Peter 2

Joined: 2011-02-16,

Re: Are Map/Set1..4 too clever?

today I seem to have a bad day: first I'm writing "mutual" instead of
"mutable"; second - even worth - I don't manage to write
understandably enough.

Clearly, I did know that mutable keys are a wonderful poison to hash
tables users. So the Graph User Guide states:
"Although there is no constraint on the node type, you should be
especially careful when designing mutable nodes: the hashCode returned
by nodes should not change over their lifetime even if nodes are
mutating. Otherwise, looking up mutated elements in a graph will fail
- at least in case of the HashMap-based default implementation."

@Kevin: As it is not necessary for the whole key type to be immutable,
but just for its fields involved in the hashCode calculation - do you
really think the compiler can ever track this kind of partial
immutability ? I think Paul has also told at other places that this
was not or at least not easaly possible.

@Daniel: I demand the use the 3. thing: "Make sure the mutable part of
the keys are not used for equality or
hash code." This is consistent with what Map and Set are doing.

With my post I just wanted to give collection designers a chance to
make thoughts about the necessity of retaining consistency between the
degenerated Map/Set1..4 and the HashTable implementations. That's all.
A rather minor issue. Luckily, I don't have any problem.

Peter

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland