- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
equality: the walls are closing in
Sun, 2009-06-21, 15:20
It's amazing this one was not the first equality issue I found. I
didn't see it sooner because I made the mistake of using 0 as my common
value among types, and they all hash 0 to 0.
We have a really fundamental problem with equality. The decisions for
scala to a) unify primitives and boxed types under AnyVal and b)
continue to use the java machinery for boxing and equals/hashCode, taken
in combination, have created a situation where consistency even among
the simple boxed types is impossible.
Here is a small java program:
public class A {
public static void main(String[] args) {
int x1 = 1;
float x2 = 1.0f;
Integer o1 = Integer.valueOf(x1);
Float o2 = Float.valueOf(x2);
System.out.println(x1 == x2);
System.out.println(o1.equals(o2));
System.out.println(o1.hashCode() + " != " + o2.hashCode());
}
}
This prints:
true
false
1 != 1065353216
In java this is internally consistent because they accept that
primitives and boxed types each have their own behavior. There is no
way to translate this into scala without sacrificing either "1 == 1.0"
or "equal objects have equal hashCodes."
Even floats and doubles have different hashcodes:
scala> (1.0f.hashCode, 1.0d.hashCode)
res2: (Int, Int) = (1065353216,1072693248)
Do you want to live in a world where 1.0f != 1.0d? Me neither, but
something has to give!
Here is what can be done consistently - I think.
Byte/Short/Int/Long/BigInt all use the boxed value as the hashCode. So
all the types in this group can be made equal to one another.
Float and Double each use their own hashCode, and neither can be equal
to anything not of its own type. That means indeed 1.0f != 1.0d.
BigDecimal also uses its own hashCode, and the scala wrapper reuses it,
so the java and scala BigDecimals can equal one another, but cannot be
equal to anything else.
Character uses itself as if it were an Int, so it COULD be grouped in
with the first five types and they could all equal one another.
However, given that we're already segregating all these types, I think
that is a mistake, and a Char should only claim equality with other
Chars, unless it is explicitly .toInt-ed.
SUMMARY:
Something has to give! We need a conscious decision on this question:
What are the relative priorities of "equal objects must have equal
hashCodes" and "primitive values should be equal even when boxed in
different types" ? One has to lose.
I do not think "equal objects must have equal hashCodes" can be
sacrificed under any conditions -- if that requirement is too onerous we
should create a new equality method rather than continuing to try to
shoehorn ourselves into java's.
Just for fun, can you predict what's in this map at the end? I mean with
current scala 2.7.5, before I started trying to address anything. And
then, how many of those values can you get back out?
import scala.collection.mutable.HashMap
val map = new HashMap[Any,String]
map(1.toByte) = "byte"
map(1.toShort) = "short"
map(1.toInt) = "int"
map(1.toLong) = "long"
map(1.toFloat) = "float"
map(1.toDouble) = "double"
map(BigInt(1)) = "scala.BigInt"
map(BigInt(1).bigInteger) = "java.lang.BigInteger"
... wait, take a guess before you look at the answer ...
Map(1.0 -> float, 1 -> java.lang.BigInteger,
1 -> scala.BigInt, 1.0 -> double)
Hmmm, can't seem to get at some of those:
scala> map(BigInt(1).bigInteger)
res23: String = java.lang.BigInteger
scala> map(BigInt(1))
res24: String = java.lang.BigInteger
Oh wait, I can get the BigInt out:
scala> map(1.toByte)
res25: String = scala.BigInt
Sun, 2009-06-21, 17:47
#2
Re: equality: the walls are closing in
2009/6/21 martin odersky :
> No, this does does not work, because 'A' == 65 is true (in Java as
> well as in Scala)
> I blieve Char has to be treated as an integer type; it's our only choice.
It's very non ideal behaviour, independently of the hash code problem.
It means == is not transitive because of the unsigned nature of chars.
>> Something has to give! We need a conscious decision on this question:
>> What are the relative priorities of "equal objects must have equal
>> hashCodes" and "primitive values should be equal even when boxed in
>> different types" ? One has to lose.
>>
>> I do not think "equal objects must have equal hashCodes" can be
>> sacrificed under any conditions -- if that requirement is too onerous we
>> should create a new equality method rather than continuing to try to
>> shoehorn ourselves into java's.
>>
> But I think 1 != 1.0d is equally impossible. Think how many numeric
> programs would break!
this also makes == non transitive due to precision issues (or at least
1.0f does, and 1L == 1.0d does. I'm not sure offhand about ints and
doubles) - large integers can't be precisely represented as a float,
so you end up with cases where n.toFloat == (n + 1).toFloat
In general equaity between fractional types of different precision is
a very bad idea.
> I believe we need to investigate the idea to separate == from equals.
> I am less convinced we should have a third (or, rather, fourth, given
> eq) equality method, that gets messy. So, there could be a set of
> overloaded == methods. For numeric types we'd have
>
> Int == Int
> Int == Float
> Int == Double
> Float == Int
> Float == Float
> Float = Double
> Double == Int
> ...
>
> you get the idea. There'd also be an overloaded universal ==in class Any:
>
> def ==(other: Any): Boolean = equals(other)
>
> Then we rely on overloading resolution to give us the right ==. This
> means that, indeed,
> storing Int 1 in a hashtable and then looking for 1.0 will yield
> nothing. But at least we now have a good explanation why :-): The
> equals/hashcode contract holds only for universal equals, not for the
> overloaded variants.
>
> What do you think abiut that?
It's better than the current behaviour in that it guarantees
consistency in the mainline case, which is something we currently
don't have at all. Having this sort of overload where you can have
compatible types with different behaviour is a bit dodgy to my mind,
but I can live with it. :-)
Sun, 2009-06-21, 17:57
#3
Re: equality: the walls are closing in
On Sun, Jun 21, 2009 at 06:25:36PM +0200, martin odersky wrote:
> Or else we could rewrite the hashcode for Scala BigDecimal.
FYI I'm just covering strategies for the Big* classes for completeness,
I also am unconcerned about these - mostly because people don't have
well defined expectations for what happens when they compare an obvious
object like BigInt with some object of another class.
> Int == Int
> Int == Float
> ...
> def ==(other: Any): Boolean = equals(other)
If we rely on overloading resolution, "x == y" and "x == (y: Any)" will
give different answers. In general there's no way to coexist with an
==(Any) method and get consistent behavior from overloading. I guess
what we need are multimethods - they dispatch on the argument's runtime
identity rather than its static type, right?
So I'll just take a day to deploy multimethods and we'll come back to
this tomorrow, ha ha.
Sun, 2009-06-21, 18:47
#4
Re: equality: the walls are closing in
On Sun, Jun 21, 2009 at 6:46 PM, Paul Phillips wrote:
> On Sun, Jun 21, 2009 at 06:25:36PM +0200, martin odersky wrote:
>> Or else we could rewrite the hashcode for Scala BigDecimal.
>
> FYI I'm just covering strategies for the Big* classes for completeness,
> I also am unconcerned about these - mostly because people don't have
> well defined expectations for what happens when they compare an obvious
> object like BigInt with some object of another class.
>
>> Int == Int
>> Int == Float
>> ...
>> def ==(other: Any): Boolean = equals(other)
>
> If we rely on overloading resolution, "x == y" and "x == (y: Any)" will
> give different answers.
Yes. I am not much concerned about that. If we spec egality as an
overloaded function that will be a natural consequence.
> In general there's no way to coexist with an
> ==(Any) method and get consistent behavior from overloading.
Yes, and that's the point! We want to have inconsistent behavior from
overloading. There needs to be an inconsistency somewhere to have both
1 == 1.0f, and 1.hashCode != 1.0f.hashCode. If we have the
inconsistency in overloading at least that's something that's familiar
for Java and Scala programmers.
> I guess
> what we need are multimethods - they dispatch on the argument's runtime
> identity rather than its static type, right?
>
I fear they would be too consistent, so they could not support the
same behavior :-)
Cheers
Sun, 2009-06-21, 19:27
#5
Re: equality: the walls are closing in
Okay, maybe this is just crazy enough to work. I had earlier ruled out
overloading-based approaches because I figured the inconsistency of the
cure was worse than the disease, but as you say "there needs to be an
inconsistency somewhere" and right now this appears most plausible.
Off to implement,
Mon, 2009-06-22, 02:27
#6
Re: equality: the walls are closing in
On Sun, Jun 21, 2009 at 4:20 PM, Paul Phillips wrote:
>> Do you want to live in a world where 1.0f != 1.0d? Me neither, but
>> something has to give!
On Sun, Jun 21, 2009 at 6:25 PM, martin odersky wrote:
> But I think 1 != 1.0d is equally impossible. Think how many numeric
> programs would break!
I would say those programs are broken already. Why do you want
equality to work for floats, it's just a source of very nasty bugs.
I vote for implementing equality on floats and doubles with a nice
UnsupportedOperationException.
BR,
John
Mon, 2009-06-22, 09:47
#7
Re: equality: the walls are closing in
On Mon, Jun 22, 2009 at 3:21 AM, John Nilsson wrote:
> On Sun, Jun 21, 2009 at 4:20 PM, Paul Phillips wrote:
>>> Do you want to live in a world where 1.0f != 1.0d? Me neither, but
>>> something has to give!
>
> On Sun, Jun 21, 2009 at 6:25 PM, martin odersky wrote:
>> But I think 1 != 1.0d is equally impossible. Think how many numeric
>> programs would break!
>
> I would say those programs are broken already. Why do you want
> equality to work for floats, it's just a source of very nasty bugs.
>
> I vote for implementing equality on floats and doubles with a nice
> UnsupportedOperationException.
>
Can't do that. We cannot break random Java brograms that are ported to Scala.
Thye need to behave the same. It's true that in a lot of cases
comparing float equality makes no sense, but we are not the judges of
that.
Cheers
On Sun, Jun 21, 2009 at 4:20 PM, Paul Phillips wrote:
> It's amazing this one was not the first equality issue I found. I
> didn't see it sooner because I made the mistake of using 0 as my common
> value among types, and they all hash 0 to 0.
>
> We have a really fundamental problem with equality. The decisions for
> scala to a) unify primitives and boxed types under AnyVal and b)
> continue to use the java machinery for boxing and equals/hashCode, taken
> in combination, have created a situation where consistency even among
> the simple boxed types is impossible.
>
> Here is a small java program:
>
> public class A {
> public static void main(String[] args) {
> int x1 = 1;
> float x2 = 1.0f;
>
> Integer o1 = Integer.valueOf(x1);
> Float o2 = Float.valueOf(x2);
>
> System.out.println(x1 == x2);
> System.out.println(o1.equals(o2));
> System.out.println(o1.hashCode() + " != " + o2.hashCode());
> }
> }
>
> This prints:
>
> true
> false
> 1 != 1065353216
>
> In java this is internally consistent because they accept that
> primitives and boxed types each have their own behavior. There is no
> way to translate this into scala without sacrificing either "1 == 1.0"
> or "equal objects have equal hashCodes."
>
> Even floats and doubles have different hashcodes:
>
> scala> (1.0f.hashCode, 1.0d.hashCode)
> res2: (Int, Int) = (1065353216,1072693248)
>
> Do you want to live in a world where 1.0f != 1.0d? Me neither, but
> something has to give!
>
> Here is what can be done consistently - I think.
>
> Byte/Short/Int/Long/BigInt all use the boxed value as the hashCode. So
> all the types in this group can be made equal to one another.
>
> Float and Double each use their own hashCode, and neither can be equal
> to anything not of its own type. That means indeed 1.0f != 1.0d.
>
> BigDecimal also uses its own hashCode, and the scala wrapper reuses it,
> so the java and scala BigDecimals can equal one another, but cannot be
> equal to anything else.
Or else we could rewrite the hashcode for Scala BigDecimal. I think
that's better. As I wrote before, I think it's not important whether
comparisons between Scala's and Java's BigInteger/BigDecimal types
yield true or not.
> Character uses itself as if it were an Int, so it COULD be grouped in
> with the first five types and they could all equal one another.
> However, given that we're already segregating all these types, I think
> that is a mistake, and a Char should only claim equality with other
> Chars, unless it is explicitly .toInt-ed.
No, this does does not work, because 'A' == 65 is true (in Java as
well as in Scala)
I blieve Char has to be treated as an integer type; it's our only choice.
> Something has to give! We need a conscious decision on this question:
> What are the relative priorities of "equal objects must have equal
> hashCodes" and "primitive values should be equal even when boxed in
> different types" ? One has to lose.
>
> I do not think "equal objects must have equal hashCodes" can be
> sacrificed under any conditions -- if that requirement is too onerous we
> should create a new equality method rather than continuing to try to
> shoehorn ourselves into java's.
>
But I think 1 != 1.0d is equally impossible. Think how many numeric
programs would break!
I believe we need to investigate the idea to separate == from equals.
I am less convinced we should have a third (or, rather, fourth, given
eq) equality method, that gets messy. So, there could be a set of
overloaded == methods. For numeric types we'd have
Int == Int
Int == Float
Int == Double
Float == Int
Float == Float
Float = Double
Double == Int
...
you get the idea. There'd also be an overloaded universal ==in class Any:
def ==(other: Any): Boolean = equals(other)
Then we rely on overloading resolution to give us the right ==. This
means that, indeed,
storing Int 1 in a hashtable and then looking for 1.0 will yield
nothing. But at least we now have a good explanation why :-): The
equals/hashcode contract holds only for universal equals, not for the
overloaded variants.
What do you think abiut that?
Cheers