This page is no longer maintained — Please continue to the home page at www.scala-lang.org

hashes

10 replies
odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.

Hi Paul,

What are your current thoughts about hashing? Here are our options, as
I see them.

1. Add a hash method to Any. It would have to have a rare name, which
makes it unlikely that it is accidentally overridden by existing code.
scalaHashCode? hash_==?

2. Add a hash(x) method to Predef or the scala package. Advantage:
Simplest, optimizes well through overloading. Inconvenience: Have to
write hash(x) instead of x.hashCode.

3. Add a nullary hash method to StringAdd (which could be renamed to
reflect its new role). Advantage: Can write x.hash. Disadvantage:
Harder to optimize,
(but one can throw magic at it). There might still be hidden
conflicts with inherited hash fields or methods.

4. Combination of 3 and 4, maybe with the hash(x) method in another
object, which needs explicit import. Advantage: Can get speed back,
without requiring compiler support. Disadvantage: Two ways to do the
same thing.

Currently we have implemented (2), but everyone still uses Java's
hashCode instead of Predef.hash(x).

We need to settle and implement a scheme before the public beta goes out.

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: hashes

Hi Martin,

On Tue, 2009-11-17 at 11:24 +0100, martin odersky wrote:
> 1. Add a hash method to Any. It would have to have a rare name, which
> makes it unlikely that it is accidentally overridden by existing code.
> scalaHashCode? hash_==?

(Below, I use `hash` to refer to this method whose name we don't know
yet)

Yes. It's worth mentioning that accidental overrides would only happen
if the user had defined an abstract method `hash` somewhere and had
overridden it in one or more places. Otherwise, it would be a compiler
error (which is not good, but much better than an accidental override).

I still see `hash` as similar to ==, but perhaps I am missing something.
Is there a reason why == is final, but `hash` would not be?

Also, since the name has to be used by user code, it should not be too
ugly if this solution is taken. ;)

> 4. Combination of 3 and 4, maybe with the hash(x) method in another
> object, which needs explicit import. Advantage: Can get speed back,
> without requiring compiler support. Disadvantage: Two ways to do the
> same thing.

I assume you mean 2 and 3, here?

Best,
Ismael

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: hashes

On Tue, Nov 17, 2009 at 12:30 PM, Ismael Juma wrote:
> Hi Martin,
>
> On Tue, 2009-11-17 at 11:24 +0100, martin odersky wrote:
>>  1. Add a hash method to Any. It would have to have a rare name, which
>> makes it unlikely that it is accidentally overridden by existing code.
>> scalaHashCode? hash_==?
>
> (Below, I use `hash` to refer to this method whose name we don't know
> yet)
>
> Yes. It's worth mentioning that accidental overrides would only happen
> if the user had defined an abstract method `hash` somewhere and had
> overridden it in one or more places. Otherwise, it would be a compiler
> error (which is not good, but much better than an accidental override).
>
Yes, but it would be worrisome enough.

> I still see `hash` as similar to ==, but perhaps I am missing something.
> Is there a reason why == is final, but `hash` would not be?
>
No, hash could be final. But I hesitate to take over such a common
name from user programs. A final method in Any is effectively a
reserved word.

> Also, since the name has to be used by user code, it should not be too
> ugly if this solution is taken. ;)
>
>>   4. Combination of 3 and 4, maybe with the hash(x) method in another
>> object, which needs explicit import. Advantage: Can get speed back,
>> without requiring compiler support. Disadvantage: Two ways to do the
>> same thing.
>
> I assume you mean 2 and 3, here?
>
Yes, of course.

Thanks

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: hashes

On Tue, 2009-11-17 at 12:39 +0100, martin odersky wrote:
> > I still see `hash` as similar to ==, but perhaps I am missing something.
> > Is there a reason why == is final, but `hash` would not be?
> >
> No, hash could be final. But I hesitate to take over such a common
> name from user programs. A final method in Any is effectively a
> reserved word.

Definitely.

OK, for whatever is worth, my favoured option is a final method in Any
that is less common than hash, but still not too ugly as it has to be
used by user code. The main reason for this is that it seems much easier
to explain == and scalaHashCode (or whatever else) to users if both work
the same way.

I can see the advantages for the other options too though, so don't feel
too strongly either way.

extempore
Joined: 2008-12-17,
User offline. Last seen 35 weeks 3 days ago.
Re: hashes

On Tue, Nov 17, 2009 at 11:24:32AM +0100, martin odersky wrote:
> 1. Add a hash method to Any. It would have to have a rare name, which
> makes it unlikely that it is accidentally overridden by existing code.
> scalaHashCode? hash_==?
>
> 2. Add a hash(x) method to Predef or the scala package. Advantage:
> Simplest, optimizes well through overloading. Inconvenience: Have to
> write hash(x) instead of x.hashCode.

I think we should limit the options to these two. Adding methods via
implicit should be a last resort, the more so if we feel like we have to
magicalize it to make it fast. StringAdd causes enough unhappiness as
it is.

As usual my strategy on decisions like this is to imagine myself
explaining it to someone else. And like ijuma I prefer the explanation
where it's a method on Any like == is, because we achieve essentially
uniform treatment of the two methods which (should) work as a team. So
I think it's worth some inconvenience to achieve.

I don't see the optimization advantage in Predef. It's less magical in
Predef because we can overload at the language level, but I can put the
same type-specific logic in the Any_hash method. And from the user
standpoint, everything in Predef may as well be magic, it's not as if
Predef is "just a library", not yet anyway.

Your hash_== suggestion is also what I was thinking at first because
there's zero chance of conflicting with a legit java method, and
near-zero for scala. But semantically it's not quite right since ==
would tend to imply a 1-arity method which does a comparison, not a
0-arity method used in a comparison. And x.hash_== looks pretty silly
by itself.

fanf suggested on IRC we use ##. That's kind of awesome. It is $hash
after all, and offers a nice parallel to ==. Or maybe the redundantish
hash_# would do.

How often do people actually call hashCode, except in other hashCode
methods? People don't even have to know about this unless they're
writing their own hash tables or otherwise consuming hash codes, right?
From that perspective the name doesn't seem that important, nor does the
"oh no more punctuation" aspect. Of course that argument can be used to
indicate we should just put it in Predef.

Kevin Wright
Joined: 2009-06-09,
User offline. Last seen 49 weeks 3 days ago.
Re: Re: hashes
+1 just for use of "magicalize"I very much like the ## proposal as well, it has a beautiful symmetry

On Tue, Nov 17, 2009 at 2:44 PM, Paul Phillips <paulp@improving.org> wrote:
On Tue, Nov 17, 2009 at 11:24:32AM +0100, martin odersky wrote:
>  1. Add a hash method to Any. It would have to have a rare name, which
> makes it unlikely that it is accidentally overridden by existing code.
> scalaHashCode? hash_==?
>
>  2. Add a hash(x) method to Predef or the scala package. Advantage:
> Simplest, optimizes well through overloading. Inconvenience: Have to
> write hash(x) instead of x.hashCode.

I think we should limit the options to these two.  Adding methods via
implicit should be a last resort, the more so if we feel like we have to
magicalize it to make it fast.  StringAdd causes enough unhappiness as
it is.

As usual my strategy on decisions like this is to imagine myself
explaining it to someone else.  And like ijuma I prefer the explanation
where it's a method on Any like == is, because we achieve essentially
uniform treatment of the two methods which (should) work as a team.  So
I think it's worth some inconvenience to achieve.

I don't see the optimization advantage in Predef.  It's less magical in
Predef because we can overload at the language level, but I can put the
same type-specific logic in the Any_hash method.  And from the user
standpoint, everything in Predef may as well be magic, it's not as if
Predef is "just a library", not yet anyway.

Your hash_== suggestion is also what I was thinking at first because
there's zero chance of conflicting with a legit java method, and
near-zero for scala.  But semantically it's not quite right since ==
would tend to imply a 1-arity method which does a comparison, not a
0-arity method used in a comparison.  And x.hash_== looks pretty silly
by itself.

fanf suggested on IRC we use ##.  That's kind of awesome.  It is $hash
after all, and offers a nice parallel to ==.  Or maybe the redundantish
hash_# would do.

How often do people actually call hashCode, except in other hashCode
methods? People don't even have to know about this unless they're
writing their own hash tables or otherwise consuming hash codes, right?
From that perspective the name doesn't seem that important, nor does the
"oh no more punctuation" aspect.  Of course that argument can be used to
indicate we should just put it in Predef.

--
Paul Phillips      | Those who can make you believe absurdities
Apatheist          | can make you commit atrocities.
Empiricist         |     -- Voltaire
all hip pupils!    |----------* http://www.improving.org/paulp/ *----------

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Re: hashes

On Tue, Nov 17, 2009 at 3:51 PM, Kevin Wright
wrote:
> +1 just for use of "magicalize"
> I very much like the ## proposal as well, it has a beautiful symmetry
>
Yes, too cute to resist! I ran it by the people in the scala meeting
and there was no opposition. So I am fine with that one.

Jorge Ortiz
Joined: 2008-12-16,
User offline. Last seen 29 weeks 4 days ago.
Re: Re: hashes
Oh man, I can see it now:

  (0 /: xs) ( _ ## + _ )

It's like the Euler Identity for Scala...

--j

On Tue, Nov 17, 2009 at 7:57 AM, martin odersky <martin.odersky@epfl.ch> wrote:
On Tue, Nov 17, 2009 at 3:51 PM, Kevin Wright
<kev.lee.wright@googlemail.com> wrote:
> +1 just for use of "magicalize"
> I very much like the ## proposal as well, it has a beautiful symmetry
>
Yes, too cute to resist! I ran it by the people in the scala meeting
and there was no opposition. So I am fine with that one.

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Re: hashes
It won't really work, as "+" will be taken to be a parameter to "##". OTOH,   (0 /: xs)(_.## + _)   will work as intended. :-)

On Tue, Nov 17, 2009 at 4:43 PM, Jorge Ortiz <jorge.ortiz@gmail.com> wrote:
Oh man, I can see it now:

  (0 /: xs) ( _ ## + _ )

It's like the Euler Identity for Scala...

--j

On Tue, Nov 17, 2009 at 7:57 AM, martin odersky <martin.odersky@epfl.ch> wrote:
On Tue, Nov 17, 2009 at 3:51 PM, Kevin Wright
<kev.lee.wright@googlemail.com> wrote:
> +1 just for use of "magicalize"
> I very much like the ## proposal as well, it has a beautiful symmetry
>
Yes, too cute to resist! I ran it by the people in the scala meeting
and there was no opposition. So I am fine with that one.

 -- Martin




--
Daniel C. Sobral

Veni, vidi, veterni.
Jorge Ortiz
Joined: 2008-12-16,
User offline. Last seen 29 weeks 4 days ago.
Re: Re: hashes
Yeah, I realized after sending that a better version would be something like:

  (0 /: xs)(_ + _ ##)

(works for any Traversable[_]!)

To be a true Euler Idenity it would have to involve equality somehow though...

--j

On Tue, Nov 17, 2009 at 2:21 PM, Daniel Sobral <dcsobral@gmail.com> wrote:
It won't really work, as "+" will be taken to be a parameter to "##". OTOH,   (0 /: xs)(_.## + _)   will work as intended. :-)

On Tue, Nov 17, 2009 at 4:43 PM, Jorge Ortiz <jorge.ortiz@gmail.com> wrote:
Oh man, I can see it now:

  (0 /: xs) ( _ ## + _ )

It's like the Euler Identity for Scala...

--j

On Tue, Nov 17, 2009 at 7:57 AM, martin odersky <martin.odersky@epfl.ch> wrote:
On Tue, Nov 17, 2009 at 3:51 PM, Kevin Wright
<kev.lee.wright@googlemail.com> wrote:
> +1 just for use of "magicalize"
> I very much like the ## proposal as well, it has a beautiful symmetry
>
Yes, too cute to resist! I ran it by the people in the scala meeting
and there was no opposition. So I am fine with that one.

 -- Martin




--
Daniel C. Sobral

Veni, vidi, veterni.

Colin Bullock
Joined: 2009-01-23,
User offline. Last seen 42 years 45 weeks ago.
Re: Re: hashes


On Tue, Nov 17, 2009 at 5:03 PM, Jorge Ortiz <jorge.ortiz@gmail.com> wrote:
Yeah, I realized after sending that a better version would be something like:

  (0 /: xs)(_ + _ ##)

(works for any Traversable[_]!)

To be a true Euler Idenity it would have to involve equality somehow though...


(0 /: xs)(_ + _ ##) == 1

And it's even true for List(1).

- Colin

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland