
feeling sort of out of sorts

17 replies
extempore
Joined: 2008-12-17

I couldn't take the sorting situation for another minute so I went in
and did some stuff in r20771. I hope it's to everyone's liking -- it
looks like an easy winner to me -- but let me know if not.

One thing that process reminded me of is that removeDuplicates has to be
among the worst named methods ever. I didn't know about it for the
first many moons I used scala because the letters "uniq" never appear in
it, but that aside, all that verbosity in the name serves only to
mislead. The word "remove" pretty much throughout the rest of the
library refers to mutating a collection in place, the opposite of what
removeDuplicates actually does.

I boldly propose we deprecate that method and rename it "unique".
Everyone knows what unique means. Then in combination with r20771 you
can say

xs.unique.sorted

instead of the current

xs.removeDuplicates.sortWith(_ < _)

Home run.

David Pollak
Joined: 2008-12-16
Re: feeling sort of out of sorts


On Tue, Feb 2, 2010 at 11:44 AM, Paul Phillips <paulp@improving.org> wrote:
> ...
>
> I boldly propose we deprecate that method and rename it "unique".
> Everyone knows what unique means.  Then in combination with r20771 you
> can say
>
>  xs.unique.sorted

Looks almost Ruby-like in its cleanliness.

> instead of the current
>
>  xs.removeDuplicates.sortWith(_ < _)
>
> Home run.



--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
Randall R Schulz
Joined: 2008-12-16
Re: feeling sort of out of sorts

On Tuesday February 2 2010, Paul Phillips wrote:
> ...
>
> I boldly propose we deprecate that method and rename it "unique".
> Everyone knows what unique means. Then in combination with r20771
> you can say

Wouldn't "distinct" be more accurate?

> ...

Randall Schulz

extempore
Joined: 2008-12-17
Re: feeling sort of out of sorts

On Tue, Feb 02, 2010 at 01:02:56PM -0800, Randall R Schulz wrote:
> Wouldn't "distinct" be more accurate?

From the standpoint of what words mean, yes. From the standpoint of
what things are usually called, no. I would gladly take either over
removeDuplicates.

odersky
Joined: 2008-07-29
Re: feeling sort of out of sorts

On Tue, Feb 2, 2010 at 10:17 PM, Paul Phillips wrote:
> On Tue, Feb 02, 2010 at 01:02:56PM -0800, Randall R Schulz wrote:
>> Wouldn't "distinct" be more accurate?
>
> From the standpoint of what words mean, yes.  From the standpoint of
> what things are usually called, no.  I would gladly take either over
> removeDuplicates.
>
I think in SQL the analogous function is called distinct. I agree I
would take either over removeDuplicates.

Cheers

Seth Tisue
Joined: 2008-12-16
Re: feeling sort of out of sorts

>>>>> "Paul" == Paul Phillips writes:

Paul> I boldly propose we deprecate that method and rename it "unique".
Paul> Everyone knows what unique means. Then in combination with
Paul> r20771 you can say
Paul> xs.unique.sorted

Ahhhhh... therein lies the nub.

odersky
Joined: 2008-07-29
Re: feeling sort of out of sorts

On Tue, Feb 2, 2010 at 10:17 PM, Paul Phillips wrote:
> On Tue, Feb 02, 2010 at 01:02:56PM -0800, Randall R Schulz wrote:
>> Wouldn't "distinct" be more accurate?
>
> From the standpoint of what words mean, yes.  From the standpoint of
> what things are usually called, no.  I would gladly take either over
> removeDuplicates.
>
After having signed off on the code review I am getting doubts. We
agree that distinct has the closer connotation. SQL uses it. Who uses
unique?

Cheers

extempore
Joined: 2008-12-17
Re: feeling sort of out of sorts

On Wed, Feb 03, 2010 at 07:22:02PM +0100, martin odersky wrote:
> After having signed off on the code review I am getting doubts. We
> agree that distinct has the closer connotation. SQL uses it. Who uses
> unique?

I think most of them spell it without all the fancy vowels (and I'd have
proposed uniq if I thought it would fly) but unix, perl, ruby, I don't
know that many languages.

This page isn't that helpful because it mostly implements it without
informing us whether a built-in way exists. However I'd say it's quite
telling that almost every sample implementation uses the term "unique"
in so doing. I realize one could conjure up any number of explanations
for that, but I still think that "unique" more than any other word is
what people think of first.

http://rosettacode.org/wiki/Create_a_Sequence_of_unique_elements

Grand, Mark D.
Joined: 2009-12-24
RE: feeling sort of out of sorts

SQL uses both distinct and unique.

Distinct is used in queries to remove duplicate records from results.

Unique is used in column definitions to specify that the values in a column or combination of columns must be unique and that the database should reject operations that would introduce duplicates.


maciek.makowski
Joined: 2009-12-05
Re: feeling sort of out of sorts

> After having signed off on the code review I am getting doubts. We
> agree that distinct has the closer connotation. SQL uses it. Who uses
> unique?

The Unix tool for removing duplicates from a (sorted) list of lines is
called 'uniq'. I'd vote for 'distinct' in Scala myself.

Regards,
Maciek

LouisB
Joined: 2009-11-25
Re: feeling sort of out of sorts
What's in a name...

but unique / distinct both work quite well.

for what it's worth, a similar method in .NET LINQ is called distinct

http://msdn.microsoft.com/en-us/library/cc716801.aspx

I suppose what you're actually getting is a Set, but "set" has a specific meaning in the API. As a method name, distinct would be clear and in line with both SQL and .NET LINQ.

cheers, Louis


--
Web: www.chillipower.com
Blog: http://louisbotterill.blogspot.com/
Twitter: http://twitter.com/BinaryJunkie
LinkedIn: http://uk.linkedin.com/pub/louis-botterill/10/3b2/265


dcsobral
Joined: 2009-04-23
Re: feeling sort of out of sorts
Out of curiosity:

Distinct (C#)
nub (Haskell)
uniq (IDL)
removeDuplicates & deleteDuplicates (Lisp)
remdup (Logo)
prune (Factor)
DeleteDuplicates (Mathematica)
cull (Nial)
uniq (Perl)
array_unique (PHP)
sort-object -unique (PowerShell)
unique (R)
unique (Rebol)
unique (Raven)
uniq (Ruby)
removeDuplicates (Scala)
remove-duplicates (Scheme)
lsort -unique (Tcl)
uniq (Unix command)

The use of "unique" as a variable name in the other languages was really prevalent, though the question itself uses that word. Oz defined a function called "nub", but of exceptions worth noting, that's it.

The first time *I* searched for it, I tried "distinct" first, "uniq" second. I think it was paulp who pointed me at removeDuplicates.

Paul's objection against removeDuplicates resonates with me. It describes an action, which one usually associates with methods that produce mutation. On the other hand, "distinct" or "unique" describe properties, which sound better for operations that don't mutate. Furthermore, one can think of "isDistinct" or "isUnique", which is definitely not the case for "removeDuplicates".

--
Daniel C. Sobral

I travel to the future all the time.
dbrooksnz
Joined: 2010-01-12
Re: feeling sort of out of sorts

+1 for distinct.

I like the SQL differentiation between distinct and unique -- "distinct"
to specify what an action does; "unique" to specify a structural
requirement.

Dave

On 4/02/10 7:52 AM, Grand, Mark wrote:
> SQL uses both distinct and unique.
>
> Distinct is used in queries to remove duplicate records from results.
>
> Unique is used in column definitions to specify that the values in a column or combination of columns must be unique and that the database should reject operations that would introduce duplicates.

nilskp
Joined: 2009-01-30
Re: feeling sort of out of sorts
On Tue, Feb 2, 2010 at 1:44 PM, Paul Phillips <paulp@improving.org> wrote:
>  xs.unique.sorted
>
> instead of the current
>
>  xs.removeDuplicates.sortWith(_ < _)

Interestingly, a seq with a low ratio of duplicates would be both faster and more memory efficient to sort first, then filter.
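
A minimal sketch of the sort-first idea in Scala, for illustration only. The helper name `sortedDistinct` is hypothetical and not part of the standard library. It assumes that elements the `Ordering` treats as equivalent are also equal under `==`, and it makes no performance claim of its own: nothing was measured here.

```scala
// Hypothetical helper: sort first, then drop duplicates in a single pass.
// After sorting, equal elements sit next to each other, so comparing each
// element with the last one kept is enough and no hash set is needed.
def sortedDistinct[A](xs: Seq[A])(implicit ord: Ordering[A]): Vector[A] =
  xs.sorted.foldLeft(Vector.empty[A]) { (acc, x) =>
    if (acc.nonEmpty && ord.equiv(acc.last, x)) acc // same as previous: skip
    else acc :+ x                                   // new value: keep
  }
```

Under those assumptions, `sortedDistinct(xs)` should contain the same elements in the same order as removing duplicates first and sorting afterwards. The difference is only in how the duplicates are found.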
Mohamed Bana 2
Joined: 2009-10-21
Re: feeling sort of out of sorts
Please use distinct as LINQ does.

—Mohamed



Viktor Klang
Joined: 2008-12-17
Re: feeling sort of out of sorts
What about "unsimilar" or the catchy "oneOfEach" ;)

On Wed, Feb 3, 2010 at 9:54 PM, Mohamed Bana <mohamed@bana.org.uk> wrote:
> Please use distinct as LINQ does.



odersky
Joined: 2008-07-29
Re: feeling sort of out of sorts

I am more and more convinced it should be "distinct". We want
for-comprehensions to be easily mappable to SQL and LINQ. So having
distinct instead of unique removes one small hurdle for that.

Besides, unique seems to vary in meaning a lot (SQL: require uniqueness;
Unix: remove identical consecutive lines; Ruby: same as
removeDuplicates). Distinct seems to be more consistent in its
meaning.
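
To make the Unix/Ruby difference concrete, here is a small Scala sketch. The helper `uniqAdjacent` is hypothetical, written only to mimic what `uniq` does to consecutive lines; `distinct` is the standard library method under the name this thread settles on.

```scala
// Hypothetical helper mimicking Unix `uniq`: only runs of equal
// neighbours are collapsed, so a value can reappear later in the result.
def uniqAdjacent[A](xs: List[A]): List[A] =
  xs.foldRight(List.empty[A]) {
    case (x, acc @ (y :: _)) if x == y => acc      // same as next kept element
    case (x, acc)                      => x :: acc
  }

val lines = List("a", "a", "b", "a", "b", "b")

val adjacentOnly  = uniqAdjacent(lines) // runs collapsed, repeats may remain
val distinctLines = lines.distinct      // every value kept exactly once
```

On unsorted input the two results differ, which is why `uniq` is normally run on sorted data.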

Cheers

extempore
Joined: 2008-12-17
Re: feeling sort of out of sorts

I'm an easy sell on this one. Distinct it is.
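
For reference, a short sketch of how the agreed name reads in practice, assuming a Scala version whose collections provide `distinct` and `sorted` (both are in the current standard library). The value names are made up for the example.

```scala
val xs = List(3, 1, 2, 3, 1)

// distinct returns a new collection and keeps the first occurrence of each element
val deduped = xs.distinct

// the short form proposed at the top of the thread, with the agreed name
val concise = xs.distinct.sorted

// the more verbose spelling with an explicit comparison
val explicit = xs.distinct.sortWith(_ < _)
```

Neither call mutates `xs`, which is the property the name "removeDuplicates" was said to obscure.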

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland