New collections library: methods that take or return only iterators

5 replies

Fri, 2009-05-15, 12:49

dubochet

Joined: 2008-06-30,

Hello.

I would like to raise another design point with the new collections
library. This problem already existed in the old library, but I think
it is time to change it. It has to do with when to use iterABLEs (and
the whole subtree of collection classes), and when to use iterATORs.

As an example, I tripped yesterday on the "keys" and "values" methods
of maps, which return iterATORs. In my code, I had a method that
printed any iterABLE as a list. Only, when I tried printing the values
of a map, it didn't work because they were represented as an iterATOR.
Of course, it is easy to cast an iterATOR to an iterABLE (for example,
using toSequence), but it isn't always practical to think about when
to cast.
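
A minimal sketch of the situation described above, written against a recent Scala (2.13 or later) rather than the 2009 library. The names asListString and ages are made up for illustration, and toList stands in for the toSequence conversion mentioned in the text.

```scala
object PrintDemo {
  // Hypothetical helper: renders any Iterable as a bracketed list.
  def asListString[A](xs: Iterable[A]): String = xs.mkString("[", ", ", "]")

  val ages: Map[String, Int] = Map("ann" -> 31, "bob" -> 27)

  // An Iterator is not an Iterable, so this line would not type-check:
  //   asListString(ages.valuesIterator)

  // Converting first works, but the caller has to remember to do it.
  val viaConversion: String = asListString(ages.valuesIterator.toList)

  // In current Scala, `values` itself returns an Iterable, so no conversion is needed.
  val direct: String = asListString(ages.values)
}
```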

A similar problem arises for methods that only take iterators as
parameters, such as for example the "hasAll" method of "Sorted".

I would like to propose the following design convention in the new
collections library:

For methods that return iterATORs:
- They are called "elements" if they return the same data as the
collection they are called upon.
- They are called "xxxElements" for whatever xxx transformation is
operated on the collection (reverse, key or values of a map, etc.).
- For every "xxxElements" method, there is a "xxx" method that does
the same transformation but returns a proper iterABLE.

For methods that take iterATORs as parameter:
- They are always overloaded: one version takes the iterATOR, one the
corresponding iterABLE.

It isn't clear to me whether all methods that take a collection should
be overloaded for iterATORs. In the case of methods taking multiple
collection parameters, it is probably saner to have an all-iterABLE
and an all-iterATOR version, instead of all possible combinations
thereof.
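
To make the convention concrete, here is a rough sketch of what a collection following it might look like. Bag is a hypothetical class invented for this example, and its hasAll is only loosely modelled on the method named above; none of this is the actual library code.

```scala
// Hypothetical collection following the proposed naming convention.
final class Bag[A](items: Vector[A]) {
  // "elements": an iterator over the same data as the collection.
  def elements: Iterator[A] = items.iterator

  // "xxxElements": an iterator over a transformation of the collection.
  def reverseElements: Iterator[A] = items.reverseIterator

  // "xxx": the same transformation, returned as a proper Iterable.
  def reverse: Iterable[A] = items.reverse

  // A method taking an iterator is paired with an Iterable overload.
  def hasAll(xs: Iterator[A]): Boolean = xs.forall(x => items.contains(x))
  def hasAll(xs: Iterable[A]): Boolean = hasAll(xs.iterator)
}
```

The cost of the convention is visible even here: every transformation and every iterator-taking method appears twice.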

What do you think?

Cheers,
Gilles.

Mon, 2009-05-18, 10:28

DRMacIver

Joined: 2008-09-02,

Re: New collections library: methods that take or return only

2009/5/15 Gilles Dubochet :
> Hello.
>
> I would like to raise another design point with the new collections library.
> This problem already existed in the old library, but I think it is time to
> change it. It has to do with when to use iterABLEs (and the whole subtree of
> collection classes), and when to use iterATORs.
>
> As an example, I tripped yesterday on the "keys" and "values" methods of
> maps, which return iterATORs. In my code, I had a method that printed any

Thanks for reminding me of this. I complained about it a while ago and
meant to try to see it get fixed in the new design, but totally forgot
to check. :-)

> iterABLE as a list. Only, when I tried printing the values of a map, it
> didn't work because they were represented as an iterATOR. Of course, it is
> easy to cast an iterATOR to an iterABLE (for example, using toSequence), but
> it isn't always practical to think about when to cast.
>
> A similar problem arises for methods that only take iterators as parameters,
> such as for example the "hasAll" method of "Sorted".
>
> I would like to propose the following design convention in the new
> collections library:
>
> For methods that return iterATORs:
> - They are called "elements" if they return the same data as the collection
> they are called upon.
> - They are called "xxxElements" for whatever xxx transformation is operated
> on the collection (reverse, key or values of a map, etc.).
> - For every "xxxElements" method, there is a "xxx" method that does the
> same transformation but returns a proper iterABLE.
>
> For methods that take iterATORs as parameter:
> - always are overloaded: one version takes the iterATOR, one the
> corresponding iterABLE.
>
> It isn't clear to me whether all methods that take a collection should be
> overloaded for iterATORs. In the case of methods taking multiple collection
> parameters, it is probably saner to have an all-iterABLE and an all-iterATOR
> version, instead of all possible combinations thereof.
>
> What do you think?

The goal here is extremely sound, and thanks for raising it, but I
think the suggested design needs work. It forces convention where what
you should have is abstraction.

Suppose you have a

def xxx

and

def xxxElements

Why is this better than simply having

def xxx

and getting the iterator with xxx.elements ?

Suppose you have to overload everything that takes an Iterable to take
an Iterator. Then clearly there is a generic need to be able to use an
iterator where an iterable is expected. Iterator supports all the
functionality it needs to be iterable (it only lacks repeatability,
which I think it is not alone in amongst reasonably iterable things).
So why should Iterator[T] not extend Iterator[T]?

One of the problems with the old collections API is that it forced a
lot of "always do this" rules - like always overriding methods to
return a more specific type - and it's really important to try to
avoid having similar in the new design.

Mon, 2009-05-18, 10:28

dubochet

Joined: 2008-06-30,

Re: New collections library: methods that take or return only

>> I would like to raise another design point with the new collections library.
>> This problem already existed in the old library, but I think it is time to
>> change it. It has to do with when to use iterABLEs (and the whole subtree of
>> collection classes), and when to use iterATORs.
>>
>> As an example, I tripped yesterday on the "keys" and "values" methods of
>> maps, which return iterATORs. In my code, I had a method that printed any
>
> Thanks for reminding me of this. I complained about it a while ago and
> meant to try to see it get fixed in the new design, but totally forgot
> to check. :-)
>
>> iterABLE as a list. Only, when I tried printing the values of a map, it
>> didn't work because they were represented as an iterATOR. Of course, it is
>> easy to cast an iterATOR to an iterABLE (for example, using toSequence), but
>> it isn't always practical to think about when to cast.
>>
>> A similar problem arises for methods that only take iterators as parameters,
>> such as for example the "hasAll" method of "Sorted".
>>
>> I would like to propose the following design convention in the new
>> collections library:
>>
>> For methods that return iterATORs:
>> - They are called "elements" if they return the same data as the collection
>> they are called upon.
>> - They are called "xxxElements" for whatever xxx transformation is operated
>> on the collection (reverse, key or values of a map, etc.).
>> - For every "xxxElements" method, there is a "xxx" method that does the
>> same transformation but returns a proper iterABLE.
>>
>> For methods that take iterATORs as parameter:
>> - always are overloaded: one version takes the iterATOR, one the
>> corresponding iterABLE.
>>
>> It isn't clear to me whether all methods that take a collection should be
>> overloaded for iterATORs. In the case of methods taking multiple collection
>> parameters, it is probably saner to have an all-iterABLE and an all-iterATOR
>> version, instead of all possible combinations thereof.
>>
>> What do you think?
>
> The goal here is extremely sound, and thanks for raising it, but I
> think the suggested design needs work. It forces convention where what
> you should have is abstraction.
>
> Suppose you have a
>
> def xxx
>
> and
>
> def xxxElements
>
> Why is this better than simply having
>
> def xxx
>
> and getting the iterator with xxx.elements ?

I agree: from a design perspective, the second solution is better. I
just wonder if it can be implemented in an efficient way. For example,
given an array, you can trivially implement a very efficient
"reverseElements" (or "reverseIterator", as Martin proposed). On the
other hand, if you have a simple forward-only iterator on the same
array, an efficient implementation of its reverse is impossible. I
don't say that an efficient implementation of your proposal is
impossible, but I think it needs to be addressed.
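
The asymmetry can be sketched roughly as follows, using a recent Scala. reverseIterator is the standard-library method on arrays; reversed is a hypothetical helper written only to show what a forward-only iterator forces on you.

```scala
object ReverseCost {
  val arr: Array[Int] = Array(1, 2, 3, 4)

  // With random access, a reverse iterator only has to count indices down.
  def fromArray: Iterator[Int] = arr.reverseIterator

  // Hypothetical helper: a forward-only iterator has to be drained into a
  // buffer before the first reversed element can be produced, so it costs
  // time and memory proportional to the number of elements up front.
  def reversed[A](it: Iterator[A]): Iterator[A] = {
    var buffer = List.empty[A]
    while (it.hasNext) buffer = it.next() :: buffer
    buffer.iterator
  }
}
```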

> Suppose you have to overload everything that takes an Iterable to take
> an Iterator. Then clearly there is a generic need to be able to use an
> iterator where an iterable is expected. Iterator supports all the
> functionality it needs to be iterable (it only lacks repeatability,
> which I think it is not alone in amongst reasonably iterable things).
> So why should Iterator[T] not extend Iterator[T]?

You probably mean "extend Iterable[T]". I wondered about that too. I
didn't mention it as I suspect that such a fundamental change might
come too late to be introduced in the new collection library. Still,
I'd be interested to know what the rationale behind it was. The one
reason that comes to my mind is that Matthias (the original author of
the collection library) wanted to provide a type-distinction between
iterables, which remain usable after use, from iterators, which are
consumed when used.

> One of the problems with the old collections API is that it forced a
> lot of "always do this" rules - like always overriding methods to
> return a more specific type - and it's really important to try to
> avoid having similar in the new design.

Ah, the old disagreement between software engineers and language
theorists. I, for one, think that no matter how much language theory
is applied to the design, we won't be able to do without some "design
patterns". I'd be more than happy to be proved wrong, though.

Cheers,
Gilles.

Mon, 2009-05-18, 10:37

DRMacIver

Joined: 2008-09-02,

Re: New collections library: methods that take or return only

2009/5/15 David MacIver :
> So why should Iterator[T] not extend Iterator[T]?

Sorry. "So why should Iterator[T] not extend Iterable[T]?"

Mon, 2009-05-18, 10:47

DRMacIver

Joined: 2008-09-02,

Re: New collections library: methods that take or return only

2009/5/15 Gilles Dubochet :
>> and getting the iterator with xxx.elements ?
>
> I agree: from a design perspective, the second solution is better. I just
> wonder if it can be implemented in an efficient way. For example, given an
> array, you can trivially implement a very efficient "reverseElements" (or
> "reverseIterator", as Martin proposed). On the other hand, if you have a
> simple forward-only iterator on the same array, an efficient implementation
> of its reverse is impossible. I don't say that an efficient implementation
> of your proposal is impossible, but I think it needs to be addressed.

I think it's doable as long as the methods used are views. i.e. if
reverse returned a new array it would clearly be less efficient, but
if you have a reverse view then reverse.elements can simply return the
reverseElements method you would have defined.
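
A rough sketch of that idea, assuming a recent Scala where the iterator method is called iterator rather than elements. ReverseView is a hypothetical class written for this example, not the library's own view type.

```scala
// Hypothetical reverse view over an array: it keeps a reference to the
// array and copies nothing.
final class ReverseView[A](data: Array[A]) extends Iterable[A] {
  // This is the iterator that "reverseElements" would have returned:
  // it walks the underlying array backwards by index.
  def iterator: Iterator[A] = new Iterator[A] {
    private var i = data.length - 1
    def hasNext: Boolean = i >= 0
    def next(): A = {
      if (i < 0) throw new NoSuchElementException("next on empty iterator")
      val x = data(i)
      i -= 1
      x
    }
  }
}
```

Because the view only holds a reference, a later write to the array shows up in any iterator obtained afterwards, and the view can be traversed as often as needed.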

>> Suppose you have to overload everything that takes an Iterable to take
>> an Iterator. Then clearly there is a generic need to be able to use an
>> iterator where an iterable is expected. Iterator supports all the
>> functionality it needs to be iterable (it only lacks repeatability,
>> which I think it is not alone in amongst reasonably iterable things).
>> So why should Iterator[T] not extend Iterator[T]?
>
> You probably mean "extend Iterable[T]". I wondered about that too. I didn't
> mention it as I suspect that such a fundamental change might come too late
> to be introduced in the new collection library. Still, I'd be interested to

The new collection library is a pretty big change. I think we can
afford a few more if they improve its usability. :-)

> know what the rationale behind it was. The one reason that comes to my mind
> is that Matthias (the original author of the collection library) wanted to
> provide a type-distinction between iterables, which remain usable after use,
> from iterators, which are consumed when used.

I agree some sort of type distinction is useful. Maybe a
ReusableIterable or something? But a type distinction which introduces
massive code duplication is not a good solution.
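
For comparison, here is how a single method can accept both kinds of argument in Scala 2.13 and later, where IterableOnce is a common supertype of Iterator and Iterable. This is a sketch against that later library, not the 2009 design under discussion; asListString is a hypothetical helper.

```scala
object AcceptBoth {
  // One definition serves both iterators and iterables; no overload needed.
  def asListString[A](xs: IterableOnce[A]): String =
    xs.iterator.mkString("[", ", ", "]")

  val fromIterable: String = asListString(List(1, 2, 3))
  val fromIterator: String = asListString(Iterator(1, 2, 3))

  // The type says nothing about repeatability: an iterator is used up
  // by the first traversal, so a second pass sees no elements.
  private val once = Iterator(1, 2, 3)
  val firstPass: String = asListString(once)
  val secondPass: String = asListString(once)
}
```

The second pass illustrates why some type distinction for reusable collections is still worth having.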

>> One of the problems with the old collections API is that it forced a
>> lot of "always do this" rules - like always overriding methods to
>> return a more specific type - and it's really important to try to
>> avoid having similar in the new design.
>
> Ah, the old disagreement between software engineers and language theorists.

I'm not sure which label you're intending to apply to me, but I think
the fact that I had no background in programming before I started a
career in the software industry probably tentatively classifies me as
more of a software engineer. ;-)

> I, for one, think that no matter how much language theory is applied to the
> design, we won't be able to do without some "design patterns". I'd be more
> than happy to be proved wrong, though.

There's a difference between a design pattern and having to repeat
yourself all over the place. Overloading everything to take an
Iterator is very definitely in the latter category.

Mon, 2009-05-18, 10:57

odersky

Joined: 2008-07-29,

Re: New collections library: methods that take or return only

On Fri, May 15, 2009 at 1:49 PM, Gilles Dubochet wrote:
> Hello.
>
> I would like to raise another design point with the new collections library.
> This problem already existed in the old library, but I think it is time to
> change it. It has to do with when to use iterABLEs (and the whole subtree of
> collection classes), and when to use iterATORs.
>
> As an example, I tripped yesterday on the "keys" and "values" methods of
> maps, which return iterATORs. In my code, I had a method that printed any
> iterABLE as a list. Only, when I tried printing the values of a map, it
> didn't work because they were represented as an iterATOR. Of course, it is
> easy to cast an iterATOR to an iterABLE (for example, using toSequence), but
> it isn't always practical to think about when to cast.
>
> A similar problem arises for methods that only take iterators as parameters,
> such as for example the "hasAll" method of "Sorted".
>
> I would like to propose the following design convention in the new
> collections library:
>
> For methods that return iterATORs:
> - They are called "elements" if they return the same data as the collection
> they are called upon.
> - They are called "xxxElements" for whatever xxx transformation is operated
> on the collection (reverse, key or values of a map, etc.).
> - For every "xxxElements" method, there is a "xxx" method that does the
> same transformation but returns a proper iterABLE.
>
> For methods that take iterATORs as parameter:
> - always are overloaded: one version takes the iterATOR, one the
> corresponding iterABLE.
>
> It isn't clear to me whether all methods that take a collection should be
> overloaded for iterATORs. In the case of methods taking multiple collection
> parameters, it is probably saner to have an all-iterABLE and an all-iterATOR
> version, instead of all possible combinations thereof.
>
> What do you think?
>
I like it. It would clean up things. Maybe we can be even clearer by
renaming elements to iterator. The problem is that would break a lot
of code. Well, we can deprecate first, but nevertheless...
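
A deprecate-first rename could look roughly like the sketch below. Shelf is a hypothetical class, and the message and version string passed to @deprecated are placeholders chosen for illustration.

```scala
// Hypothetical collection showing a deprecate-first rename.
final class Shelf[A](items: Vector[A]) {
  // New name.
  def iterator: Iterator[A] = items.iterator

  // Old name kept as a forwarder: existing callers still compile,
  // but get a deprecation warning pointing them at the new name.
  @deprecated("use iterator instead", "2.8.0")
  def elements: Iterator[A] = iterator
}
```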

Cheers
