Julien Richard-Foy
One more article about the standard collections, really? Indeed, during the last 18 months a lot of work has been done on the collections side and we’ve published several blog articles and given several talks to explain the various changes or challenges we were facing. This article summarizes what is going to change from an end-user perspective.
In case you’ve thoroughly followed our previous blog posts and talks, you might not learn much from this article. Otherwise, this is the perfect opportunity to catch up on the topic in a few minutes!
The next section presents the changes that are internal to the collections implementation
but that might have some visible impact on the surface. Then, I will show why I think
that the removal of CanBuildFrom
made the API more beginner friendly. Next, I
will introduce some new operations available in the collections. Finally, I
will mention the main deprecations, the motivations behind them, and their
recommended replacements.
Under The Hood: A Cleaner Ground
The most important change in the new collections framework is that transformation
operations (such as map
or filter
) are now implemented in a way that works with both
strict collections (such as List
) and non-strict collections (such as Stream
).
This is a change because this was not the case before. Indeed, the previous
implementations were strict (they eagerly evaluated the collection elements) and had
to be overridden by non-strict collection types. You can find more details about that in
this blog post.
The good news is that the new design is more correct in the sense that you can
now implement custom non-strict collection types without having to worry about
re-implementing a ton of operations. (Some operations, though, still eagerly evaluate
the collection elements (e.g. groupBy
) and will be clearly documented.) Another benefit
is that transformation operations defined outside of the collections (like in the
cvogt/scala-extensions project)
now work with non-strict collections (such as View
or Stream
).
Speaking of non-strict collections, the View
type has been redesigned and
views should behave in a more predictable way. Also, Stream
has been
deprecated in favor of LazyList
(see the last section).
Life Without CanBuildFrom
I think the most visible change for end-users is that transformation operations
don’t use CanBuildFrom
anymore. I believe this will be quite visible despite our previous
efforts to hide CanBuildFrom
from the API documentation of the collections.
Indeed, if you take a look at the
old 2.12 List
API, the signature
shown for the map
operation does not mention CanBuildFrom
:
However, if you use this operation in your code, then your IDE reveals its actual signature:
As you can see, the type signature shown in the API documentation has been “simplified”
to make it more approachable, but I believe that this is probably introducing more
confusion to the users. Especially when you look at the
TreeMap[A, B]
API:
This type signature makes no sense: the result type can not be TreeMap[B]
since
TreeMap
takes two type parameters (the type of keys and the type
of values). Also, the function f
actually takes a key-value pair as parameter,
not just a key (as incorrectly indicated by the type A
).
CanBuildFrom
was used for good reasons, in particular the type That
shown
in the above screenshot was computed according to the type of the source
collection and the type of elements of the new collection. The case of TreeMap
is compelling: in case you transform your key-value pairs into other key-value
pairs for which the type of keys has an implicit Ordering
instance, then map
returns a TreeMap
, but if there is no such Ordering
instance then the best
collection type that can be returned is Map
. And if you transform the key-value
pairs into something that is not even a pair, then the best collection type
that can be returned is Iterable
. These three cases were supported by
a single operation implementation, and CanBuildFrom
was used to abstract over
the various possible return types.
In the new collections we wanted to have simpler type signatures so that we can show their actual signature in the API documentation, and auto-completion provided by IDEs is not scary. We achieve that by using overloading, as explained in more detail in this blog article.
In practice, this means that the new TreeMap
has three overloads of the
map
operation:
These type signatures are the actual ones and they essentially translate
“in types” what I’ve written above about the possible result types of map
according to the type of elements returned by the transformation function f
.
We believe that the new API is simpler to understand.
New And Noteworthy
We have introduced a few new operations. The following sections present some of them.
groupMap
A common pattern with the old 2.12 collections is to use groupBy
followed by mapValues
to transform the groups. For instance,
this is how we can index the names of a collection of users by
their age:
case class User(name: String, age: Int)
def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
users.groupBy(_.age).mapValues(users => users.map(_.name))
There is a subtlety in this code. The static return type is Map
but the Map
implementation actually returned is lazy and evaluates
its elements each time it is traversed (ie the users => users.map(_.name)
function is evaluated each time the Map
is traversed).
In the new collections the return type of mapValues
is a MapView
instead
of a Map
, to clearly indicate that its contents is evaluated each time it
is traversed.
Furthermore, we have introduced an operation named groupMap
that both groups elements and transforms the groups. The above code
can be rewritten as follows to take advantage of groupMap
:
def namesByAge(users: Seq[User]): Map[Int, Seq[String]] =
users.groupMap(_.age)(_.name)
The returned Map
is strict: it eagerly evaluates its elements
once. Also, the fact that it is implemented as a single operation
makes it possible to apply some optimizations that make it
~1.3x faster than the version that uses mapValues
.
InPlace
Transformation Operations
Mutable collections have a couple of new operations for transforming
their elements in place: instead of returning a new collection (like
map
and filter
do) they mutate the source collection. These
operations are suffixed with InPlace
. For instance, to remove
users whose name start with the letter J
from a buffer and then
increment their age, one can now write:
val users = ArrayBuffer(…)
users
.filterInPlace(user => !user.name.startsWith("J"))
.mapInPlace(user => user.copy(age = user.age + 1))
Deprecations For Less Confusion
A consequence of cleaning and simplifying the collections framework is that several types or operations have been deprecated in Scala 2.13 and will be removed in 2.14.
Iterable
Is The Top Collection Type
We felt that having a distinction between Traversable
and Iterable
was not
worth it, so we removed Traversable
(it is now an alias to Iterable[A]
).
Iterable[A]
is now the collection type at the top of the hierarchy.
Its only abstract member is def iterator: Iterator[A]
.
LazyList
Is Preferred Over Stream
Stream
is deprecated in favor of LazyList
. As its name suggests,
a LazyList
is a linked list whose elements are lazily evaluated. An
important semantic difference with Stream
is that in LazyList
both
the head and the tail are lazy, whereas in Stream
only the tail is lazy.
Insertion And Removal Operations Are Not Available On Generic Collections
In the old 2.12 framework, the scala.collection.Map
type has a +
and a -
operations
to add and remove entries. The semantics of these operations is to return a new collection
with the added or removed entries, without changing the source collection.
These operations are then inherited by the mutable branch of the collections. But the mutable
collection types also introduce their own insertion and removal operations, namely +=
and -=
,
which modify the source collection in place. This means that the scala.collection.mutable.Map
type
has +
and +=
, as well as -
and -=
.
Having all these operations can be handy in some cases but can also introduce confusion. If you want
to use +
or -
, then you probably wanted to use an immutable collection type in the first place…
Consequently, +
, -
and updated
have been moved from scala.collection.Map
to scala.collection.immutable.Map
,
and +
and -
have been moved from scala.collection.Set
to scala.collection.immutable.Set
We think that by deprecating these insertion and removal operations from generic collection
types and by having distinct operations between the mutable
and immutable
branches we make
the situation clearer.
Summary
In summary, the changes for end-users are the following:
- non-strict collections (such as views) are safer to use and simpler implement,
- type signatures of transformation operations (such as
map
) are simpler (no implicitCanBuildFrom
parameter), - new cool operations have been added,
- the type hierarchy is simpler (no
Traversable
), - mutable collection types do not inherit immutable insertion and removal operations.