This page is no longer maintained — Please continue to the home page at www.scala-lang.org

2.8 Collections Design Not Amenable to Alternative ADT Implementations

6 replies
daniel
Joined: 2008-08-20,
User offline. Last seen 44 weeks 14 hours ago.
Consider the concept of a set complement partially defined in the following way:

class ComplementarySet[A](without: Set[A]) extends Set[A] with SetLike[A, ComplementarySet[A]] {
  override val size = Math.MAX_INT     // should be infinite
 
  def this() = this(Set())
 
  override def empty = new ComplementarySet[A](Set())
 
  def contains(e: A) = !without.contains(e)
 
  def iterator = throw new AssertionError("Cannot iterate over a set complement")

  // ...
}

You get the idea.  The important point is this set doesn't actually contain any definite elements, it only knows what it doesn't contain.  This design works just fine in 2.7 and continues to work under 2.8 until we get to the map and flatMap methods.  Under 2.7, I can define map and flatMap in the following way:

  def map[B](f: A => B) = new ComplementarySet(without map f)

  def flatMap[B](f: A => Iterable[B]) = new ComplementarySet(without flatMap f)    // not quite right, but close enough

Unfortunately, the signatures for map and flatMap are somewhat more complex in 2.8.  Specifically, they are dependent on CanBuildFrom and do not return a concrete type.  This is a problem because the CanBuildFrom typeclass defines access to Builder, which is heavily dependent on collections with well-defined elements.  The only way to build a new instance of a collection is to enumerate its values, something that you can't do with a ComplementarySet.  At the moment, I'm "solving" this problem by providing a dummy implementation of CanBuildFrom[ComplementarSet[A], B, ComplementarySet[B]] which returns null for both methods.  I then override map and flatMap, defining them similarly to the above but casting the return value to the type inferred from the implicit CanBuildFrom parameter.

Needless to say, this isn't ideal, but I can't see a way around it.  Am I missing something in the collections design?

Daniel
Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8 Collections Design Not Amenable to Alternative ADT Impl

Hello,

If you take a look at the way ++ in Streams is implemented:

/** Create a new stream which contains all elements of this stream
* followed by all elements of Traversable `that'
* @note It's subtle why this works. We know that if the target type
* of the Builder That is either a Stream, or one of its supertypes,
or undefined,
* then StreamBuilder will be chosen for the implicit.
* we recognize that fact and optimize to get more laziness.
*/
override def ++[B >: A, That](that: Traversable[B])(implicit bf:
CanBuildFrom[Stream[A], B, That]): That = {
// we assume there is no other builder factory on streams and
therefore know that That = Stream[A]
(if (isEmpty) that.toStream
else new Stream.Cons(head, (tail ++
that).asInstanceOf[Stream[A]])).asInstanceOf[That]
}

you will see a similar situation. The map method does the same thing.

You could do this for the ComplementarySet as well.

Alex

On 04/04/2010 07:18 AM, Daniel Spiewak wrote:
> Consider the concept of a set complement partially defined in the
> following way:
>
> class ComplementarySet[A](without: Set[A]) extends Set[A] with
> SetLike[A, ComplementarySet[A]] {
> override val size = Math.MAX_INT // should be infinite
>
> def this() = this(Set())
>
> override def empty = new ComplementarySet[A](Set())
>
> def contains(e: A) = !without.contains(e)
>
> def iterator = throw new AssertionError("Cannot iterate over a set
> complement")
>
> // ...
> }
>
> You get the idea. The important point is this set doesn't actually
> contain any definite elements, it only knows what it /doesn't/ contain.
> This design works just fine in 2.7 and continues to work under 2.8 until
> we get to the map and flatMap methods. Under 2.7, I can define map and
> flatMap in the following way:
>
> def map[B](f: A => B) = new ComplementarySet(without map f)
>
> def flatMap[B](f: A => Iterable[B]) = new ComplementarySet(without
> flatMap f) // not quite right, but close enough
>
> Unfortunately, the signatures for map and flatMap are somewhat more
> complex in 2.8. Specifically, they are dependent on CanBuildFrom and do
> not return a concrete type. This is a problem because the CanBuildFrom
> typeclass defines access to Builder, which is heavily dependent on
> collections with well-defined elements. The only way to build a new
> instance of a collection is to enumerate its values, something that you
> can't do with a ComplementarySet. At the moment, I'm "solving" this
> problem by providing a dummy implementation of
> CanBuildFrom[ComplementarSet[A], B, ComplementarySet[B]] which returns
> null for both methods. I then override map and flatMap, defining them
> similarly to the above but /casting/ the return value to the type
> inferred from the implicit CanBuildFrom parameter.
>
> Needless to say, this isn't ideal, but I can't see a way around it. Am
> I missing something in the collections design?
>
> Daniel

Danielk
Joined: 2009-06-08,
User offline. Last seen 3 years 21 weeks ago.
Re: 2.8 Collections Design Not Amenable to Alternative ADT Imp
It looks like a dangerous implementation of the map method, since it is only valid if f is a bijection from A to B.

For example, consider the ComplementarySet containing all Int:s except 1, and map x*x over it. The resulting set should only contain Int:s that are squares of some Int (0,1,4,9,16,25,36...), but your set will contain (MIN_INT,...-1, 0, 2, ... MAX_INT), which is obviosly not correct.

Cheers,
Daniel

On Sun, Apr 4, 2010 at 7:18 AM, Daniel Spiewak <djspiewak@gmail.com> wrote:
Consider the concept of a set complement partially defined in the following way:

class ComplementarySet[A](without: Set[A]) extends Set[A] with SetLike[A, ComplementarySet[A]] {
  override val size = Math.MAX_INT     // should be infinite
 
  def this() = this(Set())
 
  override def empty = new ComplementarySet[A](Set())
 
  def contains(e: A) = !without.contains(e)
 
  def iterator = throw new AssertionError("Cannot iterate over a set complement")

  // ...
}

You get the idea.  The important point is this set doesn't actually contain any definite elements, it only knows what it doesn't contain.  This design works just fine in 2.7 and continues to work under 2.8 until we get to the map and flatMap methods.  Under 2.7, I can define map and flatMap in the following way:

  def map[B](f: A => B) = new ComplementarySet(without map f)

daniel
Joined: 2008-08-20,
User offline. Last seen 44 weeks 14 hours ago.
Re: Re: 2.8 Collections Design Not Amenable to Alternative ADT
I am doing exactly that.  However, doesn't it just seem like a collections design which requires hacks like this is not quite covering all of the cases it needs to?  Don't get me wrong, the Scala 2.8 collections library is a beautiful bit of API design, but I think these sorts of alternative collections (Stream, ComplementarySet, etc) define edge cases which illustrate weak points in the framework.  I can't say that I have a better solution at present, but I do believe it merits some thought.

Daniel

On Sun, Apr 4, 2010 at 4:24 AM, Aleksandar Prokopec <aleksandar.prokopec@gmail.com> wrote:
Hello,

If you take a look at the way ++ in Streams is implemented:

/** Create a new stream which contains all elements of this stream
  *  followed by all elements of Traversable `that'
  *  @note It's subtle why this works. We know that if the target type
  *  of the Builder That is either a Stream, or one of its supertypes, or undefined,
  *  then StreamBuilder will be chosen for the implicit.
  *  we recognize that fact and optimize to get more laziness.
  */
 override def ++[B >: A, That](that: Traversable[B])(implicit bf: CanBuildFrom[Stream[A], B, That]): That = {
   // we assume there is no other builder factory on streams and therefore know that That = Stream[A]
   (if (isEmpty) that.toStream
    else new Stream.Cons(head, (tail ++ that).asInstanceOf[Stream[A]])).asInstanceOf[That]
 }

you will see a similar situation. The map method does the same thing.

You could do this for the ComplementarySet as well.

Alex

On 04/04/2010 07:18 AM, Daniel Spiewak wrote:
Consider the concept of a set complement partially defined in the
following way:

class ComplementarySet[A](without: Set[A]) extends Set[A] with
SetLike[A, ComplementarySet[A]] {
  override val size = Math.MAX_INT     // should be infinite

  def this() = this(Set())

  override def empty = new ComplementarySet[A](Set())

  def contains(e: A) = !without.contains(e)

  def iterator = throw new AssertionError("Cannot iterate over a set
complement")

  // ...
}

You get the idea.  The important point is this set doesn't actually
contain any definite elements, it only knows what it /doesn't/ contain.
This design works just fine in 2.7 and continues to work under 2.8 until
we get to the map and flatMap methods.  Under 2.7, I can define map and
flatMap in the following way:

  def map[B](f: A => B) = new ComplementarySet(without map f)

  def flatMap[B](f: A => Iterable[B]) = new ComplementarySet(without
flatMap f)    // not quite right, but close enough

Unfortunately, the signatures for map and flatMap are somewhat more
complex in 2.8.  Specifically, they are dependent on CanBuildFrom and do
not return a concrete type.  This is a problem because the CanBuildFrom
typeclass defines access to Builder, which is heavily dependent on
collections with well-defined elements.  The only way to build a new
instance of a collection is to enumerate its values, something that you
can't do with a ComplementarySet.  At the moment, I'm "solving" this
problem by providing a dummy implementation of
CanBuildFrom[ComplementarSet[A], B, ComplementarySet[B]] which returns
null for both methods.  I then override map and flatMap, defining them
similarly to the above but /casting/ the return value to the type
inferred from the implicit CanBuildFrom parameter.

Needless to say, this isn't ideal, but I can't see a way around it.  Am
I missing something in the collections design?

Daniel



daniel
Joined: 2008-08-20,
User offline. Last seen 44 weeks 14 hours ago.
Re: 2.8 Collections Design Not Amenable to Alternative ADT Imp
Quite so.  In general, my ComplementarySet is unsound.  Fortunately, my use-cases are fairly limited.  Consider the regular expression /[^abc]/.  What is the set of all characters which may occur as the first character of a string matching this regexp?  The answer is ~{a, b, c, \epsilon}, which I represent as `new ComplementarySet(Set(Some('a'), Some('b'), Some('c'), None))`.  I couldn't possibly enumerate all of the characters which are not 'a', 'b', 'c' or \epsilon, so I use lazy complementation.  The mapping comes in when I flatten this set into a Set[Char] to use in predictive parsing (for http://github.com/djspiewak/gll-combinators).  I have something like this:

lazy val first = {
  val res = computeFirst(Set())    // => Set[Option[Char]]
  if (res contains None)
    new ComplementarySet(Set[Char]())
  else
    res flatMap { x => x }         // flatten into Set[Char]
}

In other words, compute the FIRST set.  If that set contains the empty string, then predict the universal set (since we can't compute a FOLLOW set).  If the set does not contain the empty string, flatten it into a Set[Char].  This Set may still be the complement of another set, which is why I needed to define map and flatMap.

Obviously, the identity function is going to be a bijection, so I think I'm pretty safe there.  :-)

Daniel

On Sun, Apr 4, 2010 at 7:35 AM, Daniel Kristensen <daniel.kristensen@gmail.com> wrote:
It looks like a dangerous implementation of the map method, since it is only valid if f is a bijection from A to B.

For example, consider the ComplementarySet containing all Int:s except 1, and map x*x over it. The resulting set should only contain Int:s that are squares of some Int (0,1,4,9,16,25,36...), but your set will contain (MIN_INT,...-1, 0, 2, ... MAX_INT), which is obviosly not correct.

Cheers,
Daniel

On Sun, Apr 4, 2010 at 7:18 AM, Daniel Spiewak <djspiewak@gmail.com> wrote:
Consider the concept of a set complement partially defined in the following way:

class ComplementarySet[A](without: Set[A]) extends Set[A] with SetLike[A, ComplementarySet[A]] {
  override val size = Math.MAX_INT     // should be infinite
 
  def this() = this(Set())
 
  override def empty = new ComplementarySet[A](Set())
 
  def contains(e: A) = !without.contains(e)
 
  def iterator = throw new AssertionError("Cannot iterate over a set complement")

  // ...
}

You get the idea.  The important point is this set doesn't actually contain any definite elements, it only knows what it doesn't contain.  This design works just fine in 2.7 and continues to work under 2.8 until we get to the map and flatMap methods.  Under 2.7, I can define map and flatMap in the following way:

  def map[B](f: A => B) = new ComplementarySet(without map f)


Danielk
Joined: 2009-06-08,
User offline. Last seen 3 years 21 weeks ago.
Re: 2.8 Collections Design Not Amenable to Alternative ADT Imp
Ok, I see. Interesting project by the way, I'll have to check that out :)

/Daniel K

2010/4/4 Daniel Spiewak <djspiewak@gmail.com>
Quite so.  In general, my ComplementarySet is unsound.  Fortunately, my use-cases are fairly limited.  Consider the regular expression /[^abc]/.  What is the set of all characters which may occur as the first character of a string matching this regexp?  The answer is ~{a, b, c, \epsilon}, which I represent as `new ComplementarySet(Set(Some('a'), Some('b'), Some('c'), None))`.  I couldn't possibly enumerate all of the characters which are not 'a', 'b', 'c' or \epsilon, so I use lazy complementation.  The mapping comes in when I flatten this set into a Set[Char] to use in predictive parsing (for http://github.com/djspiewak/gll-combinators). 

Aleksandar Prokopec
Joined: 2010-04-04,
User offline. Last seen 42 years 45 weeks ago.
Re: 2.8 Collections Design Not Amenable to Alternative ADT Impl

I agree with you here, but I'm not sure what the best alternative is.

What one could do is pattern match the builder returned by the factory
and proceed casewise. An alternative would be to have collection
building operations embedded in the builders and having a default
implementation there.

Alex

On 04/04/2010 03:39 PM, Daniel Spiewak wrote:
> I am doing exactly that. However, doesn't it just seem like a
> collections design which requires hacks like this is not quite covering
> all of the cases it needs to? Don't get me wrong, the Scala 2.8
> collections library is a beautiful bit of API design, but I think these
> sorts of alternative collections (Stream, ComplementarySet, etc) define
> edge cases which illustrate weak points in the framework. I can't say
> that I have a better solution at present, but I do believe it merits
> some thought.
>
> Daniel
>

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland