scala.collection.mutable.SortedSet

Tue, 2010-05-25, 20:57

#1

H-star Development

Joined: 2010-04-14,

Re: scala.collection.mutable.SortedSet

there is a treemap/set

Eric Leman schrieb:
>
> Why isn't there a scala.collection.mutable.SortedSet ?
> and why not a scala.collection.mutable.SortedMap ?
>
> It seems that there would be use cases for traditional
> red&black, b-tree, etc.. kind of sorted map and sets,
> is the reason for the lack of these structures that
> there are better alternative ?
> If yes, I'd like to know which ones.
>
> Thanks

Tue, 2010-05-25, 21:07

#2

H-star Development

Joined: 2010-04-14,

Re: scala.collection.mutable.SortedSet

oops, not mutable :(

Eric Leman schrieb:
>
> Why isn't there a scala.collection.mutable.SortedSet ?
> and why not a scala.collection.mutable.SortedMap ?
>
> It seems that there would be use cases for traditional
> red&black, b-tree, etc.. kind of sorted map and sets,
> is the reason for the lack of these structures that
> there are better alternative ?
> If yes, I'd like to know which ones.
>
> Thanks

Tue, 2010-05-25, 21:27

#3

ebowman

Joined: 2009-04-13,

Re: scala.collection.mutable.SortedSet

Eric Leman wrote:
>
> Why isn't there a scala.collection.mutable.SortedSet ?
> and why not a scala.collection.mutable.SortedMap ?
>
> It seems that there would be use cases for traditional
> red&black, b-tree, etc.. kind of sorted map and sets,
> is the reason for the lack of these structures that
> there are better alternative ?
> If yes, I'd like to know which ones.
>
> Thanks

It's not ideal, but you can wrap a Java TreeSet using the jcl SortedSet
and probably get most of what you want, backed by the Java one (in 2.7.7
at least).

cheers,
Eric

Tue, 2010-05-25, 21:37

#4

Eric Leman

Joined: 2010-05-25,

Re: scala.collection.mutable.SortedSet

Ok, so the reason that they are not there is more pragmatic (effort to implement them)
rather than "philosophical", is it something that is likely to exist in a future version
of the collection API ?

Thanks

On Tue, May 25, 2010 at 4:09 PM, Eric Bowman <ebowman@boboco.ie> wrote:

Eric Leman wrote:

Why isn't there a scala.collection.mutable.SortedSet ?
and why not a scala.collection.mutable.SortedMap ?

It seems that there would be use cases for traditional
red&black, b-tree, etc.. kind of sorted map and sets,
is the reason for the lack of these structures that
there are better alternative ?
If yes, I'd like to know which ones.

Thanks

It's not ideal, but you can wrap a Java TreeSet using the jcl SortedSet and probably get most of what you want, backed by the Java one (in 2.7.7 at least).

cheers,
Eric

Tue, 2010-05-25, 21:47

#5

Eric Leman

Joined: 2010-05-25,

Re: scala.collection.mutable.SortedSet

Ok, so the reason that they are not there is more pragmatic (effort to implement them)
rather than "philosophical", is it something that is likely to exist in a future version
of the collection API ?

Thanks

On Tue, May 25, 2010 at 4:09 PM, Eric Bowman <ebowman@boboco.ie> wrote:

Eric Leman wrote:

Why isn't there a scala.collection.mutable.SortedSet ?
and why not a scala.collection.mutable.SortedMap ?

It seems that there would be use cases for traditional
red&black, b-tree, etc.. kind of sorted map and sets,
is the reason for the lack of these structures that
there are better alternative ?
If yes, I'd like to know which ones.

Thanks

It's not ideal, but you can wrap a Java TreeSet using the jcl SortedSet and probably get most of what you want, backed by the Java one (in 2.7.7 at least).

cheers,
Eric

Thu, 2010-05-27, 18:27

#6

odersky

Joined: 2008-07-29,

Re: Re: scala.collection.mutable.SortedSet

On Tue, May 25, 2010 at 10:35 PM, Eric Leman <tarlaxz@gmail.com> wrote:

Ok, so the reason that they are not there is more pragmatic (effort to implement them)
rather than "philosophical", is it something that is likely to exist in a future version
of the collection API ?

If you provide such an implementation, it's appreciated. I don't see anyone else working on this right now.

Cheers

-- Martin

Thu, 2010-05-27, 18:27

#7

odersky

Joined: 2008-07-29,

Re: Re: scala.collection.mutable.SortedSet

On Tue, May 25, 2010 at 10:35 PM, Eric Leman <tarlaxz@gmail.com> wrote:

Ok, so the reason that they are not there is more pragmatic (effort to implement them)
rather than "philosophical", is it something that is likely to exist in a future version
of the collection API ?

If you provide such an implementation, it's appreciated. I don't see anyone else working on this right now.

Cheers

-- Martin

Thu, 2010-05-27, 19:17

#8

ichoran

Joined: 2009-08-14,

Re: Re: scala.collection.mutable.SortedSet

I might provide such an implementation since I've been grumblingly using the java.util versions. Is there a strong argument for any particular algorithm? (AA / 2-3 vs. AVL vs. RB / 2-3-4 vs. scapegoat vs. skip list vs. splay vs. whatever?) Or, alternatively, mutable.SortedSet could wrap a Java equivalent.

--Rex

On Thu, May 27, 2010 at 1:24 PM, martin odersky <martin.odersky@epfl.ch> wrote:

On Tue, May 25, 2010 at 10:35 PM, Eric Leman <tarlaxz@gmail.com> wrote:

Ok, so the reason that they are not there is more pragmatic (effort to implement them)
rather than "philosophical", is it something that is likely to exist in a future version
of the collection API ?

If you provide such an implementation, it's appreciated. I don't see anyone else working on this right now.

Cheers

-- Martin

Thu, 2010-05-27, 20:07

#9

nbronson

Joined: 2010-03-12,

Re: Re: scala.collection.mutable.SortedSet

Michael Spiegel's Dense Skip Trees look quite nice. They have a similar
intuition to skip lists, but better cache behavior during reads.

- Nathan

On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> I might provide such an implementation since I've been grumblingly
> using the java.util versions. Is there a strong argument for any
> particular algorithm? (AA / 2-3 vs. AVL vs. RB / 2-3-4 vs. scapegoat
> vs. skip list vs. splay vs. whatever?) Or, alternatively,
> mutable.SortedSet could wrap a Java equivalent.
>
> --Rex
>
> On Thu, May 27, 2010 at 1:24 PM, martin odersky
> wrote:
>
>
> On Tue, May 25, 2010 at 10:35 PM, Eric Leman
> wrote:
>
> Ok, so the reason that they are not there is more
> pragmatic (effort to implement them)
> rather than "philosophical", is it something that is
> likely to exist in a future version
> of the collection API ?
>
> If you provide such an implementation, it's appreciated. I
> don't see anyone else working on this right now.
>
> Cheers
>
> -- Martin
>
>
>

Fri, 2010-05-28, 02:07

#10

ichoran

Joined: 2009-08-14,

Re: Re: scala.collection.mutable.SortedSet

That does look potentially promising, but I'd value simplicity of implementation also, and dense skip trees are not obviously easy to code, nor do they have much written about them since they're new. For simplicity, I'd tend towards AA trees due to the small number of cases one needs to consider.

On the other hand, one has to evaluate why one would be using a mutable rather than an immutable sorted set. The primary argument would be if one was doing a large number of insert/deletes compared to the number of seeks. My (untested) understanding is that RB trees are less balanced than AA or AVL trees, and thus that they should be a bit faster on insert/delete. I am not sufficiently familiar with skip lists or scapegoat trees (or skip trees or dense skip trees) to know whether either of them might be preferable. I would argue that if there are lots of inserts/deletes, the splay tree feature of moving recently accessed items to the top is less likely to be a win.

--Rex

On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson <nbronson@cs.stanford.edu> wrote:

Michael Spiegel's Dense Skip Trees look quite nice. They have a similar
intuition to skip lists, but better cache behavior during reads.

- Nathan

On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> I might provide such an implementation since I've been grumblingly
> using the java.util versions. Is there a strong argument for any
> particular algorithm? (AA / 2-3 vs. AVL vs. RB / 2-3-4 vs. scapegoat
> vs. skip list vs. splay vs. whatever?) Or, alternatively,
> mutable.SortedSet could wrap a Java equivalent.
>
> --Rex
>
> On Thu, May 27, 2010 at 1:24 PM, martin odersky
> <martin.odersky@epfl.ch> wrote:
>
>
> On Tue, May 25, 2010 at 10:35 PM, Eric Leman
> <tarlaxz@gmail.com> wrote:
>
> Ok, so the reason that they are not there is more
> pragmatic (effort to implement them)
> rather than "philosophical", is it something that is
> likely to exist in a future version
> of the collection API ?
>
> If you provide such an implementation, it's appreciated. I
> don't see anyone else working on this right now.
>
> Cheers
>
> -- Martin
>
>
>

Fri, 2010-05-28, 03:37

#11

nbronson

Joined: 2010-03-12,

Re: Re: scala.collection.mutable.SortedSet

In my experience (RB, AVL and treap) the bulk of the code goes into all
of the various ways you can query the tree (get, first, last, range), so
there may not be that much difference between the binary search trees in
complexity. You are right that this stuff gets more complicated with
the trees that have a non-constant branching factor.

I think that splay trees are a non-starter, because they have the
surprising property that concurrent _readers_ must use locking, even if
there are no writers. This is not the way that any of the existing
mutable collections in Java or Scala work (see Java's SimpleDateFormat
for the resulting problems).

One intriguing possibility is to try to unify the immutable and mutable
trees, similarly to Clojure's transients. For a past project I
implemented immutable and mutable red-black trees that shared the same
node implementation. The immutable map had a method named mutableClone
that returned a mutable map, and the mutable map had both clone and
immutableClone, both of which used a generation number to perform
copy-on-write. The last piece was a visitDelta function that used
reference equality to skip portions of two trees that were identical.

- Nathan

On Thu, 2010-05-27 at 21:01 -0400, Rex Kerr wrote:
> That does look potentially promising, but I'd value simplicity of
> implementation also, and dense skip trees are not obviously easy to
> code, nor do they have much written about them since they're new. For
> simplicity, I'd tend towards AA trees due to the small number of cases
> one needs to consider.
>
> On the other hand, one has to evaluate why one would be using a
> mutable rather than an immutable sorted set. The primary argument
> would be if one was doing a large number of insert/deletes compared to
> the number of seeks. My (untested) understanding is that RB trees are
> less balanced than AA or AVL trees, and thus that they should be a bit
> faster on insert/delete. I am not sufficiently familiar with skip
> lists or scapegoat trees (or skip trees or dense skip trees) to know
> whether either of them might be preferable. I would argue that if
> there are lots of inserts/deletes, the splay tree feature of moving
> recently accessed items to the top is less likely to be a win.
>
> --Rex
>
> On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson
> wrote:
> Michael Spiegel's Dense Skip Trees look quite nice. They have
> a similar
> intuition to skip lists, but better cache behavior during
> reads.
>
> - Nathan
>
>
> On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> > I might provide such an implementation since I've been
> grumblingly
> > using the java.util versions. Is there a strong argument
> for any
> > particular algorithm? (AA / 2-3 vs. AVL vs. RB / 2-3-4 vs.
> scapegoat
> > vs. skip list vs. splay vs. whatever?) Or, alternatively,
> > mutable.SortedSet could wrap a Java equivalent.
> >
> > --Rex
> >
> > On Thu, May 27, 2010 at 1:24 PM, martin odersky
> > wrote:
> >
> >
> > On Tue, May 25, 2010 at 10:35 PM, Eric Leman
> > wrote:
> >
> > Ok, so the reason that they are not there
> is more
> > pragmatic (effort to implement them)
> > rather than "philosophical", is it something
> that is
> > likely to exist in a future version
> > of the collection API ?
> >
> > If you provide such an implementation, it's
> appreciated. I
> > don't see anyone else working on this right now.
> >
> > Cheers
> >
> > -- Martin
> >
> >
> >
>
>
>
>

Fri, 2010-05-28, 21:57

#12

ichoran

Joined: 2009-08-14,

Re: Re: scala.collection.mutable.SortedSet

If you can manage to unify the mutable and immutable trees in a safe way, that sounds good--beyond the scope of what I was considering doing! If you're planning on doing this, please let me know; no reason to duplicate efforts. (We don't need two mutable sorted set implementations!)

After a little reading, I think that T-trees would provide the most usefully distinct capability on top of the immutable trees we already have. (A T-tree is like a B-tree, except it has only one child off each end of its data array, not a child between every pair of elements.)

The advantage of T-trees is that they are more memory efficient when implemented mutably; a typical RB tree requires an overhead of one object plus three pointers and one color per element in the tree. A T-tree of maximum width N requires only (on average) 8/3N of an object plus 4/3 + 16/3N pointers plus 8/3N ints per element in the tree--but then you have to sort a flat array of length N. Fortunately, sorted insertion/deletion in an array of length 8 or less is pretty fast (~= object creation time), so if we set N=8 we have overhead of 1/3 of an object, 2 pointers, and 1/3 of an int per element, without a large expected time penalty. One can reduce RB trees--through carefully recursive algorithms--to requiring only one object, two pointers, and one color, but this still is a heaver load on the GC and significantly more space.

So if I were to write a mutable SortedSet, I would probably implement it with a T-tree (with T's balanced using an AA tree).

Now the question is whether I should.

--Rex

On Thu, May 27, 2010 at 10:36 PM, Nathan Bronson <nbronson@cs.stanford.edu> wrote:

In my experience (RB, AVL and treap) the bulk of the code goes into all
of the various ways you can query the tree (get, first, last, range), so
there may not be that much difference between the binary search trees in
complexity. You are right that this stuff gets more complicated with
the trees that have a non-constant branching factor.

I think that splay trees are a non-starter, because they have the
surprising property that concurrent _readers_ must use locking, even if
there are no writers. This is not the way that any of the existing
mutable collections in Java or Scala work (see Java's SimpleDateFormat
for the resulting problems).

One intriguing possibility is to try to unify the immutable and mutable
trees, similarly to Clojure's transients. For a past project I
implemented immutable and mutable red-black trees that shared the same
node implementation. The immutable map had a method named mutableClone
that returned a mutable map, and the mutable map had both clone and
immutableClone, both of which used a generation number to perform
copy-on-write. The last piece was a visitDelta function that used
reference equality to skip portions of two trees that were identical.

- Nathan

On Thu, 2010-05-27 at 21:01 -0400, Rex Kerr wrote:
> That does look potentially promising, but I'd value simplicity of
> implementation also, and dense skip trees are not obviously easy to
> code, nor do they have much written about them since they're new. For
> simplicity, I'd tend towards AA trees due to the small number of cases
> one needs to consider.
>
> On the other hand, one has to evaluate why one would be using a
> mutable rather than an immutable sorted set. The primary argument
> would be if one was doing a large number of insert/deletes compared to
> the number of seeks. My (untested) understanding is that RB trees are
> less balanced than AA or AVL trees, and thus that they should be a bit
> faster on insert/delete. I am not sufficiently familiar with skip
> lists or scapegoat trees (or skip trees or dense skip trees) to know
> whether either of them might be preferable. I would argue that if
> there are lots of inserts/deletes, the splay tree feature of moving
> recently accessed items to the top is less likely to be a win.
>
> --Rex
>
> On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson
> <nbronson@cs.stanford.edu> wrote:
> Michael Spiegel's Dense Skip Trees look quite nice. They have
> a similar
> intuition to skip lists, but better cache behavior during
> reads.
>
> - Nathan
>
>
> On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> > I might provide such an implementation since I've been
> grumblingly
> > using the java.util versions. Is there a strong argument
> for any
> > particular algorithm? (AA / 2-3 vs. AVL vs. RB / 2-3-4 vs.
> scapegoat
> > vs. skip list vs. splay vs. whatever?) Or, alternatively,
> > mutable.SortedSet could wrap a Java equivalent.
> >
> > --Rex
> >
> > On Thu, May 27, 2010 at 1:24 PM, martin odersky
> > <martin.odersky@epfl.ch> wrote:
> >
> >
> > On Tue, May 25, 2010 at 10:35 PM, Eric Leman
> > <tarlaxz@gmail.com> wrote:
> >
> > Ok, so the reason that they are not there
> is more
> > pragmatic (effort to implement them)
> > rather than "philosophical", is it something
> that is
> > likely to exist in a future version
> > of the collection API ?
> >
> > If you provide such an implementation, it's
> appreciated. I
> > don't see anyone else working on this right now.
> >
> > Cheers
> >
> > -- Martin
> >
> >
> >
>
>
>
>

Fri, 2010-05-28, 22:57

#13

nbronson

Joined: 2010-03-12,

Re: Re: scala.collection.mutable.SortedSet

I knew my suggestion was dangerously close to volunteering! There are a
couple of reasonable approaches, I'll code up the core of some of them
over the weekend.

T-trees seem quite interesting. Would storing the keys and values in an
array put them outside the scope of specialization?

- Nathan

On Fri, 2010-05-28 at 16:56 -0400, Rex Kerr wrote:
> If you can manage to unify the mutable and immutable trees in a safe
> way, that sounds good--beyond the scope of what I was considering
> doing! If you're planning on doing this, please let me know; no
> reason to duplicate efforts. (We don't need two mutable sorted set
> implementations!)
>
> After a little reading, I think that T-trees would provide the most
> usefully distinct capability on top of the immutable trees we already
> have. (A T-tree is like a B-tree, except it has only one child off
> each end of its data array, not a child between every pair of
> elements.)
>
> The advantage of T-trees is that they are more memory efficient when
> implemented mutably; a typical RB tree requires an overhead of one
> object plus three pointers and one color per element in the tree. A
> T-tree of maximum width N requires only (on average) 8/3N of an object
> plus 4/3 + 16/3N pointers plus 8/3N ints per element in the tree--but
> then you have to sort a flat array of length N. Fortunately, sorted
> insertion/deletion in an array of length 8 or less is pretty fast (~=
> object creation time), so if we set N=8 we have overhead of 1/3 of an
> object, 2 pointers, and 1/3 of an int per element, without a large
> expected time penalty. One can reduce RB trees--through carefully
> recursive algorithms--to requiring only one object, two pointers, and
> one color, but this still is a heaver load on the GC and significantly
> more space.
>
> So if I were to write a mutable SortedSet, I would probably implement
> it with a T-tree (with T's balanced using an AA tree).
>
> Now the question is whether I should.
>
> --Rex
>
> On Thu, May 27, 2010 at 10:36 PM, Nathan Bronson
> wrote:
> In my experience (RB, AVL and treap) the bulk of the code goes
> into all
> of the various ways you can query the tree (get, first, last,
> range), so
> there may not be that much difference between the binary
> search trees in
> complexity. You are right that this stuff gets more
> complicated with
> the trees that have a non-constant branching factor.
>
> I think that splay trees are a non-starter, because they have
> the
> surprising property that concurrent _readers_ must use
> locking, even if
> there are no writers. This is not the way that any of the
> existing
> mutable collections in Java or Scala work (see Java's
> SimpleDateFormat
> for the resulting problems).
>
> One intriguing possibility is to try to unify the immutable
> and mutable
> trees, similarly to Clojure's transients. For a past project
> I
> implemented immutable and mutable red-black trees that shared
> the same
> node implementation. The immutable map had a method named
> mutableClone
> that returned a mutable map, and the mutable map had both
> clone and
> immutableClone, both of which used a generation number to
> perform
> copy-on-write. The last piece was a visitDelta function that
> used
> reference equality to skip portions of two trees that were
> identical.
>
> - Nathan
>
>
> On Thu, 2010-05-27 at 21:01 -0400, Rex Kerr wrote:
> > That does look potentially promising, but I'd value
> simplicity of
> > implementation also, and dense skip trees are not obviously
> easy to
> > code, nor do they have much written about them since they're
> new. For
> > simplicity, I'd tend towards AA trees due to the small
> number of cases
> > one needs to consider.
> >
> > On the other hand, one has to evaluate why one would be
> using a
> > mutable rather than an immutable sorted set. The primary
> argument
> > would be if one was doing a large number of insert/deletes
> compared to
> > the number of seeks. My (untested) understanding is that RB
> trees are
> > less balanced than AA or AVL trees, and thus that they
> should be a bit
> > faster on insert/delete. I am not sufficiently familiar
> with skip
> > lists or scapegoat trees (or skip trees or dense skip trees)
> to know
> > whether either of them might be preferable. I would argue
> that if
> > there are lots of inserts/deletes, the splay tree feature of
> moving
> > recently accessed items to the top is less likely to be a
> win.
> >
> > --Rex
> >
> > On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson
> > wrote:
> > Michael Spiegel's Dense Skip Trees look quite nice.
> They have
> > a similar
> > intuition to skip lists, but better cache behavior
> during
> > reads.
> >
> > - Nathan
> >
> >
> > On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> > > I might provide such an implementation since I've
> been
> > grumblingly
> > > using the java.util versions. Is there a strong
> argument
> > for any
> > > particular algorithm? (AA / 2-3 vs. AVL vs. RB /
> 2-3-4 vs.
> > scapegoat
> > > vs. skip list vs. splay vs. whatever?) Or,
> alternatively,
> > > mutable.SortedSet could wrap a Java equivalent.
> > >
> > > --Rex
> > >
> > > On Thu, May 27, 2010 at 1:24 PM, martin odersky
> > > wrote:
> > >
> > >
> > > On Tue, May 25, 2010 at 10:35 PM, Eric
> Leman
> > > wrote:
> > >
> > > Ok, so the reason that they are
> not there
> > is more
> > > pragmatic (effort to implement
> them)
> > > rather than "philosophical", is it
> something
> > that is
> > > likely to exist in a future
> version
> > > of the collection API ?
> > >
> > > If you provide such an implementation,
> it's
> > appreciated. I
> > > don't see anyone else working on this
> right now.
> > >
> > > Cheers
> > >
> > > -- Martin
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>

Fri, 2010-05-28, 23:17

#14

Eric Leman

Joined: 2010-05-25,

Re: Re: scala.collection.mutable.SortedSet

This community rocks !!!
I was afraid to get a response like there isn't a mutable sorted set because you don't need one !!!

Now, I'm just a bit ashamed of not volunteering to write it
... and only having the lame excuse of ... not having the time ! ;)

On Fri, May 28, 2010 at 5:50 PM, Nathan Bronson <nbronson@cs.stanford.edu> wrote:

I knew my suggestion was dangerously close to volunteering! There are a
couple of reasonable approaches, I'll code up the core of some of them
over the weekend.

T-trees seem quite interesting. Would storing the keys and values in an
array put them outside the scope of specialization?

- Nathan

On Fri, 2010-05-28 at 16:56 -0400, Rex Kerr wrote:
> If you can manage to unify the mutable and immutable trees in a safe
> way, that sounds good--beyond the scope of what I was considering
> doing! If you're planning on doing this, please let me know; no
> reason to duplicate efforts. (We don't need two mutable sorted set
> implementations!)
>
> After a little reading, I think that T-trees would provide the most
> usefully distinct capability on top of the immutable trees we already
> have. (A T-tree is like a B-tree, except it has only one child off
> each end of its data array, not a child between every pair of
> elements.)
>
> The advantage of T-trees is that they are more memory efficient when
> implemented mutably; a typical RB tree requires an overhead of one
> object plus three pointers and one color per element in the tree. A
> T-tree of maximum width N requires only (on average) 8/3N of an object
> plus 4/3 + 16/3N pointers plus 8/3N ints per element in the tree--but
> then you have to sort a flat array of length N. Fortunately, sorted
> insertion/deletion in an array of length 8 or less is pretty fast (~=
> object creation time), so if we set N=8 we have overhead of 1/3 of an
> object, 2 pointers, and 1/3 of an int per element, without a large
> expected time penalty. One can reduce RB trees--through carefully
> recursive algorithms--to requiring only one object, two pointers, and
> one color, but this still is a heaver load on the GC and significantly
> more space.
>
> So if I were to write a mutable SortedSet, I would probably implement
> it with a T-tree (with T's balanced using an AA tree).
>
> Now the question is whether I should.
>
> --Rex
>
> On Thu, May 27, 2010 at 10:36 PM, Nathan Bronson
> <nbronson@cs.stanford.edu> wrote:
> In my experience (RB, AVL and treap) the bulk of the code goes
> into all
> of the various ways you can query the tree (get, first, last,
> range), so
> there may not be that much difference between the binary
> search trees in
> complexity. You are right that this stuff gets more
> complicated with
> the trees that have a non-constant branching factor.
>
> I think that splay trees are a non-starter, because they have
> the
> surprising property that concurrent _readers_ must use
> locking, even if
> there are no writers. This is not the way that any of the
> existing
> mutable collections in Java or Scala work (see Java's
> SimpleDateFormat
> for the resulting problems).
>
> One intriguing possibility is to try to unify the immutable
> and mutable
> trees, similarly to Clojure's transients. For a past project
> I
> implemented immutable and mutable red-black trees that shared
> the same
> node implementation. The immutable map had a method named
> mutableClone
> that returned a mutable map, and the mutable map had both
> clone and
> immutableClone, both of which used a generation number to
> perform
> copy-on-write. The last piece was a visitDelta function that
> used
> reference equality to skip portions of two trees that were
> identical.
>
> - Nathan
>
>
> On Thu, 2010-05-27 at 21:01 -0400, Rex Kerr wrote:
> > That does look potentially promising, but I'd value
> simplicity of
> > implementation also, and dense skip trees are not obviously
> easy to
> > code, nor do they have much written about them since they're
> new. For
> > simplicity, I'd tend towards AA trees due to the small
> number of cases
> > one needs to consider.
> >
> > On the other hand, one has to evaluate why one would be
> using a
> > mutable rather than an immutable sorted set. The primary
> argument
> > would be if one was doing a large number of insert/deletes
> compared to
> > the number of seeks. My (untested) understanding is that RB
> trees are
> > less balanced than AA or AVL trees, and thus that they
> should be a bit
> > faster on insert/delete. I am not sufficiently familiar
> with skip
> > lists or scapegoat trees (or skip trees or dense skip trees)
> to know
> > whether either of them might be preferable. I would argue
> that if
> > there are lots of inserts/deletes, the splay tree feature of
> moving
> > recently accessed items to the top is less likely to be a
> win.
> >
> > --Rex
> >
> > On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson
> > <nbronson@cs.stanford.edu> wrote:
> > Michael Spiegel's Dense Skip Trees look quite nice.
> They have
> > a similar
> > intuition to skip lists, but better cache behavior
> during
> > reads.
> >
> > - Nathan
> >
> >
> > On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> > > I might provide such an implementation since I've
> been
> > grumblingly
> > > using the java.util versions. Is there a strong
> argument
> > for any
> > > particular algorithm? (AA / 2-3 vs. AVL vs. RB /
> 2-3-4 vs.
> > scapegoat
> > > vs. skip list vs. splay vs. whatever?) Or,
> alternatively,
> > > mutable.SortedSet could wrap a Java equivalent.
> > >
> > > --Rex
> > >
> > > On Thu, May 27, 2010 at 1:24 PM, martin odersky
> > > <martin.odersky@epfl.ch> wrote:
> > >
> > >
> > > On Tue, May 25, 2010 at 10:35 PM, Eric
> Leman
> > > <tarlaxz@gmail.com> wrote:
> > >
> > > Ok, so the reason that they are
> not there
> > is more
> > > pragmatic (effort to implement
> them)
> > > rather than "philosophical", is it
> something
> > that is
> > > likely to exist in a future
> version
> > > of the collection API ?
> > >
> > > If you provide such an implementation,
> it's
> > appreciated. I
> > > don't see anyone else working on this
> right now.
> > >
> > > Cheers
> > >
> > > -- Martin
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>

Mon, 2010-05-31, 07:57

#15

pkolaczk

Joined: 2010-01-14,

Re: Re: scala.collection.mutable.SortedSet

W dniu 2010-05-29 00:08, Eric Leman pisze:
>
> This community rocks !!!
> I was afraid to get a response like there isn't a mutable sorted set
> because you don't need one !!!
>
> Now, I'm just a bit ashamed of not volunteering to write it
> ... and only having the lame excuse of ... not having the time ! ;)

By the way, because there is no mutable.SortedSet / SortedMap, I cannot
easily build a sorted MultiSet/MultiMap. The MultiMap trait requires to
be mixed into mutable classes and we don't have immutable.MultiMap.
Would it be possible in future releases of Scala to add an
immutable.MultiMap working in a similar manner as mutable.MultiMap?
I really, really need one (together with mutable.SortedSet and
SortedMap). :)

Regards,
Piotr Kołaczkowski

Mon, 2010-05-31, 18:57

#16

nbronson

Joined: 2010-03-12,

Re: Re: scala.collection.mutable.SortedSet

Rex,

I spend some time on a mutable T-tree, but was unable to get its
performance to match a binary tree. The problem seems to be that
navigating the tree requires 2 comparisons at each node (to check the
lower and the upper keys present), rather than the 1 for a normal tree.
Since the branching factor of the T-tree is still 2 (unlike a B-tree),
the height of the tree is not that much smaller than the equivalent
binary tree (unless I make the nodes very fat).

I'm still working on it.

- Nathan

On Fri, 2010-05-28 at 16:56 -0400, Rex Kerr wrote:
> If you can manage to unify the mutable and immutable trees in a safe
> way, that sounds good--beyond the scope of what I was considering
> doing! If you're planning on doing this, please let me know; no
> reason to duplicate efforts. (We don't need two mutable sorted set
> implementations!)
>
> After a little reading, I think that T-trees would provide the most
> usefully distinct capability on top of the immutable trees we already
> have. (A T-tree is like a B-tree, except it has only one child off
> each end of its data array, not a child between every pair of
> elements.)
>
> The advantage of T-trees is that they are more memory efficient when
> implemented mutably; a typical RB tree requires an overhead of one
> object plus three pointers and one color per element in the tree. A
> T-tree of maximum width N requires only (on average) 8/3N of an object
> plus 4/3 + 16/3N pointers plus 8/3N ints per element in the tree--but
> then you have to sort a flat array of length N. Fortunately, sorted
> insertion/deletion in an array of length 8 or less is pretty fast (~=
> object creation time), so if we set N=8 we have overhead of 1/3 of an
> object, 2 pointers, and 1/3 of an int per element, without a large
> expected time penalty. One can reduce RB trees--through carefully
> recursive algorithms--to requiring only one object, two pointers, and
> one color, but this still is a heaver load on the GC and significantly
> more space.
>
> So if I were to write a mutable SortedSet, I would probably implement
> it with a T-tree (with T's balanced using an AA tree).
>
> Now the question is whether I should.
>
> --Rex
>
> On Thu, May 27, 2010 at 10:36 PM, Nathan Bronson
> wrote:
> In my experience (RB, AVL and treap) the bulk of the code goes
> into all
> of the various ways you can query the tree (get, first, last,
> range), so
> there may not be that much difference between the binary
> search trees in
> complexity. You are right that this stuff gets more
> complicated with
> the trees that have a non-constant branching factor.
>
> I think that splay trees are a non-starter, because they have
> the
> surprising property that concurrent _readers_ must use
> locking, even if
> there are no writers. This is not the way that any of the
> existing
> mutable collections in Java or Scala work (see Java's
> SimpleDateFormat
> for the resulting problems).
>
> One intriguing possibility is to try to unify the immutable
> and mutable
> trees, similarly to Clojure's transients. For a past project
> I
> implemented immutable and mutable red-black trees that shared
> the same
> node implementation. The immutable map had a method named
> mutableClone
> that returned a mutable map, and the mutable map had both
> clone and
> immutableClone, both of which used a generation number to
> perform
> copy-on-write. The last piece was a visitDelta function that
> used
> reference equality to skip portions of two trees that were
> identical.
>
> - Nathan
>
>
> On Thu, 2010-05-27 at 21:01 -0400, Rex Kerr wrote:
> > That does look potentially promising, but I'd value
> simplicity of
> > implementation also, and dense skip trees are not obviously
> easy to
> > code, nor do they have much written about them since they're
> new. For
> > simplicity, I'd tend towards AA trees due to the small
> number of cases
> > one needs to consider.
> >
> > On the other hand, one has to evaluate why one would be
> using a
> > mutable rather than an immutable sorted set. The primary
> argument
> > would be if one was doing a large number of insert/deletes
> compared to
> > the number of seeks. My (untested) understanding is that RB
> trees are
> > less balanced than AA or AVL trees, and thus that they
> should be a bit
> > faster on insert/delete. I am not sufficiently familiar
> with skip
> > lists or scapegoat trees (or skip trees or dense skip trees)
> to know
> > whether either of them might be preferable. I would argue
> that if
> > there are lots of inserts/deletes, the splay tree feature of
> moving
> > recently accessed items to the top is less likely to be a
> win.
> >
> > --Rex
> >
> > On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson
> > wrote:
> > Michael Spiegel's Dense Skip Trees look quite nice.
> They have
> > a similar
> > intuition to skip lists, but better cache behavior
> during
> > reads.
> >
> > - Nathan
> >
> >
> > On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> > > I might provide such an implementation since I've
> been
> > grumblingly
> > > using the java.util versions. Is there a strong
> argument
> > for any
> > > particular algorithm? (AA / 2-3 vs. AVL vs. RB /
> 2-3-4 vs.
> > scapegoat
> > > vs. skip list vs. splay vs. whatever?) Or,
> alternatively,
> > > mutable.SortedSet could wrap a Java equivalent.
> > >
> > > --Rex
> > >
> > > On Thu, May 27, 2010 at 1:24 PM, martin odersky
> > > wrote:
> > >
> > >
> > > On Tue, May 25, 2010 at 10:35 PM, Eric
> Leman
> > > wrote:
> > >
> > > Ok, so the reason that they are
> not there
> > is more
> > > pragmatic (effort to implement
> them)
> > > rather than "philosophical", is it
> something
> > that is
> > > likely to exist in a future
> version
> > > of the collection API ?
> > >
> > > If you provide such an implementation,
> it's
> > appreciated. I
> > > don't see anyone else working on this
> right now.
> > >
> > > Cheers
> > >
> > > -- Martin
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>

Mon, 2010-05-31, 19:37

#17

ichoran

Joined: 2009-08-14,

Re: Re: scala.collection.mutable.SortedSet

Nathan,

Hm. What kind of benchmarks are you testing it on? I was assuming a heavy insert/delete usage, since otherwise one may as well use the immutable flavor.

In that case, in a binary tree one expects one compare 50% of the time ("less than") and two compares 50% of the time ("equals"--to check if you've hit the node you're looking for). In a T tree, one expects during traversal to compare 50% of the time ("less than") and to do two compares 50% of the time ("greater than"). Since these compares are on _different values_ it may be that the cache miss penalty is greater, and this would give up to 2x worse performance. However, in the final tail of the tree, you have binary search instead of tree-walking which has half the number of pointer dereferences (on average for about 2 1/2 branch points), which should be a win again. Thus, for binary trees of depth 5 or 6 (i.e. <50 nodes or so), I'd expect the two to be comparable in speed. For deeper trees, depending on the miss penalty, it could be up to twice as bad (although I wouldn't expect it to be more than 50%). On the other hand, I'd expect the ability to drop and add nodes locally without having to rebalance anything to significantly reduce the rebalance cost, and the fact that you need fewer object allocations to significantly reduce the GC cost, which should offset the slower lookup in add/delete-heavy tests.

But there's nothing like having working implementations to compare--so if you're seeing the T trees perform significantly worse across the board, then I guess that is the answer. I wonder if this is in part due to the JIT compiler having an easier time optimizing the binary tree code?

Anyway, if you would like me to try to optimize the performance of your T tree implementation, send me the code off list and I'll see what I can do. (I do quite a bit of performance testing / tuning, so I'm pretty well set up here to do it with as little effort as possible.) Otherwise, I guess the verdict is that T trees are not in practice so suitable for a generic implementation after all, and my apologies for the wild goose chase.

--Rex

On Mon, May 31, 2010 at 1:52 PM, Nathan Bronson <nbronson@cs.stanford.edu> wrote:

Rex,

I spend some time on a mutable T-tree, but was unable to get its
performance to match a binary tree. The problem seems to be that
navigating the tree requires 2 comparisons at each node (to check the
lower and the upper keys present), rather than the 1 for a normal tree.
Since the branching factor of the T-tree is still 2 (unlike a B-tree),
the height of the tree is not that much smaller than the equivalent
binary tree (unless I make the nodes very fat).

I'm still working on it.

- Nathan

On Fri, 2010-05-28 at 16:56 -0400, Rex Kerr wrote:
> If you can manage to unify the mutable and immutable trees in a safe
> way, that sounds good--beyond the scope of what I was considering
> doing! If you're planning on doing this, please let me know; no
> reason to duplicate efforts. (We don't need two mutable sorted set
> implementations!)
>
> After a little reading, I think that T-trees would provide the most
> usefully distinct capability on top of the immutable trees we already
> have. (A T-tree is like a B-tree, except it has only one child off
> each end of its data array, not a child between every pair of
> elements.)
>
> The advantage of T-trees is that they are more memory efficient when
> implemented mutably; a typical RB tree requires an overhead of one
> object plus three pointers and one color per element in the tree. A
> T-tree of maximum width N requires only (on average) 8/3N of an object
> plus 4/3 + 16/3N pointers plus 8/3N ints per element in the tree--but
> then you have to sort a flat array of length N. Fortunately, sorted
> insertion/deletion in an array of length 8 or less is pretty fast (~=
> object creation time), so if we set N=8 we have overhead of 1/3 of an
> object, 2 pointers, and 1/3 of an int per element, without a large
> expected time penalty. One can reduce RB trees--through carefully
> recursive algorithms--to requiring only one object, two pointers, and
> one color, but this still is a heaver load on the GC and significantly
> more space.
>
> So if I were to write a mutable SortedSet, I would probably implement
> it with a T-tree (with T's balanced using an AA tree).
>
> Now the question is whether I should.
>
> --Rex
>
> On Thu, May 27, 2010 at 10:36 PM, Nathan Bronson
> <nbronson@cs.stanford.edu> wrote:
> In my experience (RB, AVL and treap) the bulk of the code goes
> into all
> of the various ways you can query the tree (get, first, last,
> range), so
> there may not be that much difference between the binary
> search trees in
> complexity. You are right that this stuff gets more
> complicated with
> the trees that have a non-constant branching factor.
>
> I think that splay trees are a non-starter, because they have
> the
> surprising property that concurrent _readers_ must use
> locking, even if
> there are no writers. This is not the way that any of the
> existing
> mutable collections in Java or Scala work (see Java's
> SimpleDateFormat
> for the resulting problems).
>
> One intriguing possibility is to try to unify the immutable
> and mutable
> trees, similarly to Clojure's transients. For a past project
> I
> implemented immutable and mutable red-black trees that shared
> the same
> node implementation. The immutable map had a method named
> mutableClone
> that returned a mutable map, and the mutable map had both
> clone and
> immutableClone, both of which used a generation number to
> perform
> copy-on-write. The last piece was a visitDelta function that
> used
> reference equality to skip portions of two trees that were
> identical.
>
> - Nathan
>
>
> On Thu, 2010-05-27 at 21:01 -0400, Rex Kerr wrote:
> > That does look potentially promising, but I'd value
> simplicity of
> > implementation also, and dense skip trees are not obviously
> easy to
> > code, nor do they have much written about them since they're
> new. For
> > simplicity, I'd tend towards AA trees due to the small
> number of cases
> > one needs to consider.
> >
> > On the other hand, one has to evaluate why one would be
> using a
> > mutable rather than an immutable sorted set. The primary
> argument
> > would be if one was doing a large number of insert/deletes
> compared to
> > the number of seeks. My (untested) understanding is that RB
> trees are
> > less balanced than AA or AVL trees, and thus that they
> should be a bit
> > faster on insert/delete. I am not sufficiently familiar
> with skip
> > lists or scapegoat trees (or skip trees or dense skip trees)
> to know
> > whether either of them might be preferable. I would argue
> that if
> > there are lots of inserts/deletes, the splay tree feature of
> moving
> > recently accessed items to the top is less likely to be a
> win.
> >
> > --Rex
> >
> > On Thu, May 27, 2010 at 3:00 PM, Nathan Bronson
> > <nbronson@cs.stanford.edu> wrote:
> > Michael Spiegel's Dense Skip Trees look quite nice.
> They have
> > a similar
> > intuition to skip lists, but better cache behavior
> during
> > reads.
> >
> > - Nathan
> >
> >
> > On Thu, 2010-05-27 at 14:12 -0400, Rex Kerr wrote:
> > > I might provide such an implementation since I've
> been
> > grumblingly
> > > using the java.util versions. Is there a strong
> argument
> > for any
> > > particular algorithm? (AA / 2-3 vs. AVL vs. RB /
> 2-3-4 vs.
> > scapegoat
> > > vs. skip list vs. splay vs. whatever?) Or,
> alternatively,
> > > mutable.SortedSet could wrap a Java equivalent.
> > >
> > > --Rex
> > >
> > > On Thu, May 27, 2010 at 1:24 PM, martin odersky
> > > <martin.odersky@epfl.ch> wrote:
> > >
> > >
> > > On Tue, May 25, 2010 at 10:35 PM, Eric
> Leman
> > > <tarlaxz@gmail.com> wrote:
> > >
> > > Ok, so the reason that they are
> not there
> > is more
> > > pragmatic (effort to implement
> them)
> > > rather than "philosophical", is it
> something
> > that is
> > > likely to exist in a future
> version
> > > of the collection API ?
> > >
> > > If you provide such an implementation,
> it's
> > appreciated. I
> > > don't see anyone else working on this
> right now.
> > >
> > > Cheers
> > >
> > > -- Martin
> > >
> > >
> > >
> >
> >
> >
> >
>
>
>
>

Thu, 2010-06-17, 12:37

#18

Matthew Pocock

Joined: 2010-02-04,

Re: Re: scala.collection.mutable.SortedSet

I've a style question that relates to this thread on data structures. I wrote an implementation of Tries (suffix trees), taken from Okasaki [1]. It looks something like this:

class Trie[K, V](value: Option[V], children: Trie.ChildMap[K, Trie[K, V]]) extends Map[Seq[K], V] {
type ChildMap[_, _] <: Map[_, _]

...
}

Is this a reasonable strategy for these kinds of data structures that piggyback upon simpler implementations? Very often, you would not want to fix upon a concrete map implementation for the children when developing the algorithms, but it is important to pick an appropriate one when you use the data structure.

For example, consider the case where you are indexing paths through a graph where each node is identified by a URI String - you'd want a trie to key over lists of node ID strings, where the child map was a trie keyed over lists of char in the strings, where that base map was optimised for a small number of unique, numeric keys derived from the chars. This is one of the things that ML-style functors do quite nicely. Are nested types the canonical way to do this in scala?

I had another slight glitch with this. Initially I had Trie extend Map. However, most of the map methods seemed to return Map, and I couldn't see how to convince them to instead return Trie. This was a show-stopper for implementing some of the recursive methods needed to maintain the trie. So I either had to implement my own management functions and then expose them through the map interface, or not extend Map but provide all the same method names so at least I get source-level compatibility. This was annoying.

Matthew

Fri, 2010-06-18, 22:27

#19

nbronson

Joined: 2010-03-12,

Re: Re: scala.collection.mutable.SortedSet

Do you need to parameterize on children's type, or just on its
implementation?

I can think of two reasonable strategies:
1) pass around a factory instance, possibly implicitly
2) let nodes be prototypes for new nodes

Factory instance:

class Trie[K,V](
value: Option[V], children: Map[K,Trie[K,V]])(
implicit childMapBuilder: () => Map[K,Trie[K,V]]) {
...
... {
val newChild = new Trie(v, childMapBuilder())
}
}

Prototype nodes:

abstract class Trie[K,V](
value: Option[V], children: Map[K,Trie[K,V]]) {
def newNode(v: Option[V]): Trie[K,V]
...
... {
val newChild = newNode(v)
}
}

class MyTrie[K,V](
v: Option[V]) extends Trie[K,V](v, new HashMap[K,V]) {
def newNode(v: Option[V]) = new MyTrie[K,V](v)
}

On Thu, 2010-06-17 at 12:29 +0100, Matthew Pocock wrote:
> I've a style question that relates to this thread on data structures.
> I wrote an implementation of Tries (suffix trees), taken from Okasaki
> [1]. It looks something like this:
>
> class Trie[K, V](value: Option[V], children: Trie.ChildMap[K, Trie[K,
> V]]) extends Map[Seq[K], V] {
> type ChildMap[_, _] <: Map[_, _]
>
> ...
> }
>
> Is this a reasonable strategy for these kinds of data structures that
> piggyback upon simpler implementations? Very often, you would not want
> to fix upon a concrete map implementation for the children when
> developing the algorithms, but it is important to pick an appropriate
> one when you use the data structure.
>
> For example, consider the case where you are indexing paths through a
> graph where each node is identified by a URI String - you'd want a
> trie to key over lists of node ID strings, where the child map was a
> trie keyed over lists of char in the strings, where that base map was
> optimised for a small number of unique, numeric keys derived from the
> chars. This is one of the things that ML-style functors do quite
> nicely. Are nested types the canonical way to do this in scala?
>
> I had another slight glitch with this. Initially I had Trie extend
> Map. However, most of the map methods seemed to return Map, and I
> couldn't see how to convince them to instead return Trie. This was a
> show-stopper for implementing some of the recursive methods needed to
> maintain the trie. So I either had to implement my own management
> functions and then expose them through the map interface, or not
> extend Map but provide all the same method names so at least I get
> source-level compatibility. This was annoying.
>
> Matthew

Sun, 2010-06-20, 17:37

#20

Matthew Pocock

Joined: 2010-02-04,

Re: Re: scala.collection.mutable.SortedSet

On 18 June 2010 22:17, Nathan Bronson <nbronson@cs.stanford.edu> wrote:

Do you need to parameterize on children's type, or just on its
implementation?

Thanks - good point. As far as the Trie implementation cares, it is a Map. No need for the trie to know any more than that, as long as it can make instances somehow.

On to the next issue - how to make +/-/... return a Trie, rather than Map

Matthew

Mon, 2010-06-21, 19:37

#21

Colin Bullock

Joined: 2009-01-23,

Re: Re: scala.collection.mutable.SortedSet

On to the next issue - how to make +/-/... return a Trie, rather than Map

You may want to have a look at the implementation for the new 2.8 collections, as they solve this specific problem.

- Colin

Mon, 2010-06-21, 20:57

#22

Matthew Pocock

Joined: 2010-02-04,

Re: Re: scala.collection.mutable.SortedSet

I got part way by extending MapLike, but the version I was compiling against (RC2) had + returning Map, not This. It did solve the problem with - though.

Matthew

On 21 June 2010 19:27, Colin Bullock <cmbullock@gmail.com> wrote:

On to the next issue - how to make +/-/... return a Trie, rather than Map

You may want to have a look at the implementation for the new 2.8 collections, as they solve this specific problem.

- Colin

Scala Main Menu

Scala Quick Links

Featured News

User login