- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Performance of immutable TreeMap/TreeSet implementations
Wed, 2011-12-21, 19:43
Hi all,
(please let me know if this is not the right list for discussing this)
Recently I wanted to use the collection.immutable.TreeMap implementation with some fairly large trees (10k - 200k elements). Unfortunately it turns out that only the basic lookup/add/delete operations perform well. All others are usually O(n), even simple operations like head.
There seem to be three root causes:
1. Most operations use the default implementations from IterableLike/TraversableLike, so many of these operations are defined in terms of 'iterator'. If you look at the definition of iterator in TreeSet/TreeMap you'll find that they use RedBlack#Tree.toStream, which constructs a fully realized stream of elements in the tree upon _creation_ (so it is an O(n) space/time operation even before starting any iteration).
2. The range related operations (range, from, to, until) construct a new tree by splitting the existing tree. Unfortunately, the size of the new tree is calculated by calling RedBlack#Tree.count. This is an O(n) operation (also see SI-4026).
3. The RedBlack implementation uses nested classes and is extended by TreeMap/TreeSet. That means that any node in the tree has a pointer to the root of the tree itself. This is an obvious source of space leaks, especially when using structural sharing. See for example SI-4084, although that particular issue may have been fixed already.
To fix the first issue I created a custom iterator for RedBlack trees (RedBlack#TreeIterator) that is constructed on O(n log n) time/space and is also much quicker at iteration. I've also added custom implementations for head/last/headOption/lastOption/init/tail. This can be found on the "trees-iterator" branch at http://github.com/erikrozendaal/scala. The diff is here: https://github.com/erikrozendaal/scala/compare/master...trees-iterator
Here's a quick overview of the performance difference of various operations. Take a large pinch of salt when looking at the numbers, since getting consistent benchmarks out of the JVM seems to be a black art. The benchmark code can be found at https://gist.github.com/1507127
Master build number is 2.10.0.rdev-4019-2011-12-20-gbba3b00.
TreeSet containing 10000 random ints.
- foreach: master: 2443 ops/sec, trees-iterator: same.
- head: master: 354 ops/sec, trees-iterator: 10M ops/sec
- last: master: 322 ops/sec, trees-iterator: 10M ops/sec
- tail: master: 41 ops/sec, trees-iterator: 180K ops/sec
- init: master: 89 ops/sec, trees-iterator: 138K ops/sec
- iterator: master: 360 ops/sec, trees-iterator: 1.8M ops/sec
- iterator.size: master: 41 ops/sec, trees-iterator: 1428 ops/sec
(Note that 'iterator.size' is much slower than the 'last' operation, which is almost as fast as 'head'. This is due to 'head' and 'iterator.size' using an iterator, while 'last' is defined in TraversableLike and uses the faster RedBlack#Tree.foreach operation directly. 'last' would be even faster if it didn't call 'head' first).
The quick fix for the second issue is to turn the count method in RedBlack#NonEmpty into a val. This speeds up range operations and also allows O(log n) versions of drop/take/dropRight/takeRight/slice/splitAt by using structural tree sharing. Also takeWhile/dropWhile/span can be implemented more efficiently. This change may effect serialization compatibility though (is this a problem?) and also increases the memory used per RedBlack node. See the trees-ranges branch (diff:https://github.com/erikrozendaal/scala/compare/trees-iterator...trees-ranges).
Compared to master and trees-iterator the performance changes are:
TreeSet containing 10000 random ints.
- drop(5): master: 33 ops/sec, trees-iterator: 113 ops/sec, trees-ranges: 270K ops/sec
- take(5): master: 381 ops/sec, trees-iterator: 1M ops/sec, trees-ranges: 1M ops/sec
- drop(5000): master: 39 ops/sec, trees-iterator: 225 ops/sec, trees-ranges: 326K ops/sec
- take(5000): master: 59 ops/sec, trees-iterator: 262 ops/sec, trees-ranges: 356K ops/sec
- dropWhile(_ < 0): master: 241 ops/sec, trees-iterator: 242 ops/sec, trees-ranges: 2822 ops/sec
- takeWhile(_ < 0): master: 61 ops/sec, trees-iterator: 265 ops/sec, trees-ranges: 2812 ops/sec
The only way I see to fix issue 3 is to changed the nested RedBlack classes into ordinary classes. This obviously breaks binary and source compatibility (TreeMap/TreeSet can no longer extend this class, but my suspicion is that RedBlack is not meant to be used outside of the scala library). But this does fix the space leak and brings back down the memory used by a RedBlack node. The code can be found in the trees-redblack branch and also includes some micro-optimizations for head/tail functionality. Seehttps://github.com/erikrozendaal/scala/compare/trees-ranges...trees-redblack. Also note that this breaks test/files/run/t2873.scala, since it uses RedBlack#Empty for testing for the absence of a compiler bug.
Finally, these patches are mainly about getting usable big-O performance out of TreeMap/TreeSet. A lot can still be done to increase efficiency (smarter use of the Ordering class to avoid additional comparisons, etc).
Any chance that these patches can become part of 2.10? What about 2.9 for at least the binary compatible changes?
Regards,
Erik
Wed, 2011-12-21, 21:01
#2
Re: Performance of immutable TreeMap/TreeSet implementations
On Wed, Dec 21, 2011 at 16:43, Erik Rozendaal wrote:
> 2. The range related operations (range, from, to, until) construct a new tree by splitting the existing tree. Unfortunately, the size of the new tree is calculated by calling RedBlack#Tree.count. This is an O(n) operation (also see SI-4026).
I decided to take a look at this, and the curious thing is that this
count is used in *only one place*. It is used only when removing an
element to see if an empty tree should be returned or not. All other
uses are simply to feed this parameter again, or could be replaced
with isEmpty.
So, since you are feeling all pro-active about this (which I commend!
:), I suggest trying to remove this accursed parameter might, perhaps,
not be hard to do. One can always provide an alternate constructor
that takes and discards this parameter for compatibility reasons.
Wed, 2011-12-21, 21:11
#3
Re: Performance of immutable TreeMap/TreeSet implementations
Using size to check != 0 or > 0 or >= 1 should be punished by a slap on the wrist.
On Wed, Dec 21, 2011 at 8:53 PM, Daniel Sobral <dcsobral@gmail.com> wrote:
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
On Wed, Dec 21, 2011 at 8:53 PM, Daniel Sobral <dcsobral@gmail.com> wrote:
On Wed, Dec 21, 2011 at 16:43, Erik Rozendaal <erik@deler.org> wrote:
> 2. The range related operations (range, from, to, until) construct a new tree by splitting the existing tree. Unfortunately, the size of the new tree is calculated by calling RedBlack#Tree.count. This is an O(n) operation (also see SI-4026).
I decided to take a look at this, and the curious thing is that this
count is used in *only one place*. It is used only when removing an
element to see if an empty tree should be returned or not. All other
uses are simply to feed this parameter again, or could be replaced
with isEmpty.
So, since you are feeling all pro-active about this (which I commend!
:), I suggest trying to remove this accursed parameter might, perhaps,
not be hard to do. One can always provide an alternate constructor
that takes and discards this parameter for compatibility reasons.
--
Daniel C. Sobral
I travel to the future all the time.
--
Viktor Klang
Akka Tech LeadTypesafe - Enterprise-Grade Scala from the Experts
Twitter: @viktorklang
Wed, 2011-12-21, 21:21
#4
Re: Performance of immutable TreeMap/TreeSet implementations
2011/12/21 √iktor Ҡlang :
> Using size to check != 0 or > 0 or >= 1 should be punished by a slap on the
> wrist.
Here's a simple grep:
dcs@ayanami:~/github/scala/src/library/scala/collection/immutable
(subrange)$ grep size TreeSet.scala
class TreeSet[A](override val size: Int, t: RedBlack[A]#Tree[Unit])
protected val tree: RedBlack[A]#Tree[Unit] = if (size == 0) Empty else t
val newsize = if (tree.lookup(elem).isEmpty) size + 1 else size
newSet(newsize, tree.update(elem, ()))
newSet(size + 1, tree.update(elem, ()))
else newSet(size - 1, tree delete elem)
dcs@ayanami:~/github/scala/src/library/scala/collection/immutable
(subrange)$ grep size TreeMap.scala
class TreeMap[A, +B](override val size: Int, t:
RedBlack[A]#Tree[B])(implicit val ordering: Ordering[A])
protected val tree: RedBlack[A]#Tree[B] = if (size == 0) Empty else t
val newsize = if (tree.lookup(key).isEmpty) size + 1 else size
TreeMap.make(newsize, tree.update(key, value))
TreeMap.make(size + 1, tree.update(key, value))
else if (size == 1) empty
else TreeMap.make(size - 1, tree.delete(key))
There's one size == 0, and one size == 1. All others are just
record-keeping. Now, this gives TreeSet and TreeMap an O(1)
size/length method, but turns range and all methods derived from it
into O(n). Removing size from constructor turns it into O(n), and
makes range O(log n). Putting a count on the immutable RedBlack
increases its size, keeps size/length O(1) and turns range into O(log
n).
As for size increase, this is what a Red tree looks like:
private final java.lang.Object key;
private final java.lang.Object value;
private final scala.collection.immutable.RedBlack$Tree left;
private final scala.collection.immutable.RedBlack$Tree right;
So, given the 4-bytes used by every Object and the 8-bytes increment
of atual size allocated, there would not be any increment memory
consumption if size was added to RedBlack.
Which doesn't make that size == 0 any less punishable. :-)
Wed, 2011-12-21, 22:01
#5
Re: Performance of immutable TreeMap/TreeSet implementations
Ticket SI-5331 was just opened. SI-4026 is related.
The most horrifying part is probably abusing type-erasure to ensure we need only one instance to represent an Empty RedBlack tree, without making the first type-parameter covariant (impossible due to the use of Ordering which has an invariant type parameter): https://github.com/erikrozendaal/scala/blob/trees-redblack/src/library/scala/collection/immutable/RedBlack.scala#L264
Besides that there are still some things that could be fixed:- make the RedBlack implementation private to the Scala library (so we can keep improving the implementation), make sure it doesn't leak into the TreeMap/TreeSet API (why is this https://github.com/erikrozendaal/scala/blob/trees-redblack/src/library/scala/collection/immutable/TreeMap.scala#L64 protected instead of private?)- Like Daniel mentioned, remove the need for the size field in TreeMap/TreeSet and the unnecessary tests, now that RedBlack.Tree contains a count.- Efficiency optimizations and see if we can match/beat java.util.TreeMap/TreeSet :)
Regards,Erik
On 21 dec. 2011, at 20:06, Paul Phillips wrote:
The most horrifying part is probably abusing type-erasure to ensure we need only one instance to represent an Empty RedBlack tree, without making the first type-parameter covariant (impossible due to the use of Ordering which has an invariant type parameter): https://github.com/erikrozendaal/scala/blob/trees-redblack/src/library/scala/collection/immutable/RedBlack.scala#L264
Besides that there are still some things that could be fixed:- make the RedBlack implementation private to the Scala library (so we can keep improving the implementation), make sure it doesn't leak into the TreeMap/TreeSet API (why is this https://github.com/erikrozendaal/scala/blob/trees-redblack/src/library/scala/collection/immutable/TreeMap.scala#L64 protected instead of private?)- Like Daniel mentioned, remove the need for the size field in TreeMap/TreeSet and the unnecessary tests, now that RedBlack.Tree contains a count.- Efficiency optimizations and see if we can match/beat java.util.TreeMap/TreeSet :)
Regards,Erik
On 21 dec. 2011, at 20:06, Paul Phillips wrote:
On Wed, Dec 21, 2011 at 10:43 AM, Erik Rozendaal <erik@deler.org> wrote:(please let me know if this is not the right list for discussing this)
Library issues in general should probably go somewhere else, but we'll let it slide this time, because those are awful things you found. I have to look more closely at the patches, but assuming there's nothing too horrifying they can definitely be in 2.10. For 2.9 I have to wait-and-see. (I wonder if we should have a 2.9.x-binary-incompatible series, because otherwise the binary compatibility mandate scoops up a lot of baby as the patch bathwater goes by.) Can you please open a ticket if you haven't already.
Thu, 2011-12-22, 15:11
#6
Re: Performance of immutable TreeMap/TreeSet implementations
I hit most of these issues a while back - there should be a thread about it on -users. I started tryingto put together patches, but the inheritance/mixin hierarchy is so tangled that it's really hard to follow what code is *actually* invoked without stepping through with a debugger, which is very time-consuming. Sorry, I probably should have persevered.
Have we got any closer to having a benchmarking harness for common collection operations, ideally with graphs/histograms and regression lines for estimating the observed complexity? Most of the performance issues I found where things like finding the lowest element of an ordered set not being constant time, and some code-paths being stupidly slow because they chain into size() that was O(n). The other thing that put me off making patches was the lack of any obvious test harness for collections that I could use to make sure that I hadn't obviously broken anything in the process, or inadvertently turned an O(log(n)) into an O(n^2) because the better version chains into some other badly behaved code.
Sorry if this sounds like a laundry list of griping, but perception of the quality of a language often stands or falls on the efficiency of the default collections library. While the scala one is good, having an O(n) lowest op on a sorted set is not a Good Thing.
Matthew
On 21 December 2011 20:59, Erik Rozendaal <erik@deler.org> wrote:
--
Dr Matthew PocockIntegrative Bioinformatics Group, School of Computing Science, Newcastle University mailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.ukirc.freenode.net: drdozerskype: matthew.pocock tel: (0191) 2566550mob: +447535664143
Have we got any closer to having a benchmarking harness for common collection operations, ideally with graphs/histograms and regression lines for estimating the observed complexity? Most of the performance issues I found where things like finding the lowest element of an ordered set not being constant time, and some code-paths being stupidly slow because they chain into size() that was O(n). The other thing that put me off making patches was the lack of any obvious test harness for collections that I could use to make sure that I hadn't obviously broken anything in the process, or inadvertently turned an O(log(n)) into an O(n^2) because the better version chains into some other badly behaved code.
Sorry if this sounds like a laundry list of griping, but perception of the quality of a language often stands or falls on the efficiency of the default collections library. While the scala one is good, having an O(n) lowest op on a sorted set is not a Good Thing.
Matthew
On 21 December 2011 20:59, Erik Rozendaal <erik@deler.org> wrote:
Ticket SI-5331 was just opened. SI-4026 is related.
The most horrifying part is probably abusing type-erasure to ensure we need only one instance to represent an Empty RedBlack tree, without making the first type-parameter covariant (impossible due to the use of Ordering which has an invariant type parameter): https://github.com/erikrozendaal/scala/blob/trees-redblack/src/library/scala/collection/immutable/RedBlack.scala#L264
Besides that there are still some things that could be fixed:- make the RedBlack implementation private to the Scala library (so we can keep improving the implementation), make sure it doesn't leak into the TreeMap/TreeSet API (why is this https://github.com/erikrozendaal/scala/blob/trees-redblack/src/library/scala/collection/immutable/TreeMap.scala#L64 protected instead of private?) - Like Daniel mentioned, remove the need for the size field in TreeMap/TreeSet and the unnecessary tests, now that RedBlack.Tree contains a count.- Efficiency optimizations and see if we can match/beat java.util.TreeMap/TreeSet :)
Regards,Erik
On 21 dec. 2011, at 20:06, Paul Phillips wrote:
On Wed, Dec 21, 2011 at 10:43 AM, Erik Rozendaal <erik@deler.org> wrote:
(please let me know if this is not the right list for discussing this)
Library issues in general should probably go somewhere else, but we'll let it slide this time, because those are awful things you found. I have to look more closely at the patches, but assuming there's nothing too horrifying they can definitely be in 2.10. For 2.9 I have to wait-and-see. (I wonder if we should have a 2.9.x-binary-incompatible series, because otherwise the binary compatibility mandate scoops up a lot of baby as the patch bathwater goes by.) Can you please open a ticket if you haven't already.
--
Dr Matthew PocockIntegrative Bioinformatics Group, School of Computing Science, Newcastle University mailto: turingatemyhamster@gmail.comgchat: turingatemyhamster@gmail.com msn: matthew_pocock@yahoo.co.ukirc.freenode.net: drdozerskype: matthew.pocock tel: (0191) 2566550mob: +447535664143
Tue, 2012-01-03, 15:31
#7
Re: Performance of immutable TreeMap/TreeSet implementations
On Thu, Dec 22, 2011 at 6:01 AM, Matthew Pocock <turingatemyhamster@gmail.com> wrote:
Have we got any closer to having a benchmarking harness for common collection operations, ideally with graphs/histograms and regression lines for estimating the observed complexity? Most of the performance issues I found where things like finding the lowest element of an ordered set not being constant time, and some code-paths being stupidly slow because they chain into size() that was O(n). The other thing that put me off making patches was the lack of any obvious test harness for collections that I could use to make sure that I hadn't obviously broken anything in the process, or inadvertently turned an O(log(n)) into an O(n^2) because the better version chains into some other badly behaved code.
Sorry if this sounds like a laundry list of griping, but perception of the quality of a language often stands or falls on the efficiency of the default collections library. While the scala one is good, having an O(n) lowest op on a sorted set is not a Good Thing.
I can only observe that it would be hard to dream up anything which would be easier for some individual to implement independently. When I hear someone talking about it being difficult to contribute to scala (I hear that somewhat less lately, but anyway) I can't help but think that the interest in doing so isn't actually all that high, because we still haven't managed to eke this one out.
Nobody will contest the desirability of what you describe, but in addition to the fact that I can't do everything, I intentionally resist doing this one precisely because it would be so easy (easy in terms of low coordination requirements and the absence of high-inertia decisions, not necessarily in terms of time or effort) for someone who isn't me to do it, and I'm not in any danger of running out of important matters which lack that attribute.
On Wed, Dec 21, 2011 at 10:43 AM, Erik Rozendaal <erik@deler.org> wrote:
Library issues in general should probably go somewhere else, but we'll let it slide this time, because those are awful things you found. I have to look more closely at the patches, but assuming there's nothing too horrifying they can definitely be in 2.10. For 2.9 I have to wait-and-see. (I wonder if we should have a 2.9.x-binary-incompatible series, because otherwise the binary compatibility mandate scoops up a lot of baby as the patch bathwater goes by.) Can you please open a ticket if you haven't already.