- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Bag/Multiset in scala and a touch of multimap
Sat, 2010-11-20, 03:15
Hi
I've tried to find out if there already exists a multiset/bag implementation in collections, but neither the scaladoc nor google can turn up anything useful for me.
I know it is easy to roll my own, but seems like something that should be support already; it has the MultiMap trait.
---
BTW: I'm also using multimap like:
var votes = new HashMap[Int,MSet[Vote]] with MultiMap[Int,Vote]
This seems a bit redundant (types appear twice); is there a better way?
And I'm currently using addBinding() to add stuff to the multimap; is this the only way?
Thanks a bunch,
:) Hein
Sun, 2010-11-21, 22:57
#2
Re: Bag/Multiset in scala and a touch of multimap
what is a "multiset"? sounds a bit self contradicting.
if you just need several equal elements in the same collection, just use
anything that doesn't have "set" in its name (list, array, vector, *buffer)
Am 21.11.2010 19:54, schrieb Hein Meling:
> Hi
>
> I've tried to find out if there already exists a multiset/bag implementation in collections, but neither the scaladoc nor google can turn up anything useful for me.
>
> I know it is easy to roll my own, but seems like something that should be support already; it has the MultiMap trait.
>
> ---
>
> BTW: I'm also using multimap like:
>
> var votes = new HashMap[Int,MSet[Vote]] with MultiMap[Int,Vote]
>
> This seems a bit redundant (types appear twice); is there a better way?
>
> And I'm currently using addBinding() to add stuff to the multimap; is this the only/recommended way? Anyone have a link to use case examples for multimap and multiset.
>
> Thanks a bunch,
>
> :) Hein
Sun, 2010-11-21, 23:17
#3
Re: Bag/Multiset in scala and a touch of multimap
On Sunday November 21 2010, HamsterofDeath wrote:
> what is a "multiset"? sounds a bit self contradicting.
> if you just need several equal elements in the same collection, just
> use anything that doesn't have "set" in its name (list, array,
> vector, *buffer)
Multisets are well defined. They associate with each element of their
domain a count. Adding an element of the domain N times (starting from
none) leaves the count associated with that element N. Set union takes
the maximum of each element's counts, set difference subtracts them
(bounding the counts below at zero, of course) and set intersection
takes the minimum of the counts of the elements in the intersected
sets.
You can find more details in the Wikipedia entry for Multiset [1].
Randall Schulz
[1]
Mon, 2010-11-22, 03:47
#4
Re: Bag/Multiset in scala and a touch of multimap
Thanks to Randall for clearing up this potential confusion.
A multiset can, amongst other things, be used to construct a histogram (keep a count of the number of times a given 'same' element has been added to the multiset.)
It is actually used quite often in some systems and is of course part of guava (previously google collections) and of course in the now outdated apache commons collections library.
Take a look here:
http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/commo...
---
Now, does anyone know the answer to my questions?
Thanks,
:) Hein
On Nov 21, 2010, at 13:49 , HamsterofDeath wrote:
> what is a "multiset"? sounds a bit self contradicting.
> if you just need several equal elements in the same collection, just use
> anything that doesn't have "set" in its name (list, array, vector, *buffer)
>
>
> Am 21.11.2010 19:54, schrieb Hein Meling:
>> Hi
>>
>> I've tried to find out if there already exists a multiset/bag implementation in collections, but neither the scaladoc nor google can turn up anything useful for me.
>>
>> I know it is easy to roll my own, but seems like something that should be support already; it has the MultiMap trait.
>>
>> ---
>>
>> BTW: I'm also using multimap like:
>>
>> var votes = new HashMap[Int,MSet[Vote]] with MultiMap[Int,Vote]
>>
>> This seems a bit redundant (types appear twice); is there a better way?
>>
>> And I'm currently using addBinding() to add stuff to the multimap; is this the only/recommended way? Anyone have a link to use case examples for multimap and multiset.
>>
>> Thanks a bunch,
>>
>> :) Hein
>
Mon, 2010-11-22, 13:07
#5
Re: Bag/Multiset in scala and a touch of multimap
Hello Hein,
Is there a reason that Map[X, Int] isn't sufficient for your needs? It
seems to cover the histogram need in any case, though it doesn't
provide the set-like operations, of course.
/David Christiansen
Sun, 2010-11-28, 03:47
#6
Re: Bag/Multiset in scala and a touch of multimap
Randall R Schulz wrote:
> On Sunday November 21 2010, HamsterofDeath wrote:
>> what is a "multiset"? sounds a bit self contradicting.
>> if you just need several equal elements in the same collection, just
>> use anything that doesn't have "set" in its name (list, array,
>> vector, *buffer)
> Multisets are well defined. They associate with each element of their
> domain a count. Adding an element of the domain N times (starting from
> none) leaves the count associated with that element N. Set union takes
> the maximum of each element's counts, set difference subtracts them
> (bounding the counts below at zero, of course) and set intersection
> takes the minimum of the counts of the elements in the intersected
> sets.
And just for completeness' sake: a Bag is very similar. A MultiSet is
generally interpreted as "Set + Counting", a Bag is interpreted as
"Array - Ordering". So, whereas a MultiSet contains an item and a
count, the Bag contains the item multiple times.
The two main differences are that a MultiSet supports Set operations
such as intersection, union and difference, a Bag supports Array
operations such as concatenation. (In particular, Set Union takes the
maximum of the element counts, but for Array Concatenation I would
expect the *sum*.) The other main difference is in interation: I would
expect a MultiSet to yield Tuples of (, ) and a Bag to
yield times.
So, basically the only difference between the two is that I would
expect
val ms = MultiSet('A', 'B', 'A')
ms.foreach(println _)
println(ms('A'))
println(ms('C'))
to print
('A', 2)
('B', 1)
2
0
whereas for the Bag I would expect
val b = Bag('A', 'B', 'A')
b.foreach(println _)
println(b('A'))
println(b('C'))
to print
'A'
'A'
'B'
true
false
(Or something like that.)
jwm
Sun, 2010-11-28, 04:27
#7
Re: Bag/Multiset in scala and a touch of multimap
David Christiansen wrote:
> Is there a reason that Map[X, Int] isn't sufficient for your needs? It
> seems to cover the histogram need in any case, though it doesn't
> provide the set-like operations, of course.
Probably for the same reason that we use *any* abstraction instead of
a hand-rolled concrete implementation.
Personally, what I like about MultiSets is that they make some
algorithms completely disappear. Like in the histogram case, for
example. Sure, folding over the input and constructing the Map is a
one-liner, but the cool thing about a MultiSet is that it is a
*zero-liner*, since the MultiSet *is* already the answer.
You simply pass the input to the MultiSet constructor and you're done!
It's the ultimate example of how choosing the right data structure can
massively simplify your algorithms. In this case, the algorithm using
the MultiSet is (at least for some not very mathemetically strict
definition of infinity) actually infinitely simpler!
Say, I want to compute a histogram of letters in a word. The most
straightforward way would be something like
word foldLeft(Map[Char, Int]())((m, c) => m updated(c, m.getOrElse(c, 0) + 1))
Although this one is much better IMHO:
word groupBy(identity) map { case (c, cs) => (c, cs.size) }
But behold!
MultiSet(word:_*)
Yep. That's it. That is the *entire* "algorithm" for computing
histograms.
jwm
PS: Of course, all the *work* is still there, in the constructor of
MultiSet, but *I* neither have to write nor (more importantly) read
and maintain that.
Sun, 2010-11-28, 07:17
#8
Re: Re: Bag/Multiset in scala and a touch of multimap
jwm, Thanks a lot for this!
So, given the "silence" of the list concerning my actual questions, I take it MultiSet is not supported in scala yet (or other means to support it easily). I guess I'll just roll my own then.
Best,
:) Hein
On Nov 27, 2010, at 19:17 , Jörg W Mittag wrote:
> David Christiansen wrote:
>> Is there a reason that Map[X, Int] isn't sufficient for your needs? It
>> seems to cover the histogram need in any case, though it doesn't
>> provide the set-like operations, of course.
>
> Probably for the same reason that we use *any* abstraction instead of
> a hand-rolled concrete implementation.
>
> Personally, what I like about MultiSets is that they make some
> algorithms completely disappear. Like in the histogram case, for
> example. Sure, folding over the input and constructing the Map is a
> one-liner, but the cool thing about a MultiSet is that it is a
> *zero-liner*, since the MultiSet *is* already the answer.
>
> You simply pass the input to the MultiSet constructor and you're done!
>
> It's the ultimate example of how choosing the right data structure can
> massively simplify your algorithms. In this case, the algorithm using
> the MultiSet is (at least for some not very mathemetically strict
> definition of infinity) actually infinitely simpler!
>
> Say, I want to compute a histogram of letters in a word. The most
> straightforward way would be something like
>
> word foldLeft(Map[Char, Int]())((m, c) => m updated(c, m.getOrElse(c, 0) + 1))
>
> Although this one is much better IMHO:
>
> word groupBy(identity) map { case (c, cs) => (c, cs.size) }
>
> But behold!
>
> MultiSet(word:_*)
>
> Yep. That's it. That is the *entire* "algorithm" for computing
> histograms.
>
> jwm
>
> PS: Of course, all the *work* is still there, in the constructor of
> MultiSet, but *I* neither have to write nor (more importantly) read
> and maintain that.
>









Hi
I've tried to find out if there already exists a multiset/bag implementation in collections, but neither the scaladoc nor google can turn up anything useful for me.
I know it is easy to roll my own, but seems like something that should be support already; it has the MultiMap trait.
---
BTW: I'm also using multimap like:
var votes = new HashMap[Int,MSet[Vote]] with MultiMap[Int,Vote]
This seems a bit redundant (types appear twice); is there a better way?
And I'm currently using addBinding() to add stuff to the multimap; is this the only/recommended way? Anyone have a link to use case examples for multimap and multiset.
Thanks a bunch,
:) Hein