- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
GroupedIterator with (start, end) grouping predicates?
Fri, 2012-01-20, 20:06
I want a GroupedIterator with an equivalent of grouped() that uses start
and end predicates to control each returned group. For example, given a
start predicate that looks for 'A' and an end predicate that looks for
'Z', iterating over the following list:
(1, 2, 3, A, 4, 5, Z, 6, A, 7, 8, 9, Z, A, Z, 10)
would produce in turn:
(A, 4, 5, Z), (A, 7, 8, 9, Z), (A, Z)
Is there an obvious way of doing this already that I'm missing? If not,
would extending GroupedIterator be the way to go?
The actual usage is iterating over sequences of lines from a file where
each block is delimited by start and end markers.
An interesting little project for the weekend :-)
Thanks,
Sat, 2012-01-21, 01:41
#2
Re: GroupedIterator with (start, end) grouping predicates?
On 20/01/2012 21:10, Alex Cruise wrote:
> Here's mine.
>
> http://paste.pocoo.org/show/537986/
Thanks that's a great help :-)
> It's come up 2-3 times in the past month, I humbly submit that something
> like it ought to be in the standard library.
I humbly second that :-)
Sat, 2012-01-21, 09:01
#3
Re: GroupedIterator with (start, end) grouping predicates?
On Fri, Jan 20, 2012 at 4:10 PM, Alex Cruise <alex@cluonflux.com> wrote:
Here's mine.
http://paste.pocoo.org/show/537986/
It's come up 2-3 times in the past month, I humbly submit that something like it ought to be in the standard library.
Indeed. Something like this is such a common request that I used it for my example of how to extend the collections library:
http://stackoverflow.com/questions/5410846
The "groupedWhile" method there could easily be converted to a start/end pair.
--Rex
P.S. Neither of these are iterators, though. Iterators are hard to do without caching an entire block, which is rather wasteful.
Sun, 2012-01-22, 21:11
#4
Re: GroupedIterator with (start, end) grouping predicates?
On 21/01/2012 07:59, Rex Kerr wrote:
> Indeed. Something like this is such a common request that I used it for my
> example of how to extend the collections library:
> http://stackoverflow.com/questions/5410846
>
> The "groupedWhile" method there could easily be converted to a start/end
> pair.
As well as giving me a template to work from this is a brilliant little
tutorial on how to hook into the collections library. I've read the
Scala Collections paper which is a good description of the design of the
collections library, but this is a really good template to start from.
I also particularly like the way you show how to broaden it to work on
arrays as well as sequences. I really think this should be added to the
official Scala documentation. Docs folks, please take note! :-)
> P.S. Neither of these are iterators, though. Iterators are hard to do
> without caching an entire block, which is rather wasteful.
That's fine, the use case is what I'd describe as a 'dirty parser',
something that just needs to pull a number of delimited sequences of
lines from a large-ish text file. The sequences are small and I need to
inspect them all anyway, so not being an iterator isn't an issue.
Thanks again, your example was a great help.
Fri, 2012-02-03, 00:01
#5
Re: GroupedIterator with (start, end) grouping predicates?
On 21/01/2012 07:59, Rex Kerr wrote:
> Indeed. Something like this is such a common request that I used it for my
> example of how to extend the collections library:
> http://stackoverflow.com/questions/5410846
>
> The "groupedWhile" method there could easily be converted to a start/end
> pair.
>
> --Rex
>
> P.S. Neither of these are iterators, though. Iterators are hard to do
> without caching an entire block, which is rather wasteful.
I've found your example very helpful, I have a follow-up question that's
more related to design than to the implementation. Your example code
uses iterators to step along the collection and split it up, is there
any particular reason why you used iterators rather than a recursive
head :: tail scan? I'm suspecting the reason might to be to do with the
implicits/CanBuildFrom mechanism, but that's just a guess.
I also have a requirement for sequence matching, i.e. scanning for
tokens "a b c" on a list of the form "a b a b a c a b c" would return
just the last 3 tokens from the list. That's more naturally expressed
with a recursive implementation, which isn't easily doable with an iterator.
Thanks,
Fri, 2012-02-03, 02:11
#6
Re: GroupedIterator with (start, end) grouping predicates?
I was going for maximum generality, which means Iterable (use iterators) or Traversable (use foreach), neither of which are conducive to recursion.
To do what you're asking for, you either need your (conceptual) cursor to maintain a history, or you need to use a more restricted collection type (e.g. list with recursion, or IndexedSeq with indexing).
If you are only concerned with matches that are length-invariant, you can create a ring buffer of the sequence and match it against your target iteratively. I don't think this is particularly less natural than the recursive solution, though the implementation is less functional in style.
--Rex
On Thu, Feb 2, 2012 at 4:38 PM, Alan Burlison <alan.burlison@gmail.com> wrote:
To do what you're asking for, you either need your (conceptual) cursor to maintain a history, or you need to use a more restricted collection type (e.g. list with recursion, or IndexedSeq with indexing).
If you are only concerned with matches that are length-invariant, you can create a ring buffer of the sequence and match it against your target iteratively. I don't think this is particularly less natural than the recursive solution, though the implementation is less functional in style.
--Rex
On Thu, Feb 2, 2012 at 4:38 PM, Alan Burlison <alan.burlison@gmail.com> wrote:
On 21/01/2012 07:59, Rex Kerr wrote:
Indeed. Something like this is such a common request that I used it for my
example of how to extend the collections library:
http://stackoverflow.com/questions/5410846
The "groupedWhile" method there could easily be converted to a start/end
pair.
--Rex
P.S. Neither of these are iterators, though. Iterators are hard to do
without caching an entire block, which is rather wasteful.
I've found your example very helpful, I have a follow-up question that's more related to design than to the implementation. Your example code uses iterators to step along the collection and split it up, is there any particular reason why you used iterators rather than a recursive head :: tail scan? I'm suspecting the reason might to be to do with the implicits/CanBuildFrom mechanism, but that's just a guess.
I also have a requirement for sequence matching, i.e. scanning for tokens "a b c" on a list of the form "a b a b a c a b c" would return just the last 3 tokens from the list. That's more naturally expressed with a recursive implementation, which isn't easily doable with an iterator.
Thanks,
Fri, 2012-02-03, 02:31
#7
Re: GroupedIterator with (start, end) grouping predicates?
On 03/02/2012 01:10, Rex Kerr wrote:
> I was going for maximum generality, which means Iterable (use iterators) or
> Traversable (use foreach), neither of which are conducive to recursion.
OK, that makes good sense, thanks.
> To do what you're asking for, you either need your (conceptual) cursor to
> maintain a history, or you need to use a more restricted collection type
> (e.g. list with recursion, or IndexedSeq with indexing).
That's as I vaguely suspected but I wasn't confident that I wasn't
missing something obvious. I'm starting with a List anyway, so the List
restriction for using recursion isn't an issue.
> If you are only concerned with matches that are length-invariant, you can
> create a ring buffer of the sequence and match it against your target
> iteratively. I don't think this is particularly less natural than the
> recursive solution, though the implementation is less functional in style.
The matches may be variable-length and there may be multiple matches.
Plus I want to wrap that up in a higher-level function that takes a list
of potential matches and tries them in priority order at each point
along the list, so a recursive solution seemed best, to allow
backtracking if a particular match gets so far along and then fails.
Thanks for the explanation, it's helped me get a better idea of the
'What to use and when' tradeoffs between Iterable, Traversable and List
recursion.
Fri, 2012-02-03, 04:11
#8
Re: GroupedIterator with (start, end) grouping predicates?
On Fri, Jan 20, 2012 at 19:10, Alex Cruise wrote:
> Here's mine.
>
> http://paste.pocoo.org/show/537986/
>
> It's come up 2-3 times in the past month, I humbly submit that something
> like it ought to be in the standard library.
I find it curious that no one seems to have suggested "splitBy" as the
name. This works a lot like split with a regex. Then again, split
"eats" the elements that match, so maybe splitBy would have a slightly
different semantic... :-/
Fri, 2012-02-03, 11:51
#9
Re: GroupedIterator with (start, end) grouping predicates?
On 03/02/2012 02:55, Daniel Sobral wrote:
> I find it curious that no one seems to have suggested "splitBy" as the
> name. This works a lot like split with a regex. Then again, split
> "eats" the elements that match, so maybe splitBy would have a slightly
> different semantic... :-/
Perl's split copes with the include/exclude match issue by including any
capture groups in the split regexp in the results, I'm a little puzzled
as to why Scala didn't do the same.
There's already a splitAt on List, which breaks the list into 2 parts
given an offset into the list, so 'split' or 'splitBy' seemed like they
might be confusing. I came up with the following names:
groupWhile - Collects into groups while a predicate is true
separatedBy - Splits into groups at a predicate
boundedBy - Splits into groups using (begin, end) predicates.
Elements outside the bounds are discarded.
nestedBoundedBy - As above but handles nesting.
The last three have optional boolean flags which specify if the
separator terms should be included or excluded from the returned groups.
I'm using these operators for what I've called "Rough parsing". Quite
often I have a need to extract segments from files for processing - the
current example is extracting command-line flag information from GROFF
documentation for the GNU utilities. It's difficult and time-consuming
to write a conventional parser for this sort of semi-structured data,
and even if I wrote one I'd be discarding most of the parsed information
as I only want a small subset of the file. In the case of GROFF
documentation, I start by reading the file into a list of lines, then
extracting the DESCRIPTION section:
desc = lines.boundedBy(_ == ".SH DESCRIPTION", _.startsWith(".SH "),
false, true)
then split out the section that contains the flag descriptions and in
turn split that into sections for each flag, where each of them always
starts with a .TP macro:
desc.head.boundedBy(_ == ".TP", _startsWith(".SH")
.head.separatedBy(_ == ".TP").foreach(sect => { ... })
Admittedly it's a crude approach with significant limitations and is
only appropriate for some types of data. However there are applications
such as the NROFF one where it's "good enough", and being able to carve
out the chunks of interest with only a couple of lines of code is rather
nice ;-)
Fri, 2012-02-03, 20:11
#10
Re: GroupedIterator with (start, end) grouping predicates?
On 03/02/2012 02:55, Daniel Sobral wrote:
That's a pretty good name.
While we're on the subject, here's a more general version that will map from input to output types, and supports continuation elements. :)
http://paste.pocoo.org/show/545330/
-0xe1a
I find it curious that no one seems to have suggested "splitBy" as the
name. This works a lot like split with a regex. Then again, split
"eats" the elements that match, so maybe splitBy would have a slightly
different semantic... :-/
That's a pretty good name.
While we're on the subject, here's a more general version that will map from input to output types, and supports continuation elements. :)
http://paste.pocoo.org/show/545330/
-0xe1a
http://paste.pocoo.org/show/537986/
It's come up 2-3 times in the past month, I humbly submit that something like it ought to be in the standard library.
-0xe1a
On Fri, Jan 20, 2012 at 9:53 AM, Alan Burlison <alan.burlison@gmail.com> wrote: