capture groups in regexparsers
Thu, 2012-01-19, 14:32
Hi,
I'm writing a parser using combinator.RegexParsers, and some of my parsers are regular expressions. Most of the time I just want to grab the whole thing. However, in a few cases I want specific capture group(s). The code that promotes regular expressions to parsers is #56-70 of RegexParsers, starting with: "implicit def regex(r: Regex): Parser[String] = new Parser[String] {". Would it make sense to add variations to this that also accept a capture group index?
    implicit def regex(rg: (Regex, Int)): Parser[String] = new Parser[String] ...
    implicit def regex(rg: (Regex, (Int, Int))): Parser[String] = new Parser[String] ...

    """hi (\w+)""".r -> 1
    """(\w+)\s*=\s*(\w+)""".r -> (1,2)
Here's my implementation for the single-group case:
    implicit def regex(rg: (Regex, Int)): Parser[String] = new Parser[String] {
      def apply(in: Input) = {
        val (r, g) = rg
        val source = in.source
        val offset = in.offset
        val start = handleWhiteSpace(source, offset)
        (r findPrefixMatchOf (source.subSequence(start, source.length))) match {
          case Some(matched) =>
            Success(matched.group(g), in.drop(start + matched.end - offset))
          case None =>
            val found = if (start == source.length()) "end of source" else "`"+source.charAt(start)+"'"
            Failure("string matching regex `"+r+"' expected but "+found+" found", in.drop(start - offset))
        }
      }
    }
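For comparison, the same idea can be sketched without changing the library, as a small trait mixed into a parser. This is only an illustration under some assumptions: the names `GroupParsers`, `regexMatch`, `group` and `groups` are hypothetical and not part of RegexParsers, and on recent Scala versions the parser combinators come from the separate scala-parser-combinators module. It uses explicit helper methods rather than implicit conversions, partly because two overloads that both take a Tuple2 would have the same signature after erasure. The two-group variant returns a pair rather than a single String.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Hypothetical helper trait; none of these names exist in the library.
trait GroupParsers extends RegexParsers {

  // Matches `r` at the current position (after skipping whitespace) and
  // yields the whole Match, so callers can pick whichever groups they need.
  def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
    def apply(in: Input): ParseResult[Regex.Match] = {
      val source = in.source
      val offset = in.offset
      val start  = handleWhiteSpace(source, offset)
      r.findPrefixMatchOf(source.subSequence(start, source.length)) match {
        case Some(m) =>
          Success(m, in.drop(start + m.end - offset))
        case None =>
          val found =
            if (start == source.length) "end of source"
            else "`" + source.charAt(start) + "'"
          Failure("string matching regex `" + r + "' expected but " + found + " found",
                  in.drop(start - offset))
      }
    }
  }

  // Single capture group, by index.
  def group(r: Regex, g: Int): Parser[String] =
    regexMatch(r) ^^ (_.group(g))

  // Two capture groups, returned as a pair.
  def groups(r: Regex, g1: Int, g2: Int): Parser[(String, String)] =
    regexMatch(r) ^^ (m => (m.group(g1), m.group(g2)))
}

object Demo extends GroupParsers {
  val greeting: Parser[String] = group("""hi (\w+)""".r, 1)
  val assignment: Parser[(String, String)] = groups("""(\w+)\s*=\s*(\w+)""".r, 1, 2)
}
```

With this sketch, `Demo.parseAll(Demo.greeting, "hi world")` succeeds with the contents of group 1, and `Demo.parseAll(Demo.assignment, "x = 42")` succeeds with both groups as a tuple.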
Thanks,
Matthew
--
Dr Matthew Pocock
Integrative Bioinformatics Group, School of Computing Science, Newcastle University
mailto: turingatemyhamster@gmail.com
gchat: turingatemyhamster@gmail.com
msn: matthew_pocock@yahoo.co.uk
irc.freenode.net: drdozer
skype: matthew.pocock
tel: (0191) 2566550
mob: +447535664143
It's kind of a specialized need, since -- performance notwithstanding