
Re: Regular Expression Literals?

24 replies
Ricky Clarkson
Joined: 2008-12-19,
User offline. Last seen 3 years 2 weeks ago.

Let's [ab]use XML literals to make regexps more readable, then.

a
rgh
!
.r

2009/6/17 David Copeland :
> I think the advantage of regexp literals in PERL and Ruby is simply
> expedience and clarity.  Regexps can be hard enough to parse without
> extra noise surrounding them as required by the language.  Scala's
> triple-quotes go a REALLY long way, but it would be cool to be able to
>
> match someString {
>   case /$1\-800-(\d\d\d-\d\d\d\d)$/ => "Your call to " + $1 + " will work"
>   case /$1\-888-(\d\d\d-\d\d\d\d)$/ => "Your call to " + $1 + " won't
> work on this old phone"
>   case _ => throw new RuntimeExcpetion("String isn't a 1-800 style number")
> }
>
> On Wed, Jun 17, 2009 at 11:37 AM, Vlad Patryshev wrote:
>> I must confess that I used Perl's ability to insert snippets of code in
>> "regular expression" to process the data. However bad Perl is, it was
>> extremely convenient: one could write a page-long regexp that, with
>> comments, was pretty readable and maintainable.
>>
>> 2009/6/16 Gordon Tyler
>>>
>>> Ricky Clarkson wrote:
>>>>
>>>> I'd rather see parser combinators used more where people might
>>>> otherwise use regex.  That said, I haven't used them much.
>>>>
>>>> What would it take for parser combinators to replace regex?
>>>
>>> Far less boiler plate. Requiring the developer to create tokenizer and
>>> parser classes, etc. just to match a string is overkill.
>>>
>>> Ciao,
>>> Gordon
>>
>>
>>
>> --
>> Thanks,
>> -Vlad
>>
>

Robert Fischer
Joined: 2009-01-31,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?

You have a funny definition of "more readable": I'll take a+rgh!{10,50} over that XML argh! any day.

~~ Robert.

Ricky Clarkson wrote:
> Let's [ab]use XML literals to make regexps more readable, then.
>
>
> a
> rgh
> !
> .r
>
> 2009/6/17 David Copeland :
>> I think the advantage of regexp literals in PERL and Ruby is simply
>> expedience and clarity. Regexps can be hard enough to parse without
>> extra noise surrounding them as required by the language. Scala's
>> triple-quotes go a REALLY long way, but it would be cool to be able to
>>
>> match someString {
>> case /$1\-800-(\d\d\d-\d\d\d\d)$/ => "Your call to " + $1 + " will work"
>> case /$1\-888-(\d\d\d-\d\d\d\d)$/ => "Your call to " + $1 + " won't
>> work on this old phone"
>> case _ => throw new RuntimeExcpetion("String isn't a 1-800 style number")
>> }
>>
>> On Wed, Jun 17, 2009 at 11:37 AM, Vlad Patryshev wrote:
>>> I must confess that I used Perl's ability to insert snippets of code in
>>> "regular expression" to process the data. However bad Perl is, it was
>>> extremely convenient: one could write a page-long regexp that, with
>>> comments, was pretty readable and maintainable.
>>>
>>> 2009/6/16 Gordon Tyler
>>>> Ricky Clarkson wrote:
>>>>> I'd rather see parser combinators used more where people might
>>>>> otherwise use regex. That said, I haven't used them much.
>>>>>
>>>>> What would it take for parser combinators to replace regex?
>>>> Far less boiler plate. Requiring the developer to create tokenizer and
>>>> parser classes, etc. just to match a string is overkill.
>>>>
>>>> Ciao,
>>>> Gordon
>>>
>>>
>>> --
>>> Thanks,
>>> -Vlad
>>>
>

davetron5000
Joined: 2009-06-07,
User offline. Last seen 2 years 31 weeks ago.
Re: Regular Expression Literals?

Granted, I'm new to Scala, but it seems to me that the magic that allows

(1,2,3,4)

to be identical to

Tuple4[Int, Int, Int, Int](1,2,3,4)

indicates that there's precedent for actual or apparent literals.

I really like how List, Map, and Tuple "literals" are possible.

I guess my original point is that promoting regexps to this status
(either as baked-in literals or the above-exemplified magic) makes a
lot of types of code more concise and easier to deal with. I doubt
you'll find anyone who finds Java's regexp handling superior to Ruby's,
or who finds a regexp defined in XML easier to deal with than a
PERL-style literal.

And, further, if something like this is just Not The Scala Way, that's
cool, too, and my question becomes, what is the Scala Way to perform
simple parsing and pattern recognition?

Dave

On Wed, Jun 17, 2009 at 12:15 PM, Randall R Schulz wrote:
> On Wednesday June 17 2009, David Copeland wrote:
>> I think the advantage of regexp literals ...
>
> Shouldn't we be talking about how to generalize and make extensible the
> island grammar exemplified by Scala's XML literals.
>
> I doubt it's easy and I know neither how to do it nor how general it can
> be made, but just as we don't want every imaginable control structure
> to be defined by the language, I think we should look for ways to open
> the language to new concise forms of value notation.
>
>
> Randall Schulz
>

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?
val phone800 = """^1-800-(\d\d\d-\d\d\d\d)$""".r  // Was $1\- correct? That didn't make sense.
val phone888 = """^1-888-(\d\d\d-\d\d\d\d)$""".r

someString match {
  case phone800(number) => "Your call to " + number + " will work"
  case phone888(number) => "Your call to " + number + " won't work on this old phone"
  case _ => throw new RuntimeException("String isn't a 1-800 style number")
}

What kills me is that you can't define the regex on the fly and pattern match against it. If I could do the following, I'd be much happier:

someString match {
  case """^1-800-(\d\d\d-\d\d\d\d)$""".r(number) => "Your call to " + number + " will work"
  case """^1-888-(\d\d\d-\d\d\d\d)$""".r(number) => "Your call to " + number + " won't work on this old phone"
  case _ => throw new RuntimeException("String isn't a 1-800 style number")
}

Or any variant of it. Anything, as long as I didn't need to predefine as vals every single regexp pattern I wanted to use in a case match.

On Wed, Jun 17, 2009 at 1:34 PM, David Copeland <davetron5000@gmail.com> wrote:
Granted, I'm new to Scala, but it seems to me that the magic that allows

(1,2,3,4)

to be identical to

Tuple4[Int](1,2,3,4)

indicates that there's precedent for actual or apparent literals.

I really like how List, Map, and Tuple "literals" are possible.

I guess my original point is that promoting regexps to this status
(either as baked-in literals or the above-exemplified magic) makes a
lot of types of code more concise and easier to deal with.  I doubt
you'll find anyone who finds Java's regexp handling superior to Ruby's
or that a regexp defined in XML easy to deal with as opposed to a
PERL-style literal.

And, further, if something like this is just Not The Scala Way, that's
cool, too, and my question becomes, what is the Scala Way to perform
simple parsing and pattern recognition?

Dave

On Wed, Jun 17, 2009 at 12:15 PM, Randall R Schulz<rschulz@sonic.net> wrote:
> On Wednesday June 17 2009, David Copeland wrote:
>> I think the advantage of regexp literals ...
>
> Shouldn't we be talking about how to generalize and make extensible the
> island grammar exemplified by Scala's XML literals.
>
> I doubt it's easy and I know neither how to do it nor how general it can
> be made, but just as we don't want every imaginable control structure
> to be defined by the language, I think we should look for ways to open
> the language to new concise forms of value notation.
>
>
> Randall Schulz
>



--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
Ricky Clarkson
Joined: 2008-12-19,
User offline. Last seen 3 years 2 weeks ago.
Re: Regular Expression Literals?

I guess you didn't appreciate that I was joking.

2009/6/17 Robert Fischer :
> You have a funny definition of "more readable": I'll take a+rgh!{10,50} over that XML argh! any day.
>
> ~~ Robert.
>
> Ricky Clarkson wrote:
>> Let's [ab]use XML literals to make regexps more readable, then.
>>
>>
>>     a
>>     rgh
>>     !
>> .r
>>
>> 2009/6/17 David Copeland :
>>> I think the advantage of regexp literals in PERL and Ruby is simply
>>> expedience and clarity.  Regexps can be hard enough to parse without
>>> extra noise surrounding them as required by the language.  Scala's
>>> triple-quotes go a REALLY long way, but it would be cool to be able to
>>>
>>> match someString {
>>>   case /$1\-800-(\d\d\d-\d\d\d\d)$/ => "Your call to " + $1 + " will work"
>>>   case /$1\-888-(\d\d\d-\d\d\d\d)$/ => "Your call to " + $1 + " won't
>>> work on this old phone"
>>>   case _ => throw new RuntimeExcpetion("String isn't a 1-800 style number")
>>> }
>>>
>>> On Wed, Jun 17, 2009 at 11:37 AM, Vlad Patryshev wrote:
>>>> I must confess that I used Perl's ability to insert snippets of code in
>>>> "regular expression" to process the data. However bad Perl is, it was
>>>> extremely convenient: one could write a page-long regexp that, with
>>>> comments, was pretty readable and maintainable.
>>>>
>>>> 2009/6/16 Gordon Tyler
>>>>> Ricky Clarkson wrote:
>>>>>> I'd rather see parser combinators used more where people might
>>>>>> otherwise use regex.  That said, I haven't used them much.
>>>>>>
>>>>>> What would it take for parser combinators to replace regex?
>>>>> Far less boiler plate. Requiring the developer to create tokenizer and
>>>>> parser classes, etc. just to match a string is overkill.
>>>>>
>>>>> Ciao,
>>>>> Gordon
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> -Vlad
>>>>
>>
>
> --
> ~~ Robert Fischer, Smokejumper IT Consulting.
> Enfranchised Mind Blog http://EnfranchisedMind.com/blog
>
> Check out my book, "Grails Persistence with GORM and GSQL"!
> http://www.smokejumperit.com/redirect.html
>
>

michael.kebe
Joined: 2009-04-26,
User offline. Last seen 3 years 21 weeks ago.
Re: Regular Expression Literals?

Daniel Sobral wrote:
>
> What kills me is that you can't define the regex on the fly and pattern
> match against it. If I could do the following, I'd be much happier:
>
> someString match {
> case """^1-800-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to " +
> number + " will work"
> case """^1-888-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to "
> + number + " won't work on this old phone"
> case _ => throw new RuntimeExcpetion("String isn't a 1-800 style
> number")
> }
> Or any variant of it. Anything, as long as I didn't need to predefine as
> vals every single regexp pattern I wanted to use in a case match.
>
I totally agree with you. That would be really cool. I wonder if it is
possible with Scala today. Any hints from the enthusiasts?

Michael

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?

On Wed, Jun 17, 2009 at 6:36 PM, Michael Kebe <michael.kebe@gmail.com> wrote:
 

 
Daniel Sobral wrote:
>
> What kills me is that you can't define the regex on the fly and pattern
> match against it. If I could do the following, I'd be much happier:
>
>  someString match {
>   case """^1-800-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to " +
> number + " will work"
>   case """^1-888-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to "
> + number + " won't work on this old phone"
>   case _ => throw new RuntimeExcpetion("String isn't a 1-800 style
> number")
> }
> Or any variant of it. Anything, as long as I didn't need to predefine as
> vals every single regexp pattern I wanted to use in a case match.
>
I totally aggree with you. That would be really cool. I wonder if it is
possible with Scala today. Any hints from the enthusiasts?
 
 

 
I doubt it. Inside a case statement, Scala expects wildcards, literals or identifiers, optionally followed by a type annotation and a guard clause. The identifiers are treated as variables to be bound (beginning with lowercase), constants to be compared against (beginning with uppercase) or the application of unapply (identifier followed by parentheses).

 
So I don't see any way to do it.
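
To make the three identifier roles described above concrete, here is a minimal sketch. The names PatternKinds, Limit, Digits and classify are invented for illustration, and it assumes a reasonably recent Scala (2.10 or later).

```scala
import scala.util.matching.Regex

object PatternKinds {
  // Uppercase stable identifier: the scrutinee is compared against it with ==.
  val Limit = "100"

  // A Regex held in a val is a stable identifier too; following it with
  // parentheses in a pattern invokes its unapplySeq as an extractor.
  val Digits: Regex = """(\d+)""".r

  def classify(s: String): String = s match {
    case Limit     => "exactly the limit"        // constant pattern
    case Digits(n) => "number " + n              // extractor pattern
    case other     => "something else: " + other // lowercase identifier, binds the value
  }
}
```

An expression such as """(\d+)""".r(n) fits none of these three shapes, which is why the regex has to be given a name first.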
 

--
Daniel C. Sobral
 
Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
 

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?
On Wed, Jun 17, 2009 at 6:36 PM, Michael Kebe <michael.kebe@gmail.com> wrote:


Daniel Sobral wrote:
>
> What kills me is that you can't define the regex on the fly and pattern
> match against it. If I could do the following, I'd be much happier:
>
>  someString match {
>   case """^1-800-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to " +
> number + " will work"
>   case """^1-888-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to "
> + number + " won't work on this old phone"
>   case _ => throw new RuntimeExcpetion("String isn't a 1-800 style
> number")
> }
> Or any variant of it. Anything, as long as I didn't need to predefine as
> vals every single regexp pattern I wanted to use in a case match.
>
I totally aggree with you. That would be really cool. I wonder if it is
possible with Scala today. Any hints from the enthusiasts?


In fact, let me bitch a bit more. While there is no way to do a regexp match, with or without variable binding, inside a case statement, you _can_ do a match against XML.   Now, I don't want to belittle XML or anything, but getting that support in was way more complicated than regexp would be. And, as it happens, depending on the domain of your problems, your code might be all about XML, but it might also be all about regexp. Perl didn't get where it is because of the beauty of its syntax...
--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?
You have to go through the extra step of assigning the regular expression to a stable identifier:

val B = "([0-9]+)([a-z]+)".r

def m(in: String) = in match {
  case B(x, y) => println(x + ": " + y)
  case _ => println("no match")
}

m("123dogs")
m("hello world")

But once you do this, you get the ability to extract values that match the regular expression with the added benefit of only building the RegEx once rather than each time the pattern is matched.

On Wed, Jun 17, 2009 at 3:31 PM, Daniel Sobral <dcsobral@gmail.com> wrote:
On Wed, Jun 17, 2009 at 6:36 PM, Michael Kebe <michael.kebe@gmail.com> wrote:


Daniel Sobral wrote:
>
> What kills me is that you can't define the regex on the fly and pattern
> match against it. If I could do the following, I'd be much happier:
>
>  someString match {
>   case """^1-800-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to " +
> number + " will work"
>   case """^1-888-(\d\d\d-\d\d\d\d)$/""".r(number) => "Your call to "
> + number + " won't work on this old phone"
>   case _ => throw new RuntimeExcpetion("String isn't a 1-800 style
> number")
> }
> Or any variant of it. Anything, as long as I didn't need to predefine as
> vals every single regexp pattern I wanted to use in a case match.
>
I totally aggree with you. That would be really cool. I wonder if it is
possible with Scala today. Any hints from the enthusiasts?


In fact, let me bitch a bit more. While there is no way to do a regexp match, with or without variable binding, inside a case statement, you _can_ do a match against XML.   Now, I don't want to belittle XML or anything, but getting that support in was way more complicated than regexp would be. And, as it happens, depending on the domain of your problems, your code might be all about XML, but it might also be all about regexp. Perl didn't get where it is because of the beauty of its syntax...
--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.



--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp
Alex Boisvert
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?
For those wishing to match against regular expressions...

Welcome to Scala version 2.7.4.final (Java HotSpot(TM) Server VM, Java 1.6.0_10).
Type in expressions to have them evaluated.                                     
Type :help for more information.                                                

scala> import scala.util.matching.Regex
import scala.util.matching.Regex      

scala> class PimpMyRegex(s: String) {
     |   def regexMatch[R](cases: (String, R)*): Option[R] = {
     |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) -> c._2).
     |       dropWhile(_._1.isEmpty).firstOption.map(_._2)
     |   }
     |   def regexMatchGroups[R](cases: (String, PartialFunction[List[String], R])*): Option[R] = {
     |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) -> c._2).
     |       flatMap(t => if (t._1.isDefined && t._2.isDefinedAt(t._1.get)) List(t._2(t._1.get)) else Nil).firstOption
     |   }
     | }
defined class PimpMyRegex

scala> implicit def pimpString(s: String) = new PimpMyRegex(s)
pimpString: (String)PimpMyRegex

scala> "aaa" regexMatch (
     |   "a*" -> 1,
     |   "b*" -> 2,
     |   "c*" -> 3
     | )
res0: Option[Int] = Some(1)

scala> "zzz" regexMatch (
     |   "a*" -> 1,
     |   "b*" -> 2,
     |   "c*" -> 3
     | )
res1: Option[Int] = None

scala> case class YMD(year: Int, month: Int, day: Int)
defined class YMD

scala> { // parsing dates is fun
     |
     |   implicit def toInt(s: String) = s.toInt
     |
     |   "2008-06-17".regexMatchGroups[YMD] (
     |     """(\d\d\d\d)-(\d\d)-(\d\d)""" ->
     |       { case List(year, month, day) => YMD(year, month, day) },
     |     """(\d\d)-(\d\d)-(\d\d\d\d)""" ->
     |       { case List(day, month, year) => YMD(day, month, year) }
     |   ) getOrElse error("Invalid date")
     | }
res2: YMD = YMD(2008,6,17)
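
The session above relies on Scala 2.7 methods (projection, firstOption, Predef.error) that, as far as I know, later releases deprecated and then dropped. A rough sketch of the same idea against the 2.12/2.13 standard library might look like the following. RegexCases and RegexMatchOps are names made up here, and this is an approximation of the technique, not a port anyone in the thread has blessed.

```scala
import scala.util.matching.Regex

object RegexCases {
  // Adds the two methods from the session above to String.
  implicit class RegexMatchOps(private val s: String) {

    // Result paired with the first pattern that matches the whole string.
    def regexMatch[R](cases: (String, R)*): Option[R] =
      cases.collectFirst {
        case (pattern, result) if pattern.r.pattern.matcher(s).matches() => result
      }

    // Like regexMatch, but passes the captured groups to a partial function.
    // The iterator keeps evaluation lazy: later patterns are not tried
    // once an earlier one has produced a result.
    def regexMatchGroups[R](cases: (String, PartialFunction[List[String], R])*): Option[R] =
      cases.iterator
        .map { case (pattern, handler) => pattern.r.unapplySeq(s).collect(handler) }
        .collectFirst { case Some(result) => result }
  }
}
```

With import RegexCases._ in scope, "aaa".regexMatch("a*" -> 1, "b*" -> 2) can be called much as before. For regexMatchGroups it is safest to give each handler an explicit PartialFunction type, since a bare { case ... } block on the right of -> may not get its parameter type inferred by current compilers. Every pattern string is still recompiled on each call, so the cost David Pollak mentions remains.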

alex
Henry Ware
Joined: 2009-03-07,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?

Wow. I nominate this for Hack Of The Month.

On Wed, Jun 17, 2009 at 9:49 PM, Alex Boisvert wrote:
> For those wishing to match against regular expressions...
>
> Welcome to Scala version 2.7.4.final (Java HotSpot(TM) Server VM, Java
> 1.6.0_10).
> Type in expressions to have them
> evaluated.
> Type :help for more
> information.
>
> scala> import scala.util.matching.Regex
> import scala.util.matching.Regex
>
> scala> class PimpMyRegex(s: String) {
>      |   def regexMatch[R](cases: (String, R)*): Option[R] = {
>      |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) -> c._2).
>      |       dropWhile(_._1.isEmpty).firstOption.map(_._2)
>      |   }
>      |   def regexMatchGroups[R](cases: (String,
> PartialFunction[List[String], R])*): Option[R] = {
>      |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) -> c._2).
>      |       flatMap(t => if (t._1.isDefined && t._2.isDefinedAt(t._1.get))
> List(t._2(t._1.get)) else Nil).firstOption
>      |   }
>      | }
> defined class PimpMyRegex
>
> scala> implicit def pimpString(s: String) = new PimpMyRegex(s)
> pimpString: (String)PimpMyRegex
>
> scala> "aaa" regexMatch (
>      |   "a*" -> 1,
>      |   "b*" -> 2,
>      |   "c*" -> 3
>      | )
> res0: Option[Int] = Some(1)
>
> scala> "zzz" regexMatch (
>      |   "a*" -> 1,
>      |   "b*" -> 2,
>      |   "c*" -> 3
>      | )
> res1: Option[Int] = None
>
> scala> case class YMD(year: Int, month: Int, day: Int)
> defined class YMD
>
> scala> { // parsing dates is fun
>      |
>      |   implicit def toInt(s: String) = s.toInt
>      |
>      |   "2008-06-17".regexMatchGroups[YMD] (
>      |     """(\d\d\d\d)-(\d\d)-(\d\d)""" ->
>      |       { case List(year, month, day) => YMD(year, month, day) },
>      |     """(\d\d)-(\d\d)-(\d\d\d\d)""" ->
>      |       { case List(day, month, year) => YMD(day, month, year) }
>      |   ) getOrElse error("Invalid date")
>      | }
> res2: YMD = YMD(2008,6,17)
>
> alex
>

sadie
Joined: 2008-12-21,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?

Cool as that hack is, I'm putting in my vote for making regex literals a part
of the language. Every other version of regex matching I've seen is just too
messy.

"Perl didn't get where it is because of the beauty of its syntax."
- Daniel Sobral

QFT. For all its massive flaws, I've remained fond of Perl, especially when
quadruple-escaping Java regexes. I want to be able to write this, or
something like it:

number match {
case /^(\d{4,5})-(\d{6,8})$/ (area, local) => "Ring ring"
case _ => "Not a number!"
}

Why the brackets after the regex, rather than squeeze the names inside it?
Because one of the strengths of regex is that the syntax is approximately
the same everywhere you go. And as others have commented, the inside of a
regex is complicated enough already.

What would make this *really* neat is if the compiler noticed the fixed
regex and shifted it out to class scope for me, saving on the repeated
compilation.

Henry Ware-2 wrote:
>
> Wow. I nominate this for Hack Of The Month.
>
> On Wed, Jun 17, 2009 at 9:49 PM, Alex Boisvert
> wrote:
>> For those wishing to match against regular expressions...
>>
>> Welcome to Scala version 2.7.4.final (Java HotSpot(TM) Server VM, Java
>> 1.6.0_10).
>> Type in expressions to have them
>> evaluated.
>> Type :help for more
>> information.
>>
>> scala> import scala.util.matching.Regex
>> import scala.util.matching.Regex
>>
>> scala> class PimpMyRegex(s: String) {
>>      |   def regexMatch[R](cases: (String, R)*): Option[R] = {
>>      |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>> c._2).
>>      |       dropWhile(_._1.isEmpty).firstOption.map(_._2)
>>      |   }
>>      |   def regexMatchGroups[R](cases: (String,
>> PartialFunction[List[String], R])*): Option[R] = {
>>      |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>> c._2).
>>      |       flatMap(t => if (t._1.isDefined &&
>> t._2.isDefinedAt(t._1.get))
>> List(t._2(t._1.get)) else Nil).firstOption
>>      |   }
>>      | }
>> defined class PimpMyRegex
>>
>> scala> implicit def pimpString(s: String) = new PimpMyRegex(s)
>> pimpString: (String)PimpMyRegex
>>
>> scala> "aaa" regexMatch (
>>      |   "a*" -> 1,
>>      |   "b*" -> 2,
>>      |   "c*" -> 3
>>      | )
>> res0: Option[Int] = Some(1)
>>
>> scala> "zzz" regexMatch (
>>      |   "a*" -> 1,
>>      |   "b*" -> 2,
>>      |   "c*" -> 3
>>      | )
>> res1: Option[Int] = None
>>
>> scala> case class YMD(year: Int, month: Int, day: Int)
>> defined class YMD
>>
>> scala> { // parsing dates is fun
>>      |
>>      |   implicit def toInt(s: String) = s.toInt
>>      |
>>      |   "2008-06-17".regexMatchGroups[YMD] (
>>      |     """(\d\d\d\d)-(\d\d)-(\d\d)""" ->
>>      |       { case List(year, month, day) => YMD(year, month, day) },
>>      |     """(\d\d)-(\d\d)-(\d\d\d\d)""" ->
>>      |       { case List(day, month, year) => YMD(day, month, year) }
>>      |   ) getOrElse error("Invalid date")
>>      | }
>> res2: YMD = YMD(2008,6,17)
>>
>> alex
>>
>
>

Robert Fischer
Joined: 2009-01-31,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?

How do you envision handling this regex?

case /(.)/

One of the tricky parts of regex is that (unless you assume ^ and $), matches can have an arbitrary
(and unknowable) number of results. Maybe it's better for the match to return a list?
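
For comparison, this is how the existing library splits the two cases, at least in current Scala versions: a Regex used as an extractor has to match the entire input, while findAllMatchIn scans for every occurrence. AnchoringDemo, whole and all are hypothetical names used only in this sketch.

```scala
object AnchoringDemo {
  val OneChar = "(.)".r

  // As an extractor, the regex must match the whole string,
  // so only single-character inputs succeed here.
  def whole(s: String): Option[String] = s match {
    case OneChar(c) => Some(c)
    case _          => None
  }

  // findAllMatchIn walks the input and yields one Match per occurrence.
  def all(s: String): List[String] =
    OneChar.findAllMatchIn(s).map(_.group(1)).toList
}
```

So the "arbitrary number of results" situation only arises with the scanning methods; the pattern-match form yields at most one result.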

~~ Robert.

Marcus Downing wrote:
> Cool as that hack is, I'm putting in my vote for making regex literals a part
> of the language. Every other version of regex matching I've seen is just too
> messy.
>
> "Perl didn't get where it is because of the beauty of its syntax."
> - Daniel Sobral
>
> QFT. For all its massive flaws, I've remained fond of Perl, especially when
> quadruple-escaping Java regexes. I want to be able to write this, or
> something like it:
>
> number match {
> case /^(\d{4,5}-\d{6,8})$/ (area, local) => "Ring ring"
> case _ => "Not a number!"
> }
>
> Why the brackets after the regex, rather than squeeze the names inside it?
> Because one of the strengths of regex is that the syntax is approximately
> the same everywhere you go. And as others have commented, the inside of a
> regex is complicated enough already.
>
> What would make this *really* neat is if the compiler noticed the fixed
> regex and shifted it out to class scope for me, saving on the repeated
> compilation.
>
>
>
>
> Henry Ware-2 wrote:
>> Wow. I nominate this for Hack Of The Month.
>>
>> On Wed, Jun 17, 2009 at 9:49 PM, Alex Boisvert
>> wrote:
>>> For those wishing to match against regular expressions...
>>>
>>> Welcome to Scala version 2.7.4.final (Java HotSpot(TM) Server VM, Java
>>> 1.6.0_10).
>>> Type in expressions to have them
>>> evaluated.
>>> Type :help for more
>>> information.
>>>
>>> scala> import scala.util.matching.Regex
>>> import scala.util.matching.Regex
>>>
>>> scala> class PimpMyRegex(s: String) {
>>> | def regexMatch[R](cases: (String, R)*): Option[R] = {
>>> | cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>>> c._2).
>>> | dropWhile(_._1.isEmpty).firstOption.map(_._2)
>>> | }
>>> | def regexMatchGroups[R](cases: (String,
>>> PartialFunction[List[String], R])*): Option[R] = {
>>> | cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>>> c._2).
>>> | flatMap(t => if (t._1.isDefined &&
>>> t._2.isDefinedAt(t._1.get))
>>> List(t._2(t._1.get)) else Nil).firstOption
>>> | }
>>> | }
>>> defined class PimpMyRegex
>>>
>>> scala> implicit def pimpString(s: String) = new PimpMyRegex(s)
>>> pimpString: (String)PimpMyRegex
>>>
>>> scala> "aaa" regexMatch (
>>> | "a*" -> 1,
>>> | "b*" -> 2,
>>> | "c*" -> 3
>>> | )
>>> res0: Option[Int] = Some(1)
>>>
>>> scala> "zzz" regexMatch (
>>> | "a*" -> 1,
>>> | "b*" -> 2,
>>> | "c*" -> 3
>>> | )
>>> res1: Option[Int] = None
>>>
>>> scala> case class YMD(year: Int, month: Int, day: Int)
>>> defined class YMD
>>>
>>> scala> { // parsing dates is fun
>>> |
>>> | implicit def toInt(s: String) = s.toInt
>>> |
>>> | "2008-06-17".regexMatchGroups[YMD] (
>>> | """(\d\d\d\d)-(\d\d)-(\d\d)""" ->
>>> | { case List(year, month, day) => YMD(year, month, day) },
>>> | """(\d\d)-(\d\d)-(\d\d\d\d)""" ->
>>> | { case List(day, month, year) => YMD(day, month, year) }
>>> | ) getOrElse error("Invalid date")
>>> | }
>>> res2: YMD = YMD(2008,6,17)
>>>
>>> alex
>>>
>>
>

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?
I'll handle it like regex has been handling it forever. If there is a match (empty string won't match!), get the first one. I'd also want all such regexes to be turned into anonymous singletons, once for each unique pattern, lazily evaluated, and, furthermore, I'd have the following two case statements work:

  case s @ /(.)/ =>
  case /(.)/ (s) =>

In the first case, s would be bound to the MatchData, and in the second to the group.

I don't much care for "/" as regex separator, but it is tradition. I prefer myself the m?...? -- ? being any character -- notation Perl has adopted. But I'd take even """...""".r.

Finally, an improved replace notation -- perhaps a class with a regex and a substitution string as members, supporting both String and StringBuilder as parameters. Regex could have a factory method for it, too. Also, a replace method on MatchData -- after all, it has "before", "after" and all groups, so it could do it. No substitution literals required, though.

If Scala were to give me that, I'd take it over any perl script ever written. Anyone up to writing a spec for improvement request? :-)
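
Without new syntax, part of this wishlist can be approximated with hand-written extractors. The sketch below is only that, an approximation: RegexWishlist, Hit, Group, Substitution and describe are invented names, the regex still has to live in a named val, and only the String half of the replace idea is covered.

```scala
import scala.util.matching.Regex

object RegexWishlist {
  // Compiled once, when the enclosing object is first initialised.
  private val AnyChar: Regex = "(.)".r

  // Binds the whole Match of the first hit (groups, before, after).
  object Hit {
    def unapply(s: String): Option[Regex.Match] = AnyChar.findFirstMatchIn(s)
  }

  // Binds only the first capture group of the first hit.
  object Group {
    def unapply(s: String): Option[String] =
      AnyChar.findFirstMatchIn(s).map(_.group(1))
  }

  // A regex paired with its substitution string.
  final class Substitution(regex: Regex, replacement: String) {
    def apply(s: String): String = regex.replaceAllIn(s, replacement)
  }

  def describe(s: String): String = s match {
    case Hit(m) if m.after.toString.nonEmpty =>
      "first char " + m.group(1) + ", rest " + m.after.toString
    case Group(g) => "only char " + g
    case _        => "empty input"
  }
}
```

Note that replaceAllIn treats $ and \ in the replacement string specially, so a literal replacement may need Regex.quoteReplacement.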

On Thu, Jun 18, 2009 at 3:08 PM, Robert Fischer <robert.fischer@smokejumperit.com> wrote:
How do you envision handling this regex?

case /(.)/

One of the tricky parts of regex is that (unless you assume ^ and $), matches can have an arbitrary
(and unknowable) number of results.  Maybe it's better for the match to return a list?

~~ Robert.

Marcus Downing wrote:
> Cool as that hack is, I'm putting in my vote for making regex literals a part
> of the language. Every other version of regex matching I've seen is just too
> messy.
>
> "Perl didn't get where it is because of the beauty of its syntax."
>      - Daniel Sobral
>
> QFT. For all its massive flaws, I've remained fond of Perl, especially when
> quadruple-escaping Java regexes. I want to be able to write this, or
> something like it:
>
> number match {
>     case /^(\d{4,5}-\d{6,8})$/ (area, local) => "Ring ring"
>     case _ => "Not a number!"
> }
>
> Why the brackets after the regex, rather than squeeze the names inside it?
> Because one of the strengths of regex is that the syntax is approximately
> the same everywhere you go. And as others have commented, the inside of a
> regex is complicated enough already.
>
> What would make this *really* neat is if the compiler noticed the fixed
> regex and shifted it out to class scope for me, saving on the repeated
> compilation.
>
>
>
>
> Henry Ware-2 wrote:
>> Wow.  I nominate this for Hack Of The Month.
>>
>> On Wed, Jun 17, 2009 at 9:49 PM, Alex Boisvert<boisvert@intalio.com>
>> wrote:
>>> For those wishing to match against regular expressions...
>>>
>>> Welcome to Scala version 2.7.4.final (Java HotSpot(TM) Server VM, Java
>>> 1.6.0_10).
>>> Type in expressions to have them
>>> evaluated.
>>> Type :help for more
>>> information.
>>>
>>> scala> import scala.util.matching.Regex
>>> import scala.util.matching.Regex
>>>
>>> scala> class PimpMyRegex(s: String) {
>>>      |   def regexMatch[R](cases: (String, R)*): Option[R] = {
>>>      |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>>> c._2).
>>>      |       dropWhile(_._1.isEmpty).firstOption.map(_._2)
>>>      |   }
>>>      |   def regexMatchGroups[R](cases: (String,
>>> PartialFunction[List[String], R])*): Option[R] = {
>>>      |     cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>>> c._2).
>>>      |       flatMap(t => if (t._1.isDefined &&
>>> t._2.isDefinedAt(t._1.get))
>>> List(t._2(t._1.get)) else Nil).firstOption
>>>      |   }
>>>      | }
>>> defined class PimpMyRegex
>>>
>>> scala> implicit def pimpString(s: String) = new PimpMyRegex(s)
>>> pimpString: (String)PimpMyRegex
>>>
>>> scala> "aaa" regexMatch (
>>>      |   "a*" -> 1,
>>>      |   "b*" -> 2,
>>>      |   "c*" -> 3
>>>      | )
>>> res0: Option[Int] = Some(1)
>>>
>>> scala> "zzz" regexMatch (
>>>      |   "a*" -> 1,
>>>      |   "b*" -> 2,
>>>      |   "c*" -> 3
>>>      | )
>>> res1: Option[Int] = None
>>>
>>> scala> case class YMD(year: Int, month: Int, day: Int)
>>> defined class YMD
>>>
>>> scala> { // parsing dates is fun
>>>      |
>>>      |   implicit def toInt(s: String) = s.toInt
>>>      |
>>>      |   "2008-06-17".regexMatchGroups[YMD] (
>>>      |     """(\d\d\d\d)-(\d\d)-(\d\d)""" ->
>>>      |       { case List(year, month, day) => YMD(year, month, day) },
>>>      |     """(\d\d)-(\d\d)-(\d\d\d\d)""" ->
>>>      |       { case List(day, month, year) => YMD(day, month, year) }
>>>      |   ) getOrElse error("Invalid date")
>>>      | }
>>> res2: YMD = YMD(2008,6,17)
>>>
>>> alex
>>>
>>
>

--
~~ Robert Fischer, Smokejumper IT Consulting.
Enfranchised Mind Blog http://EnfranchisedMind.com/blog

Check out my book, "Grails Persistence with GORM and GSQL"!
http://www.smokejumperit.com/redirect.html




--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
sadie
Joined: 2008-12-21,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?

A matched regex should produce a lazy Seq of results. If you use a regex for
pattern matching, then you're after 1 or 0 results, just like an Option. If
you want to retrieve a number of results, then you use it differently,
probably something like:

for (s <- string ~ /(.)/) ...

Or even

string ~ /(.)/ map { s => ... }

Is the ~ operator available? The strokes /.../ for separators are I think
worth keeping. To my eyes, they're more readable than many of the
alternatives, especially """...""".r Of course, there is the question of //
but I'm sure the compiler geniuses can sort that out... :)

What syntax were you thinking of for the replace notation? Again, I was
always fond of Perl's /xxx/yyy/ .
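
For what it's worth, the library already gets close to the "lazy Seq of results" idea without any new syntax. A minimal sketch, assuming a recent Scala 2 standard library; the object and method names (LazyMatches, letters, firstLetter) are made up for illustration:

```scala
import scala.util.matching.Regex

object LazyMatches {
  // Pattern compiled once and reused by both methods.
  val Letter: Regex = """(\p{L})""".r

  // findAllMatchIn returns an Iterator, so matches are produced on demand.
  def letters(input: String): Iterator[String] =
    Letter.findAllMatchIn(input).map(_.group(1))

  // The "1 or 0 results" case: findFirstIn returns an Option.
  def firstLetter(input: String): Option[String] =
    Letter.findFirstIn(input)
}
```

With this, `for (s <- LazyMatches.letters("a1b2")) ...` walks over "a" and "b", and `LazyMatches.firstLetter("12")` is None.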

Daniel Sobral wrote:
>
> I'll handle it like regex has been handling it in forever. If there is a
> match (empty string won't match!), get the first one. I'd also want all
> such
> regex to be turned into anonymous singletons, once for each unique
> pattern,
> lazily evaluated, and, furthermore, I'd have the following two case
> statements work:
>
> case s @ /(.)/ =>
> case /(.)/ (s) =>
>
> In the first case, s would be bound to the MatchData, and in the second to
> the group.
>
> I don't much care for "/" as regex separator, but it is tradition. I
> prefer
> myself the m?...? -- ? being any character -- notation Perl has adopted.
> But
> I'd take even """...""".r.
>
> Finally, an improved replace notation -- perhaps a class with a regex and
> a
> substitution string as members, supporting both String and StringBuilder
> as
> parameters. Regex could have a factory method for it, too. Also, a replace
> method on MatchData -- after all, it has "before", "after" and all groups,
> so it could do it. No substitution literals required, though.
>
> If Scala were to give me that, I'd take it over any perl script ever
> written. Anyone up to writing a spec for an improvement request? :-)
>
> On Thu, Jun 18, 2009 at 3:08 PM, Robert Fischer <
> robert.fischer@smokejumperit.com> wrote:
>
>> How do you envision handling this regex?
>>
>> case /(.)/
>>
>> One of the tricky parts of regex is that (unless you assume ^ and $),
>> matches can have an arbitrary
>> (and unknowable) number of results. Maybe it's better for the match to
>> return a list?
>>
>> ~~ Robert.
>>
>> Marcus Downing wrote:
>> > Cool as that hack is, I'm putting in my vote for making regex literals
>> a
>> part
>> > of the language. Every other version of regex matching I've seen is
>> just
>> too
>> > messy.
>> >
>> > "Perl didn't get where it is because of the beauty of its syntax."
>> > - Daniel Sobral
>> >
>> > QFT. For all its massive flaws, I've remained fond of Perl, especially
>> when
>> > quadruple-escaping Java regexes. I want to be able to write this, or
>> > something like it:
>> >
>> > number match {
>> > case /^(\d{4,5}-\d{6,8})$/ (area, local) => "Ring ring"
>> > case _ => "Not a number!"
>> > }
>> >
>> > Why the brackets after the regex, rather than squeeze the names inside
>> it?
>> > Because one of the strengths of regex is that the syntax is
>> approximately
>> > the same everywhere you go. And as others have commented, the inside of
>> a
>> > regex is complicated enough already.
>> >
>> > What would make this *really* neat is if the compiler noticed the fixed
>> > regex and shifted it out to class scope for me, saving on the repeated
>> > compilation.
>> >
>> >
>> >
>> >
>> > Henry Ware-2 wrote:
>> >> Wow. I nominate this for Hack Of The Month.
>> >>
>> >> On Wed, Jun 17, 2009 at 9:49 PM, Alex Boisvert
>> >> wrote:
>> >>> For those wishing to match against regular expressions...
>> >>>
>> >>> Welcome to Scala version 2.7.4.final (Java HotSpot(TM) Server VM,
>> Java
>> >>> 1.6.0_10).
>> >>> Type in expressions to have them
>> >>> evaluated.
>> >>> Type :help for more
>> >>> information.
>> >>>
>> >>> scala> import scala.util.matching.Regex
>> >>> import scala.util.matching.Regex
>> >>>
>> >>> scala> class PimpMyRegex(s: String) {
>> >>> | def regexMatch[R](cases: (String, R)*): Option[R] = {
>> >>> | cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>> >>> c._2).
>> >>> | dropWhile(_._1.isEmpty).firstOption.map(_._2)
>> >>> | }
>> >>> | def regexMatchGroups[R](cases: (String,
>> >>> PartialFunction[List[String], R])*): Option[R] = {
>> >>> | cases.projection.map(c => new Regex(c._1).unapplySeq(s) ->
>> >>> c._2).
>> >>> | flatMap(t => if (t._1.isDefined &&
>> >>> t._2.isDefinedAt(t._1.get))
>> >>> List(t._2(t._1.get)) else Nil).firstOption
>> >>> | }
>> >>> | }
>> >>> defined class PimpMyRegex
>> >>>
>> >>> scala> implicit def pimpString(s: String) = new PimpMyRegex(s)
>> >>> pimpString: (String)PimpMyRegex
>> >>>
>> >>> scala> "aaa" regexMatch (
>> >>> | "a*" -> 1,
>> >>> | "b*" -> 2,
>> >>> | "c*" -> 3
>> >>> | )
>> >>> res0: Option[Int] = Some(1)
>> >>>
>> >>> scala> "zzz" regexMatch (
>> >>> | "a*" -> 1,
>> >>> | "b*" -> 2,
>> >>> | "c*" -> 3
>> >>> | )
>> >>> res1: Option[Int] = None
>> >>>
>> >>> scala> case class YMD(year: Int, month: Int, day: Int)
>> >>> defined class YMD
>> >>>
>> >>> scala> { // parsing dates is fun
>> >>> |
>> >>> | implicit def toInt(s: String) = s.toInt
>> >>> |
>> >>> | "2008-06-17".regexMatchGroups[YMD] (
>> >>> | """(\d\d\d\d)-(\d\d)-(\d\d)""" ->
>> >>> | { case List(year, month, day) => YMD(year, month, day)
>> },
>> >>> | """(\d\d)-(\d\d)-(\d\d\d\d)""" ->
>> >>> | { case List(day, month, year) => YMD(day, month, year) }
>> >>> | ) getOrElse error("Invalid date")
>> >>> | }
>> >>> res2: YMD = YMD(2008,6,17)
>> >>>
>> >>> alex
>> >>>
>> >>
>> >
>>
>> --
>> ~~ Robert Fischer, Smokejumper IT Consulting.
>> Enfranchised Mind Blog
>> http://EnfranchisedMind.com/blog
>>
>> Check out my book, "Grails Persistence with GORM and GSQL"!
>> http://www.smokejumperit.com/redirect.html
>>
>>
>
>

John Wright
Joined: 2009-06-19,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?
I must admit to liking the visual look of the /.../ notation too, but it'd create a whole mess of ambiguity in Scala's parser. How would the lexer/parser distinguish a regex from a custom operator?


Marcus Downing wrote:
A matched regex should produce a lazy Seq of results. If you use a regex for
pattern matching, then you're after 1 or 0 results, just like an Option. If
you want to retrieve a number of results, then you use it differently,
probably something like:

  for (s <- string ~ /(.)/) ...

Or even

  string ~ /(.)/ map { s => ... }

Is the ~ operator available? The strokes /.../ for separators are I think
worth keeping. To my eyes, they're more readable than many of the
alternatives, especially """...""".r  Of course, there is the question of //
but I'm sure the compiler geniuses can sort that out... :)

What syntax were you thinking of for the replace notation? Again, I was
always fond of Perl's /xxx/yyy/ .





      
Arthur Peters
Joined: 2009-01-09,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?
Couldn't these usages be implemented as a simple implicit wrapper on String (def ~(regex:String))?

You would need to change them to:

 for (s <- string ~ """(.)""") ...

Or even

 string ~ """(.)""" map { s => ... }

But the ".r" would not be needed.

You could even implement this:


 string ~ """(.)""" / """\1"""

for s/(.)/\1/

It's not quite as short. But it would get closer.
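
A rough sketch of that wrapper, assuming a recent Scala 2 (implicit classes). The names TildeSyntax, Matched and TildeString are invented for illustration, and `/` here means "replace all", which is only one possible reading of the proposal:

```scala
import scala.util.matching.Regex

object TildeSyntax {
  // Hypothetical result of `string ~ pattern`: iterates over the matches
  // and offers `/` as a replace-all operator.
  final class Matched(source: String, regex: Regex) extends Iterable[String] {
    def iterator: Iterator[String] = regex.findAllIn(source)
    def /(replacement: String): String = regex.replaceAllIn(source, replacement)
  }

  // Adds `~` to String; the pattern string is compiled on every call.
  implicit class TildeString(private val source: String) extends AnyVal {
    def ~(pattern: String): Matched = new Matched(source, pattern.r)
  }
}
```

After `import TildeSyntax._`, both `for (s <- string ~ """(.)""") ...` and `string ~ """(.)""" map { s => ... }` compile, because `~` binds tighter than `/` and `map`. One caveat: the underlying Java replacement syntax refers to groups as `$1` rather than `\1`, so the substitution would be written `string ~ """(.)""" / "<$1>"`.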

-Arthur


On Fri, Jun 19, 2009 at 9:12 AM, Marcus Downing <marcus@minotaur.it> wrote:

A matched regex should produce a lazy Seq of results. If you use a regex for
pattern matching, then you're after 1 or 0 results, just like an Option. If
you want to retrieve a number of results, then you use it differently,
probably something like:

 for (s <- string ~ /(.)/) ...

Or even

 string ~ /(.)/ map { s => ... }

Is the ~ operator available? The strokes /.../ for separators are I think
worth keeping. To my eyes, they're more readable than many of the
alternatives, especially """...""".r  Of course, there is the question of //
but I'm sure the compiler geniuses can sort that out... :)

What syntax were you thinking of for the replace notation? Again, I was
always fond of Perl's /xxx/yyy/ .





dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?
I'm trying to think of least-impact, greatest-benefit changes here. That you can't put a regex literal in a case match is particularly inconvenient, but many things about the present regex library are bad.

Let's consider them. First, there is no builtin support in the language. All support is library support. That means no regex literals. We got a bone thrown our way with the triple-quote strings, which make it easier to write regex expressions, aside from \u, which is reserved for unicode characters even inside triple quotes. Actually, that concerns me less in regex strings than in being able to write Windows paths, such as c:\users.

Next, library support. First, there is support built into String itself, even in Java. We have split, which splits according to a regex, matches, which matches a regex, and replaceAll and replaceFirst, which do what you would expect. For example, 'line split """\W+"""' would split a line into its component words, for the regex definition of "word". Also, 'line matches """^\s*//.*"""' would match lines with just comments. These are methods provided by java.lang.String itself, and they follow closely how one thinks of matching patterns: this string matches this pattern.

They have three downsides. First, verbosity. It's pretty good for the occasional use, but too verbose for heavy regex usage, though that's a limit of Java itself, which has no operator syntax. Second, the regex patterns are passed as strings, and when you use regex in critical paths you want it to be pre-compiled. And third, "matches" anchors start and end, as if you had written "^pattern$" instead of "pattern", something which I find singularly brain-dead. Java does that, sometimes.

Now, Scala adds a single method in its RichString class, "r". This method compiles a string into a Regex object, which is why we can do "pattern".r (or """pattern""".r).
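
To make the anchoring point concrete, here is a small sketch using only java.lang.String methods and `.r`. The object name and the sample strings are made up for illustration:

```scala
object StringRegexBasics {
  val line = "  // a comment"

  // split takes the pattern as a plain string and compiles it on each call.
  val words: List[String] = "one, two  three".split("""\W+""").toList

  // matches is anchored at both ends: the whole input has to match.
  val prefixOnly: Boolean = line.matches("""\s*//""")   // false
  val wholeLine: Boolean  = line.matches("""\s*//.*""") // true

  // .r compiles the pattern once; findFirstIn is an unanchored search.
  val Comment = """^\s*//""".r
  val found: Boolean = Comment.findFirstIn(line).isDefined // true
}
```

The pattern that fails with `matches` succeeds with `findFirstIn`, which is exactly the third downside listed above.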
So, the first thing Scala could do for us, easily, is to provide split/matches/replaceFirst/replaceAll methods that took Regex parameters, so you could pass pre-compiled patterns, and provide operator-style syntax. The "~" operator would do well, and it could be overloaded in RichString as ~(String), ~(Regex), ~(String, String) and ~(Regex, String) to mean matches, in the first two cases, or replaceFirst in the last two cases.

Actually, I'd make ~ based on findFirstIn, to get rid of the anchors, but that might surprise people replacing "matches" with "~", so I'd add "find" too, with the same meaning. A brief explanation: input.matches(regex) translates to Pattern.matches(regex, input), which behaves as Pattern.compile(regex).matcher(input).matches(). To get non-anchoring behavior, the last expression would need "matches()" replaced with "find()". Therefore, a method "find" would preserve the logic of the method "matches". Now, matches() returns a Boolean, so I'd add a "~~" method returning an Option[Match] instead, as a Match can be extraordinarily useful.

Next, since we just spoke of Match, I'll speak of it a bit. A Match is a very convenient data type. Its toString method will return the matched string, but it is much richer than that. I can call "source" to get the string in which the Match was found, "before" and "after" to get what came before and after the match, start and end to get the index into source of the match, start(i) and end(i) to get the index into source of group i, as well as get each group by name or number. What it lacks is a replace method to replace the match in the original string, once a match was made. This was very short-sighted, and I'd correct it.

And here we get to the end of what we can do with libraries (at least that I can think of). Everything else requires support in the language itself or a plugin, I believe.
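
A sketch of what the library-only part could look like, written against a recent Scala 2 rather than 2.7. Everything named here (RegexOps, RegexString, MatchReplace, replaceWith) is hypothetical, and only the Regex overloads are shown:

```scala
import scala.util.matching.Regex
import scala.util.matching.Regex.Match

object RegexOps {
  implicit class RegexString(private val input: String) extends AnyVal {
    // Unanchored test, i.e. Matcher.find() rather than Matcher.matches().
    def ~(regex: Regex): Boolean = regex.findFirstIn(input).isDefined

    // The first match with all its details, if there is one.
    def ~~(regex: Regex): Option[Match] = regex.findFirstMatchIn(input)
  }

  implicit class MatchReplace(private val m: Match) extends AnyVal {
    // Rebuilds the source string with the matched span swapped for a
    // literal replacement, using the before/after parts Match already has.
    def replaceWith(replacement: String): String =
      m.before.toString + replacement + m.after.toString
  }
}
```

With `import RegexOps._` and `val digits = """\d+""".r`, the expression `"abc-123" ~ digits` is true and `("abc-123" ~~ digits).map(_.replaceWith("#"))` gives Some("abc-#"). The pattern is passed in already compiled, which addresses the critical-path concern.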
There are three further things to support for regex: literal support, compile-once literals and case matching. A natural literal format for regex would be /.../, but that poses a few problems, because "/" is a valid identifier. In fact, since a regex expression is pretty much a free-form string, one could form regex expressions that would be valid Scala statements. While the same can be said of XML literals, their format is much more restricted than regex, and they have the advantage of having been in the language for quite a while.

I honestly can't think of any reasonable regex literal syntax that is not liable to break stuff, so we might just as well keep the present format. While not perfect, it is workable.

Regex usage would be much improved, though, if the literals could be made compile-once. This, in fact, is possible. A regex "literal", that is, "....".r or """...""".r, could be translated into a reference to a lazy val in an anonymous object. This shouldn't be any problem for a plugin.

The final problem to be overcome would be case matching. Since case matching is a fundamental part of the language, support for doing so through regex literals would be important. If such literals were turned into lazy vals, then unapplySeq would work very naturally, as """...""".r(a,b,c) became $AnonymousRegex.LitNNN(a,b,c). Getting that to work as a plugin might not be trivial, though, and, IIRC, Regex's unapplySeq does anchored matches, which might confuse people used to standard regex practices.

That would deal with almost everything. I'd still like to have some way to get back a Match from a case match, but I don't see much of a way around it. The only thing I can think of to solve both this problem and the one about anchored matches would be alternative Regex classes with alternative unapplySeq implementations.
One would use unanchored matches for unapplySeq, and the other would do that and, additionally, implement an unapply instead of unapplySeq, which would return the Match. I propose .re and .reM as factory methods inside RichString. :-)

I'll see if I can write the library part of this stuff tonight and submit it. I might even make a pass at the plugin, but that is unlikely to produce any results at this stage in my Scala knowledge.
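
The compile-once and case-matching ideas can be approximated by hand today, without a plugin. A sketch, assuming a recent Scala 2 where Regex has an `unanchored` method; the names CompiledOnce, MatchOf and FirstNumber are invented, and MatchOf stands in for the proposed .reM variant:

```scala
import scala.util.matching.Regex
import scala.util.matching.Regex.Match

object CompiledOnce {
  // vals in an object are initialized once, when the object is first used.
  val IsoDate: Regex = """(\d{4})-(\d{2})-(\d{2})""".r // anchored extractor
  val Digits: Regex  = """(\d+)""".r.unanchored        // unanchored extractor

  // Hypothetical extractor that hands back the whole Match.
  final class MatchOf(regex: Regex) {
    def unapply(s: CharSequence): Option[Match] = regex.findFirstMatchIn(s)
  }
  val FirstNumber = new MatchOf("""\d+""".r)

  def describe(s: String): String = s match {
    case IsoDate(y, m, d) => s"date $y/$m/$d"
    case FirstNumber(hit) => s"number ${hit.matched} at index ${hit.start}"
    case _                => "no match"
  }

  def firstDigits(s: String): Option[String] = s match {
    case Digits(n) => Some(n)
    case _         => None
  }
}
```

Here `CompiledOnce.describe("2009-06-19")` takes the anchored IsoDate case, while `CompiledOnce.describe("call 555 now")` falls through to FirstNumber and reports the match position, which an unapplySeq-based extractor cannot do.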

On Fri, Jun 19, 2009 at 10:12 AM, Marcus Downing <marcus@minotaur.it> wrote:

A matched regex should produce a lazy Seq of results. If you use a regex for
pattern matching, then you're after 1 or 0 results, just like an Option. If
you want to retrieve a number of results, then you use it differently,
probably something like:

 for (s <- string ~ /(.)/) ...

Or even

 string ~ /(.)/ map { s => ... }

Is the ~ operator available? The strokes /.../ for separators are I think
worth keeping. To my eyes, they're more readable than many of the
alternatives, especially """...""".r  Of course, there is the question of //
but I'm sure the compiler geniuses can sort that out... :)

What syntax were you thinking of for the replace notation? Again, I was
always fond of Perl's /xxx/yyy/ .







--
Daniel C. Sobral

Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
sadie
Joined: 2008-12-21,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?

Daniel Sobral wrote:
>
> A natural literal format for regex
> would be /.../, but that poses a few problems, because "/" is a valid
> identifier. In fact, since a regex expression is pretty much a free form
> string, one could form regex expressions that would be valid Scala
> statements.
>

While I don't doubt this is true, I'd like to see a few of them. In
particular, is there anything in the standard or reasonably well-known
libraries that would trigger this ambiguity?

Since a regex is a value, rather than an operator, it's generally used in a
different context from the normal uses of / - it's not going to get
confused with a division or other operator. Most other uses of / are
similarly not in the same place you'd find a regex, not to mention the
rarity of finding two or three /s close together on a line.

Colin Bullock
Joined: 2009-01-23,
User offline. Last seen 42 years 45 weeks ago.
Re: Regular Expression Literals?



> While I don't doubt this is true, I'd like to see a few of them.

Admittedly, more than a little contrived, but here goes:

scala> case class /(a: String) { def /(b: String) = println(a + b) }
defined class $div

scala> val i = "foo"
i: java.lang.String = foo

scala> val f = /("bar")/i
barfoo
f: Unit = ()

scala>

Since starting to write this, a few more equally contrived examples have occurred to me which are both legal Scala syntax and legal regexes, but for the sake of my good name, I'll limit myself to the one above. ;)

- Colin
dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?
The problem with your logic is to assume that an "operator" is in any way different from a value. It is not. A "value", as I assume you mean it, may be a literal or a reference to an object. A reference to an object is an identifier. "/" is a valid identifier. In practice, the infix operator notation is "identifier identifier identifier", where the second identifier happens to be a method defined on the object referenced by the first identifier. But a method, when used inside a class, will often appear as the first identifier.

So, how exactly do you tell where "/" is an identifier and where "/" is the start of a regexp pattern?

But, at any rate, I'll go a bit further here, and say something I do not know as fact, but that I inferred from my personal experience in the Scala community. The idea of introducing new keywords or reserved symbols goes against the "plan" for the language, which is to keep reserved keywords and symbols to a minimum, so as to maximize the ability of libraries to extend the language.

Detering Dirk
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
RE: Regular Expression Literals?

Perhaps I have misunderstood something ...
Is the code below supposed to work under 2.7.5?

________________________________

From: Daniel Sobral [mailto:dcsobral@gmail.com]
Sent: Wednesday, June 17, 2009 7:07 PM
To: David Copeland
Cc: scala-user@listes.epfl.ch
Subject: Re: [scala-user] Regular Expression Literals?

val phone800 = """^1-800-(\d\d\d-\d\d\d\d)$/""".r // Was $1\- correct? That didn't make sense.
val phone888 = """^1-800-(\d\d\d-\d\d\d\d)$/""".r

someString match {
  case phone800(number) => "Your call to " + number + " will work"
  case phone888(number) => "Your call to " + number + " won't work on this old phone"
  case _ => throw new RuntimeException("String isn't a 1-800 style number")
}

...

Detering Dirk
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
RE: Regular Expression Literals?

Ok, a bit more context now ...
In the meantime I found the mistake: it is the trailing slash
in the expressions (besides phone888 having 800 in the regex).
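
For reference, here is a minimal sketch of Daniel's snippet with both slips corrected: the trailing slash is dropped and the second pattern uses 888. The wrapper object PhoneDemo and the method name describe are invented for this illustration, and the exception name is spelled RuntimeException; none of these appear in the original post.

```scala
import scala.util.matching.Regex

object PhoneDemo {
  // Corrected patterns: no trailing "/" and 888 in the second one.
  val phone800: Regex = """^1-800-(\d\d\d-\d\d\d\d)$""".r
  val phone888: Regex = """^1-888-(\d\d\d-\d\d\d\d)$""".r

  // Hypothetical helper wrapping the match expression from the thread.
  def describe(someString: String): String = someString match {
    case phone800(number) => "Your call to " + number + " will work"
    case phone888(number) => "Your call to " + number + " won't work on this old phone"
    case _ => throw new RuntimeException("String isn't a 1-800 style number")
  }
}
```

Because an extractor pattern has to match the whole input, the ^ and $ anchors are redundant here; they are kept only to stay close to the original.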

Now the code from Daniel works for me. What's surprising is
that my own test case does not work:

----------------
import scala.io._
import scala.util.matching._

object MatchPattern extends Application {
  val source = Source.fromFile("TextEingabe.txt")
  val pone = new Regex(".*PATONE.*")
  val ptwo = """.*PATTWO.*""".r
  for ( line <- source.getLines ) {
    line match {
      case pone(m) => println ("pattern one found")
      case ptwo(m) => println ("pattern two found")
      case lin     => println ("NO match on: " + lin )
    }
  }
}
// This line only for interpreter start:
MatchPattern.main(Array(""))
----------------

The TextEingabe.txt contains lines like:
Dies ist eine Zeile mit PATONE als pattern.

So the regex SHOULD match.

I always get "NO match on: ", no matter how I declare pone or ptwo.
Is it because of the '(m)' parameter in the case clause? I even tried
changing the patterns to ".*(PATONE).*" to introduce a group, like in
the example below. No luck: I can't get pone or ptwo to match.

This is more than a bit frustrating, as a parallel test with Groovy
worked out of the box:

new File("D:/Temporary/scalatest/TextEingabe.txt").eachLine { line ->
  switch (line) {
    case ~/.*PATONE.*/ : println "pattern one found on: $line" ; break
    case ~/.*PATTWO.*/ : println "pattern two found on: $line" ; break
    case ~/.*PAT .*/   : println "pattern with space found on: $line" ; break
    default            : println "no match on: $line " ; break
  }
}

What did I miss then?

TIA
Det


dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: Regular Expression Literals?
Yes, I left a "/" by mistake, and forgot to change 800 to 888. But, yes, that is supposed to work, as I guess you found out.

Now, as for the problem. The "extractor" pattern only works if you use () to mark your groups. If you are not interested in knowing what you matched, just that you did, you can use this syntax:

case pone() => println ("pattern one found")

So you extracted nothing from the match, but you tested to see if the match happened. This is a minor nit, and you would have got around that by yourself.

I had your problem too, and I was a bit upset by it too. You don't see problems like it in Perl because, there, a match is not anchored. What happened is that the lines you have read contain a newline, and "." does not match a newline by default. You can either change "line match" to "line.stripLineEnd match", or you can write your patterns as """(?s).*PATONE.*""".r. The latter flag, (?s), tells the regex library that, in this pattern, "." will match newlines as well.

Then again, chomp was second nature in Perl, so maybe I had trouble there too and I don't recall it. :-)

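The points above can be tried out with a minimal sketch like the following. It assumes an input line that still carries its trailing newline, as described for the lines read in the test case; the object name ExtractorDemo, the helper names found and captured, and the value names are made up for illustration.

```scala
import scala.util.matching.Regex

object ExtractorDemo {
  // Hypothetical input: a line that still carries its trailing newline.
  val rawLine: String = "Dies ist eine Zeile mit PATONE als pattern.\n"

  val plain: Regex   = """.*PATONE.*""".r      // no groups; "." stops at a newline
  val dotAll: Regex  = """(?s).*PATONE.*""".r  // (?s): "." also matches a newline
  val grouped: Regex = """.*(PATONE).*""".r    // one group, so the pattern binds one variable

  // Zero-group extractor: succeeds only if the regex matches the whole string.
  def found(re: Regex, s: String): Boolean = s match {
    case re() => true
    case _    => false
  }

  // One-group extractor: binds the text captured by the parentheses.
  def captured(s: String): Option[String] = s match {
    case grouped(word) => Some(word)
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(found(plain, rawLine))              // newline left in place
    println(found(plain, rawLine.stripLineEnd)) // newline stripped first
    println(found(dotAll, rawLine))             // newline handled by (?s)
    println(captured(rawLine.stripLineEnd))
  }
}
```

With the newline left in place, the plain pattern does not match; stripping the line end or adding (?s) makes it match, and the grouped pattern then yields the captured word.
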
Personally, I think the matching behavior of Scala's Regex is seriously bogus in trying to match the whole string instead of a part of it. It seems the person who wrote it was familiar with using "matches" in Java's String library, and with just the occasional more advanced regex in Java, never having used egrep, sed, perl, or other common regex-heavy programs.
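
If a partial match is what you want, one way around the anchoring is to call the search methods on Regex directly instead of using the extractor. A minimal sketch, assuming scala.util.matching.Regex and its findFirstMatchIn method; the object name PartialMatchDemo, the helper names whole and anywhere, and the sample strings are made up for illustration.

```scala
import scala.util.matching.Regex

object PartialMatchDemo {
  val phone: Regex = """1-800-(\d\d\d-\d\d\d\d)""".r

  // Extractor pattern: the regex has to match the entire input.
  def whole(s: String): Option[String] = s match {
    case phone(number) => Some(number)
    case _             => None
  }

  // findFirstMatchIn searches for the pattern anywhere in the input.
  def anywhere(s: String): Option[String] =
    phone.findFirstMatchIn(s).map(_.group(1))
}
```

Here whole("Call 1-800-555-1234 now") yields None, while anywhere on the same string yields Some("555-1234").
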
Detering Dirk
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
RE: Regular Expression Literals?

> Now, as for the problem. The "extractor" pattern only works
> if you use () to mark your groups.

Tried that (see my original posting).

> If you are not interested in knowing what you matched, just
> that you did, you can use this syntax:
>
> case pone() => println ("pattern one found")

Tried that too, didn't work, due to ...

> What happened is that the lines you have read contain a newline,
> and "." does not match a newline by default. You can either
> change "line match" to "line.stripLineEnd match", or you can
> write your patterns as """(?s).*PATONE.*""".r. The latter
> flag, (?s), tells the regex library that, in this pattern,
> "." will match newlines as well.

That's it!! Thank you very much for that hint.

> Then again, chomp was second nature in Perl,

Yeah, right, and now I have 'stripLineEnd' mentally connected to
'chomp' for the future.
(Embarrassed to admit that normally 'trim' was my second nature, but I
totally forgot that here.)

KR
Det

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland