Problem with JavaTokenParsers
Fri, 2008-12-26, 03:58
I'm going crazy trying to parse a really simple text: a raw wiki script. After many problems trying to make it work, I reduced the code to the minimum needed to show what appears to be a bug, unless there's something I'm missing.
import scala.util.parsing.combinator._

object Parser extends JavaTokenParsers {
  def list: Parser[Any] = rep("+"~text)
  def text: Parser[Any] = ".*".r

  val source = "+elem1\n+elem2\n+elem3"

  def main(args: Array[String]): Unit = {
    println(source)
    println(parseAll(list, source))
  }
}
Outputs:
[3.7] parsed: List((+~elem1), (+~elem2), (+~elem3))
OK, everything works as expected, but if "rep("+"~text)" is replaced with "rep(text)" it goes into an infinite loop, forcing me to kill the process, which by the way reaches a whole GB of RAM.
I ran into this problem while trying to parse these lists scattered through regular text, and now I can't continue because of it. I really need help.
Thanks!
Fri, 2008-12-26, 12:17
#2
Re: Problem with JavaTokenParsers
I have replied directly instead of to the mailing list...
here is what I said...
Well if you just put rep(text) and text is virtually anything, how is it
supposed to be parsed?
This was indeed the problem. I got some debug output using:
def text: Parser[Any] = ".*".r ^^ {x => println(x);x}
It immediately showed me that the parser was being asked to match empty
strings, which it happily accepted.
Change it to:
def text: Parser[Any] = ".+".r
and it works fine...
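
To see why the two patterns behave so differently, here is a minimal sketch using only the standard library's Regex class. It assumes the regex parser tries the pattern as a prefix match at the current input position; the object name EmptyMatchDemo is made up for illustration.

```scala
import scala.util.matching.Regex

object EmptyMatchDemo {
  val star: Regex = ".*".r // zero or more characters, so "" is a valid match
  val plus: Regex = ".+".r // at least one character is required

  def main(args: Array[String]): Unit = {
    // With no input left, ".*" still succeeds with an empty match...
    println(star.findPrefixOf("") == Some(""))
    // ...while ".+" fails, which gives a repetition a reason to stop.
    println(plus.findPrefixOf("").isEmpty)
    // "." does not match a newline, so a match ends at the line break.
    println(star.findPrefixOf("elem1\n+elem2") == Some("elem1"))
  }
}
```

A parser that succeeds without consuming any input gives rep no reason to stop, so it keeps collecting empty results, which would also explain the memory growth.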
Roland Kuhn-2 wrote:
>
> Hi Tomás,
>
> On Fri, December 26, 2008 03:58, Tomás Lázaro wrote:
>> I'm going crazy trying to parse a really simple text: a raw wiki script.
>> After many problems trying to make it work, I reduced the code to the
>> minimum needed to show what appears to be a bug, unless there's something
>> I'm missing.
>>
>> import scala.util.parsing.combinator._
>>
>> object Parser extends JavaTokenParsers {
>> def list: Parser[Any] = rep("+"~text)
>> def text: Parser[Any] = ".*".r
>>
>> val source = "+elem1\n+elem2\n+elem3"
>>
>> def main(args: Array[String]) {
>> println(source)
>> println(parseAll(list, source))
>> }
>> }
>>
>> Outputs:
>>
>> [3.7] parsed: List((+~elem1), (+~elem2), (+~elem3))
>>
>> OK, everything works as expected, but if "rep("+"~text)" is replaced with
>> "rep(text)" it goes into an infinite loop, forcing me to kill the process,
>> which by the way reaches a whole GB of RAM.
>>
>> I ran into this problem while trying to parse these lists scattered through
>> regular text, and now I can't continue because of it. I really need help.
>>
>> Thanks!
>>
> The problem is your regex, which happily accepts the empty string. Putting
> that into a "rep" is asking for disaster ;-) Without having access to my
> normal Scala gear, I suspect that the regex parser does not discard
> whitespace (the newline) like the literal "+" parser does, so you get stuck
> at the end of the first line. I don't know what exactly you are trying to
> parse, but you should be more specific with your regex. At least use '+'
> instead of '*', but you can also send me a more specific example so I can
> help you better.
>
> Ciao,
>
> Roland
>
>
>
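
As one possible reading of "be more specific with your regex", here is a hedged sketch of a list parser for the sample input. The names WikiListParser and item are made up for illustration, and treating newlines as item separators by overriding whiteSpace is an assumption about the wiki format, not something stated in the thread. On recent Scala versions the combinators live in the separate scala-parser-combinators module.

```scala
import scala.util.parsing.combinator._

object WikiListParser extends JavaTokenParsers {
  // Assumption: newlines are significant, so only spaces and tabs are skipped.
  override val whiteSpace = """[ \t]+""".r

  // One list item: a "+" marker followed by at least one non-newline character.
  def item: Parser[String] = "+" ~> """[^\n]+""".r

  // Items separated by newlines. Every repetition consumes input,
  // so the repetition cannot spin on empty matches.
  def list: Parser[List[String]] = repsep(item, "\n")

  def main(args: Array[String]): Unit = {
    println(parseAll(list, "+elem1\n+elem2\n+elem3"))
  }
}
```

Requiring at least one character per item and an explicit separator means each step of the repetition has to advance through the input.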
Fri, 2008-12-26, 18:17
#3
Re: Problem with JavaTokenParsers
Wow that was unexpected... it's like an extremely non-greedy regex. Thanks to all, that got me working again.
(That example was a simplification, it does not make sense to parse any text anywhere.)
On Fri, Dec 26, 2008 at 9:04 AM, Stefan Ackermann-4 <stivo.scala@gmail.com> wrote:
> I have replied directly instead of to the mailing list...
> here is what I said...
> Well if you just put rep(text) and text is virtually anything, how is it
> supposed to be parsed?
> This was indeed the problem. I got some debug output using:
> def text: Parser[Any] = ".*".r ^^ {x => println(x);x}
> And it immediately showed me, that it was trying to match empty strings with
> this parser. Which this parser happily accepted.
> Change it to:
> def text: Parser[Any] = ".+".r
> and it works fine...