- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Backtracking with combinator parsers
Thu, 2011-06-23, 23:57
Hi,
(how) is it possible to handle a grammar like this with combinator parsers?
prod ::= ("a" "b" "a"+)+
object TestParser extends JavaTokenParsers with PackratParsers {
lazy val prod: PackratParser[Any] = rep1("a" ~ "b" ~ rep1("a"))
}
I've tried all kinds of variants of the grammar as well as the parser
construction, but it looks like backtracking doesn't extend to this
scenario. (Naive left recursion works fine.)
The actual use case is parsing a (rather informal and sloppily
formatted) BNF grammar, like
a ::= b |
c
b ::=
IDENT
c ::= IDENT
Best regards,
Patrick
Fri, 2011-06-24, 11:57
#2
Re: Backtracking with combinator parsers
Responding to Randall R Schulz:
> On Thursday 23 June 2011, Patrick Roemer wrote:
>> prod ::= ("a" "b" "a"+)+
>>
>> object TestParser extends JavaTokenParsers with PackratParsers {
>> lazy val prod: PackratParser[Any] = rep1("a" ~ "b" ~ rep1("a"))
>> }
>
> What inputs produce what undesired parse?
"a b a a b a" results in "string matching regex `\z' expected but `b'
found [1.9]".
I've also tried to rephrase it closer to the BNF grammar[1]:
lazy val syntax: PackratParser[Any] = rule | (rule ~ syntax)
lazy val rule: PackratParser[Any] = "a" ~ "b" ~ list
lazy val list: PackratParser[Any] = "a" | ("a" ~ list)
...but this doesn't even match "a b a a"?!
(The BNF grammar includes the line-end as a rule terminator, but the
format of the grammar I'm trying to parse is goofed up so it has line
breaks within the single productions.)
> I ask 'cause your grammar is strictly ambiguous owing to the "a" at the
> beginning and the end of the closed group.
What does "strictly ambiguous" mean in this context? Naively, I'd assume
that handling this kind of construct is locally ambiguous, i.e. requires
backtracking, but in the end there's only one way to match the whole
string...?
> And your BNF is ambiguous for a different reason: Each branch of the
> alternation in the production for "a" is identical.
This was just a silly example of what I'm actually trying to achieve.
With my BNF parser attempt, for this input
a ::= b
c ::= d
I get "`EOF' expected but del '::=' found [2.3]", obviously the same
problem as above.
Best regards,
Patrick
Fri, 2011-06-24, 12:47
#3
Re: Re: Backtracking with combinator parsers
On 24/06/2011, at 8:48 PM, Patrick Roemer wrote:
> lazy val syntax: PackratParser[Any] = rule | (rule ~ syntax)
> lazy val rule: PackratParser[Any] = "a" ~ "b" ~ list
> lazy val list: PackratParser[Any] = "a" | ("a" ~ list)
These rules are unlikely to be what you really want. The | operator in the packet parser library is an ordered choice operator (as for all PEG-style libraries). Thus, something of the form:
foo | foo bar ble
will never reach the "foo bar ble" alternative, since a valid input will start with a foo and the first choice will be committed.
As a concrete illustration, your grammar doesn't match "a b a a" because the first alternatives of syntax and list are chosen, so the input "a b a" is recognised as a syntax (comprised rule, which is comprised of "a b" followed by a list, which is comprised of an "a"). Therefore the syntax parse succeeds, but your overall parse fails because you have not reached the end of the input (\z). (I assume you are using parseAll or phrase.)
If you have two alternatives where one is a prefix of the other, you need to put then that will match longer input first.
Tony
Fri, 2011-06-24, 13:17
#4
Re: Backtracking with combinator parsers
Responding to Tony Sloane:
> On 24/06/2011, at 8:48 PM, Patrick Roemer wrote:
>
>> lazy val syntax: PackratParser[Any] = rule | (rule ~ syntax)
>> lazy val rule: PackratParser[Any] = "a" ~ "b" ~ list
>> lazy val list: PackratParser[Any] = "a" | ("a" ~ list)
>
> These rules are unlikely to be what you really want. The | operator in the packet
> parser library is an ordered choice operator (as for all PEG-style libraries).
> Thus, something of the form:
>
> foo | foo bar ble
>
> will never reach the "foo bar ble" alternative, since a valid input will start with
> a foo and the first choice will be committed.
I see, thanks. Fixing this indeed makes the "abaa" succeed, but this
leaves me with the same problem as the original version: No retry to
start a new rule. So I still wonder whether this is possible using
combinator parsers.
I've tried to port this simple example to ANTLR, and there it seems to
work (with explicit backtracking option):
bnf_grammar returns [String str]:
s = bnf_syntax { $str = s; } EOF ;
bnf_syntax returns [String str] options { backtrack = true; }:
(r2 = bnf_rule s = bnf_syntax { $str = r2 + "\n" + s; })
| r1 = bnf_rule { $str = r1;};
bnf_rule returns [String str]:
i = ID ASSIGN l = bnf_list { $str = i.getText() + " ::= " + l; };
bnf_list returns [String str]:
(i2 = ID l = bnf_list { $str = i2.getText() + " " + l; })
| i1 = ID { $str = i1.getText(); };
Best regards,
Patrick
Fri, 2011-06-24, 14:17
#5
Re: Re: Backtracking with combinator parsers
Have you tried with a negative look-ahead in the parser for list? So,
that the parser "knows" in advance that the list is now finished.
That seems to work:
lazy val prod: PackratParser[Any] = rep1("a" ~ "b" ~ rep1(not("ab") ~ "a"))
On Fri, Jun 24, 2011 at 2:13 PM, Patrick Roemer wrote:
> Responding to Tony Sloane:
>>
>> On 24/06/2011, at 8:48 PM, Patrick Roemer wrote:
>>
>>> lazy val syntax: PackratParser[Any] = rule | (rule ~ syntax)
>>> lazy val rule: PackratParser[Any] = "a" ~ "b" ~ list
>>> lazy val list: PackratParser[Any] = "a" | ("a" ~ list)
>>
>> These rules are unlikely to be what you really want. The | operator in
>> the packet
>> parser library is an ordered choice operator (as for all PEG-style
>> libraries).
>> Thus, something of the form:
>>
>> foo | foo bar ble
>>
>> will never reach the "foo bar ble" alternative, since a valid input will
>> start with
>> a foo and the first choice will be committed.
>
> I see, thanks. Fixing this indeed makes the "abaa" succeed, but this leaves
> me with the same problem as the original version: No retry to start a new
> rule. So I still wonder whether this is possible using combinator parsers.
>
> I've tried to port this simple example to ANTLR, and there it seems to work
> (with explicit backtracking option):
>
> bnf_grammar returns [String str]:
> s = bnf_syntax { $str = s; } EOF ;
> bnf_syntax returns [String str] options { backtrack = true; }:
> (r2 = bnf_rule s = bnf_syntax { $str = r2 + "\n" + s; })
> | r1 = bnf_rule { $str = r1;};
> bnf_rule returns [String str]:
> i = ID ASSIGN l = bnf_list { $str = i.getText() + " ::= " + l; };
> bnf_list returns [String str]:
> (i2 = ID l = bnf_list { $str = i2.getText() + " " + l; })
> | i1 = ID { $str = i1.getText(); };
>
> Best regards,
> Patrick
>
>
Fri, 2011-06-24, 15:37
#6
Re: Backtracking with combinator parsers
Responding to Johannes Rudolph:
> Have you tried with a negative look-ahead in the parser for list? So,
> that the parser "knows" in advance that the list is now finished.
>
> That seems to work:
>
> lazy val prod: PackratParser[Any] = rep1("a" ~ "b" ~ rep1(not("ab") ~ "a"))
Thanks a lot, that was the missing link. This also seems to work for my
actual BNF parser:
lazy val bnf_prod: PackratParser[Any] =
ident ~ ASSIGN ~ bnf_rule
lazy val bnf_rule: PackratParser[Any] =
rep1(not(bnf_alt ~ ASSIGN) ~ bnf_alt)
Best regards,
Patrick
On Thursday 23 June 2011, Patrick Roemer wrote:
> Hi,
>
> (how) is it possible to handle a grammar like this with combinator
> parsers?
>
> prod ::= ("a" "b" "a"+)+
>
> object TestParser extends JavaTokenParsers with PackratParsers {
> lazy val prod: PackratParser[Any] = rep1("a" ~ "b" ~ rep1("a"))
> }
>
> I've tried all kinds of variants of the grammar as well as the parser
> construction, but it looks like backtracking doesn't extend to this
> scenario. (Naive left recursion works fine.)
>
> The actual use case is parsing a (rather informal and sloppily
> formatted) BNF grammar, like
>
> a ::= b |
> c
> b ::=
> IDENT
> c ::= IDENT
>
> Best regards,
> Patrick
What inputs produce what undesired parse?
I ask 'cause your grammar is strictly ambiguous owing to the "a" at the
beginning and the end of the closed group.
And your BNF is ambiguous for a different reason: Each branch of the
alternation in the production for "a" is identical.
Randall Schulz