Using lookahead in the new parser in trunk

5 replies

Wed, 2009-05-27, 10:47

Anders Bach Nielsen

Joined: 2008-12-17,

Hey All

I am currently updating an old patch to the Parser
(ast/parser/Parsers.scala) but I have a problem using the next token!

I can use in.token to test what token I am looking at now, but
in.next.token is always EMPTY (-3) ...

So what I want to do and what worked before is

if (in.token == SUPER && in.next.token != LBRACKET && in.next.token != DOT ) {

This will find all cases where the super keyword is used and is not part
of a path or anything. How should something like this be written now?

/Anders

Wed, 2009-05-27, 14:47

Anders Bach Nielsen

Joined: 2008-12-17,

Re: Using lookahead in the new parser in trunk

After a little talk with paulp on IRC, we wondered about the origin of
Scanner1 (before it was merged in as Scanner in the current trunk).

Before, there were Scanner and NewScanner, and it was NewScanner that was
used in the compiler. Then Martin created Scanner1 from some unknown file
(there is no svn log showing the origin of Scanner1; if you know
better, please say so).
This new Scanner1 shares some code directly with NewScanner, but some
things are missing.
In Scanner1 (the current Scanner) there are next and prev tokens that
can be used for lookahead (if you trust the comment), but most of the
time next is EMPTY.

My understanding of how the scanner should have been designed is that
in.token holds the current token, in.next.token holds the next token and
prev holds the previous one. When we call nextToken() we move the
content of next into this and load the following token into in.next. This
way you have one token of lookahead. Currently this is not what is
happening, as far as I can see in the source.
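
To make the expected behaviour concrete, here is a minimal sketch of that eager design. It is a toy model, not the scalac Scanner: the names EagerLookahead and TokenBuffer are made up, tokens are plain Ints, and only EMPTY (-3) comes from the thread.

```scala
// Toy model of the eager one-token-lookahead design described above.
// All names here are hypothetical; this is not code from scalac.
object EagerLookahead {
  val EMPTY = -3 // same value as the EMPTY token mentioned in the thread
  val EOF = 0    // assumed end-of-input marker

  final class TokenBuffer(input: Iterator[Int]) {
    var prev: Int = EMPTY  // token before the current one
    var token: Int = EMPTY // current token
    var next: Int = EMPTY  // token after the current one, always loaded

    private def fetch(): Int = if (input.hasNext) input.next() else EOF

    // Prime the buffer so that token and next are valid from the start.
    token = fetch()
    next = fetch()

    // Shift the window by one: prev <- token <- next <- newly scanned token.
    def nextToken(): Unit = {
      prev = token
      token = next
      next = fetch()
    }
  }
}
```

With this layout next is never EMPTY once the buffer is primed, at the cost of always scanning one token further than the parser has asked for.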

/Anders


Wed, 2009-05-27, 14:57

odersky

Joined: 2008-07-29,

Re: Using lookahead in the new parser in trunk

Scanners1 was the old Scanners before Sean replaced it with
NewScanners. So Scanners1/Scanners is a reversion to the pre-Sean state.
NewScanners had issues with the new collections and I did not
understand it sufficiently well to fix it with confidence. Besides, a
lot of it was obsolete, dating from some failed experiments with the
Eclipse plugin.

next and prev are for internal Scanner consumption only. The scanner
stores the next token there only if it was forced to do a lookahead.
Otherwise it is empty.

(We are currently re-doing the Eclipse plugin. Having a really fast
Scanners & Parsers combo is essential for this, because it will be
invoked on every keystroke. So, I am against nice-to-haves that would
slow down the Scanner.)

Hope this helps

Wed, 2009-05-27, 15:07

extempore

Joined: 2008-12-17,

Re: Using lookahead in the new parser in trunk

On Wed, May 27, 2009 at 03:53:30PM +0200, martin odersky wrote:
> (We are currently re-doing the Eclipse plugin. Having a real fast
> Scanners & Parsers combo is essential for this, because it will be
> invoked on every keystroke. So, I am against nice to have's that would
> slow down the Scanner.)

I can totally understand not keeping the next token "hot", but unless
you object I figured I would encapsulate the logic already used where a
lookahead is necessary so at least it doesn't get duplicated:

    if (token == CASE) {
      prev copyFrom this
      val nextLastOffset = charOffset - 1
      fetchToken()
      ...
      } else {
        lastOffset = nextLastOffset
        next copyFrom this
        this copyFrom prev
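
One way such an encapsulation could look is sketched below. This is a toy model under stated assumptions, not the actual Scanners code: LazyScanner, peekToken and isBareSuper are hypothetical names, tokens are plain Ints instead of the scanner's token data, and the token codes other than EMPTY are invented. The point is only that next stays EMPTY until somebody asks for a lookahead, so the common path pays nothing extra.

```scala
// Toy model of on-demand lookahead; all names are hypothetical.
object LazyLookahead {
  val EMPTY = -3 // same value as the EMPTY token mentioned in the thread
  val EOF = 0    // assumed end-of-input marker
  // Invented token codes, for illustration only.
  val SUPER = 1
  val DOT = 2
  val LBRACKET = 3
  val IDENT = 4

  final class LazyScanner(input: Iterator[Int]) {
    var token: Int = EMPTY        // current token
    private var next: Int = EMPTY // filled only after a lookahead

    private def fetch(): Int = if (input.hasNext) input.next() else EOF

    // Advance, reusing a previously peeked token if there is one.
    def nextToken(): Unit =
      if (next != EMPTY) { token = next; next = EMPTY }
      else token = fetch()

    // The single place that performs lookahead; repeated calls at the
    // same position do not scan again.
    def peekToken(): Int = {
      if (next == EMPTY) next = fetch()
      next
    }

    nextToken() // load the first token
  }

  // The check from the start of the thread, phrased with the helper.
  def isBareSuper(in: LazyScanner): Boolean =
    in.token == SUPER && {
      val la = in.peekToken()
      la != LBRACKET && la != DOT
    }
}
```

A parser would call isBareSuper(in) where it previously tested in.next.token directly; the scanner only looks ahead when the current token is actually SUPER.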

Wed, 2009-05-27, 15:17

Anders Bach Nielsen

Joined: 2008-12-17,

Re: Using lookahead in the new parser in trunk

Hey Martin

Yes, from the Eclipse perspective a fast scanner/parser combo is
important; that I can see and I won't argue ;-)

But the early definitions syntax, as you suggested yourself, should use
the super keyword as a delimiter between the early definition statements
and the normal body statements. This means that I have to identify the
bare super statement and not super[X] or super.something, because they
are also valid, but not as the delimiter between the two parts.

One solution could be to create a new token SUPERKW (Super Key Word)
that is identified in the same way as CASECLASS and the like in the
Scanner. This way a simple match in the Parser can be implemented to
find this special super delimiter. What is your opinion about this, or
do you have a better idea?

/Anders


Sat, 2009-05-30, 22:17

odersky

Joined: 2008-07-29,

Re: Using lookahead in the new parser in trunk

Hi Paul, Anders:

Yes, adding a method for getting a lookahead token in Scanners makes sense.

Cheers
