
push parsers with chunks?

3 replies
Henry Story
I am looking for a parser that can accept chunks of data at a time so that I can do the parsing efficiently in a netty client thread.
The RDF parsing library I am currently using does not allow this. But if it did, it would work something like this gist:
   def onBodyPartReceived(bodyPart: HttpResponseBodyPart) = {
     logger.info("body part n."+counter)
     counter += 1
     reader.read(model, new ByteArrayInputStream(bodyPart.getBodyPartBytes), base)
     STATE.CONTINUE
   }

The idea is to use the parser inside the async-http-client, but doing so inside an akka based client would require the same. A chunked push parser is important for efficiency, to avoid having to pull all information into memory.
Pull parsers don't work, as they end up consuming a whole thread. (This is the problem with the current library I am using: it blocks on reader.read(in).)
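To make the push model concrete, here is a minimal sketch in Scala. It is not taken from any existing library: LinePushParser, feed and finish are hypothetical names, and it assumes a line-oriented, UTF-8 encoded format where each complete line is one parsed item. The point is only that the caller hands over chunks as they arrive and the parser keeps the unfinished remainder between calls, so no thread ever blocks waiting for input.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical push parser: the caller feeds byte chunks as they arrive
// and gets back whatever complete lines those bytes finished.
final class LinePushParser {
  // Bytes of the current, still incomplete line. Buffering raw bytes means
  // a multi-byte UTF-8 character split across two chunks is decoded intact.
  private val pending = new ByteArrayOutputStream()

  /** Feed one chunk; returns the lines completed by it. Never blocks. */
  def feed(chunk: Array[Byte]): List[String] = {
    val out = List.newBuilder[String]
    chunk.foreach { b =>
      if (b == '\n'.toByte) {
        out += new String(pending.toByteArray, UTF_8)
        pending.reset()
      } else pending.write(b.toInt)
    }
    out.result()
  }

  /** Signal end of input; returns a trailing unterminated line, if any. */
  def finish(): Option[String] =
    if (pending.size() == 0) None
    else {
      val last = new String(pending.toByteArray, UTF_8)
      pending.reset()
      Some(last)
    }
}

object LinePushParserDemo {
  // Three statements arriving in two chunks, the second one split mid-line.
  def run(): List[String] = {
    val parser = new LinePushParser
    val chunks = List("<a> <b> <c> .\n<d> <e", "> <f> .\n<g> <h> <i> .")
    val fromChunks = chunks.flatMap(c => parser.feed(c.getBytes(UTF_8)))
    fromChunks ++ parser.finish().toList
  }
}
```

In a callback such as onBodyPartReceived, each body part would be passed to feed and the callback would return immediately; finish would be called once the response completes. A real grammar needs more parser state than "bytes of the current line", but the shape of the interface stays the same.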
I was looking at the Scala Parser, but I am not sure whether it allows this.
Some (many? most?) SAX parsers support such a push model. But XML is not the only document format we need to parse. Given Scala's emphasis on scalability, it would seem that such a push model is really needed for good agent programming. So I thought there must be somewhere I should look for this. Any ideas?
Henry
PS. Here is a JavaScript push parser, just in case I am not using the right technical word:
   https://github.com/polotek/libxmljs/wiki/SaxPushParser
Social Web Architect
http://bblfish.net/
Alex Cruise
Re: push parsers with chunks?
On Mon, Jan 30, 2012 at 11:48 AM, Henry Story <henry.story@gmail.com> wrote:
> I am looking for a parser that can accept chunks of data at a time so that I can do the parsing efficiently in a netty client thread.

The only open source "pushme-pullyou" parser I'm aware of is https://github.com/FasterXML/aalto-xml
-0xe1a
Chris Twiner
Re: push parsers with chunks?

On Mon, Jan 30, 2012 at 9:26 PM, Alex Cruise wrote:
> On Mon, Jan 30, 2012 at 11:48 AM, Henry Story wrote:
>>
>> I am looking for a parser that can accept chunks of data at a time so
>> that I can do the parsing efficiently in a netty client thread.
>
>
> The only open source "pushme-pullyou" parser I'm aware of
> is https://github.com/FasterXML/aalto-xml
>
> -0xe1a

Yep, I've also found that's the only game in town currently.

I'm going to be attempting an extended pull reader for aalto for
Scales in an upcoming release, as this fits *very* well with Iteratee
based processing. The Iteratee can simply get called when there is
more work to be done - the pull parser yielding EVENT_INCOMPLETE
implying an Empty.
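
The mapping described above can be sketched roughly as follows. This is only an illustration with made-up types: Input, Pull and PullToInput are hypothetical and are not the Scales or aalto API; Incomplete merely stands in for the parser's EVENT_INCOMPLETE result.

```scala
// Input signals in the iteratee style: an element, nothing yet, or end of input.
sealed trait Input[+A]
final case class El[A](value: A) extends Input[A]
case object Empty extends Input[Nothing]
case object EOF extends Input[Nothing]

// What a non-blocking pull parser can report when asked for the next event.
sealed trait Pull[+E]
final case class Event[E](event: E) extends Pull[E]
case object Incomplete extends Pull[Nothing] // stand-in for EVENT_INCOMPLETE
case object Finished extends Pull[Nothing]

object PullToInput {
  // Incomplete becomes Empty, so the consumer simply returns and is
  // called again once more bytes have been fed to the parser.
  def toInput[E](p: Pull[E]): Input[E] = p match {
    case Event(e)   => El(e)
    case Incomplete => Empty
    case Finished   => EOF
  }

  // A tiny consumer: counts real events and ignores Empty and EOF.
  def countEvents[E](pulls: Seq[Pull[E]]): Int =
    pulls.map(p => toInput(p)).foldLeft(0) {
      case (n, El(_)) => n + 1
      case (n, Empty) => n
      case (n, EOF)   => n
    }
}
```

For example, PullToInput.countEvents(Seq(Event("start"), Incomplete, Event("end"), Finished)) counts two events; the Incomplete in the middle costs nothing but a return to the caller.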

By the way, that Dr Dolittle reference is perfect for describing that
relationship :-)

Henry Story
streaming parsers Was: push parsers with chunks?

Thanks a lot! That should help me cover XML :-)

So that still leaves me with the question of parsers for non-XML
syntaxes such as JSON, Turtle [1], XSPARQL, and whatever other
domain-specific language notations exist out there...

This Nomo library seems to be doing what I am looking for:
https://bitbucket.org/pchiusano/nomo

[[
A parser combinator library for Scala supporting streaming parsing, user state, and fine-grained control over backtracking and error reporting. Parsers can be written via the combinators or in natural recursive style - results are trampolined to avoid stack overflows when combining results in arbitrary recursive or mutually recursive grammars.
]]

[1] http://www.w3.org/TR/turtle/


Social Web Architect
http://bblfish.net/

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland