XhtmlParser (or ConstructingParser) Usage

3 replies
Bastian, Mark
Hi Folks,

I am exploring the scala.xml.parsing package and am trying to load some HTML in the REPL. Here's the command I am executing (it's long, but complete):

scala.xml.parsing.XhtmlParser(scala.io.Source.fromURL(java.net.URI.create("http://www.java.net/").toURL))

When I run it, I get a long stream of parse errors like these (what seems like hundreds of lines):

:17:24: '/' expected instead of ''                       ^
:17:24: name expected, but char '' cannot start a name                       ^
:17:24: '>' expected instead of ''                       ^

The final result is this:
res1: scala.xml.NodeSeq = Document(<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>< The=""><><><><><><><><><><><></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></></>)

Clearly, something isn’t being parsed correctly. Is there something that I am doing incorrectly? Any tips on what I should be doing?

I get similar results when I try to use the ConstructingParser.

Thanks,
Mark
David Pollak
Re: XhtmlParser (or ConstructingParser) Usage
Java.net is not an XHTML site; it's HTML, which is not well-formed XML, so it's not going to parse as XML. Sorry.
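
To illustrate the distinction, here is a minimal sketch that feeds XhtmlParser a small, well-formed XHTML string instead of a live HTML page. It assumes the scala-xml module is on the classpath (in recent Scala versions it is a separate dependency), and the sample markup and the names wellFormed, doc, and title are made up for the example.

```scala
import scala.io.Source
import scala.xml.NodeSeq
import scala.xml.parsing.XhtmlParser

// Well-formed XHTML: every element is closed (note <br/>) and
// there is exactly one root element.
val wellFormed =
  "<html><head><title>Demo</title></head><body><p>Hello<br/></p></body></html>"

// Same entry point as in the question, but reading from a string.
val doc: NodeSeq = XhtmlParser(Source.fromString(wellFormed))

// Once the input parses cleanly, the usual XPath-like projections work.
val title = (doc \\ "title").text
```

Typical real-world HTML (unclosed tags, unquoted attributes, stray ampersands) breaks these well-formedness rules, which is where the stream of "expected" errors comes from.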

On Tue, May 25, 2010 at 2:32 PM, Bastian, Mark <mbastia@sandia.gov> wrote:
[...]

--
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Surf the harmonics
Bastian, Mark
Re: XhtmlParser (or ConstructingParser) Usage
Is there any canned functionality in the API for handling HTML? Just wondering.

On 5/25/10 5:20 PM, "David Pollak" <feeder.of.the.bears@gmail.com> wrote:

[...]

David Pollak
Re: XhtmlParser (or ConstructingParser) Usage


On Wed, May 26, 2010 at 6:43 AM, Bastian, Mark <mbastia@sandia.gov> wrote:
Is there any canned functionality in the API for handling HTML? Just wondering.

No. There are some Java libraries that parse HTML (you can Google for them), and you can use them from Scala.
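
As a rough sketch of what that can look like, the snippet below uses jsoup. That choice is an assumption, since the thread does not name a library, and jsoup must be added as a third-party dependency. The sample markup and the value names are invented for the example.

```scala
import org.jsoup.Jsoup

// Tag-soup HTML: unclosed <p> and <br>, unquoted attribute value.
// An XML parser would reject this; an HTML parser repairs it.
val messyHtml =
  "<html><head><title>Demo</title></head><body>" +
  "<p>Unclosed paragraph<br><a href=http://example.org>a link</a>" +
  "</body></html>"

// Jsoup.parse builds a normalized document tree from the string.
val page = Jsoup.parse(messyHtml)

val pageTitle = page.title()

// select takes a CSS selector; first() returns the first match.
val firstLink = page.select("a").first()
val href = firstLink.attr("href")
val linkText = firstLink.text()
```

Other HTML parsers on the JVM can be called from Scala in the same way, since Java classes are used directly with no wrapper.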



Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland