
"Content is not allowed in prolog" Parsing XML

4 replies
GA
Joined: 2009-10-23

Hello guys,

I have the following line of Scala code:

XML.load(new java.io.InputStreamReader(conn.getInputStream, "UTF-8"))

I am downloading and parsing RSS feeds. This line parses most sites without problems, but some sites give the following error:

Content is not allowed in prolog

Any ideas on how to solve this problem?

Thanks in advance,

GA

Pavol Vaskovic
Joined: 2009-11-10
Re: "Content is not allowed in prolog" Parsing XML
On Thu, Nov 12, 2009 at 12:32 PM, GA <my_lists@me.com> wrote:
> Any ideas on how to solve this problem?

It looks like the problematic feeds are not well-formed XML. Try validating them to confirm whether that is the problem: http://beta.feedvalidator.org/

If that is the case, you cannot simply rely on XML parsing and will have to resort to a more liberal parser.

Regards
Pavol Vaskovic
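
To confirm Pavol's diagnosis locally before reaching for the online validator, a feed can be run through a strict, non-validating parser and the first fatal error reported with its position. The following is a minimal sketch using only the JDK's SAX API (`SAXParserFactory`); the `WellFormed` object and its `check` method are hypothetical names, and it checks XML well-formedness only, not RSS validity.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper: a strict well-formedness check built on the JDK's SAX parser.
object WellFormed {
  /** Returns None if `doc` parses as well-formed XML, otherwise Some(description of the first fatal error). */
  def check(doc: String): Option[String] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      // DefaultHandler ignores content events and rethrows fatal errors as SAXParseException.
      parser.parse(new InputSource(new StringReader(doc)), new DefaultHandler)
      None
    } catch {
      case e: SAXParseException =>
        Some(s"line ${e.getLineNumber}, column ${e.getColumnNumber}: ${e.getMessage}")
    }
  }
}
```

For a feed already downloaded into a `String`, a `Some(...)` result points at the feed itself being broken, which is the case where a more liberal parser is needed.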
Stuart MacKay
Joined: 2009-11-12
Re: "Content is not allowed in prolog" Parsing XML

Make sure the XML document you are downloading does not start with a Byte Order
Mark (BOM), which happens when the server simply serves a file saved with one.

Regards,
Stuart MacKay
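
Stuart's point can be made concrete. In UTF-8 the BOM is the three bytes `EF BB BF`, and Java's UTF-8 decoder (the one behind `new InputStreamReader(..., "UTF-8")`) passes it through as the character U+FEFF instead of dropping it, so the parser sees a character before `<?xml` and reports "Content is not allowed in prolog". A minimal sketch of a workaround, assuming the feed really is UTF-8, is to peek at the first three bytes and discard them only if they are the BOM. `BomSafeReader` and `utf8Reader` are hypothetical names.

```scala
import java.io.{InputStream, InputStreamReader, PushbackInputStream, Reader}
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical helper: a UTF-8 Reader that skips a leading byte order mark.
object BomSafeReader {
  // The UTF-8 encoding of U+FEFF (the byte order mark).
  private val Utf8Bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)

  def utf8Reader(in: InputStream): Reader = {
    val pushback = new PushbackInputStream(in, Utf8Bom.length)
    val head = new Array[Byte](Utf8Bom.length)

    // Read up to three bytes; a single read() call may return fewer than requested.
    @annotation.tailrec
    def fill(n: Int): Int =
      if (n == head.length) n
      else {
        val r = pushback.read(head, n, head.length - n)
        if (r == -1) n else fill(n + r)
      }

    val n = fill(0)
    val hasBom = n == Utf8Bom.length && Arrays.equals(head, Utf8Bom)
    // Anything that was not a BOM goes back onto the stream for the decoder to see.
    if (!hasBom && n > 0) pushback.unread(head, 0, n)
    new InputStreamReader(pushback, StandardCharsets.UTF_8)
  }
}
```

With `scala.xml`, the original call would then become `XML.load(BomSafeReader.utf8Reader(conn.getInputStream))`.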

Florian Hars 2
Joined: 2009-11-01
Re: "Content is not allowed in prolog" Parsing XML

GA wrote:
> Any ideas on how to solve this problem?

Make the sites produce valid feeds. Otherwise, follow Postel's law and don't use an
XML parser to parse feeds; that advice hasn't changed in the last seven or so years:

http://diveintomark.org/archives/2002/08/13/ultraliberal_rss_parser
http://www.xml.com/pub/a/2003/01/22/dive-into-xml.html
http://www.ibm.com/developerworks/xml/library/x-tipufp.html

- Florian.
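
The articles Florian links describe full liberal feed parsers, which go much further than this. As a far narrower step in the same "be liberal in what you accept" spirit, one could drop everything before the first `<` (a stray BOM character, leading whitespace, or other text the server emitted) and only then hand the text to the XML parser. This is a minimal sketch; `PrologCleaner` is a hypothetical name, and it does nothing for feeds that are malformed further in.

```scala
// Hypothetical helper: strips anything that precedes the first markup character.
object PrologCleaner {
  /** Returns `doc` starting at its first '<', or `doc` unchanged if it has no '<' or already starts with one. */
  def dropLeadingJunk(doc: String): String = {
    val start = doc.indexOf('<')
    if (start <= 0) doc else doc.substring(start)
  }
}
```

Usage would be along the lines of `XML.loadString(PrologCleaner.dropLeadingJunk(body))`, where `body` is the downloaded feed text.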

dcsobral
Joined: 2009-04-23
Re: "Content is not allowed in prolog" Parsing XML
Try using the XHtml object to parse it instead.

--
Daniel C. Sobral

Veni, vidi, veterni.

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland