"Content is not allowed in prolog" Parsing XML
Thu, 2009-11-12, 12:33
Hello guys,
I have the following line of Scala code:
XML.load(new java.io.InputStreamReader(conn.getInputStream, "UTF-8"))
I am downloading and parsing RSS feeds. This line parses most sites without problems, but some sites give the following error:
Content is not allowed in prolog
Any ideas on how to solve this problem?
Thanks in advance,
GA
Thu, 2009-11-12, 13:27
#2
Re: "Content is not allowed in prolog" Parsing XML
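Check whether the XML document you are downloading comes from a file that
starts with a Byte Order Mark (BOM). If the BOM reaches the parser as a
character, it counts as content before the XML declaration, which produces
exactly this error.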
Regards,
Stuart MacKay
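
A minimal sketch of the BOM check suggested above, assuming the feed is UTF-8 and that the stray character is a UTF-8 BOM (bytes EF BB BF). The names BomStripper and skipUtf8Bom are hypothetical and are not part of any library; the code uses only java.io.

```scala
import java.io.{InputStream, PushbackInputStream}

object BomStripper {
  // UTF-8 encoding of U+FEFF (the byte order mark).
  private val Utf8Bom: Array[Byte] =
    Array(0xEF.toByte, 0xBB.toByte, 0xBF.toByte)

  /** Returns a stream positioned after a leading UTF-8 BOM, if there is one. */
  def skipUtf8Bom(in: InputStream): InputStream = {
    val pushback = new PushbackInputStream(in, Utf8Bom.length)
    val head = new Array[Byte](Utf8Bom.length)

    // A single read() may return fewer bytes than requested, so loop
    // until three bytes are read or the stream ends.
    var n = 0
    var eof = false
    while (n < head.length && !eof) {
      val r = pushback.read(head, n, head.length - n)
      if (r < 0) eof = true else n += r
    }

    val hasBom = n == Utf8Bom.length && java.util.Arrays.equals(head, Utf8Bom)
    // No BOM: push the bytes back so the caller sees the stream unchanged.
    if (!hasBom && n > 0) pushback.unread(head, 0, n)
    pushback
  }
}
```

With this in place, the original line could become XML.load(new java.io.InputStreamReader(BomStripper.skipUtf8Bom(conn.getInputStream), "UTF-8")). Passing the raw InputStream to XML.load instead of a Reader may also help, since the underlying parser can then detect the encoding itself, but that depends on the parser in use.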
> It looks like the problematic feeds are not well-formed XML. Try
> validating them to confirm whether this is the problem:
> http://beta.feedvalidator.org/
>
> If that is the case, you cannot simply rely on XML parsing and will
> have to resort to a more liberal parser.
>
> Regards
> Pavol Vaskovic
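
As a local complement to the online validator mentioned above, here is a rough sketch that checks whether downloaded bytes are well-formed XML and reports where parsing fails. It uses the JDK's javax.xml.parsers API; the names WellFormedCheck and check are hypothetical. It assumes the feed has already been read into a byte array, and it only tests well-formedness, not validity as RSS.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.{ErrorHandler, SAXParseException}

object WellFormedCheck {
  /** Right(()) if the bytes parse as XML, otherwise Left with the error position. */
  def check(bytes: Array[Byte]): Either[String, Unit] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()

    // Rethrow parse errors instead of letting the default handler
    // print them to stderr.
    builder.setErrorHandler(new ErrorHandler {
      def warning(e: SAXParseException): Unit = ()
      def error(e: SAXParseException): Unit = throw e
      def fatalError(e: SAXParseException): Unit = throw e
    })

    try {
      builder.parse(new ByteArrayInputStream(bytes))
      Right(())
    } catch {
      case e: SAXParseException =>
        Left(s"line ${e.getLineNumber}, column ${e.getColumnNumber}: ${e.getMessage}")
    }
  }
}
```

A failure reported at line 1, column 1 suggests that something precedes the XML declaration, such as a BOM, stray whitespace, or an HTML error page returned in place of the feed.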
Fri, 2009-11-13, 08:27
#3
Re: "Content is not allowed in prolog" Parsing XML
GA wrote:
> Any ideas on how to solve this problem?
Make the sites produce valid feeds. Otherwise, follow Postel's law and don't use an
XML parser to parse feeds. That hasn't changed in the last seven or so years:
http://diveintomark.org/archives/2002/08/13/ultraliberal_rss_parser
http://www.xml.com/pub/a/2003/01/22/dive-into-xml.html
http://www.ibm.com/developerworks/xml/library/x-tipufp.html
- Florian.
Fri, 2009-11-13, 15:17
#4
Re: "Content is not allowed in prolog" Parsing XML
Try using the XHtml object to parse it instead.
--
Daniel C. Sobral
Veni, vidi, veterni.