Any good Scala/Java solutions for sanitizing HTML?
Wed, 2009-01-21, 04:41
I'd like to use the Scala XPath features, and it's quite possible some
of the HTML I'll be dealing with won't be properly formatted. Can
someone recommend a good sanitizer?
Thanks,
Ken
Wed, 2009-01-21, 08:27
#2
Re: Any good Scala/Java solutions for sanitizing HTML?
I was looking into this recently, and I found an article that was helpful. The comments are worth reading too.
http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/
Cheers
Rich
On Wed, Jan 21, 2009 at 7:22 PM, Florian Hars <hars@bik-gmbh.de> wrote:
Kenneth McDonald wrote:
> I'd like to use the Scala XPath features, and it's quite possible some
> of the HTML I'll be dealing with won't be properly formatted. Can
> someone recommend a good sanitizer?
http://www.nabble.com/How-to-use-TagSoup-with-Scala-XML--td17575225.html
- Florian
--
http://www.richdougherty.com/
Sat, 2009-01-24, 21:57
#3
Re: Any good Scala/Java solutions for sanitizing HTML?
Rich Dougherty wrote:
> I was looking into this recently, and I found an article that was
> helpful. The comments are worth reading too.
>
> http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/
Most of those are DOM parsers, while Scala's XML library expects a SAX
parser. I posted code for the two that can be used without a DOM-to-SAX
converter here:
http://www.hars.de/2009/01/html-as-xml-in-scala.html
- Florian
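
A minimal sketch of the SAX route described above, assuming TagSoup and the scala-xml module are both on the classpath. This is not the code from the linked post; the object name `TagSoupSketch`, the helper `parseHtml`, and the sample HTML are made up for illustration. It relies on TagSoup's JAXP wrapper (`org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl`) and on `scala.xml.XML.withSAXParser`, so behaviour may differ between library versions.

```scala
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl
import scala.xml.{Elem, XML}

object TagSoupSketch {
  // Parse possibly malformed HTML into a scala.xml tree by handing
  // Scala's XML loader a SAX parser that is backed by TagSoup.
  def parseHtml(html: String): Elem = {
    val saxParser = new SAXFactoryImpl().newSAXParser()
    XML.withSAXParser(saxParser).loadString(html)
  }

  def main(args: Array[String]): Unit = {
    // The <p> and <b> tags are never closed, so this is not well-formed XML.
    val messy = "<p>See <a href=\"http://www.scala-lang.org/\">Scala</a><p>and <b>more"
    val doc = parseHtml(messy)
    // XPath-like projection: find every <a> element, then read its href attribute.
    val links = (doc \\ "a").map(a => (a \ "@href").text)
    links.foreach(println)
  }
}
```

Since TagSoup already presents itself as a SAX parser, no DOM-to-SAX conversion step is needed before the `\` and `\\` projections can be used on the result.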