More elegant way of reading HTML from a URL than this?
Wed, 2009-01-21, 04:51
Here's a bit of code I wrote to read the HTML from a URL, and return
it as a string. I was wondering if a Scala guru could show me the
"right" way to do this. I'm sure there's a more elegant solution.
-----------------------------
class URLLineReader(url:String) extends Iterator[String] {
  val reader = new java.io.BufferedReader(
    new java.io.InputStreamReader(new java.net.URL(url).openStream()))
  var line:String = null;

  def hasNext = {
    line = reader.readLine()
    line != null
  }

  def next = line
}

object Main {
  def main(args: Array[String]) {
    val reader = new URLLineReader("http://www.yahoo.com/")
    val html = (for (line <- reader) yield line).mkString("")
    println(html)
  }
}
------------------------------
Wed, 2009-01-21, 09:17
#2
Re: More elegant way of reading HTML from a URL than this?
Scala apart, it's quite bad style for hasNext() not to be idempotent.
Kenneth McDonald wrote:
> Here's a bit of code I wrote to read the HTML from a URL, and return
> it as a string. I was wondering if a Scala guru could show me the
> "right" way to do this. I'm sure there's a more elegant solution.
>
> -----------------------------
> class URLLineReader(url:String) extends Iterator[String] {
> val reader = new java.io.BufferedReader(new
> java.io.InputStreamReader(new java.net.URL(url).openStream()))
> var line:String = null;
>
> def hasNext = {
> line = reader.readLine()
> line != null
> }
>
> def next = line
> }
>
> object Main {
> def main(args: Array[String]) {
> val reader = new URLLineReader("http://www.yahoo.com/")
> val html = (for (line <- reader) yield line).mkString("")
> println(html)
> }
> }
> ------------------------------
>
>
Wed, 2009-01-21, 10:57
#3
Re: More elegant way of reading HTML from a URL than this?
Besides that, the code doesn't honor the Iterator contract: next() doesn't return the next line, as it is supposed to, unless you have called hasNext() first.
So it's not only bad style, it's just plain wrong :)
On Wed, Jan 21, 2009 at 09:07, Dimitris Andreou <jim.andreou@gmail.com> wrote:
> Scala apart, it's quite bad style for hasNext() not to be idempotent.
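Both objections above (hasNext should be idempotent, and next should work without a preceding hasNext call) can be met by buffering one line of lookahead. What follows is only a minimal sketch: the class name `LineIterator` is hypothetical and does not come from the thread, and it wraps any `java.io.BufferedReader` rather than opening the URL itself.

```scala
import java.io.{BufferedReader, StringReader}

// Hypothetical name; wraps any BufferedReader and exposes its lines.
class LineIterator(reader: BufferedReader) extends Iterator[String] {
  // The line read ahead but not yet handed out; null means nothing is buffered.
  private var buffered: String = null
  private var exhausted = false

  // Reads from the stream only when nothing is buffered, so calling
  // hasNext repeatedly never skips a line.
  def hasNext: Boolean = {
    if (buffered == null && !exhausted) {
      buffered = reader.readLine()
      if (buffered == null) exhausted = true
    }
    buffered != null
  }

  // Works whether or not the caller checked hasNext first.
  def next(): String = {
    if (!hasNext) throw new NoSuchElementException("no more lines")
    val line = buffered
    buffered = null
    line
  }
}

object LineIteratorDemo {
  def main(args: Array[String]): Unit = {
    val it = new LineIterator(new BufferedReader(new StringReader("a\nb")))
    it.hasNext; it.hasNext      // repeated calls are harmless
    println(it.next())          // a
    println(it.next())          // b
    println(it.hasNext)         // false
  }
}
```

To read from a URL, the same class could be handed a BufferedReader built over `new java.net.URL(url).openStream()`, as in the original post.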
Wed, 2009-01-21, 12:27
#4
Re: More elegant way of reading HTML from a URL than this?
class URLLineReader(url: String) {
  val reader = new java.io.BufferedReader(
    new java.io.InputStreamReader(new java.net.URL(url).openStream(), "US-ASCII"))

  def foldLeft[T](init: T)(f: (T, String) => T): T = reader.readLine match {
    case null => init
    case line => foldLeft(f(init, line))(f)
  }
}

object Main {
  def main(args: Array[String]) = println(new URLLineReader("http://www.yahoo.com/").foldLeft("")(_ + _))
}
Wed, 2009-01-21, 17:27
#5
Re: More elegant way of reading HTML from a URL than this?
At least the original not-so-precise version was almost linear. This is
quadratic. And pages tend to be quite lengthy these days, so beware.
Ricky Clarkson wrote:
> class URLLineReader(url: String) {
> val reader = new java.io.BufferedReader(new
> java.io.InputStreamReader(new java.net.URL(url).openStream(),
> "US-ASCII"));
> def foldLeft[T](init: T)(f: (T, String) => T): T = reader.readLine
> match {
> case null => init
> case line => foldLeft(f(init, line))(f)
> }
> }
>
> object Main {
> def main(args: Array[String]) = println(new
> URLLineReader("http://www.yahoo.com/").foldLeft("")(_ + _))
> }
Wed, 2009-01-21, 17:37
#6
Re: More elegant way of reading HTML from a URL than this?
TRUE, TRUE, TRUE (!)
> Scala apart, it's quite bad style for hasNext() not to be idempotent.
--
__~O
-\ <, Christos KK Loverdos
(*)/ (*) http://ckkloverdos.com
Wed, 2009-01-21, 17:47
#7
Re: More elegant way of reading HTML from a URL than this?
InputStreamResource.url("http://...").readString
InputStreamResource.url("http://...").readLines
InputStreamResource.url("http://...").lines.foreach(println(_))
InputStreamResource is part of scalax.
BTW, I think InputStreamResource-like classes should be included in
the Scala standard library.
S.
On Wed, Jan 21, 2009 at 06:39, Kenneth McDonald wrote:
> Here's a bit of code I wrote to read the HTML from a URL, and return it as a
> string. I was wondering if a Scala guru could show me the "right" way to do
> this. I'm sure there's a more elegant solution.
>
> -----------------------------
> class URLLineReader(url:String) extends Iterator[String] {
> val reader = new java.io.BufferedReader(new java.io.InputStreamReader(new
> java.net.URL(url).openStream()))
> var line:String = null;
>
> def hasNext = {
> line = reader.readLine()
> line != null
> }
>
> def next = line
> }
>
> object Main {
> def main(args: Array[String]) {
> val reader = new URLLineReader("http://www.yahoo.com/")
> val html = (for (line <- reader) yield line).mkString("")
> println(html)
> }
> }
> ------------------------------
>
>
Wed, 2009-01-21, 17:47
#8
Re: More elegant way of reading HTML from a URL than this?
How is what I showed quadratic?
2009/1/21 Dimitris Andreou <jim.andreou@gmail.com>
> At least the original not-so-precise version was almost linear. This is quadratic. And pages tend to be quite lengthy these days, so beware.
Wed, 2009-01-21, 18:07
#9
Re: More elegant way of reading HTML from a URL than this?
Stepan Koltsov wrote:
> InputStreamResource.url("http://...").readString
>
> InputStreamResource.url("http://...").readLines
>
> InputStreamResource.url("http://...").lines.foreach(println(_))
>
> InputStreamResource is part of scalax.
>
> BTW, I think InputStreamResource-like classes must be included into
> the scala standard library.
I second that. Scala's included IO package is pretty anemic. At the very
least, it would be nice to have some wrappers similar to JCL to add some
nice scala-ish functionality to existing Java IO classes. Of course, I
don't personally have time to work on it so I can't complain too loudly ;)
Derek
Wed, 2009-01-21, 18:17
#10
Re: More elegant way of reading HTML from a URL than this?
Maybe my scala-code-parsing brain neurons are still too weak, but I
think you wrote the equivalent of:
val lines: Seq[String] = ...
var output = ""
for (line <- lines) output += line
No?
Ricky Clarkson wrote:
> How is what I showed quadratic?
>
> 2009/1/21 Dimitris Andreou <jim.andreou@gmail.com>
>
> At least the original not-so-precise version was almost linear.
> This is quadratic. And pages tend to be quite lengthy these days,
> so beware.
>
> Ricky Clarkson wrote:
>
> class URLLineReader(url: String) {
> val reader = new java.io.BufferedReader(new
> java.io.InputStreamReader(new java.net.URL(url).openStream(),
> "US-ASCII"));
> def foldLeft[T](init: T)(f: (T, String) => T): T =
> reader.readLine match {
> case null => init
> case line => foldLeft(f(init, line))(f)
> }
> }
>
> object Main {
> def main(args: Array[String]) = println(new
> URLLineReader("http://www.yahoo.com/").foldLeft("")(_ + _))
> }
>
>
>
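To see where the quadratic cost comes from, one can count character copies. The sketch below is only an illustration; the `ConcatCost` object and its method names are made up for this purpose. It assumes that each `+` on strings copies both operands into a fresh string, while a StringBuilder append copies only the appended line (amortized, ignoring buffer growth).

```scala
object ConcatCost {
  // Characters copied when building the result with repeated `acc + line`:
  // every step copies the whole accumulated string plus the new line.
  def copiedByConcat(lines: Seq[String]): Long = {
    var accLength = 0L
    var copied = 0L
    for (line <- lines) {
      accLength += line.length
      copied += accLength
    }
    copied
  }

  // Characters copied when appending each line to one StringBuilder
  // (ignoring the occasional buffer growth, which is amortized).
  def copiedByAppend(lines: Seq[String]): Long =
    lines.map(_.length.toLong).sum

  def main(args: Array[String]): Unit = {
    val lines = Seq.fill(1000)("x" * 80)
    println(copiedByConcat(lines)) // grows with the square of the line count
    println(copiedByAppend(lines)) // grows linearly with the line count
  }
}
```

For n lines of equal length k, the first count is k * n * (n + 1) / 2 and the second is k * n, which is the difference being pointed out here.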
Wed, 2009-01-21, 19:17
#11
Re: More elegant way of reading HTML from a URL than this?
If performance is such an issue, couldn't you first get the Content-Length from the HTTP headers and then create a StringBuilder with that initial capacity? StringBuilder's append should be faster than String concatenation.
On Wed, Jan 21, 2009 at 12:00 PM, Dimitris Andreou <jim.andreou@gmail.com> wrote:
> Maybe my scala-code-parsing brain neurons are still too weak, but I think you wrote the equivalent of:
>
> val lines: Seq[String] = ...
> var output = ""
> for (line <- lines) output += line
>
> No?
Wed, 2009-01-21, 19:27
#12
Re: More elegant way of reading HTML from a URL than this?
But then you'd have to have two branches in the code, one for responses _with_ Content-Length, and one for terminated-at-end-of-transmission logic.
2009/1/21 Bryan <germish@gmail.com>
> If performance is such an issue, couldn't you first get the content-length from the HTTP headers and then allocate the initial capacity of a StringBuilder with that content-length. StringBuilder's append should be faster than String concatenation.
--
Viktor Klang
Senior Systems Analyst
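One possible way to avoid two separate code paths, sketched under assumptions: `java.net.URLConnection.getContentLength` returns -1 when the length is unknown, so the header value can be treated as a mere capacity hint with a fallback. The `ReadAll` object, its `readAll` and `fetch` methods, the 8192 default and the UTF-8 charset are all choices made for this illustration, not something proposed in the thread. Content-Length counts bytes rather than characters, so it can only ever be a hint.

```scala
import java.io.{InputStreamReader, Reader, StringReader}
import java.net.URL

object ReadAll {
  // Reads everything from `in`. `sizeHint` is only an initial capacity:
  // a non-positive hint (e.g. -1 for "unknown") falls back to a default,
  // and the builder still grows if the hint turns out to be too small.
  def readAll(in: Reader, sizeHint: Int): String = {
    val sb = new java.lang.StringBuilder(if (sizeHint > 0) sizeHint else 8192)
    val buf = new Array[Char](4096)
    var n = in.read(buf)
    while (n != -1) {
      sb.append(buf, 0, n)
      n = in.read(buf)
    }
    sb.toString
  }

  // Assumes UTF-8; a real client would honour the charset in Content-Type.
  def fetch(url: String): String = {
    val conn = new URL(url).openConnection()
    val in = new InputStreamReader(conn.getInputStream, "UTF-8")
    try readAll(in, conn.getContentLength) finally in.close()
  }

  def main(args: Array[String]): Unit =
    println(readAll(new StringReader("<html></html>"), -1))
}
```

Reading in fixed-size chunks rather than line by line also keeps the line terminators, which `readLine` strips.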
Wed, 2009-01-21, 20:17
#13
Re: More elegant way of reading HTML from a URL than this?
Indeed. I was looking in the URLLineReader class for, um, quadraticity. Here's a fixed up main:
object Main {
  def main(args: Array[String]) = println(new URLLineReader("http://www.yahoo.com/").foldLeft(new StringBuilder)(_ append _))
}
2009/1/21 Dimitris Andreou <jim.andreou@gmail.com>
> Maybe my scala-code-parsing brain neurons are still too weak, but I think you wrote the equivalent of:
>
> val lines: Seq[String] = ...
> var output = ""
> for (line <- lines) output += line
>
> No?
Wed, 2009-01-21, 20:47
#14
Re: More elegant way of reading HTML from a URL than this?
Surely. It would be much faster even with the typically modest default
initial size.
I wanted to make the (obvious, in my opinion) point that making an
algorithm so much slower is inexcusable, for the sake of whatever kind of
elegance. (I had thought that Ricky had consciously chosen this kind of
'elegance' over performance, but it was probably a mistake, so it's ok.)
Bryan wrote:
> If performance is such an issue, couldn't you first get the
> content-length from the HTTP headers and then allocate the initial
> capacity of a StringBuilder with that content-length. StringBuilder's
> append should be faster than String concatenation.
>
> On Wed, Jan 21, 2009 at 12:00 PM, Dimitris Andreou wrote:
>
> Maybe my scala-code-parsing brain neurons are still too weak, but
> I think you wrote the equivalent of:
>
> val lines: Seq[String] = ...
> var output = ""
> for (line <- lines) output += line
>
> No?
>
>
> Ricky Clarkson wrote:
>
> How is what I showed quadratic?
>
> 2009/1/21 Dimitris Andreou <jim.andreou@gmail.com>
>
>
> At least the original not-so-precise version was almost linear.
> This is quadratic. And pages tend to be quite lengthy these
> days,
> so beware.
>
> Ricky Clarkson wrote:
>
> class URLLineReader(url: String) {
> val reader = new java.io.BufferedReader(new
> java.io.InputStreamReader(new
> java.net.URL(url).openStream(),
> "US-ASCII"));
> def foldLeft[T](init: T)(f: (T, String) => T): T =
> reader.readLine match {
> case null => init
> case line => foldLeft(f(init, line))(f)
> }
> }
>
> object Main {
> def main(args: Array[String]) = println(new
> URLLineReader("http://www.yahoo.com/").foldLeft("")(_ + _))
> }
>
>
>
>
>
Wed, 2009-01-21, 20:57
#15
Re: More elegant way of reading HTML from a URL than this?
Actually I was choosing readability and referential transparency over performance. Computer programs are primarily for humans to read, and only incidentally for machines to execute. (probably a paraphrase, rather than a quote, from SICP).
2009/1/21 Dimitris Andreou <jim.andreou@gmail.com>
> Surely. It would be much faster even with the typically modest default initial size.
>
> I wanted to make the (obvious, in my opinion) point that making an algorithm so much slower is inexcusable, for whatever kind of elegance's sake. (I had thought that Ricky consciously chosen this kind of 'elegance' over that performance, but probably by mistake, so it's ok)
Thu, 2009-01-22, 13:37
#16
Re: More elegant way of reading HTML from a URL than this?
Here is the URLLineReader using java.util.Scanner:
--------------------
class URLLineReader(urlstring:String) extends Iterator[String] {
  val url = new java.net.URL(urlstring)
  // No charset is given, so Scanner decodes with the platform default.
  val scan = new java.util.Scanner(url.openStream)
  def hasNext = scan.hasNextLine
  def next = scan.nextLine
}
--------------------
And if you would like to read the text in one piece:
--------------------
def text(urlstring:String):String = {
  val url = new java.net.URL(urlstring)
  val scan = new java.util.Scanner(url.openStream)
  scan.useDelimiter("\\Z") /* End Of File */
  scan.next
}
--------------------
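For completeness, the standard library's `scala.io.Source` can probably serve here as well: `Source.fromURL` accepts a URL string together with an encoding name. This is a minimal sketch, not a recommendation from the thread; the `Slurp` object and its method names are invented for illustration, and UTF-8 is an assumed encoding.

```scala
import scala.io.Source

object Slurp {
  // Drains a Source into one String and always closes it afterwards.
  def slurp(source: Source): String =
    try source.mkString finally source.close()

  // UTF-8 is an assumption; the page may declare a different charset.
  def fromUrl(url: String): String =
    slurp(Source.fromURL(url, "UTF-8"))

  def main(args: Array[String]): Unit =
    println(slurp(Source.fromString("line one\nline two")))
}
```

Unlike the line-based readers above, `mkString` on a Source keeps the line terminators, and the try/finally closes the underlying stream, which none of the earlier snippets do.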