- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
Advice to improve some code?
Mon, 2009-02-02, 03:43
In trying to get better at Scala, I've been trying to work on some small
exercises. Would someone more experienced mind looking at what I've written
and advising if there is a better "Scala way"?
The problem is this: Given some plain text files describing people in a few
different formats, write a program to import and print out all the people.
Here are a few sample input formats:
Input format: Comma separated
# Last name, First Name, Gender, Favorite Color, Birthdate
Anson, Nigel, Male, Tan, 2/16/1941
Bellevue, Tara, Female, Yellow, 4/24/1976
Input format; Pipe separated
# First name, Last Name, Middle Initial, Gender, Birthdate
Stan | Smythe | D | M | Blue | 2-3-1992
Barbara | Sanderson | A | F | Purple | 7-21-1962
I am using the following code to convert the list of files to a List of
Person objects.
case class Person
val inputFormats = List( new CommaParser(Source.fromFile("...")), new
PipeParser(Source.fromFile("...")) )
val allPeople = inputFormats.flatMap { _.parse }
// Sorting and printing the people omitted
Here's the start of the parsing code. I am not using the parser combinator
library since that seemed to be tuned to parsing languages - not necessarily
data. (Is this correct?)
// A base class, using the template method pattern
abstract class FormatParser {
val token: String // The token to split
val input: Source // The lines to parse
// The parse method is abstract, defined per each subclass
def createPerson(tokens: List[String]): Person
def parse(): List[Person] = {
input.getLines.map(parseLine).toList
}
private def parseLine(input: String) = {
val components = input.split(token).map{ _.trim }
createPerson(components.toList)
}
}
I made the Parsers as classes using the template method pattern; the base
class had the functionality to take a list of Strings read from a file,
tokenize based upon the abstract separator, and map each line to a Person
using the abstract parseLine function. Subclasses must implement the token
abstract value as well as the createPerson function.
This use of the template method pattern felt very Java-ish; is there a more
Scala-ish ways to represent this functionality?
// Parser implementations provide the separator token,
// a date format, and a method to convert a list of String tokens
// into a Person object
class CommaParser(val input: Source) extends FormatParser {
val token = ","
val dateFormat = new SimpleDateFormat("m/d/yyyy")
def createPerson(tokens: List[String]) = tokens match {
case lastName :: firstName :: gender :: favoriteColor :: date :: Nil =>
new Person(lastName, firstName, "", gender, favoriteColor,
dateFormat.parse(date))
case _ => throw new IllegalArgumentException("Does not parse!")
}
}
class PipeParser(val input: Source) extends FormatParser {
val token = "\\|"
val dateFormat = new SimpleDateFormat("m-d-yyyy")
def createPerson(tokens: List[String]) = tokens match {
case lastName :: firstName :: middle :: gender :: favoriteColor :: date
:: Nil =>
new Person(lastName, firstName, middle, gender, favoriteColor,
dateFormat.parse(date))
case _ => throw new IllegalArgumentException("Does not parse!")
}
}
Here I use pattern matching to name the fields in the order they appear in
the input text file format.
Can anyone provide me any advice or suggestions as to how to improve this
code? Thank you for your feedback!
Cheers,
Dan
Tue, 2009-02-03, 05:17
#2
Re: Advice to improve some code?
Thank you Jon and Thomas for your feedback - it's very helpful! I like the
idea of using extractors; definitely makes things more declarative.
Two follow-ups:
1. I had originally modeled Gender as an Enumeration. I handled parsing the
String by writing an implicit conversion from string to Gender, but it seems
like a extractor/unapply method could be useful on Enumerations as well?
object Gender extends Enumeration {
val Female = Value("Female")
val Male = Value("Male")
implicit def string2gender(str: String) = str(0) match {
case 'M' => Male
case 'F' => Female
case x => throw new IllegalArgumentException("Unknown Gender: " + x)
}
}
2. These parser objects feel something like the "Method Object" pattern,
where the steps of the parse algorithm is encapsulated in an object. I used
a template method pattern to allow easy creation of variants, but in a
functional language like Scala it seems like higher-order functions would be
a natural alternative. I haven't tried doing it yet, but I think it should
be possible. I liked creating objects because it was easy to name the
implementations and keep them in a List, but again I think this would work
for higher-order functions. My initial reaction was that putting these
algorithms in classes would make it easier to find and name them, but I
think that's my object-oriented background speaking.
What do you think?
Cheers,
Dan
Jon Pretty wrote:
>
> I might suggest the following, though (more as a tour of Scala's
> features than because it's wrong):
>
> Most of the data you are parsing is string data, though gender is more
> like a boolean, and - of course - the date is a date. We could improve
> the pattern matching by defining the following inside FormatParser:
>
> object DateMatcher {
> def unapply(d : String) = try { Some(dateFormat.parse(d)) } catch {
> case e : ParseException => None
> }
> }
>
> object GenderParser {
> // NB. No gender bias intended by arbitrary choice here!
> def unapply(g : String) = g match {
> case "m" | "M" => Some(true)
> case "f" | "F" => Some(false)
> case _ => None
> }
> }
>
Tue, 2009-02-03, 16:57
#3
Re: Advice to improve some code?
Hi Dan,
Daniel Wellman wrote:
> 1. I had originally modeled Gender as an Enumeration. I handled parsing the
> String by writing an implicit conversion from string to Gender, but it seems
> like a extractor/unapply method could be useful on Enumerations as well?
>
> object Gender extends Enumeration {
> val Female = Value("Female")
> val Male = Value("Male")
>
> implicit def string2gender(str: String) = str(0) match {
> case 'M' => Male
> case 'F' => Female
> case x => throw new IllegalArgumentException("Unknown Gender: " + x)
> }
> }
I'd recommend not using an implicit method for this. Every new one you
add loses you a little bit more type safety...
But your suggestion to include an extractor in Enumerations is good:
object Gender extends Enumeration {
val Female = Value("Female")
val Male = Value("Male")
def unapply(s : String) : Option[Value] = // you can write this!
}
You can then match using
string match {
case Gender(g) => ... // g is a Gender
...
}
> 2. These parser objects feel something like the "Method Object" pattern,
> where the steps of the parse algorithm is encapsulated in an object. I used
> a template method pattern to allow easy creation of variants, but in a
> functional language like Scala it seems like higher-order functions would be
> a natural alternative. I haven't tried doing it yet, but I think it should
> be possible. I liked creating objects because it was easy to name the
> implementations and keep them in a List, but again I think this would work
> for higher-order functions. My initial reaction was that putting these
> algorithms in classes would make it easier to find and name them, but I
> think that's my object-oriented background speaking.
Yes, I think that analysis is probably correct. A function is often
preferable over a class. I find that, in a well-designed interface,
Scala's classes can bear a greater correspondence to real-life "classes
of objects" than is practical in Java (which usually involves lots of
classes which just don't exist in real life).
This can only be a good thing.
In general, if you have a class which only really has a single useful
method, I'd consider whether it could be replaced by a method (possibly
curried, but there's no reason why it needs to be higher-order).
In the case of your example, using classes seemed absolutely fine to me,
so I don't want to suggest that it was wrong, but I would probably have
written it something like:
def commaParse(ln : String) : Person = ln.split(", ").toList match {
case ... => ...
...
}
def pipeParse(ln : String) : Person = ln.split(", ").toList match {
case ... => ...
...
}
val persons = Source.fromFile("...").getLines map commaParse
Note that I haven't bothered trying to abstract the two parse methods
into anything more general, because there are only two of them, and I'm
supposing there won't be more (though you may know better)! Noting that
the parsers differ in more than just the delimiter, if I knew there were
a few more, I might use singleton objects inheriting from a common
superclass with a 'parse' method, or if there were lots more, I might
consider a hierarchy of classes, defined such the definition of each new
parser was as minimal as I could make it. (If there were lots and lots
more, I'd write a DSL.)
Cheers,
Jon
Wed, 2009-02-04, 04:17
#4
Re: Advice to improve some code?
Jon,
Thanks for the feedback.
Jon Pretty wrote:
>
> I'd recommend not using an implicit method for this. Every new one you
> add loses you a little bit more type safety...
>
Interesting, but I see your point. An unneeded implicit conversion gives
the compiler more room to accidentally pick the wrong type. It seems like
implicits are then good when you know you have a target type that would not
likely be mistaken for another? (In my example, it would convert to a
Gender.Value, but that's better expressed by an extractor)
Jon Pretty wrote:
>
> Yes, I think that analysis is probably correct. A function is often
> preferable over a class. I find that, in a well-designed interface,
> Scala's classes can bear a greater correspondence to real-life "classes
> of objects" than is practical in Java (which usually involves lots of
> classes which just don't exist in real life).
>
> This can only be a good thing.
>
> In general, if you have a class which only really has a single useful
> method, I'd consider whether it could be replaced by a method (possibly
> curried, but there's no reason why it needs to be higher-order).
>
Well said!
Jon Pretty wrote:
>
> Note that I haven't bothered trying to abstract the two parse methods
> into anything more general, because there are only two of them, and I'm
> supposing there won't be more (though you may know better)! Noting that
> the parsers differ in more than just the delimiter, if I knew there were
> a few more, I might use singleton objects inheriting from a common
> superclass with a 'parse' method, or if there were lots more, I might
> consider a hierarchy of classes, defined such the definition of each new
> parser was as minimal as I could make it. (If there were lots and lots
> more, I'd write a DSL.)
>
So in my example, I didn't show a third type which has a different date
format, different order of fields, and different delimiter. But the point
is taken - it looks like objects provide a nice namespace in which to hide
and reuse "stuff" for an algorithm. Though I suppose I could see using an
object as a namespace to contain three different public parser methods, but
then using private functions inside the object to hide the common bits.
Thanks for all your feedback, it was very helpful!
Cheers,
Dan
Thu, 2009-02-05, 00:07
#5
Re: Advice to improve some code?
Hi Dan,
Daniel Wellman wrote:
> Interesting, but I see your point. An unneeded implicit conversion gives
> the compiler more room to accidentally pick the wrong type. It seems like
> implicits are then good when you know you have a target type that would not
> likely be mistaken for another? (In my example, it would convert to a
> Gender.Value, but that's better expressed by an extractor)
Yeah, implicits can be incredibly useful, and they always seem like a
cool feature if you've never seen them before, but they're just a bit
too magic, and magic has a tendancy to run out.
> So in my example, I didn't show a third type which has a different date
> format, different order of fields, and different delimiter. But the point
> is taken - it looks like objects provide a nice namespace in which to hide
> and reuse "stuff" for an algorithm. Though I suppose I could see using an
> object as a namespace to contain three different public parser methods, but
> then using private functions inside the object to hide the common bits.
You could also do it like this (so many options!):
abstract class Parser(val dateFormat : DateFormat,
val delimeter : String) {
def parse(f : File) = Source.fromFile(f).getLines map { ln =>
createPerson(ln.split(delimeter).toList)
}
def createPerson(line : List[String]) : Person
}
val commaParser = new Parser(new SimpleDateFormat("m/d/yyyy"), ",") {
def createPerson(line : List[String]) = line match { .... }
}
val pipeParser = new Parser(new SimpleDateFormat("m-d-yyyy"),
"\\|") {
def createPerson(line : List[String]) = line match { ... }
}
Where I'm using anonymous classes in place of singletons only to take
advantage of the convenience of their constructor parameters.
> Thanks for all your feedback, it was very helpful!
No trouble! I'm glad it's been useful.
Cheers,
Jon
Hi Dan,
> Can anyone provide me any advice or suggestions as to how to improve this
> code? Thank you for your feedback!
I've had a quick look over it, and there's certainly not a lot wrong
with what you've written.
I might suggest the following, though (more as a tour of Scala's
features than because it's wrong):
Most of the data you are parsing is string data, though gender is more
like a boolean, and - of course - the date is a date. We could improve
the pattern matching by defining the following inside FormatParser:
object DateMatcher {
def unapply(d : String) = try { Some(dateFormat.parse(d)) } catch {
case e : ParseException => None
}
}
object GenderParser {
// NB. No gender bias intended by arbitrary choice here!
def unapply(g : String) = g match {
case "m" | "M" => Some(true)
case "f" | "F" => Some(false)
case _ => None
}
}
This will give us a DateMatcher and GenderMatcher object for each
instance of FormatParser, the former being specific to the date format
defined in the same class. We can then use this in the pattern matchers
of the subclasses, e.g.:
def createPerson(tokens: List[String]) = tokens match {
case lastName :: firstName :: GenderMatcher(gender) ::
favoriteColor :: DateMatcher(date) :: Nil =>
// 'date' is a Date here
// 'gender' is a Boolean
...
case _ => throw new IllegalArgumentException("Does not parse!")
}
We will now get a parse failure if an invalid date or gender is supplied.
Another improvement you could make is to make your parser classes
iterators, so we could use them as:
for(p <- new PipeParser(Source.fromFile("..."))) {
// do something with each Person, p
}
This turns out to be easier than you might expect, because
Source.fromFile("...").getLines already returns an iterator. So we can
just map across the iterator to get an Iterator[Person]:
def persons = input.getLines map { ln =>
// parse each line and return a Person
}
I'll let you put it all together.
Good luck!
Jon