not easy to challenge 'while' loops

32 replies
Tux Racer
Joined: 2009-12-21,

Hello Scala Users,

In the Scala book, section 7.2 (p. 119), about while loops, one can find:
"In general, we recommend you challenge while loops in your code in the
same way you challenge vars".

I find it rather difficult to follow this rule when using Java APIs.
Example: I would like to port the Java code presented at
http://java.sun.com/docs/books/tutorial/i18n/text/word.html

I find it easy to translate the Java code into Java-like Scala code:

-----------------------------
scala> import java.text.BreakIterator
import java.text.BreakIterator

scala> val text="""She stopped. She said, "Hello there," and then went on."""
text: java.lang.String = She stopped. She said, "Hello there," and then went on.

scala> import java.util.Locale
import java.util.Locale

scala> val currentLocale = new Locale ("en","US")
currentLocale: java.util.Locale = en_US

scala> val wordIterator = BreakIterator.getWordInstance(currentLocale)
wordIterator: java.text.BreakIterator = [checksum=0x5e8990c]

scala>
object extract{
  def Words(target:String, wordIterator:BreakIterator) {

    wordIterator.setText(target);
    var start = wordIterator.first();
    var end = wordIterator.next();

    while (end != BreakIterator.DONE) {
      var word = target.substring(start,end);
      if (Character.isLetterOrDigit(word.charAt(0))) {
        System.out.println(word);
      }
      start = end;
      end = wordIterator.next();
    }
  }
}

scala> extract.Words(text,wordIterator)
She
stopped
She
said
Hello
there
and
then
went
on
-------------------------------------------------------------

This works OK, but how could I write more "scalaic" code (i.e. remove
the while and the vars)?

Thanks
Alex

Seth Tisue
Joined: 2008-12-16,
Re: not easy to challenge 'while' loops

>>>>> "TuX" == TuX RaceR writes:

TuX> var start = wordIterator.first();
TuX> var end = wordIterator.next();
TuX> while (end != BreakIterator.DONE) {

In Scala 2.8 I'd write this as:

Stream.continually(wordIterator.next())
.takeWhile(_ != BreakIterator.DONE)

or, if Stream gives you the willies, substitute Iterator.continually
for Stream.continually.
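
The snippet above only yields the end boundaries. A minimal sketch of one way to pair each end with its start and recover the words follows; the names WordBoundaries and extractWords and the default locale are assumptions for illustration, not something from the thread.

```scala
import java.text.BreakIterator
import java.util.Locale

object WordBoundaries {
  // Hypothetical helper: returns the words instead of printing them.
  def extractWords(target: String, locale: Locale = Locale.US): List[String] = {
    val bi = BreakIterator.getWordInstance(locale)
    bi.setText(target)
    val first = bi.first() // must be read before the first call to next()
    // Every boundary after the first one, up to (not including) DONE.
    val ends = Iterator.continually(bi.next()).takeWhile(_ != BreakIterator.DONE).toList
    // Each segment starts where the previous one ended.
    val starts = first :: ends
    starts.zip(ends)
      .map { case (start, end) => target.substring(start, end) }
      .filter(word => Character.isLetterOrDigit(word.charAt(0)))
  }
}
```

The mutable state is still there, but it lives inside the BreakIterator; the calling code has no var and no while.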

dcsobral
Joined: 2009-04-23,
Re: not easy to challenge 'while' loops
def words(target: String, wordIterator: BreakIterator) {
  wordIterator.setText(target)
  (
    // boundaries: first, next, next, ... until DONE
    Iterator.iterate(wordIterator.first)(_ => wordIterator.next)
    takeWhile (_ != BreakIterator.DONE)
    // consecutive (start, end) pairs
    sliding 2
    map { case Seq(start, end) => target.substring(start, end) }
    filter (_(0).isLetterOrDigit)
  ) foreach println
}

Mind you, there may be a conversion from Java Iterator to Scala Iterator that removes the need for the first line.

--
Daniel C. Sobral

I travel to the future all the time.
extempore
Joined: 2008-12-17,
Re: not easy to challenge 'while' loops

On Mon, Feb 08, 2010 at 01:12:00PM -0600, Seth Tisue wrote:
> TuX> var start = wordIterator.first();
> TuX> var end = wordIterator.next();
> TuX> while (end != BreakIterator.DONE) {
>
> In Scala 2.8 I'd write this as:
>
> Stream.continually(wordIterator.next())
> .takeWhile(_ != BreakIterator.DONE)

And since declaring two vars up front isn't exactly swinging for the
"scala-y fences" (you heathen seth) here's another approach to consider,
though this is not exactly how I'd write this.

def Words(target:String, wordIterator:BreakIterator) {
  wordIterator setText target
  Iterator.iterate((wordIterator.first, wordIterator.next)) {
    case (_, BreakIterator.DONE) => return
    case (start, end) =>
      val word = target.substring(start, end)
      if (Character isLetterOrDigit word(0))
        println(word)

      (end, wordIterator.next)
  } toList
}
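
Another way to drop the while and the vars is a tail-recursive helper. The sketch below is only an illustration under that assumption; the names RecursiveWords and loop are hypothetical, and it returns the words rather than printing them.

```scala
import java.text.BreakIterator
import scala.annotation.tailrec

object RecursiveWords {
  def words(target: String, bi: BreakIterator): List[String] = {
    bi.setText(target)

    // start/end play the role of the two vars; acc collects words in reverse order.
    @tailrec
    def loop(start: Int, end: Int, acc: List[String]): List[String] =
      if (end == BreakIterator.DONE) acc.reverse
      else {
        val word = target.substring(start, end)
        val kept = if (Character.isLetterOrDigit(word.charAt(0))) word :: acc else acc
        loop(end, bi.next(), kept)
      }

    val first = bi.first() // read the first boundary before asking for the next one
    loop(first, bi.next(), Nil)
  }
}
```

The compiler turns the @tailrec method into a loop, so this should cost about the same as the original while version.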

dcsobral
Joined: 2009-04-23,
Re: not easy to challenge 'while' loops
Just checked, BreakIterator is not an Iterator, so I think this is it.




Jonathan Shore
Joined: 2009-04-10,
Re: not easy to challenge 'while' loops

Non-mutable generative approaches, i.e. applying transformations on data, are certainly more elegant and probably don't have a noticeable impact for 90% of applications. However, there are some application niches, such as scientific computing, where this is not at all practical. I spend most of my time dealing with enormous timeseries, matrices, and high-dimensional data sets.
Code like:

new DenseMatrix (
for (row <- 0 until numRows ; col <- 0 until numCols) yield A(row,col) * B(col,row))

Might be a nice generative approach to multiplying two matrices, but is not practical because of:

- overhead of for comprehension
- overhead of projection

It would be nice to express as a stream of computations but the overhead dominates ....
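
To make the trade-off concrete, here is a simplified sketch that uses plain one-dimensional arrays instead of the DenseMatrix above (whose API is not shown in the thread). The object and method names are hypothetical, and both arrays are assumed to have the same length.

```scala
object ElementwiseProduct {
  // Declarative version: concise, but it first builds an intermediate
  // collection of boxed values and then copies it into an array.
  def viaFor(a: Array[Double], b: Array[Double]): Array[Double] =
    (for (i <- a.indices) yield a(i) * b(i)).toArray

  // Imperative version: one preallocated primitive array and an index variable.
  def viaWhile(a: Array[Double], b: Array[Double]): Array[Double] = {
    val out = new Array[Double](a.length)
    var i = 0
    while (i < a.length) {
      out(i) = a(i) * b(i)
      i += 1
    }
    out
  }
}
```

Both methods return the same values; how much the first one costs in practice depends on the Scala version and the JVM, so it is worth measuring rather than assuming.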

Tony Morris 2
Joined: 2009-03-20,
Re: not easy to challenge 'while' loops

Side effects are pervasive here. You must use a while loop, or use something
that does so internally; this follows from your use of the iterator, which is
side-effecting.

ichoran
Joined: 2009-08-14,
Re: not easy to challenge 'while' loops
The difficulty with this code is that Java has a lot of non-Iterator iterators.  BreakIterator is a prime example of that.  My preferred method in these cases is to first wrap the offending Java class into a Scala Iterator.  Then, I solve the problem I'm faced with using the Scala iterator.

Now, this doesn't actually make your solution any shorter or prettier, but it *does* move the offending ugliness into a well-encapsulated module.

object Example {
  import java.text.BreakIterator
  val text="""She stopped. She said, "Hello there," and then went on."""
  import java.util.Locale
  val currentLocale = new Locale ("en","US")
  val wordIterator = BreakIterator.getWordInstance(currentLocale)
 
  class BreakIt(target: String, bi: BreakIterator) extends Iterator[String] {
    bi.setText(target)
    private var start = bi.first
    private var end = bi.next
    def hasNext = end != BreakIterator.DONE
    def next = {
      val result = target.substring(start,end)
      start = end
      end = bi.next
      result
    }
  }
 
  object Extract {
    def words(target: String, wordIterator: BreakIterator) {
      (new BreakIt(target,wordIterator)).foreach(word => {
        if (word(0) isLetterOrDigit) println(word)
      })
    }
  }

  def test = Extract.words(text,wordIterator)
}

scala> Example.test
She
stopped
She
said
Hello
there
and
then
went
on

Now the business of extracting the words is actually quite simple: a foreach that tests if the first character is a letter or digit.

The advantage is that if we want to do something else, we then can re-use the functionality:

scala> val ba = (new Example.BreakIt(Example.text,Example.wordIterator))
ba: Example.BreakIt = non-empty iterator

scala> val bb = ba filter (_(0) isLetterOrDigit)
bb: Iterator[String] = non-empty iterator

scala> val bc = bb map (word => if (word.length>4) "blah" else word)
bc: Iterator[java.lang.String] = non-empty iterator

scala> bc foreach println
She
blah
She
said
blah
blah
and
then
went
on

If you are pretty sure you never want to come near java.text.BreakIterators again, then Daniel's solution looks good to me (though it is 2.8-specific; you could use his method instead for the core of my BreakIt class (stopping after the map), but I wanted something that worked under 2.7 also).

  --Rex
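
A minimal sketch of the variant mentioned in the last paragraph, where Daniel's pipeline (stopping after the map) provides the Iterator[String] instead of a hand-written class. It assumes Scala 2.8 or later; the names Segments and segments are hypothetical.

```scala
import java.text.BreakIterator

object Segments {
  // Yields every segment between two consecutive boundaries, words and punctuation alike.
  def segments(target: String, bi: BreakIterator): Iterator[String] = {
    bi.setText(target)
    Iterator
      .iterate(bi.first())(_ => bi.next())
      .takeWhile(_ != BreakIterator.DONE)
      .sliding(2)
      .withPartial(false) // drop a trailing one-element window, e.g. for empty text
      .map { case Seq(start, end) => target.substring(start, end) }
  }
}
```

It can then be reused like BreakIt, for example Segments.segments(text, wordIterator).filter(_.head.isLetterOrDigit).foreach(println).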

Ben Hutchison 3
Joined: 2009-11-02,
Re: not easy to challenge 'while' loops

On Tue, Feb 9, 2010 at 7:01 AM, Jonathan Shore wrote:
> Non-mutable generative approaches, i.e. applying transformations on data, are certainly more elegant and probably don't have a noticeable impact for 90% of applications. However, there are some application niches, such as scientific computing, where this is not at all practical. I spend most of my time dealing with enormous timeseries, matrices, and high-dimensional data sets.

Maybe Scala is not the right language for your needs then. It
emphasizes abstraction, composition, type-safety and a functional
style, not performance. It's well recognized that there is a tradeoff
involved. Advances in JVM and Scala compiler technology are
consistently narrowing the penalty of using well-abstracted code, but
they do not eliminate it entirely.

If you really need the highest performance, why are you even using the
JVM, which doesn't have user-defined value types or unchecked array access?

If, like me and much of the Scala community, you find working with
well-abstracted, composable code a pleasant & productive experience,
realise that it also suggests a choice by you, as a user, to
relinquish aspirations to achieving the same performance as more
"bare-metal" languages.

-Ben

Raoul Duke
Joined: 2009-01-05,
Re: not easy to challenge 'while' loops

On Mon, Feb 8, 2010 at 4:15 PM, Ben Hutchison wrote:
> If, like me and much of the Scala community, you find working with
> well-abstracted, composable code a pleasant & productive experience,
> realise that it also suggests a choice by you, as a user, to
> relinquish aspirations to achieving the same performance as more
> "bare-metal" languages.

i wonder how well Fortress will (some day) do?

Ben Hutchison 3
Joined: 2009-11-02,
Re: not easy to challenge 'while' loops

On Tue, Feb 9, 2010 at 11:19 AM, Raoul Duke wrote:
> On Mon, Feb 8, 2010 at 4:15 PM, Ben Hutchison wrote:
>> If, like me and much of the Scala community, you find working with
>> well-abstracted, composable code a pleasant & productive experience,
>> realise that it also suggests a choice by you, as a user, to
>> relinquish aspirations to achieving the same performance as more
>> "bare-metal" languages.
>
> i wonder how well Fortress will (some day) do?
>

It sounds like they're having a hard time making it run well on the JVM:

http://www.infoq.com/presentations/chase-fortress

BTW some very interesting ideas in Fortress, especially implicit
parallel task-dispatch onto work-stealing queues. See also Guy
Steele's excellent ICFP talk (I like his "Conc Lists"):

http://www.vimeo.com/6624203

-Ben

ichoran
Joined: 2009-08-14,
Re: not easy to challenge 'while' loops
On Mon, Feb 8, 2010 at 7:15 PM, Ben Hutchison <brhutchison@gmail.com> wrote:
> On Tue, Feb 9, 2010 at 7:01 AM, Jonathan Shore <jonathan.shore@gmail.com> wrote:
>> Non-mutable generative approaches, i.e. applying transformations on data, are certainly more elegant and probably don't have a noticeable impact for 90% of applications. However, there are some application niches, such as scientific computing, where this is not at all practical. I spend most of my time dealing with enormous timeseries, matrices, and high-dimensional data sets.
>
> Maybe Scala is not the right language for your needs then. It
> emphasizes abstraction, composition, type-safety and a functional
> style, not performance. It's well recognized that there is a tradeoff
> involved. Advances in JVM and Scala compiler technology are
> consistently narrowing the penalty of using well-abstracted code, but
> they do not eliminate it entirely.
>
> If you really need the highest performance, why are you even using the
> JVM, which doesn't have user-defined value types or unchecked array access?

I can't speak for Jonathan, but I use it because it gives me the answer faster.  That's what I care about when doing scientific computations: time from _now_, with a pile of data and an idea of what to do, to having the answer in hand.

Scala, if used carefully, with while loops and primitive arrays and automatic code generation, is within a factor of 2 of C++ at runtime for almost anything, and often is within 10%.  And once I have a few time-critical operations nicely packaged, I can use the high-level well-abstracted elegantly composable features to dramatically reduce my coding time.

If I used C++, I'd spend additional weeks writing and debugging the code.  If I used Matlab, I'd spend additional weeks waiting for it to run.  When the balance shifts too much, I _do_ use C++ (e.g. real time image processing) or Matlab (e.g. prototyping signal analysis methods).

  --Rex
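
A minimal sketch of what packaging a time-critical operation might look like: a while loop over primitive arrays hidden inside one method, with higher-level code composing it. The names Kernels, dot and matVec are hypothetical and not taken from the thread.

```scala
object Kernels {
  // Time-critical inner loop: primitive arrays, an index variable, no allocation.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // High-level code composes the kernel; callers never see the while loop.
  def matVec(rows: Array[Array[Double]], v: Array[Double]): Array[Double] =
    rows.map(row => dot(row, v))
}
```

The vars are local to dot, so from the caller's point of view the method is still a pure function.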

Jonathan Shore
Joined: 2009-04-10,
Re: not easy to challenge 'while' loops

On Feb 8, 2010, at 7:15 PM, Ben Hutchison wrote:

> On Tue, Feb 9, 2010 at 7:01 AM, Jonathan Shore wrote:
>> Non-mutable generative approaches, i.e. applying transformations on data, are certainly more elegant and probably don't have a noticeable impact for 90% of applications. However, there are some application niches, such as scientific computing, where this is not at all practical. I spend most of my time dealing with enormous timeseries, matrices, and high-dimensional data sets.
>
> Maybe Scala is not the right language for your needs then. It
> emphasizes abstraction, composition, type-safety and a functional
> style, not performance. It's well recognized that there is a tradeoff
> involved. Advances in JVM and Scala compiler technology are
> consistently narrowing the penalty of using well abstracted code, but
> they do not eliminate it entirely.

I would be happy with Java-level performance. Scala can only match Java performance for tight numerical work if one avoids functional abstractions. My point, posted in previous conversations, has been that simple cases can be optimised away by the compiler. At present this does not seem to be a concern for the Scala team.

I would prefer to use the JVM *and* functional abstractions. F# manages to allow me to write nearly pure functional code and gives excellent performance in many cases. Unfortunately it is locked onto the MS platform.

>
> If you really need highest performance, why are you even using the
> JVM, which doesn't have user-defined value types or unchecked array access?
>

This is no longer true. The JVM has optimisations to avoid array bounds checking (in cases where it can be reasonably determined to be safe). I find Java code to be as fast as C++ and sometimes faster (if you know what you are doing).

> If, like me and much of the Scala community, you find working with
> well-abstracted, composable code a pleasant & productive experience,
> realise that it also suggests a choice by you, as a user, to
> relinquish aspirations to achieving the same performance as more
> "bare-metal" languages.
>

There is no need to make a dramatic compromise and use something like C++. Functional and performance do not have to be distinct. There are definitely enhancements that can be done to improve Scala performance substantially. F# is out there as well and looking to compete in the scientific community.

Tux Racer
Joined: 2009-12-21,
Re: not easy to challenge 'while' loops

Thank you all very much for your interesting answers.
In fact I was looking for a wrapper solution like the one described by Rex.
The example was a bit dull because the text was English, but it
becomes more exciting with Japanese (where word boundaries are not
defined by spaces). Example below with a parameterized class:

class Example(text:String,lang:String,country:String) {
  import java.text.BreakIterator
  import java.util.Locale
  val currentLocale = new Locale (lang,country)
  val wordIterator = BreakIterator.getWordInstance(currentLocale)

  class BreakIt(target: String, bi: BreakIterator) extends Iterator[String] {
    bi.setText(target)
    private var start = bi.first
    private var end = bi.next
    def hasNext = end != BreakIterator.DONE
    def next = {
      val result = target.substring(start,end)
      start = end
      end = bi.next
      result
    }
  }

  object Extract {
    def words(target: String, wordIterator: BreakIterator) {
      (new BreakIt(target,wordIterator)).foreach(word => {
        if (word(0) isLetterOrDigit) println(word)
      })
    }
  }

  def test = Extract.words(text,wordIterator)
}

*** ENGLISH ***

scala> val text="""She stopped. She said, "Hello there," and then went on."""
text: java.lang.String = She stopped. She said, "Hello there," and then went on.

scala> val exEnglish=new Example(text,"en","US")
exEnglish: Example = Example@10469e8

scala> exEnglish.test
She
stopped
She
said
Hello
there
and
then
went
on

*** JAPANESE ***
scala> val jpText="イチゴを食べます。"
jpText: java.lang.String = イチゴを食べます。

scala> val exJapanese=new Example(jpText,"ja","JP")
exJapanese: Example = Example@45484a

scala> exJapanese.test
イチゴ
を
食
べます

Cheers
TuX

Russel Winder
Joined: 2009-02-13,
Re: not easy to challenge 'while' loops

On Mon, 2010-02-08 at 21:28 -0500, Jonathan Shore wrote:
[ . . . ]
>
> I would prefer to use the JVM *and* functional abstractions. F#
> manages to allow me to write nearly pure functional code and gives
> excellent performance in many cases. Unfortunately it is locked onto
> the MS platform.
>
F# is, I believe, based on OCaml, so you could always try OCaml itself,
which works on Linux, Solaris, Mac OS X, etc. as well as Windows.

[ . . . ]
>
> There is no need to make a dramatic compromise and use something like
> C++. Functional and performance do not have to be distinct. There
> are definitely enhancements that can be done to improve Scala
> performance substantially. F# is out there as well and looking to
> compete in the scientific community.

Assuming you are prepared to be locked into Windows as the only platform
of use.

Personally I'd prefer Scala to be more computationally efficient and use
that. An alternative a lot of people are using though is Python/C/C++.

Stefan Langer
Joined: 2009-10-23,
Re: not easy to challenge 'while' loops
You could always write the performance-critical parts in Java and then make them accessible from Scala behind a nice, easy-to-use interface. This way you get the best of both worlds.

-Stefan

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: not easy to challenge 'while' loops
That wrapper is pretty much what this line of my solution does:
 Iterator.iterate(wordIterator.first)(_ => wordIterator.next) takeWhile (_ != BreakIterator.DONE)
It returns an Iterator from a wordIterator.

On Tue, Feb 9, 2010 at 6:57 AM, TuX RaceR <tuxracer69@gmail.com> wrote:
Thank you very much all for your interesting answers.
In fact I was looking for a wrapper solution as the one described by Rex.
The example was a bit annoying because the text was English, but it becomes more exciting with Japanese (where word boundaries are not defined by spaces). Example below with a parametrized class:


class Example(text:String,lang:String,country:String) {
 import java.text.BreakIterator
 import java.util.Locale
 val currentLocale = new Locale (lang,country)
 val wordIterator = BreakIterator.getWordInstance(currentLocale)

 class BreakIt(target: String, bi: BreakIterator) extends Iterator[String] {
  bi.setText(target)
  private var start = bi.first
  private var end = bi.next
  def hasNext = end != BreakIterator.DONE
  def next = {
    val result = target.substring(start,end)
    start = end
    end = bi.next
    result
  }
 }

 object Extract {
  def words(target: String, wordIterator: BreakIterator) {
    (new BreakIt(target,wordIterator)).foreach(word => {
      if (word(0) isLetterOrDigit) println(word)
    })
  }
 }

 def test = Extract.words(text,wordIterator)
}


*** ENGLISH ***

scala> val text="""She stopped. She said, "Hello there," and then went on."""
text: java.lang.String = She stopped. She said, "Hello there," and then went on.

scala> val exEnglish=new Example(text,"en","US")
exEnglish: Example = Example@10469e8

scala> exEnglish.test
She
stopped
She
said
Hello
there
and
then
went
on

*** JAPANESE ***
scala> val jpText="イチゴを食べます。"
jpText: java.lang.String = イチゴを食べます。

scala> val exJapanese=new Example(jpText,"jp","JP")
exJapanese: Example = Example@45484a

scala> exJapanese.test
イチゴ
を
食
べます

Cheers
TuX

Rex Kerr wrote:
The difficulty with this code is that Java has a lot of non-Iterator iterators.  BreakIterator is a prime example of that.  My preferred method in these cases is to first wrap the offending Java class into a Scala Iterator.  Then, I solve the problem I'm faced with using the Scala iterator.

Now, this doesn't actually make your solution any shorter or prettier, but it *does* remove the offending ugliness into a well-encapsulated module.

object Example {
  import java.text.BreakIterator
  import java.util.Locale

  val text = """She stopped. She said, "Hello there," and then went on."""
  val currentLocale = new Locale("en", "US")
  val wordIterator = BreakIterator.getWordInstance(currentLocale)

  // Wraps a java.text.BreakIterator as a Scala Iterator over the text segments
  class BreakIt(target: String, bi: BreakIterator) extends Iterator[String] {
    bi.setText(target)
    private var start = bi.first
    private var end = bi.next
    def hasNext = end != BreakIterator.DONE
    def next = {
      val result = target.substring(start, end)
      start = end
      end = bi.next
      result
    }
  }

  object Extract {
    // Prints every segment that starts with a letter or digit
    def words(target: String, wordIterator: BreakIterator) {
      (new BreakIt(target, wordIterator)).foreach(word => {
        if (word(0).isLetterOrDigit) println(word)
      })
    }
  }

  def test = Extract.words(text, wordIterator)
}

scala> Example.test
She
stopped
She
said
Hello
there
and
then
went
on

Now the business of extracting the words is actually quite simple: a foreach that tests if the first character is a letter or digit.

The advantage is that if we want to do something else, we then can re-use the functionality:

scala> val ba = (new Example.BreakIt(Example.text,Example.wordIterator))
ba: Example.BreakIt = non-empty iterator

scala> val bb = ba filter (_(0) isLetterOrDigit)
bb: Iterator[String] = non-empty iterator

scala> val bc = bb map (word => if (word.length>4) "blah" else word)
bc: Iterator[java.lang.String] = non-empty iterator

scala> bc foreach println
She
blah
She
said
blah
blah
and
then
went
on

If you are pretty sure you never want to come near java.text.BreakIterators again, then Daniel's solution looks good to me (though it is 2.8-specific; you could use his method instead for the core of my BreakIt class (stopping after the map), but I wanted something that worked under 2.7 also).

 --Rex


--
Daniel C. Sobral

I travel to the future all the time.
ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: not easy to challenge 'while' loops

On Tue, Feb 9, 2010 at 9:31 AM, Stefan Langer
wrote:
> You could always write the performance critical parts as Java and then make
> them easily accessible from Scala behind an easy to use nice interface this
> way you get the best of both worlds.

The only advantage of this is that you can use break/continue and for
loops (Java for loops). Personally, I prefer to write Scala using
while loops for the performance critical parts instead (for now).
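
As a rough illustration of the trade-off being discussed, here is a minimal sketch of the same sum written once with a while loop and once with foldLeft. The names SumLoops, sumWhile and sumFold are made up for this example. No timing claim is implied; which form is faster depends on the Scala version and on what the JIT does with it.

```scala
object SumLoops {
  // Imperative form: an index variable and an accumulator, no closures involved.
  def sumWhile(xs: Array[Double]): Double = {
    var total = 0.0
    var i = 0
    while (i < xs.length) {
      total += xs(i)
      i += 1
    }
    total
  }

  // Functional form: the same left-to-right sum expressed as a fold.
  def sumFold(xs: Array[Double]): Double =
    xs.foldLeft(0.0)(_ + _)
}
```

Both methods add the elements in the same order, so they return the same value; the while version can be kept private behind a method like this so that callers never see the vars.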

Best,
Ismael

Stefan Langer
Joined: 2009-10-23,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops
Just showing options.

ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: not easy to challenge 'while' loops
The first line of your solution is an iterator over boundaries, not an iterator over words like my wrapper is.  Going from word boundaries to words takes a fair number of steps even in functional style--though I agree that it's still shorter.

(Also, Iterator.iterate and sliding are 2.8-specific.)
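
The difference between the two iterators can be sketched as follows. This is only an illustration written against a recent Scala standard library, not the code from either post: the object name BoundaryDemo and the methods boundaries and words are hypothetical, Locale.US is an assumed default, and Seq is matched instead of List because current versions of sliding yield Seq windows.

```scala
import java.text.BreakIterator
import java.util.Locale

object BoundaryDemo {
  // Offsets of the word boundaries in `text`, as reported by BreakIterator.
  // This is an Iterator[Int], i.e. positions rather than words.
  def boundaries(text: String, locale: Locale = Locale.US): Iterator[Int] = {
    val bi = BreakIterator.getWordInstance(locale)
    bi.setText(text)
    Iterator.iterate(bi.first())(_ => bi.next()).takeWhile(_ != BreakIterator.DONE)
  }

  // Pairs adjacent boundaries to recover the text between them.
  // collect skips the single-element window produced for an empty text.
  def words(text: String, locale: Locale = Locale.US): Iterator[String] =
    boundaries(text, locale).sliding(2).collect {
      case Seq(start, end) => text.substring(start, end)
    }
}
```

Note that words still yields whitespace and punctuation segments; something like BoundaryDemo.words(text).filter(_.head.isLetterOrDigit) would be needed to keep only the words.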

  ---Rex

On Tue, Feb 9, 2010 at 5:39 AM, Daniel Sobral <dcsobral@gmail.com> wrote:
That wrapper is pretty much what this line of my solution does:
 Iterator.iterate(wordIterator.first)(_ => wordIterator.next) takeWhile (_ != BreakIterator.DONE)
It returns an Iterator from a wordIterator.

dcsobral
Joined: 2009-04-23,
User offline. Last seen 38 weeks 5 days ago.
Re: not easy to challenge 'while' loops
Iterator.iterate(wordIterator.first)(_ => wordIterator.next)  takeWhile (_ != BreakIterator.DONE) sliding 2 map { case List(start, end) => text.substring(start, end) }

Fair number is exaggerated. :-) It was actually a one-liner as I wrote it. I converted it to multiple lines just to make what I was doing clearer. The only two lines beyond that were the filter and the print.

But that wasn't quite my point. My point is that Iterator.iterate offers an easy way to convert the BreakIterator into a Scala Iterator.

And, yes, both iterate and sliding are present only on 2.8. All the more reason to use it. :-)

On Tue, Feb 9, 2010 at 2:06 PM, Rex Kerr <ichoran@gmail.com> wrote:
The first line of your solution is an iterator over boundaries, not an iterator over words like my wrapper is.  Going from word boundaries to words takes a fair number of steps even in functional style--though I agree that it's still shorter.

(Also, Iterator.iterate and sliding are 2.8-specific.)

  ---Rex

On Tue, Feb 9, 2010 at 5:39 AM, Daniel Sobral <dcsobral@gmail.com> wrote:
That wrapper is pretty much what this line of my solution does:
 Iterator.iterate(wordIterator.first)(_ => wordIterator.next) takeWhile (_ != BreakIterator.DONE)
It returns an Iterator from a wordIterator.




--
Daniel C. Sobral

I travel to the future all the time.
ichoran
Joined: 2009-08-14,
User offline. Last seen 2 years 3 weeks ago.
Re: not easy to challenge 'while' loops
On Tue, Feb 9, 2010 at 11:57 AM, Daniel Sobral <dcsobral@gmail.com> wrote:
> Iterator.iterate(wordIterator.first)(_ => wordIterator.next)  takeWhile (_ != BreakIterator.DONE) sliding 2 map { case List(start, end) => text.substring(start, end) }
> Fair number is exaggerated. :-) It was actually a one-liner as I wrote it. I converted it to multiple lines just to make what I was doing clearer. The only two lines beyond that were the filter and the print.

Well, "fair number" is "twice as much".  If you squash everything onto one line and count characters, the direct definition of iterator is only about 30% longer.  (167 vs. 214.)

> But that wasn't quite my point. My point is that Iterator.iterate offers an easy way to convert the BreakIterator into a Scala Iterator.

Indeed.

> And, yes, both iterate and sliding are present only on 2.8. All the more reason to use it. :-)

Absolutely.  Since only the beta is out, though, I tend to assume people are using 2.7 unless they explicitly mention 2.8 (or use a 2.8-specific feature).

  --Rex
 
Mohamed Bana 2
Joined: 2009-10-21,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops
From my understanding, most of the problems are because of the JVM.  If someone were to go and, say, implement value types, tuples, tail call optimization and reified generics, some of the problems would disappear.  To my knowledge, .NET already has native support for these.
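
To make the tail-call point concrete, here is a small sketch of what the Scala compiler can do on its own without VM support. The names TailCalls and sumTo are made up for this example. A directly self-recursive call in tail position is compiled into a loop, and the @tailrec annotation makes the compiler reject the method if it cannot do that. General tail calls, such as two methods calling each other, are not covered by this rewrite.

```scala
import scala.annotation.tailrec

object TailCalls {
  // Sums 1..n with an accumulator. The recursive call is the last action,
  // so the compiler turns it into a loop and the stack does not grow.
  @tailrec
  def sumTo(n: Long, acc: Long = 0L): Long =
    if (n == 0) acc else sumTo(n - 1, acc + n)
}
```

Calling TailCalls.sumTo(1000000L) completes without a StackOverflowError because no new stack frame is created per step.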

—Mohamed
Jonathan Shore
Joined: 2009-04-10,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops

On Feb 9, 2010, at 4:31 PM, Mohamed Bana wrote:

> From my understanding, most of the problems are because of the JVM. If someone were to go and, say, implement value types, tuples, tail call optimization and non-refied generics some of the problems will disappear. To my knowledge, .NET already has native support for these.
>
> —Mohamed

It is a shame, isn't it? Sun did very little in terms of innovating Java or the JVM in the years before .NET (well, aside from major JVM performance enhancements).

The .NET VM and the level of innovation going on in that community is compelling. Big problem is cross-platform. Mono is not there yet performance-wise ...

Jonathan

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: not easy to challenge 'while' loops

On Tue, 2010-02-09 at 16:58 -0500, Jonathan Shore wrote:
> On Feb 9, 2010, at 4:31 PM, Mohamed Bana wrote:
>
> > From my understanding, most of the problems are because of the JVM. If someone were to go and, say, implement value types, tuples, tail call optimization and non-refied generics some of the problems will disappear. To my knowledge, .NET already has native support for these.
> >
> > —Mohamed
>
>
> It is a shame isn't it. Sun did very little in terms of innovating Java or the JVM in the years before
> .NET (well aside from major JVM performance enhancements).

Well, this is not a small aside. :)

> The .NET VM and the level of innovation going on in that community is compelling. Big problem is cross-platform. Mono is not there yet performance-wise ...

From what I've read, the .NET JIT is a lot less sophisticated, so you'd
probably pay a price for code with a higher level of abstraction there
too. Just in a different way.

Best,
Ismael

Jonathan Shore
Joined: 2009-04-10,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops

On Feb 9, 2010, at 6:19 PM, Ismael Juma wrote:
>> It is a shame isn't it. Sun did very little in terms of innovating Java or the JVM in the years before
>> .NET (well aside from major JVM performance enhancements).
>
> Well, this is not a small aside. :)
>

Sure, but I meant in terms of evolution of features. Java is tired and the JVM, as I see it, is in jeopardy. It is still evolving too slowly in comparison to .NET.

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: not easy to challenge 'while' loops

On Tue, 2010-02-09 at 19:23 -0500, Jonathan Shore wrote:
> Sure, but I meant in terms of evolution of features. Java is tired
> and the JVM, as I see it is in jeopardy. It is still evolving too
> slowly in comparison to .NET.

I don't see it in that way. Java is certainly evolving very slowly, but
the JVM is constantly being improved. Things like compressed references
have a huge effect requiring no changes from the programmer. Escape
analysis, scalar replacement and the Garbage First GC are very
promising. Not to mention invokedynamic, method handles, etc. And there
are a bunch of JRockit features that are also very nice that will
possibly make their way to HotSpot now (particularly thread-local GC).

Sure, I'd like it to be faster so that we get the full MLVM list[1]
quickly, but these things take time and I don't see other runtimes
evolving faster.

Best,
Ismael

[1] http://openjdk.java.net/projects/mlvm/subprojects.html

Jonathan Shore
Joined: 2009-04-10,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops

Perhaps this discussion belongs elsewhere, but it is nevertheless very interesting. I have to give the JVM credit for being at the top of the pack in terms of performance. The level of optimisations the JVM is capable of is quite astounding. The CLR does not appear to have reached the level of sophistication present in the JVM as of yet.

My problem lies elsewhere: with the pace of JVM innovation and Java innovation (the latter perhaps a moot point given other JVM-targeting languages). Surely Java and the JVM had compatibility issues to deal with. However, that has not stopped C# or the CLR from evolving at a much faster pace.

Just in terms of fundamentals:

- C# generics vs Java generics (C# wins here)
- value types (C# wins here, not present in Java, a big mistake IMO)
- operator overloading, operators, or something like the scala approach (not present in Java)
- properties syntax (C# wins over getters / setters approach)
- reflection facilities (CLR's are more powerful)
- native interface (big win for the CLR, JNI poor)
- functional constructs (CLR ahead, JVM proposals still to be implemented)

My impression is that the JVM has just started really to evolve in a way that is less java-centric. The .NET CLR was designed as a multi-language environment and the designers have not hesitated in fitting the CLR to be hospitable to radically different language environments.

Of course I am speaking as a 3rd party looking on.

Jonathan

ijuma
Joined: 2008-08-20,
User offline. Last seen 22 weeks 2 days ago.
Re: not easy to challenge 'while' loops

On Tue, 2010-02-09 at 20:02 -0500, Jonathan Shore wrote:
> Perhaps this discussion belongs elsewhere, nevertheless is very interesting. I have to give the JVM credit for being at the top of the pack in terms of performance. The level of optimisations the JVM is capable of is quite astounding. The CLR does not appear to have reached the level of sophistication present in the JVM as of yet.
>
> My problem is outside of that with the pace of JVM innovation and Java innovation (the later perhaps a moot point given other JVM targeting languages). Surely Java and the JVM had compatibility issues to deal with. However that has not stopped C# or the CLR from evolving at a much faster pace.

Again, are we talking about Java or the JVM? I agree that Java has been
a slow mover, but it seems to me that you're conflating both. To make it
clear, I care less about Java, the language, since I use Scala.

> - C# generics vs Java generics (C# wins here)

Even though .NET has reified generics, Scala still uses erasure there, as
Scala generics support certain features that made it difficult to map
them to .NET generics.
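
For readers less familiar with the term, here is a minimal sketch of what erasure means on the JVM. The object name ErasureDemo and its members are made up for this example. The element type of a List is checked by the compiler but is not part of the runtime class, so two lists with different element types are indistinguishable by class.

```scala
object ErasureDemo {
  val ints: List[Int] = List(1, 2, 3)
  val strings: List[String] = List("a", "b")

  // Both values are instances of the same runtime class; the type
  // arguments Int and String exist only at compile time.
  def sameRuntimeClass: Boolean = ints.getClass == strings.getClass
}
```

With reified generics, as on .NET, the two instantiations would be distinct types at run time.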

> - value types (C# wins here, not present in Java, a big mistake IMO)

This is a clear advantage for .NET for certain types of applications,
and is one of the features I'd like to see on the JVM (it's on the MLVM
list, for what it's worth). Still, the JVM has things like escape analysis
and scalar replacement that are nice to have for those users who don't
want to bother thinking about these issues (they work in more limited
scenarios though).

> - operator overloading, operators, or something like the scala approach (not present in Java)

This is more Java and less the JVM, the JVM has very few limitations
here:

http://blogs.sun.com/jrose/entry/symbolic_freedom_in_the_vm

By the way, the bit described there that makes it easier to call such
methods from Java has already made it to OpenJDK (and is available on
JDK7 betas/snapshots).
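
As a small sketch of "the scala approach" mentioned in the quoted point: in Scala an operator is an ordinary method, and the compiler encodes symbolic names into identifiers that are also legal in Java source. The names OperatorNames and Vec are made up for this example; scala.reflect.NameTransformer is the standard library helper that exposes the encoding.

```scala
import scala.reflect.NameTransformer

object OperatorNames {
  // A tiny vector type with + defined as a plain method.
  final case class Vec(x: Int, y: Int) {
    def +(other: Vec): Vec = Vec(x + other.x, y + other.y)
  }

  // The encoded form of the method name "+".
  val encodedPlus: String = NameTransformer.encode("+")

  // True if the compiled class exposes a method under the encoded name.
  def hasEncodedMethod: Boolean =
    classOf[Vec].getMethods.exists(_.getName == encodedPlus)
}
```

From Java, the method would be called under its encoded name, which is why the class file itself needs no special support for operators.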

> - properties syntax (C# wins over getters / setters approach)

More of a language issue again.

> - reflection facilities (CLR's are more powerful)

Would you please clarify what you mean here?

> - native interface (big win for the CLR, JNI poor)

Yes, JNI is awkward, as Sun wanted to motivate people not to write native
code; however, JNA helps there quite a bit.

> - functional constructs (CLR ahead, JVM proposals still to be implemented)

CLR has tail calls, but what else do you mean here?

> My impression is that the JVM has just started really to evolve in a way that is
> less java-centric. The .NET CLR was designed as a multi-language environment
> and the designers have not hesitated in fitting the CLR to be hospitable to
> radically different language environments.

Yes and No. For example, reified generics are apparently a pain to deal
with for dynamic languages. Ironically, the JVM choice has worked out
better in this case.

Best,
Ismael

Landei
Joined: 2008-12-18,
User offline. Last seen 45 weeks 4 days ago.
Re: not easy to challenge 'while' loops

Tux Racer wrote:
>
> Thank you very much all for your interesting answers.
> In fact I was looking for a wrapper solution as the one described by Rex.
> The example was a bit annoying because the text was English, but it
> becomes more exciting with Japanese (where word boundaries are not
> ...
>
> *** JAPANESE ***
> scala> val jpText="イチゴを食べます。"
> jpText: java.lang.String = イチゴを食べます。
>
> scala> val exJapanese=new Example(jpText,"jp","JP")
> exJapanese: Example = Example@45484a
>
> scala> exJapanese.test
> イチゴ
> を
> 食
> べます
>
> Cheers
> TuX
>
>

For the example sentence イチゴを食べます ([I, you, ...] eat strawberries) shouldn't
the result be

イチゴ
を
食べます

I mean, neither 食 nor べます could be considered "words", they form the verb
"to eat". Of course something like this is hard to accomplish, but the
current implementation looks pretty useless to me.

Cheers,
Landei

Tux Racer
Joined: 2009-12-21,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops

Hello Landei,

Yes, digging a bit more, it seems that the Java BreakIterator does not
work well for CJK:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4095325

"treat CJK characters in a Japanese-specific way: an arbitrary run of
Kanji characters, followed by an optional arbitrary run of Hiragana
characters, followed by an optional arbitrary run of Katakana
characters, all gets treated as a single "word" by the word-break iterator."

so it is useless for Chinese and not very useful for Japanese... if you
have some nice and fast CJK tokenizer please let us know ;)
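
To show why a rule based only on character classes splits 食べます, here is a naive sketch that cuts the text wherever the Unicode script changes. It is not a description of how BreakIterator is implemented; the names ScriptRuns, scriptOf and split are made up, and it assumes the text contains only BMP characters (it works on Char, not on code points). java.lang.Character.UnicodeScript is the JDK class used to classify each character.

```scala
import java.lang.Character.UnicodeScript

object ScriptRuns {
  // Unicode script (HAN, HIRAGANA, KATAKANA, COMMON, ...) of a single Char.
  private def scriptOf(c: Char): UnicodeScript = UnicodeScript.of(c.toInt)

  // Splits text into maximal runs of characters sharing one script.
  // The accumulator holds the runs in reverse order, current run first.
  def split(text: String): List[String] =
    text.foldLeft(List.empty[String]) {
      case (run :: done, c) if scriptOf(run.last) == scriptOf(c) =>
        (run + c) :: done
      case (runs, c) =>
        c.toString :: runs
    }.reverse
}
```

For the example sentence this yields イチゴ, を, 食, べます and 。, which are the same segments as in the REPL output earlier in the thread, with the kanji 食 separated from its hiragana ending. Getting 食べます as one word needs dictionary or grammar knowledge that a script-based rule does not have.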

Thanks
TuX

Jonathan Shore
Joined: 2009-04-10,
User offline. Last seen 42 years 45 weeks ago.
Re: not easy to challenge 'while' loops

Landei,

Hi. Thought I would chime in as a Japanese speaker ;) It would be difficult to distinguish word breaks in Japanese without a full NLP approach, IMO. Disambiguation would require both context and perhaps a probabilistic model.

A well-known tongue twister is: うらにわにはにわのにわとりがいる。 Of course, few except kids would write in pure hiragana. Simplifying, it becomes: うら庭には二羽の鶏がいる。

Even deciding between the particles に and には can be difficult. In some contexts は could begin a noun in the object portion of the sentence, though again, most writers would avoid hiragana, especially right after a particle.

Landei
Joined: 2008-12-18,
User offline. Last seen 45 weeks 4 days ago.
Re: not easy to challenge 'while' loops

Tux Racer wrote:
>
> Hello Landei,
>
> Yes, digging a bit more, it seems that the java breakIterator does not
> work well for CJK:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4095325
>
> "treat CJK characters in a Japanese-specific way: an arbitrary run of
> Kanji characters, followed by an optional arbitrary run of Hiragana
> characters, followed by an optional arbitrary run of Katakana
> characters, all gets treated as a single "word" by the word-break
> iterator."
>
> so it is useless for Chinese and not very useful for Japanese... if you
> have some nice and fast CJK tokenizer please let us know ;)
>
> Thanks
> TuX
>

I'm sorry, I have just the one on top of my shoulders, and this one is
neither fast nor accurate...

As Jonathan Shore pointed out, it's really hard to get this done correctly.
Maybe there is already a free solution, but I don't know about one. On the
other hand, Japanese and Chinese people are used to the situation that they
have no word boundaries.

Cheers,
Daniel

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland