This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Re: Character encoding issue...

Seth Tisue

>>>>> "Ken" == Ken Faulkner writes:

Ken> Hi I'm dipping my toe into the Scala waters (being from a C/C++
Ken> but mainly now Python background), and I'm having an issue
Ken> regarding something I thought should be trivial.

Ken> I'm trying to read a file (that isn't strictly 100% ASCII)

Ken> java.nio.BufferUnderflowException at
Ken> java.nio.Buffer.nextGetIndex(Buffer.java:398) at

Are you using Scala 2.7.3, or some earlier version? io.BufferedSource
(which io.Source uses) was pretty buggy in earlier versions.

--
Seth Tisue / http://tisue.net
lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/

Ken Faulkner
Hi,
yeah, I'm using 2.7.3 final, running on JRE 1.5.0_16.
Ken

James Iry
Ken,

Do you know what encoding the file is?

Ken Faulkner
$ file testing.txt
testing.txt: ISO-8859 English text, with very long lines
In this particular case, it's just a small segment of a pdftotext conversion of the "stairway" Scala book.
Ken

Ken Faulkner
I've worked around my issue by forcing UTF-8 output in the PDF conversion. Still, I'm wondering whether I'll hit this issue again when I can't force the conversion.
Ken
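One common answer to that question, sketched here as an assumption rather than anything proposed in this thread: read the raw bytes, try strict UTF-8 first, and fall back to ISO-8859-1 only if the bytes are not valid UTF-8. ISO-8859-1 assigns a character to every byte value, so the fallback never fails, but it can silently produce wrong characters if the file is really in some third encoding. Treat it as a heuristic, not encoding detection. Written for Scala 2.13 or 3 (not 2.7.3); `FallbackDecode` is a hypothetical name.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction, StandardCharsets}
import java.nio.file.{Files, Path}

object FallbackDecode {
  /** Decodes with `cs`, throwing on malformed or unmappable input instead of substituting. */
  private def strict(bytes: Array[Byte], cs: Charset): String =
    cs.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
      .decode(ByteBuffer.wrap(bytes))
      .toString

  /** Tries UTF-8 first; if the bytes are not valid UTF-8, assumes ISO-8859-1.
    * Returns the decoded text together with the charset that was used. */
  def decode(bytes: Array[Byte]): (String, Charset) =
    try {
      (strict(bytes, StandardCharsets.UTF_8), StandardCharsets.UTF_8)
    } catch {
      case _: CharacterCodingException =>
        // Every byte is a valid ISO-8859-1 character, so this cannot fail.
        (new String(bytes, StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1)
    }

  /** Convenience wrapper: read a whole file and decode it as above. */
  def readText(path: Path): (String, Charset) = decode(Files.readAllBytes(path))
}
```

Returning the charset alongside the text lets the caller log or warn when the fallback was taken, instead of hiding the guess.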

James Iry
Ah.  The file reader expects UTF-8.  ASCII is a subset, but ISO-8859 is not. So use Source.fromFile(fileName, "ISO-8859").
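To illustrate why a UTF-8 reader trips over this file, here is a minimal sketch assuming a current JDK and Scala 2.13 or 3 rather than the 2.7.3 used in this thread. It decodes bytes with a strict decoder that reports bad input instead of substituting replacement characters. Pure ASCII bytes decode identically as UTF-8, but in ISO-8859-1 the letter "é" is the single byte 0xE9, which in UTF-8 starts a three-byte sequence that never completes, so strict UTF-8 decoding fails. `EncodingDemo` is an illustrative name only.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction, StandardCharsets}

object EncodingDemo {
  /** Decodes `bytes` with `cs`, returning None on malformed or unmappable input
    * instead of silently substituting replacement characters. */
  def strictDecode(bytes: Array[Byte], cs: Charset): Option[String] = {
    val decoder = cs.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try Some(decoder.decode(ByteBuffer.wrap(bytes)).toString)
    catch { case _: CharacterCodingException => None }
  }
}
```

With `latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1)`, `strictDecode(latin1, StandardCharsets.UTF_8)` returns `None`, while decoding the same bytes with `StandardCharsets.ISO_8859_1` returns `Some("café")`.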

James Iry
Yeah, my bad. It has to be "ISO-8859-1".

http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
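For reference, a minimal sketch of reading the file with the corrected charset name, assuming Scala 2.13 or 3 (the `scala.io.Source` API has changed since 2.7.3, so this may not match that version exactly). `Charset.isSupported` shows why the first suggestion failed: the JDK recognizes "ISO-8859-1" but not a bare "ISO-8859". The object and method names are illustrative.

```scala
import java.io.File
import java.nio.charset.Charset
import scala.io.Source

object ReadLatin1 {
  /** True if the running JVM knows `name` as a charset name or alias. */
  def known(name: String): Boolean = Charset.isSupported(name)

  /** Reads the whole file as ISO-8859-1 (Latin-1) text and closes it afterwards. */
  def readLatin1(file: File): String = {
    val src = Source.fromFile(file, "ISO-8859-1")
    try src.mkString finally src.close()
  }
}
```

In current Scala, passing a `scala.io.Codec` instead of a name, e.g. `Source.fromFile(file)(Codec.ISO8859)`, is an equivalent way to choose the encoding and avoids typos in the charset string.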

On Mon, Mar 23, 2009 at 4:30 PM, Ken Faulkner <ken.faulkner@gmail.com> wrote:
Tried that, but got an error saying it didn't know about the ISO-8859 encoding.

I'll stick with forcing UTF-8 for now...
Thanks anyways :)
Ken

Ken Faulkner
aaaahhhhhhhhhhhhhhhhhh
thanks.
:)

vpatryshev
I think this may be a Java library bug. If a UTF-8 reader does not get a complete byte sequence, the problem is obviously the file's encoding, not a "buffer underflow". So I'd intercept this "underflow" and rethrow a more meaningful exception that says exactly what's wrong (namely, that a UTF-8 character sequence does not end properly).
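A minimal sketch of that suggestion, assuming a current JDK and Scala 2.13 or 3: with a strict `CharsetDecoder`, the JDK reports an incomplete or invalid sequence as a `MalformedInputException` (not a `BufferUnderflowException`), but that exception's message only gives an input length. The wrapper catches it and rethrows an exception naming the file and charset. `FileEncodingException` and `StrictRead` are hypothetical names, not part of any library.

```scala
import java.io.IOException
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction, StandardCharsets}
import java.nio.file.{Files, Path}

/** Hypothetical exception type: says which file failed to decode, and as what. */
final class FileEncodingException(path: Path, cs: Charset, cause: CharacterCodingException)
    extends IOException(s"$path is not valid ${cs.name}: ${cause.getMessage}", cause)

object StrictRead {
  /** Reads `path` as `cs` text, failing loudly (with the file name) on bad byte sequences. */
  def readString(path: Path, cs: Charset = StandardCharsets.UTF_8): String = {
    val bytes = Files.readAllBytes(path) // I/O errors propagate unchanged
    val decoder = cs.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try decoder.decode(ByteBuffer.wrap(bytes)).toString
    catch {
      // e.g. a MalformedInputException for a UTF-8 sequence cut off at end of input
      case e: CharacterCodingException => throw new FileEncodingException(path, cs, e)
    }
  }
}
```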

--
Thanks,
-Vlad

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland