Re: Character encoding issue...
Mon, 2009-03-23, 23:37
>>>>> "Ken" == Ken Faulkner writes:
Ken> Hi I'm dipping my toe into the Scala waters (being from a C/C++
Ken> but mainly now Python background), and I'm having an issue
Ken> regarding something I thought should be trivial.
Ken> I'm trying to read a file (that isn't strictly 100% ASCII)
Ken> java.nio.BufferUnderflowException at
Ken> java.nio.Buffer.nextGetIndex(Buffer.java:398) at
Are you using Scala 2.7.3, or some earlier version? io.BufferedSource
(which io.Source uses) was pretty buggy in earlier versions.
Tue, 2009-03-24, 00:17
#2
Re: Character encoding issue...
Ken,
Do you know what encoding the file is?
On Mon, Mar 23, 2009 at 3:58 PM, Ken Faulkner <ken.faulkner@gmail.com> wrote:
Hi
yeah, am using 2.7.3 final, running on JRE 1.5.0_16
Ken
On Tue, Mar 24, 2009 at 9:37 AM, Seth Tisue <seth@tisue.net> wrote:
--
Seth Tisue / http://tisue.net
lead developer, NetLogo: http://ccl.northwestern.edu/netlogo/
Tue, 2009-03-24, 00:27
#3
Re: Character encoding issue...
file testing.txt
testing.txt: ISO-8859 English text, with very long lines
In this particular case, it's just a small segment of a pdftotext conversion of the stairway Scala book.
Ken
On Tue, Mar 24, 2009 at 10:05 AM, James Iry <jamesiry@gmail.com> wrote:
Ken,
Do you know what encoding the file is?
Tue, 2009-03-24, 00:27
#4
Re: Character encoding issue...
I've got around my issue by making sure I force UTF-8 in the PDF conversion. Still, I'm wondering: will I end up hitting this issue again when I can't force the conversion?
Ken
Tue, 2009-03-24, 00:37
#5
Re: Character encoding issue...
Ah. The file reader expects UTF-8. ASCII is a subset, but ISO-8859 is not. So use Source.fromFile(fileName, "ISO-8859").
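The subset relationship mentioned above can be checked directly. The following is a minimal sketch, assuming a current Scala (2.13 or 3) on a Java 7+ runtime rather than the 2.7.3 / JRE 1.5 setup used in this thread; the object and method names are made up for illustration. It compares the bytes each encoding produces and shows what lenient UTF-8 decoding does with ISO-8859-1 bytes.

```scala
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets.{ISO_8859_1, US_ASCII, UTF_8}
import java.util.Arrays

object EncodingSubsets {
  // True when the text encodes to identical bytes in both charsets.
  def sameBytes(text: String, a: Charset, b: Charset): Boolean =
    Arrays.equals(text.getBytes(a), text.getBytes(b))

  // Decode ISO-8859-1 bytes as if they were UTF-8 (lenient: bad input is replaced).
  def misreadAsUtf8(text: String): String =
    new String(text.getBytes(ISO_8859_1), UTF_8)

  def main(args: Array[String]): Unit = {
    val plain = "Scala"
    val accented = "caf\u00e9" // ends in e-acute, U+00E9

    println(sameBytes(plain, US_ASCII, UTF_8))      // true: ASCII text is already valid UTF-8
    println(sameBytes(accented, ISO_8859_1, UTF_8)) // false: one byte vs. a two-byte sequence

    // The lone 0xE9 byte is not a complete UTF-8 sequence, so it becomes U+FFFD.
    println(misreadAsUtf8(accented).indexOf('\uFFFD') >= 0) // true
  }
}
```

A strict decoder would report an error at that byte instead of substituting a replacement character.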
Tue, 2009-03-24, 00:37
#6
Re: Character encoding issue...
Yeah, my bad. It has to be "ISO-8859-1".
http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
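As a sketch of the corrected call, the snippet below reads a file with the two-argument Source.fromFile and the "ISO-8859-1" name. It assumes a current Scala and a Java 7+ runtime (for java.nio.file.Files); the readAll helper and the temporary file are illustrative only and are not from the thread.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.Source

object ReadLatin1 {
  // Read the whole file using the named encoding, closing the source afterwards.
  def readAll(path: String, encoding: String): String = {
    val source = Source.fromFile(path, encoding)
    try source.mkString finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Write a small ISO-8859-1 file so the example is self-contained.
    val tmp = Files.createTempFile("testing", ".txt")
    Files.write(tmp, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1))

    val text = readAll(tmp.toString, "ISO-8859-1")
    println(text.length) // 4: the single 0xE9 byte decodes to one character

    Files.delete(tmp)
  }
}
```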
Tue, 2009-03-24, 00:47
#7
Re: Character encoding issue...
Tried that, but got an error saying it didn't know about the ISO-8859 encoding.
I'll stick with the forcing UTF8 atm...
Thanks anyways :)
Ken
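The "unknown encoding" error is consistent with how the JVM names charsets: "ISO-8859" is a family of encodings, not a charset name by itself. A small sketch for checking a name before passing it to Source.fromFile, using java.nio.charset.Charset from the Java standard library (the object name is made up for illustration):

```scala
import java.nio.charset.Charset

object CharsetNames {
  def main(args: Array[String]): Unit = {
    // The bare family name is not a registered charset name or alias.
    println(Charset.isSupported("ISO-8859"))   // false
    // ISO-8859-1 is one of the charsets every Java platform must provide.
    println(Charset.isSupported("ISO-8859-1")) // true
    // Print the aliases this JVM accepts for the same charset.
    println(Charset.forName("ISO-8859-1").aliases())
  }
}
```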
On Tue, Mar 24, 2009 at 10:20 AM, James Iry <jamesiry@gmail.com> wrote:
Ah. The file reader expects UTF-8. ASCII is a subset, but ISO-8859 is not. So use Source.fromFile(fileName, "ISO-8859").
Tue, 2009-03-24, 00:57
#8
Re: Character encoding issue...
aaaahhhhhhhhhhhhhhhhhh
thanks.
:)
On Tue, Mar 24, 2009 at 10:32 AM, James Iry <jamesiry@gmail.com> wrote:
Yeah, my bad. It has to be "ISO-8859-1".
http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html
Wed, 2009-03-25, 01:47
#9
Re: Character encoding issue...
I think this may be a Java library bug. If a UTF-8 reader does not get the full byte sequence, it is fairly obvious that something is wrong with the file encoding, not with a "buffer underflow". So I'd intercept this "underflow" and rethrow a more reasonable exception that says exactly what's wrong (namely, that a UTF-8 character sequence does not end properly).
--
Thanks,
-Vlad
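One way to get the clearer error suggested above, sketched in user code rather than as a change to any library: decode strictly with a java.nio CharsetDecoder and translate its failure into a descriptive exception. This assumes a current Scala on a Java 7+ runtime; readUtf8OrExplain and the wording of the message are hypothetical, not an existing API.

```scala
import java.io.IOException
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, CodingErrorAction, StandardCharsets}
import java.nio.file.{Files, Paths}

object ClearerDecodeError {
  // Hypothetical helper: read a file strictly as UTF-8 and explain decoding failures.
  def readUtf8OrExplain(path: String): String = {
    val bytes = Files.readAllBytes(Paths.get(path))
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    val decoder = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try decoder.decode(ByteBuffer.wrap(bytes)).toString
    catch {
      case e: CharacterCodingException =>
        // Rethrow with a message that names the likely cause, keeping the original as the cause.
        throw new IOException(
          s"$path is not valid UTF-8 (${e.getMessage}); it may use another encoding such as ISO-8859-1",
          e)
    }
  }
}
```

Calling readUtf8OrExplain on a file like the ISO-8859-1 one discussed in this thread would then fail with a message that points at the encoding rather than at a buffer.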
2009/3/23 Ken Faulkner <ken.faulkner@gmail.com>
aaaahhhhhhhhhhhhhhhhhh
thanks.
:)