This page is no longer maintained — Please continue to the home page at www.scala-lang.org

Arrays again

11 replies
odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.

Hi all,

I have given some thought what to do with the array situation. As we
are all painfully aware, arrays in Scala are still broken in several
ways.

First, there's the question what to do when the user wants to create
an Array[T], where T is a type variable. Currently, we create a
deviously clever boxed array which ``snaps'' into the right
representation as soon as someone casts it to some more concrete type.
Sort of like in physics where a photon can be either a wave or a
particle, depending how you look at it. There are at least three
problems with this:

-1. There are some surprising corner effects, for instance when using
isInstanceOf followed by a cast.
-2. Because there are surprising corner effects, we need to spec them,
but because the implementation is so clever, that's really hard to do.
-3. There can be unexpected performance drops.

Second, there's the problem that boxing of arrays works only on the
toplevel. Here's an example that fails:

scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)

scala> val xs = List(x)
xs: List[Array[Int]] = List([I@1353d27)

scala> val ys: List[collection.mutable.Vector[Int]] = xs
ys: List[scala.collection.mutable.Vector[Int]] = List([I@1353d27)

scala> ys.head.head
java.lang.ClassCastException: [I cannot be cast to
scala.collection.generic.VectorTemplate

The problem is the widening cast from List[Array[Int]] to
List[Vector[Int]]. That requires a representation change, but because
the array is already wrapped in the list, the boxing operation does
not get done.

Some time ago, David and Stepan have made a proposal that involved
having two kinds of arrays instead of one: A generic array, which is
really a Scala wrapper on an array of objects, and a Java array, which
would support only selection, update and length, just as Java does.
The proposal was to have implicit conversions between the two kinds of
arrays and to duplicate all methods that take either array as a
parameter into two overloaded variants, one for each type of array. We
had a long discussion about this. In the end it was mainly Lex and me
who had doubts because we feared that it would be difficult for a user
to choose between the two forms of array: Only generic arrays have all
the nice operations defined on them, but Java arrays are more compact,
significantly faster, and better for interoperability.

In the meantime, I have made some progress on the manifests side, and
am now convinced that it is not unreasonable to demand manifests for
generic array creations. To a large degree this was made possible by
the new 2.8 collection's builder design, which can opportunistically
make use of manifests where available. With that new fact, I think a
different array design is now possible:

As in the McIver/Koltsov proposal there are two array classes with
implicit conversions between them:

Array[T] (which corresponds to a Java array at run time)
ArrayVector[T] (which wraps a Java array as a subtype of
collection.mutable.Vector)

Both types support all operations defined in collection.mutable.Vector
at the correct types. For instance

Array[T] has method reverse: Array[T]
ArrayVector[T] has method reverse: ArrayVector[T]

Most ArrayVector operations are simply inherited from VectorTemplate,
same as for any other Vector. But the same operations on Array have to
be implemented by compiler magic. Essentially, for every Vector
operation and every primitive array type there must be a static method
implementing this operation on this type. Then the compiler would do
an expansion like

(xs: Array[Int]).reverse becomes IntArrayOps.reverse(xs)

Another tricky aspect is erasure. The erasure of Array[Int] is
Array[Int] and the erasure of Array[List[Int]] is Array[List]. So the
erasure of every array type that has a non-variable element type is
Array of erasure-of-element-type.
But what about Array[T] where T is a type parameter or abstract type?
Previously the erasure was BoxedArray, but that leads into trouble
because it means change in representation. So the erasure of Array[T]
must be Object.
This means that generic array operations must dynamically dispatch on
the array element type. Given ys: Array[T],

(ys: Array[T]).apply(i) would need to be translated to
GenericArrayOps.apply(i)

This would lead to code like:

ys match {
case ys: Array[Object]: ys(i)
case ys: Array[Int]: ys(i)
case ys: Array[Float]: ys(i)
... // and so on for all other 6 primitive types
}

... and its result would be an AnyRef, so there would also be some
boxing to do. Similarly for update and length. All other operations
can in principle be defined in terms of these three, but for often
used operations such as foreach we might prefer to specialize
dynamically similarly to the above. So there's certainly some overhead
involved, even though it should be less that what's currently done in
BoxedAnyArray. Luckily the overhead can be eliminated by @specialized.

The other long-standing question is the famous

"abc" != "abc".reverse.reverse

problem on strings. I believe this problem can be solved in the same
way as for arrays. We already have the two representations of strings:
String and RichString, with implicit conversions between them. All
that's needed is to re-implement all vector operations on strings as
static methods, and to have compiler magic to invoke them.

The main benefits of this proposal is that it's relatively simple to
understand and that it involves few special tricks. It's all implicit
conversions and special treatment of some operations (akin to
extension methods). I also believe
that the large majority of existing array code will continue to work.
The main downside is that there will be a lot of operations to
implement (close to 100 methods on vectors, each needs to be repeated
on 9 array types plus string, makes about a thousand methods).

What do you think? Is there something I have overlooked? Something
that can be improved?

Matt Fowles
Joined: 2009-07-09,
User offline. Last seen 42 years 45 weeks ago.
Re: Arrays again

Martin~

I can't speak to the technical trade offs; however, this actually
provides a fairly good opportunity for people new to Scala development
to get their feet wet. Implementing some of these methods could
provide a good introduction to Scala development for people like me.
It would require who ever starts the processes of implementing this to
write a wiki page or something instructing newbies where to look, but
I know that I would be happy to implement some of the required
methods.

Matt

On Fri, Aug 28, 2009 at 10:44 AM, martin odersky wrote:
> Hi all,
>
> I have given some thought what to do with the array situation. As we
> are all painfully aware, arrays in Scala are still broken in several
> ways.
>
> First, there's the question what to do when the user wants to create
> an  Array[T], where T is a type variable. Currently, we create a
> deviously clever boxed array which ``snaps'' into the right
> representation as soon as someone casts it to some more concrete type.
> Sort of like in physics where a photon can be either a wave or a
> particle, depending how you look at it. There are at least three
> problems with this:
>
> -1. There are some surprising corner effects, for instance when using
> isInstanceOf followed by a cast.
> -2. Because there are surprising corner effects, we need to spec them,
> but because the implementation is so clever, that's really hard to do.
> -3. There can be unexpected performance drops.
>
> Second, there's the problem that boxing of arrays works only on the
> toplevel. Here's an example that fails:
>
> scala> val x = Array(1, 2, 3)
> x: Array[Int] = Array(1, 2, 3)
>
> scala> val xs = List(x)
> xs: List[Array[Int]] = List([I@1353d27)
>
> scala> val ys: List[collection.mutable.Vector[Int]] = xs
> ys: List[scala.collection.mutable.Vector[Int]] = List([I@1353d27)
>
> scala> ys.head.head
> java.lang.ClassCastException: [I cannot be cast to
> scala.collection.generic.VectorTemplate
>
> The problem is the widening cast from List[Array[Int]] to
> List[Vector[Int]]. That requires a representation change, but because
> the array is already wrapped in the list, the boxing operation does
> not get done.
>
> Some time ago, David and Stepan have made a proposal that involved
> having two kinds of arrays instead of one: A generic array, which is
> really a Scala wrapper on an array of objects, and a Java array, which
> would support only selection, update and length, just as Java does.
> The proposal was to have implicit conversions between the two kinds of
> arrays and to duplicate all methods that take either array as a
> parameter into two overloaded variants, one for each type of array. We
> had a long discussion about this. In the end it was mainly Lex and me
> who had doubts because we feared that it would be difficult for a user
> to choose between the two forms of array: Only generic arrays have all
> the nice operations defined on them, but Java arrays are more compact,
> significantly faster, and better for interoperability.
>
> In the meantime, I have made some progress on the manifests side, and
> am now convinced that it is not unreasonable to demand manifests for
> generic array creations. To a large degree this was made possible by
> the new 2.8 collection's builder design, which can opportunistically
> make use of manifests where available. With that new fact, I think a
> different array design is now possible:
>
> As in the McIver/Koltsov proposal there are two array classes with
> implicit conversions between them:
>
> Array[T]   (which corresponds to a Java array at run time)
> ArrayVector[T]    (which wraps a Java array as a subtype of
> collection.mutable.Vector)
>
> Both types support all operations defined in collection.mutable.Vector
> at the correct types. For instance
>
> Array[T]     has method   reverse: Array[T]
> ArrayVector[T]    has method     reverse: ArrayVector[T]
>
> Most ArrayVector operations are simply inherited from VectorTemplate,
> same as for any other Vector. But the same operations on Array have to
> be implemented by compiler magic. Essentially, for every Vector
> operation and every primitive array type there must be a static method
> implementing this operation on this type. Then the compiler would do
> an expansion like
>
> (xs: Array[Int]).reverse     becomes    IntArrayOps.reverse(xs)
>
> Another tricky aspect is erasure. The erasure of Array[Int] is
> Array[Int] and the erasure of Array[List[Int]] is Array[List]. So the
> erasure of every array type that has a non-variable element type is
> Array of erasure-of-element-type.
> But what about Array[T] where T is a type parameter or abstract type?
> Previously the erasure was BoxedArray, but that leads into trouble
> because it means change in representation. So the erasure of Array[T]
> must be Object.
> This means that generic array operations must dynamically dispatch on
> the array element type. Given ys: Array[T],
>
> (ys: Array[T]).apply(i)   would need to be translated to
> GenericArrayOps.apply(i)
>
> This would lead to code like:
>
> ys match {
>  case ys: Array[Object]: ys(i)
>  case ys: Array[Int]: ys(i)
>  case ys: Array[Float]: ys(i)
>  ... // and so on for all other 6 primitive types
> }
>
> ... and its result would be an AnyRef, so there would also be some
> boxing to do. Similarly for update and length. All other operations
> can in principle be defined in terms of these three, but for often
> used operations such as foreach we might prefer to specialize
> dynamically similarly to the above. So there's certainly some overhead
> involved, even though it should be less that what's currently done in
> BoxedAnyArray. Luckily the overhead can be eliminated by @specialized.
>
> The other long-standing question is the famous
>
>  "abc" != "abc".reverse.reverse
>
> problem on strings. I believe this problem can be solved in the same
> way as for arrays. We already have the two representations of strings:
> String and RichString, with implicit conversions between them. All
> that's needed is to re-implement all vector operations on strings as
> static methods, and to have compiler magic to invoke them.
>
> The main benefits of this proposal is that it's relatively simple to
> understand and that it involves few special tricks. It's all implicit
> conversions and special treatment of some operations (akin to
> extension methods). I also believe
> that the large majority of existing array code will continue to work.
> The main downside is that there will be a lot of operations to
> implement (close to 100 methods on vectors, each needs to be repeated
> on 9 array types plus string, makes about a thousand methods).
>
> What do you think? Is there something I have overlooked? Something
> that can be improved?
>
>  -- Martin
>

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Arrays again

On Fri, Aug 28, 2009 at 5:07 PM, Matt Fowles wrote:
> Martin~
>
> I can't speak to the technical trade offs; however, this actually
> provides a fairly good opportunity for people new to Scala development
> to get their feet wet.  Implementing some of these methods could
> provide a good introduction to Scala development for people like me.
> It would require who ever starts the processes of implementing this to
> write a wiki page or something instructing newbies where to look, but
> I know that I would be happy to implement some of the required
> methods.
>
Actually, I did not mean that these methods would be implemented
manually. There's too much risks for errors. I think it's easy to
generate them either by some script or with regexp replace in the
editor. The concern was more codesize, but it's a minor one.

Cheers

extempore
Joined: 2008-12-17,
User offline. Last seen 35 weeks 3 days ago.
Re: Arrays again

On Fri, Aug 28, 2009 at 04:44:05PM +0200, martin odersky wrote:
> problem on strings. I believe this problem can be solved in the same
> way as for arrays. We already have the two representations of strings:
> String and RichString, with implicit conversions between them. All
> that's needed is to re-implement all vector operations on strings as
> static methods, and to have compiler magic to invoke them.

It's almost creepy, the parallel development. Yesterday the manifest
patch fixed #2250 while I was working on it, and also yesterday I wrote
a pile of static methods with the intention of shopping exactly this
solution for Strings. I'm pretty close to having all the Vector
methods, but written for String specifically so hopefully with decent
performance except for the issue that function1 isn't specialized.

final def takeRight(s: String, len: Int): String =
if (len <= 0) ""
else if (len >= s.length) s
else s.substring(s.length - len, s.length)

final def dropRight(s: String, len: Int): String =
if (len <= 0) s
else if (len >= s.length) ""
else s.substring(0, s.length - len)

final def splitAt(s: String, index: Int): (String, String) =
if (index <= 0) ("", s)
else if (index >= s.length) (s, "")
else (take(s, index), drop(s, index))

... etc

Do I understand correctly we'd get the right return types for String, so
"abcdefg" take 3 would yield "abc" the String, not the RichString?
That'd be such a big jump in goodness.

David Pollak
Joined: 2008-12-16,
User offline. Last seen 42 years 45 weeks ago.
Re: Arrays again


On Fri, Aug 28, 2009 at 7:44 AM, martin odersky <martin.odersky@epfl.ch> wrote:
Hi all,

I have given some thought what to do with the array situation. As we
are all painfully aware, arrays in Scala are still broken in several
ways.

First, there's the question what to do when the user wants to create
an  Array[T], where T is a type variable. Currently, we create a
deviously clever boxed array which ``snaps'' into the right
representation as soon as someone casts it to some more concrete type.
Sort of like in physics where a photon can be either a wave or a
particle, depending how you look at it. There are at least three
problems with this:

-1. There are some surprising corner effects, for instance when using
isInstanceOf followed by a cast.
-2. Because there are surprising corner effects, we need to spec them,
but because the implementation is so clever, that's really hard to do.
-3. There can be unexpected performance drops.

Second, there's the problem that boxing of arrays works only on the
toplevel. Here's an example that fails:

scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)

scala> val xs = List(x)
xs: List[Array[Int]] = List([I@1353d27)

scala> val ys: List[collection.mutable.Vector[Int]] = xs
ys: List[scala.collection.mutable.Vector[Int]] = List([I@1353d27)

scala> ys.head.head
java.lang.ClassCastException: [I cannot be cast to
scala.collection.generic.VectorTemplate

The problem is the widening cast from List[Array[Int]] to
List[Vector[Int]]. That requires a representation change, but because
the array is already wrapped in the list, the boxing operation does
not get done.

Some time ago, David and Stepan have made a proposal that involved
having two kinds of arrays instead of one: A generic array, which is
really a Scala wrapper on an array of objects, and a Java array, which
would support only selection, update and length, just as Java does.
The proposal was to have implicit conversions between the two kinds of
arrays and to duplicate all methods that take either array as a
parameter into two overloaded variants, one for each type of array. We
had a long discussion about this. In the end it was mainly Lex and me
who had doubts because we feared that it would be difficult for a user
to choose between the two forms of array: Only generic arrays have all
the nice operations defined on them, but Java arrays are more compact,
significantly faster, and better for interoperability.

In the meantime, I have made some progress on the manifests side, and
am now convinced that it is not unreasonable to demand manifests for
generic array creations. To a large degree this was made possible by
the new 2.8 collection's builder design, which can opportunistically
make use of manifests where available. With that new fact, I think a
different array design is now possible:

As in the McIver/Koltsov proposal there are two array classes with
implicit conversions between them:

Array[T]   (which corresponds to a Java array at run time)
ArrayVector[T]    (which wraps a Java array as a subtype of
collection.mutable.Vector)

Both types support all operations defined in collection.mutable.Vector
at the correct types. For instance

Array[T]     has method   reverse: Array[T]
ArrayVector[T]    has method     reverse: ArrayVector[T]

Most ArrayVector operations are simply inherited from VectorTemplate,
same as for any other Vector. But the same operations on Array have to
be implemented by compiler magic. Essentially, for every Vector
operation and every primitive array type there must be a static method
implementing this operation on this type. Then the compiler would do
an expansion like

(xs: Array[Int]).reverse     becomes    IntArrayOps.reverse(xs)

Another tricky aspect is erasure. The erasure of Array[Int] is
Array[Int] and the erasure of Array[List[Int]] is Array[List]. So the
erasure of every array type that has a non-variable element type is
Array of erasure-of-element-type.
But what about Array[T] where T is a type parameter or abstract type?
Previously the erasure was BoxedArray, but that leads into trouble
because it means change in representation. So the erasure of Array[T]
must be Object.
This means that generic array operations must dynamically dispatch on
the array element type. Given ys: Array[T],

(ys: Array[T]).apply(i)   would need to be translated to
GenericArrayOps.apply(i)

This would lead to code like:

ys match {
 case ys: Array[Object]: ys(i)
 case ys: Array[Int]: ys(i)
 case ys: Array[Float]: ys(i)
 ... // and so on for all other 6 primitive types
}

... and its result would be an AnyRef, so there would also be some
boxing to do. Similarly for update and length. All other operations
can in principle be defined in terms of these three, but for often
used operations such as foreach we might prefer to specialize
dynamically similarly to the above. So there's certainly some overhead
involved, even though it should be less that what's currently done in
BoxedAnyArray. Luckily the overhead can be eliminated by @specialized.

The other long-standing question is the famous

 "abc" != "abc".reverse.reverse

problem on strings. I believe this problem can be solved in the same
way as for arrays. We already have the two representations of strings:
String and RichString, with implicit conversions between them. All
that's needed is to re-implement all vector operations on strings as
static methods, and to have compiler magic to invoke them.

The main benefits of this proposal is that it's relatively simple to
understand and that it involves few special tricks. It's all implicit
conversions and special treatment of some operations (akin to
extension methods). I also believe
that the large majority of existing array code will continue to work.
The main downside is that there will be a lot of operations to
implement (close to 100 methods on vectors, each needs to be repeated
on 9 array types plus string, makes about a thousand methods).

What do you think? Is there something I have overlooked? Something
that can be improved?

Martin,

The only thing I would suggest as an improvement is to include a Manifest[T] with every List cons cell.  I'm not sure what the runtime performance hit would be (the memory footprint issue could be addressed by custom cons subclasses for String, Char, Int, etc.)  I suggest this as an improvement because I believe the farther we can push Manifests into collections, the more rare the corner cases will be (e.g., List[Array[T]]).

My 2 cents.

Thanks,

David
 


Tiark Rompf
Joined: 2009-02-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Arrays again

Hi,

given my experience benchmarking immutable Vectors and other library
classes that need to make efficient use of arrays internally, I'm very
much in favor of a solution that provides some mechanism to use plain
Java arrays without any boxing overhead. For making library classes
efficient, I think this is crucial.

I was wondering whether something analogous to Numeric would be
feasible, i.e. using an unboxed representation but adding methods via
implicits. Taking the dreaded String.reverse as an example, this could
look as follows:

object Sequential {
trait StringIsSequential extends Sequential[String, Char] {
def doReverse(x: String): String = x.reverse
...
}
implicit object StringIsSequential extends StringIsSequential
}

trait Sequential[T, @specialized U] {
def doReverse(x: T): T
...

class Ops(lhs: T) {
def doReverse = Sequential.this.doReverse(lhs)
...
}
implicit def mkSequentialOps(lhs: T): Ops = new Ops(lhs)
}

And it yields the desired behavior, since there are no RichStrings
involved:

x: "lorem ipsum"
x.reverse: "muspi merol"
x == x.reverse.reverse: true
x.reverse.reverse == x: true
x.reverse == x.reverse: true

For Arrays, it could be done similarly (it doesn't compile currently,
because BoxedArrays are expected):

trait ArrayIsSequential[@specialized T] extends
Sequential[Array[T], T] {
def newArray(x: Int): Array[T]
def doReverse(x: Array[T]): Array[T] = {
var i = 0;
var res = newArray(x.length)
while (i < res.length) {
res(i) = x(x.length - i - 1)
i += 1
}
res
}
....
}
implicit def arrayIsSequential[T](implicit m: Manifest[T]) = new
ArrayIsSequential[T]() {
def newArray(x: Int) =
java.lang.reflect.Array.newInstance(m.erasure, x).asInstanceOf[Array[T]]
}

Code which works both with Strings and Array[Char] could look similar
to this (using Sequential in what we decided to call a context bound):

def foo[T: Sequential[Char]](x: T) = {
// would need to have implicits in scope here (import from
evidence object)
println("x: " + x)
println("x.reverse: " + x.doReverse)
}

Of course this is somehow contrary to the new collection library
design, so I'm not sure if it's really practicable.

- Tiark

On 28.08.2009, at 16:44, martin odersky wrote:

> Hi all,
>
> I have given some thought what to do with the array situation. As we
> are all painfully aware, arrays in Scala are still broken in several
> ways.
>
> First, there's the question what to do when the user wants to create
> an Array[T], where T is a type variable. Currently, we create a
> deviously clever boxed array which ``snaps'' into the right
> representation as soon as someone casts it to some more concrete type.
> Sort of like in physics where a photon can be either a wave or a
> particle, depending how you look at it. There are at least three
> problems with this:
>
> -1. There are some surprising corner effects, for instance when using
> isInstanceOf followed by a cast.
> -2. Because there are surprising corner effects, we need to spec them,
> but because the implementation is so clever, that's really hard to do.
> -3. There can be unexpected performance drops.
>
> Second, there's the problem that boxing of arrays works only on the
> toplevel. Here's an example that fails:
>
> scala> val x = Array(1, 2, 3)
> x: Array[Int] = Array(1, 2, 3)
>
> scala> val xs = List(x)
> xs: List[Array[Int]] = List([I@1353d27)
>
> scala> val ys: List[collection.mutable.Vector[Int]] = xs
> ys: List[scala.collection.mutable.Vector[Int]] = List([I@1353d27)
>
> scala> ys.head.head
> java.lang.ClassCastException: [I cannot be cast to
> scala.collection.generic.VectorTemplate
>
> The problem is the widening cast from List[Array[Int]] to
> List[Vector[Int]]. That requires a representation change, but because
> the array is already wrapped in the list, the boxing operation does
> not get done.
>
> Some time ago, David and Stepan have made a proposal that involved
> having two kinds of arrays instead of one: A generic array, which is
> really a Scala wrapper on an array of objects, and a Java array, which
> would support only selection, update and length, just as Java does.
> The proposal was to have implicit conversions between the two kinds of
> arrays and to duplicate all methods that take either array as a
> parameter into two overloaded variants, one for each type of array. We
> had a long discussion about this. In the end it was mainly Lex and me
> who had doubts because we feared that it would be difficult for a user
> to choose between the two forms of array: Only generic arrays have all
> the nice operations defined on them, but Java arrays are more compact,
> significantly faster, and better for interoperability.
>
> In the meantime, I have made some progress on the manifests side, and
> am now convinced that it is not unreasonable to demand manifests for
> generic array creations. To a large degree this was made possible by
> the new 2.8 collection's builder design, which can opportunistically
> make use of manifests where available. With that new fact, I think a
> different array design is now possible:
>
> As in the McIver/Koltsov proposal there are two array classes with
> implicit conversions between them:
>
> Array[T] (which corresponds to a Java array at run time)
> ArrayVector[T] (which wraps a Java array as a subtype of
> collection.mutable.Vector)
>
> Both types support all operations defined in collection.mutable.Vector
> at the correct types. For instance
>
> Array[T] has method reverse: Array[T]
> ArrayVector[T] has method reverse: ArrayVector[T]
>
> Most ArrayVector operations are simply inherited from VectorTemplate,
> same as for any other Vector. But the same operations on Array have to
> be implemented by compiler magic. Essentially, for every Vector
> operation and every primitive array type there must be a static method
> implementing this operation on this type. Then the compiler would do
> an expansion like
>
> (xs: Array[Int]).reverse becomes IntArrayOps.reverse(xs)
>
> Another tricky aspect is erasure. The erasure of Array[Int] is
> Array[Int] and the erasure of Array[List[Int]] is Array[List]. So the
> erasure of every array type that has a non-variable element type is
> Array of erasure-of-element-type.
> But what about Array[T] where T is a type parameter or abstract type?
> Previously the erasure was BoxedArray, but that leads into trouble
> because it means change in representation. So the erasure of Array[T]
> must be Object.
> This means that generic array operations must dynamically dispatch on
> the array element type. Given ys: Array[T],
>
> (ys: Array[T]).apply(i) would need to be translated to
> GenericArrayOps.apply(i)
>
> This would lead to code like:
>
> ys match {
> case ys: Array[Object]: ys(i)
> case ys: Array[Int]: ys(i)
> case ys: Array[Float]: ys(i)
> ... // and so on for all other 6 primitive types
> }
>
> ... and its result would be an AnyRef, so there would also be some
> boxing to do. Similarly for update and length. All other operations
> can in principle be defined in terms of these three, but for often
> used operations such as foreach we might prefer to specialize
> dynamically similarly to the above. So there's certainly some overhead
> involved, even though it should be less that what's currently done in
> BoxedAnyArray. Luckily the overhead can be eliminated by @specialized.
>
> The other long-standing question is the famous
>
> "abc" != "abc".reverse.reverse
>
> problem on strings. I believe this problem can be solved in the same
> way as for arrays. We already have the two representations of strings:
> String and RichString, with implicit conversions between them. All
> that's needed is to re-implement all vector operations on strings as
> static methods, and to have compiler magic to invoke them.
>
> The main benefits of this proposal is that it's relatively simple to
> understand and that it involves few special tricks. It's all implicit
> conversions and special treatment of some operations (akin to
> extension methods). I also believe
> that the large majority of existing array code will continue to work.
> The main downside is that there will be a lot of operations to
> implement (close to 100 methods on vectors, each needs to be repeated
> on 9 array types plus string, makes about a thousand methods).
>
> What do you think? Is there something I have overlooked? Something
> that can be improved?
>

milessabin
Joined: 2008-08-11,
User offline. Last seen 33 weeks 3 days ago.
Re: Arrays again

On Fri, Aug 28, 2009 at 6:07 PM, Tiark Rompf wrote:
> Code which works both with Strings and Array[Char] could look similar to
> this (using Sequential in what we decided to call a context bound):
>
>  def foo[T: Sequential[Char]](x: T) = {
>    // would need to have implicits in scope here (import from evidence
> object)
>    println("x: " + x)
>    println("x.reverse: " + x.doReverse)
>  }

What kind of overheads would you anticipate for this?

The ability to generalize across String's and Array[Char]'s with
minimal performance impact would be a really big win for a lot of
applications ...

Cheers,

Miles

milessabin
Joined: 2008-08-11,
User offline. Last seen 33 weeks 3 days ago.
Re: Arrays again

On Fri, Aug 28, 2009 at 6:23 PM, Miles Sabin wrote:
> On Fri, Aug 28, 2009 at 6:07 PM, Tiark Rompf wrote:
>> Code which works both with Strings and Array[Char] could look similar to
>> this (using Sequential in what we decided to call a context bound):
>>
>>  def foo[T: Sequential[Char]](x: T) = {
>>    // would need to have implicits in scope here (import from evidence
>> object)
>>    println("x: " + x)
>>    println("x.reverse: " + x.doReverse)
>>  }
>
> What kind of overheads would you anticipate for this?
>
> The ability to generalize across String's and Array[Char]'s with
> minimal performance impact would be a really big win for a lot of
> applications ...

And while we're at it ...

Would the same technique allow abstracting over java.nio.ByteBuffer's
and Array[Byte]'s and java.nio.CharBuffer's and Array[Char]'s (and
Strings) with minimal overhead?

Cheers,

Miles

Tiark Rompf
Joined: 2009-02-18,
User offline. Last seen 42 years 45 weeks ago.
Re: Arrays again

On 28.08.2009, at 20:49, Miles Sabin wrote:

> On Fri, Aug 28, 2009 at 6:23 PM, Miles Sabin
> wrote:
>> On Fri, Aug 28, 2009 at 6:07 PM, Tiark Rompf
>> wrote:
>>> Code which works both with Strings and Array[Char] could look
>>> similar to
>>> this (using Sequential in what we decided to call a context bound):
>>>
>>> def foo[T: Sequential[Char]](x: T) = {
>>> // would need to have implicits in scope here (import from
>>> evidence
>>> object)
>>> println("x: " + x)
>>> println("x.reverse: " + x.doReverse)
>>> }
>>
>> What kind of overheads would you anticipate for this?
>>
>> The ability to generalize across String's and Array[Char]'s with
>> minimal performance impact would be a really big win for a lot of
>> applications ...
>
> And while we're at it ...
>
> Would the same technique allow abstracting over java.nio.ByteBuffer's
> and Array[Byte]'s and java.nio.CharBuffer's and Array[Char]'s (and
> Strings) with minimal overhead?

I think the overhead should be pretty small, but without extensive
testing I might well be wrong. In principle, a combination of
@specialized, inlining and escape analysis could make it as efficient
as a regular method call. Since we're relying heavily on implicits
here, we'd need to get rid of all those one-shot SequentialOps
allocations. In principle, scalar replacement based on escape analysis
should do that, but it's hard to tell beforehand how well it works in
practice.

Does anybody have reliable performance figures for Numeric with
@specialized and -XX:+DoEscapeAnalysis?

For java.nio.CharBuffer and Array[Char] I guess one could use the same
mechanism.

Cheers,
- Tiark

Roland Kuhn
Joined: 2008-12-26,
User offline. Last seen 3 years 14 weeks ago.
Re: Arrays again

Hi Martin,

having experienced (and I believe understood) the current array
issues, I very much appreciate the approach you describe. In
particular, it seems to me that no "compiler magic" is actually
necessary for correctness of the implementation, e.g. using the
technique outline by Tiark. That way, the library would be more
readable for people like me, who have not yet ventured much into the
compiler sources. The transformation of

object ArrayOps {
def reverse(a : T) : T
}

class ArrayOps(a : T) {
def reverse = ArrayOps.reverse(a)
}
implicit def mkArrayOps(a : T) : ArrayOps = new ArrayOps(a)
(a : Array[Int]).reverse

into

ArrayOps.reverse(a)

is more of a performance optimization, which the compiler could do (I
admit that I wouldn't know how; maybe add some information to the
scala method signature for these simple dispatch-to-static methods?).
Wouldn't this gain in transparency be a good thing? Also, this kind of
optimization would certainly benefit other uses as well.

Regards,

Roland

On 28 Aug 2009, at 16:44, martin odersky wrote:

> Hi all,
>
> I have given some thought what to do with the array situation. As we
> are all painfully aware, arrays in Scala are still broken in several
> ways.
>
> First, there's the question what to do when the user wants to create
> an Array[T], where T is a type variable. Currently, we create a
> deviously clever boxed array which ``snaps'' into the right
> representation as soon as someone casts it to some more concrete type.
> Sort of like in physics where a photon can be either a wave or a
> particle, depending how you look at it. There are at least three
> problems with this:
>
> -1. There are some surprising corner effects, for instance when using
> isInstanceOf followed by a cast.
> -2. Because there are surprising corner effects, we need to spec them,
> but because the implementation is so clever, that's really hard to do.
> -3. There can be unexpected performance drops.
>
> Second, there's the problem that boxing of arrays works only on the
> toplevel. Here's an example that fails:
>
> scala> val x = Array(1, 2, 3)
> x: Array[Int] = Array(1, 2, 3)
>
> scala> val xs = List(x)
> xs: List[Array[Int]] = List([I@1353d27)
>
> scala> val ys: List[collection.mutable.Vector[Int]] = xs
> ys: List[scala.collection.mutable.Vector[Int]] = List([I@1353d27)
>
> scala> ys.head.head
> java.lang.ClassCastException: [I cannot be cast to
> scala.collection.generic.VectorTemplate
>
> The problem is the widening cast from List[Array[Int]] to
> List[Vector[Int]]. That requires a representation change, but because
> the array is already wrapped in the list, the boxing operation does
> not get done.
>
> Some time ago, David and Stepan have made a proposal that involved
> having two kinds of arrays instead of one: A generic array, which is
> really a Scala wrapper on an array of objects, and a Java array, which
> would support only selection, update and length, just as Java does.
> The proposal was to have implicit conversions between the two kinds of
> arrays and to duplicate all methods that take either array as a
> parameter into two overloaded variants, one for each type of array. We
> had a long discussion about this. In the end it was mainly Lex and me
> who had doubts because we feared that it would be difficult for a user
> to choose between the two forms of array: Only generic arrays have all
> the nice operations defined on them, but Java arrays are more compact,
> significantly faster, and better for interoperability.
>
> In the meantime, I have made some progress on the manifests side, and
> am now convinced that it is not unreasonable to demand manifests for
> generic array creations. To a large degree this was made possible by
> the new 2.8 collection's builder design, which can opportunistically
> make use of manifests where available. With that new fact, I think a
> different array design is now possible:
>
> As in the McIver/Koltsov proposal there are two array classes with
> implicit conversions between them:
>
> Array[T] (which corresponds to a Java array at run time)
> ArrayVector[T] (which wraps a Java array as a subtype of
> collection.mutable.Vector)
>
> Both types support all operations defined in collection.mutable.Vector
> at the correct types. For instance
>
> Array[T] has method reverse: Array[T]
> ArrayVector[T] has method reverse: ArrayVector[T]
>
> Most ArrayVector operations are simply inherited from VectorTemplate,
> same as for any other Vector. But the same operations on Array have to
> be implemented by compiler magic. Essentially, for every Vector
> operation and every primitive array type there must be a static method
> implementing this operation on this type. Then the compiler would do
> an expansion like
>
> (xs: Array[Int]).reverse becomes IntArrayOps.reverse(xs)
>
> Another tricky aspect is erasure. The erasure of Array[Int] is
> Array[Int] and the erasure of Array[List[Int]] is Array[List]. So the
> erasure of every array type that has a non-variable element type is
> Array of erasure-of-element-type.
> But what about Array[T] where T is a type parameter or abstract type?
> Previously the erasure was BoxedArray, but that leads into trouble
> because it means change in representation. So the erasure of Array[T]
> must be Object.
> This means that generic array operations must dynamically dispatch on
> the array element type. Given ys: Array[T],
>
> (ys: Array[T]).apply(i) would need to be translated to
> GenericArrayOps.apply(i)
>
> This would lead to code like:
>
> ys match {
> case ys: Array[Object]: ys(i)
> case ys: Array[Int]: ys(i)
> case ys: Array[Float]: ys(i)
> ... // and so on for all other 6 primitive types
> }
>
> ... and its result would be an AnyRef, so there would also be some
> boxing to do. Similarly for update and length. All other operations
> can in principle be defined in terms of these three, but for often
> used operations such as foreach we might prefer to specialize
> dynamically similarly to the above. So there's certainly some overhead
> involved, even though it should be less that what's currently done in
> BoxedAnyArray. Luckily the overhead can be eliminated by @specialized.
>
> The other long-standing question is the famous
>
> "abc" != "abc".reverse.reverse
>
> problem on strings. I believe this problem can be solved in the same
> way as for arrays. We already have the two representations of strings:
> String and RichString, with implicit conversions between them. All
> that's needed is to re-implement all vector operations on strings as
> static methods, and to have compiler magic to invoke them.
>
> The main benefits of this proposal is that it's relatively simple to
> understand and that it involves few special tricks. It's all implicit
> conversions and special treatment of some operations (akin to
> extension methods). I also believe
> that the large majority of existing array code will continue to work.
> The main downside is that there will be a lot of operations to
> implement (close to 100 methods on vectors, each needs to be repeated
> on 9 array types plus string, makes about a thousand methods).
>
> What do you think? Is there something I have overlooked? Something
> that can be improved?
>

Joshua.Suereth
Joined: 2008-09-02,
User offline. Last seen 32 weeks 5 days ago.
Re: Arrays again
Would this extension-method-like approach be something that could be available to user-space libraries?  I'm curious how this relates to some "pimp-my-library" proposals I've seen,


Would the end result look something like this?

implicit class RichArray[@specialized T](a : Array[T]) extends Vector[T] {
   //
}

Where implict classes (or objects?) compile down to static methods against their constructor arguments.  When combined with specialized, they compile to several classes (one for each primitive and the "generic" variant), and when in scope allow you to perform magic against arrays while keeping their type?



I very much like the idea regardless of how the nitty-gritty gets defined.  +1 from me.


- Josh







On Fri, Aug 28, 2009 at 10:44 AM, martin odersky <martin.odersky@epfl.ch> wrote:
Hi all,

I have given some thought what to do with the array situation. As we
are all painfully aware, arrays in Scala are still broken in several
ways.

First, there's the question what to do when the user wants to create
an  Array[T], where T is a type variable. Currently, we create a
deviously clever boxed array which ``snaps'' into the right
representation as soon as someone casts it to some more concrete type.
Sort of like in physics where a photon can be either a wave or a
particle, depending how you look at it. There are at least three
problems with this:

-1. There are some surprising corner effects, for instance when using
isInstanceOf followed by a cast.
-2. Because there are surprising corner effects, we need to spec them,
but because the implementation is so clever, that's really hard to do.
-3. There can be unexpected performance drops.

Second, there's the problem that boxing of arrays works only on the
toplevel. Here's an example that fails:

scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)

scala> val xs = List(x)
xs: List[Array[Int]] = List([I@1353d27)

scala> val ys: List[collection.mutable.Vector[Int]] = xs
ys: List[scala.collection.mutable.Vector[Int]] = List([I@1353d27)

scala> ys.head.head
java.lang.ClassCastException: [I cannot be cast to
scala.collection.generic.VectorTemplate

The problem is the widening cast from List[Array[Int]] to
List[Vector[Int]]. That requires a representation change, but because
the array is already wrapped in the list, the boxing operation does
not get done.

Some time ago, David and Stepan have made a proposal that involved
having two kinds of arrays instead of one: A generic array, which is
really a Scala wrapper on an array of objects, and a Java array, which
would support only selection, update and length, just as Java does.
The proposal was to have implicit conversions between the two kinds of
arrays and to duplicate all methods that take either array as a
parameter into two overloaded variants, one for each type of array. We
had a long discussion about this. In the end it was mainly Lex and me
who had doubts because we feared that it would be difficult for a user
to choose between the two forms of array: Only generic arrays have all
the nice operations defined on them, but Java arrays are more compact,
significantly faster, and better for interoperability.

In the meantime, I have made some progress on the manifests side, and
am now convinced that it is not unreasonable to demand manifests for
generic array creations. To a large degree this was made possible by
the new 2.8 collection's builder design, which can opportunistically
make use of manifests where available. With that new fact, I think a
different array design is now possible:

As in the McIver/Koltsov proposal there are two array classes with
implicit conversions between them:

Array[T]   (which corresponds to a Java array at run time)
ArrayVector[T]    (which wraps a Java array as a subtype of
collection.mutable.Vector)

Both types support all operations defined in collection.mutable.Vector
at the correct types. For instance

Array[T]     has method   reverse: Array[T]
ArrayVector[T]    has method     reverse: ArrayVector[T]

Most ArrayVector operations are simply inherited from VectorTemplate,
same as for any other Vector. But the same operations on Array have to
be implemented by compiler magic. Essentially, for every Vector
operation and every primitive array type there must be a static method
implementing this operation on this type. Then the compiler would do
an expansion like

(xs: Array[Int]).reverse     becomes    IntArrayOps.reverse(xs)

Another tricky aspect is erasure. The erasure of Array[Int] is
Array[Int] and the erasure of Array[List[Int]] is Array[List]. So the
erasure of every array type that has a non-variable element type is
Array of erasure-of-element-type.
But what about Array[T] where T is a type parameter or abstract type?
Previously the erasure was BoxedArray, but that leads into trouble
because it means change in representation. So the erasure of Array[T]
must be Object.
This means that generic array operations must dynamically dispatch on
the array element type. Given ys: Array[T],

(ys: Array[T]).apply(i)   would need to be translated to
GenericArrayOps.apply(i)

This would lead to code like:

ys match {
 case ys: Array[Object]: ys(i)
 case ys: Array[Int]: ys(i)
 case ys: Array[Float]: ys(i)
 ... // and so on for all other 6 primitive types
}

... and its result would be an AnyRef, so there would also be some
boxing to do. Similarly for update and length. All other operations
can in principle be defined in terms of these three, but for often
used operations such as foreach we might prefer to specialize
dynamically similarly to the above. So there's certainly some overhead
involved, even though it should be less that what's currently done in
BoxedAnyArray. Luckily the overhead can be eliminated by @specialized.

The other long-standing question is the famous

 "abc" != "abc".reverse.reverse

problem on strings. I believe this problem can be solved in the same
way as for arrays. We already have the two representations of strings:
String and RichString, with implicit conversions between them. All
that's needed is to re-implement all vector operations on strings as
static methods, and to have compiler magic to invoke them.

The main benefits of this proposal is that it's relatively simple to
understand and that it involves few special tricks. It's all implicit
conversions and special treatment of some operations (akin to
extension methods). I also believe
that the large majority of existing array code will continue to work.
The main downside is that there will be a lot of operations to
implement (close to 100 methods on vectors, each needs to be repeated
on 9 array types plus string, makes about a thousand methods).

What do you think? Is there something I have overlooked? Something
that can be improved?

odersky
Joined: 2008-07-29,
User offline. Last seen 45 weeks 6 days ago.
Re: Arrays again

On Sat, Aug 29, 2009 at 7:36 PM, Josh Suereth wrote:
> Would this extension-method-like approach be something that could be
> available to user-space libraries?  I'm curious how this relates to some
> "pimp-my-library" proposals I've seen,
>
>
> Would the end result look something like this?
>
> implicit class RichArray[@specialized T](a : Array[T]) extends Vector[T] {
>    //
> }
>
> Where implict classes (or objects?) compile down to static methods against
> their constructor arguments.  When combined with specialized, they compile
> to several classes (one for each primitive and the "generic" variant), and
> when in scope allow you to perform magic against arrays while keeping their
> type?
>
The problem with generalizing this is that Array or String will
already have an implicit conversion to support and operation like
reverse, but it will be the wrong one:

reverse: ArrayVector[T], or
reverse: RichString

If we simply added another implicit conversion to the right reverse,
we'd get an ambiguity.
Now, one could think of some priority scheme between implicit
conversions or extension
methods, but that seems too complex for my taste.

Cheers

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland