Concerning the change to pickle format

13 replies
dubochet
Joined: 2008-06-30

Hello.

Some of you may have noticed a change in the way pickled Scala signatures are stored in class files.

For those of you who are interested in the change, a document describing it can be downloaded here:

http://lamp.epfl.ch/~dubochet/new_pickle.pdf

It is only necessary to read this document if you maintain or plan to write a tool that accesses pickled Scala signatures by parsing class files.

Of course, even if this is not the case, the document may still be an interesting read as it gives a high-level description of what pickled Scala signatures are and how they are stored in Java class files.
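For anyone starting such a tool, a first step is finding the strings in a class file's constant pool, which is where the encoded signature string ends up (it appears as an `Asciz` entry in the javap output later in this thread). The following is a minimal Scala sketch, not taken from the document, and the object name `Utf8Constants` is hypothetical. It walks the constant pool following the JVM class-file format and collects every CONSTANT_Utf8 entry, which uses the same length-prefixed modified UTF-8 layout that `java.io.DataInputStream.readUTF` reads. Decoding the pickle itself is out of scope here.

```scala
import java.io.{BufferedInputStream, DataInputStream, FileInputStream}

// Hypothetical helper: lists the CONSTANT_Utf8 entries of a class file.
object Utf8Constants {
  def read(in: DataInputStream): Vector[String] = {
    def skip(n: Int): Unit = in.readFully(new Array[Byte](n)) // fails loudly on truncated input

    require((in.readInt() & 0xFFFFFFFFL) == 0xCAFEBABEL, "not a class file")
    in.readUnsignedShort() // minor_version
    in.readUnsignedShort() // major_version
    val count = in.readUnsignedShort() // constant_pool_count; valid indices are 1 until count

    val out = Vector.newBuilder[String]
    var i = 1
    while (i < count) {
      in.readUnsignedByte() match {
        case 1 => out += in.readUTF() // Utf8: u2 length + modified UTF-8 bytes
        case 3 | 4 => skip(4) // Integer, Float
        case 5 | 6 => skip(8); i += 1 // Long, Double occupy two slots
        case 7 | 8 | 16 | 19 | 20 => skip(2) // Class, String, MethodType, Module, Package
        case 9 | 10 | 11 | 12 | 17 | 18 => skip(4) // member refs, NameAndType, (Invoke)Dynamic
        case 15 => skip(3) // MethodHandle
        case tag => throw new IllegalArgumentException(s"unknown constant pool tag $tag")
      }
      i += 1
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val in = new DataInputStream(new BufferedInputStream(new FileInputStream(args(0))))
    try {
      // Replace control characters with '.' so the output is safe to print to a terminal.
      read(in).foreach(s => println(s.map(c => if (c < 0x20 || c == 0x7f) '.' else c)))
    } finally in.close()
  }
}
```

Run on a class compiled by a recent Scala 2 compiler, the list should include names such as `ScalaSig` and `Lscala/reflect/ScalaSignature;`, as in the javap output quoted in this thread.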

Cheers,
Gilles.

Antonio Cunei
Joined: 2008-12-16
Re: Concerning the change to pickle format

This is excellent internal documentation; I think you should add it as a SID.
Toni

On Fri, March 26, 2010 1:06 pm, Gilles Dubochet wrote:

extempore
Joined: 2008-12-17
Re: Concerning the change to pickle format

Fun. I have a feeling this will have some interesting consequences.
First thing I've noticed is that javap output now fires a bunch of
control characters at the terminal. Here's the output of "class A".
Notice my prompt buried in the middle of it -- that's not a bad paste on
my part, but its actual final resting place. Pretty sad that terminals
are so easily hosed even in 2010, but I use javap all day long: any
suggestions?

const #7 = Asciz java/lang/Object;
[paulp@leaf tmp]$ #7; // java/lang/Object
const #9 = NameAndType #3:#4;// "<init>":()V
const #10 = Method #8.#9; // java/lang/Object."<init>":()V
const #11 = Asciz ScalaSig;
const #12 = Asciz Lscala/reflect/ScalaSignature;;
const #13 = Asciz bytes;
const #14 = Asciz e1\"\t!*\t1!A=K6H/ MaQ\"
%Q7b]TaCU4-\t1qJ6fGR\"a\nAQ!Eg
G.Y'A1bU2bY|%M[3di\")QC-1A(

dubochet
Joined: 2008-06-30
Re: Concerning the change to pickle format

Hey.

> Fun. I have a feeling this will have some interesting consequences.
> First thing I've noticed is that javap output now fires a bunch of
> control characters at the terminal. Here's the output of "class A".

The string with the strange characters is the encoded form of an array of bytes (the Scala signature); there is no guarantee that the characters in it are printable. If we restricted the encoding to printable characters only, it would become less efficient and class files would become bigger.

I am sure you'll agree this is not good.
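To see why control characters appear, here is a minimal Scala sketch of a generic 8-to-7-bit packing. It is an assumption for illustration and may differ in details from the actual scheme in the pickle document; the names `SevenBitPacking`, `pack`, and `isControl` are hypothetical. Every 7 bits of input become one value in 0..127, and 32 of those 128 values (0x00 to 0x1F), plus 0x7F, are ASCII control characters that a terminal may interpret instead of displaying.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of packing 8-bit bytes into 7-bit values (0..127).
object SevenBitPacking {
  def pack(bytes: Array[Byte]): Array[Int] = {
    val out = ArrayBuffer.empty[Int]
    var acc = 0   // pending bits, most significant first
    var nbits = 0 // how many pending bits are in acc
    for (b <- bytes) {
      acc = (acc << 8) | (b & 0xff)
      nbits += 8
      while (nbits >= 7) {
        nbits -= 7
        out += (acc >>> nbits) & 0x7f // emit the top 7 pending bits
      }
      acc &= (1 << nbits) - 1 // drop the bits already emitted
    }
    if (nbits > 0) out += (acc << (7 - nbits)) & 0x7f // zero-pad the last group
    out.toArray
  }

  // True for ASCII control characters, which a terminal may act on rather than display.
  def isControl(v: Int): Boolean = v < 0x20 || v == 0x7f

  def main(args: Array[String]): Unit = {
    val sample = "class A".getBytes("US-ASCII")
    val packed = pack(sample)
    println(s"${sample.length} bytes -> ${packed.length} 7-bit values, " +
      s"${packed.count(isControl)} of them control characters")
  }
}
```

Packing into 7 bits rather than 6 is what keeps the output compact; the price is that nothing restricts it to printable characters.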

Furthermore, the output you mention happens only when javap prints the constant pool (at least on my machine), which only happens when using "javap -verbose".

> Notice my prompt buried in the middle of it -- that's not a bad paste on
> my part, but it's actual final resting place. Pretty sad that terminals
> are so easily hosed even in 2010,

Yes, although I don't notice this problem on my machine (Mac OS X, standard terminal, character encoding is UTF-8). There are just a few "beeps" emitted when decompiling a class.

> but I use javap all day long: any suggestions?

Is there really information that you use from verbose mode that isn't available with "-c -l -s"? If you are actually reading the constant pool (which sounds strange to me), you could use jclasslib, a GUI tool that doesn't get confused by the strange characters in the encoded Scala signature.

Cheers,
Gilles.

extempore
Joined: 2008-12-17
Re: Concerning the change to pickle format

On Fri, Mar 26, 2010 at 04:32:03PM +0100, Gilles Dubochet wrote:
> I am sure you'll agree this is not good.

I wasn't advocating for anything, just bringing it up. I use javap a
lot and on a lot of different machines where I might be logged in over
ssh and most likely have not installed custom tools (so that's two
strikes against jclasslib). So it's nice to be able to use javap.
And I have little doubt variations on this issue are going to start
coming up in other contexts, because as you observed in the document,
runtime annotations are pretty clearly not designed for holding binary
data, so tools written to process them are at high risk of freaking out.

> Furthermore, the output you mention happens only when javap prints the
> constant pool (at least on my machine), which only happens when using
> "javap -verbose".

That is right, my javap alias says "lay it on me." I find it's faster to
print everything up front and do what I want with the output than it is
to incrementally dial up the verbosity.

> Yes, although I don't notice this problem on my machine (Mac OS X,
> standard terminal, character encoding is UTF-8). There are just a few
> "beeps" emitted when decompiling a class.

I also am on OSX in the standard terminal, character encoding UTF-8.
Maybe you didn't luck into the same control characters.

I can work around it. But I'd ask that you not be too dismissive of
this kind of issue. Such impediments are real, and it takes feedback
from diverse corners to find out what the real impact of a change like
this is. You can't generalize from how it works on your machine.

Miguel Garcia
Joined: 2009-06-10
Re: Concerning the change to pickle format
Randall R Schulz
Joined: 2008-12-16
Re: Concerning the change to pickle format

On Friday March 26 2010, Paul Phillips wrote:
> On Fri, Mar 26, 2010 at 04:32:03PM +0100, Gilles Dubochet wrote:
> > I am sure you'll agree this is not good.
>
> I wasn't advocating for anything, just bringing it up. I use javap a
> lot and on a lot of different machines where I might be logged in
> over ssh and most likely have not installed custom tools (so
> that's two strikes against jclasslib). So it's nice to be able to
> use javap.

In this situation, piping through "cat -v" will at least prevent raw
control and non-ASCII characters from reaching your (pseudo-) tty.
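For illustration, here is a minimal Scala sketch of the kind of escaping `cat -v` performs, based on the conventional caret and `M-` notation: control characters become `^` plus a letter, DEL becomes `^?`, bytes with the high bit set get an `M-` prefix, and tab and newline pass through unchanged. It approximates the GNU and BSD behaviour rather than reproducing either exactly, and the object name `CatV` is hypothetical.

```scala
// Hypothetical sketch approximating `cat -v` escaping of a byte stream.
object CatV {
  def render(b: Int): String =
    if (b == '\t' || b == '\n') b.toChar.toString // tab and newline are left alone
    else {
      val meta = if (b >= 0x80) "M-" else "" // high bit set: prefix with M-
      val c = b & 0x7f
      val body =
        if (c < 0x20) "^" + (c + 0x40).toChar // e.g. 0x07 (BEL) -> ^G
        else if (c == 0x7f) "^?"              // DEL
        else c.toChar.toString
      meta + body
    }

  // Filter stdin to stdout, e.g. as the last stage of a pipeline after javap.
  def main(args: Array[String]): Unit = {
    val in = new java.io.BufferedInputStream(System.in)
    val out = new java.io.PrintStream(new java.io.BufferedOutputStream(System.out))
    var b = in.read()
    while (b != -1) {
      out.print(render(b))
      b = in.read()
    }
    out.flush()
  }
}
```

Since every rendered character is printable ASCII (or a tab or newline), the terminal has nothing left to interpret.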

> ...

Randall Schulz

odersky
Joined: 2008-07-29
Re: Concerning the change to pickle format


On Fri, Mar 26, 2010 at 5:18 PM, Miguel Garcia <miguel.garcia@tuhh.de> wrote:

dubochet
Joined: 2008-06-30
Re: Concerning the change to pickle format

Hello Paul.

>> Furthermore, the output you mention happens only when javap prints the
>> constant pool (at least on my machine), which only happens when using
>> "javap -verbose".
>
> That is right, my javap alias says "lay it on me." I find it's faster to
> print everything up front and do what I want with the output than it is
> to incrementally dial up the verbosity.

Sure, I just wanted to point out that printing the whole constant pool probably isn't the most common use case. But of course, there is nothing wrong with it ;)

> I also am on OSX in the standard terminal, character encoding UTF-8.
> Maybe you didn't luck into the same control characters.

Yes, that must be it.

> I can work around it. But I'd ask that you not be too dismissive of
> this kind of issue. Such impediments are real, and it takes feedback
> from diverse corners to find out what the real impact of a change like
> this is. You can't generalize from how it works on your machine.

You are absolutely right. There is a risk in the change. If I sounded as if I were dismissing this risk, that wasn't intentional. I take it very seriously. This is why I documented it thoroughly and tried to be careful in the way it was rolled out.

Happily, Randall's solution solves your issue with javap nicely (doesn't it?). Furthermore, javap itself behaves as it should: it is the terminal that has trouble dealing with some characters.

I understand that the issue you describe can be a nuisance. But I think you will agree that it is a fringe issue that doesn't demonstrate a systemic problem. I am not saying that I am 100% sure there isn't such a problem, but for the time being, I don't think there is an example that demonstrates one.

Have a nice weekend,
Gilles.

dcsobral
Joined: 2009-04-23
Re: Concerning the change to pickle format
Actually, BASE64 carries 6 bits of information per byte, which means a 1/3 size increase over an 8-bit encoding. However, because the new pickled signatures are encoded as strings, they have to deal with UTF-8, and they do so by using a 7-bit encoding.
Therefore, using BASE64 instead of a pure 7-bit encoding would increase the size of the encoding by only 1/6, which, in my opinion, is a much smaller increase that tips the balance more heavily toward its use.
So here is the choice: a fully printable encoding at the cost of a 1/6 size increase, or an effectively binary-only encoding that doesn't even use all 8 bits?
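The arithmetic above can be checked with a short Scala sketch. It counts output characters only, ignores padding, and assumes one character per 6 or 7 bits as described; it does not model how the class file's modified UTF-8 stores each character (there, for example, a NUL character takes two bytes). The object name and the 6000-byte signature size are hypothetical.

```scala
// Hypothetical sketch: characters needed to carry nBytes of payload at bitsPerChar bits each.
object EncodingSize {
  def encodedChars(nBytes: Long, bitsPerChar: Int): Long =
    (nBytes * 8 + bitsPerChar - 1) / bitsPerChar // ceiling division, no padding

  def main(args: Array[String]): Unit = {
    val n = 6000L // hypothetical signature size in bytes
    val sevenBit = encodedChars(n, 7)
    val base64 = encodedChars(n, 6)
    println(s"8-bit: $n, 7-bit: $sevenBit, 6-bit (BASE64): $base64")
    println(f"BASE64 vs 8-bit: ${base64.toDouble / n}%.3f, BASE64 vs 7-bit: ${base64.toDouble / sevenBit}%.3f")
  }
}
```

With the assumed 6000 bytes this gives 6858 characters for 7 bits and 8000 for 6 bits, ratios of about 1.333 and 1.167, matching the 1/3 and 1/6 figures.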
On Fri, Mar 26, 2010 at 1:38 PM, martin odersky <martin.odersky@epfl.ch> wrote:


On Fri, Mar 26, 2010 at 5:18 PM, Miguel Garcia <miguel.garcia@tuhh.de> wrote:

Archontophoenix
Joined: 2010-02-04
RE: Concerning the change to pickle format
BASE95 (using all the ASCII printing characters) is, of course, even more compact than BASE64.

See, for example, http://icerealm.quickfox.org/data/FTR-old/?p=base95.
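As a rough comparison, 95 symbols carry log2(95), about 6.57 bits, per character. One simple way to use them (an assumption here, not necessarily the scheme on the linked page) is to treat a fixed block as a big number and write it in base 95: since 95^11 is about 2^72.3, 9 bytes (72 bits) fit in 11 printable characters, about 22% larger than the raw bytes, versus 33% for BASE64 and about 14% for a 7-bit packing. The object name `Base95Block` and the 9-byte block size are hypothetical.

```scala
// Hypothetical sketch: encode 9-byte blocks as 11 printable ASCII characters (base 95).
object Base95Block {
  private val Base = BigInt(95)
  private val First = 0x20 // ' ', the first of the 95 printable ASCII characters (0x20..0x7E)

  def encode(block: Array[Byte]): String = {
    require(block.length == 9, "this sketch handles exactly 9-byte blocks")
    var n = BigInt(1, block) // the block as an unsigned big-endian number < 2^72
    val chars = new Array[Char](11)
    for (i <- 10 to 0 by -1) { // least significant digit goes last
      chars(i) = (First + (n % Base).toInt).toChar
      n /= Base
    }
    new String(chars)
  }

  // Assumes s was produced by encode.
  def decode(s: String): Array[Byte] = {
    require(s.length == 11, "this sketch handles exactly 11-character blocks")
    val n = s.foldLeft(BigInt(0))((acc, c) => acc * Base + BigInt(c - First))
    val raw = n.toByteArray.takeRight(9) // drop a possible leading sign byte
    Array.fill[Byte](9 - raw.length)(0) ++ raw // restore leading zero bytes
  }
}
```

The big-number conversion is simple but slow; a production encoder would more likely work on fixed-width integers.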

A

From: dcsobral@gmail.com
Date: Fri, 26 Mar 2010 18:22:18 -0300
Subject: Re: [scala-internals] Concerning the change to pickle format
To: martin.odersky@epfl.ch
CC: miguel.garcia@tuhh.de; scala-internals@listes.epfl.ch

extempore
Joined: 2008-12-17
Re: Concerning the change to pickle format

On Fri, Mar 26, 2010 at 06:10:40PM +0100, Gilles Dubochet wrote:
> I understand that the issue you describe can be a nuisance. But I
> think you can agree with me by saying it is a fringe issue that
> doesn't demonstrate a systemic problem. I am not saying that I am 100%
> sure there isn't such a problem, but for the time being, there isn't
> an example that demonstrates one, I think.

I do not disagree that this problem by itself is overlookable. What I'm
afraid of is that with this going in so late, if there is a systemic
problem, we won't find out until after 2.8 has shipped. Certainly there
will be a lot of resistance to fixing whatever issues do arise, since
this is being brought into 2.8 specifically trying to maintain
compatibility. A hopeless errand, I should add. I want binary
compatibility across releases as much as anyone but I find it
spectacularly unrealistic to think it will happen in such a way as to
allow 2.8 bytecode to work without recompilation.

Also, I wish I'd been alerted that a big breaking change was coming to the
pickler, because this is a missed opportunity to clean up some of the
inconsistencies which (as I understand it from Lex Spoon) are a result
of trying to preserve compatibility across previous changes. Such cruft
could be dropped at no cost with this change since it definitionally
breaks everything.

Mark Harrah
Joined: 2008-12-18
Re: Concerning the change to pickle format

On Monday 29 March 2010, Paul Phillips wrote:
> On Fri, Mar 26, 2010 at 06:10:40PM +0100, Gilles Dubochet wrote:
> > I understand that the issue you describe can be a nuisance. But I
> > think you can agree with me by saying it is a fringe issue that
> > doesn't demonstrate a systemic problem. I am not saying that I am 100%
> > sure there isn't such a problem, but for the time being, there isn't
> > an example that demonstrates one, I think.
>
> I do not disagree that this problem by itself is overlookable. What I'm
> afraid of is that with this going in so late, if there is a systemic
> problem, we won't find out until after 2.8 has shipped. Certainly there
> will be a lot of resistance to fixing whatever issues do arise, since
> this is being brought into 2.8 specifically trying to maintain
> compatibility. A hopeless errand, I should add. I want binary
> compatibility across releases as much as anyone but I find it
> spectacularly unrealistic to think it will happen in such a way as to
> allow 2.8 bytecode to work without recompilation.

I agree that binary compatibility is unrealistic at this point. That is why I put so much effort into sbt to translate the need for binary compatibility into source compatibility where possible, and I will generally continue to do so.

> Also, I wish I'd been alerted that a big breaking change was coming to the
> pickler, because this is a missed opportunity to clean up some of the
> inconsistencies which (as I understand it from Lex Spoon) are a result
> of trying to preserve compatibility across previous changes. Such cruft
> could be dropped at no cost with this change since it definitionally
> breaks everything.

Binary compatibility breaks fairly often on trunk; it is just not extensive like when the pickler format changes. My opinion is that if there is some issue you want to resolve that breaks binary compatibility, you should just go ahead and do it before the upcoming beta2/RC.

-Mark

Joshua.Suereth
Joined: 2008-09-02
Re: Concerning the change to pickle format

Especially after the email on version conventions where 2.8 will break
even source compatibility.

Perhaps 3.0 should be the version where binary compatibility is
"solved".

On Mar 29, 2010, at 12:23 PM, Mark Harrah wrote:

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland