This page is no longer maintained — Please continue to the home page at www.scala-lang.org

performance: mapConserve

3 replies
extempore
Joined: 2008-12-17,
User offline. Last seen 35 weeks 3 days ago.

In trunk, the greatest producer of garbage by a landslide is List#mapConserve. I have a feeling we can make some changes which could have a dramatic impact. Here are some statistics I gathered with yourkit probes. (I'm working on generalizing the probes so one can do this sort of thing on-the-fly.)

These are the calls to mapConserve when compiling src/library. The second column is the number of calls with lists of the given length; the third column is how many of those ended up with the list mapping onto itself, and so the original list is returned and the "provisional" result is thrown in the garbage bin. I have a number of ideas on what to do differently, but I'd like to if these stats give anyone else any ideas.

length frequency identities
------ --------- ----------
415 1 1
394 4 4
388 1 1
379 1 1
375 1 1
374 1 1
372 1 1
371 6 6
369 2 2
368 1 1
367 1 1
365 1 1
361 3 3
360 1 1
359 1 1
358 2 2
357 6 6
355 2 2
353 1 1
351 1 1
344 1 1
343 1 1
339 1 1
338 1 1
336 1 1
335 1 1
333 2 2
328 1 1
323 1 1
322 3 2
320 1 1
316 1 1
314 1 1
313 1 0
311 2 2
310 2 2
308 2 2
306 2 2
305 1 1
304 1 1
303 1 1
302 1 1
300 2 2
299 1 1
298 3 3
297 1 1
295 1 0
294 1 1
293 1 1
291 1 1
289 1 1
286 1 1
284 1 1
281 1 1
279 1 1
277 2 2
276 2 2
275 2 1
270 1 1
269 1 1
267 2 2
265 2 2
263 3 3
262 1 1
261 3 3
259 1 1
258 1 1
257 1 1
256 2 2
255 1 1
252 1 1
249 16 13
247 2 2
242 1 1
240 1 1
239 1 1
238 2 2
237 2 2
236 1 1
235 2 2
234 1 1
227 1 1
223 2 0
222 5 1
221 12 9
219 1 1
214 1 1
212 1 1
206 2 0
204 6 1
203 2 0
192 2 2
190 12 9
188 1 1
187 1 1
185 7 7
184 1 1
182 2 2
179 1 1
178 1 1
177 1 1
176 1 1
174 1 1
173 2 2
172 3 3
171 2 2
170 1 1
169 2 1
166 1 1
164 1 1
163 1 1
160 1 1
157 5 1
156 1 1
155 2 2
154 1 0
153 1 1
152 9 2
151 3 1
150 3 2
149 2 2
148 2 1
147 3 3
145 8 8
140 1 1
139 8 2
138 3 3
134 1 0
132 1 1
131 1 0
130 1 1
129 4 1
125 1 0
123 1 1
122 1 1
121 1 1
119 1 0
118 3 2
116 12 8
115 9 4
114 2 2
113 2 2
110 13 9
109 86 76
108 6 6
107 5 5
106 6 6
105 10 9
104 13 13
103 9 9
102 15 11
101 2 2
100 10 2
98 9 7
97 5 2
96 5 1
95 20 4
94 8 4
93 15 7
92 1 0
91 5 1
90 3 1
89 5 1
88 8 3
87 41 32
86 21 9
85 46 10
84 34 17
83 22 8
82 4 3
81 15 3
80 6 2
79 17 8
78 20 7
77 12 2
76 17 14
75 21 10
74 9 3
73 17 7
72 15 3
71 49 11
70 43 21
69 34 20
68 9 3
67 30 9
66 22 6
65 33 12
64 57 18
63 63 18
62 31 10
61 48 18
60 12 3
59 22 5
58 28 11
57 42 16
56 24 8
55 51 15
54 41 21
53 46 12
52 53 24
51 53 22
50 46 19
49 48 24
48 26 13
47 117 54
46 77 43
45 88 60
44 64 28
43 173 69
42 111 41
41 101 49
40 101 54
39 120 49
38 201 103
37 153 75
36 117 58
35 89 59
34 137 75
33 179 110
32 69 46
31 247 139
30 202 120
29 218 128
28 203 112
27 227 133
26 177 82
25 271 139
24 353 185
23 621 344
22 881 593
21 877 598
20 1029 659
19 1038 651
18 1106 702
17 1097 678
16 1176 719
15 1498 936
14 1469 944
13 1510 948
12 3487 2035
11 2318 1475
10 5241 4184
9 5440 3351
8 5317 3537
7 9497 6652
6 15611 10398
5 31154 23846
4 54891 39904
3 164982 87574
2 440863 297020
1 2247097 1480272
0 4048597 4048597

Callers passing lists with 100+ elements:
scala.tools.nsc.ast.Trees$Transformer.transformTrees(Trees.scala:876) 329
scala.tools.nsc.ast.Trees$Transformer.transformStats(Trees.scala:892) 121
scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:2178) 23
scala.tools.nsc.typechecker.Typers$Typer.typedArrayValue$1(Typers.scala:3099) 1

adriaanm
Joined: 2010-02-08,
User offline. Last seen 31 weeks 4 days ago.
Re: performance: mapConserve
One other statistic that comes to (my not-tuned-for-profiling) mind: how long does it generally take before we find out we're not mapping the identity function?  
That could steer the obvious lazy allocation optimisation, where we don't allocate a new list until we hit proof that it wasn't the identity after all (at which point you'd take a hit, having to copy the "identity-prefix" of the old list to the newly allocated list)
cheersadriaan

On Mon, May 16, 2011 at 5:51 PM, Paul Phillips <paulp@improving.org> wrote:
In trunk, the greatest producer of garbage by a landslide is List#mapConserve.  I have a feeling we can make some changes which could have a dramatic impact.  Here are some statistics I gathered with yourkit probes.  (I'm working on generalizing the probes so one can do this sort of thing on-the-fly.)

These are the calls to mapConserve when compiling src/library.  The second column is the number of calls with lists of the given length; the third column is how many of those ended up with the list mapping onto itself, and so the original list is returned and the "provisional" result is thrown in the garbage bin.  I have a number of ideas on what to do differently, but I'd like to if these stats give anyone else any ideas.


length  frequency  identities
------  ---------  ----------
  415          1  1
  394          4  4
  388          1  1
  379          1  1
  375          1  1
  374          1  1
  372          1  1
  371          6  6
  369          2  2
  368          1  1
  367          1  1
  365          1  1
  361          3  3
  360          1  1
  359          1  1
  358          2  2
  357          6  6
  355          2  2
  353          1  1
  351          1  1
  344          1  1
  343          1  1
  339          1  1
  338          1  1
  336          1  1
  335          1  1
  333          2  2
  328          1  1
  323          1  1
  322          3  2
  320          1  1
  316          1  1
  314          1  1
  313          1  0
  311          2  2
  310          2  2
  308          2  2
  306          2  2
  305          1  1
  304          1  1
  303          1  1
  302          1  1
  300          2  2
  299          1  1
  298          3  3
  297          1  1
  295          1  0
  294          1  1
  293          1  1
  291          1  1
  289          1  1
  286          1  1
  284          1  1
  281          1  1
  279          1  1
  277          2  2
  276          2  2
  275          2  1
  270          1  1
  269          1  1
  267          2  2
  265          2  2
  263          3  3
  262          1  1
  261          3  3
  259          1  1
  258          1  1
  257          1  1
  256          2  2
  255          1  1
  252          1  1
  249         16  13
  247          2  2
  242          1  1
  240          1  1
  239          1  1
  238          2  2
  237          2  2
  236          1  1
  235          2  2
  234          1  1
  227          1  1
  223          2  0
  222          5  1
  221         12  9
  219          1  1
  214          1  1
  212          1  1
  206          2  0
  204          6  1
  203          2  0
  192          2  2
  190         12  9
  188          1  1
  187          1  1
  185          7  7
  184          1  1
  182          2  2
  179          1  1
  178          1  1
  177          1  1
  176          1  1
  174          1  1
  173          2  2
  172          3  3
  171          2  2
  170          1  1
  169          2  1
  166          1  1
  164          1  1
  163          1  1
  160          1  1
  157          5  1
  156          1  1
  155          2  2
  154          1  0
  153          1  1
  152          9  2
  151          3  1
  150          3  2
  149          2  2
  148          2  1
  147          3  3
  145          8  8
  140          1  1
  139          8  2
  138          3  3
  134          1  0
  132          1  1
  131          1  0
  130          1  1
  129          4  1
  125          1  0
  123          1  1
  122          1  1
  121          1  1
  119          1  0
  118          3  2
  116         12  8
  115          9  4
  114          2  2
  113          2  2
  110         13  9
  109         86  76
  108          6  6
  107          5  5
  106          6  6
  105         10  9
  104         13  13
  103          9  9
  102         15  11
  101          2  2
  100         10  2
   98          9  7
   97          5  2
   96          5  1
   95         20  4
   94          8  4
   93         15  7
   92          1  0
   91          5  1
   90          3  1
   89          5  1
   88          8  3
   87         41  32
   86         21  9
   85         46  10
   84         34  17
   83         22  8
   82          4  3
   81         15  3
   80          6  2
   79         17  8
   78         20  7
   77         12  2
   76         17  14
   75         21  10
   74          9  3
   73         17  7
   72         15  3
   71         49  11
   70         43  21
   69         34  20
   68          9  3
   67         30  9
   66         22  6
   65         33  12
   64         57  18
   63         63  18
   62         31  10
   61         48  18
   60         12  3
   59         22  5
   58         28  11
   57         42  16
   56         24  8
   55         51  15
   54         41  21
   53         46  12
   52         53  24
   51         53  22
   50         46  19
   49         48  24
   48         26  13
   47        117  54
   46         77  43
   45         88  60
   44         64  28
   43        173  69
   42        111  41
   41        101  49
   40        101  54
   39        120  49
   38        201  103
   37        153  75
   36        117  58
   35         89  59
   34        137  75
   33        179  110
   32         69  46
   31        247  139
   30        202  120
   29        218  128
   28        203  112
   27        227  133
   26        177  82
   25        271  139
   24        353  185
   23        621  344
   22        881  593
   21        877  598
   20       1029  659
   19       1038  651
   18       1106  702
   17       1097  678
   16       1176  719
   15       1498  936
   14       1469  944
   13       1510  948
   12       3487  2035
   11       2318  1475
   10       5241  4184
    9       5440  3351
    8       5317  3537
    7       9497  6652
    6      15611  10398
    5      31154  23846
    4      54891  39904
    3     164982  87574
    2     440863  297020
    1    2247097  1480272
    0    4048597  4048597

Callers passing lists with 100+ elements:
scala.tools.nsc.ast.Trees$Transformer.transformTrees(Trees.scala:876) 329
scala.tools.nsc.ast.Trees$Transformer.transformStats(Trees.scala:892) 121
scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:2178) 23
scala.tools.nsc.typechecker.Typers$Typer.typedArrayValue$1(Typers.scala:3099) 1


extempore
Joined: 2008-12-17,
User offline. Last seen 35 weeks 3 days ago.
Re: performance: mapConserve

On 5/16/11 9:02 AM, Adriaan Moors wrote:
> One other statistic that comes to (my not-tuned-for-profiling) mind: how
> long does it generally take before we find out we're not mapping the
> identity function?
>
> That could steer the obvious lazy allocation optimisation, where we
> don't allocate a new list until we hit proof that it wasn't the identity
> after all (at which point you'd take a hit, having to copy the
> "identity-prefix" of the old list to the newly allocated list)

Looking more closely I see it already does this. Which makes more
sense, but then how is it producing so freaking much garbage? It passes
null on the first call and keeps passing null until some element doesn't
map to itself, and only then allocates the ListBuffer.

Chris Marshall
Joined: 2009-06-17,
User offline. Last seen 44 weeks 3 days ago.
RE: performance: mapConserve
What is the metric of "garbage production"? Presumably we are talking about the # of instances created under the method which are subsequently garbage (as opposed to # of bytes)? 
Obviously much/most/all data that passes through a program is ultimately garbage, so presumably you want some measure whether the garbage is unnecessary or not. Might mapConserve just be the poor method which ends up taking the brunt of the fact that it gets called more than map in the code's hot loops? 
For example, with lists of length 1, there are ~700k invocations (in your figures) that result in a new List being created. This results in at least 7 (I think) object allocations, one of which is f(a) which is not, of course, identical to "a" but which (presumably) is unavoidable in the general scheme of things. Still, this does not seem like very much
> Date: Mon, 16 May 2011 08:51:02 -0700
> From: paulp@improving.org
> To: scala-internals@googlegroups.com
> Subject: [scala-internals] performance: mapConserve
>
> In trunk, the greatest producer of garbage by a landslide is List#mapConserve. 

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland