- About Scala
- Documentation
- Code Examples
- Software
- Scala Developers
performance: mapConserve
Mon, 2011-05-16, 16:54
In trunk, the greatest producer of garbage by a landslide is List#mapConserve. I have a feeling we can make some changes which could have a dramatic impact. Here are some statistics I gathered with yourkit probes. (I'm working on generalizing the probes so one can do this sort of thing on-the-fly.)
These are the calls to mapConserve when compiling src/library. The second column is the number of calls with lists of the given length; the third column is how many of those ended up with the list mapping onto itself, and so the original list is returned and the "provisional" result is thrown in the garbage bin. I have a number of ideas on what to do differently, but I'd like to if these stats give anyone else any ideas.
length frequency identities
------ --------- ----------
415 1 1
394 4 4
388 1 1
379 1 1
375 1 1
374 1 1
372 1 1
371 6 6
369 2 2
368 1 1
367 1 1
365 1 1
361 3 3
360 1 1
359 1 1
358 2 2
357 6 6
355 2 2
353 1 1
351 1 1
344 1 1
343 1 1
339 1 1
338 1 1
336 1 1
335 1 1
333 2 2
328 1 1
323 1 1
322 3 2
320 1 1
316 1 1
314 1 1
313 1 0
311 2 2
310 2 2
308 2 2
306 2 2
305 1 1
304 1 1
303 1 1
302 1 1
300 2 2
299 1 1
298 3 3
297 1 1
295 1 0
294 1 1
293 1 1
291 1 1
289 1 1
286 1 1
284 1 1
281 1 1
279 1 1
277 2 2
276 2 2
275 2 1
270 1 1
269 1 1
267 2 2
265 2 2
263 3 3
262 1 1
261 3 3
259 1 1
258 1 1
257 1 1
256 2 2
255 1 1
252 1 1
249 16 13
247 2 2
242 1 1
240 1 1
239 1 1
238 2 2
237 2 2
236 1 1
235 2 2
234 1 1
227 1 1
223 2 0
222 5 1
221 12 9
219 1 1
214 1 1
212 1 1
206 2 0
204 6 1
203 2 0
192 2 2
190 12 9
188 1 1
187 1 1
185 7 7
184 1 1
182 2 2
179 1 1
178 1 1
177 1 1
176 1 1
174 1 1
173 2 2
172 3 3
171 2 2
170 1 1
169 2 1
166 1 1
164 1 1
163 1 1
160 1 1
157 5 1
156 1 1
155 2 2
154 1 0
153 1 1
152 9 2
151 3 1
150 3 2
149 2 2
148 2 1
147 3 3
145 8 8
140 1 1
139 8 2
138 3 3
134 1 0
132 1 1
131 1 0
130 1 1
129 4 1
125 1 0
123 1 1
122 1 1
121 1 1
119 1 0
118 3 2
116 12 8
115 9 4
114 2 2
113 2 2
110 13 9
109 86 76
108 6 6
107 5 5
106 6 6
105 10 9
104 13 13
103 9 9
102 15 11
101 2 2
100 10 2
98 9 7
97 5 2
96 5 1
95 20 4
94 8 4
93 15 7
92 1 0
91 5 1
90 3 1
89 5 1
88 8 3
87 41 32
86 21 9
85 46 10
84 34 17
83 22 8
82 4 3
81 15 3
80 6 2
79 17 8
78 20 7
77 12 2
76 17 14
75 21 10
74 9 3
73 17 7
72 15 3
71 49 11
70 43 21
69 34 20
68 9 3
67 30 9
66 22 6
65 33 12
64 57 18
63 63 18
62 31 10
61 48 18
60 12 3
59 22 5
58 28 11
57 42 16
56 24 8
55 51 15
54 41 21
53 46 12
52 53 24
51 53 22
50 46 19
49 48 24
48 26 13
47 117 54
46 77 43
45 88 60
44 64 28
43 173 69
42 111 41
41 101 49
40 101 54
39 120 49
38 201 103
37 153 75
36 117 58
35 89 59
34 137 75
33 179 110
32 69 46
31 247 139
30 202 120
29 218 128
28 203 112
27 227 133
26 177 82
25 271 139
24 353 185
23 621 344
22 881 593
21 877 598
20 1029 659
19 1038 651
18 1106 702
17 1097 678
16 1176 719
15 1498 936
14 1469 944
13 1510 948
12 3487 2035
11 2318 1475
10 5241 4184
9 5440 3351
8 5317 3537
7 9497 6652
6 15611 10398
5 31154 23846
4 54891 39904
3 164982 87574
2 440863 297020
1 2247097 1480272
0 4048597 4048597
Callers passing lists with 100+ elements:
scala.tools.nsc.ast.Trees$Transformer.transformTrees(Trees.scala:876) 329
scala.tools.nsc.ast.Trees$Transformer.transformStats(Trees.scala:892) 121
scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:2178) 23
scala.tools.nsc.typechecker.Typers$Typer.typedArrayValue$1(Typers.scala:3099) 1
Mon, 2011-05-16, 17:17
#2
Re: performance: mapConserve
On 5/16/11 9:02 AM, Adriaan Moors wrote:
> One other statistic that comes to (my not-tuned-for-profiling) mind: how
> long does it generally take before we find out we're not mapping the
> identity function?
>
> That could steer the obvious lazy allocation optimisation, where we
> don't allocate a new list until we hit proof that it wasn't the identity
> after all (at which point you'd take a hit, having to copy the
> "identity-prefix" of the old list to the newly allocated list)
Looking more closely I see it already does this. Which makes more
sense, but then how is it producing so freaking much garbage? It passes
null on the first call and keeps passing null until some element doesn't
map to itself, and only then allocates the ListBuffer.
Mon, 2011-05-16, 19:27
#3
RE: performance: mapConserve
What is the metric of "garbage production"? Presumably we are talking about the # of instances created under the method which are subsequently garbage (as opposed to # of bytes)?
Obviously much/most/all data that passes through a program is ultimately garbage, so presumably you want some measure whether the garbage is unnecessary or not. Might mapConserve just be the poor method which ends up taking the brunt of the fact that it gets called more than map in the code's hot loops?
For example, with lists of length 1, there are ~700k invocations (in your figures) that result in a new List being created. This results in at least 7 (I think) object allocations, one of which is f(a) which is not, of course, identical to "a" but which (presumably) is unavoidable in the general scheme of things. Still, this does not seem like very much
> Date: Mon, 16 May 2011 08:51:02 -0700
> From: paulp@improving.org
> To: scala-internals@googlegroups.com
> Subject: [scala-internals] performance: mapConserve
>
> In trunk, the greatest producer of garbage by a landslide is List#mapConserve.
Obviously much/most/all data that passes through a program is ultimately garbage, so presumably you want some measure whether the garbage is unnecessary or not. Might mapConserve just be the poor method which ends up taking the brunt of the fact that it gets called more than map in the code's hot loops?
For example, with lists of length 1, there are ~700k invocations (in your figures) that result in a new List being created. This results in at least 7 (I think) object allocations, one of which is f(a) which is not, of course, identical to "a" but which (presumably) is unavoidable in the general scheme of things. Still, this does not seem like very much
> Date: Mon, 16 May 2011 08:51:02 -0700
> From: paulp@improving.org
> To: scala-internals@googlegroups.com
> Subject: [scala-internals] performance: mapConserve
>
> In trunk, the greatest producer of garbage by a landslide is List#mapConserve.
That could steer the obvious lazy allocation optimisation, where we don't allocate a new list until we hit proof that it wasn't the identity after all (at which point you'd take a hit, having to copy the "identity-prefix" of the old list to the newly allocated list)
cheersadriaan
On Mon, May 16, 2011 at 5:51 PM, Paul Phillips <paulp@improving.org> wrote: