forked from jmcmellen/splat
performance.txt
1.0 Changes
The following changes were made to itwom3.0.c:
* removing static variables
* replacing the C++ complex-number library with a C-only version
* converting from C++ pass-by-reference to pass-by-pointer
Splat 1.4.2 was compiled with g++.
The new version was compiled with clang/clang++, except where noted.
2.0 Calculation times for PlotLRMap()
point_to_point() (ITWOM algorithm)
-------
Compute times (in seconds), including generating the ppm image:
                                           Splat     Splat MT       Splat MT
                          Cores  Threads   v1.4.2    Single thread  Multithread
Raspberry Pi, Model A       1       1        893        929            946
Atom D510                   2       4        248        255             87
Avoton C2550                4       4         94        108             30
Core i7-2700K               4       8         32         32              7
2011 iMac Core i5-2400      4       4         28         30             11
Core i3-8100 w/SSD          4       4         24         24              6

                                           Splat-HD  Splat-HD MT    Splat-HD MT
                          Cores  Threads   v1.4.2    Single thread  Multithread
Core i3-8100                4       4        n/a        375            102
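
As a reading aid, the multithread speedup and a rough per-thread efficiency can be
derived from the table above. This is a minimal sketch: the numbers are copied from the
Splat MT columns, while the helper names speedup() and efficiency() are made up for
illustration and are not part of Splat.

```python
# Timings in seconds, copied from the Splat MT columns of the ITWOM table:
# machine -> (single thread, multithread, hardware threads)
timings = {
    "Raspberry Pi, Model A": (929, 946, 1),
    "Atom D510": (255, 87, 4),
    "Avoton C2550": (108, 30, 4),
    "Core i7-2700K": (32, 7, 8),
    "2011 iMac Core i5-2400": (30, 11, 4),
    "Core i3-8100 w/SSD": (24, 6, 4),
}


def speedup(single_s, multi_s):
    """Ratio of single-thread time to multithread time (hypothetical helper)."""
    return single_s / multi_s


def efficiency(single_s, multi_s, threads):
    """Speedup divided by the number of hardware threads (hypothetical helper)."""
    return speedup(single_s, multi_s) / threads


for name, (single_s, multi_s, threads) in timings.items():
    s = speedup(single_s, multi_s)
    e = efficiency(single_s, multi_s, threads)
    print(f"{name:24s} speedup {s:5.2f}x  efficiency {e:4.0%}")
```

On the single-core Raspberry Pi the ratio comes out slightly below 1, which matches the
table: the multithreaded run is a little slower there than the single-threaded one.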
dB loss:
With the changes, all dB loss values computed by the new code are within 0.1 dB
of those from the 1.4.2 version's implementation.
point_to_point_ITM() (L-R algorithm)
-------
Compute times (in seconds), including generating the ppm image:
                                           Splat     Splat MT       Splat MT
                          Cores  Threads   v1.4.2    Single thread  Multithread
Raspberry Pi, Model A       1       1        676        634            645
Atom D510                   2       4        180        209             66
Avoton C2550                4       4         70         84             23
Core i7-2700K               4       8         23         23              6
2011 iMac Core i5-2400      4       4         22         21              9
Core i3-8100 w/SSD          4       4         18         17              4

                                           Splat-HD  Splat-HD MT    Splat-HD MT
                          Cores  Threads   v1.4.2    Single thread  Multithread
Core i3-8100 w/SSD          4       4        n/a        189             56
dB loss:
With the modifications, most dB loss values differ from the 1.4.2 results by less than 0.1 dB,
with a few exceptions. Most of those exceptions differ by less than 0.5 dB, and all by less than 2 dB.
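
A comparison of this kind can be sketched as follows. This is an illustration only, not
the script actually used: it assumes the dB loss values from the two versions are
available as two equal-length lists in the same point order, and the function name
summarize_differences() and the sample values are hypothetical.

```python
def summarize_differences(old_db, new_db, thresholds=(0.1, 0.5, 2.0)):
    """Count how many points differ by less than each threshold (in dB).

    old_db and new_db are assumed to hold the loss values for the same
    points, in the same order, from the 1.4.2 build and the new build.
    """
    if len(old_db) != len(new_db):
        raise ValueError("both runs must cover the same points")
    diffs = [abs(a - b) for a, b in zip(old_db, new_db)]
    summary = {t: sum(d < t for d in diffs) for t in thresholds}
    summary["max"] = max(diffs, default=0.0)
    return summary


# Made-up sample values, only to show the shape of the output.
old = [100.0, 120.0, 90.0, 150.0]
new = [100.05, 120.3, 90.0, 151.5]
print(summarize_differences(old, new))
```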
Compiler flags
-------
The compiler and fast-math options affect both the run time and the results. For instance,
PlotLRMap() generates over 1.5M points. Using the new code with the ITM (L-R) algorithm in
single-threaded mode on the 2011 iMac, we have:

                          Time     Results differing from 1.4.2
gcc/g++, regular          21 sec   123
gcc/g++, fastmath         18 sec    12
clang/clang++ regular     20 sec   134
clang/clang++ fastmath    17 sec   354
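
One plausible contributor to these differences (an assumption, not something measured
here) is that fast-math modes allow the compiler to reorder floating-point operations,
and floating-point addition is not associative. The sketch below only demonstrates that
general effect; it does not reproduce anything from the Splat code.

```python
# Floating-point addition is not associative: regrouping the same three
# operands changes the last bit of the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # evaluation order as written
right = a + (b + c)  # an order a reordering compiler might choose

print(left == right)
print(abs(left - right))
```

Differences this small normally vanish when results are rounded, but they can flip a
value that sits right at a rounding or threshold boundary, which would show up as a
"different result" in a point-by-point comparison.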
Choice of output image
--------
In all cases, the pixmap (ppm) images are the fastest to generate and write out, but they
consume an order of magnitude more disk space than the other formats.

jpg images are the second fastest to generate and write out, and have the smallest file size,
about 1/20th of the ppm. On spinning disks in particular, the much smaller file size offsets
the extra compression time, so the total time more or less matches the ppm version.
On SSDs they are very slightly slower to generate than the ppms. Unfortunately, even on the
highest quality setting, jpg compression leaves artifacts around any text printed in the
image. jpg is probably still a good choice for inclusion in kml files, where the text can be
handled outside the image.

png images are a good compromise. The generation time is somewhat longer than for the other
two formats but still relatively small, and since png is lossless the text comes out crisp.
The file sizes are about twice the jpg size, or 1/10th the ppm size.
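
To put rough numbers on these ratios, the sketch below estimates file sizes for a given
image size. It assumes a binary (P6) ppm with 8 bits per channel, and it simply applies
the approximate 1/10 and 1/20 ratios quoted above; the function names and the
1200x1200 example size are hypothetical.

```python
def ppm_size_bytes(width, height):
    """Size of a binary P6 ppm: a short text header plus 3 bytes per pixel."""
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return len(header) + 3 * width * height


def estimated_sizes(width, height):
    """Apply the approximate ratios from the text (png ~1/10, jpg ~1/20)."""
    ppm = ppm_size_bytes(width, height)
    return {"ppm": ppm, "png": ppm // 10, "jpg": ppm // 20}


# Hypothetical 1200x1200 image.
for fmt, size in estimated_sizes(1200, 1200).items():
    print(f"{fmt}: about {size / 1e6:.2f} MB")
```

Actual png and jpg sizes depend on the image content and quality settings, so these are
ballpark figures only.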
3.0 Per-function call times for PlotLRMap() from gprof
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 38.65      6.11     6.11  3127012     0.00     0.00  qtile
 18.15      8.98     2.87  1563537     0.00     0.01  d1thx2
 16.13     11.53     2.55  1563537     0.00     0.00  hzns2
  6.45     12.55     1.02  3126841     0.00     0.00  z1sq2
  4.11     13.20     0.65  1563537     0.00     0.01  point_to_point
  2.72     13.63     0.43  6250166     0.00     0.00  saalos
  2.56     14.04     0.41 20648111     0.00     0.00  GetElevation(Site)
  2.34     14.41     0.37    12000     0.03     1.29  PlotLRPath(Site, Site, unsigned char, _IO_FILE*)
  2.15     14.75     0.34    12016     0.03     0.07  ReadPath(Site&, Site&, Path*)
  1.58     15.00     0.25                             WritePPMSS(char*, unsigned char, unsigned char, unsigned char, Site*, unsigned char)
  0.76     15.12     0.12 10096669     0.00     0.00  OrMask(double, double, int)
  0.70     15.23     0.11  1563537     0.00     0.00  lrprop2
  0.57     15.32     0.09  1563537     0.00     0.00  avar
  0.51     15.40     0.08 20635918     0.00     0.00  arccos(double, double)
  0.44     15.47     0.07  3108686     0.00     0.00  adiff2_init
  0.38     15.53     0.06  4672223     0.00     0.00  adiff2
  0.32     15.58     0.05  3127074     0.00     0.00  qerfi
  0.32     15.63     0.05  1563537     0.00     0.01  qlrpfl2
  0.28     15.67     0.05  1563537     0.00     0.00  GetSignal(double, double)
  0.25     15.71     0.04  1581925     0.00     0.00  alos2
  0.25     15.75     0.04  1563537     0.00     0.00  PutSignal(double, double, unsigned char)
  0.13     15.77     0.02  1563537     0.00     0.00  PutMask(double, double, int)
  0.13     15.79     0.02        7     2.86     2.86  LoadSDF_SDF(char*)
  0.06     15.80     0.01  9159139     0.00     0.00  aknfe
  0.06     15.81     0.01  1575553     0.00     0.00  Azimuth(Site, Site)
  0.00     15.81     0.00  5633434     0.00     0.00  fht
  0.00     15.81     0.00  1563537     0.00     0.00  qlrps
  0.00     15.81     0.00  1563537     0.00     0.00  tcsqrt
  0.00     15.81     0.00    12016     0.00     0.00  Distance(Site, Site)
  0.00     15.81     0.00       16     0.00     0.07  AverageTerrain(Site, double, double, double)
  0.00     15.81     0.00        7     0.00     2.86  LoadSDF(char*)
  0.00     15.81     0.00        6     0.00     0.00  ReduceAngle(double)
  0.00     15.81     0.00        2     0.00     0.00  ReadBearing(char*)
  0.00     15.81     0.00        2     0.00     0.00  dec2dms(double)
  0.00     15.81     0.00        1     0.00     0.00  LoadSDF_BZ(char*)
  0.00     15.81     0.00        1     0.00     0.00  LoadSignalColors(Site)
  0.00     15.81     0.00        1     0.00     0.55  haat(Site)
  0.00     15.81     0.00        1     0.00     0.00  LoadPAT(char*)

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone. This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this
           function is profiled, else blank.

name       the name of the function. This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.

Copyright (C) 2012-2018 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
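
The "% time" column is each function's self seconds divided by the total profiled time
(15.81 seconds, the last cumulative value). The sketch below recomputes the share of
the three most expensive functions and the number of qtile calls per point_to_point
call, using only figures from the listing above; the helper name percent_of_total() is
made up for illustration.

```python
TOTAL_SECONDS = 15.81  # last value in the "cumulative seconds" column

# Self seconds and call counts copied from the gprof listing.
self_seconds = {"qtile": 6.11, "d1thx2": 2.87, "hzns2": 2.55}
qtile_calls = 3127012
point_to_point_calls = 1563537


def percent_of_total(seconds, total=TOTAL_SECONDS):
    """Share of the profiled run time, in percent (hypothetical helper)."""
    return 100.0 * seconds / total


for name, seconds in self_seconds.items():
    print(f"{name:8s} {percent_of_total(seconds):6.2f}%")

top3 = percent_of_total(sum(self_seconds.values()))
print(f"top 3 combined: {top3:.2f}%")

calls_per_point = qtile_calls / point_to_point_calls
print(f"qtile calls per point_to_point call: {calls_per_point:.2f}")
```

The ratio of about two qtile calls per point is consistent with qtile being called
twice per d1thx2 call, once for each quantile.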
Analysis:
Clearly some work needs to be done on qtile(), which accounts for 38.65% of the
profiled time and is called from both d1thx() and d1thx2().
In addition, hzns2() (16.13% of the profiled time) should be analyzed.
qlrpfl()
|-> hzns() generate horizon
|-> d1thx()
| |-> z1sq1() least-squares fit
| |-> qtile() 90% quantile
| \-> qtile() 10% quantile (on same array)
|
\-> z1sq() least-squares fit (2x longer distances)
qlrpfl2()
|-> hzns2() generate horizon
|-> d1thx2()
| |-> z1sq2() least-squares fit
| |-> qtile() 90% quantile
| \-> qtile() 10% quantile (on same array)
|
\-> z1sq2() least-squares fit (2x for transhorizon)
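
Since both quantiles are taken from the same array, one simple reference approach is to
sort once and read both values from the sorted copy. The sketch below is only meant as a
baseline for checking an optimized qtile(); it is not the ITM/ITWOM implementation, the
index convention is an assumption that may not match qtile()'s, and whether it is any
faster than two selection passes would have to be measured. The name decile_pair() is
hypothetical.

```python
def decile_pair(values):
    """Return the (10%, 90%) order statistics from a single sort.

    Index convention (an assumption for this sketch): positions
    floor(0.1 * (n - 1)) and floor(0.9 * (n - 1)) in ascending order.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)          # one sort serves both quantiles
    n = len(ordered)
    lo_idx = (n - 1) // 10            # 10% position, integer arithmetic
    hi_idx = (9 * (n - 1)) // 10      # 90% position, integer arithmetic
    return ordered[lo_idx], ordered[hi_idx]


# Made-up sample data: the integers 0..10 in scrambled order.
sample = [5, 3, 9, 0, 1, 10, 7, 2, 8, 4, 6]
low, high = decile_pair(sample)
print(low, high, high - low)
```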
hoche, 1/8/19