DataStructuresTextbook/analysis.tex at main · abrosen/DataStructuresTextbook · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
\chapter{Analyzing Algorithms}
\label{chap:analysis}
Or we would look at the list, but we need to talk about Math.  Sorry for the bait-and-switch, but it will make sense shortly.

You don't need much math to be a good programmer, but if you want to be an amazing programmer, you probably need math or very math adjacent skills.


\section{Cost}
Every function, operation, algorithm, or what have you that a computer performs has a \emph{cost}.
In fact, there are always multiples costs;  we often just focus on the most important one or two costs.

What is most important depends on context.
However, in the vast majority of cases, the most important cost to focus on is \textbf{time}.
When our program is eating away at our storage resources like a hungry child slurping up spaghetti, we can always go out and buy more memory/storage/RAM.
If our program requires a large amount of energy consumption, energy is readily available from a variety of sources: batteries, power plugs, internal combustion engines, the giant fusion reactor in the sky to name a few.
When we measure cost, we need to do abstractly.  Since all computers have different configurations of hardware and software, we look at the number of operations that will be executed, not the overall elapsed time.

\subsection{Time}
A time cost is a measure of not just how long it takes a program to finish executing, bit also how the length of execution is affected by adding additional item.

Time is almost always \emph{the most important cost}. We cannot got out and buy another weeks worth of time.  You can't hand a bunch of money to the reaper and ask for a deferral. You can't buy another minute to spend with your mother\footnote{Call your mother.  She would love to hear from you.}.
Yes, processors get faster as technology marches on, but they get faster slowly and Moore's law ostensibly has its limits.
The only way to make our programs realistically run faster is to make them more efficient.  And \textbf{Big O notation} is the way we will be measuring that efficiency.


\subsection{Space}

For data structures, we will be measuring their space efficiency in terms of \textit{auxiliary} cost, in other words, how much extra space we need to use over the space used for the items itself.   To clarify, any data structure that contains $ n $ items of roughly the same size will use $ n \times \mathtt{sizeOf(item)}$ space at minimum, no matter what data structure we use.

\subsection{Energy}
%\subsection{Other costs - Bandwidth}
While not something this book concerns itself with, some programmers need to be wary of the amount of energy  an algorithm consumes.  If energy is expensive and/or battery life needs to be conserved, then choosing an energy efficient algorithm might be a better choice, even if the time or space complexity is higher.  Some examples where energy use is a large concern  is Mobile Ad-Hoc Networks (MANETs) and battery powered cameras.


\section{What is Algorithm Analysis}
% Taken from
% For Java https://runestone.academy/ns/books/published/javads/algorithm-analysis_what-is-algorithm-analysis.html?mode=browsing
% For Python https://runestone.academy/ns/books/published/pythonds3/AlgorithmAnalysis/WhatIsAlgorithmAnalysis.html?mode=browsing

It is very common for beginning computer science students to compare their programs with one another.
You may also have noticed that it is common for computer programs to look very similar, especially the simple ones. An interesting question often arises.
When two programs solve the same problem but look different, is one program better than the other?

In order to answer this question, we need to remember that there is an important difference between a program and the underlying algorithm that the program is representing.
As you have learned, \textbf{an algorithm is a generic, step-by-step list of instructions for solving a problem}.
It is a method for solving any instance of the problem so that given a particular input, the algorithm produces the desired result. A program, on the other hand, is an algorithm that has been encoded into some programming language.
There may be many programs for the same algorithm, depending on the programmer and the programming language being used.

To explore this difference further, consider the functions shown in Listing \ref{code:sumOfNJava1} and \ref{code:sumOfNPython1}. This function solves a familiar problem, computing the sum of the first $n$ integers. The algorithm uses the idea of an accumulator variable that is initialized to 0. The solution then iterates through the $n$ integers, adding each to the accumulator.  You may have also heard this referred to as a ``running sum.''

\begin{javacode}[label={code:sumOfNJava1}]{Sum Of N}
public class FindSum {
	public static long sumOfN(int n) {
		long theSum = 0;
		for (int i = 1; i <= n; i++) {
			theSum = theSum + i;
		}
		return theSum;
	}
}
\end{javacode}


\begin{pycode}[label={code:sumOfNPython1}]{Sum of N}
def sumOfN(n):
	theSum = 0
	for i in range(1, n + 1):
		theSum = theSum + i
	return theSum
\end{pycode}

Now look at the functions in Listing \ref{code:sumOfNJava2} and \ref{code:sumOfNPython2}. At first glance it may look strange, but upon further inspection you can see that this function is essentially doing the same thing as the previous one. The reason this is not obvious is poor coding. We did not use good identifier names to assist with readability, and we used an extra assignment statement that was not really necessary during the accumulation step.

\begin{javacode}[label={code:sumOfNJava2}]{Sum of N 2 - Obsfucated}
public class FindSum2 {
	public static long foo(int tom) {
		long fred = 0;
		for (int nancy = 1; nancy <= tom; nancy++) {
			long joanne = nancy;
			fred = fred + joanne;
		}
		return fred;
	}

	public static void main(String[] args) {
		System.out.println(foo(10));
	}
}
\end{javacode}

\begin{pycode}[label={code:sumOfNPython2}]{Sum of N 2 - Obsfucated}
def foo(tom):
	fred = 0
	for bill in range(1, tom + 1):
		barney = bill
		fred = fred + barney

	return fred

print(foo(10))
\end{pycode}

%This block from the Java book
The question we raised earlier asked whether one method is better than another. The answer depends on your criteria. The function \texttt{sumOfN} is certainly better than the function \texttt{foo} if you are concerned with readability.
In fact, you have probably seen many examples of this in your introductory programming course since one of the goals there is to help you write programs that are easy to read and easy to understand.
In this course, however, we are also interested in characterizing the algorithm itself. (We certainly hope that you will continue to strive to write readable, understandable code.)

Algorithm analysis is concerned with comparing algorithms based upon the amount of computing resources that each algorithm uses.
We want to be able to consider two algorithms and say that one is better than the other because it is more efficient in its use of those resources or perhaps because it simply uses fewer. From this perspective, the two methods above seem very similar. They both use essentially the same algorithm to solve the summation problem.

At this point, it is important to think more about what we really mean by computing resources. There are two different ways to look at this. One way is to consider the amount of space or memory an algorithm requires to solve the problem. The amount of space required by a problem solution is typically dictated by the problem instance itself. Every so often, however, there are algorithms that have very specific space requirements, and in those cases we will be very careful to explain the variations.

As an alternative to space requirements, we can analyze and compare algorithms based on the amount of time they require to execute. This measure is sometimes referred to as the execution time or running time of the algorithm. One way we can measure the execution time for the function sumOfN is to do a benchmark analysis. This means that we will track the actual time required for the program to compute its result.
In most languages, we can benchmark a function by noting the starting time and ending time of the program or function we are executing.
For Java, the \texttt{System} module contains a method called \texttt{nanoTime} that will return the amount of time that the Java virtual machine has been running, in nanoseconds.
For Python, the \texttt{time} module has a \texttt{time} function\footnote{I love Python, but sometimes the repetitive naming is vexing.} which returns the number of seconds since January 1, 00:00:00 UTC.\footnote{This is commonly  referred to as Unix time and is extremely common.}

By using either of these tools to track the start and finish times and then computing the difference, we can get the number of seconds (fractions in most cases) for execution.

Listing \ref{code:timingSumJava} and \ref{code:timingSumPython} lets you enter the number you want to sum up to $n$. It then invokes the sumOfN method 25 times and calculates the time required for each trial:

\begin{javacode}[label={code:timingSumJava}]{Timing Sum of N in Java}
import java.util.Scanner;

public class Timing {
	public static long sumOfN(long n) {
		long theSum = 0;
		for (int i = 1; i <= n; i++) {
			theSum = theSum + i;
		}
		return theSum;
	}

	public static void main(String[] args) {

		Scanner input = new Scanner(System.in);
		System.out.print("Find sum from 1 to n: ");
		long n = input.nextInt();

		for (int trial = 0; trial < 25; trial++) {
			long startTime = System.nanoTime();
			long result = sumOfN(n);

			double elapsed = (System.nanoTime() - startTime) / 1.0E9;
			System.out.printf("Trial %d: Sum %d: time %.6f sec.%n",
			trial, result, elapsed);
		}
	}
}
\end{javacode}


\begin{pycode}[label={code:timingSumPython}]{Timing Sum of N in Python}
import time

def sumOfN(n):
	start = time.time()

	the_sum = 0
	for i in range(1, n + 1):
		the_sum = the_sum + i

	end = time.time()

return the_sum, end - start

\end{pycode}

Listing \ref{code:timingSumJava} shows the original sumOfN function with the timing calls embedded before and after the summation. The function returns a tuple consisting of the result and the amount of time (in seconds) required for the calculation. Here is start and end of the output when we enter 10000 for the limit of the sum using the Java code:

\begin{verbatim}
Trial 0: Sum 50005000: time 0.003886 sec.
Trial 1: Sum 50005000: time 0.004009 sec.
Trial 2: Sum 50005000: time 0.000186 sec.
Trial 3: Sum 50005000: time 0.000185 sec.
...
Trial 20: Sum 50005000: time 0.000125 sec.
Trial 21: Sum 50005000: time 0.000124 sec.
Trial 22: Sum 50005000: time 0.000125 sec.
Trial 23: Sum 50005000: time 0.000124 sec.
Trial 24: Sum 50005000: time 0.000124 sec.
\end{verbatim}


Why does the time go down from .003886 seconds to .000124? Because the Java Virtual machine and the computer hardware itself cache results, keeping them in memory for future access. We run the trial loop 25 times in order to give the cache time to stabilize.\footnote{The Python results are similar, minus the caching.}

We discover that the time is fairly consistent and it takes on average about 0.000125 seconds to execute that code. What if we run the function adding the first 100,000 integers? (Showing only the final five trials)

\begin{verbatim}
Trial 20: Sum 5000050000: time 0.001225 sec.
Trial 21: Sum 5000050000: time 0.001226 sec.
Trial 22: Sum 5000050000: time 0.001225 sec.
Trial 23: Sum 5000050000: time 0.001224 sec.
Trial 24: Sum 5000050000: time 0.001224 sec.
\end{verbatim}


Again, the time required for each run, although longer, is very consistent, averaging about 10 times more seconds. For $n = 1,000,000$  we get:

\begin{verbatim}
Trial 20: Sum 500000500000: time 0.012350 sec.
Trial 21: Sum 500000500000: time 0.012411 sec.
Trial 22: Sum 500000500000: time 0.012353 sec.
Trial 23: Sum 500000500000: time 0.012443 sec.
Trial 24: Sum 500000500000: time 0.012447 sec.
\end{verbatim}

In this case, the average again turns out to be about 10 times the previous experiment.

Now consider Listings \ref{code:sumOfNImprovedJava} and \ref{code:sumOfNImprovedPython}, which shows a different means of solving the summation problem. This method, sumOfNImproved, takes advantage of a closed equation $\sum_{i=1}^{n} i = 1+2+3+\dots+(n-1)+n =  \frac {(n)(n+1)}{2}$ to compute the sum of the first $n$ integers without iterating\footnote{This sequence of numbers is the \textbf{Triangular Number Series} and shows up a lot in analysis.}.

\begin{javacode}[label={code:sumOfNImprovedJava}]{An Improvement To Constant Time}
public static long sumOfNImproved(long n) {
	long theSum = n * (n + 1) / 2;
	return theSum;
}
\end{javacode}


\begin{pycode}[label={code:sumOfNImprovedPython}]{An Improvement To Constant Time}
def sumOfNImproved(n):
	return (n * (n + 1)) / 2
print(sumOfNImproved(10))
\end{pycode}

If we do the same benchmark measurement with this revised code, using five different values for n (10,000, 100,000, 1,000,000, 10,000,000, and 100,000,000), we get the following results from averaging the last five trials:
\begin{verbatim}
Sum 50005000:       time 0.0000088 sec.
Sum 5000050000:     time 0.0000092 sec.
Sum 500000500000:   time 0.0000082 sec.
Sum 50000005000000: time 0.0000078 sec.
\end{verbatim}

There are two important things to notice about this output. First, the times recorded above are shorter than any of the previous examples. Second, they are very consistent no matter what the value of n. It appears that \texttt{sumOfNImproved} is hardly impacted by the number of integers being added.


But what does this benchmark really tell us? Intuitively, we can see that the iterative solutions seem to be doing more work since some program steps are being repeated. This is likely the reason it is taking longer. Also, the time required for the iterative solution seems to increase as we increase the value of n. However, if we ran the same function on a different computer or used a different method language, we would likely get different results. It could take even longer to perform \texttt{sumOfNImproved} if the computer were older.

We need a better way to characterize these algorithms with respect to execution time. The benchmark technique computes the actual time to execute. It does not really provide us with a useful measurement because it is dependent on a particular machine, program, time of day, compiler, and programming language. Instead, we would like to have a characterization that is independent of the program or computer being used. This measure would then be useful for judging the algorithm alone and could be used to compare algorithms across implementations.


\section{Big O Notation}
% From https://runestone.academy/ns/books/published/javads/algorithm-analysis_big-o-notation.html?mode=browsing
When trying to characterize an algorithm’s efficiency in terms of execution time, independent of any particular program or computer, it is important to quantify the number of operations or steps that the algorithm will require. If each of these steps is considered to be a basic unit of computation, then the execution time for an algorithm can be expressed as the number of steps required to solve the problem. Deciding on an appropriate basic unit of computation can be a complicated problem and will depend on how the algorithm is implemented.

A good basic unit of computation for comparing the summation algorithms shown earlier might be the number of assignment statements performed to compute the sum. In the function sumOfN, the number of assignment statements is 1 (
) plus the value of n (the number of times we perform ). We can denote this by a function, call it , where . The parameter n is often referred to as the ``size of the problem,'' and we can read this as “ is the time it takes to solve a problem of size , namely
steps.”

In the summation functions given above, it makes sense to use the number of terms in the summation to denote the size of the problem. We can then say that the sum of the first 100,000 integers is a bigger instance of the summation problem than the sum of the first 1,000. Because of this, it might seem reasonable that the time required to solve the larger case would be greater than for the smaller case. Our goal then is to show how the algorithm’s execution time changes with respect to the size of the problem.

Computer scientists prefer to take this analysis technique one step further. It turns out that the exact number of operations is not as important as determining the most dominant part of the
function. In other words, as the problem gets larger, some portion of the function tends to overpower the rest. This dominant term is what, in the end, is used for comparison. The order of magnitude function describes the part of that increases the fastest as the value of n increases. Order of magnitude is often called Big O notation (O stands for order) and written as . It provides a useful approximation of the actual number of steps in the computation. The function provides a simple representation of the dominant part of the original .

In the above example, . As n gets larger, the constant 1 will become less and less significant to the final result. If we are looking for an approximation for , then we can drop the 1 and simply say that the running time is . It is important to note that the 1 is certainly significant for .
However, as n gets large, our approximation will be just as accurate without it.

As another example, suppose that for some algorithm, the exact number of steps is .
When n is small, say 1 or 2, the constant 1005 seems to be the dominant part of the function. However, as n gets larger, the term becomes the most important. In fact, when n is really large, the other two terms become insignificant in the role that they play in determining the final result. Again, to approximate as n gets large, we can ignore the other terms and focus on . In addition, the coefficient becomes insignificant as n gets large. We would say then that the function has an order of magnitude , or simply that it is .

Although we do not see this in the summation example, sometimes the performance of an algorithm depends on the exact values of the data rather than simply the size of the problem. For these kinds of algorithms we need to characterize their performance in terms of best-case, worst-case, or average-case performance. The worst-case performance refers to a particular data set where the algorithm performs especially poorly, whereas a different data set for the exact same algorithm might have extraordinarily good (best-case) performance. However, in most cases the algorithm performs somewhere in between these two extremes (average-case performance). It is important for a computer scientist to understand these distinctions so they are not misled by one particular case.


A number of very common order of magnitude functions will come up over and over as you study algorithms. These are shown in Table 2.3.1. In order to decide which of these functions is the dominant part of any
function, we must see how they compare with one another as n gets large.

TABLE GOES Here


Figure 2.3.2 shows graphs of the common functions from Table 2.3.1. Notice that when n is small, the functions are not very well defined with respect to one another. It is hard to tell which is dominant. However, as n grows, there is a definite relationship and it is easy to see how they compare with one another.

PLOT OF COMMON FUNCTIONS GOES HERE

As a final example, suppose that we have the fragment of Java code shown in Listing 2.3.3. Although this program does not really do anything, it is instructive to see how we can take actual code and analyze performance.
\begin{minted}[escapeinside=!!]{Java}
int a = 5;
int b = 6;
int c = 10;
int n = 1000;
for (int i = 0; i < n; i++) {
	for (int j = 0; j < n; j++) {
		int x = i * i;
		int y = j * j;
		int z = i * j;
	}
}

for (int k = 0; k < n; k++) {
	int w = a * k + 45;
	int v = b * b;
}
int d = 33;
\end{minted}
Listing 2.3.3.


The number of assignment operations is the sum of four terms. The first term is the constant 4, representing the four assignment statements at the start of the fragment. The second term is ,
since there are three statements that are performed times due to the nested iteration. The third term is , two statements iterated n times. Finally, the fourth term is the constant 1, representing the final assignment statement. This gives us . By looking at the exponents, we can see that the term will be dominant and therefore this fragment of code is . Note that all of the other terms as well as the coefficient on the dominant term can be ignored as n grows larger.


GRAPH COMPARING T(N)

Figure 2.3.4 shows a few of the common Big O functions as they compare with the function discussed above. Note that is initially larger than the cubic function. However, as n grows, the cubic function quickly overtakes . We can also see that then follows the quadratic function as continues to grow.

\begin{itemize}
	\item What is big O

	\item  how to read it
	\item Aside about big omega and theta
	\item How wrong usage annoys mathematician
	\item refers to cost in general, but used for time usually
	\item  space complexity
	\item Common runtimes
	\item runtimes we''ll focus on now
	\item runtimes we focus on later
\end{itemize}

\subsubsection{Annoying Your Friends in Math}

Programmers are often loose with their language with Big O notation and often refer to runtime using Big O notation, imposing an upper bound on runtime.  They do this even when they ar

Another common mistake about Big O is to assume Big O means worst case scenario and Big Omega is best case.  This is not so.  Big O and the rest are \emph{tools} to describe best, worst, or average case.

\subsection{Common Runtimes in this book}
While we've introduced many different runtimes, not all occur at the same level of frequency.


\subsection{Space Complexity}


\section{Examples with Arrays}


\begin{itemize}

	\item Retrieval  - refer back to earlier chapter for address lookup - SHow how that is constant time.
	\item Replacement
	\item Linear Search
	\item Binary Search
\end{itemize}


\subsection{Selection Sort}


\subsection{Bubble Sort}
\subsection{Insertion Sort}
\subsection{Other Sorting Algorithms}


\section{The Formal Mathematics of Big O Notation}
\section{Other Notations}


\section{When To Ignore Costs}