-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy patharraylists.tex
More file actions
856 lines (589 loc) · 33.7 KB
/
arraylists.tex
File metadata and controls
856 lines (589 loc) · 33.7 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
\chapter{Array Lists}
\label{chap:arraylist}
%TODO Find the Ur List and Ur dynamic array.
%TODO Consider moving to a different chapter.
% I actually hate everything up until List Operations
%TODO Replace everything below until List Operations the script I used from my video
The first data structure we will be studying is the list.
The list is by far the most relatable data structure, as humans deal with lists on a regular basis.
\section{What is a List?}
When you get right down to it, lists are defined by order.
We don't have to take advantage of this order, but its there.
Populated lists have a first item and they have a last item.
Let's start by looking at two non-computer examples of lists.
Take a look at this quest below from a hypothetical fantasy game:
\subsubsection*{Quest: Slay the Dragon of Doom}
\begin{itemize}
\item Get Sword of Dragonslaying
\item Locate the map to Dragon Lair of Doom
\item Travel to the Dragon Lair of Doom
\item Slay the Dragon of Doom
\item Return to the Castle
\end{itemize}
Here, the order is implied by the contents of the list - you can't beat the dragon without the macguffin and you certainly can't fight it without being able to find it.
Generally speaking, going up against a dragon without any preparation is foolhardy in the extreme, but I digress.
Thus, you must get the special sword\footnote{What if its possible to get the map before the sword? We'll see much later this kind of quest and it's requirements are much better handled by a directed acyclic graph in Chapter \ref{chap-graphs}, but this example is fine for teaching lists.} first, and you must get the map to find the lair before you can physically travel there.
\subsection*{shopping list example}
While lists are defined by order, we don't necessarily ascribe any meaning to the order.
Take a look at the shopping list below:
<Shopping List>
While bread is the first item on this list, being the first item in the shopping list in this case has no special meaning. It's not the most important item on the list\footnote{obviously, that's the cookies}, nor is it necessarily the item I'm going to pick up first. Thus, as previously stated, while lists have an order, the whether or not order is important depends on the context.
\section{Why Should I care}
Where arrays and lists differ is that lists can grow to an arbitrary size, whereas arrays are static. Arrays can't get bigger, lists can.In this chapter, we will be discussing the \textbf{array list}, the first of two types of lists this book will teach you.
\subsection*{A note on terminology}
An \textbf{array list} is a type of list. These are sometimes called dynamic arrays.
As mentioned in Section \ref{sec-python-arrays-lists}, Python doesn't have arrays. If you've been programming in Python, you've been using an array list the entire time you've declared \texttt{[]}. They are usually just called lists rather than array lists for simplicity's sake.
I will be using the Java nomenclature for the majority of the book as this allows me to be clear about the types and implementations of data structures.
\subsection{Lists in Java}
An array list in Java is represented by the class \texttt{ArrayList}. Here is an example:
\begin{minted}{Java}
\end{minted}
\subsubsection{An Aside about interfaces}
This textbook assumes that you have already taken your requisite object oriented programming course, but in case you haven't or it's been a while, I'll review briefly here.
An interface is about as abstract as a class can get and ties deeply to how Java deals with polymorphism. In fact, an interface contains only abstract methods, which must be implemented by the inheriting class.
%TODO Check for accuracy duck typing
What about python? Python deals with polymorphism using duck typing, originating from the idiom \href{https://en.wikipedia.org/wiki/Duck\_test}{``If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck''}
\subsubsection*{Here is the source code}
\section{Generics}
You may have noticed that when we create the arraylist
\subsection{What are they?}
Before we get to deep into lists, we need to have a discussion about generics. Generics are a way of restricting and specifying what types can go into a collection.
\subsection{But Why?}
Using generics has two big purposes: strong typing and the lack of need for casting. This textbook deals with handling collections of data. In fact, the Java superclass for a lot of the topics we cover is called a \texttt{Collection}. Generics allow us to predefine what precisely the collection will hold. If we do not do so, the only thing we can assume a collection holds is objects of type \texttt{Object}.
This creates two big headaches
\begin{itemize}
\item In strongly typed languages like Java, we will need to cast objects to the appropriate type when extracting them. In duck-typed languages like Python, we rely on thoughts and prayers.
\item There are no safeguards to prevent us from inserting items of the wrong type into the collection.
\end{itemize}
\section{List Operations}
\subsection{Size} We need some easy way of knowing how big our lists are, if for no other reason than to make sure our add and remove methods can figure our their valid indices.
\subsection{Add} By default, we add items to the end of the list, but we can also add items to any index we want.
When we add an item at some specific index $ i $, the item at $ i $ and all indices to the right shift over one. In other words, what was at $ i $ is now at $ i+1 $, what was at $ i+1 $ is at $ i+2 $, and so on.
This also an understandable restriction to adding items to a list - we cannot an item to any index greater than \texttt{myList.size() +1}. Anything greater wouldn't be at the end of the list; it would be beyond it. The same goes for negative indices\footnote{Python does allow negative indexes, but we will ignore that for now}.
<possible picture showing a legal an illegal add>
We will cover this operation in more detail when we implement the add method for the arraylist
\subsection{Remove} We can remove items from a list much in the same way we can add them. When removing an item at index $i$, the
For example, in the image below, we are removing the item at index 3, the word ``cookie,'' from the list.
<image before>
C is for cookie that's good enough for me
<image after>
C is for that's good enough for me
\subsection{Get}
Get is how we retrieve our items from the list. Given an index, get will give us the value that has been stored at that index.
\subsection{Set}
\section{ArrayLists}
An array list, as you might have guessed, are lists built using \textit{arrays}.\footnote{Shockingly, many of the names we give things at this point actually make sense.}
They work by growing or shrinking the array\footnote{A lie. As you'll see we don't actually change the size of an array; we create a new array of the appropriate size and copy everything over} automatically as items are added or removed from the list, giving the illusion that the data structure can hold an arbitrary amount of data.
We'll go into the specifics of how this works in Section \ref{buildingArraylist}.
\subsubsection{Java's ArrayLists}
Java's arrayList
\subsubsection{Python's Lists}
Python's lists, such as below:
\begin{minted}{python3}
l = [1,2,3] # this is a list, not an array!
\end{minted}
are actually array lists! %TODO cite this https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython
Python uses a different vocabulary for some of the methods we'll be implementing below.
For example, take the action of adding an item to a list.
Python uses the \texttt{append} method to add an item to end of the list and \texttt{insert} to put an item into the middle of the list.
Java (who's vocabulary we'll be following), uses \texttt{add} for both these contexts.
\section{Example Algorithms}
\begin{minted}{Java}
public static <E> boolean isPermutation(List<E> listA, List<E> listB) {
if(listA.size() != listB.size()) {
return false;
}
for(int i = 0; i < listA.size() ; i++){
E item = listA.get(i);
int countA = 0;
int countB = 0;
for (E element : listA) {
if(item.equals(element)){
countA++;
}
}
for (E element : listB) {
if(item.equals(element)){
countB++;
}
}
if(countA != countB) {
return false;
}
}
return true;
}
\end{minted}
\section{Building an ArrayList}
\label{buildingArraylist}
To truly understand how a data structure works we need to implement it ourselves. We will be making simpler versions of what's actually implemented in the language of your choice, but the logic and obstacles we need to overcome are the same.
\subsection{Caveats}
\subsubsection{MyArrayList.java}
We will not be implementing the \texttt{List} interface. We don't need to implement all the functions to get an understanding of how the fundamentals of an arraylist work.
Implementing the list interface would take up a hideous amount of paper and get in the way of actually understanding the code.
If you want to see how \texttt{ArrayList} actually looks, you can look at the javadoc.
\subsubsection{myArrayList.py}
For python, this will require some suspension of disbelief, as our array list will require using an array, and as previously discussed, arrays are shirked in favor of arraylists in python. We'll be using a list and pretending it's an array. Silly? Yes. But it will keep our code compact and easier to understand.
%Remember, the standard python list is built in the C programming language.
\subsection{Instance Variables}
Believe it or not, we only need to keep track of three instance variables to get our arraylist working.
\begin{enumerate}
\item[theData] We need an array to actually store the items. This is it.
\item[size] Size here refers to the total number of items we have stored in the array.
\item[capacity] This is the number of items the underlying array in our list can hold. It is the maximum size of the list before we have increase the capacity and move everything \texttt{theData} to a new array of length \texttt{capacity}. This is not strictly necessary as we can get it by querying \texttt{theData}'s length. However, making it it's own variable will help with the readability.
\end{enumerate}
It is very easy to confuse size and capacity since they both deal with counting how many elements. When I talk about \texttt{size}, I am talking about the number of items we have stored in the list we are making. Capacity, on the other hand, depends on the length of the built-in array.
\subsubsection{Java}
\begin{javacode}[label={code:MyArrayListInstanceJava}]{The Beginning}
public class MyArrayList<E> {
private E[] theData
private int size; // how many items in the list
private int capacity; // how many items the underlying array can hold
\end{javacode}
First, note the \texttt{<E>} after \texttt{MyArrayList}. This means that we're saying:
\begin{itemize}
\item MyArrayList is designed to hold a specific type of object.
\item Every \texttt{E} we see is a placeholder for some type, which will be that same across the entire lifespan of the object.
\end{itemize}
\subsubsection{Python}
In python, we will be creating our instance variables in the constructor below. We will end up with this at the end of Section \ref{arraylist-constructor}.
\begin{pycode}[label={code:MyArrayListInstancePython}]{The Beginning}
class MyArrayList(object):
def __init__(self):
self.size = 0
self.capacity = 10
self.theData = [None]*self.capacity
\end{pycode}
\subsection{Constructor}
\label{arraylist-constructor}
We need to set the variables to their initial values upon creating the arraylist.
The \texttt{size} will be 0, since we won't have any objects stored in it yet.
We will set the \texttt{initial} capacity to 10, as this is the default behavior of Java's ArrayList class.
It's a small number and thus won't create much wasted space if we don't fill up \texttt{theData}.
\texttt{theData} will be an empty array of \texttt{capacity} length.
If \texttt{theData} becomes full, we will create a bigger array to hold our items using the \texttt{reallocate()} method (Section \ref{arraylist-reallocate}) %TODO Subection Depth
\subsubsection{Java}
With our constructor, we have one line of weird black magic in order to create an Array of \texttt{E[]}'s.
\begin{minted}{Java}
public MyArrayList(){
size = 0;
capacity = 10;
theData = (E[]) new Object[10]; // this generates a warning
}
\end{minted}
So what's going on with the last line?
Typically, when creating an array, we would just say:
\begin{minted}{Java}
//doing this in the constructor gives us an error.
TYPE[] myArray = new Type[desired_size];
\end{minted}
However, Java won't let you create new \texttt{E} objects since there's no telling what the constructors will be. This rule extends to arrays of \texttt{E}, like so:
\begin{minted}{Java}
theData = new E[10];
\end{minted}
However, when creating a new empty array of objects of any type, we're just making an array of nulls which will eventually be replaced by references to objects. Thus, even though the Java compiler will yell at us about Type safety, we can instead create an array of \texttt{Object} and then tell , since all references to any types are the same size.
\begin{minted}{Java}
// creating one array of nulls and telling Java
// its another type of array of nulls.
theData = (E[]) new Object[10];
\end{minted}
Remember how Java and most modern programming languages deal with objects; if you're assigning an object to a variable, like in \texttt{Object o = new Object()}, we are storing a reference to that object.
Thus, when we add an item to a list, what really happens is we'll be adding a reference to it - the instructions on how to find it in memory.
\subsubsection{Python}
Python is fairly straightforward, with the caveat that we are pretending \texttt{theData} is an array, and not a list.
\begin{minted}{Python3}
class MyArrayList(object):
def __init__(self):
self.size = 0
self.capacity = 10
self.theData = [None]*self.capacity
\end{minted}
Since built-in lists in Python grow and shrink like we would expect a list to, we initialize \texttt{theData} with 10 \texttt{None} objects\footnote{This is the Python equivalent to the Java \texttt{null}.} to mimic the way an array would be initialized.
%\subsection{Placeholder toString}
%
%
%\subsubsection{Java}
%
%\subsubsection{Python}
\subsection{Size}
Now, we will add a size method to our list; fairly straightforward in \texttt{Java}.
\begin{minted}{Java}
public int size() {
return size;
}
\end{minted}
In Python, we can go ahead and use the built in \texttt{\_\_len\_\_} method, which can then be invoked with \texttt{len(myList)}.
\begin{minted}{Python3}
def __len__(self):
return self.size
\end{minted}
Retrieving the size of our list is always $O(1)$, as we are just accessing a variable and returning its value.
\subsection{The Add Method}
Now it's time to dig into the bulk of our code: adding items to our list.
To do this, I'll be creating two methods: one for adding to the end of the list (an extremely common operation) and one for adding at any index in the list.
In Java, we will overload these two methods and call them both add. We will have an \texttt{add( E item )} for adding to the end and an \texttt{add(int index, E item)} for every other case.
In Python, these two \texttt{add} methods are called \texttt{append} and \texttt{insert} respectively, as Python does not support method overloading.
We will be looking at the case of adding to a specific index first.
\subsubsection{Cases}
The \texttt{add} method has 5 basic parts, only three of which involve actual thinking about how to code:
\begin{enumerate}
\item Check index to see if our index in bounds
\begin{itemize}
\item If it is, crash the program.
\end{itemize}
\item Check to see if our array list has room to add a new item.
\begin{itemize}
\item If there is no room, make some!
\item How we do this is covered in Section \ref{arraylist-reallocate}.
\end{itemize}
\item Shift all the existing items from \texttt{index} to the end of the list over one index to the right. This moves all the items already in the list to their new locations.
\item Store the item.
\item Increment the size.
\end{enumerate}
Those last two steps are important but not complicated. We will go ahead and handle them now and put in comments for the other parts.
\begin{minted}{Java}
public void add(int index, E item) {
// Check the index
// do we have room?
//shift over existing items
theData[index] = item;
size++;
}
\end{minted}
\begin{minted}{Python3}
def insert(self, index: int, item):
# Check the index
# do we have room?
# shift over existing items
self.theData[index] = item
self.size += 1
\end{minted}
\subsubsection{Checking The Index}
An optional step for pedagogy, but good practice. If the index is less than, we reject it.
If the \texttt{index > size}, we reject it.
The case of \texttt{index == size} is perfectly fine, but it feels weird, since you should have the rule ``valid indexes are 0...\texttt{array\_size}'' carved into your soul by this point.
This is because the index \texttt{size} would be the next empty slot for use to put an item.
Once we insert the item, we increment the size at the step of the method.
After that, our rule about valid indexes becomes true again.
\begin{minted}{Java}
public void add(int index, E item) {
// Check the index
if(index < 0 || index > size) {
throw new IndexOutOfBoundsException("Index " +index+ " is out of bounds.");
}
// do we have room?
//shift over existing items
theData[index] = item;
size++;
}
\end{minted}
In python, we take the additional step of checking if the index is an \texttt{int}.
\begin{minted}{Python3}
def insert(self, index: int, item):
# Check the index
if not isinstance(index, int):
raise IndexError(index + " is not an integer.")
if index < 0 or index > self.size:
raise IndexError("Index " + str(index) + " is out of range.")
# do we have room?
# shift over existing items
self.theData[index] = item
self.size += 1
\end{minted}
\subsubsection{Deciding to Reallocate}
Our array is only so big; if our current \texttt{size} and \texttt{capacity} are the same, we don't have any more room.
In this situation, we call \texttt{reallocate}, which doubles\footnote{As we will see later, doubling is what we chose for our implementation, but other options exist.} our capacity.
We will solve this issue in Section \ref{arraylist-reallocate} and handwave the implementation for now.
\begin{minted}{Java}
public void add(int index, E item) {
// Check the index
if(index < 0 || index > size) {
throw new IndexOutOfBoundsException("Index " +index+ " is out of bounds.");
}
// do we have room?
if(size == capacity) {
this.reallocate();
}
//shift over existing items
theData[index] = item;
size++;
}
\end{minted}
In python, we take the additional step of checking if the index is an \texttt{int}.
\begin{minted}{Python3}
def insert(self, index: int, item):
# Check the index
if not isinstance(index, int):
raise IndexError(index + " is not an integer.")
if index < 0 or index > self.size:
raise IndexError("Index " + str(index) + " is out of range.")
# do we have room?
if self.size == self.capacity:
self.__reallocate()
# shift over existing items
self.theData[index] = item
self.size += 1
\end{minted}
\subsubsection{Shifting the Items}
As mentioned previously, if \texttt{index == size}, we will be inserting the item we want to add into the next unused slot.
\subsubsection{Reallocation Implementation}
\label{arraylist:reallocate}
When we need to grow our arraylist, can't actually physically change the size of the array \texttt{theData}; you can't change the size of an array.
So we cheat.
We create a new array twice \footnote{ The one thing worth noting is that the real implementation of a list in python, listobject.c, uses a completely different pattern than doubling the capacity. This is more complicated than we need for this book; doubling is much simpler and accomplishes what we need.} the capacity of \texttt{theData}.
We then copy everything over to the new array and then store the reference to that new array in \texttt{theData}, making it our new underlying array.
%TODO fix this code or ensure that capacity is doubled before copying
\begin{minted}{Java}
private void reallocate(){
//doubles or 1.5x capacity
//don't do +1 capacity
capacity = 2 * capacity;
E[] newData = (E[]) new Object[capacity];
for(int i = 0; i < theData.length; i++) {
newData[i] = theData[i];
}
theData = newData;
}
\end{minted}
We want to double our capacity or at least increase it by 50\%, rather than increasing it by a static number.
Consider if we increase the capacity by one each time we reallocated.
If we did that, we would have to reallocate every time we added a new item to the list.
This would mean that every time we add an item to list, add becomes a linear time - $O(n)$ - operation.
Having empty slots might seem wasteful, but the advantage is that it takes constant time to add to the end of the arraylist. This is because we don't have to shift any existing elements around. It is a classic time/space trade-off.
%TODO clean up/get help
Because reallocation is a \textit{relatively} rare event compared to adding, we don't typically take that cost into account when analyzing an algorithm with a large number of \texttt{add} commands. This is because if we do have some capacity $n$, in order to trigger reallocation with a runtime of $O(n)$, we have to do $n$ add operations first.
We can then ``spread out'' the cost of the \texttt{reallocate} operation over our \texttt{add} operations.
\subsubsection{Finished Code}
\begin{minted}{Java}
public void add(int index, E item) {
if(index < 0 || index > size) {
throw new IndexOutOfBoundsException("Index " + index + " out of bounds.");
}
if(size == capacity) {
this.reallocate(); // O(n) time...sometimes. Amortized over the cost of adding
}
for(int i = size - 1; i >= index; i--) { //If adding to the end... constant
E temp = theData[i]; // Store the item from
theData[i+1] = temp; // Move the item from
}
theData[index] = item;
size++;
}
private void reallocate(){
//doubles or 1.5x capacity
//don't do +1 capacity
capacity = 2 * capacity;
E[] newData = (E[]) new Object[capacity];
for(int i = 0; i < theData.length; i++) {
newData[i] = theData[i];
}
theData = newData;
}
\end{minted}
\begin{minted}{Python3}
def insert(self, index: int, item):
if not isinstance(index, int):
raise IndexError(index + " is not an integer.")
if index < 0 or index > self.size:
raise IndexError("Index " + str(index) + " is out of range.")
if self.size == self.capacity:
self.__reallocate()
for i in range(self.size -1, index -1, -1):
temp = self.theData[i]
self.theData[i+1] = temp
self.theData[index] = item
self.size += 1
def __reallocate(self):
self.capacity = self.capacity * 2
newData = [None] * self.capacity
for index, item in enumerate(self.theData):
newData[index] = item
self.theData = newData
\end{minted}
\subsubsection{Adding to the End}
As previously mentioned, adding to the end is an extremely common operation, so we will overload our \texttt{add} method.
If our list is provided with only an \texttt{item}, as opposed to an \texttt{item} and an \texttt{index}, we will just add that \texttt{item} to the end.
Since we already wrote a perfectly good \texttt{add} method already that we know works, we'll just have our new method call that one.
\begin{minted}{Java}
public boolean add(E item) {
this.add(size, item); // size is the last valid index
return true; // What?
}
\end{minted}
Why are we returning \texttt{true} here?
The short answer is practice and consistency with future data structures.
The long answer is any \texttt{Collection} in Java has must have an \texttt{add} method and a \texttt{List} is type of \texttt{Collection}\footnote{Our \texttt{MyArrayList} isn't technically a \texttt{Collection} since we did not implement the \texttt{List} interface, but I digress.}.
\texttt{Collection} specifies that \texttt{add} must take in an \texttt{item} and return a \texttt{boolean}.
A \texttt{true} signals the \texttt{add} is successful.
A \texttt{false} signals that we could not add the \texttt{item}.
For example, this might happen with a \texttt{Set} (Chapter \ref{chap-sets})
On the other hand, our Adding at a specific index is unique to lists, and not part of collections, and will always work. Therefore, there's no need to return a boolean.
\subsection{toString and \_str\_}
Now that we supposedly have a method for adding items into the list, the next step is to test it.
The easiest way to test it is by printing out the contents of the list.
We'll do this in the laziest way possible.
In java, that would be invoking the \texttt{Arrays.toString} method, since directly turning an array into a string gives you representation of the memory location:
\begin{minted}{Java}
public String toString(){
// return theData+""; // memory location
return Arrays.toString(theData); // import the library
}
\end{minted}
That said, implementing it ourselves gives us good practice handling a common fence-posting problem, i.e. we need to print $n$ items separated by $n-1$ commas.
\begin{minted}{Java}
public String toString(){
String output = "["+theData[0];
for (int i = 1; i < size; i++) {
output+= ", " + theData[i];
}
return output + "]";
}
\end{minted}
\begin{minted}{Python3}
def __str__(self): # second attempt
output = "["
#only include indexes from 0 to size-1
for item in self.theData[:self.size]:
output += str(item) +","
output = output[:-1] # remove the last comma
return output + "]"
\end{minted}
\subsection{Get and Set}
The \texttt{get} and \texttt{set} methods are fairly straightforward:
\begin{enumerate}
\item[\texttt{get} -] Given an \texttt{index}, retrieve the item stored at that \texttt{index}.
\item[\texttt{set} -] Given an \texttt{index}, replace the old item stored at that \texttt{index} with the provided \texttt{item}.
\end{enumerate}
%For \texttt{get}, given an \texttt{index}, retrieve the item stored at that \texttt{index}.
%For \texttt{set}, given an \texttt{index}, replace the old item stored at that \texttt{index} with the provided \texttt{item}.
\texttt{set} has one additional quirk, we also want to return the old item we're replacing, just in case the programmer wants to doing something with the old item.
This would obviate the need for pairing a \texttt{get} and \texttt{set} call with each other if we want to replace the old item, but do something else with it.
For both \texttt{get} and \texttt{set}, we want to throw some kind of error if the provided index is out of bounds.
\subsubsection{Java}
Our \texttt{get} is fairly straightforward, but feel free to give more information with the error.
\begin{minted}{Java}
public E get(int index) {
if(index < 0 || index >= size) {
throw new IndexOutOfBoundsException("Index " + index + " out of bounds.");
}
return theData[index];
}
\end{minted}
The same goes for our \texttt{set} method.
\begin{minted}{Java}
public E set(int index, E item) {
if(index < 0 || index >= size) {
throw new IndexOutOfBoundsException("Index " + index + " out of bounds.");
}
E oldItem = theData[index];
theData[index] = item;
return oldItem;
}
\end{minted}
\subsubsection{Python}
Python supports negative indices.
We can take advantage of some of the method calls built into python to make our \texttt{myarraylist} support indexing.
\begin{minted}{Python3}
def __getitem__(self, index):
if index < 0:
index = index % self.size # yes!
# If you're confused, test modulo on
# negative numbers in python.
if index >= self.size:
raise IndexError("Index " + str(index) + " is out of range.")
return self.theData[index]
\end{minted}
This method, as written, will return \texttt{None} if the user tries to access an index that is within in the bounds of the capacity but above the size.
The same thing will happen if we use negative indices.
While this is fine for our pedagogical programming purposes, prudence posits proactive protection. That is to say, we should ask ``how do we prevent out users from accidentally getting the wrong data when they should be getting an error.''
Below, we will add two index checks.
\begin{minted}{Python3}
def __getitem__(self, index):
if index < 0:
index = index % self.size # yes!
# If you're confused, test modulo on
# negative numbers in python.
if index >= self.size:
raise IndexError("Index " + str(index) + " is out of range.")
return self.theData[index]
\end{minted}
\subsection{Remove}
The code for \texttt{remove} is almost identical in structure to \texttt{add}, but without a case for checking if there's room. Since we are removing, we don't have to worry about running out of room. We also make sure we save the item we are removing and return it, for the same reason we do for the \texttt{set} method.
\begin{itemize}
\item Check if \texttt{index} is valid.
\item Save the item at \texttt{index} for later.
\item Shift each item to the right of index over (indices greater than \texttt{index}) one to the left. This will overwrite what's stored at index, which is why we saved it.
\item Decrement the size.
\item Return the saved item.
\end{itemize}
A word of warning with \texttt{remove} operations on ``real'' implementations. Removing items from a list while you are iterating over it has the potential to get messy. Languages can sometimes even throw runtime exceptions to \textit{prevent} you from doing it. See the problem in Section \ref{ex-remove-all}
\section{Analysis}
When reading through our analysis, please keep in mind that we made a number of pedagogical choices when writing our Array List.
We did this to make our code readable and to help gain an understanding of the mechanisms .
The Array List implementation in your language of choice probably has a huge number of optimizations, at the cost of readability and complexity. For example, at the time of writing, \texttt{listobject.c}, the source code for the Python list, is almost 3500 lines long \cite{py-list-source}.
Those caveats aside, lets talk about the four primary operations for Lists that have a cost: add, remove, get, and set.
\subsection{Add/Remove}
The runtime for adda dnd
\subsection{Get/Set}
Arraylists use the same memory formula discussed in Section \ref{array-formula} to find a specific index. This calculation, which is an addition and multiplcation operation, takes the same amount of time no matter how big the ArrayList is. Thus the runtime is $\Theta(1)$
%\section{More Restrictive or Permissive Generics}
\subsection{A Note on Storage}
\section{A Few More Useful Methods}
\subsection{Constructors}
Java's \texttt{ArrayList} can optionally take in an integer as an argument.
This will start the underlying array's length at that value, rather than the default of 10.
This is useful if you know exactly how big your List will be.
However, if you aren't removing any items when populating your list, consider using an array instead.
\subsection{Manually Adjusting the Capacity}
Java provides two methods for manually adjusting the ArrayList capacity.
The method \texttt{ensureCapacity(n)} forces the ArrayList to grow to a capacity of \texttt{n} items, if it can't already.
Conversely The \texttt{trimToSize()} shrinks the capacity to be equal to the current size. This is useful if we know the ArrayList won't get any larger than it currently is and want to eliminate the wasting memory with empty array slots.
Python will automatically optimize lists for you. Python will automatically resize the list to shrink it if necessary \cite{py-list-source}.
\subsection{Adding Multiple Items in One Invocation}
One common operation is to move or copy all the items from one list to another.
In Java, we can use the \texttt{addAll()} method, which takes any Java collection as a parameter and all the items in that collection to the object.
\begin{minted}{Java}
List<Integer> a = new ArrayList<>();
List<Integer> b = new ArrayList<>();
for(int i = 0; i <3; i++) {
a.add(i);
}
for(int i = 3; i <6; i++) {
b.add(i);
}
a.addAll(b);
System.out.println(a); // 0 to 5 inclusive
System.out.println(b); // [3, 4, 5]
\end{minted}
In Python, we can use the \texttt{extend()} method on anything that is iterable or use some clever slicing. However, I would always recommend using the method call over the slice, since a method invocation is always more readable.
\begin{minted}{Python3}
a = [0, 1, 2]
b = [3, 4, 5]
c = a + b # creates a new list, which is similar to our goal
a.extend(b) # adds all of b's items to a
a[len(a):] = b # does the same thing but looks gross.
\end{minted}
A common beginner mistake in Python is to try to extend a list by calling \texttt{append} on the list like so.
\begin{minted}{Python3}
a = [0, 1, 2]
b = [3, 4, 5]
a.append(b) # a is now [0, 1, 2, [3, 4, 5]]
\end{minted}
This adds the entire list a single item in the list.
\newpage
\section{Exercises}
\subsection{Remove All Instances}
\label{ex-remove-all}
Write a method called \texttt{removeAllInstances()} which takes in a \texttt{List} and item\footnote{In other words, the first parameter is a list of generics and the other input is a single item of the same type the list holds.}.
The method then proceeds to remove each item in the list that matches the given item.
For example, if the method is passed the \texttt{List<Integer>} \texttt{[1, 4, 5, 6, 5, 5, 2]} and the \texttt{Integer }\texttt{5}, the method removes all \texttt{5}'s from the \texttt{List}.
The \texttt{List} then becomes \texttt{[1, 4, 6, 2]}.
It should return nothing, since the changes the \texttt{List} it was provided.
\footnote{This one is extremely tricky, since removing an item shifts the indexes.}
\newpage
\section{Source Code}
\subsection{Java}
\inputminted{Java}{code/MyArrayList.java}
\newpage
\subsection{Python}
\inputminted{Python3}{code/arraylist.py}