-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathtrees.tex
More file actions
208 lines (94 loc) · 63.9 KB
/
trees.tex
File metadata and controls
208 lines (94 loc) · 63.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
\chapter{Trees}
% AI Disclosure
% For this next chapter, I'm going to try something a bit different.
% Yes, I'm going to use AI to help me write this, but I'll be doing so using my own work.
% The attempted workflow is going to be me feeding Google Gemini (or some other service) my videos/ transcripts of my videos.
% Link here https://www.youtube.com/playlist?list=PLNDWoTOY5hTamLK3NAZiRP57YVQAYHB0t
% The transcript will need to be cleaned up since the autosub isn't fully accurate nor does it do punctuation
% Then the next step wull be to adapt the transcipt for the textbook, keeping as much of my original diction as possible.
% If sucessuful / time save, I will adapt for future chapters.
Our next major data structure is trees. Specifically, we will be looking at binary search trees.
Trees are an excellent data structure for storing things since they implement all the operations we care about for collections in logarhythmic time\footnote{Specifically , Trees implement everything in average case log rhythmic time and worst case linear time , but if we do a bit of extra work and make it a self balancing binary tree (which will seem much later in this chapter) we can make this tree worst case log arhythmic for all operations}
However, trees are not without limitations. Trees will only work with data that can be stored hierarchically or in an order.
\section{Terminology}
\section{Youtube Dump}
The other thing to know about this particular lecture is that I don't plan on doing any coding, well maybe a little bit, but I'm not actually my editor at any given point. This one is purely definitions and getting us getting our vocabulary. So fortunately, I'm not going to teach any like really weird words or anything that you need to know, like 'indicee' was a completely new word and a completely new context that you would have to figure out, or 'index,' right? And 'arrays,' those were weird words. Here, we're going to leverage knowledge that you already shouldn't have about trees. Okay, okay.
%So first and foremost, when it comes to trees, display settings. First, computer, as I've mentioned previously, computer scientists really rock when it comes to naming things. Okay, we're really good at that. On the other hand, we make terrible gardeners. Okay, so all that we like, we're really seriously, we're great at naming things because these things look like trees if you turn them upside down. So all the trees we're dealing with, so when I talk about trees, it's going to look like they're planted upside down. So with the root, the roots are at the top and all the leaves are at the bottom, just like imagine we're growing them in 0g or something, because we all like sci-fi or something like that. Okay, so that's the only thing to say. But otherwise, they make sense as trees. I mean, and you've actually dealt with trees like this before if you look at like family trees and that kind of stuff.
%So let's go over, so today's lecture is on trees. So today, we're going to learn how to use trees to represent hierarchical information, or hierarchical organization of information. We'll also be learning how to use recursion to process trees. So if trees are actually fairly easy to program because you can do so, you can do, you do so recursively, and all the cases are basically the same. And 90\% of tree problems can be solved by using one of two different algorithms we'll learn\footnote{Source: It came to me in a dream.}. And the remaining other ones, they can be solved by just throwing an additional data structure that you need around. So if you needed, so if you need to do something weird, you can generally solve it by using a different data structure sets in addition to that, what you have here, such as a stack or a queue. Okay, but in this, in this lecture, we're getting, sorry, so yeah, different ways of traversing a tree, understand the differences between binary trees, binary search trees, and heaps. So we're going to talk a bit about binary trees just because we have to deal, because all binary search trees are binary trees, right? A binary search tree is a type of binary tree, so we're going to learn about binary trees indirectly. But the two much primary data structures we're going to be learning this, this chapter, are the binary search tree and the heap. And we're going to learn how to implement them. Binary search trees will use a linked data structure, so we'll use something you'll new, use nodes, just like we use for a linked list, except the note, the in the nodes are going to look slightly different. The variable names are going to change, but otherwise it's not going to get more complex than that. And then for heaps, we'll use arrays instead of all these data structures, but you could use either. Okay, we're also going to learn about basic compression when we talk about Huffman trees. So we're going to learn how to encode stuff and do basic file compression, making file sizes smaller by compressing the data and getting it in a lossless format. Okay, so all the data structures we've been dealing with so far, the list, there, they're list, they don't have, they're linear. Every element, they've got the item to the form and the items after them. So if you want to find something in the list, you have to go through the list and take so event up time. Trees are nonlinear, meaning they're not stored in a list, they're not stored in a straight line. They're hierarchical, meaning that one is, that we treat something in the tree more importantly. Other than something else, and tree nodes, they can have multiple successors, but only one predecessor. That doesn't really make sense. I don't know why they put it in there because it only makes sense after you physically see what a tree looks like. Okay, but we deal with trees a lot. Class hierarchy, so you know the kind of hierarchies we've seen before with basically we have the object class in the collections class, and then we have all the different types of collections, and then we have all the different, and it's like lists and maps, which are lists and sets, and then under lists, we have array lists, linked lists, right? That kind of hierarchy, that's modeled by using the tree. Your disk directory and the subdirectories, that's modeled by a tree. If you're, if you're one of those, you happen to be using a UNIX machine right now, such as a Mac, or if you're running Linux, uh, actually, I'm running, I got a UNIX box right here on Windows, the UNIX subsystem, right? This structure of the file system structure looks, it's, it's a tree. So let me go to the root, which is the topmost directory, and inside, what is that, what directories do I have? That's, it's hard to see because it's blue, I'm black. So let's see, properties, let's go ahead and set color screen background to me white. Sure, wonderful. Okay, pop a background, pop-up text, text really. Why in the world does it do that? Okay, got to really do anything so I can change. But I've directory such as so much for for being clever, right? Properties, screen background, let's set that back to black, shall we? So CD, um, then CD, so in the binary directory, I have other stuff like this. Let's go ahead and go into, you'll see is maker, maker, or a directory. No, it is not. What about, but these are a bunch of binaries and files. What about, let's go into users, other stuff that has been games, include libs.
So, first and foremost, when it comes to trees, computer scientists really rock when it comes to naming things. We're really good at that. On the other hand, we make terrible gardeners. We're great at naming things because these things look like trees if you turn them upside down. All the trees we're dealing with, when I talk about trees, it's going to look like they're planted upside down, with the root at the top and all the leaves at the bottom, just imagine we're growing them in zero-G or something because we all like sci-fi or something like that. That's the only thing to say. But otherwise, they make sense as trees. You've actually dealt with trees like this before if you look at family trees and that kind of stuff.
Today's lecture is on trees. We're going to learn how to use trees to represent hierarchical information, or hierarchical organization of information. We'll also be learning how to use recursion to process trees. Trees are actually fairly easy to program because you can do so recursively, and all the cases are basically the same. And 90\% of tree problems can be solved by using one of two different algorithms we'll learn. And the remaining other ones, they can be solved by just throwing an additional data structure that you need around. So, if you need to do something weird, you can generally solve it by using a different data structure in addition to what you have here, such as a stack or a queue.
In this lecture, we're going to cover different ways of traversing a tree, and understand the differences between binary trees, binary search trees, and heaps. We're going to talk a bit about binary trees just because all binary search trees are binary trees. A binary search tree is a type of binary tree, so we're going to learn about binary trees indirectly. But the two primary data structures we're going to be learning this chapter are the binary search tree and the heap. And we're going to learn how to implement them. Binary search trees will use a linked data structure, so we'll use nodes, just like we use for a linked list, except the nodes are going to look slightly different. The variable names are going to change, but otherwise it's not going to get more complex than that. And then for heaps, we'll use arrays instead of all these data structures, but you could use either. We're also going to learn about basic compression when we talk about Huffman trees. So we're going to learn how to encode stuff and do basic file compression, making file sizes smaller by compressing the data and getting it in a lossless format.
All the data structures we've been dealing with so far, like lists, are linear. Every element has the item before and the items after it. So if you want to find something in a list, you have to go through the list, which takes time. Trees are nonlinear, meaning they're not stored in a list, they're not stored in a straight line. They're hierarchical, meaning that we treat something in the tree more importantly than something else. Tree nodes can have multiple successors, but only one predecessor. That doesn't really make sense now, but it will after you physically see what a tree looks like.
We deal with trees a lot. Class hierarchies, like the kind of hierarchies we've seen before with the object class and the collections class, and then all the different types of collections like lists and maps, and then under lists we have array lists and linked lists—that kind of hierarchy is modeled by using a tree. Your disk directory and subdirectories are also modeled by a tree. If you're using a UNIX machine right now, such as a Mac, or if you're running Linux (I'm running a UNIX box right here on Windows, the UNIX subsystem), the file system structure looks like a tree. If I go to the root, which is the topmost directory, what directories do I have inside? It's hard to see because it's blue on black. So let's see, properties, let's set the screen background to white. Sure, wonderful. Okay, pop-up background, pop-up text. Why in the world does it do that? I can't really do anything to change it. But I have directories. So much for being clever, right? Properties, screen background, let's set that back to black. So cd... then cd. In the binary directory, I have other stuff like this. Let's go into users, other stuff, games, include, libs. So let's go to cd Lib, or actually cd share. There's all sorts of stuff in here. We have a dictionary here, no, nothing in that. But basically, it's a tree-based, it's a hierarchical thing. You start at the top, and each folder has subfolders, and those subfolders have their own subfolders. It's the same way it works in Windows. I've got my C Drive at the top, and then all my system files are in the Windows drive. Then I've got things like Program Files. Let's see what else I have. I've got Steam, which has all the games that I've installed on here. And in Steam apps, you have all the subfolders that hold the games, and then actually it's in common. So there are all the games I have. It's all basically folders inside folders inside folders.
%So let's go to CDE Lib, or actually CD share. There's all sorts of stuff in here. We have a dictionary here, no, nothing in that. But basically, it's a, it's a tree based date, basically, it's just tree based. It's a hierarchical thing. You start at the top, and each folder has subfolders, and in each vault, you see those subfolders have their own subfolders. Okay, and that's the same way it works in Windows, right? I've got my C Drive at the top, and then like all my system files are in the windows drive. Um, let's see, then I've got like Program Files. I've got others. Let's see what do I have. Let's see, got Steam, which has all the games that I've installed on here. And in Steam apps, you have all, you know, the soul folders that hold the games, and then actually it's in common. So there's all the games I have. So right, so it's all basically folders inside folders inside folders. Okay, so we can also do a family tree, which is like basically parents and then your those. So that's, that's kind of a trees in the name, you understand how that looks like. So trees are recursive data structures because they can be defined recursively, which is a recursive definition. So that definition kind of sucks. Many methods to process trees are written recursively, that's more important. Trees are written recursive data structures. Well, that just simply means that basically the way a tree works is that every tree, sorry, every node in the tree is considered to be the parent of its own tree. Every, every node in the tree is considered to be the root of its own tree. Okay, so that's why we're able, just like when we're going through the linked list, every node in the linked list, we treated it as the head of its own new linked list. So now, why are we learning about trees and binary trees specifically? Well, most of the trees we can take, like there's trees with three children for Jill. Binary means that each node has two, two successors. So why do we deal with these things? Well, because when it comes to runtimes, binary trees are the jack-of-all-trades. They are log n add, log n remove, log n get, log n set. They're log n basically everything if they're balanced. Okay, so in Chapter nine, which is the most likely chapter to leave out of this textbook just because of time constraints, they go into something called self-balancing trees. Okay, and those are trees that basically make sure that they remain balanced. Okay, we're not going to go over that in that chapter, but in the real world, basically, these self-balancing trees ensure that everything is always log n worst case and average case. But with the trees that we're learning, we aren't learning how to balance them because that involves learning how to do rotations. We're going to just deal with that. And on average, it will be log n worst case scenario, we'll see it could be O of n time because worst case scenario, tree you can accidentally grow a tree in such a way, it looks like a linked list. But that's very, very rare. On average though, log n time to do everything, which is very, very quick.
In a file system, it's a tree-based, hierarchical thing. You start at the top, and each folder has subfolders, and those subfolders have their own subfolders. This is the same way it works in Windows. You have your C Drive at the top, and then your system files are in the Windows drive. Then you have Program Files, and other directories. For example, Steam contains all the games you've installed, and in Steam Apps, you have subfolders that hold the games, and in "common," you have all your games. So, it's all basically folders inside folders inside folders.
We can also represent a family tree, which consists of parents and their children. That's a tree in the name, so you understand how that looks.
Trees are recursive data structures because they can be defined recursively. While that definition isn't very helpful, it's more important that many methods to process trees are written recursively. The way a tree works is that every node in the tree is considered to be the root of its own subtree. This is why we are able to process them recursively, just like with a linked list, where every node was treated as the head of its own new linked list.
Why are we learning about trees and binary trees specifically? While there are trees with three or more children, "binary" means that each node has at most two successors. We deal with these because, when it comes to runtimes, binary trees are the jack-of-all-trades. They offer log(n) for add, log(n) for remove, log(n) for get, and log(n) for set. They are basically log(n) for everything if they're balanced. In Chapter 9 of the textbook (which is often skipped due to time constraints), they discuss self-balancing trees. These are trees that ensure they remain balanced. We won't cover them in that chapter, but in the real world, these self-balancing trees ensure that everything is always log(n) in both worst-case and average-case scenarios. However, with the trees we are learning, we aren't learning how to balance them because that involves learning how to do rotations. We're going to just deal with that. On average, it will be log(n) time. In the worst-case scenario, it could be O(n) time because you can accidentally grow a tree in such a way that it looks like a linked list. But that's very, very rare. On average though, log(n) time to do everything is very, very quick.
So I've been throwing out term successor, children, parent, root. So let's go ahead and get those. And what we're going to do is that we're going to leverage your, your knowledge of botany and family trees. And it doesn't need to be much botany at all to basically understand our stuff. So a tree is a collection of elements, our nodes. Okay, it's a set of elements or nodes, and every node is linked to its successors. So basically, these arrow, these links, there are links that point down. So here's dog, and it's connected to cat and wolf. So note at the top is called a root, is called the root of the tree. Okay, this is a very small tree with four nodes. Okay, now it doesn't necessarily make sense because it's like a dog is the parent of cat and wolf, cat and cat, it has a parent of canine. They'll make sense why it's in this order. Okay, so the node at the top of the tree is called its root. Okay, the links that go from from a node to its successors are called, we can call them either links or we can call the brain, either way, everybody's going to know what you talked about, links or branches. Okay, and successor and predecessor, they, those are big, those are big warned, big words with lots of syllables. So why don't we just use something a bit easier? Uh, this is a dog as the parent of cat and wolf. Cat and wolf are the children of dog. So the successors of a node are called the children. Okay, and your predecessor, the node above you, is called your parent. Okay, every node in the tree has exactly one parent. Okay, basically, you have one boss, essentially. You have one parent, one person above you, except for the root node, who's an orphan, and he's all alone. Maybe it's Batman or something. So, so those guys, so everybody in the tree has exactly one parent, and you can have up to end. If it's a binary tree, you can have up to two children. But now since we've established this like pie of this child-parent relationship, we can actually leverage, we can leverage this like crazy. So for instance, cat and wolf, they're both children of dog. So we can call them what? Siblings. We can call them siblings. Notice that I have the same parents or siblings, right? Then we can also make this also much a much more generic relationship. Let's go ahead and use this fake laser pointer. Okay, so let's generalize this relationship. What would the relationship between dog and canine be? Yes, grandpa, grandpa. Let's go with grandparent, right? Grandparent and grandchild, right? Dog is the grandparent of canine. Canine is the grandchild of dog, right? So you can generalize this out. After grandparent, it becomes a bit silly, so you might just want to use ancestor and descendant at that point. Okay, at that point, use the ancestors descended rather than trying to fit count, be a number of greats. You have your great-great-great, right? And and also just try to avoid using like first cousin once removed, that stop. Um, because, well, I know what those term means, not everybody does. Um, so, so a node that has no children, it's called a leaf node, right? Because you're at if you're, here's the root, right? And so if you're at the opposite side of the roof, you're sorry, obviously side of the root, you're a leaf. Okay, so canine is a leaf, wolf is leaf because those two nodes don't have children. Cat isn't a leaf node because it has one kid. We can also call them external nodes, but that's boring and more syllables, right? So we just call them leaf, a leaf node, or a leaf or leaf nodes, or it leaves, right? Non-leaves can be cut, can be referred to as internal nodes if you really had to, but leaf is quicker to say. Okay, so again, the generalization of the is the ancestor descendant. So dog is the parent of cat. Cat if the parent of canine. Canine its sentence cat and dog is the ancestor of canine industry. Okay, so what makes trees trees though, is that they're recursive. Every node in this is the root of their own troop, of their own subtree. So dog is the root of his own tree made up of these four guys. Cat is the root of his own tree made up these guys. So cat is the root of this tree that starts a cat with the with the left branch of canine and the right branch of nothing. Canine is his own, is his, wolf is his own branch and it's just himself. And canine's his own brain, oh sorry, his own subtree, just himself. We also have two more terms. This one's a bit more abstract, which, but once, but basically it takes you about 10 seconds to remember what it is. The level of a node, it's basically the distance from its rib, from the root. So basically, how far down the tree are you? If you're at the top of the tree, you're level 1. If you're the second, if you're at the second row, you're level two, make sense, right? Basically, it's your distance from the roof plus Y. So the root is 0 from itself, so one. And we can, we'll see that we can figure out what what your just what level you're at recursively. Okay, right? Um, the height is basically the length of the height of the tree, how big it is, or the thing that determines how big it is. And basically, it also determines a lot about the runtime will be is the longest path from the root to a weak node. And as we'll see, basically, the worst case scenario for an operation is that it's going to take height number of steps to do something in the tree if we're, if we're doing something like an add or remove. So the height of this tree is three, meaning because the longest path from root to a leaf is dog, cat, canine. It's not two because even though dog and wolf is a valid path, it's not the longest one. Another thing about these paths, it's not like one, two, three, four because these only go down. Let's go ahead and actually, hey, I've got a pen, I shouldn't use that, right?
These are arrows that go down. Typically, it's not, it's usually not bi-directional, right? These, you only have an attempt, you only point to your children, but your children, but you don't have a parent. Okay, so in a binary tree, each node has two subtrees essentially. So what makes binary trees, this is the formal definition of a binary tree. Each node has two subtrees. So a set of nodes is T's binary tree if either of the following is true. T is empty. So, so a set of nodes is a binary tree if it's a null set, right? Oh, and that's so this is a recursive definition. It's this is the base case over here. The recursive case is if it has two subtrees, left tree and right trees, such that left tree and right tree are both binary trees. So let's go ahead and put that on the board. Draw the example downward. So this is a binary tree. We'll see that this is not a binary search tree, but it is a binary tree right now. Okay, so this is a binary tree. If I were to do this, this is not a binary tree because one, because one of these guys has three, right? So first off, this, we're saying this entire set is a binary tree. We didn't split up into two. If we can take this and split up in, so we have the room and then the root has two trees, ones have left, the one to the right, and the one to left at the binary tree. So what about the ones are left? Well, this guy on the left, he has, he has three children, not to, right? He is a luxury, as left subtree, right subtree, in the center subtree. So he's not a binary tree, right? This center thing just doesn't work. Okay, so now he's a binary tree because he has two children, but he's not a binary search tree, which is what we're going to be dealing. So let's go ahead. So what we'll go back to these, these new, this other stuff later on. This stuff over here, like expression trees, we'll come back to that when we, when it acts, when we go to over tree traversals, same with Huffman trees. We'll come back to that. This is the different type of binary tree. But binary search trees are what we're going to be dealing with for the majority to chapter, which is that, um, and the reason we have this is the border in a binary search tree, all the elements to let on the left subtree are before all the elements on the right subtree. So this tree that we've been dealing with in the example, it's in alphabetical order, right? So cat and canine, soda. So cat and canine come before dog and wolf, and wolf comes after dog. Okay, so all the elements on the left come before the root, and all the elements on the right come after the root. And then this applies recursively for each subtree. In this subtree, all the elements to the right come before cat, and all these are all the elements left come before at all the elements the right come after cat. So a set of note, so this is the mathematical definition, a binary search tree. If T is right, well, this is the 702, you the binary search tree if either equality true, either if it's the empty set, or you have two subtrees, the left tree and the right tree, such that they are both binary search trees. And all the stuff in the left side is bigger than all the root node is bigger than all the stuff on the left and smaller than all of this stuff on the right. So let's go ahead and deal that and do that with numbers because it's easier to check with numbers. Let me draw a binary search tree, or let me draw a binary search tree on the board, and then we'll go ahead and, you know, change it around to make sure that. So I'm going to start with basically just single-digit numbers because that's easy. Five, nine, seven over here, nine, six, three. So this is a binary search tree. So all these stuff on the left side of the tree is smaller than the root. All the stuff on the right side of the tree is bigger than the root. Not only that, it, this is true for any subtree we look at, right? Let's look at three. All the stuff to let them treat, those are all the stuff to the left of three is three is smaller than three. All the stuff to the right, it's bigger than three. Okay, so let's look at one. All the stuff that's to his left to his left and bigger than him. All the stuff to his right is bigger than him, which is, and or rather, all the sections left to right are binary search, right? And nothing use binary search tree. All right, technically, he's got a bunch of null pointers right over here, which in some solutely. So, and that matters. And visualizing that from x matters. And reviewing this stuff like red black trees, which are stuff that self balances again. So this is a binary search tree, and that's actually pretty cool because this is that it's sorted, believe it or not. This is sorted. Uh, so let's say I wanted to find six in the tree. Okay, say if I want to know if this tree holds six. Nah, just basically this is the DA contains method. So let's go ahead and see. So I'll start at the root and if, hey, does this tree contain six? And I was like, well, I'm five. I don't know what this tree contains six. Why should I keep track of all my kids? I've got a lot of them. It's a big family. But if you, but I know that basically six is bigger than me. So if you go to my right, so why don't you go ahead and ask my right subtree? So we don't have to right subtree because that will hold all the values that are bigger than pi, right? And as a consequence, we don't have to look at any of these values over here, right? Because those are all less. This is binary search all over again, right? So we buy, so by basically asking the question, how are we bigger or less than the root or equal to it, right? Just like is the number I'm thinking of bigger, smaller, or equal to? We can throw out half of the tree that we have to search. Now, basically, we split this tree in half, the search phase in half. If we do that with every time you ask question, it's logarithmic. So now I go to seven. Do you know where does this binary search tree contain six? And something's like, I don't know, I'm seven, so I'm not six. Why don't you go at, but if six is in this tree, he's going to be to the left because he's less than me. So you go to what the, okay, does this binary retreat surgery contain sex? He goes, yes, of course it does. Opposites. And so we return true to here, with your furniture to here, which will return true to use it. So three steps to buy to search seven items. Okay, um, and in fact, if the tree is properly balanced, a definition will learn once we get back from the break because that one's of the words that describe trees, which are full, perfect, complete, and balanced, which are like all these things that might describe some like on an optimal new weight, you know, an optimal lifestyle, right? Full about perfect, complete, balanced, like somebody who's basically, you know, posting all those positive messages on Facebook, right? Wondering if something is okay. These are, right, those are the words that we're going to use to describe trees. But we're going to be able to describe them in mathematical terms. So, so I guess they'll cause a lot more irritation. So let's go ahead and do, so let's go ahead and and see if I was looking for one. I'd say, okay, five, one is less than five, one is less than three, one is equal to one to be bounded. Okay, so this is a binary search tree. Now, let's go ahead and make some changes. This still looks a binary search tree. Yeah, still binary search tree because all the items to left are bigger than the root and that also and that implies for every subtree. Okay, so now let's go ahead and see by root six, still binary search tree. What if I put six over here? No, because six is greater than five. It needs to be on this side of them. Okay, what if I were to put on say switch one over to here? Not a binary search tree. Well, because one is less than five on the right side of five, or beside the correct side of five, right, which is to the left, not the right. Three, oh sorry, one is to the right of three and it shouldn't be because one is smaller than three. Easy be on this side. Make sense to everybody there, right? It's got to be, it's got to be true all the way down. Got to be true completely recursively. Okay, so, um, let's see. So now let's go ahead and see it ask ourselves, is this a binary search tree? Five, seven, six, eight. This is a binary search tree. Yeah, everything to the right is big is all the elements on the light of everything gets bigger and all the items left or smaller moves. Let's go with this one. Go with five, ten, six, seven, with eight actually over here. So this is a binary search tree. Yes, it's a, it's a terrible one. It's this is definitely not what we call balanced, which again, is a term we have to define. But you can kind of see it's unbalanced, right? Five, ten, six, eight, seven. Okay, everything that's bigger than five, five is to the right. Everything that smaller than ten is two tens left. Everything that's bigger than six is to the left is to the left of a six in everything smaller than ages until is to let them of it, right? And we're looking at it in recursive manner, right? So this also brings us to an important consequence. When we're building trees, we always add like, we always add to a leaf essentially. When we're building a tree, we always build, we start from the root and then we build the leaves from the root and then we build out from there. So on, now this is not, now this is why I'm just saying that basically worst case scenario, it's going to take oh then to find something because this is just really a crooked looking link. Let's start here, right? If I wanted to find seven, I'd have to go, I have to go right, left, right, left. And basically, I'm not every time I'm making a choice, I'm just simply ruling out the thing I just surged. I'm not actually ruling out any tabs. I'm not having the search space at all. I'm doing a linear search here, which is so, um, this doesn't happen very often, but it can happen. But most of the time it doesn't, which is why we say the average time is a blog and to search for treating on average. But worst case scenarios like this are O of n, which are linear time. Um, now when we act again, once we actually use trees, if you're using a tree that's already been like a library that uses and treatments already built for you like the tree set, it's going to do worst case and average case and best case log of that. Some stuff might be O one because blog about. And why is this? Because will be something called self-balancing. And rebalancing the tree can be done in log n time. If you're basically when you add or remove something, you cause the tree to become unbalanced, it takes as so long as that was the up, you had one operation that calls to be unbalanced. Rebalancing it takes at log n time. So it's a very, there's a very effective manner on that. So today, it's just a very effective way to do that.
[Music]
Alright, let's see. Alright, so, so the cool thing about binary search trees, as I mentioned, is that they never have to be sorted because they're always sorted. A consequence of this is that binary search trees, binary search trees always have to store sorted data. They can only be, can only use comparable or sort of Alita. You can only store comparable items in there, right? Because comparables can be, can you can put it, put them in order. And if you can't put them in order, you can't put them in a tree because going to the left and right has no meaning anymore, right? So you can put them in a binary tree, but not in a binary search tree. Okay, so as we add or remove items, the binary search tree will remain in order. Okay, this is better than like trying to keep an array or a linked list sorted, which will take cost, which will take linear time to do that. Okay, so again, worst case scenario, but on average, it's all of log n. Worst case scenario, it's O of n. Okay, I'm this was the algorithm who were using the pseudocode for. If the binary, let me remove my heads. So not that there were many letters to figure out. If the tree is empty, we return null, basically saying that the target is not found. Otherwise, if it matches the root data, we can return the data there. Otherwise, if it's less than the root, we look in our left subtree. Otherwise, we look in our right subtree. It's a very easy algorithm. This is one of two algorithms. Is what I call the search elephant. You other ones, the traverse algorithm. Those two algorithms together solve 90\% of the, basically, our 90\% of what you need to basically write all the algorithms you need. You'll treat the remaining 10, 10\% can be dealt with either having a parent or variable or using using an additional data structure and being clever with that, such as a stack or queue. Um, it's very, it's just like link lists are very, are very popular. You use it as interview questions. Trees are very popular is interview questions. So let's go ahead and figure out those full, perfect, incomplete words. Those words that basically describe, scrap, there aren't really describing the wholesomeness of the tree's lifestyle. Who are we to judge that? But the, but rather, you know, just certain mathematical qualities they fulfill. So full is the easiest one to remember and probably and perfect is pretty much the next easiest one because it's visually easy to see. Full is easy to remember though. A full tree is a binary tree where all the nodes either have two children or no children. Okay, basically, uh, if you're a leaf node, you don't have any children by definition. So basically, we're saying here, all the non-leaves, they'll have both their children in slots filled, right? So here, in this example, this is a full tree, right? Because all the children, note, note zero, two, four, six, nine, eleven, and thirteen, they have no children. But seven, one, three, five, and ten and twelve, all of their children, all there, they both have, they all have two children. If I were to remove this two, right, this tree is no longer full because three has a right child, but not a left child. Make sense? See, can I erase all ink on slate? Cool. All right, that makes sense everybody. So full is a very easy one to see. You can just kind of look at it. In fact, you have to look at it to figure out to figure it out. And that's actually probably really good exam, a practice exam question, like write a method that detects whether that returns true if the tree is full and false if it's not, right? That's a pretty, that's a pretty nice one. I would do full, perfect, and complete binary trees. So a perfect binary tree, this one's also not so bad. It's the complete that gives people trouble, and it's understandable why because the definition is a bit obtuse. But once I start drawing, going to go through the examples, it's perfect is a full binary tree with a height of n. Honestly, it should be H to be honest, be a height of H with exactly two to the H minus one nodes because I like using n for the number of nodes. So it should be, it should be a high eight, a pull by a binary tree of height H. So this one's of height three, and notice that this one's height three and it has seven nodes, which is two to the third power minus one. And there's only one possible way a binary search tree can do this if it's full and has a height, vivid hat. If it's full, has exactly this many nodes and has exactly this height, there's only one way it can look, which is this way. So let me go ahead and draw some perfect trees on the board, on the actual board. Okay, so this is a, this is a full tree. It also has complete, right? It's an empty tree. This one is a full, this one is full and perfect, right? Actually, and I'm going to ignore numbers right now. This one is full and it is perfect, right? All the nodes have exactly zero or two children, and its height is 1. And there's exactly one note of it to the one minus one. This is a perfect tree of my two, right? This is a perfect tree of height 3. The perfect tree of height 4. Basically, you know, it's each of the bump, each of the cup, each of the rows are completely filled up. That's kind of the best way to visually represent purgatory. Each of the rows are completely filled up, right? It's not too bad to understand. Okay.
So complete is the kind of a terrible one, but it's not actually too bad. It's this more annoying than anything else. So a complete binary tree is a perfect binary tree up to level, up to be, it's perfect all the way through the except for the bottom level, right? It's perfect up to this point. Okay, perfect complete trees are perfect up to this point, but then they have some extra leaf nodes on the bottom, and they're all towards the left, which is what makes this such a terrible descriptor. They're all over here to the left. So this is a complete treatment because this over here, let me go ahead and I'm going to start drawing them. This is a complete tree because this level is perfect, and the level below that is not perfect, but all the nodes over here are shifted over to the left. So this is a complete tree. This is a complete tree, complete and perfect, right? This is also complete, complete, but as soon as I removed this guy over here, this guy's left child, it's not complete anymore. Why? Because we have a gap here. Okay, and the answer, so imagine we were labeling all these on all these nodes zero, one, two, three, four, five, six, seven, eight, nine, ten. So long as I can keep doing this without a gap, right, it's a complete tree. But as soon as I go over here and I would write eleven here for this left child, as soon as I get over here and add, and have to skip the number, that is no longer complete. Does that make sense, right? There's a gap here. This is a, this isn't complete. Neither is this. This one is complete though. Okay, while there's no dots here, they're all shifted to the right, so this is not a complete. They all have to be towards the left. Okay, now you may be wondering why in the world do we have such a funky definition in the first place, right? Um, and the answer is, is that it's actually kind of irrelevant for binary search trees unless you're storing them and arrays. Because what you'll do is you'll store visit index zero, one, two, three, four, five, six, seven, eight, right? You can actually figure, you can use this notation over here to figure out where all of us, where they will be stored in an array. And there's actually a formula that will learn when we go through piece different gear out what slot and B, right? You need to be storing these things on. But we're going to learn how to do that, you know, and link them with nodes instead of parades. But primarily, we're going to use this in heaps, and heaps are always going to be complete, always. It's kind of their definition. So we'll revisit this when we go over heats. So full and perfect, those are important for trees. The complete is important for heat, which is specific type of tree. Um, so I think it says is that we're not going to discuss general trees in this chapter, except we are discussing it right now. So I guess they're liars. But so the nodes of, but the point is, is that they're saying we're not figuring out how to solve these general trees, which is they're basically, yeah, you can make not just binary trees, but trinary trees and quaternary trees and whatever and airy trees you want. But we can pretty much, but we're going to show you that that doesn't matter because we can take this general tree. But the, you know, where and a tree basically, it's a tree so long as basically we have one parent. Okay, right? So here's basically like the, it's, you know, the English monarchy here, that's what we're doing here. And we're just looking, looking at basically that the Royal issue of the Royal of the Royal monarchy. Okay, so William the First, and then Henry the Tilde, Henry the Second, John, Henry Third, Edward, right? We can actually turn this into a binary tree so long as we basically care about only one parent and everyone. And you're like, no way, there's no way you do that. So here's the binary tree of that. And it's kind of just tilted. Okay, we kind of just tilted it vertically. And how did we change this? Well, if you sometimes you just have to be clever. In this case, what we did is that we said our left branch, so we're going to not use the term child over here because, you know, we're talking about people with actual children. So we're going to use left and right branch. So the left branch is the oldest actual child, right? And your right branch is your youngest sibling, right? So that's why we're using the branch term here, not the kid trip child because you want to make sure we're not. And that's also why we have branch, the branch terminology because that way we can, we can make sure we're not over, we can avoid overloading it too much. So the left branch is your oldest kid, and your right branch is your youngest sibling, right? So we'll use the first oldest kid is Robert. And bears all and to the right is all of William's kids, Robert, William the Second, Adela, and Henry, right? William and there's all his kids. So we just connected them all to each other. And if they have their own kids, those are on the left. So it's a pretty cool way to do that. So that's why I say, so you can pretty much turn these general trees into binary trees and get, you know, some log n time. But there's, uh, you know, there's some, no, I don't want to keep the said I did. So whatever. But you can do some really interesting things like learn about two, three trees, not two or three offense to three trees, right? You can do stuff where basically we have trees that not only have two children, but three children. So basically, you have two numbers in here. All the stuff that's less than a goes the left. All the stuff bigger than me is goes to the right. A is smaller than B. And if there's something in between, it goes in the center, right? So you can totally do that. And it's a two, three tree because it combines two nodes and three nodes. And then you can have two, three, four trees and all these other things like B trees, which is where it really gets exciting. Uh, you know, and that's kind of where it really gets advanced and really exciting. And, you know, even though it's 40 years old, we still teach them because they get used a lot for the for all as this basis for building a lot of other stuff. B plus tree, right? So the point here is that basically the stuff we're learning here is it is integral to understanding how basic file systems work. Integral. Okay, so all right.
So now we've got this tree up on the board. Let's go ahead and actually put an actual binary search tree on the board. The question is like, how do I like, you know, print it out or something, right? Like it's pretty easy to print out a list. I just go through the list and print it out, right? Okay, now the obvious solution, now the obvious answer pens in your head. It's like wondering, stop beating around the bush, right? You're just going to cook them in order, right? And that's completely correct. One of the ways to print out a tree is, let's go with this one. Five, seven, six, eight, three, two, right? Easy to use single-digit numbers here, right? So say we wanted to print this out. There's a couple ways to do this. There's a couple ways to basically go through all the trees in the note, sorry, although all the nodes in the tree. Okay, because that's the question we're doing. Keith, we're asking here, when we want to print out all the nodes and all values in the tree, or we want to print the tree, you want four double ones in the tree, which means we want print out all the values in tree. This is called a tree traversal, which is the second algorithm. First one was search e, which is zigging and zagging your weight of the tree. The other question is, is that oftentimes you want an algorithm that hits everything in the tree, right? So on, so often we want to describe the nodes of a tree and the relationship. And we can basically walk through the tree. So there's three different types of tree traversal: in-order, pre-order, and post-order. Addition, what possibly, I guess, terrifyingly familiar because we were talking not too long ago. And we were talking about stacks. It out in order notation, preordered notation, post order notations. That about to make a repair. Biers, I fear some, yeah, that will definitely be making a reappearance today. So there's three types of traversal. We're going to start with in-order, even though it's technically between the first two, right? Those in the middle because it's going to print stuff in order. Okay.
In-order is one of those that are wonderfully named. If you do an in-order traversal, it's going to go through everything in the sorted order of the array. So let me just go, I'm going to go through the output, and then I'll just simply, actually, let me just. So if I do an in-order traversal of this tree, I will do two, three, four, five, six, seven, eight. Okay, so because that's going to produce in order. And the way that it works is that basically, well, uh, because it's a binary search tree on everything that's less than the roof is on two left. So I'm going to do an in-order traversal of the left subtree first. Okay, and recursively do that. Then once that's done, I'll print out myself, and then I'll do a in-order traversal of my right subtree. Okay, and because that's recursive, right? Let me go. So what I will do is says, I'll do an in-order traversal of the tree starting at three. And three will go, I'll do an in-order traversal of the tree starting at two. Okay, and she will go, I'll do an in-order traversal my left, then I'll do myself, then I'll do my right. My left and my right were empty, so that was easy. Then three goes, okay, my left side is done. I'll do my self, and I'll do my right side. My right side and so my right side goes, okay, let me do my left myself and then my right side, right? And that happens recursively. Then he reports that it's finished. And five says that it's right. That makes sense, right? So it just simply goes left, myself, and right. Now, if you take a look at all these algorithms that I still have up on the screen, notice that basically the left always comes before the right. And all of these traversal, the only thing that changes is whether or not you're visiting, we're handling the root node first in between our left and right branches or after, right? You can even see that in the algorithms down here. The only thing that changes the prefix, right, of the function we call. And and one of them, what line visit the root appears on line three, line four, line five. Okay, so what's the prefix? So what's the pre-order traversal going to look like? No, I'm going to go ahead and know that you have enough time to study this. Go over here. Um, generally, but this is worth somewhere between six and eight points because not hard, but enough people, but but there's always just like one, there's like three people who do get it wrong. So it's worth keeping their it on the exam. So five, so here was what a pre-order traversal does is that we started the room. And pre-order says, I'm going to visit my cell first, then we'll do my left and not only my way. So does it myself first, then I'll visit my left. Okay, let goes, okay, let me visit myself, then visit my left and my right within two, right? Okay, then we go over here to the right side. So seven goes, let me visit myself, then I'll visit my left, right? And they do that same. So we've got the route here, then we've got our left and right trees. And then each of those guys, right? So there's kind of a structure that can be printed out here, right? Um, but basically, we've got myself and my left side, right? So there's kind of a structure that can be printed out back. The truthstream method that I already have written for my debugging my trees, what it does that an indent stuff in a nice manner. It will do something, it'll print it out in feet in order. Print out this treatise something like this. Five, and then it will indent it over here with three. And then two, four, sorry, then yeah, then to four for its children. And then it will have seven. It basically prints it out level by level where it's where the indent shows you what level that item is on. So it's not too bad.
Post-order is, well, let me, is I'll visit myself last. I'll do my left side, then my right side, then myself. So fought, so here I will go, let's do my left and my right. So then three will go, let me do my left and my right, and I'll do myself. Two, four, three, left, right, then myself. Um, so five ghosts, I do my left side, let's go might be my right side. This side goes, let me do my left, my right, and myself. Six, eight, seven. And now it does it myself, five. So we've got the left side, the right side of children, and the right children. Okay, and again, it's just, it's another way of just showing the order of the tree. So another question is like, okay, but are there any like actual applications of this? Okay, so I have to go through, let's see. I'm going to go all the way back to the, let's go all the way back, say here, too far, all the way forward. There we go. So the expression trees, I told you it come back this. Okay, so the idea here is that we can take some expression like X plus Y times A plus B over C. Okay, and we can encode that in a tree. We don't need to install operators industry. But basically, every node in the tree is an operator or an operand. Okay, right? It's a binary tree because up all the operators in you either have pretty much all the operators in in arithmetic our binary except for the positive sign in the negative sign, which just simply says, you know, make this positive, make this negative, which we could just easily encode. Um, so time, so X plus Y that, um, you know, that gets encoded here. Times A plus B divided by C. So the parentheses don't need to be stored in here because the tree structure dictates the order of operations, right? We're multiplying the result of adding of adding X and Y together, and we're adding and we're multiplying that against the result of dividing the sum of A and B divided by C. Okay, so the operators, so if you're at a higher tree level, it gets evaluated after like the lower you are in the tree, the higher precedences, right? Because we're going to need to solve you first. We're going to need to solve these first. And all and it makes sense that basically all the variables are going to be at the bottom of the tree because, um, they did, you know, if we had an operator there, that wouldn't make sense. Plus, if this was a plus, it would be plus what, right? Plus what left and what right? It wouldn't make sense there. So all the things have to be at the end. So what does this have to do with infix notation, prefix notation? Well, if we do an in-order traversal of this tree, we would be, let's see, so do my left side. So this is original expression. Let's do my left side. So X plus Y times A plus B divided by C, right? So I always, so I was just doing my left divide. If I was simply doing my left side, then myself and my right side into doing that recursively. And what that prints out is X plus Y times A plus B times plus B divided by C. Okay, and why does it do that? Because that's the order that it was stored into of the expression that's in fixed notation, right there. Ah, but it's ambiguous because we need the parenthesis to figure out what we originally wanted, right? But if we remember, if we do a post order or pre order, a post order traversal will give us the post fix notation of this. It will give us so X, Y plus A plus B. So A, B plus C divided. And then multiply those results together, right? So we could just simply throw this into a stack and it's going to work, right? And it will be completely unambiguous. The operators follow the operands. Here, we do a pre-order, we get the Polish notation, which is prefix notation. So here we treat this kind of as a function. We were multiplying plus X, Y, right? So multiplies the function call. And it's arguably car the function call plus, and it's arguments are X and Y. And the other argument for multiplied was divided. And its arguments where plus S X, Y, sorry, plus A, B. And the other argument for divide was C. So again, we don't need any parentheses here to actually being able to parse this, which is really cool, which is why again, all those Facebook fights about that stuff is should be answered with you should be using postfix notation. And you wouldn't have these arguments. We are, we pretty much we blazed through that and we still got 20 minutes. So there is a bit of stuff to do that we did cover because the other students, you know, or slightly more inquisitive. Um, but let's go ahead and see. So first off, I mentioned that we're going to be using the node class here. So what is the node going to look like? You know, what does the node look like actually? So just to make you sure you understand the structure. So here's our node. So this is the internal node. So it's private. And I'll put me here, although it's going to be E extends comparable. I only have so much space, right? Because again, these things are going to be convertible. Okay, E, I'm going to call it data this time, right? Because we store data in the tree. You could call it item if you want, you know, private E private. And then it has a, it stores the memory location of another. Now, know D, but instead of calling Nick them exit pre, we're going to call them left or my left child. And we're also going to call it right for my right shots. Private, no, E right. All right. And now you've got basically just use and then we would just simply use the same constructor where we'd taken a date, we'd take in some data as an argument and store it in. Okay, other Y. So in other words, except for left and right, it's exactly the same as the note we used in in a linked list. Well, it makes it different is so what makes actually this different is course we call it left and right. And the way that the algorithm builds stuff is going to be a bit different. So this will obviously be stuff I'll have to review when we get back because you're probably not going to remember a lot of it. And that's understandable because a week is a long time. Yeah, so they have your left, right. It'll have a two string there. But basically, the way it's going to work is that your binary tree is going to have a root. And that's all it's going to keep track of. And that note in each node will point to a left child and the right child. And each of those will point to their own left and right child's, which might may or may not be known. Okay, and so what are we going to have to figure out how to do this? Sorry, what methods do we really need? Well, they're, they have listed a bunch of methods like we'll get the left subtree. You get the right subtree of this. Honestly, that's not too necessary. But we really need our stuff like, um, adding, really, it's these things, adding something to the tree, removing something to the from the tree, and containing something in the tree. Sorry, asking if they contains them. The difference between a delete and remove is basically remove returns a boolean to say, hey, we successfully removed it. And delete return actually will return the value remove. Okay, so knows that this one returns null if you don't find anything. This one returns false if you didn't find anything to remove. Contains tells you if something's in the tree, whereas find actually returns the thing in the tree. You know, so one of the things that we will actually have to do is because Java in Java, you're limited to because you are actually limited to returning only one thing because of the way we write this, which makes it a lot easier. We're going to cheat a little. We're basically, we're going to store the values we want to return as as instance variables and then return the instance variables. So we have to do a little hackery where basically we have an outside value. We're going to store stuff in to give it back to the user because when we get into our recursive methods, our recursive methods are going to have to return on nodes to the user. I just just to make it so that it's nice and it's easy to read. But as a result, we have to just cheat a bit. But that's fine. It's not cheating if it works. But, um, we'll go over add and remove there. So let's see. Still 20 minutes left. But I didn't get into this into the evening lecture. And I don't want to get too far ahead. Let's see. Okay, so I'm probably going to actually stop the electric here because we're not like"
\section{The Parts of a Tree}
The first thing we need to do when introducing trees is define a vocabulary.
Much like the linked list, a tree is made of nodes.
However, unlike a linked list , nodes in a tree are not arranged in a line,
Instead, they are arranged in a heirachy.
Each node sit above multiple other nodes , with the nodes below it being referred to as their children or child nodes. The node connecting all these children is called the parent.
<A picture of one node, Represented by a circle with four arrows coming out below it. Each arrow points to yet another node. The Node with the arrows coming out of it is the parent, and the nodes below it are the children >
This relationship can be extended Ad infinitum as we can see with the picture below
<Picture with nodes labeled>
However anything above grandchild and grandparent just becomes tedious , so we tend to Generalize this relationship to ancestors and descendants. A key point here is to remember that while we are borrowing terms from the family tree , nodes will only have one parent . Each node can have multiple children, however .
We refer to the links connect each of the nodes as branches or links or edges. This tends to be a matter of personal preference.
Finally , we have one special node that sits above all the other nodes . This note is the root and it is analogous to the head of a linked list . All of our operations will start at the root of the node\footnote{Remember , programmers are stereotypically outdoors of averse, So they May have forgotten what a real tree looks like. Thus, we'll see that the root of the tree is at the top of the tree and our leaves are at the bottom\footnote{Or maybe it's some weird hydroponic zero-G kind of thing.}}.
Remember , programmers are stereotypically outdoors of averse, So they May have forgotten what a real tree looks like. Thus, we'll see that the root of the tree is at the top of the tree and our leaves are at the bottom\footnote{Or maybe it's some weird hydroponic zero-G kind of thing.}
\subsection{Where the Recursion comes in}
There is a reason we learned recursion before we introduce trees. Trees are the exemplar recursive data structure
Each tree has a root and That route has children . If we view each of those children as the root of their own subtree , this can make our algorithms for adding removing and searching extremely easy to write.
<picture Of tree, the recursive subtrees are dash circled.>
<Picture of the left subtree, with it's trees circled>
\section{Binary Search Trees}
A diagram of a binary search tree. It is made up of nodes, represented by circles, and edges (also called links or branches), represented by arrows.
In our diagrams, we will be using $\bigcirc$ to represent node. A $\times$ will be to represent a \texttt{null} type, eg the explicit absence of a Node. Finally, $\bigtriangleup$ represents a subtree9. This could be any size, including empty.
%\footnote{An aside about array based implementations.}
\section{Building a Binary Search Tree}
\subsection{The Code Outline}
As explained in Section \ref{sec:binarySearchJavaP2}, when we use the \texttt{Comparable} class in Java to require that all objects stored in the tree has a \textbf{total ordering}, meaning every pair of objects we're storing has an ordering.
In practice, this means that anything \texttt{Comparable} can be sorted.
Python, of course, doesn't need these restrictions.
\begin{minted}{Java}
public class BinaryTree<E extends Comparable<E>> {
}
\end{minted}
Much like our Linked List, we don't need much in the way of instance variables. We'll create a \texttt{root} to keep track of the starting place for our tree and size to keep track of how many items we have stored.
Finally, we will also create our inner \texttt{Node} class for the Tree.
It needs to hold the item and the locations of the left and right children.
We'll also go ahead and add a The constructor and a method for printing out the item in the node (\texttt{toString} in Java and \texttt{\_\_str\_\_} in Python )
\begin{javacode}{The Constructor and Inner Class}
public class BinaryTree<E extends Comparable<E>> {
private Node<E> root;
private int size;
public BinaryTree() {
this.root = null;
}
private static class Node<E extends Comparable<E>> {
private E item;
private Node<E> left; // left child
private Node<E> right; // right child
public Node(E item) {
this.item = item;
}
public String toString() {
return item.toString();
}
}
}
\end{javacode}
\subsection{Contains}
\subsection{Add}
All of our operations in out \texttt{BinaryTree} will be implemented recursively.
\subsection{Delete}