Commit 7b4257d

authored Aug 25, 2024
Fix typos in Architecture chapter (#97)
1 parent ab5caf0 commit 7b4257d

11 files changed: +79 -79 lines changed
 

‎02_Architecture/01_Overview.md

+7-7
@@ -4,31 +4,31 @@ Before going beyond a basic "hello world" and implementing the first real parts
44

55
It's worth noting that we're going to focus exclusively on `x86_64` here, and some concepts are specific to this platform (the GDT, for example), while some concepts are transferable across most platforms (like higher half kernels). Some, like interrupts and interrupt handlers, are only partially transferable to other platforms.
66

7-
Similarly to the previous part, this chapter will be an high level introduction of the concept that will be explained later.
7+
Similarly to the previous part, this chapter will be a high level introduction of the concept that will be explained later.
88

99
The [Hello World](02_Hello_World.md) chapter will guide through the implementation of some basic _serial i/o_ functions to be used mostly for debugging purpose (especially with an emulator), we will see how to send characters, strings and how to read them.
1010

1111
Many modern operating systems place their kernel in the _Higher Half_ of the virtual memory space, what it is, and how to place the kernel there is explained in the [Higher Half](03_Higher_Half.md) chapter.
1212

1313
In the [GDT](04_GDT.md) chapter we will explain one of the `x86` structures used to _describe_ the memory to the CPU. Although it is a legacy structure, its usage is still required in several parts of the kernel (especially when dealing with userspace).
1414

15-
Then the chapters [Interrup Handling](05_InterruptHandling.md), [ACPI Tables](06_AcpiTables.md) and [APIC](07_APIC.md) will discuss how the `x86` cpu handle the exceptions and interrupts, and how the kernel should deal with them.
15+
Then the chapters [Interrupt Handling](05_InterruptHandling.md), [ACPI Tables](06_AcpiTables.md) and [APIC](07_APIC.md) will discuss how the `x86` cpu handle the exceptions and interrupts, and how the kernel should deal with them.
1616

1717
The [Timers](08_Timers.md) chapter will use one of the Interrupts handling routines to interrupt the kernel execution at regular intervals, this will be the ground for the implementation of the multitasking in our kernel.
1818

19-
The final three chapters of this part: [PS2 Keyboard Overview](09_Add_Keyboard_Support.md), [PS2 Keybord Interrupt Handling](10_Keyboard_Interrupt_Handling.md), [PS2 Keyboard Driver implementation](11_Keyboard_Driver_Implemenation.md) will explain how a keyboard work, what are the scancodes, how to translate them into character, and finally describe the steps to implement a basic keyboard driver.
19+
The final three chapters of this part: [PS2 Keyboard Overview](09_Add_Keyboard_Support.md), [PS2 Keyboard Interrupt Handling](10_Keyboard_Interrupt_Handling.md), [PS2 Keyboard Driver implementation](11_Keyboard_Driver_Implemenation.md) will explain how a keyboard work, what are the scancodes, how to translate them into character, and finally describe the steps to implement a basic keyboard driver.
2020

2121
## Address Spaces
2222

23-
If we've never programmed at a low level before, we'll likely only dealt with a single address space: the virtual address space the program lives in. However there are actually many other address spaces to be aware of!
23+
If we've never programmed at a low level before, we'll likely only deal with a single address space: the virtual address space the program lives in. However, there are actually many other address spaces to be aware of!
2424

2525
This brings up the idea that an address is only useful in a particular address space. Most of the time we will be using virtual addresses, which is fine because our program lives in a virtual address space, but at times we will use *physical addresses* which, as we might have guessed, deal with the physical address space.
2626

2727
These are not the same, as we'll see later on we can convert virtual addresses to physical addresses (usually the cpu will do this for us), but they are actually separate things.
2828

2929
There are also other address spaces we may encounter in osdev, like:
3030

31-
- Port I/O: Some older devices on x86 are wired up to 'ports' on the cpu, with each port being given an address. These addresses are not virtual or physical memory addresses, so we can't access them like pointers. Instead special cpu instructions are used to move in and out of this address space.
31+
- Port I/O: Some older devices on x86 are wired up to 'ports' on the cpu, with each port being given an address. These addresses are not virtual or physical memory addresses, so we can't access them like pointers. Instead, special cpu instructions are used to move in and out of this address space.
3232
- PCI Config Space: PCI has an entirely separate address space for configuring devices. This address space has a few different ways to access it.
3333

3434
Most of the time we won't have to worry about which address space to deal with: hardware will only deal with physical addresses, and the code will mostly deal with virtual addresses. As mentioned earlier we'll later look at how we use both of these so don't worry!
@@ -49,7 +49,7 @@ It's easy to be overwhelmed by the number of fields in the GDT, but most modern
4949

5050
The currently active descriptors tell the CPU what mode it is in: if a user code descriptor is loaded - it's running user-mode code. Data descriptors tell the CPU what privilege level to use when we access memory, which interacts with the user/supervisor bit in the page tables (as we'll see later).
5151

52-
If unsure where to start, we'll need a 64-bit kernel code descriptor and 64-bit kernel data descriptor at the bare mimimum.
52+
If unsure where to start, we'll need a 64-bit kernel code descriptor and 64-bit kernel data descriptor at the bare minimum.
5353

5454
## How The CPU Executes Code
5555

@@ -63,7 +63,7 @@ These things can happen at any time, and as the operating system kernel we would
6363

6464
When an unexpected event happens, the cpu will immediately stop the current code it's running and start running a special function called an *interrupt handler*. The interrupt handler is something the kernel tells the cpu about, and the function can then work out what event happened, and then take some action. The interrupt handler then tells the cpu when it's done, and then cpu goes back to executing the previously running code.
6565

66-
The interrupted code is usually never aware that an interrupt even ocurred, and should continue on as normal.
66+
The interrupted code is usually never aware that an interrupt even occurred, and should continue on as normal.
6767

6868
## Drivers
6969

‎02_Architecture/02_Hello_World.md

+7-7
@@ -1,6 +1,6 @@
11
# Hello World
22

3-
During the development of our kernel we will need to debug a lot, and checking a lot of values, but so far our kernel is not capable of doing anything, and having proper video output with scrolling, fonts etc, can take some time, so we need a quick way of getting some text out from our kernel, not necessarily on the screen.
3+
During the development of our kernel we will need to debug a lot, and checking a lot of values, but so far our kernel is not capable of doing anything, and having proper video output with scrolling, fonts etc., can take some time, so we need a quick way of getting some text out from our kernel, not necessarily on the screen.
44

55
This is where the serial logging came to an aid, we will use the serial port to output our text and numbers.
66

@@ -14,13 +14,13 @@ This will save the serial output on the file called `filename.log`, if we want t
1414

1515
## Printing to Serial
1616

17-
We will use the `inb` and `outb` instruction to communicate with the serial port. But the first thing our kernel should do is do is being able to write to serial ports. To do that we need:
17+
We will use the `inb` and `outb` instruction to communicate with the serial port. But the first thing our kernel should do is being able to write to serial ports. To do that we need:
1818

19-
* for simiplicity and readability two C functions that will make use of the inb/outb asm instructions (luckily they are asm functions so making their c version is very easy)
19+
* for simplicity and readability two C functions that will make use of the inb/outb asm instructions (luckily they are asm functions so making their c version is very easy)
2020
* initialization of serial communication
2121
* and at least an instruction to send characters and strings to the serial.
2222

23-
The first step is pretty strightforward, using inline assembly we will create two "one-line" functions for inb and outb:
23+
The first step is pretty straightforward, using inline assembly we will create two "one-line" functions for inb and outb:
2424

2525
```C
2626
extern inline unsigned char inportb (int portnum)
@@ -69,7 +69,7 @@ static int init_serial() {
6969
}
7070
```
7171

72-
Notice that usually the com1 port is mapped to address: *0x3f8*. The function above is setting just default values for serial communication. An alternative that does not require any initialization is to use the port `0xe9`, this is also know as the _debugcon_ or the _port e9 hack_ and it still use the `inportb` and `outportb` functions as they are, but is often faster because is a special port that sends data directly to the emulator console output.
72+
Notice that usually the com1 port is mapped to address: *0x3f8*. The function above is setting just default values for serial communication. An alternative that does not require any initialization is to use the port `0xe9`, this is also known as the _debugcon_ or the _port e9 hack_, and it still uses the `inportb` and `outportb` functions as they are, but is often faster because is a special port that sends data directly to the emulator console output.
7373

7474
### Sending a string
7575

@@ -105,9 +105,9 @@ As an example consider the number 1235: $1235/10=123.5$ and $1235 \mod 10=5$, r
105105
* $12/10 = 1$ and $12 \mod10 = 2$
106106
* $1/10 = 0$ and $1 \mod 10 = 1$
107107

108-
And as we can see we got all the digits in reverse order, so now the only thing we need to do is reverse the them. The implementation of this function should be now pretty straightforward, and it will be left as exercise.
108+
And as we can see we got all the digits in reverse order, so now the only thing we need to do is reverse them. The implementation of this function should be now pretty straightforward, and it will be left as exercise.
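The digit-by-digit procedure above can be sketched as follows. This is only one possible solution to the exercise, not the chapter's own implementation: the function name `u64_to_dec` and the caller-supplied buffer are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal form of `value` into `buf` as a
 * NUL-terminated string and returns the number of digits written.
 * `buf` must hold at least 21 bytes (20 digits for the largest uint64_t
 * plus the terminator). */
size_t u64_to_dec(uint64_t value, char *buf)
{
    size_t len = 0;

    /* Peel digits off with % 10 and / 10. They come out least significant
     * first, i.e. in reverse order. The do/while handles value == 0. */
    do {
        buf[len++] = (char)('0' + (value % 10));
        value /= 10;
    } while (value != 0);

    /* Reverse in place so the most significant digit comes first. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }

    buf[len] = '\0';
    return len;
}
```

The resulting string could then be handed to whatever routine sends strings to the serial port. Integer division is used throughout, so `1235 / 10` yields `123` rather than `123.5`.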
109109

110-
Printing other format like Hex or Octal is little bit different, but the base idea of getting the single number and converting it into a character is similar. The only tricky thing with the hex number is that now we have symbols for numbers between 10 and 15 that are characters, and they are before the digits symbol in the ascii map, but once that is known it is going to be just an if statement in our function.
110+
Printing other format like Hex or Octal is a little bit different, but the base idea of getting the single number and converting it into a character is similar. The only tricky thing with the hex number is that now we have symbols for numbers between 10 and 15 that are characters, and they are before the digits symbol in the ascii map, but once that is known it is going to be just an if statement in our function.
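As a rough sketch of the hex case, assuming lowercase output: the values 10 to 15 map to the letters `a` to `f`, which are not contiguous with `0` to `9` in ASCII, hence the single `if`. The names `hex_digit` and `u64_to_hex` are hypothetical and not taken from the chapter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: maps a value in 0..15 to its hex character. */
static char hex_digit(unsigned int d)
{
    if (d < 10)
        return (char)('0' + d);        /* '0'..'9' */
    return (char)('a' + (d - 10));     /* 'a'..'f' */
}

/* Writes the hex form of `value` (no "0x" prefix) into `buf` as a
 * NUL-terminated string and returns the number of digits written.
 * `buf` must hold at least 17 bytes (16 digits plus the terminator). */
size_t u64_to_hex(uint64_t value, char *buf)
{
    size_t len = 0;

    /* Each hex digit is 4 bits: mask the low nibble, then shift it out. */
    do {
        buf[len++] = hex_digit((unsigned int)(value & 0xF));
        value >>= 4;
    } while (value != 0);

    /* Digits were produced least significant first, so reverse them. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }

    buf[len] = '\0';
    return len;
}
```

Octal would follow the same pattern with a 3-bit mask and shift, and needs no branch since every digit is below 10.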
111111

112112
### Troubleshooting
113113

‎02_Architecture/04_GDT.md

+10-10
Original file line numberDiff line numberDiff line change
@@ -11,14 +11,14 @@ Most descriptors are 8 bytes wide, usually resulting in the selectors looking li
1111
- null descriptor: selector 0x0
1212
- first descriptor: selector 0x8
1313
- second descriptor: selector 0x10
14-
- third descritor: selector 0x18
14+
- third descriptor: selector 0x18
1515
- etc ...
1616

1717
There is one exception to the 8-byte-per-descriptor rule, the TSS descriptor, which is used by the `ltr` instruction to load the task register with a task state segment. It's a 16-byte wide descriptor.
1818

1919
Usually these selectors are for code (CS) and data (DS, SS), which tell the cpu where it's allowed to fetch instructions from, and what regions of memory it can read/write to. There are other selectors, for example the first entry in the GDT must be all zeroes (called the null descriptor).
2020

21-
The null selector is mainly used for edge cases, and is usually treated as 'ignore segmentation', although it can lead to #GP faults if certain instructions are issued. Its usage only occurs with more advanced parts of x86, so we'll known to look out for it.
21+
The null selector is mainly used for edge cases, and is usually treated as 'ignore segmentation', although it can lead to #GP faults if certain instructions are issued. Its usage only occurs with more advanced parts of x86, so we'll know to look out for it.
2222

2323
The code and data descriptors are what they sound like: the code descriptor tells the cpu what region of memory it can fetch instructions from, and how to interpret them. Code selectors can be either 16-bit or 32-bit, or if running in long mode 64-bit or 32-bit.
2424

@@ -48,13 +48,13 @@ The various segment registers:
4848
- _FS_: F selector, no specific purpose. Sys V ABI uses it for thread local storage.
4949
- _GS_: G selector, no specific purpose. Sys V ABI uses it for process local storage, commonly used for cpu-local storage in kernels due to `swapgs` instruction.
5050

51-
When using a selector to refer to a GDT descriptor, we'll also need to specify the ring we're trying to access. This exists for legacy reasons to solve a few edge cases that have been solved in other ways. If we will need to use these mechanisms, we'll know, otherwise the default (setting to zero) is fine.
51+
When using a selector to refer to a GDT descriptor, we'll also need to specify the ring we're trying to access. This exists for legacy reasons to solve a few edge cases that have been solved in other ways. If we need to use these mechanisms, we'll know, otherwise the default (setting to zero) is fine.
5252

5353
A _segment selector_ contains the following information:
5454

5555
* `index` bits 15-3: is the GDT selector.
5656
* `TI` bit 2: is the Table Indicator if clear it means GDT, if set it means LDT, in our case we can leave it to 0.
57-
* `RPL` bits 1 and 0: is the Requested Priivlege Level, it will be explained later.
57+
* `RPL` bits 1 and 0: is the Requested Privilege Level, it will be explained later.
5858

5959

6060
Constructing a segment selector is done like so:
@@ -69,7 +69,7 @@ selector |= ((is_ldt_selector & 0b1) << 2);
6969

7070
The `is_ldt_selector` field can be set to tell the cpu this selector references the LDT (local descriptor table) instead of the GDT. We're not interested in the LDT, so we will leave this as zero. The `target_cpu_ring` field (called RPL in the manuals), is used to handle some edge cases. This is best set to the same ring the selector refers to (if the selector is for ring 0, set this to 0, if the selector is for ring 3, set this to 3).
7171

72-
It's worth noting that in the early stages of the kernel we only be using the GDT and kernel selectors, meaning these fields are zero. Therefore this calculation is not necessary, we can simply use the byte offset into the GDT as the selector.
72+
It's worth noting that in the early stages of the kernel we only be using the GDT and kernel selectors, meaning these fields are zero. Therefore, this calculation is not necessary, we can simply use the byte offset into the GDT as the selector.
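To make the bit layout concrete, here is a minimal sketch of the full calculation wrapped in a function. The parameter names `is_ldt_selector` and `target_cpu_ring` follow the ones used above; the function name `make_selector` and the `gdt_index` parameter are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: builds a 16-bit segment selector.
 *   bits 15-3: index of the descriptor in the table
 *   bit  2   : table indicator (0 = GDT, 1 = LDT)
 *   bits 1-0 : requested privilege level (ring) */
uint16_t make_selector(uint16_t gdt_index,
                       unsigned int is_ldt_selector,
                       unsigned int target_cpu_ring)
{
    uint16_t selector = (uint16_t)((gdt_index & 0x1FFFu) << 3);
    selector |= (uint16_t)((is_ldt_selector & 0x1u) << 2);
    selector |= (uint16_t)(target_cpu_ring & 0x3u);
    return selector;
}
```

With both low fields at zero, `make_selector(1, 0, 0)` gives `0x08` and `make_selector(2, 0, 0)` gives `0x10`, which is just the byte offset of an 8-byte descriptor. A ring 3 selector for the third descriptor, `make_selector(3, 0, 3)`, would be `0x1B` rather than `0x18`.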
7373

7474
This is also the first mention of the LDT (local descriptor table). The LDT uses the same structure as the GDT, but is loaded into a separate register. The idea being that the GDT would hold system descriptors, and the LDT would hold process-specific descriptors. This tied in with the hardware task switching that existed in protected mode. The LDT still exists in long mode, but should be considered deprecated by paging.
7575

@@ -92,7 +92,7 @@ When a descriptor is loaded into the appropriate segment register, it creates a
9292

9393
The idea is to place code in one region of memory, and then create a descriptor with a base and limit that only expose that region of memory to the cpu. Any attempts to fetch instructions from outside that region will result in a #GP fault being triggered, and the kernel will intervene.
9494

95-
Accessing memory inside a segment is done relative to its base. Lets say we have a segment with a base of `0x1000`,
95+
Accessing memory inside a segment is done relative to its base. Let's say we have a segment with a base of `0x1000`,
9696
and some data in memory at address `0x1100`.
9797
The data would be accessed at address `0x100` (assuming the segment is the active DS), as addresses are translated as `segment_base + offset`. In this case the segment base is `0x1000`, and the offset is `0x100`.
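The translation can be modelled in a few lines of C. This is a simplified illustration of what the hardware does, not code a kernel would run: the `struct segment` type and `segment_translate` function are hypothetical, and the limit is assumed to be byte-granular and to name the last valid offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the base and limit held by a loaded segment. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* last valid offset inside the segment */
};

/* Computes base + offset into *linear. Returns false when the offset lies
 * outside the segment, which is where the real cpu would raise a fault
 * such as #GP instead of completing the access. */
bool segment_translate(const struct segment *seg, uint32_t offset,
                       uint32_t *linear)
{
    if (offset > seg->limit)
        return false;

    *linear = seg->base + offset;
    return true;
}
```

For a segment with base `0x1000` and limit `0xFFF`, offset `0x100` translates to `0x1100`, while offset `0x1000` is rejected because it falls past the limit.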
9898

@@ -115,7 +115,7 @@ mov $0x10, %ax
115115
mov %ax, %ss
116116
```
117117

118-
Changing CS (code segment) is a little trickier, as it can't be written to directly, instead it requires a far jump. Or in this case, a far return which performs the same job, it just get its values from the stack instead of from immediate operands.
118+
Changing CS (code segment) is a little trickier, as it can't be written to directly, instead it requires a far jump. Or in this case, a far return which performs the same job, it just gets its values from the stack instead of from immediate operands.
119119

120120
```x86asm
121121
reload_cs:
@@ -163,14 +163,14 @@ These are further distinguished with the `type` field, as outlined below.
163163
| 55 | 1 | Granularity: if set, limit is interpreted as 0x1000 sized chunks, otherwise as bytes |
164164
| 56 | 8 | Base address bits 31:24 |
165165

166-
For system-type descriptors, it's best to consult the manual, the Intel SDM volume 3A chapter 3.5 has the relevent details.
166+
For system-type descriptors, it's best to consult the manual, the Intel SDM volume 3A chapter 3.5 has the relevant details.
167167

168168
The _Selector Type_ is a multibit field, for non-system descriptor types, the MSB (bit 3) is set for code descriptors, and cleared for data descriptors.
169169
The LSB (bit 0) is a flag for the cpu to communicate to the OS that the descriptor has been accessed in someway, but this feature is mostly abandoned, and should not be used.
170170

171171
For a data selector, the remaining two bits are: expand-down (bit 2) - causes the limit to grow downwards, instead of up. Useful for stack selectors. Write-allow (bit 1), allows writing to this region of memory. Region is read-only if cleared.
172172

173-
For a code selector, the remaining bits are: Conforming (bit 2) - a tricky subject to explain. Allow user code to run with kernel selectors under certain circumstances, best left cleared. Read-allow (bit 1), allows for read-only access to code for accessing constants stored near instructions. Otherwise code cannot be read as data, only for instruction fetches.
173+
For a code selector, the remaining bits are: Conforming (bit 2) - a tricky subject to explain. Allow user code to run with kernel selectors under certain circumstances, best left cleared. Read-allow (bit 1), allows for read-only access to code for accessing constants stored near instructions. Otherwise, code cannot be read as data, only for instruction fetches.
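The four bits of the type field for non-system descriptors can be written down as named constants. This is a sketch only: the `SEG_TYPE_*` names and the two helper functions are assumptions, not identifiers from the chapter or from any manual.

```c
#include <stdint.h>

/* Hypothetical names for the 4-bit type field of code/data descriptors. */
enum {
    SEG_TYPE_ACCESSED = 1 << 0, /* set by the cpu on access, mostly unused */
    SEG_TYPE_RW       = 1 << 1, /* data: write-allow, code: read-allow */
    SEG_TYPE_DC       = 1 << 2, /* data: expand-down, code: conforming */
    SEG_TYPE_CODE     = 1 << 3  /* set for code, clear for data */
};

/* Type field for a code descriptor. */
static inline uint8_t code_segment_type(int readable, int conforming)
{
    uint8_t type = SEG_TYPE_CODE;
    if (readable)
        type |= SEG_TYPE_RW;
    if (conforming)
        type |= SEG_TYPE_DC;
    return type;
}

/* Type field for a data descriptor. */
static inline uint8_t data_segment_type(int writable, int expand_down)
{
    uint8_t type = 0;
    if (writable)
        type |= SEG_TYPE_RW;
    if (expand_down)
        type |= SEG_TYPE_DC;
    return type;
}
```

A readable, non-conforming code descriptor with the accessed bit already set, `code_segment_type(1, 0) | SEG_TYPE_ACCESSED`, works out to `0b1011`, the value this chapter uses for the kernel code descriptor.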
174174

175175
## Using the GDT
176176

@@ -205,7 +205,7 @@ For the type field we used the magic value `0b1011`. Bits 0/1/2 are the accessed
205205

206206
All the flags we've been setting are actually in the *upper* 32-bits of the descriptor, so we left shift by 32 bits before we place the descriptor in the GDT. The lower 32-bits of the descriptor are the limit and part of the offset fields, which are ignored in long mode.
207207

208-
For the kernel data selector we'd doing something similar:
208+
For the kernel data selector we'd do something similar:
209209

210210
```c
211211
uint64_t kernel_data = 0;
