This repository was archived by the owner on Mar 8, 2024. It is now read-only.
# Lab 1
### 1. At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?
Looking through the `obj/boot/boot.asm` file, the processor starts executing 32-bit code immediately after the instruction at line 77 (`ljmp $PROT_MODE_CSEG, $protcseg`). This line is a long jump to the label `protcseg`, which is located at address `00007c32` as shown in line 81.
The switch from 16- to 32-bit mode is set up by the code in lines 65-73: it loads Control Register 0 (`CR0`) into `%eax`, the line `orl $CR0_PE_ON, %eax` sets the Protection Enable bit, and the result is written back to `CR0`. Setting this bit enables protected mode, but the CPU keeps executing with the old 16-bit code segment until the far jump in line 77 reloads `%cs` with a 32-bit code segment.
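The bit manipulation behind the switch can be sketched in C. This is an illustration only: the real boot loader does this in assembly on the actual `%cr0` register, and the helper name `cr0_with_protection_enabled` is hypothetical. The value of `CR0_PE_ON` (bit 0) is assumed to match the constant in `boot/boot.S`.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of CR0 is the Protection Enable flag (assumed equal to CR0_PE_ON). */
#define CR0_PE_ON 0x1u

/* Hypothetical helper: models `orl $CR0_PE_ON, %eax` on a copy of CR0.
 * It only computes the new value; writing it back to the real register
 * is done in assembly and is not modelled here. */
uint32_t cr0_with_protection_enabled(uint32_t cr0)
{
    uint32_t updated = cr0 | CR0_PE_ON;
    assert(updated & CR0_PE_ON); /* PE is set afterwards, other bits untouched */
    return updated;
}
```

All other bits of the register value are preserved, which is why the boot loader reads `CR0` first instead of writing a constant.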
### 2. What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?
#### Last:
In `boot/main.c`, the last instruction is:
```
((void (*)(void)) (ELFHDR->e_entry))();
```
In `obj/boot/boot.asm`, the last instruction can be found on line 305:
```
7d61: ff 15 18 00 01 00 call *0x10018
```
#### First:
In `kern/entry.S`, the first instruction of the kernel, on line 44, is:
```
movw $0x1234,0x472 # warm boot
```
In `obj/kern/kernel.asm`, the first instruction is on line 20:
```
f010000c: 66 c7 05 72 04 00 00 movw $0x1234,0x472
```
### 3. Where is the first instruction of the kernel?
The first instruction of the kernel is at the `_start` entry point, which is defined in `kern/entry.S`. We can find its address in gdb using `info function _start`:
```
>>> info function _start
All functions matching regular expression "_start":
Non-debugging symbols:
0x0010000c _start
```
The address is `0x0010000c`.
### 4. How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?
The boot loader decides how many sectors to read from the program headers of the kernel's ELF image. Each program header records the segment's offset in the file, its destination address, and two sizes: `filesz`, the size of the segment in the file, and `memsz`, the size of the segment in memory after it is loaded.
The boot loader first reads the beginning of the kernel image to get the ELF header, which tells it where the program header table is. The program headers specify which parts of the ELF object to load into memory and the destination address each should occupy; the parts that need to be loaded are those marked `LOAD`.
For each such segment, the boot loader converts the segment's file offset and size into a range of disk sectors and reads those sectors to the destination address. The total number of sectors read therefore follows from the sizes recorded in the program headers.
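The byte-range-to-sector conversion can be sketched as follows. This is a minimal illustration, not the code from `boot/main.c`: the helper name `sectors_spanned` is hypothetical, and a 512-byte sector size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512u /* disk sector size assumed by this sketch */

/* Hypothetical helper: number of whole sectors that must be read so that
 * `size` bytes starting at byte `offset` of the kernel image are covered. */
uint32_t sectors_spanned(uint32_t offset, uint32_t size)
{
    if (size == 0)
        return 0;
    uint32_t first = offset / SECTSIZE;             /* first sector touched */
    uint32_t last = (offset + size - 1) / SECTSIZE; /* last sector touched */
    assert(last >= first);                          /* guards against wraparound */
    return last - first + 1;
}
```

For example, a 24-byte range starting at offset 500 straddles a sector boundary, so two sectors are needed even though the range is much smaller than one sector.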
### 5. Where do the pointer addresses in lines 1 and 6 come from?
#### Line 1 of the output:
- `a = 0x16b927578`: This is the address of the array `a` on the stack. When you declare an array in a function, the compiler allocates space for it on the stack, and this address represents the beginning of that space.
- `b = 0x6000000bc020`: This is the address returned by the `malloc` function which allocates 16 bytes of memory on the heap. `b` points to this allocated memory.
- `c = 0xa4732d6784f006a`: This is an uninitialized pointer, so it contains a garbage value, which in this case happens to be `0xa4732d6784f006a`.
#### Line 6 of the output:
- `a = 0x16b927578`: The address of `a` hasn’t changed from line 1, as expected.
- `b = 0x16b92757c`: `b` now points to the second element of array `a`. Since integers take up 4 bytes, adding 1 to an int pointer advances it by 4 bytes, hence `0x16b927578 + 4 = 0x16b92757c`.
- `c = 0x16b927579`: `c` now points one byte past the beginning of array `a`. The assignment `c = (int *) ((char *) a + 1);` first casts the int pointer `a` to a char pointer (which works byte-by-byte) and then adds 1 to it, moving the pointer by exactly one byte.
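The difference between the two kinds of pointer arithmetic in line 6 can be checked with a small sketch. The function names are hypothetical; the results are measured in bytes so that the misaligned `int *` itself never has to be created.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance from the start of `a` to `(int *) a + 1`. */
size_t int_step_bytes(int *a)
{
    assert(a != NULL);
    int *b = a + 1;               /* advances by one whole int */
    return (size_t)((char *) b - (char *) a);
}

/* Byte distance from the start of `a` to `(char *) a + 1`. */
size_t char_step_bytes(int *a)
{
    assert(a != NULL);
    char *c = (char *) a + 1;     /* advances by exactly one byte */
    return (size_t)(c - (char *) a);
}
```

On a platform with 4-byte `int`, the first function returns 4 and the second returns 1, matching the `...57c` and `...579` addresses above.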
### 6. How do we get all the values in lines 2 through 4?
2: The pointer `c` is set to the value of the array `a`, making both `a` and `c` reference the same array. A for loop is used to set initial values in the array, then the first value of the array is set using `c[0] = 200`.
3: The second value of the array is set using `c[1] = 300`. Then the pointer `c` (which points to the first value of the array) is offset by two and dereferenced to set the third value of the array (`*(c + 2) = 301;`). Finally, the less common but valid form `3[c] = 302` is used to set the fourth value of the array. It is equivalent to `c[3] = 302`.
4: The pointer `c` (previously pointing to the first value in the array) is incremented to point to the second value, then `*c` is set to `400`, making the second value in the array `400`.
### 7. Why are the values printed in line 5 seemingly corrupted?
The pointer `c` is temporarily cast to `char *`, incremented by one byte, and then cast back to `int *`. This shifts it by one byte, so `c` now points to an address inside `a[1]` rather than at the start of an `int`. When `*c = 500` is executed, the four bytes of `500` are written starting from that in-between position. This is undefined behavior in C; in practice it overwrites the upper three bytes of `a[1]` and the lowest byte of `a[2]`, because integers are stored least significant byte first.
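The effect can be reproduced without undefined behavior by modelling memory as a byte array. This is a sketch under assumptions: the helper names are hypothetical, the array contents before line 5 are taken to be `{200, 400, 301, 302}` as described in the previous answer, and bytes are laid out little-endian explicitly so the result does not depend on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value at byte offset `off`, least significant byte first. */
void store_le32(uint8_t *mem, size_t off, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        mem[off + i] = (uint8_t)(v >> (8 * i));
}

/* Load a 32-bit value from byte offset `off`, least significant byte first. */
uint32_t load_le32(const uint8_t *mem, size_t off)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t) mem[off + i] << (8 * i);
    return v;
}

/* Models the array just before line 5, then the misaligned store of 500
 * one byte past the start of a[1]. The resulting values go in out[4]. */
void simulate_line5(uint32_t out[4])
{
    static const uint32_t before[4] = {200, 400, 301, 302};
    uint8_t mem[16];
    for (size_t i = 0; i < 4; i++)
        store_le32(mem, 4 * i, before[i]);
    assert(load_le32(mem, 4) == 400); /* sanity check on the model */
    store_le32(mem, 4 * 1 + 1, 500);  /* byte offset 5: inside a[1] */
    for (size_t i = 0; i < 4; i++)
        out[i] = load_le32(mem, 4 * i);
}
```

In this model `a[1]` becomes `0x0001f490` (128144) and `a[2]` becomes `0x00000100` (256), while `a[0]` and `a[3]` are untouched.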
### 8. Why are they different?
They are different because, between the two breakpoints, the kernel is loaded into memory.
### 9. What is there at the second breakpoint? (You do not really need to use QEMU to answer this question.)
The kernel loaded into physical memory:
```
>>> x/8w 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x0000b812 0x220f0011 0xc0200fd8
```
### 10. What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren’t in place? Comment out the movl %eax %cr0 in `kern/entry.S`, trace into it, and see if you were right.
If the mapping weren’t in place (for example, because paging was never enabled), the jump to the relocated code would treat `0xf010002c` as a physical address. No RAM exists at that address, so the instruction below cannot be fetched and qemu crashes.
#### First Instruction
```
movl $0x0,%ebp # nuke frame pointer
```
This can be validated by commenting out `movl %eax, %cr0` in `kern/entry.S`. With the line commented out, qemu crashes and we get the following output,
```
[streedan@os2 (lab1) ~/jos-labs-nilsstreedain$] make qemu-nox
***
*** Use Ctrl-a x to exit qemu
***
qemu-system-i386 -nographic -drive file=obj/kern/kernel.img,index=0,media=disk,format=raw -serial mon:stdio -gdb tcp::28507 -D qemu.log
qemu: fatal: Trying to execute code outside RAM or ROM at 0xf010002c
```
showing a failure to execute code at the address 0xf010002c.
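The translation the mapping would have provided can be sketched as a simple subtraction. This assumes the early page directory maps the 4 MB starting at `KERNBASE` (`0xF0000000`) onto physical addresses starting at 0; the helper name `high_va_to_pa` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0xF0000000u             /* start of the kernel's high mapping */
#define MAPPED_BYTES (4u * 1024u * 1024u) /* assumed size of the early mapping */

/* Hypothetical helper: physical address behind a high kernel virtual address
 * once the mapping [KERNBASE, KERNBASE + 4 MB) -> [0, 4 MB) is active. */
uint32_t high_va_to_pa(uint32_t va)
{
    assert(va >= KERNBASE && va - KERNBASE < MAPPED_BYTES);
    return va - KERNBASE;
}
```

With the mapping active, `0xf010002c` resolves to physical `0x0010002c`, which lies inside the loaded kernel; without it, the CPU tries to fetch from `0xf010002c` directly.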
### 11. Explain the interface between `kern/printf.c` and `kern/console.c`. Specifically, what function does `kern/console.c` export? How is this function used by `kern/printf.c`?
The interface between `kern/printf.c` and `kern/console.c` is the `cputchar` function exported by `kern/console.c`. This function handles character output at a low level: it places each character on the output destinations, namely the serial and parallel ports and the CGA buffer, which is what ultimately gets displayed. When `kern/printf.c` needs to output a character, it calls `cputchar` from `putch(int ch, int *cnt)`.
### 12. Explain the following from `kern/console.c`:
```c
if (crt_pos >= CRT_SIZE) {
    int i;
    memcpy(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
    for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
        crt_buf[i] = 0x0700 | ' ';
    crt_pos -= CRT_COLS;
}
```
This code block is responsible for implementing scrolling behavior in the console. When the cursor position (`crt_pos`) reaches the maximum size of the console buffer (`CRT_SIZE`), it shifts the contents of the buffer up by one row, discarding the top row, and fills the newly created bottom row with blank spaces, allowing text to scroll upwards when the console is full.
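The same scrolling logic can be tried on a small buffer outside the kernel. This is a sketch, not the code from `kern/console.c`: the function name `scroll_if_full` and its parameters are hypothetical, and it uses `memmove` because the source and destination regions overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scroll a text buffer of rows * cols cells up by one row if `pos` has run
 * off the end, and return the adjusted cursor position. */
size_t scroll_if_full(uint16_t *buf, size_t rows, size_t cols, size_t pos)
{
    size_t size = rows * cols;
    assert(rows > 0 && cols > 0);
    if (pos >= size) {
        /* Move rows 1..rows-1 up into rows 0..rows-2. */
        memmove(buf, buf + cols, (size - cols) * sizeof(uint16_t));
        /* Blank the last row: attribute 0x07 in the high byte, ' ' in the low. */
        for (size_t i = size - cols; i < size; i++)
            buf[i] = 0x0700 | ' ';
        pos -= cols; /* cursor moves to the start of the new blank row */
    }
    return pos;
}
```

With a 3x2 buffer holding `a b / c d / e f` and the cursor at position 6, the call leaves `c d / e f / (blank)` and returns 4.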
### 13. In the call to `cprintf()`, to what does `fmt` point? To what does `ap` point?
In the call to `cprintf()`, `fmt` points to the format string `"x %d, y %x, z %d\n"`, whereas `ap` points to the first variable argument on the stack (the value of `x`), which is followed by the values of `y` and `z`.
### 14. List (in order of execution) each call to `cons_putc`, `va_arg`, and `vcprintf`. For `cons_putc`, list its argument as well. For `va_arg`, list what `ap` points to before and after the call. For `vcprintf` list the values of its two arguments.
```
cprintf (fmt=0xf0101c2e) fmt="x %d, y %x, z %d\n"
vcprintf (fmt=0xf0101c2e, ap=0xf010ff74) fmt="x %d, y %x, z %d\n", ap="\001"
cons_putc (c=120) c='x'
cons_putc (c=32) c=' '
va_arg(*ap, int) Before="\001", After="\003"
cons_putc (c=49) c='1'
cons_putc (c=44) c=','
cons_putc (c=32) c=' '
cons_putc (c=121) c='y'
cons_putc (c=32) c=' '
va_arg(*ap, int) Before="\003", After="\004"
cons_putc (c=51) c='3'
cons_putc (c=44) c=','
cons_putc (c=32) c=' '
cons_putc (c=122) c='z'
cons_putc (c=32) c=' '
va_arg(*ap, int) Before="\004", After="T\034\020?\214_\021??\027\020??_\021??\027\020?_\021?_\021?"
cons_putc (c=52) c='4'
cons_putc (c=10) c='\n'
```
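The pattern in this trace (ordinary characters are emitted one at a time, and each conversion pulls one argument with `va_arg`) can be sketched with a tiny formatter. This is a hypothetical stand-in for the kernel's formatting code, not a copy of it: `mini_format` handles only `%d` and `%x` and writes into a caller-supplied buffer instead of calling `cons_putc`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies `fmt` into `out`, expanding %d and %x. Each conversion consumes
 * one int through va_arg. Returns the number of va_arg calls made. */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    int consumed = 0;

    assert(cap > 0);
    va_start(ap, fmt); /* ap now refers to the first unnamed argument */
    for (const char *p = fmt; *p != '\0' && len + 1 < cap; p++) {
        if (p[0] == '%' && (p[1] == 'd' || p[1] == 'x')) {
            int value = va_arg(ap, int); /* fetch, then advance ap */
            int n = (p[1] == 'd')
                ? snprintf(out + len, cap - len, "%d", value)
                : snprintf(out + len, cap - len, "%x", (unsigned int) value);
            if (n < 0)
                break;
            /* snprintf reports the untruncated length; clamp to what fit. */
            len += (size_t) n < cap - len ? (size_t) n : cap - len - 1;
            consumed++;
            p++; /* skip the conversion letter */
        } else {
            out[len++] = *p; /* ordinary character: one "putc" */
        }
    }
    va_end(ap);
    out[len] = '\0';
    return consumed;
}
```

Calling `mini_format(buf, sizeof buf, "x %d, y %x, z %d\n", 1, 3, 4)` makes three `va_arg` calls and produces `x 1, y 3, z 4` followed by a newline, in the same order as the trace.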
### 15. What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise.
#### Output:
```
He110 World
```
If we convert `57616` to hex, it is `0xE110`, which `%x` prints as `e110`. Next, `i` is an unsigned int set to `0x00646c72`, which x86 stores in memory as the byte sequence `0x72`, `0x6c`, `0x64`, `0x00`. Treated as an array of characters, it reads `rld\0` (NUL character at the end). Therefore, if we combine it with the format string (`H%x Wo%s`), replacing `%x` with `e110` and `%s` with `rld`, we get `He110 World`.
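The derivation can be reproduced with standard `snprintf`. This is a sketch with a hypothetical function name, `format_greeting`; it lays out the bytes of `0x00646c72` explicitly, lowest address first, to model little-endian storage regardless of the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the text that the format "H%x Wo%s" produces for 57616 and the
 * little-endian bytes of 0x00646c72. Returns snprintf's result. */
int format_greeting(char *out, size_t cap)
{
    const uint32_t i = 0x00646c72;
    char bytes[4];
    for (int k = 0; k < 4; k++)
        bytes[k] = (char)((i >> (8 * k)) & 0xff); /* 'r', 'l', 'd', '\0' */
    assert(bytes[3] == '\0'); /* %s relies on this terminator */
    return snprintf(out, cap, "H%x Wo%s", 57616u, bytes);
}
```

The zero top byte of `i` is what terminates the string; without it, `%s` would keep reading past the integer.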
### 16. The output depends on the fact that the x86 is little-endian. If the x86 were instead big-endian what would you set `i` to in order to yield the same output? Would you need to change 57616 to a different value?
`57616` does not need to be changed: `%x` prints the numeric value of the integer, which is the same regardless of how its bytes are ordered in memory. However, on a hypothetical big-endian x86 the most significant byte would be stored at the lowest memory address, so to get the same output you would reverse the byte order of `i` and use `0x726c6400`.
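The big-endian value of `i` follows from reversing the four bytes of `0x00646c72`. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap_bytes32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Value of `i` whose bytes read 'r', 'l', 'd', '\0' from the lowest
 * address upward on a big-endian machine. */
uint32_t big_endian_i(void)
{
    uint32_t v = swap_bytes32(0x00646c72u);
    assert((v >> 24) == 'r'); /* 'r' is now the most significant byte */
    return v;
}
```

The swapped value is `0x726c6400`: `0x72` (`r`) comes first and the terminating `0x00` comes last.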
### 17. In the following code, what is going to be printed after `y=`? (Note: the answer is not a specific value.) Why does this happen?
`cprintf` will read the location on the stack where it expects the second argument to be. Since only one argument was passed, it prints whatever 4-byte value happens to sit just above that argument on the stack.
### 18. How would you have to change `cprintf` or its interface so that it would still be possible to pass it a variable number of arguments?
You could pass an additional argument that gives the number of arguments in the call, so `cprintf` knows how many values to read from the stack.
### 19. How many 32-bit words does each recursive nesting level of `test_backtrace` push on the stack, and what are those words?
Each nesting level of `test_backtrace` pushes eight 32-bit words onto the stack: the return address, the saved values of `ebp` and `ebx`, the value of `x` passed to the next recursive call, and four unused/reserved slots.