|
| 1 | +### 98-008: Intro to the Rust Programming Language |
| 2 | + |
| 3 | +# Row Lab |
| 4 | + |
| 5 | +This is the final assignment of the semester! Congrats on making it here. |
| 6 | + |
| 7 | +Your final task is to implement something similar to |
| 8 | +[The One Billion Row Challenge](https://www.morling.dev/blog/one-billion-row-challenge/). The goal |
| 9 | +here is to get familiar with parallelism in Rust, as well as put together everything you've learned |
| 10 | +over the past semester to write a program that has practical application in the real world! |
| 11 | + |
| 12 | +We are not going to give that much guidance here, since at this point you should be familiar enough |
| 13 | +with Rust as a language that you can figure out everything on your own. Of course, we'll explain |
| 14 | +enough to get you started. |
| 15 | + |
| 16 | +**The description of the original challenge can be found |
| 17 | +[here](https://www.morling.dev/blog/one-billion-row-challenge/), so give it a quick read!** |
| 18 | + |
| 19 | +The main differences between this assignment and the real challenge are that A) we are not |
| 20 | +writing Java, and B) instead of reading the data from a file / disk, we computationally generate |
| 21 | +the random data (in-memory) via an iterator. |
| 22 | + |
| 23 | +_The second difference is mainly because Gradescope does not support more than 6 GB of memory per |
| 24 | +autograder (1 billion rows is approximately 14 GB), which means the complete data cannot fit in |
| 25 | +memory. Asking you to interact with I/O while also dealing with parallelism seemed a bit too cruel |
| 26 | +for this assignment, so we modified the challenge slightly. That being said, we encourage you to |
| 27 | +take the code you write for this lab and try the real challenge yourself!_ |
| 28 | + |
| 29 | +**For this lab, you are allowed to use third-party crates!** This means you will have to also submit |
| 30 | +your `Cargo.toml` file. See the [Submission](#submission) section for more information. |
| 31 | + |
| 32 | +# Starter Code |
| 33 | + |
| 34 | +We have provided quite a lot of starter code for you to use! The two files that you should be |
| 35 | +modifying are `aggregation.rs` and `lib.rs`. You are allowed to modify `main.rs` and |
| 36 | +`measurements.rs` locally on your own computer, but the Gradescope autograder will be using the |
| 37 | +starter code for those two files. The other two files you should know about are `tests/mock.rs` as |
| 38 | +well as `benches/brc.rs`, which are explained in the next two sections. |
| 39 | + |
| 40 | +`aggregation.rs` contains our recommended helper structs and methods for aggregating the data. You |
| 41 | +are allowed to completely rewrite everything except the function signature of `aggregate` and the |
| 42 | +struct definitions for `WeatherStations` and `AggregationResults` (but you are allowed to and |
| 43 | +encouraged to change the fields of `AggregationResults`). |
| 44 | + |
| 45 | +Once you have implemented the `todo!()`s in `aggregation.rs`, you can move on to `lib.rs`. We have |
| 46 | +provided you with a naive single-threaded version of this challenge. From here, it is up to you to |
| 47 | +make things faster! See the [Benchmarking](#benchmarking-and-leaderboard) section for some hints 🦀. |
| 48 | + |
| 49 | +# Testing |
| 50 | + |
| 51 | +There are 3 integration tests located in `tests/mock.rs`. We will manually check your code for |
| 52 | +parallelism, and as long as you have integrated parallelism in some non-trivial manner, you will |
| 53 | +receive full credit if you pass the 3 tests. |
| 54 | + |
| 55 | +If you make any changes to struct definitions or function signatures, make sure that you can still |
| 56 | +compile everything with `cargo test`! |
| 57 | + |
| 58 | +# Benchmarking and Leaderboard |
| 59 | + |
| 60 | +We have set up benchmarking via [Criterion](https://bheisler.github.io/criterion.rs/book/) for you. |
| 61 | +You can run `cargo bench` to see how long (on average) your `aggregate` function takes to aggregate |
| 62 | +1 billion rows. Note that the minimum number of samples it will run is 10, so if your code is |
| 63 | +**very** slow, you might just want to run the small timing program in `main.rs` via `cargo run`. |
| 64 | + |
| 65 | +There will also be a leaderboard on Gradescope! Compete to please Ferris with the fastest time. We |
| 66 | +will give you quite a lot of extra credit if you can beat Ferris (our reference solution) by some |
| 67 | +non-trivial amount. The top leaderboard finishers might get a huge amount of points 🦀🦀🦀🦀🦀 |
| 68 | + |
| 69 | +### Optimizations |
| 70 | + |
| 71 | +There are many, many ways to speed up a program like the one you need to implement. In fact, there |
| 72 | +is a whole field dedicated to speeding up this kind of program: when you have a `GROUP BY` clause in |
| 73 | +SQL, the relational database executing the SQL query is doing almost this exact aggregation! If you |
| 74 | +are interested in this, you should take CMU's |
| 75 | +[Database Systems](https://15445.courses.cs.cmu.edu/spring2025/) course. |
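
To make the `GROUP BY` comparison concrete, here is a minimal single-threaded sketch of that kind of aggregation: one running min / max / sum / count per station, keyed by station name. The `Stats` struct and `group_by_station` function are hypothetical names made up for this illustration. They are not the starter code's `AggregationResults` or `aggregate`, and the real measurement types may look different.

```rust
use std::collections::HashMap;

/// Running statistics for one station.
/// Hypothetical type for illustration; not the starter code's struct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

impl Stats {
    /// Starts a group from its first measurement.
    fn new(value: f64) -> Self {
        Stats { min: value, max: value, sum: value, count: 1 }
    }

    /// Folds one more measurement into the group.
    fn add(&mut self, value: f64) {
        self.min = self.min.min(value);
        self.max = self.max.max(value);
        self.sum += value;
        self.count += 1;
    }

    /// The mean is derived at the end, so only the sum and count are stored.
    fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Groups measurements by station name, roughly what `GROUP BY station` does.
fn group_by_station<'a>(rows: impl Iterator<Item = (&'a str, f64)>) -> HashMap<&'a str, Stats> {
    let mut groups: HashMap<&'a str, Stats> = HashMap::new();
    for (station, value) in rows {
        groups
            .entry(station)
            .and_modify(|stats| stats.add(value))
            .or_insert_with(|| Stats::new(value));
    }
    groups
}

fn main() {
    // Made-up sample rows, assumed to be (station, temperature) pairs.
    let rows = [("Pittsburgh", 10.0), ("Oslo", -3.0), ("Pittsburgh", 20.0)];
    let groups = group_by_station(rows.into_iter());
    let pgh = groups["Pittsburgh"];
    println!("Pittsburgh: {} / {} / {}", pgh.min, pgh.mean(), pgh.max);
}
```

Storing the sum and count instead of a running mean keeps each update cheap and makes it straightforward to combine two partial results later.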
| 76 | + |
| 77 | +We won't go into detail here, but you are allowed to go online and look at all of the techniques |
| 78 | +other people have used for this challenge. You can also read the |
| 79 | +[Rust Performance Book](https://nnethercote.github.io/perf-book/) online. Just make sure not to copy |
| 80 | +and paste anyone else's code without citing them first! |
| 81 | + |
| 82 | +For this assignment, we would actually encourage you to look at the reference solution after giving |
| 83 | +a good-faith attempt at designing an algorithm yourself. Our reference solution is purposefully not |
| 84 | +very well optimized, but it does show the syntax for using parallelism in Rust. We encourage you to |
| 85 | +play around with the code! |
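
If you have not seen thread syntax in Rust yet, the following is a rough sketch of one common shape for a parallel aggregation, using only the standard library's `std::thread::scope`: each thread aggregates its own chunk into a local map, and the partial maps are merged at the end. This is not the reference solution. The `Stats` tuple, `aggregate_chunk`, and `aggregate_parallel` are hypothetical names, and the sketch assumes the rows already sit in a slice, whereas this lab generates them through an iterator.

```rust
use std::collections::HashMap;
use std::thread;

/// (min, max, sum, count) for one station.
/// Hypothetical stand-in for the starter code's result types.
type Stats = (f64, f64, f64, u64);

/// Identity element: merging it into any `Stats` leaves that `Stats` unchanged.
const EMPTY: Stats = (f64::INFINITY, f64::NEG_INFINITY, 0.0, 0);

/// Aggregates one chunk of rows into a thread-local map (no locking needed).
fn aggregate_chunk<'a>(chunk: &[(&'a str, f64)]) -> HashMap<&'a str, Stats> {
    let mut local: HashMap<&'a str, Stats> = HashMap::new();
    for &(station, value) in chunk {
        let entry = local.entry(station).or_insert(EMPTY);
        entry.0 = entry.0.min(value);
        entry.1 = entry.1.max(value);
        entry.2 += value;
        entry.3 += 1;
    }
    local
}

/// Splits `rows` into roughly equal chunks, aggregates each chunk on its own
/// thread, then merges the partial maps on the calling thread.
fn aggregate_parallel<'a>(rows: &[(&'a str, f64)], num_threads: usize) -> HashMap<&'a str, Stats> {
    let threads = num_threads.max(1);
    // Round up so every row lands in a chunk; `chunks` panics on a size of 0.
    let chunk_size = ((rows.len() + threads - 1) / threads).max(1);

    let partials: Vec<HashMap<&'a str, Stats>> = thread::scope(|scope| {
        // Spawn every worker first, then join, so the chunks run concurrently.
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || aggregate_chunk(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    });

    let mut merged: HashMap<&'a str, Stats> = HashMap::new();
    for partial in partials {
        for (station, (min, max, sum, count)) in partial {
            let entry = merged.entry(station).or_insert(EMPTY);
            entry.0 = entry.0.min(min);
            entry.1 = entry.1.max(max);
            entry.2 += sum;
            entry.3 += count;
        }
    }
    merged
}

fn main() {
    // Made-up sample rows, assumed to be (station, temperature) pairs.
    let rows = vec![("Pittsburgh", 10.0), ("Oslo", -3.0), ("Pittsburgh", 20.0), ("Oslo", 5.0)];
    let merged = aggregate_parallel(&rows, 2);
    println!("{:?}", merged.get("Oslo"));
}
```

Scoped threads can borrow `rows` directly, so no `Arc` or cloning is needed here. Third-party crates offer other ways to express the same split-then-merge idea.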
| 86 | + |
| 87 | +Note that because the original challenge involved reading from a file (interacting with I/O), not |
| 88 | +everything online will be applicable to this assignment. Still, there are a lot of cool things on |
| 89 | +the internet that you _can_ make use of. Also, be careful when trying to use SIMD: you will be |
| 90 | +graded on the Gradescope Docker containers, which may not support the same CPU instructions as |
| 90 | +your own machine. |
| 91 | + |
| 92 | +_That being said, if you really want to play around with I/O and perhaps some `unsafe`ty with system |
| 93 | +calls (like `mmap`), reach out to us! We might give permission for you to submit the real challenge |
| 94 | +if we think you are capable of it._ |
| 95 | + |
| 96 | +# Submission |
| 97 | + |
| 98 | +For this lab, you are allowed to use third-party crates! **This means that you must also submit your |
| 99 | +`Cargo.toml` file** (otherwise we wouldn't be able to compile your code). The `build.rs` build |
| 100 | +script will handle that for you. |
| 101 | + |
| 102 | +### Formatting and Style |
| 103 | + |
| 104 | +The autograder will run these two commands on your code: |
| 105 | + |
| 106 | +```sh |
| 107 | +cargo clippy && cargo fmt --all -- --check |
| 108 | +``` |
| 109 | + |
| 110 | +**If the autograder detects any errors from the commands above, you will not be able to receive** |
| 111 | +**any points.** This may seem strict, but we have decided to follow standard best practices for |
| 112 | +Rust. |
| 113 | + |
| 114 | +By following [Rust's style guidelines](https://doc.rust-lang.org/stable/style-guide/), you ensure |
| 115 | +that anybody reading your code (who is familiar with Rust) will be able to easily navigate your |
| 116 | +code. This can help with diving into an unfamiliar code base, and it also eliminates the need for |
| 117 | +debate with others over style rules, saving time and energy. |
| 118 | + |
| 119 | +See the official [guidelines](https://doc.rust-lang.org/stable/style-guide/) for more information. |
| 120 | + |
| 121 | +### Unix |
| 122 | + |
| 123 | +If you are on a Unix system, we will try to create a `handin.zip` automatically for you, |
| 124 | +**but you will need to have `zip` already installed**. |
| 125 | + |
| 126 | +If you _do not_ have `zip` installed on your system, install `zip` on your machine or use the CMU |
| 127 | +Linux SSH machines. If you need help with this, please reach out to us! |
| 128 | + |
| 129 | +Once you have `zip` installed, we will create the `handin.zip` automatically for you (_take a peek_ |
| 130 | +_into the `build.rs` file if you're interested in how this works!_). |
| 131 | + |
| 132 | +Once you have the `handin.zip` file, submit it (and only the zip) to Gradescope. |
| 133 | + |
| 134 | +### Windows |
| 135 | + |
| 136 | +If you are on a Windows system, you can zip the `src/` folder manually and upload that to |
| 137 | +Gradescope. For this lab, you also need to add the `Cargo.toml` file to that zip folder. Please |
| 138 | +reach out to us if you are unsure how to do this! |
| 139 | + |
| 140 | +Note that you don't _need_ to name it `handin.zip`; you can name it whatever you'd like. |
| 141 | + |
| 142 | +# Collaboration |
| 143 | + |
| 144 | +In general, feel free to discuss homeworks with other students! As long as you do not copy someone |
| 145 | +else's work, any communication is fair game. |
| 146 | + |
| 147 | +All formal questions should be asked on Piazza. Try to discuss on Piazza so that other students can |
| 148 | +see your questions and answers as well! |
| 149 | + |
| 150 | +You can also discuss on Discord, but try to keep any technical questions on Piazza. |
| 151 | + |
| 152 | +# Feedback |
| 153 | + |
| 154 | +We would like to reiterate that you should let us know if you spent anywhere in significant excess |
| 155 | +of an hour on this homework. |
| 156 | + |
| 157 | +In addition, Rust has a notoriously steep learning curve, so if you find yourself not understanding |
| 158 | +the concepts, you should reach out to us and let us know as well --- chances are, you're not the |
| 159 | +only one! |