rust-stuco
diff --git a/.gitignore b/.gitignore
Lines changed: 0 additions & 1 deletion
diff --git a/week12/rowlab/Cargo.toml b/week12/rowlab/Cargo.toml
Lines changed: 18 additions & 0 deletions
diff --git a/week12/rowlab/README.md b/week12/rowlab/README.md
Lines changed: 159 additions & 0 deletions
diff --git a/week12/rowlab/benches/brc.rs b/week12/rowlab/benches/brc.rs
Lines changed: 25 additions & 0 deletions
diff --git a/week12/rowlab/build.rs b/week12/rowlab/build.rs
Lines changed: 15 additions & 0 deletions
diff --git a/week12/rowlab/src/aggregation.rs b/week12/rowlab/src/aggregation.rs
Lines changed: 131 additions & 0 deletions
diff --git a/week12/rowlab/src/lib.rs b/week12/rowlab/src/lib.rs
Lines changed: 35 additions & 0 deletions
@@ -13,7 +13,6 @@ Cargo.lock
 results/
 submission/
 
-
 # These are backup files generated by rustfmt
 **/*.rs.bk
 
 
@@ -0,0 +1,18 @@
+[package]
+name = "rowlab"
+version = "0.1.0"
+edition = "2024"
+default-run = "rowlab"
+
+[dependencies]
+itertools = "0.14.0"
+rand = "0.9.0"
+rand_distr = "0.5.1"
+regex = "1.11.1"
+
+[dev-dependencies]
+criterion = "0.5"
+
+[[bench]]
+name = "brc"
+harness = false
@@ -0,0 +1,159 @@
+### 98-008: Intro to the Rust Programming Language
+
+# Row Lab
+
+This is the final assignment of the semester! Congrats on making it here.
+
+Your final task is to implement something similar to
+[The One Billion Row Challenge](https://www.morling.dev/blog/one-billion-row-challenge/). The goal
+here is to get familiar with parallelism in Rust, as well as put together everything you've learned
+over the past semester to write a program that has practical application in the real world!
+
+We are not going to give that much guidance here, since at this point you should be familiar enough
+with Rust as a language that you can figure out everything on your own. Of course, we'll explain
+enough to get you started.
+
+**The description of the original challenge can be found
+[here](https://www.morling.dev/blog/one-billion-row-challenge/), so give it a quick read!**
+
+There are two main differences between this assignment and the real challenge: A) we are not
+writing Java, and B) instead of reading the data from a file on disk, we generate the random data
+in memory via an iterator.
+
+_The second difference is mainly because Gradescope does not support more than 6 GB of memory per
+autograder (1 billion rows is approximately 14 GB), which means the complete data cannot fit in
+memory. Asking you to interact with I/O while also dealing with parallelism seemed a bit too cruel
+for this assignment, so we modified the challenge slightly. That being said, we encourage you to
+take the code you write for this lab and try the real challenge yourself!_
+
+**For this lab, you are allowed to use third-party crates!** This means you will have to also submit
+your `Cargo.toml` file. See the [Submission](#submission) section for more information.
+
+# Starter Code
+
+We have provided quite a lot of starter code for you to use! The two files that you should be
+modifying are `aggregation.rs` and `lib.rs`. You are allowed to modify `main.rs` and
+`measurements.rs` locally on your own computer, but the Gradescope autograder will be using the
+starter code for those two files. The other two files you should know about are `tests/mock.rs` as
+well as `benches/brc.rs`, which are explained in the next two sections.
+
+`aggregation.rs` contains our recommended helper structs and methods for aggregating the data. You
+are allowed to completely rewrite everything except the function signature of `aggregate` and the
+struct definitions for `WeatherStations` and `AggregationResults` (though you are allowed, and
+encouraged, to change the fields of `AggregationResults`).
+
+Once you have implemented the `todo!()`s in `aggregation.rs`, you can move on to `lib.rs`. We have
+provided you with a naive single-threaded version of this challenge. From here, it is up to you to
+make things faster! See the [Benchmarking](#benchmarking-and-leaderboard) section for some hints 🦀.
+
+# Testing
+
+There are 3 integration tests located in `tests/mock.rs`. We will manually check your code for
+parallelism, and as long as you have integrated parallelism in some non-trivial manner, you will
+receive full credit if you pass the 3 tests.
+
+If you make any changes to struct definitions or function signatures, make sure that you can still
+compile everything with `cargo test`!
+
+# Benchmarking and Leaderboard
+
+We have set up benchmarking via [Criterion](https://bheisler.github.io/criterion.rs/book/) for you.
+You can run `cargo bench` to see how long (on average) your `aggregate` function takes to aggregate
+1 billion rows. Note that the minimum number of samples it will run is 10, so if your code is
+**very** slow, you might just want to run the small timing program in `main.rs` via `cargo run`.
+
+There will also be a leaderboard on Gradescope! Compete to please Ferris with the fastest time. We
+will give you quite a lot of extra credit if you can beat Ferris (our reference solution) by some
+non-trivial amount. The top leaderboard finishers might get a huge amount of points 🦀🦀🦀🦀🦀
+
+### Optimizations
+
+There are many, many ways to speed up a program like the one you need to implement. In fact, there
+is a whole field dedicated to speeding up this kind of program: when you have a `GROUP BY` clause in
+SQL, the relational database executing the SQL query is doing almost this exact aggregation! If you
+are interested in this, you should take CMU's
+[Database Systems](https://15445.courses.cs.cmu.edu/spring2025/) course.
+
+We won't go into detail here, but you are allowed to go online and look at all of the techniques
+other people have used for this challenge. You can also read the
+[Rust Performance Book](https://nnethercote.github.io/perf-book/) online. Just make sure not to copy
+and paste anyone else's code without citing them first!
+
+For this assignment, we would actually encourage you to look at the reference solution after giving
+a good-faith attempt at designing an algorithm yourself. Our reference solution is purposefully not
+very well optimized, but it does show the syntax for using parallelism in Rust. We encourage you to
+play around with the code!
+
+Note that because the original challenge involved reading from a file (interacting with I/O), not
+everything online will be applicable to this assignment. Still, there are a lot of cool things on
+the internet that you _can_ make use of. Also, be careful with SIMD: you will be graded in the
+Gradescope Docker containers, whose CPUs may not support the same instruction sets as yours.
+
+_That being said, if you really want to play around with I/O and perhaps some `unsafe`ty with system
+calls (like `mmap`), reach out to us! We might give permission for you to submit the real challenge
+if we think you are capable of it._
+
+# Submission
+
+For this lab, you are allowed to use third-party crates! **This means that you must also submit your
+`Cargo.toml` file** (otherwise we wouldn't be able to compile your code). On Unix systems, the
+`build.rs` build script will handle this for you.
+
+### Formatting and Style
+
+The autograder will run these two commands on your code:
+
+```sh
+cargo clippy && cargo fmt --all -- --check
+```
+
+**If the autograder detects any errors from the command above, you will not be able to receive**
+**any points.** This may seem strict, but we have decided to follow standard best practices for
+Rust.
+
+By following [Rust's style guidelines](https://doc.rust-lang.org/stable/style-guide/), you ensure
+that anybody reading your code (who is familiar with Rust) will be able to easily navigate your
+code. This can help with diving into an unfamiliar code base, and it also eliminates the need for
+debate with others over style rules, saving time and energy.
+
+See the official [guidelines](https://doc.rust-lang.org/stable/style-guide/) for more information.
+
+### Unix
+
+If you are on a Unix system, we will try to create a `handin.zip` automatically for you,
+**but you will need to have `zip` already installed**.
+
+If you _do not_ have `zip` installed on your system, install `zip` on your machine or use the CMU
+Linux SSH machines. If you need help with this, please reach out to us!
+
+Once you have `zip` installed, we will create the `handin.zip` automatically for you (_take a peek_
+_into the `build.rs` file if you're interested in how this works!_).
+
+Once you have the `handin.zip` file, submit it (and only the zip) to Gradescope.
+
+### Windows
+
+If you are on a Windows system, you can zip the `src/` folder manually and upload that to
+Gradescope. For this lab, you also need to add the `Cargo.toml` file to that zip file. Please
+reach out to us if you are unsure how to do this!
+
+Note that you don't _need_ to name it `handin.zip`; you can name it whatever you'd like.
+
+# Collaboration
+
+In general, feel free to discuss homeworks with other students! As long as you do not copy someone
+else's work, any communication is fair game.
+
+All formal questions should be asked on Piazza. Try to discuss on Piazza so that other students can
+see your questions and answers as well!
+
+You can also discuss on Discord, but try to keep any technical questions on Piazza.
+
+# Feedback
+
+We would like to reiterate that you should let us know if you spent significantly more than an
+hour on this homework.
+
+In addition, Rust has a notoriously steep learning curve, so if you find yourself not understanding
+the concepts, you should reach out to us and let us know as well --- chances are, you're not the
+only one!
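The Optimizations section of the README above compares this lab to a SQL `GROUP BY`. As a rough illustration (a minimal sketch, not the reference solution), `SELECT station, MIN(temp), AVG(temp), MAX(temp) FROM measurements GROUP BY station ORDER BY station` corresponds to a single pass that keeps one running aggregate per station in a hash map, followed by a sort of the keys. The names `Running`, `group_by_station`, and `render` are hypothetical; `render` only imitates the `{station=min/mean/max, ...}` format that the starter `Display` impl for `AggregationResults` produces.

```rust
use std::collections::HashMap;
use std::fmt::Write;

/// Hypothetical running aggregate for one station. It stores `sum` and `count` and derives the
/// mean only when it is needed.
#[derive(Debug, Clone, Copy)]
struct Running {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

impl Running {
    fn new() -> Self {
        Self {
            min: f64::INFINITY,
            max: f64::NEG_INFINITY,
            sum: 0.0,
            count: 0,
        }
    }

    fn add(&mut self, x: f64) {
        self.min = self.min.min(x);
        self.max = self.max.max(x);
        self.sum += x;
        self.count += 1;
    }

    fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Hash aggregation (the `GROUP BY station` part): one pass, one map entry per distinct station.
fn group_by_station<'a, I>(measurements: I) -> HashMap<&'a str, Running>
where
    I: Iterator<Item = (&'a str, f64)>,
{
    let mut groups: HashMap<&'a str, Running> = HashMap::new();
    for (station, temp) in measurements {
        groups.entry(station).or_insert_with(Running::new).add(temp);
    }
    groups
}

/// Sorts by station name (the `ORDER BY station` part) and formats `{name=min/mean/max, ...}`.
fn render(groups: &HashMap<&str, Running>) -> String {
    let mut names: Vec<&str> = groups.keys().copied().collect();
    names.sort_unstable();
    let mut out = String::from("{");
    for (i, name) in names.into_iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        let r = &groups[name];
        // Writing into a `String` cannot fail, so `unwrap` is safe here.
        write!(out, "{name}={:.1}/{:.1}/{:.1}", r.min, r.mean(), r.max).unwrap();
    }
    out.push('}');
    out
}

fn main() {
    let data = [("b", 1.0), ("a", 2.0), ("b", 3.0), ("a", -1.0)];
    let groups = group_by_station(data.into_iter());
    println!("{}", render(&groups)); // prints {a=-1.0/0.5/2.0, b=1.0/2.0/3.0}
}
```

Storing `sum` and `count` instead of a running `mean` avoids a division per row, and it also makes two partial aggregates easy to combine, which matters once the work is split across threads.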
@@ -0,0 +1,25 @@
+//! The 1 billion row challenge! Except without interacting with any I/O!
+
+use criterion::{Criterion, black_box, criterion_group, criterion_main};
+use rowlab::{BILLION, WeatherStations, aggregate};
+
+pub fn one_billion_row_challenge(c: &mut Criterion) {
+    // Create the measurements iterator. In the real challenge, you would be reading these values
+    // from a file on disk.
+    let stations = WeatherStations::new();
+    let measurements = stations.measurements();
+
+    c.bench_function("brc", |b| {
+        b.iter(|| {
+            black_box(aggregate(measurements.clone().take(BILLION)));
+        })
+    });
+}
+
+criterion_main!(benches);
+criterion_group! {
+    name = benches;
+    config = Criterion::default()
+                .sample_size(10);
+    targets = one_billion_row_challenge
+}
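The benchmark above passes the result of `aggregate` through `black_box` to discourage the compiler from optimizing away work whose result is never used. For quick experiments on fewer rows (for example, while your code is still too slow for Criterion's minimum of 10 samples), a one-off wall-clock measurement can be sketched with the standard library alone. `time_it` is a hypothetical helper and the toy workload merely stands in for `aggregate`; `std::hint::black_box` is the standard library's hint for the same purpose, and it is documented as best-effort only.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` once and returns its result along with the elapsed wall-clock
/// time. `black_box` is a best-effort hint that discourages the optimizer from removing the work.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = black_box(f());
    (result, start.elapsed())
}

fn main() {
    // Toy workload standing in for `aggregate`: count rows per station.
    let stations = ["Abha", "Berlin", "Cairo"];
    let rows = 1_000_000;
    let (counts, elapsed) = time_it(|| {
        let mut counts: HashMap<&str, u64> = HashMap::new();
        for i in 0..rows {
            *counts.entry(stations[i % stations.len()]).or_insert(0) += 1;
        }
        counts
    });
    println!("counted {} stations in {elapsed:?}", counts.len());
}
```

A single run like this is noisy, so it is best used for rough comparisons, with `cargo bench` as the final word.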
@@ -0,0 +1,15 @@
+use std::process::Command;
+
+fn main() {
+    if cfg!(unix) {
+        Command::new("zip")
+            .arg("-r")
+            .arg("handin.zip")
+            .arg("src/")
+            .arg("Cargo.toml")
+            .output()
+            .expect("\nError: Unable to zip handin files. Either the zip executable is not installed on this computer, the zip binary is not on your PATH, or something went very wrong with zip. Please contact the staff for help!\n\n");
+    }
+
+    println!("cargo:rerun-if-changed=handin.zip");
+}
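Note that `Command::output` returns `Err` only when the process cannot be started (for example, when `zip` is not on your `PATH`). If `zip` starts but exits with an error, the `build.rs` above silently continues. A stricter check, assuming you want that case reported too, could be sketched as follows; `run_checked` is a hypothetical helper, and `sh` is used here only to simulate a command that fails.

```rust
use std::process::Command;

/// Hypothetical helper: runs `cmd` and reports an error both when it cannot be started and when
/// it starts but exits unsuccessfully.
fn run_checked(cmd: &mut Command) -> Result<(), String> {
    let program = cmd.get_program().to_string_lossy().into_owned();
    // `output` only returns `Err` if the process could not be spawned (e.g. not on `PATH`).
    let output = cmd
        .output()
        .map_err(|e| format!("could not start `{program}`: {e}"))?;
    // A process that ran but failed still yields `Ok(output)`, so check the exit status too.
    if output.status.success() {
        Ok(())
    } else {
        Err(format!(
            "`{program}` failed ({}): {}",
            output.status,
            String::from_utf8_lossy(&output.stderr).trim()
        ))
    }
}

fn main() {
    // In `build.rs` this would be something like
    // `Command::new("zip").args(["-r", "handin.zip", "src/", "Cargo.toml"])`.
    // Here `sh -c "exit 3"` simulates a command that starts but fails.
    match run_checked(Command::new("sh").args(["-c", "exit 3"])) {
        Ok(()) => println!("ok"),
        Err(msg) => println!("error: {msg}"),
    }
}
```

Inside a build script, the `Err` message could simply be passed to `panic!`, which makes Cargo report the build script as failed.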
@@ -0,0 +1,131 @@
+use itertools::Itertools;
+use std::collections::HashMap;
+use std::fmt::{Display, Write};
+
+/// Aggregate statistics for a specific [`WeatherStation`].
+#[derive(Debug, Clone, Copy)]
+pub struct StationAggregation {
+    /// The minimum temperature measurement.
+    min: f64,
+    /// The maximum temperature measurement.
+    max: f64,
+    /// The average / mean temperature measurement.
+    mean: f64,
+    /// Helper field for calculating mean (sum_measurements / num_measurements).
+    sum_measurements: f64,
+    /// Helper field for calculating mean (sum_measurements / num_measurements).
+    num_measurements: f64,
+}
+
+impl StationAggregation {
+    /// Creates a new `StationAggregation` for computing aggregations.
+    pub fn new() -> Self {
+        Self {
+            min: f64::INFINITY,
+            mean: 0.0,
+            max: f64::NEG_INFINITY,
+            sum_measurements: 0.0,
+            num_measurements: 0.0,
+        }
+    }
+
+    /// Updates the aggregation with a new measurement.
+    ///
+    /// TODO(student): Is processing measurements one-by-one the best way to compute aggregations?
+    /// Remember that you are allowed to add other methods in this implementation block!
+    pub fn add_measurement(&mut self, measurement: f64) {
+        todo!("Implement me!")
+    }
+
+    pub fn min(&self) -> f64 {
+        self.min
+    }
+
+    pub fn max(&self) -> f64 {
+        self.max
+    }
+
+    pub fn mean(&self) -> f64 {
+        self.mean
+    }
+}
+
+impl Display for StationAggregation {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "{:.1}/{:.1}/{:.1}", self.min, self.mean, self.max)
+    }
+}
+
+/// The aggregation results for the billion row challenge.
+///
+/// TODO(student): This is purposefully not an ideal structure! You are allowed to change what
+/// types this struct contains. Think about what this structure should represent, and where the data
+/// might best be located. Also, you are allowed to use third-party data structures.
+#[derive(Debug)]
+pub struct AggregationResults {
+    /// A map from weather station identifier to its aggregate metrics.
+    results: HashMap<String, StationAggregation>,
+}
+
+impl AggregationResults {
+    /// Creates an empty `AggregationResults`.
+    pub fn new() -> Self {
+        Self {
+            results: HashMap::new(),
+        }
+    }
+
+    /// Updates the metrics for the given station with a measurement.
+    pub fn insert_measurement(&mut self, station: &str, measurement: f64) {
+        todo!("Implement me!")
+    }
+
+    /// Retrieve the stats of a specific station, if it exists. Used for testing purposes.
+    pub fn get_metrics(&self, station: &str) -> Option<StationAggregation> {
+        self.results.get(station).copied()
+    }
+}
+
+impl Display for AggregationResults {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        // Sort the results by weather station ID and join into the output string format.
+        let sorted_results: Vec<_> = self
+            .results
+            .iter()
+            .sorted_by(|a, b| Ord::cmp(&a.0, &b.0))
+            .collect();
+
+        f.write_char('{')?;
+
+        // Append each weather station's metrics to the output string, separated by ", ".
+        //
+        // Writing the separator before every entry except the first (instead of special-casing
+        // the last entry with `len() - 1` and `last()`) means that an empty map is formatted as
+        // `{}` rather than panicking.
+        for (i, (station, aggregation)) in sorted_results.iter().enumerate() {
+            if i > 0 {
+                f.write_char(',')?;
+                f.write_char(' ')?;
+            }
+
+            // Note that implementing `Display` on `StationAggregation` means that you can format
+            // it directly with `write!` (or call `to_string`, which does a similar thing as
+            // `Display::fmt`).
+            write!(f, "{station}={aggregation}")?;
+        }
+
+        f.write_char('}')
+    }
+}
+
+impl Default for StationAggregation {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl Default for AggregationResults {
+    fn default() -> Self {
+        Self::new()
+    }
+}
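Regarding the TODO on `AggregationResults` above: with a `HashMap<String, _>`, the `entry` API takes an owned `String` key, so something like `entry(station.to_string())` allocates on every measurement even when the station is already in the map. One common workaround, sketched here on a hypothetical `Totals` type rather than the starter structs, is to look the station up with `get_mut` using the borrowed `&str` (this works because `String: Borrow<str>`) and to allocate an owned key only the first time a station appears.

```rust
use std::collections::HashMap;

/// Hypothetical per-station (sum, count) totals keyed by owned station names.
#[derive(Debug, Default)]
struct Totals {
    map: HashMap<String, (f64, u64)>,
}

impl Totals {
    fn insert(&mut self, station: &str, measurement: f64) {
        // Fast path: look up with the borrowed `&str`; no allocation for stations already seen.
        if let Some((sum, count)) = self.map.get_mut(station) {
            *sum += measurement;
            *count += 1;
        } else {
            // Slow path: allocate the owned `String` key only on the first sighting.
            self.map.insert(station.to_owned(), (measurement, 1));
        }
    }

    fn mean(&self, station: &str) -> Option<f64> {
        self.map.get(station).map(|&(sum, count)| sum / count as f64)
    }
}

fn main() {
    let mut totals = Totals::default();
    for (station, temp) in [("Oslo", 1.5), ("Lima", 20.0), ("Oslo", 2.5)] {
        totals.insert(station, temp);
    }
    println!("{:?}", totals.mean("Oslo")); // prints Some(2.0)
}
```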
@@ -0,0 +1,35 @@
+#![doc = include_str!("../README.md")]
+
+mod aggregation;
+use aggregation::AggregationResults;
+
+mod measurements;
+pub use measurements::WeatherStations;
+
+/// One billion.
+pub const BILLION: usize = 1_000_000_000;
+
+/// Given an iterator that yields measurements for weather stations, aggregate each weather
+/// station's data.
+///
+/// TODO(student): This is purposefully a very bad way to compute aggregations (namely, completely
+/// sequentially). If you don't want to time out, you will need to introduce parallelism in some
+/// manner. And even after you introduce parallelism, there are many different things you can do to
+/// speed this up dramatically.
+///
+/// For this lab, we would encourage you to look at the reference solution after giving this a good
+/// attempt on your own! Note that the reference solution is purposefully not optimized in several
+/// places, and there is lots of room for improvement. We also encourage you to go online and see if
+/// you can find any interesting techniques for speeding this up.
+pub fn aggregate<'a, I>(measurements: I) -> AggregationResults
+where
+    I: Iterator<Item = (&'a str, f64)>,
+{
+    let mut results = AggregationResults::new();
+
+    for (station, measurement) in measurements {
+        results.insert_measurement(station, measurement);
+    }
+
+    results
+}
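The doc comment on `aggregate` above asks for parallelism without changing its signature. Since the input is a single `Iterator` with no `Send` or `Clone` bound, one possible shape (a minimal sketch, not the reference solution) is a producer/consumer pipeline: the calling thread pulls rows into fixed-size chunks and deals them out over bounded channels, each worker folds its chunks into a private hash map, and the per-worker maps are merged at the end. Keeping `sum` and `count` rather than a running mean is what makes that merge a few additions. `Stats`, `parallel_aggregate`, `workers`, and `chunk_size` are hypothetical names; `std::thread::scope` is used so that the workers may hold the borrowed `&'a str` station names.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical mergeable aggregate: (min, max, sum, count) can be combined across threads.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    min: f64,
    max: f64,
    sum: f64,
    count: u64,
}

impl Stats {
    fn new() -> Self {
        Self { min: f64::INFINITY, max: f64::NEG_INFINITY, sum: 0.0, count: 0 }
    }

    fn add(&mut self, x: f64) {
        self.min = self.min.min(x);
        self.max = self.max.max(x);
        self.sum += x;
        self.count += 1;
    }

    fn merge(&mut self, other: &Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
}

/// Deals chunks of rows out to `workers` threads, then merges their partial hash maps.
fn parallel_aggregate<'a, I>(
    measurements: I,
    workers: usize,
    chunk_size: usize,
) -> HashMap<&'a str, Stats>
where
    I: Iterator<Item = (&'a str, f64)>,
{
    assert!(workers > 0 && chunk_size > 0);
    thread::scope(|s| {
        let mut senders = Vec::with_capacity(workers);
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            // Bounded channel: the producer blocks instead of buffering the whole input.
            let (tx, rx) = mpsc::sync_channel::<Vec<(&'a str, f64)>>(2);
            senders.push(tx);
            handles.push(s.spawn(move || {
                let mut local: HashMap<&'a str, Stats> = HashMap::new();
                for chunk in rx {
                    for (station, x) in chunk {
                        local.entry(station).or_insert_with(Stats::new).add(x);
                    }
                }
                local
            }));
        }

        // Producer: this thread drains the iterator and deals chunks out round-robin.
        let mut chunk = Vec::with_capacity(chunk_size);
        let mut next = 0;
        for row in measurements {
            chunk.push(row);
            if chunk.len() == chunk_size {
                let full = std::mem::replace(&mut chunk, Vec::with_capacity(chunk_size));
                senders[next].send(full).expect("worker hung up");
                next = (next + 1) % workers;
            }
        }
        if !chunk.is_empty() {
            senders[next].send(chunk).expect("worker hung up");
        }
        drop(senders); // Closing the channels lets each worker's `for chunk in rx` loop end.

        // Merge the per-worker partial results.
        let mut total: HashMap<&'a str, Stats> = HashMap::new();
        for handle in handles {
            for (station, stats) in handle.join().expect("worker panicked") {
                total.entry(station).or_insert_with(Stats::new).merge(&stats);
            }
        }
        total
    })
}

fn main() {
    let names = ["Abha", "Berlin", "Cairo"];
    let rows = (0..10_000).map(|i| (names[i % 3], (i % 7) as f64 - 3.0));
    let result = parallel_aggregate(rows, 4, 256);
    println!("{:?}", result["Berlin"]);
}
```

Because a single thread still advances the iterator, that thread may become the bottleneck; whether this design beats the sequential loop, and which `workers` and `chunk_size` values help, is something to measure with `cargo bench` rather than assume.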