Memory memo #83

connortsui20 · 2025-04-24T18:36:24Z

Problem

Adds an in-memory implementation of the optimizer's memo table.

NOTE: Even though I am opening this PR, this is really authored by @SarveshOO7. I'll be the one reviewing it.

codecov-commenter · 2025-04-24T18:39:33Z

Codecov Report

Attention: Patch coverage is 0.70423% with 846 lines in your changes missing coverage. Please review.

Project coverage is 82.1%. Comparing base (70dbd27) to head (b43bd69).

Files with missing lines	Patch %	Lines
optd/src/core/memo/memory.rs	0.0%	833 Missing ⚠️
optd/src/core/memo/mod.rs	0.0%	13 Missing ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
optd/src/core/memo/merge_repr.rs	`98.6% <100.0%> (ø)`
optd/src/core/memo/mod.rs	`0.0% <0.0%> (ø)`
optd/src/core/memo/memory.rs	`0.0% <0.0%> (ø)`

... and 1 file with indirect coverage changes


Implements the in-memory memo table for the optimizer. Co-authored-by: Sarvesh Tandon <[email protected]> Co-authored-by: Connor Tsui <[email protected]>

connortsui20

This Memoize trait is MASSIVE. I think it is worth considering whether we should split the Memoize trait into several smaller subtraits (divided similarly to the literal sections in the trait definition itself). So for example, have a trait LogicalMemo that has all of the methods related to logical properties, expressions, and groups. Then have another trait PhysicalMemo that has all the physical expressions and goal stuff. Then RuleStatusMemo, etc.
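A minimal sketch of what such a split could look like, with a blanket-implemented umbrella trait so existing `M: Memoize` bounds keep working. Everything here is an assumption for illustration: the methods are synchronous (the real trait is async), the ID types are stand-ins, and `logical_exprs`, `goals`, `add_logical_expr` and `count_logical` are hypothetical names, not the optd API.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real ID types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LogicalExpressionId(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GoalId(pub u64);

/// Logical side: groups and logical expressions.
pub trait LogicalMemo {
    fn logical_exprs(&self, group: GroupId) -> Vec<LogicalExpressionId>;
}

/// Physical side: goals (and, in a fuller version, physical expressions).
pub trait PhysicalMemo {
    fn goals(&self, group: GroupId) -> Vec<GoalId>;
}

/// Umbrella trait: anything implementing both halves is a `Memoize`.
pub trait Memoize: LogicalMemo + PhysicalMemo {}
impl<T: LogicalMemo + PhysicalMemo> Memoize for T {}

#[derive(Default)]
pub struct MemoryMemo {
    exprs: HashMap<GroupId, Vec<LogicalExpressionId>>,
    goals: HashMap<GroupId, Vec<GoalId>>,
}

impl MemoryMemo {
    pub fn add_logical_expr(&mut self, group: GroupId, expr: LogicalExpressionId) {
        self.exprs.entry(group).or_default().push(expr);
    }
}

impl LogicalMemo for MemoryMemo {
    fn logical_exprs(&self, group: GroupId) -> Vec<LogicalExpressionId> {
        self.exprs.get(&group).cloned().unwrap_or_default()
    }
}

impl PhysicalMemo for MemoryMemo {
    fn goals(&self, group: GroupId) -> Vec<GoalId> {
        self.goals.get(&group).cloned().unwrap_or_default()
    }
}

/// A consumer that only needs the logical half asks for exactly that.
pub fn count_logical<M: LogicalMemo>(memo: &M, group: GroupId) -> usize {
    memo.logical_exprs(group).len()
}
```

With this shape, code that only touches logical expressions can take `M: LogicalMemo`, which also makes it easier to mock one section at a time in tests.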

Additionally, I am worried about how the integration with the optimizer is going to go because there are several questionable types and signatures that I found in this first pass in just the Memoize trait alone. I know that we are very low on time, but I think it is critical that @SarveshOO7 and @yliang412 take just a little bit of time (like 30 minutes) to read through the Memoize trait and confirm that the Memo table API makes sense for the things they need in the optimizer (and document why things need to be there). Let me know if you need help with any of this.

connortsui20 · 2025-04-24T18:41:14Z

optd/src/core/memo/mod.rs

-    /// All logical expressions in this group
-    pub expressions: Vec<LogicalExpressionId>,
-}
-
 /// Result of merging two groups.
 #[derive(Debug)]
 pub struct MergeGroupResult {


Nit: This is not a great name. Usually if you have "Result" at the end of your type name, it should behave like an actual Result<T, E>. Maybe use MergeGroupInfo or GroupMergeInfo or GroupMergeData

I would call it MergeGroupDiff (but think the current name is OK)

connortsui20 · 2025-04-24T18:41:50Z

optd/src/core/memo/mod.rs

+    /// * `merged_groups` - Groups that were merged along with their expressions.
+    /// * `new_repr_group_id` - ID of the new representative group id.
+    pub fn new(new_repr_group_id: GroupId) -> Self {


This seems to be stale

In fact I would prefer to not have any constructor here as both of the fields of MergeGroupResult (which I still think should have a different name) are pub, and the only place this new function is called would probably be cleaner without new.

oh yeah please get rid of this useless constructor.
That's just code bloat.
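For illustration, a sketch of the constructor-free version. The field names come from the doc comment in the diff above; the field types and the `record_merge` helper are assumptions, since the real definitions are not shown in this thread.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);

/// Assumed shape: field names from the doc comment, field types guessed.
#[derive(Debug)]
pub struct MergeGroupResult {
    pub merged_groups: Vec<GroupId>,
    pub new_repr_group_id: GroupId,
}

/// Hypothetical call site: since every field is public, a struct literal
/// names each field explicitly and no `new` function is needed.
pub fn record_merge(kept: GroupId, absorbed: GroupId) -> MergeGroupResult {
    MergeGroupResult {
        merged_groups: vec![absorbed],
        new_repr_group_id: kept,
    }
}
```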

connortsui20 · 2025-04-24T18:49:58Z

optd/src/core/memo/mod.rs

 #[derive(Debug)]
+pub struct MergePhysicalExprResult {


Maybe rename:

MergedGoalInfo -> MergedGoal

MergeGoalResult -> GoalMergeData / GoalMergeInfo

connortsui20 · 2025-04-24T18:50:04Z

optd/src/core/memo/mod.rs

+    // pub dirty_transformations: Vec<(LogicalExpressionId, TransformationRule)>,
+
+    // /// Implementations that were marked as dirty and need new application.
+    // pub dirty_implementations: Vec<(LogicalExpressionId, GoalId, ImplementationRule)>,

-    /// Implementations that were marked as dirty and need new application.
-    pub dirty_implementations: Vec<(LogicalExpressionId, GoalId, ImplementationRule)>,
+    // /// Costings that were marked as dirty and need recomputation.
+    // pub dirty_costings: Vec<PhysicalExpressionId>,
+}


It's dead right now because it hasn't been implemented yet

Then something that is not implemented should not be here... Otherwise, write the surrounding code and add unimplemented!()
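One way to read that suggestion, as a hedged sketch: keep a real, documented signature and let the body panic with `unimplemented!()`, instead of leaving commented-out fields behind. The `merged` field and the `dirty_costings` accessor are hypothetical, and the issue number is left as `#xx` because the thread does not name one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct PhysicalExpressionId(pub u64);

#[derive(Debug, Default)]
pub struct MergeResult {
    /// Hypothetical placeholder for the fields that already exist.
    pub merged: usize,
}

impl MergeResult {
    /// Costings that were marked as dirty and need recomputation.
    ///
    /// TODO(#xx): not tracked yet. The signature exists so that callers
    /// can be written against it; calling it panics until it is filled in.
    pub fn dirty_costings(&self) -> Vec<PhysicalExpressionId> {
        unimplemented!("dirty costings are not tracked yet")
    }
}
```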

connortsui20 · 2025-04-24T18:50:28Z

optd/src/core/memo/mod.rs

+pub struct ForwardResult {
+    pub physical_expr_id: PhysicalExpressionId,
+    pub best_cost: Cost,
+    pub goals_forwarded: HashSet<GoalId>,
+}


Documentation needed here, what is this (and why is it called Result)?

I myself am confused and do not remember either 👍

What does forwarding a goal mean?

Also no need to add constructors

connortsui20 · 2025-04-24T18:56:50Z

optd/src/core/memo/mod.rs

+    async fn get_logical_properties(
+        &self,
+        group_id: GroupId,
+    ) -> MemoizeResult<Option<LogicalProperties>>;


Why does this return an Option<LogicalProperties>?

This led me to go look at GroupState in memory.rs, and the double Option is somewhat questionable:

struct GroupState {
    /// The logical properties of the group, might be `None` if it hasn't been derived yet.
    properties: Option<LogicalProperties>,
    logical_exprs: HashSet<LogicalExpressionId>,
    goals: HashSet<GoalId>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LogicalProperties(pub Option<PropertiesData>);

👍 should not be an option
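A sketch of one way to collapse the two layers into a single explicit state. This assumes the only thing the options are meant to express is "not derived yet"; if the inner `Option` carries a different meaning in optd, it would need its own variant. `PropertiesData` is a stand-in payload and `describe` is a hypothetical consumer.

```rust
/// Stand-in payload; the real PropertiesData is not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
pub struct PropertiesData(pub String);

/// One enum instead of Option<LogicalProperties(Option<PropertiesData>)>.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalProperties {
    /// Derivation has not run for this group yet.
    NotDerived,
    /// Derivation ran; the payload is always present.
    Derived(PropertiesData),
}

pub struct GroupState {
    pub properties: LogicalProperties,
}

/// Callers match once and cannot forget a layer.
pub fn describe(state: &GroupState) -> &str {
    match &state.properties {
        LogicalProperties::NotDerived => "pending",
        LogicalProperties::Derived(data) => data.0.as_str(),
    }
}
```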

connortsui20 · 2025-04-24T19:01:25Z

optd/src/core/memo/mod.rs

+    /// Gets any logical expression ID in a group.
+    async fn get_any_logical_expr(&self, group_id: GroupId) -> MemoizeResult<LogicalExpressionId>;


At first glance, this seems like a super strange method to require (it's not obvious why we need this, and there's no documentation on why this is here). It might be good to think about requiring a get_all_logical_exprs that returns an iterator or even stream of logical expression IDs in any order so that we don't need to have these weird methods.

IIRC (@yliang412 can confirm), it was to deal with the case where we retrieve the properties of a group? Anyways, I agree it should be removed.
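A rough sketch of the iterator-returning alternative, under the assumption of a synchronous in-memory map (the real trait is async and would more likely return a stream). `get_all_logical_exprs` is the name proposed above; `insert` and the struct layout are hypothetical. A caller that wants "any" expression just takes the first item.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LogicalExpressionId(pub u64);

#[derive(Default)]
pub struct MemoryMemo {
    groups: HashMap<GroupId, HashSet<LogicalExpressionId>>,
}

impl MemoryMemo {
    pub fn insert(&mut self, group: GroupId, expr: LogicalExpressionId) {
        self.groups.entry(group).or_default().insert(expr);
    }

    /// Yields every logical expression ID in the group, in unspecified order.
    /// An unknown group yields an empty iterator.
    pub fn get_all_logical_exprs(
        &self,
        group_id: GroupId,
    ) -> impl Iterator<Item = LogicalExpressionId> + '_ {
        self.groups.get(&group_id).into_iter().flatten().copied()
    }
}
```

Here `memo.get_all_logical_exprs(g).next()` covers what `get_any_logical_expr` does, without a dedicated trait method.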

connortsui20 · 2025-04-24T19:03:50Z

optd/src/core/memo/mod.rs

@@ -115,6 +181,9 @@ pub trait Memoize: Send + Sync + 'static {
        group_id: GroupId,
    ) -> MemoizeResult<Vec<LogicalExpressionId>>;

+    /// Gets any logical expression ID in a group.
+    async fn get_any_logical_expr(&self, group_id: GroupId) -> MemoizeResult<LogicalExpressionId>;
+
    /// Finds group containing a logical expression ID, if it exists.


I'm curious (not really part of the review), in what situation is a LogicalExpressionId not going to exist? I'm assuming that LogicalExpressionId can only be created from a real logical expression (unless this is wrong and we can just have invalid logical expression IDs floating around)? Can logical expressions be destroyed somehow?

LogicalExpressionId can only be created from a real logical expression

Yes.

They cannot be destroyed either.

Oh in that case then this is part of the review. It would be good to encode these invariants in the types, or if that's too much work at least document this somewhere.
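One common way to encode "an ID always comes from a real expression" is a newtype whose field is private to the memo module, so only the memo can mint IDs. The sketch below is an assumption about how that could look, not the optd code: `Memo`, `add_logical_expr` and `get_logical_expr` are hypothetical, and expressions are plain strings for brevity.

```rust
mod memo {
    /// The inner index is private, so code outside this module cannot
    /// fabricate an ID; it can only copy one that the memo handed out.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
    pub struct LogicalExpressionId(usize);

    #[derive(Default)]
    pub struct Memo {
        // Append-only: expressions are never removed, so IDs stay valid.
        exprs: Vec<String>,
    }

    impl Memo {
        /// The only place where a LogicalExpressionId is created.
        pub fn add_logical_expr(&mut self, expr: String) -> LogicalExpressionId {
            self.exprs.push(expr);
            LogicalExpressionId(self.exprs.len() - 1)
        }

        /// Infallible lookup: every ID issued by this memo indexes a live entry.
        pub fn get_logical_expr(&self, id: LogicalExpressionId) -> &str {
            &self.exprs[id.0]
        }
    }
}
```

This does not protect against passing an ID from one `Memo` instance to another; if several memos can coexist, that caveat still needs to be documented.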

connortsui20 · 2025-04-24T19:04:28Z

optd/src/core/memo/mod.rs

    async fn create_group(
        &mut self,
        logical_expr_id: LogicalExpressionId,


The doc parameters are stale

connortsui20 · 2025-04-24T19:05:20Z

optd/src/core/memo/mod.rs

@@ -156,7 +224,7 @@ pub trait Memoize: Send + Sync + 'static {
        &mut self,
        group_id_1: GroupId,
        group_id_2: GroupId,
-    ) -> MemoizeResult<MergeResult>;
+    ) -> MemoizeResult<Option<MergeResult>>;


In what situation does merge group fail? I thought that the whole point of what we've been talking about over the past month was to prevent group merge failures.

AlSchlo

I would like to have a description of the merge algorithm.
I don't want to spend a day trying to reverse engineer how this works.

Also I don't see tests in this code?

AlSchlo · 2025-04-24T20:00:42Z

optd/src/core/memo/mod.rs

+pub type MemoizeResult<T> = Result<T, MemoizeError>;
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum MemoizeError {


Why have these as errors rather than having the functions return an option?
If these errors should never happen unless code is wrong, they should not be errors.
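To make the distinction concrete, here is a small sketch under assumed names (`find_group`, `group_size`, `add` are hypothetical, and the methods are synchronous). A lookup whose answer can legitimately be "nothing" returns an `Option`, while a lookup that can only miss when the memo itself is buggy treats the miss as an invariant violation rather than a recoverable error.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct GroupId(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LogicalExpressionId(pub u64);

#[derive(Default)]
pub struct MemoryMemo {
    expr_to_group: HashMap<LogicalExpressionId, GroupId>,
    group_sizes: HashMap<GroupId, usize>,
}

impl MemoryMemo {
    pub fn add(&mut self, expr: LogicalExpressionId, group: GroupId) {
        // Only count the expression the first time it is seen.
        if self.expr_to_group.insert(expr, group).is_none() {
            *self.group_sizes.entry(group).or_insert(0) += 1;
        }
    }

    /// "Not in any group yet" is an ordinary answer, so this is an Option.
    pub fn find_group(&self, expr: LogicalExpressionId) -> Option<GroupId> {
        self.expr_to_group.get(&expr).copied()
    }

    /// A group ID that the memo handed out must exist; a miss is a bug,
    /// so it panics instead of forcing every caller to handle an error.
    pub fn group_size(&self, group: GroupId) -> usize {
        *self
            .group_sizes
            .get(&group)
            .expect("group id was issued by this memo, so it must exist")
    }
}
```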

AlSchlo · 2025-04-24T20:01:27Z

optd/src/core/memo/mod.rs

-    /// All logical expressions in this group
-    pub expressions: Vec<LogicalExpressionId>,
-}
-
 /// Result of merging two groups.
 #[derive(Debug)]
 pub struct MergeGroupResult {


I would call it MergeGroupDiff (but think the current name is OK)

AlSchlo · 2025-04-24T20:03:05Z

optd/src/core/memo/mod.rs

+    /// * `merged_groups` - Groups that were merged along with their expressions.
+    /// * `new_repr_group_id` - ID of the new representative group id.
+    pub fn new(new_repr_group_id: GroupId) -> Self {


oh yeah please get rid of this useless constructor.
That's just code bloat.

AlSchlo · 2025-04-24T20:05:04Z

optd/src/core/memo/mod.rs

-    pub dirty_transformations: Vec<(LogicalExpressionId, TransformationRule)>,
+    /// Physical expression merge results.
+    pub physical_expr_merges: Vec<MergePhysicalExprResult>,
+    // /// Transformations that were marked as dirty and need new application.


Don't keep dead comments in the PR. Otherwise, create an issue, point to it from the code (e.g. TODO(#xx)), and have it recap what needs to be done.

AlSchlo · 2025-04-24T20:05:50Z

optd/src/core/memo/mod.rs

+pub struct ForwardResult {
+    pub physical_expr_id: PhysicalExpressionId,
+    pub best_cost: Cost,
+    pub goals_forwarded: HashSet<GoalId>,
+}


I myself am confused and do not remember either 👍

AlSchlo · 2025-04-24T20:06:32Z

optd/src/core/memo/mod.rs

+pub struct ForwardResult {
+    pub physical_expr_id: PhysicalExpressionId,
+    pub best_cost: Cost,
+    pub goals_forwarded: HashSet<GoalId>,
+}


Also no need to add constructors

AlSchlo · 2025-04-24T20:07:29Z

optd/src/core/memo/mod.rs

@@ -101,7 +150,24 @@ pub trait Memoize: Send + Sync + 'static {
    ///
    /// # Returns
    /// The properties associated with the group or an error if not found.
-    async fn get_logical_properties(&self, group_id: GroupId) -> MemoizeResult<LogicalProperties>;
+    async fn get_logical_properties(


Because of tokio...
The memo lies in the optimizer state, which is sent around co-routines.
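A sketch of why those bounds show up, using `std::thread` as a stand-in so the example needs no runtime. `tokio::spawn` places similar `Send + 'static` requirements on the future it runs, so state that crosses task boundaries has to satisfy them. The `group_count` method, the struct layout and `count_on_worker` are hypothetical.

```rust
use std::sync::Arc;
use std::thread;

/// Same bounds as the trait in the diff; the method is a hypothetical stand-in.
pub trait Memoize: Send + Sync + 'static {
    fn group_count(&self) -> usize;
}

pub struct MemoryMemo {
    pub groups: Vec<u64>,
}

impl Memoize for MemoryMemo {
    fn group_count(&self) -> usize {
        self.groups.len()
    }
}

/// Moves a shared handle to the memo onto another thread.
/// `thread::spawn` needs a `Send + 'static` closure, and `Arc<M>` is only
/// `Send` when `M: Send + Sync`, which is what the trait bounds guarantee.
pub fn count_on_worker<M: Memoize>(memo: Arc<M>) -> usize {
    thread::spawn(move || memo.group_count())
        .join()
        .expect("worker thread panicked")
}
```

Dropping any of the three bounds from `Memoize` makes `count_on_worker` fail to compile for a generic `M`, which is the same class of error an async spawn would report.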

AlSchlo · 2025-04-24T20:08:25Z

optd/src/core/memo/mod.rs

+    async fn get_logical_properties(
+        &self,
+        group_id: GroupId,
+    ) -> MemoizeResult<Option<LogicalProperties>>;


👍 should not be an option

AlSchlo · 2025-04-24T20:16:56Z

optd/src/core/memo/memory.rs

+        group_id_1: GroupId,
+        group_id_2: GroupId,
+    ) -> MemoizeResult<Option<MergeResult>> {
+        self.merge_groups_helper(group_id_1, group_id_2).await


why have a helper that seems to be the entire implementation?

AlSchlo · 2025-04-24T20:17:43Z

optd/src/core/memo/memory.rs

+        let group = self
+            .groups
+            .get_mut(&group_id)
+            .ok_or(MemoizeError::GroupNotFound(group_id))?;


Errors should be the exception, not the correct code path.

connortsui20 force-pushed the memory-memo branch from d4996c8 to 9bb84c2 Compare April 24, 2025 18:36

connortsui20 changed the base branch from main to reset-optimizer April 24, 2025 18:37

Base automatically changed from reset-optimizer to main April 24, 2025 18:46

connortsui20 and others added 2 commits April 24, 2025 14:47

add in-memory memo table

6158fd6

Implements the in-memory memo table for the optimizer. Co-authored-by: Sarvesh Tandon <[email protected]> Co-authored-by: Connor Tsui <[email protected]>

fix clippy warnings

b43bd69

connortsui20 force-pushed the memory-memo branch from 9bb84c2 to b43bd69 Compare April 24, 2025 18:47

connortsui20 commented Apr 24, 2025

connortsui20 requested review from yliang412, SarveshOO7 and AlSchlo April 24, 2025 19:14

AlSchlo reviewed Apr 24, 2025
