Skip to content

Memory memo #83

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Memory memo #83

wants to merge 2 commits into from

Conversation

connortsui20
Copy link
Member

Problem

Adds an in-memory implementation of the optimizer's memo table.

NOTE: Even though I am opening this PR, this is really authored by @SarveshOO7. I'll be the one reviewing it.

@connortsui20 connortsui20 changed the base branch from main to reset-optimizer April 24, 2025 18:37
@codecov-commenter
Copy link

codecov-commenter commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 0.70423% with 846 lines in your changes missing coverage. Please review.

Project coverage is 82.1%. Comparing base (70dbd27) to head (b43bd69).

Files with missing lines Patch % Lines
optd/src/core/memo/memory.rs 0.0% 833 Missing ⚠️
optd/src/core/memo/mod.rs 0.0% 13 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
optd/src/core/memo/merge_repr.rs 98.6% <100.0%> (ø)
optd/src/core/memo/mod.rs 0.0% <0.0%> (ø)
optd/src/core/memo/memory.rs 0.0% <0.0%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Base automatically changed from reset-optimizer to main April 24, 2025 18:46
connortsui20 and others added 2 commits April 24, 2025 14:47
Implements the in-memory memo table for the optimizeer.

Co-authored-by: Sarvesh Tandon <[email protected]>
Co-authored-by: Connor Tsui <[email protected]>
Copy link
Member Author

@connortsui20 connortsui20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Memozie trait is MASSIVE. I think it is worth considering if we should split the Memoize trait into several smaller subtraits (divided similarly to how there are literal section in the trait definition itself). So for example have a trait LogicalMemo that has all of the methods related to logical properties, expressions, groups. Then have another trait PhysicalMemo that has all the physical expressions and goal stuff. Then RuleStatusMemo, etc.

Additionally, I am worried about how the integration with the optimizer is going to go because there are several questionable types and signatures that I found in this first pass in just the Memoize trait alone. I know that we are very low on time, but I think it is critical that @SarveshOO7 and @yliang412 take just a little bit of time (like 30 minutes) to read through the Memoize trait and confirm that the Memo table API makes sense for the things they need in the optimizer (and document why things need to be there). Let me know if you need help with any of this.

/// All logical expressions in this group
pub expressions: Vec<LogicalExpressionId>,
}

/// Result of merging two groups.
#[derive(Debug)]
pub struct MergeGroupResult {
Copy link
Member Author

@connortsui20 connortsui20 Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is not a great name. Usually if you have "Result" at the end of your type name, it should behave like an actual Result<T, E>. Maybe use MergeGroupInfo or GroupMergeInfo or GroupMergeData

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it MergeGroupDiff (but think the current name is OK)

Comment on lines +51 to +53
/// * `merged_groups` - Groups that were merged along with their expressions.
/// * `new_repr_group_id` - ID of the new representative group id.
pub fn new(new_repr_group_id: GroupId) -> Self {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be stale

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact I would prefer to not have any constructor here as both of the fields of MergeGroupResult (which I still think should have a different name) are pub, and the only place this new function is called would probably be cleaner without new.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah please get rid of this useless constructor.
That's just code bloat.

#[derive(Debug)]
pub struct MergePhysicalExprResult {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename:

  • MergedGoalInfo -> MergedGoal
  • MergeGoalResult -> GoalMergeData / GoalMergeInfo

Comment on lines +109 to +116
// pub dirty_transformations: Vec<(LogicalExpressionId, TransformationRule)>,

// /// Implementations that were marked as dirty and need new application.
// pub dirty_implementations: Vec<(LogicalExpressionId, GoalId, ImplementationRule)>,

/// Implementations that were marked as dirty and need new application.
pub dirty_implementations: Vec<(LogicalExpressionId, GoalId, ImplementationRule)>,
// /// Costings that were marked as dirty and need recomputation.
// pub dirty_costings: Vec<PhysicalExpressionId>,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dead right now because it hasn't been implemented yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then something that is not implemented should not be here... Otherwise make the surrounding code and add unimplemented!()

Comment on lines +118 to +122
pub struct ForwardResult {
pub physical_expr_id: PhysicalExpressionId,
pub best_cost: Cost,
pub goals_forwarded: HashSet<GoalId>,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation needed here, what is this (and why is it called Result)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I myself am confused and do not remember either 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does forwarding a goal mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no need to add constructors

async fn get_logical_properties(
&self,
group_id: GroupId,
) -> MemoizeResult<Option<LogicalProperties>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this return an Option<LogicalProperties>?

This led me to go look at GroupState in memory.rs , and the double Option is somewhat questionable:

struct GroupState {
    /// The logical properties of the group, might be `None` if it hasn't been derived yet.
    properties: Option<LogicalProperties>,
    logical_exprs: HashSet<LogicalExpressionId>,
    goals: HashSet<GoalId>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct LogicalProperties(pub Option<PropertiesData>);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 should not be an option

Comment on lines +184 to +185
/// Gets any logical expression ID in a group.
async fn get_any_logical_expr(&self, group_id: GroupId) -> MemoizeResult<LogicalExpressionId>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, this seems like a super strange method to require (it's not obvious why we need this, and there's no documentation on why this is here). It might be good to think about requiring a get_all_logical_exprs that returns an iterator or even stream of logical expression IDs in any order so that we don't need to have these weird methods.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC (@yliang412 ) can confirm, it was to deal with the case were we retrieve the properties of a group? Anyways, I agree it should be removed.

@@ -115,6 +181,9 @@ pub trait Memoize: Send + Sync + 'static {
group_id: GroupId,
) -> MemoizeResult<Vec<LogicalExpressionId>>;

/// Gets any logical expression ID in a group.
async fn get_any_logical_expr(&self, group_id: GroupId) -> MemoizeResult<LogicalExpressionId>;

/// Finds group containing a logical expression ID, if it exists.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious (not really part of the review), in what situation is a LogicalExpressionId not going to exist? I'm assuming that LogicalExpressionId can only be created from a real logical expression (unless this is wrong and we can just have invalid logical expression IDs floating around)? Can logical expressions be destroyed somehow?

Copy link
Collaborator

@AlSchlo AlSchlo Apr 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogicalExpressionId can only be created from a real logical expression

Yes.

They cannot be destroyed either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh in that case then this is part of the review. It would be good to encode these invariants in the types, or if that's too much work at least document this somewhere.

Comment on lines 207 to 209
async fn create_group(
&mut self,
logical_expr_id: LogicalExpressionId,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc parameters are stale

@@ -156,7 +224,7 @@ pub trait Memoize: Send + Sync + 'static {
&mut self,
group_id_1: GroupId,
group_id_2: GroupId,
) -> MemoizeResult<MergeResult>;
) -> MemoizeResult<Option<MergeResult>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situation does merge group fail? I thought that the whole point what we've been talking about over the past month was to prevent group merge failures

Copy link
Collaborator

@AlSchlo AlSchlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have a description of the merge algorithm.
I don't want to spend a day trying to reverse engineer how this works.

Also I don't see tests in this code?

pub type MemoizeResult<T> = Result<T, MemoizeError>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoizeError {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have these as errors rather than having the functions return an option?
If these errors should never happen unless code is wrong, they should not be errors.

/// All logical expressions in this group
pub expressions: Vec<LogicalExpressionId>,
}

/// Result of merging two groups.
#[derive(Debug)]
pub struct MergeGroupResult {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it MergeGroupDiff (but think the current name is OK)

Comment on lines +51 to +53
/// * `merged_groups` - Groups that were merged along with their expressions.
/// * `new_repr_group_id` - ID of the new representative group id.
pub fn new(new_repr_group_id: GroupId) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah please get rid of this useless constructor.
That's just code bloat.

pub dirty_transformations: Vec<(LogicalExpressionId, TransformationRule)>,
/// Physical expression merge results.
pub physical_expr_merges: Vec<MergePhysicalExprResult>,
// /// Transformations that were marked as dirty and need new application.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't keep dead comments in PR, or create an issue that point to it (e.g. TODO(#xx)) and recaps what needs to be done

Comment on lines +118 to +122
pub struct ForwardResult {
pub physical_expr_id: PhysicalExpressionId,
pub best_cost: Cost,
pub goals_forwarded: HashSet<GoalId>,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I myself am confused and do not remember either 👍

Comment on lines +118 to +122
pub struct ForwardResult {
pub physical_expr_id: PhysicalExpressionId,
pub best_cost: Cost,
pub goals_forwarded: HashSet<GoalId>,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also no need to add constructors

@@ -101,7 +150,24 @@ pub trait Memoize: Send + Sync + 'static {
///
/// # Returns
/// The properties associated with the group or an error if not found.
async fn get_logical_properties(&self, group_id: GroupId) -> MemoizeResult<LogicalProperties>;
async fn get_logical_properties(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of tokio...
The memo lies in the optimizer state, which is sent around co-routines.

async fn get_logical_properties(
&self,
group_id: GroupId,
) -> MemoizeResult<Option<LogicalProperties>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 should not be an option

group_id_1: GroupId,
group_id_2: GroupId,
) -> MemoizeResult<Option<MergeResult>> {
self.merge_groups_helper(group_id_1, group_id_2).await
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have a helper that seems to be the entire implementation?

let group = self
.groups
.get_mut(&group_id)
.ok_or(MemoizeError::GroupNotFound(group_id))?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors should be the exception, not the correct code path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants