-
Notifications
You must be signed in to change notification settings - Fork 4
Memory memo #83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Memory memo #83
Conversation
d4996c8
to
9bb84c2
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files
🚀 New features to boost your workflow:
|
Implements the in-memory memo table for the optimizeer. Co-authored-by: Sarvesh Tandon <[email protected]> Co-authored-by: Connor Tsui <[email protected]>
9bb84c2
to
b43bd69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Memozie
trait is MASSIVE. I think it is worth considering if we should split the Memoize
trait into several smaller subtraits (divided similarly to how there are literal section in the trait definition itself). So for example have a trait LogicalMemo
that has all of the methods related to logical properties, expressions, groups. Then have another trait PhysicalMemo
that has all the physical expressions and goal stuff. Then RuleStatusMemo
, etc.
Additionally, I am worried about how the integration with the optimizer is going to go because there are several questionable types and signatures that I found in this first pass in just the Memoize
trait alone. I know that we are very low on time, but I think it is critical that @SarveshOO7 and @yliang412 take just a little bit of time (like 30 minutes) to read through the Memoize
trait and confirm that the Memo table API makes sense for the things they need in the optimizer (and document why things need to be there). Let me know if you need help with any of this.
/// All logical expressions in this group | ||
pub expressions: Vec<LogicalExpressionId>, | ||
} | ||
|
||
/// Result of merging two groups. | ||
#[derive(Debug)] | ||
pub struct MergeGroupResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is not a great name. Usually if you have "Result" at the end of your type name, it should behave like an actual Result<T, E>
. Maybe use MergeGroupInfo
or GroupMergeInfo
or GroupMergeData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it MergeGroupDiff
(but think the current name is OK)
/// * `merged_groups` - Groups that were merged along with their expressions. | ||
/// * `new_repr_group_id` - ID of the new representative group id. | ||
pub fn new(new_repr_group_id: GroupId) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be stale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact I would prefer to not have any constructor here as both of the fields of MergeGroupResult
(which I still think should have a different name) are pub
, and the only place this new
function is called would probably be cleaner without new
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah please get rid of this useless constructor.
That's just code bloat.
#[derive(Debug)] | ||
pub struct MergePhysicalExprResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename:
MergedGoalInfo
->MergedGoal
MergeGoalResult
->GoalMergeData
/GoalMergeInfo
// pub dirty_transformations: Vec<(LogicalExpressionId, TransformationRule)>, | ||
|
||
// /// Implementations that were marked as dirty and need new application. | ||
// pub dirty_implementations: Vec<(LogicalExpressionId, GoalId, ImplementationRule)>, | ||
|
||
/// Implementations that were marked as dirty and need new application. | ||
pub dirty_implementations: Vec<(LogicalExpressionId, GoalId, ImplementationRule)>, | ||
// /// Costings that were marked as dirty and need recomputation. | ||
// pub dirty_costings: Vec<PhysicalExpressionId>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's dead right now because it hasn't been implemented yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then something that is not implemented should not be here... Otherwise make the surrounding code and add unimplemented!()
pub struct ForwardResult { | ||
pub physical_expr_id: PhysicalExpressionId, | ||
pub best_cost: Cost, | ||
pub goals_forwarded: HashSet<GoalId>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation needed here, what is this (and why is it called Result)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I myself am confused and do not remember either 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does forwarding a goal mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also no need to add constructors
async fn get_logical_properties( | ||
&self, | ||
group_id: GroupId, | ||
) -> MemoizeResult<Option<LogicalProperties>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this return an Option<LogicalProperties>
?
This led me to go look at GroupState
in memory.rs
, and the double Option
is somewhat questionable:
struct GroupState {
/// The logical properties of the group, might be `None` if it hasn't been derived yet.
properties: Option<LogicalProperties>,
logical_exprs: HashSet<LogicalExpressionId>,
goals: HashSet<GoalId>,
}
#[derive(Debug, Clone, PartialEq)]
pub struct LogicalProperties(pub Option<PropertiesData>);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 should not be an option
/// Gets any logical expression ID in a group. | ||
async fn get_any_logical_expr(&self, group_id: GroupId) -> MemoizeResult<LogicalExpressionId>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first glance, this seems like a super strange method to require (it's not obvious why we need this, and there's no documentation on why this is here). It might be good to think about requiring a get_all_logical_exprs
that returns an iterator or even stream of logical expression IDs in any order so that we don't need to have these weird methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC (@yliang412 ) can confirm, it was to deal with the case were we retrieve the properties of a group? Anyways, I agree it should be removed.
@@ -115,6 +181,9 @@ pub trait Memoize: Send + Sync + 'static { | |||
group_id: GroupId, | |||
) -> MemoizeResult<Vec<LogicalExpressionId>>; | |||
|
|||
/// Gets any logical expression ID in a group. | |||
async fn get_any_logical_expr(&self, group_id: GroupId) -> MemoizeResult<LogicalExpressionId>; | |||
|
|||
/// Finds group containing a logical expression ID, if it exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious (not really part of the review), in what situation is a LogicalExpressionId
not going to exist? I'm assuming that LogicalExpressionId
can only be created from a real logical expression (unless this is wrong and we can just have invalid logical expression IDs floating around)? Can logical expressions be destroyed somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogicalExpressionId can only be created from a real logical expression
Yes.
They cannot be destroyed either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh in that case then this is part of the review. It would be good to encode these invariants in the types, or if that's too much work at least document this somewhere.
async fn create_group( | ||
&mut self, | ||
logical_expr_id: LogicalExpressionId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc parameters are stale
@@ -156,7 +224,7 @@ pub trait Memoize: Send + Sync + 'static { | |||
&mut self, | |||
group_id_1: GroupId, | |||
group_id_2: GroupId, | |||
) -> MemoizeResult<MergeResult>; | |||
) -> MemoizeResult<Option<MergeResult>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what situation does merge group fail? I thought that the whole point what we've been talking about over the past month was to prevent group merge failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have a description of the merge algorithm.
I don't want to spend a day trying to reverse engineer how this works.
Also I don't see tests in this code?
pub type MemoizeResult<T> = Result<T, MemoizeError>; | ||
|
||
#[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
pub enum MemoizeError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have these as errors rather than having the functions return an option?
If these errors should never happen unless code is wrong, they should not be errors.
/// All logical expressions in this group | ||
pub expressions: Vec<LogicalExpressionId>, | ||
} | ||
|
||
/// Result of merging two groups. | ||
#[derive(Debug)] | ||
pub struct MergeGroupResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it MergeGroupDiff
(but think the current name is OK)
/// * `merged_groups` - Groups that were merged along with their expressions. | ||
/// * `new_repr_group_id` - ID of the new representative group id. | ||
pub fn new(new_repr_group_id: GroupId) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah please get rid of this useless constructor.
That's just code bloat.
pub dirty_transformations: Vec<(LogicalExpressionId, TransformationRule)>, | ||
/// Physical expression merge results. | ||
pub physical_expr_merges: Vec<MergePhysicalExprResult>, | ||
// /// Transformations that were marked as dirty and need new application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't keep dead comments in PR, or create an issue that point to it (e.g. TODO(#xx)) and recaps what needs to be done
pub struct ForwardResult { | ||
pub physical_expr_id: PhysicalExpressionId, | ||
pub best_cost: Cost, | ||
pub goals_forwarded: HashSet<GoalId>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I myself am confused and do not remember either 👍
pub struct ForwardResult { | ||
pub physical_expr_id: PhysicalExpressionId, | ||
pub best_cost: Cost, | ||
pub goals_forwarded: HashSet<GoalId>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also no need to add constructors
@@ -101,7 +150,24 @@ pub trait Memoize: Send + Sync + 'static { | |||
/// | |||
/// # Returns | |||
/// The properties associated with the group or an error if not found. | |||
async fn get_logical_properties(&self, group_id: GroupId) -> MemoizeResult<LogicalProperties>; | |||
async fn get_logical_properties( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of tokio...
The memo lies in the optimizer state, which is sent around co-routines.
async fn get_logical_properties( | ||
&self, | ||
group_id: GroupId, | ||
) -> MemoizeResult<Option<LogicalProperties>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 should not be an option
group_id_1: GroupId, | ||
group_id_2: GroupId, | ||
) -> MemoizeResult<Option<MergeResult>> { | ||
self.merge_groups_helper(group_id_1, group_id_2).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have a helper that seems to be the entire implementation?
let group = self | ||
.groups | ||
.get_mut(&group_id) | ||
.ok_or(MemoizeError::GroupNotFound(group_id))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors should be the exception, not the correct code path.
Problem
Adds an in-memory implementation of the optimizer's memo table.
NOTE: Even though I am opening this PR, this is really authored by @SarveshOO7. I'll be the one reviewing it.