Skip to content

Feature/split gauge reuse #1543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: develop
Choose a base branch
from
Open

Feature/split gauge reuse #1543

wants to merge 26 commits into from

Conversation

aroadant
Copy link

@aroadant aroadant commented Mar 19, 2025

These modifications will allow the reuse of the redistributed gauge buffers.

QudaGaugeParam.use_split_gauge_bkup is added to determine whether to delete the buffer after use or not. Its default value is 1. The internal flag update_split_gauge is used to determine whether an update of buffer is needed or not.

The initial test with staggered_invert_test.cpp gives the same results with gauge_param.use_split_gauge_bkup = 0 or 1. Also inversions can be done correctly with inv_param.split_grid[i] = 1 after split gauge inversions. Results from various default tests are good with default value use_split_gauge_bkup=0/1.

The test staggered_invert_test is updated with the flag --use_split_gauge_bkup. Results from the following input looks good

staggered_invert_test --dim 6 6 6 6 --gridsize 2 2 2 2 --grid-partition 2 2 2 2
-dslash-type staggered --compute-fat-long false --unit-gauge 0 --tadpole-coeff 1.0 --verbosity-precondition debug --recon 13
--mass 0.02 --solve-type direct-pc --solution-type mat --inv-type cg --matpc even-even
--prec double --prec-sloppy single
--nsrc 64 --nsrc-tile 16 --niter 10000 --tol 1e-100
--inv-multigrid false --mass-normalization mass --fermion-t-boundary periodic --verify 1 --use_split_gauge_bkup 1

@aroadant aroadant requested a review from a team as a code owner March 19, 2025 16:47
@maddyscientist
Copy link
Member

@bjoo and @hummingtree can you review this PR? @bjoo is this helpful for Chroma at all?

@maddyscientist
Copy link
Member

Just a note regarding the the CI failing on CSCS at the moment: the Alps machine is being used for Gordon Bell runs at the moment. We expect the CI to be restored again next week.

Copy link
Member

@hummingtree hummingtree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aroadant for this - I have left some comments, and I feel we could use some better documentation for the newly added functions.

Comment on lines 112 to 131
callMultiSrcQuda related gauge split
update_split_gauge : -1 delete buffer after usage, 0 split gauge in buf, 1 split gauge not in buf
update_split_gauge set to -1 if QudaGaugeParam.use_split_gauge_bkup == QUDA_BOOLEAN_FALSE, 1 if QudaGaugeParam.use_split_gauge_bkup == QUDA_BOOLEAN_TRUE
split_grid_bkup will be used to check whether split layout is changed or not
update_split_gauge > 0 and split gauge in buf, the original links will swap with *_bkup
*/
int update_split_gauge = -1;
int split_grid_bkup[QUDA_MAX_DIM];
quda::GaugeField *collected_gauge = nullptr;
quda::GaugeField *collected_milc_fatlink_field = nullptr;
quda::GaugeField *collected_milc_longlink_field = nullptr;
quda::CloverField *collected_clover = nullptr;
GaugeBundleBackup* thin_links_bkup = nullptr;
GaugeBundleBackup* fat_links_bkup = nullptr;
GaugeBundleBackup* long_links_bkup = nullptr;
CloverBundleBackup* clov_bkup = nullptr;
lat_dim_t X_bkup;
int is_clover_bkup = -1;
int is_asqtad_bkup = -1;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use Enum to replace the -1/0/1 so it's more readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for is_clover_bkup/is_asqtad_bkup.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the unnecessary parameters and changed to Enum.

@@ -138,6 +159,7 @@ static TimeProfile profileInvert("invertQuda");

//!< Profiler for invertMultiSrcQuda
static TimeProfile profileInvertMultiSrc("invertMultiSrcQuda");
static TimeProfile profileUpdate_split_gauge("Update_split_gauge");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use CamelCase consistently.

@@ -568,6 +590,15 @@ void freeUniqueSloppyGaugeUtility(GaugeField *&precise, GaugeField *&sloppy, Gau
void freeUniqueGaugeUtility(GaugeField *&precise, GaugeField *&sloppy, GaugeField *&precondition, GaugeField *&refinement,
GaugeField *&eigensolver, GaugeField *&extended, bool preserve_precise);

// Generate or swap to re-distributed gauge links
void Update_split_gauge(QudaInvertParam *param, const int is_asqtad, const bool is_clover, CommKey& split_key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use CamelCase consistently.

void Update_split_gauge(QudaInvertParam *param, const int is_asqtad, const bool is_clover, CommKey& split_key);

// free current gauges if keep_buffer == 0, restore previous gauge links, could be used to swap back to split gauge
void swapGaugeSplit(const int keep_buffer );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep_buffer can be a bool? Also I am not too sure about what this function does. Could probably use some more documentations.

@@ -3112,6 +3157,237 @@ void loadFatLongGaugeQuda(QudaInvertParam *inv_param, QudaGaugeParam *gauge_para
}
}

void swapGaugeSplit(const int keep_buffer = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use some documentation on what this does.

swapGaugeSplit(0);
}

void Update_split_gauge(QudaInvertParam *param, const int is_asqtad, const bool is_clover, CommKey& split_key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use some documentation here.

@@ -570,6 +571,7 @@ std::shared_ptr<QUDAApp> make_app(std::string app_description, std::string app_n
"--laplace3D", laplace3D,
"Restrict laplace operator to omit the t dimension (n=3), or include all dims (n=4) (default 4)");
quda_app->add_option("--load-gauge", latfile, "Load gauge field \" file \" for the test (requires QIO)");
quda_app->add_option("--use_split_gauge_bkup", use_split_gauge_bkup, "Use gauge split buff or not");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do --use-split-gauge-bkup

@aroadant
Copy link
Author

aroadant commented May 8, 2025

@hummingtree Thanks for the suggestions. I have updated the function documentation and made some changes. There is one issue about clover which may need to be resolved. The check of changes of clover is not good enough and will cause trouble if one needs to switch different clover parameters.

int is_clover_bkup = 0;

Copy link
Member

@hummingtree hummingtree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjoo When you get a chance, can you check and see if you are comfortable with the changes as it changes many of the codes you added. It would be even better if you could see if there is anything that you can use in Chroma.

@@ -648,6 +648,13 @@ typedef enum QudaWFlowStepType_s {
WFLOW_FOURTH_ORDER_STEP_6,
} QudaWFlowStepType;

//Used by update_split_gauge
typedef enum QudaUpdateSpliteGauge_s {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typedef enum QudaUpdateSpliteGauge_s {
typedef enum QudaUpdateSplitGauge_s {

@@ -648,6 +648,13 @@ typedef enum QudaWFlowStepType_s {
WFLOW_FOURTH_ORDER_STEP_6,
} QudaWFlowStepType;

//Used by update_split_gauge
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a more descriptive comment by explaining what these 3 options mean.

Comment on lines 653 to 656
QUDA_UPDATE_SPLITE_GAUGE_OFF,
QUDA_UPDATE_SPLITE_GAUGE_TRUE = 1,
QUDA_UPDATE_SPLITE_GAUGE_FALSE = 0,
} QudaUpdateSpliteGauge;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
QUDA_UPDATE_SPLITE_GAUGE_OFF,
QUDA_UPDATE_SPLITE_GAUGE_TRUE = 1,
QUDA_UPDATE_SPLITE_GAUGE_FALSE = 0,
} QudaUpdateSpliteGauge;
QUDA_UPDATE_SPLIT_GAUGE_OFF,
QUDA_UPDATE_SPLIT_GAUGE_TRUE = 1,
QUDA_UPDATE_SPLIT_GAUGE_FALSE = 0,
} QudaUpdateSplitGauge;

/**
* callMultiSrcQuda related gauge split
* update_split_gauge :
* QUDA_UPDATE_SPLITE_GAUGE_OFF delete buffer after usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use split instead of splite consistently.

* callMultiSrcQuda related gauge split
* update_split_gauge :
* QUDA_UPDATE_SPLITE_GAUGE_OFF delete buffer after usage
* QUDA_UPDATE_SPLITE_GAUGE_FALSE split gauge in buf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in buf" probably means reusing the split gauge that has been backed up previously?


void freeGaugeSplit()
{
// swap current gauge with the splitted ones if splits are not null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// swap current gauge with the splitted ones if splits are not null
// swap current gauges with the split ones if split ones are not null

// swap current gauge with the splitted ones if splits are not null
swapGaugeSplit(true);

// free the splitted gauge and swap to loaded gauge, if splits are not null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// free the splitted gauge and swap to loaded gauge, if splits are not null
// free the split gauges and swap the split gauges with the buffered gauges, if split gauges are not null

gaugeLongRefinement = long_links_bkup.refinement;
gaugeLongEigensolver = long_links_bkup.eigensolver;
gaugeLongExtended = long_links_bkup.extended;
// swap back to original links, detete split gauge if update_split_gauge == QUDA_UPDATE_SPLITE_GAUGE_OFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// swap back to original links, detete split gauge if update_split_gauge == QUDA_UPDATE_SPLITE_GAUGE_OFF
// switch back to the original links, detete split gauge if update_split_gauge == QUDA_UPDATE_SPLITE_GAUGE_OFF

include/quda.h Outdated
@@ -70,6 +70,7 @@ extern "C" {
int staple_pad; /**< Used by link fattening */
int llfat_ga_pad; /**< Used by link fattening */
int mom_ga_pad; /**< Used by the gauge and fermion forces */
QudaBoolean use_split_gauge_bkup; /**< Used by gauge split buffers (default=true keep split gauge after usage)*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use bool directly - C99 supports boolean types now.

Comment on lines 116 to 121
* QUDA_UPDATE_SPLITE_GAUGE_TRUE split gauge not in buf, need to call split
* update_split_gauge set to QUDA_UPDATE_SPLITE_GAUGE_OFF if QudaGaugeParam.use_split_gauge_bkup == QUDA_BOOLEAN_FALSE, QUDA_UPDATE_SPLITE_GAUGE_TRUE if QudaGaugeParam.use_split_gauge_bkup == QUDA_BOOLEAN_TRUE
* update_split_gauge != QUDA_UPDATE_SPLITE_GAUGE_OFF and split gauge in buf, the original links will swap with *_bkup
* split_grid_bkup will be used to check whether split layout is changed or not
* change in gauge precsions will need loadGaugeQuda which results in re-distribute
* assumed no change in clover parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the comment here to make it clearer - but please correct me if I say anything wrong.

Suggested change
* QUDA_UPDATE_SPLITE_GAUGE_TRUE split gauge not in buf, need to call split
* update_split_gauge set to QUDA_UPDATE_SPLITE_GAUGE_OFF if QudaGaugeParam.use_split_gauge_bkup == QUDA_BOOLEAN_FALSE, QUDA_UPDATE_SPLITE_GAUGE_TRUE if QudaGaugeParam.use_split_gauge_bkup == QUDA_BOOLEAN_TRUE
* update_split_gauge != QUDA_UPDATE_SPLITE_GAUGE_OFF and split gauge in buf, the original links will swap with *_bkup
* split_grid_bkup will be used to check whether split layout is changed or not
* change in gauge precsions will need loadGaugeQuda which results in re-distribute
* assumed no change in clover parameters
* - QUDA_UPDATE_SPLITE_GAUGE_TRUE: the input gauge fields will be split and the buffered (split) gauges will be updated accordingly;
* - QUDA_UPDATE_SPLITE_GAUGE_FALSE: the input gauge fields will not be split and the buffered (split) gauges will be used for split grid solves;
* - QUDA_UPDATE_SPLITE_GAUGE_OFF: nothing will be done.
* split_grid_bkup will be used to check whether split layout is changed or not
* change in gauge precsions will need loadGaugeQuda which results in re-distribute
* assumed no change in clover parameters

@aroadant
Copy link
Author

aroadant commented May 8, 2025

@hummingtree Thanks! It's very helpful to make the comments to be clear!

@aroadant aroadant requested a review from hummingtree May 9, 2025 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants