
Modification to DRL to support parallel sampler gathering/bounded actions. #32802

Open

grmnptr wants to merge 51 commits into idaholab:next from grmnptr:drl-mods

Conversation

@grmnptr
Contributor

@grmnptr grmnptr commented Apr 22, 2026

Closes #32511

@grmnptr grmnptr marked this pull request as ready for review April 28, 2026 22:30
@moosebuild
Contributor

Job Precheck, step Clang format on 3b80ab4 wanted to post the following:

Your code requires style changes.

A patch was auto-generated and copied here.
You can apply the patch directly by running, in the top level of your repository:

curl -s https://mooseframework.inl.gov/docs/PRs/32802/clang_format/style.patch | git apply -v

Alternatively, with your repository up to date and in the top level of your repository:

git clang-format 953c577aaeb4885a251cedbbe9d1bc8819be587e

@moosebuild
Contributor

Job Test, step Results summary on ef3ee6e wanted to post the following:

Framework test summary

Compared against 953c577 in job civet.inl.gov/job/3779372.

No change

Modules test summary

ERROR: Results do not exist for event 292341

Member

@lindsayad lindsayad left a comment


Just reviewing the framework part

Comment on lines +105 to +111
/// Update cached affine metadata vectors from the registered libtorch buffers.
void synchronizeAffineFactorsFromBuffers();

/**
* Map an activation name to the orthogonal-initialization gain we want to use.
* @param activation Activation name to look up.
*/
Member


It's the wild west for doxygen comment structure. We should get something in our style guide about this at some point.

Contributor Author


In general, I try to do the slashes for short comments and the asterisk for longer ones. But I never thought about defining what is short and what is long.

Contributor Author


I'll make this a bit more uniform.

Member


I don't blame you. It's a reasonable heuristic, and maybe that's the one we'll end up putting in the style guide. Generally, I've always done the block comment structure for methods and then /// for data. But as this is not in the style guide, I can't say what I do is the right way.


/**
* Initialize the trainable weights and biases.
* @param generator Optional torch random-number generator used for reproducible initialization.
Member


@zachmprince should we just use libtorch for random number generation? This is in reference to your recent PR. The cost would be a no-longer-optional dependency. Possible gain would be reduced code maintenance and overall less code duplication across the OSS ecosystem? I defer to you two on this. I'm not an expert in this area


void to_json(nlohmann::json & json, const Moose::LibtorchArtificialNeuralNet * const & network);

void loadLibtorchArtificialNeuralNetState(Moose::LibtorchArtificialNeuralNet & nn,
Member


doxygen please

Comment on lines +111 to +112
// File-backed controllers are loaded after full construction so derived controls can override
// the loader without constructor-time type checks.
Member


That's a good thing? Constructor-time type checks sound nice.

* @param archive Archive being read.
* @param key Serialized tensor name.
* @param tensor Tensor that receives the loaded data.
* @return True when the tensor was found and loaded.
Member


Suggested change:
- * @return True when the tensor was found and loaded.
+ * @return whether the tensor was found and loaded.

* @param nn Neural network that receives the loaded state.
* @param filename Checkpoint file to read.
* @param error Human-readable error string filled on failure.
* @return True when the network was loaded successfully.
Member


Suggested change:
- * @return True when the network was loaded successfully.
+ * @return whether the network was loaded successfully.

* @param nn Neural network that receives the loaded state.
* @param filename Checkpoint file to read.
* @param error Human-readable error string filled on failure.
* @return True when the network was loaded successfully.
Member


Suggested change:
- * @return True when the network was loaded successfully.
+ * @return whether the network was loaded successfully.

void
LibtorchArtificialNeuralNet::initializeNeuralNetwork(const c10::optional<at::Generator> generator)
{
for (unsigned int i = 0; i < numHiddenLayers(); ++i)
Member


Suggested change:
-  for (unsigned int i = 0; i < numHiddenLayers(); ++i)
+  for (const auto i : make_range(numHiddenLayers()))

const std::vector<std::vector<Real>> & component_trajectories,
const unsigned int time_index) const
{
validateTrajectoryShape(component_trajectories);
Member


Can an invalid state be reached through user input or would this be a developer error?

Contributor Author

@grmnptr grmnptr Apr 29, 2026


Well, I added this check to make sure we have the right sizes, because early on I had some invalid timestep execute_on settings. But this might be too restrictive, since it could block the usage of adaptive timestepping. I will check what I can do. Part of me was also thinking that adaptive timesteps may not be usable with multiple-input timestep stacking, but I suppose that depends on the problem. I think I can make this less restrictive, and I should.

PointValue::execute()
{
_value = _system.point_value(_var_number, _point, false);

Member


I like this new line 😄

Contributor Author


Damn, I had a print statement here for some not-so-advanced debugging, and I accidentally removed one too many newlines.



Development

Successfully merging this pull request may close these issues.

Upgrade Deep Reinforcement Learning STM capability(ies)

3 participants