Modification to DRL to support parallel sampler gathering/bounded actions. #32802
grmnptr wants to merge 51 commits into idaholab:next
Conversation
…the trainer/control object. (idaholab#32511)
Job Precheck, step Clang format on 3b80ab4 wanted to post the following:

Your code requires style changes. A patch was auto-generated and copied here. Alternatively, the patch can be applied with your repository up to date and from the top level of your repository.
Job Test, step Results summary on ef3ee6e wanted to post the following:

Framework test summary: Compared against 953c577 in job civet.inl.gov/job/3779372. No change.
Modules test summary: ERROR: Results do not exist for event 292341
lindsayad left a comment:

Just reviewing the framework part.
/// Update cached affine metadata vectors from the registered libtorch buffers.
void synchronizeAffineFactorsFromBuffers();

/**
 * Map an activation name to the orthogonal-initialization gain we want to use.
 * @param activation Activation name to look up.
 */
It's the wild west for doxygen comment structure. We should get something in our style guide about this at some point.

In general, I try to do the slashes for short comments and the asterisk for longer ones. But I never thought about defining what is short and what is long.

I'll make this a bit more uniform.

I don't blame you. It's a reasonable heuristic, and maybe that's the one we'll end up putting in the style guide. Generally I've always done the block comment structure for methods and then /// for data. But as this is not in the style guide, I can't say what I do is the right way.
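For illustration, a minimal sketch of the heuristic under discussion (block comments for methods, /// for data); the names below are hypothetical, not from this PR:

/**
 * Compute the orthogonal-initialization gain for an activation function.
 * @param activation Activation name to look up.
 * @return The gain to pass to the weight initializer.
 */
Real activationGain(const std::string & activation) const;

/// Cached per-layer gains (a data member, so a short /// comment).
std::vector<Real> _layer_gains;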
/**
 * Initialize the trainable weights and biases.
 * @param generator Optional torch random-number generator used for reproducible initialization.
@zachmprince should we just use libtorch for random number generation? This is in reference to your recent PR. The cost would be a no-longer-optional dependency; the possible gain would be reduced code maintenance and overall less code duplication across the OSS ecosystem. I defer to you two on this. I'm not an expert in this area.
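For context, a minimal sketch of what leaning on libtorch's own RNG could look like, using only the stock libtorch API (illustrative, not code from this PR):

#include <torch/torch.h>
#include <iostream>

int main()
{
  // Seed libtorch's default generator; factory functions such as
  // torch::randn then draw from it reproducibly.
  torch::manual_seed(42);
  const auto a = torch::randn({3});

  torch::manual_seed(42);
  const auto b = torch::randn({3});

  // Same seed, same draws: no separately maintained RNG state needed.
  std::cout << torch::allclose(a, b) << '\n'; // prints 1
}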
void to_json(nlohmann::json & json, const Moose::LibtorchArtificialNeuralNet * const & network);
void loadLibtorchArtificialNeuralNetState(Moose::LibtorchArtificialNeuralNet & nn,

// File-backed controllers are loaded after full construction so derived controls can override
// the loader without constructor-time type checks.
That's a good thing? Constructor-time type checks sound nice.
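For readers following along, a minimal sketch of the trade-off being discussed (all names hypothetical, not MOOSE API): deferring the load to a virtual hook called after construction lets a derived control substitute its own loader, at the cost of checks the constructor could have performed.

// Hypothetical sketch of post-construction loading via a virtual hook.
class NetworkControlBase
{
public:
  virtual ~NetworkControlBase() = default;

  // Called by the framework once the object is fully constructed,
  // so the virtual call below dispatches to derived overrides.
  void initialSetup() { loadNetworkState(); }

protected:
  // Default behavior: read the file-backed checkpoint.
  virtual void loadNetworkState() { /* read checkpoint from disk */ }
};

class TrainerBackedControl : public NetworkControlBase
{
protected:
  // Override: take the state from a live trainer instead of a file.
  void loadNetworkState() override { /* copy weights from the trainer */ }
};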
 * @param archive Archive being read.
 * @param key Serialized tensor name.
 * @param tensor Tensor that receives the loaded data.
 * @return True when the tensor was found and loaded.
Suggested change:
- * @return True when the tensor was found and loaded.
+ * @return whether the tensor was found and loaded.
 * @param nn Neural network that receives the loaded state.
 * @param filename Checkpoint file to read.
 * @param error Human-readable error string filled on failure.
 * @return True when the network was loaded successfully.
Suggested change:
- * @return True when the network was loaded successfully.
+ * @return whether the network was loaded successfully.
 * @param nn Neural network that receives the loaded state.
 * @param filename Checkpoint file to read.
 * @param error Human-readable error string filled on failure.
 * @return True when the network was loaded successfully.
Suggested change:
- * @return True when the network was loaded successfully.
+ * @return whether the network was loaded successfully.
void
LibtorchArtificialNeuralNet::initializeNeuralNetwork(const c10::optional<at::Generator> generator)
{
  for (unsigned int i = 0; i < numHiddenLayers(); ++i)

Suggested change:
- for (unsigned int i = 0; i < numHiddenLayers(); ++i)
+ for (const auto i : make_range(numHiddenLayers()))
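For reference, make_range comes from libMesh's int_range.h and is the idiomatic MOOSE replacement for a raw index loop. A standalone sketch (the layer count here is a made-up stand-in):

#include <iostream>
#include "libmesh/int_range.h"

int main()
{
  const unsigned int num_hidden_layers = 3;
  // make_range(n) yields 0, 1, ..., n-1 with the index type deduced
  // from its argument, avoiding signed/unsigned loop-counter mismatches.
  for (const auto i : libMesh::make_range(num_hidden_layers))
    std::cout << "initializing hidden layer " << i << '\n';
}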
  const std::vector<std::vector<Real>> & component_trajectories,
  const unsigned int time_index) const
{
  validateTrajectoryShape(component_trajectories);
Can an invalid state be reached through user input, or would this be a developer error?

Well, I added this check to make sure we have the right sizes, because early on I had some invalid timestep execute_on settings and the like. But it might be too restrictive, since it could block the use of adaptive timestepping. I will check what I can do. Part of me was also thinking that adaptive timesteps would not be usable with multiple-input timestep stacking, but I suppose that depends on the problem. I think I can make this less restrictive, and I should.
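A minimal sketch of the looser check being considered (hypothetical, not this PR's actual code): require only that every component carries the same number of entries, rather than pinning the trajectory length to a fixed timestep count, so adaptive timestepping still passes.

#include <vector>
#include "MooseError.h"
#include "MooseTypes.h"

// Hypothetical relaxed check: consistent sizes across components,
// with no assumption about the total number of timesteps.
void
validateTrajectoryShape(const std::vector<std::vector<Real>> & component_trajectories)
{
  if (component_trajectories.empty())
    mooseError("Expected at least one trajectory component.");

  const auto n_steps = component_trajectories[0].size();
  for (const auto & trajectory : component_trajectories)
    if (trajectory.size() != n_steps)
      mooseError("Trajectory components have inconsistent lengths.");
}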
PointValue::execute()
{
  _value = _system.point_value(_var_number, _point, false);
Damn, I had a print statement here for some not-so-advanced debugging and I accidentally removed one too many newlines.
Closes #32511