
[Question] Auto-regressive manner policy network? #2010

Closed
wadmes opened this issue Sep 17, 2024 · 1 comment
Labels
question: Further information is requested

Comments


wadmes commented Sep 17, 2024

❓ Question

Hi there,

I am trying to implement an autoregressive policy network: for each action, I first need to sample a target object, then use the target object's embedding to sample the action type, and finally output the action parameters.

But it looks impossible to achieve this, since _build_mlp_extractor in MaskableActorCriticPolicy is expected to return an action embedding and a value embedding.

Is there a good way to achieve the feature?

Example figure: (image attached in the original issue)

Thanks!


@wadmes added the question label Sep 17, 2024
@araffin
Member

araffin commented Sep 18, 2024

Is there a good way to achieve the feature?

For something custom like that, it's probably better to fork SB3 / use CleanRL as a starting point.
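For reference, the autoregressive chain described in the question (sample target object → condition on its embedding to sample action type → output action parameters) can be sketched as a standalone PyTorch module. This is a minimal illustration of the conditioning pattern, not SB3 code: the class name, layer names, and dimensions below are all hypothetical, and wiring it into a forked policy class is left out.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class AutoregressivePolicyHead(nn.Module):
    """Illustrative head: sample object -> action type -> parameters."""

    def __init__(self, feat_dim: int, n_objects: int, n_action_types: int,
                 param_dim: int, emb_dim: int = 32):
        super().__init__()
        self.object_logits = nn.Linear(feat_dim, n_objects)
        self.object_emb = nn.Embedding(n_objects, emb_dim)
        # action-type logits are conditioned on the sampled object's embedding
        self.type_logits = nn.Linear(feat_dim + emb_dim, n_action_types)
        self.type_emb = nn.Embedding(n_action_types, emb_dim)
        # parameters are conditioned on both earlier choices
        self.param_head = nn.Linear(feat_dim + 2 * emb_dim, param_dim)

    def forward(self, features: torch.Tensor):
        # 1) sample the target object
        obj_dist = Categorical(logits=self.object_logits(features))
        obj = obj_dist.sample()
        obj_e = self.object_emb(obj)
        # 2) sample the action type given the object embedding
        type_in = torch.cat([features, obj_e], dim=-1)
        type_dist = Categorical(logits=self.type_logits(type_in))
        act_type = type_dist.sample()
        type_e = self.type_emb(act_type)
        # 3) output the action parameters given both choices
        params = self.param_head(torch.cat([features, obj_e, type_e], dim=-1))
        # joint log-prob factorizes along the autoregressive chain
        log_prob = obj_dist.log_prob(obj) + type_dist.log_prob(act_type)
        return obj, act_type, params, log_prob


head = AutoregressivePolicyHead(feat_dim=64, n_objects=10,
                                n_action_types=4, param_dim=3)
obj, act_type, params, log_prob = head(torch.randn(2, 64))
print(obj.shape, act_type.shape, params.shape, log_prob.shape)
```

In a fork, such a head would replace the usual flat action-distribution head, with `log_prob` feeding the policy-gradient loss; per-step action masking (as in MaskablePPO) would have to be applied to each Categorical separately.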

@araffin closed this as completed Oct 7, 2024