This PR fixes an issue where reward_EMA=False caused adv to be undefined in _compute_actor_loss. Previously, adv was only computed inside the reward_EMA branch, which resulted in a runtime error when the option was disabled.
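For reviewers, here is a minimal sketch of the failure pattern and one way to avoid it. It is not the actual code from this repository: the function names, the list-based inputs, and the `scale`/`offset` normalization are assumptions made only for illustration. The point is that `adv` must be assigned on every path, not only when `reward_EMA` is enabled.

```python
def advantage_before(target, base, reward_ema):
    """Hypothetical reduction of the old behavior: adv is assigned in one branch only."""
    if reward_ema:
        adv = [t - b for t, b in zip(target, base)]
    # With reward_ema=False, adv was never assigned, so this line raises.
    return adv


def advantage_after(target, base, reward_ema, scale=1.0, offset=0.0):
    """Hypothetical reduction of a fix: adv is computed outside the branch."""
    if reward_ema:
        # Assumed normalization form; the real statistics come from the EMA.
        target = [(t - offset) / scale for t in target]
        base = [(b - offset) / scale for b in base]
    # Assigned unconditionally, so it exists for either setting.
    adv = [t - b for t, b in zip(target, base)]
    return adv
```

In this sketch, calling `advantage_before(..., reward_ema=False)` raises `UnboundLocalError`, while `advantage_after` returns the plain difference when the option is off and the normalized difference when it is on.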