Unai Ruiz
a8456e95bc
Fix missing advantage computation when reward_EMA is disabled
This PR fixes an issue where reward_EMA=False caused adv to be undefined in _compute_actor_loss.
Previously, adv was only computed inside the reward_EMA branch, which resulted in a runtime error when the option was disabled.
2026-03-03 16:34:52 +01:00
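The control-flow problem described in commit a8456e95bc can be illustrated with a minimal sketch. It is not the repository's code: the helper name compute_advantage, its parameters, and the unnormalized fallback in the else branch are assumptions made for illustration, and plain Python lists stand in for tensors. In Python, a local variable assigned only inside one branch is unbound on the other path, which typically surfaces as an UnboundLocalError when it is later read.

```python
from typing import List, Optional, Sequence, Tuple


def compute_advantage(
    target: Sequence[float],
    base: Sequence[float],
    reward_ema: bool,
    offset_scale: Optional[Tuple[float, float]] = None,
) -> List[float]:
    """Hypothetical helper: returns an advantage on every code path."""
    if reward_ema:
        if offset_scale is None:
            raise ValueError("offset_scale is required when reward_ema is True")
        offset, scale = offset_scale
        # Normalize both terms with the same (assumed) EMA statistics.
        normed_target = [(t - offset) / scale for t in target]
        normed_base = [(b - offset) / scale for b in base]
        adv = [t - b for t, b in zip(normed_target, normed_base)]
    else:
        # Assumed fallback: an unnormalized advantage, so that `adv`
        # is defined even when the option is disabled.
        adv = [t - b for t, b in zip(target, base)]
    return adv


if __name__ == "__main__":
    print(compute_advantage([2.0, 4.0], [1.0, 1.0], reward_ema=False))
    print(compute_advantage([2.0, 4.0], [1.0, 1.0], True, (0.0, 2.0)))
```

The point of the sketch is only that both branches assign adv before it is used; the actual fallback chosen in the PR may differ.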
NM512
be5e5ecf40
apply mean to log items for consistency
2026-02-21 20:58:43 +09:00
NM512
7433d1e877
avoid ".to(device)"
2024-09-28 07:58:15 +09:00
NM512
a4fdfad938
bug fix for onehot distribution
2024-01-14 21:55:34 +09:00
NM512
7f66ed5333
erased unused options
2024-01-05 23:23:09 +09:00
NM512
a27711ab96
limit action values in sampling stage
2024-01-05 11:42:45 +09:00
NM512
a9e85e8b7c
modified weight initialization
2024-01-05 10:46:54 +09:00
NM512
78e86703f4
modified loss calculation
2024-01-05 10:44:04 +09:00
NM512
e0487f8206
merged action head into MLP and modified configs
2024-01-05 10:26:48 +09:00
NM512
e0f2017e28
unified the place to initialize the latents
2024-01-05 10:09:13 +09:00
NM512
16635df3e4
removed scheduling function
2023-09-26 20:58:55 +09:00
NM512
3f6659d365
changed treatment of obs shape in minecraft
2023-08-03 08:12:44 +09:00
NM512
9c58ab62c0
introduced return used in author's code
2023-06-17 16:59:40 +09:00
NM512
f7c505579c
erased unnecessary lines
2023-06-17 15:27:09 +09:00
NM512
02c3d45fcf
modification of expl.
2023-05-21 08:17:47 +09:00
NM512
b984e69b6e
added state input capability
2023-05-14 23:38:46 +09:00
NM512
0eb66997fb
learnable initial state options for RSSM
2023-04-29 07:54:03 +09:00
NM512
2a8b44eb0c
erased unnecessary code
2023-04-27 07:42:08 +09:00
NM512
628b856c63
changed the discount head to predict terminal
2023-04-22 09:34:23 +09:00
NM512
942eae10a9
updated result, requirements and torch version
2023-03-24 07:51:57 +09:00
NM512
6273444394
modified based on author's implementation
2023-03-18 08:38:23 +09:00
NM512
fb5c21557a
Initial Commit
2023-02-12 22:35:25 +09:00