add instructions for distributed training

2024-01-07 11:55:07 -08:00
parent a7ff00b0cd
commit 33876d124f
1 changed file with 2 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -112,6 +112,8 @@ $ python train.py task=walker-walk obs=rgb

 We recommend using default hyperparameters for single-task online RL, including the default model size of 5M parameters (`model_size=5`). Multi-task offline RL benefits from a larger model size, but larger models are also increasingly costly to train and evaluate. Available arguments are `model_size={1, 5, 19, 48, 317}`. See `config.yaml` for a full list of arguments.

+**As of Jan 7, 2024, the TD-MPC2 codebase also supports multi-GPU training for multi-task offline RL experiments**; check out the `distributed` branch and pass the argument `world_size=N` to train on `N` GPUs. We cannot guarantee that distributed training will yield the same results as single-GPU training, but they appear to be similar based on our limited testing.
+
 ----

 ## Citation