add instructions for distributed training

Nicklas Hansen
2024-01-07 11:55:07 -08:00
parent a7ff00b0cd
commit 33876d124f

@@ -112,6 +112,8 @@ $ python train.py task=walker-walk obs=rgb
We recommend using default hyperparameters for single-task online RL, including the default model size of 5M parameters (`model_size=5`). Multi-task offline RL benefits from a larger model size, but larger models are also increasingly costly to train and evaluate. Available arguments are `model_size={1, 5, 19, 48, 317}`. See `config.yaml` for a full list of arguments.
**As of Jan 7, 2024, the TD-MPC2 codebase also supports multi-GPU training for multi-task offline RL experiments**; use the `distributed` branch and pass `world_size=N` to train on `N` GPUs. We cannot guarantee that distributed training reproduces single-GPU results exactly, but the results appear similar in our limited testing.
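As a rough sketch of how these arguments might be combined, the snippet below assembles a multi-GPU training command. The values are assumptions for illustration only: `mt80` is assumed to be a multi-task dataset name, and `N_GPUS=4` and `MODEL_SIZE=48` are arbitrary choices. Only `model_size` and `world_size` are taken from the text above.

```shell
# Illustrative values only; adjust to your hardware and dataset.
N_GPUS=4        # number of GPUs, passed as world_size
MODEL_SIZE=48   # one of 1, 5, 19, 48, 317
TASK=mt80       # assumed multi-task dataset name

# Assemble the command and print it for inspection before running it.
TRAIN_CMD="python train.py task=${TASK} model_size=${MODEL_SIZE} world_size=${N_GPUS}"
echo "${TRAIN_CMD}"

# To launch, switch to the distributed branch first:
#   git checkout distributed
#   eval "${TRAIN_CMD}"
```

Printing the command first makes it easy to confirm the arguments before committing GPU time to a long run.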
----
## Citation