How to colocate generator and actor on the same device #769

@shendiaomo

📚 The doc issue or request

My understanding is that, for small models, torchforge supports the following colocation strategy:

  1. Offload torchtitan's weights to torchstore.
  2. Run the generator on the GPUs.
  3. Release the GPU memory held by the generator.
  4. Reload torchtitan's weights and continue training.

However, I can't find an example of this setup in the demos.
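
To make the request concrete, here is a rough sketch of the loop I have in mind. It is a toy model in plain Python, not torchforge or torchstore code: `Device`, `colocated_step`, and the dict used as a store are all hypothetical stand-ins, and handing the offloaded weights to the generator is my own assumption about how the two would share a policy.

```python
# Hypothetical sketch: none of these names are torchforge or torchstore APIs.
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy stand-in for one GPU; tracks which component holds memory on it."""
    resident: dict = field(default_factory=dict)  # owner name -> payload

    def load(self, owner, payload):
        self.resident[owner] = payload

    def release(self, owner):
        # Remove the owner's payload from the device and hand it back.
        return self.resident.pop(owner)


def colocated_step(device, store, prompts):
    # 1. Offload the trainer's weights to the store, freeing the device.
    store["trainer_weights"] = device.release("trainer")

    # 2. Run the generator on the same device (assumed to use the stored weights).
    device.load("generator", store["trainer_weights"])
    rollouts = [f"rollout for {p}" for p in prompts]  # placeholder for generation

    # 3. Release the generator's memory.
    device.release("generator")

    # 4. Reload the trainer's weights so training can continue.
    device.load("trainer", store.pop("trainer_weights"))
    return rollouts


device = Device()
device.load("trainer", {"w": [0.1, 0.2]})
store = {}
rollouts = colocated_step(device, store, ["a", "b"])
```

At any point in the step only one of the trainer or the generator is resident on the device, which is the property I would like to see demonstrated in an example.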

Suggest a potential alternative/fix

No response
