Decrease model precision #119

@alexandre239

Hi!

After seeing some issues about OOM errors caused by long prompts, I was wondering: rather than GPU sharding, could decreasing the float precision be an option for generating sequences with Evo from longer prompts (>1 kb and beyond)?

  • I believe precision is currently set to bfloat16 (via model.backbone = model.backbone.to(torch.bfloat16) in the generation_to_folding.py script). Would float8 be an option, i.e., is it compatible with Evo at all? (See the sketch after this list.)
  • If so, do you expect a big drop in generation performance, or do you already have data comparing precision against performance?
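
For context, here is a minimal sketch of the kind of cast I mean, using a toy nn.Sequential as a hypothetical stand-in for the Evo backbone (the real script applies the same .to() call to model.backbone); the float8 lines are an untested assumption about PyTorch's float8 dtypes:

```python
import torch
import torch.nn as nn

# Toy stand-in for the Evo backbone, just to illustrate the cast;
# generation_to_folding.py applies the same .to() call to model.backbone.
backbone = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# bfloat16 roughly halves weight/activation memory vs. float32, and
# common ops have bfloat16 kernels, so inference works out of the box:
backbone = backbone.to(torch.bfloat16)
x = torch.randn(1, 16, 512, dtype=torch.bfloat16)
y = backbone(x)
print(y.dtype)  # torch.bfloat16

# float8 (untested assumption): PyTorch defines storage dtypes such as
# torch.float8_e4m3fn, but most operators have no float8 kernels, so a
# naive cast would generally fail at runtime:
# backbone = backbone.to(torch.float8_e4m3fn)
# y = backbone(x.to(torch.float8_e4m3fn))  # likely NotImplementedError
```

As far as I know, float8 inference usually goes through a quantization library (e.g. torchao or NVIDIA's TransformerEngine) rather than a plain dtype cast, which is partly why I'm asking whether it's compatible with Evo.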

Thanks so much!
