Reinforcement learning (RL) is a powerful paradigm for learning to make sequences of decisions. However, RL has yet to be fully leveraged in robotics, principally due to its lack of scalability. Offline RL offers a promising avenue by training agents on large, diverse datasets, avoiding the costly real-world interactions of online RL. Scaling offline RL to increasingly complex datasets requires expressive generative models such as diffusion and flow matching. However, existing methods typically depend on either backpropagation through time (BPTT), which is computationally prohibitive, or policy distillation, which introduces compounding errors and limits scalability to larger base policies. In this paper, we consider the question of how to develop a scalable offline RL approach without relying on distillation or backpropagation through time. We introduce Expressive Value Learning for Offline Reinforcement Learning (EVOR): a scalable offline RL approach that integrates both expressive policies and expressive value functions. EVOR learns an optimal, regularized Q-function via flow matching during training. At inference-time, EVOR performs inference-time policy extraction via rejection sampling against the expressive value function, enabling efficient optimization, regularization, and compute-scalable search without retraining. Empirically, we show that EVOR outperforms baselines on a diverse set of offline RL tasks, demonstrating the benefit of integrating expressive value learning into offline RL.
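As a rough illustration of the inference-time policy extraction described above, the sketch below implements best-of-N rejection sampling: draw candidate actions from an expressive policy and keep the one scored highest by a learned Q-function. The `toy_policy` and `toy_q` stand-ins are assumptions for demonstration only, not EVOR's actual flow-matching policy or value network.

```python
import numpy as np

def extract_action(sample_policy, q_fn, obs, num_samples=32):
    """Best-of-N rejection sampling: sample candidate actions from an
    expressive base policy and return the one with the highest value
    under the learned Q-function. No retraining or BPTT is needed;
    increasing num_samples scales search with inference-time compute."""
    candidates = [sample_policy(obs) for _ in range(num_samples)]
    scores = np.array([q_fn(obs, a) for a in candidates])
    return candidates[int(np.argmax(scores))]

# Toy stand-ins (hypothetical, for illustration only):
rng = np.random.default_rng(0)
toy_policy = lambda obs: rng.uniform(-1.0, 1.0, size=2)  # random 2-D actions
toy_q = lambda obs, a: -float(np.sum(a ** 2))            # prefers small actions

action = extract_action(toy_policy, toy_q, obs=np.zeros(4), num_samples=64)
```

Because the value function is queried only at inference time, the number of candidate samples can be raised or lowered per deployment without touching the trained models.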
pip install -r requirements.txt

We use the 100M dataset for cube-double. Instructions on how to download it can be found at https://github.com/seohongpark/horizon-reduction.
To use the dataset, pass the following flag on the command line: --ogbench_dataset_dir=[path/to/cube-double-play-100m-v0/].
Example commands for all of the methods we evaluate in our paper are included below. For the scene environment, add --sparse=True. All training and evaluation parameters remain the same for EVOR.
# antmaze-large-navigate (tasks 1-5)
python main.py --env_name=antmaze-large-navigate-singletask-task1-v0
# antmaze-large-stitch (tasks 1-5)
python main.py --env_name=antmaze-large-stitch-singletask-task1-v0
# cube-double-play-100M (tasks 1-5)
python main.py --env_name=cube-double-play-singletask-task1-v0 --ogbench_dataset_dir=/path/to/cube-double-play-100m-v0
# pointmaze-medium-navigate (tasks 1-5)
python main.py --env_name=pointmaze-medium-navigate-singletask-task1-v0
# scene-play-sparse (tasks 1-5)
python main.py --env_name=scene-play-singletask-task1-v0 --sparse=True

This codebase is built on top of QC and FQL.
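Since each environment above spans tasks 1-5, a small loop can generate all five commands for a given environment prefix. This is a convenience sketch, not part of the released codebase; it assumes task IDs are embedded in the env name as "-taskN-v0". It prints the commands — drop the `echo` to execute them directly.

```shell
# Hypothetical helper: print the run commands for tasks 1-5 of one environment.
ENV_PREFIX="antmaze-large-navigate-singletask"
for TASK in 1 2 3 4 5; do
  echo python main.py --env_name="${ENV_PREFIX}-task${TASK}-v0"
done
```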
@article{espinosa2025expressive,
title={Expressive Value Learning for Scalable Offline Reinforcement Learning},
author={Espinosa-Dice, Nicolas and Brantley, Kiante and Sun, Wen},
journal={arXiv preprint},
year={2025}
}