Extremum Flow Matching for
Offline Goal Conditioned Reinforcement Learning

Quentin Rouxel 1,2    Clemente Donoso 1    Fei Chen 2    Serena Ivaldi 1    Jean-Baptiste Mouret 1
1 Inria, CNRS, Université de Lorraine, France
2 Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong

Abstract

[Figure: goal-conditioned policy overview.]

Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert demonstrations. This limitation can be mitigated by leveraging suboptimal, open-ended play data, which is often easier to collect and offers greater diversity. This work builds upon recent advances in generative modeling, specifically Flow Matching, an alternative to Diffusion models. We introduce a method for estimating the extremum of the learned distribution by leveraging the unique properties of Flow Matching, namely deterministic transport and support for arbitrary source distributions. We apply this method to develop several goal-conditioned imitation and reinforcement learning algorithms based on Flow Matching, in which policies are conditioned on both current and goal observations. We explore and compare different architectural configurations by combining core components, such as a critic, planner, actor, or world model, in various ways. We evaluate our agents on the OGBench benchmark and analyze how different demonstration behaviors during data collection affect performance in a 2D non-prehensile pushing task. Furthermore, we validate our approach on real hardware by deploying it on the Talos humanoid robot to perform complex manipulation tasks based on high-dimensional image observations, featuring a sequence of pick-and-place and articulated-object manipulations in a realistic kitchen environment.

Goal-Conditioned Manipulation on the Talos Humanoid Robot

Methodology

Objectives:

  • Build a generalist humanoid robot capable of complex, long-horizon manipulation tasks.
  • Use goal-conditioned imitation learning with generative methods: no simulation, reward design, or task labels required (a minimal sketch of this setup follows this list).
  • Learn from play data: open-ended, diverse, exploratory demonstrations not tied to specific tasks or goals.
  • Play data is easier and cheaper to collect, enabling scalable training across multiple tasks and environments.

Challenges:

  • Optimality: Play data contains both direct and inefficient paths to reach goals. Agents must identify and prefer the most efficient actions.
  • Stitching: Full paths to specific distant goals are rarely demonstrated in play data. Agents must learn to piece together meaningful segments to reach long-horizon targets.
[Figure: goal-conditioned RL (GCRL) challenges.]

Main Ideas

  • Introduce Extremum Flow Matching to address optimality by estimating the minimum and maximum of conditional distributions.
  • Propose several goal-conditioned imitation and offline reinforcement learning agents based on Flow Matching.
  • Evaluate agents on OGBench, analyze the impact of data collection strategies, and validate on the real Talos humanoid robot.

Whole-Body Low-Level Controller

Extremum Flow Matching

[Figure: Extremum Flow Matching overview.]
  • Estimates the minimum or maximum of a distribution using Flow Matching.
  • Leverages Flow Matching's unique properties, deterministic transport and support for arbitrary source distributions, which Diffusion models lack.
  • Serves as a principled alternative to Expectile Regression for offline reinforcement learning (a minimal sketch contrasting the two objectives follows this list).
  • Extends to multi-dimensional distributions using a structured approach similar to the conditioning-on-returns framework.

Comparison and Impact of Demonstration Behaviors

[Interactive figure: comparison of agents across demonstration datasets.]

  • Agents: FM-GC, plus FM-AC, FM-PC, FM-PS, and FM-AS, each in no-RL and use-RL variants.
  • Datasets: Expert Reach Goal, Play in Full Space, Play in Partitioned Spaces.

BibTeX

@article{rouxel2025extremumflowmatchingoffline,
  title={Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning},
  author={Quentin Rouxel and Clemente Donoso and Fei Chen and Serena Ivaldi and Jean-Baptiste Mouret},
  journal={arXiv preprint arXiv:2505.19717},
  year={2025},
}