Reinforcement Learning on a Diet (RELOAD) – ANITI research chair

TL;DR

The RELOAD research chair aims to develop algorithms for frugal life-long reinforcement learning.
RELOAD is a research chair of the ANITI AI cluster.

Statement of purpose

Deep reinforcement learning (RL), that is, learning optimal behaviors from interaction data using deep neural networks, is often seen as one of the next frontiers in artificial intelligence. While current RL algorithms do not escape the relentless pursuit of larger models, bigger data, and more computation, we posit that the real-world impact of RL will also stem from algorithms that are relevant in the small-data regime and run on reasonable computing architectures. RL is at a crossroads where one wishes to retain the versatility and representational abilities of deep neural networks while coping with limited data and resources. Under such real-world limitations, understanding how to preserve algorithmic convergence properties, robustness to uncertainties, worst-case guarantees, transferable features, or elements of behavior explanation remains an open question. Hence, we endeavor to put RL on a diet, in order to reach a better understanding of frugal, life-long RL: its theoretical foundations, the many ways one can compensate for limited data, the sound algorithms one can design, and the practical impact it can have on the many real-world applications where data is intrinsically costly and resources are limited, ranging from autonomous robotics to personalized medicine.

The chair investigates topics in reinforcement learning, including (but not limited to):

reusable skill learning, including generalization abilities, robustness guarantees, skill diversity, and invariance properties,
inductive biases and priors in skill learning,
life-long and frugal (both in data and computation) skill adaptation and transfer between tasks.

The endeavor to develop frugal (both in terms of computation and data) algorithms for life-long RL stems from a number of motivating problems, such as:

onboard policy adjustment (in satellites, robots, and more generally devices with limited hardware),
robustness to dynamics uncertainty or dynamics change in mobile robots or more abstract systems,
mixing data from multi-fidelity simulations for fluid control,
rapid response to various operations research problems,
learning dynamic treatment regimes for patients with little data,
fine-tuning of language (and action) models.

The chair does not aim to tackle all these problems, and the list itself is non-exhaustive. We are happy to expand this list and collaborate with partners depending on opportunities.

Members

Emmanuel Rachelson (Chair holder, ISAE-SUPAERO professor)
Alexandre Albore (ONERA researcher)
David Bertoin (INSA assistant professor)

Hiring

We post our job offers here.