✉️

About

PhD Candidate

Coordinator of computer science courses LAMSADE, Paris-Dauphine University

ENSAE Paris

Coordinator of computer science courses LAMSADE, Paris-Dauphine University

ENSAE Paris

I am a PhD candidate under the supervision of Rida Laraki, Laurent Gourvès and Guillaume Vigeral since 2019 at Paris-Dauphine University. I work at the intersection of computer science and game theory. I am currently focused on algorithm to perform learning in (stochastic) games. Prior to my PhD, I worked on programming languages and software verification.

I am also coordinator of computer science courses at ENSAE.

I am also coordinator of computer science courses at ENSAE.

## Learning in Stochastic Games

Strategic Behavior and No-Regret Learning in Queueing Systems

Feb 2023

This paper studies a dynamic discrete-time queuing model where at every period players get a new job and must send all their jobs to a queue that has a limited capacity. Players have an incentive to send their jobs as late as possible; however if a job does not exit the queue by a fixed deadline, the owner of the job incurs a penalty and this job is sent back to the player and joins the queue at the next period. Therefore, stability, i.e. the boundedness of the number of jobs in the system, is not guaranteed. We show that if players are myopically strategic, then the system is stable when the penalty is high enough. Moreover, if players use a learning algorithm derived from a typical no-regret algorithm (exponential weight), then the system is stable when penalties are greater than a bound that depends on the total number of jobs in the system.

Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Nov 2022

Recent extensions to dynamic games (Leslie et al. [2020], Sayin et al. [2020], Baudin and Laraki [2022]) of the well-known fictitious play learning procedure in static games were proved to globally converge to stationary Nash equilibria in two important classes of dynamic games (zero-sum and identical-interest discounted stochastic games). However, those decentralized algorithms need the players to know exactly the model (the transition probabilities and their payoffs at every stage). To overcome these strong assumptions, our paper introduces regularizations of the systems in Leslie et al. [2020], Baudin and Laraki [2022] to construct a family of new decentralized learning algorithms which are model-free (players don’t know the transitions and their payoffs are perturbed at every stage). Our procedures can be seen as extensions to stochastic games of the classical smooth fictitious play learning procedures in static games (where the players best responses are regularized, thanks to a smooth strictly concave perturbation of their payoff functions). We prove the convergence of our family of procedures to stationary regularized Nash equilibria in zero-sum and identical-interest discounted stochastic games. The proof uses the continuous smooth best-response dynamics counterparts, and stochastic approximation methods. When there is only one player, our problem is an instance of Reinforcement Learning and our procedures are proved to globally converge to the optimal stationary policy of the regularized MDP. In that sense, they can be seen as an alternative to the well known Q-learning procedure.

Fictitious Play and Best-Response Dynamics in Identical Interest and Zero Sum Stochastic Games

Jul 2022

This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model called discounted stochastic game (Shapley, 1953). Our family of discrete-time FP procedures is proven to converge to the set of stationary Nash equilibria in identical interest discounted stochastic games. This extends similar convergence results for static games (Monderer & Shapley, 1996a). We then analyze the continuous-time counterpart of our FP procedures, which include as a particular case the best-response dynamic introduced and studied by Leslie et al. (2020) in the context of zero-sum stochastic games. We prove the convergence of this dynamics to stationary Nash equilibria in identical-interest and zero-sum dis- counted stochastic games. Thanks to stochastic approximations, we can infer from the continuous- time convergence some discrete time results such as the convergence to stationary equilibria in zero-sum and team stochastic games (Holler, 2020).

Fictitious Play and Best-Response Dynamics in Identical Stochastic Games

Nov 2021

This paper combines ideas from Q-learning and fictitious play, to define three reinforcement learning procedures which converge to the set of stationary mixed Nash equilibria in identical interest discounted stochastic games. First, we analyse three continuous-time systems that generalize the best-response dynamics defined by Leslie et al. for zero-sum discounted stochastic games. Under some assumptions depending on the system, the dynamics are shown to converge to the set of stationary equilibria in identical interest discounted stochastic games. Then, we introduce three analog discrete-time procedures in the spirit of Sayin et al. and demonstrate their convergence to the set of stationary equilibria using our results in continuous time together with stochastic approximation techniques. Some numerical experiments complement our theoretical findings.

## Verification

Disornamentation

Sep 2018

Ornaments, which have recently been put in the spotlight, are a way to describe changes in datatype definitions, reorganizing, adding, or dropping some pieces of data. Ornamentation is the process of translating code operating on the original datatype to code operating on the new one. A formalization and an implementation of ornaments has been proposed for the ML family of languages. Our work focuses on the opposite transformation, called disornamentation. We generalize the ornamentation framework developed for ML and based on a posteriori abstraction so that both ornamentation and disornamentation become instances of this framework, allowing more expressive relational transformations of datatypes. We adapt the ornamentation prototype to support such bidirectional transformations and use it to present several typical examples using disornamentation or a combination of ornamentation and disornamentation.

Deductive Verification with the Help of Abstract Interpretation

2017

Abstract interpretation is widely used in static analysis tools. However, to the best of our knowledge, it has not been extensively experimented in an interactive deductive verification context. This paper describes an analysis by abstract interpretation to infer loop invariants in the Why3 tool. The analysis takes advantage of the logic annotations present in the source code, including potential handwritten loop invariants. The inferred invariants are added to loops as if written by the user. They are checked by external provers and thus the analysis by abstract interpretation does not need to be trusted. Our analysis goes beyond numerical invariants. We describe two functors that add uninterpreted functions and quantifiers to a numerical abstract domain. The resulting domain is generic enough to reason about algebraic types, Booleans, or arrays. We show that it achieves a level of expressivity comparable to that of ad-hoc domains for arrays found in the literature.

## Presentations

HEC

January 31st 2023

In this presentation I showed our two papers about Fictitious Play and Smooth Fictitious play in stochastic games.