23rd Winter School on Mathematical Finance
Abstracts



Minicourses

Xin Guo: A control-theoretical perspective of continuous-time reinforcement learning

Reinforcement Learning (RL) is a cornerstone of modern machine learning, enabling agents to learn optimal decision-making through interaction with complex environments and other agents. While RL was traditionally developed for discrete-time settings, many real-world physical and financial systems are intrinsically continuous. This mini-course provides a comprehensive overview of recent advancements in continuous-time RL, analyzed through the rigorous lens of stochastic control theory. It consists of three parts.

Part I. Foundations: Q-Functions and Dynamic Programming
We begin by establishing a bridge between RL and classical control theory. Leveraging the Dynamic Programming Principle (DPP) and the Feynman-Kac formula, we explore the similarities between these fields, specifically through the lens of continuous-time Q-functions and martingale characterization.
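
For orientation, the link between the Feynman-Kac formula and a martingale characterization can be written out in the simplest policy-evaluation setting. The block below is a sketch under assumed notation (a fixed Markov policy, drift $b$, diffusion coefficient $\sigma$, running reward $r$, terminal reward $g$, no discounting, and enough regularity for the PDE to have a classical solution); none of these symbols is taken from the lectures or the cited papers.

```latex
% Assumed dynamics under a fixed policy: dX_s = b(X_s)\,ds + \sigma(X_s)\,dW_s.
% Value of the policy and the linear PDE it solves (Feynman-Kac):
V(t,x) = \mathbb{E}\Big[\int_t^T r(X_s)\,ds + g(X_T) \,\Big|\, X_t = x\Big],
\qquad
\partial_t V + b\cdot\nabla_x V
  + \tfrac12 \operatorname{tr}\big(\sigma\sigma^\top \nabla_x^2 V\big) + r = 0,
\quad V(T,x) = g(x).

% Equivalent martingale characterization on [0,T]:
M_t = V(t,X_t) + \int_0^t r(X_s)\,ds
    = \mathbb{E}\Big[\int_0^T r(X_s)\,ds + g(X_T) \,\Big|\, \mathcal{F}_t\Big].
```

The second identity shows why a martingale condition can serve as a learning target: it involves only observed rewards and states, not the coefficients $b$ and $\sigma$.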
References
[1] Gu, Guo, Wei, Xu (2020). Dynamic programming principle for mean field controls with learning. Operations Research, 2023. arXiv:1911.07314.
[2] Gu, Guo, Wei, Xu (2021). Mean-field controls with Q-learning for cooperative MARL: convergence and complexity analysis. SIAM Journal on Mathematics of Data Science. arXiv:2002.04131.
[3] Cheng, Guo, Zhang (2025). Bridging discrete and continuous RL: stable deterministic policy gradient with martingale characterization. arXiv:2509.23711.

Part II. The Theoretical Divide: Regret Analysis and BSDEs
To distinguish RL from classical control, we analyze "the cost of learning" through regret analysis. This section delves into the mathematical tools required to quantify performance gaps when the system model is unknown. We show how the regularity of Hamilton-Jacobi-Bellman (HJB) equations and Backward Stochastic Differential Equations (BSDEs) enables us to establish regret bounds. In the particular case of linear-quadratic (LQ) models, we present proofs of logarithmic regret in episodic finite-horizon RL via the regularity of the Riccati equation.
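
As a small illustration of the Riccati equation mentioned above, the following sketch solves a scalar finite-horizon LQ problem with known coefficients. The model ($dX_t = (aX_t + bu_t)\,dt + \sigma\,dW_t$ with cost $\mathbb{E}[\int_0^T (qX_t^2 + ru_t^2)\,dt + gX_T^2]$), the function name and the RK4 discretization are assumptions made for illustration; the sketch involves no learning and is not the algorithm analyzed in the cited papers.

```python
import math


def solve_riccati(a, b, q, r, g, horizon, n_steps=1000):
    """Integrate P' + 2 a P - (b^2 / r) P^2 + q = 0 backward from P(T) = g.

    Returns P on the uniform grid t_k = k * horizon / n_steps.
    The optimal feedback for the assumed model is u(t, x) = -(b / r) P(t) x.
    """
    def rhs(p):
        # dP/dt written as a function of P
        return -(2.0 * a * p - (b * b / r) * p * p + q)

    dt = horizon / n_steps
    p = [0.0] * (n_steps + 1)
    p[n_steps] = g
    for k in range(n_steps, 0, -1):
        y = p[k]
        # classical RK4 step with negative step size -dt
        k1 = rhs(y)
        k2 = rhs(y - 0.5 * dt * k1)
        k3 = rhs(y - 0.5 * dt * k2)
        k4 = rhs(y - dt * k3)
        p[k - 1] = y - dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


# Sanity check: for a = 0, b = q = r = 1, g = 0 the solution is P(t) = tanh(T - t).
riccati_path = solve_riccati(0.0, 1.0, 1.0, 1.0, 0.0, horizon=1.0)
```

In a learning setting the coefficients $a$ and $b$ would be replaced by estimates, and the smooth dependence of $P$ on them is what allows estimation error to be translated into a performance gap.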
References
[4] Basei, Guo, Hu, Zhang (2020). Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon. Journal of Machine Learning Research. arXiv:2006.15316.
[5] Guo, Hu, Zhang (2021). Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls. SIAM Journal on Control and Optimization (SICON). arXiv:2104.09311.

Part III. Modern Frontiers: Transfer Learning and Rough Paths
Finally, we explore how cutting-edge developments from Large Language Models (LLMs), such as transfer learning, can be integrated into RL to significantly boost algorithmic performance. We discuss fast policy learning for LQ control using entropy techniques to balance exploration and exploitation. We will also uncover how the regularity of rough paths provides a framework for analyzing policy transfer in continuous-time RL, offering a robust mathematical foundation for cross-domain adaptation.
References
[6] Guo, Li, Xu (2025). Fast policy learning for linear quadratic control with entropy regularization. SICON. ArXiv:2311.14168.
[7] Cao, Gu, Guo, Rosenbaum (2024). Transfer learning for portfolio optimization. arXiv:2307.13546.
[8] Guo, Lyu (2025). Policy transfer for continuous-time reinforcement learning: a (rough) differential equation approach. arXiv:2510.15165.
Slides: Slides 1

Christian Bayer: Rough volatility modelling

Rough volatility models are stochastic volatility models with a "rough" stochastic volatility process. In this context, rough means that the process behaves like fractional Brownian motion with Hurst index $0 < H < 1/2$. Rough volatility models have become popular in the last few years because they account for two consistent empirical observations:
[1] Realized variance on short time scales is much rougher than the asset price time series itself. When estimating Hölder/scaling coefficients, we typically see values significantly smaller than $1/2$.
[2] The implied volatility surface exhibits a singularity for very short maturities. Specifically, the derivative w.r.t. the log-moneyness variable (a.k.a. "skew") shows a power-law explosion as time-to-maturity goes to $0$.
As a consequence of this roughness, rough volatility models are neither semimartingales nor Markov processes, leading to considerable challenges in theoretical and numerical analysis.
In this mini-course, we will explore the path-regularity of realized variance time series of asset prices, and the empirical behavior of the implied volatility skew. This motivates the introduction of rough volatility models, specifically the rough Bergomi and rough Heston models. Using tools from large deviations theory, we show that rough volatility models exhibit power-law behavior of the implied volatility skew. Finally, we study Markovian approximations of rough volatility models from a theoretical and a numerical point of view.
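
To make the idea of a Markovian approximation concrete, the sketch below approximates the fractional kernel $K(t) = t^{H-1/2}/\Gamma(H+1/2)$ by a finite sum of exponentials, using the standard representation $K(t) = \int_0^\infty e^{-xt}\,c_H x^{-H-1/2}\,dx$ with $c_H = 1/(\Gamma(H+1/2)\Gamma(1/2-H))$. Each exponential term corresponds to one Ornstein-Uhlenbeck-type factor. The quadrature used here (a plain Riemann sum on a geometric grid) and all function names and grid parameters are assumptions chosen for simplicity; the cited papers study considerably more efficient choices of nodes and weights.

```python
import math


def fractional_kernel(t, hurst):
    """Fractional kernel K(t) = t^(H - 1/2) / Gamma(H + 1/2)."""
    return t ** (hurst - 0.5) / math.gamma(hurst + 0.5)


def exponential_factors(hurst, u_min=-30.0, u_max=10.0, step=0.5):
    """Nodes x_i and weights w_i such that K(t) ~ sum_i w_i * exp(-x_i * t).

    Obtained by substituting x = exp(u) in the integral representation
    and applying a Riemann sum in u on [u_min, u_max].
    """
    s = 0.5 - hurst  # the measure is c_H * x^(s - 1) dx = c_H * x^s du
    c = 1.0 / (math.gamma(hurst + 0.5) * math.gamma(s))
    n = int(round((u_max - u_min) / step)) + 1
    nodes = [math.exp(u_min + i * step) for i in range(n)]
    weights = [c * step * x ** s for x in nodes]
    return nodes, weights


def approximate_kernel(t, nodes, weights):
    """Evaluate the sum-of-exponentials approximation at time t > 0."""
    return sum(w * math.exp(-x * t) for x, w in zip(nodes, weights))


nodes, weights = exponential_factors(hurst=0.1)
relative_errors = [
    abs(approximate_kernel(t, nodes, weights) / fractional_kernel(t, 0.1) - 1.0)
    for t in (0.01, 0.1, 1.0)
]
```

With these defaults the approximation uses 81 factors; the accuracy deteriorates for times much smaller than those tested, since the kernel is singular at $t = 0$.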
Code: Florian Bourgey’s excellent adaptation of Jim Gatheral’s code, Link
References
[1] C. Bayer, P. K. Friz, M. Fukasawa, J. Gatheral, A. Jacquier, M. Rosenbaum. Rough volatility. SIAM (2023), Link
[2] A. Alfonsi and A. Kebaier. Approximation of Stochastic Volterra Equations with kernels of completely monotone type. Mathematics of Computation 93(346) (2024): 643-677, Link
[3] C. Bayer, S. Breneis. Markovian approximations of stochastic Volterra equations with the fractional kernel. Quantitative Finance 23.1 (2023): 53-70, Link
[4] C. Bayer, S. Breneis. Weak Markovian approximations of rough Heston. arXiv preprint arXiv:2309.07023 (2023), Link
[5] C. Bayer, S. Breneis. Efficient option pricing in the rough Heston model using weak simulation schemes. Quantitative Finance 24.9 (2024): 1247-1261, Link
Homework: Notebook1 Notebook2 Notebook3
Slides: Slides

Special invited lectures

Eduardo Abi Jaber: Path-Signatures: Memory and Stationarity

We explore the interplay between path-signatures, memory, and stationarity, highlighting their implications for machine learning, the representation of stochastic processes, and applications in mathematical finance. In the first part, we provide explicit series expansions for certain stochastic path-dependent integral equations in terms of the path signature of the time-augmented driving Brownian motion. Our framework encompasses a large class of stochastic linear Volterra and delay equations, and in particular the fractional Brownian motion with Hurst index $H \in (0, 1)$. Our expressions allow us to disentangle an infinite-dimensional Markovian structure. In addition, they open the door to: (i) straightforward and simple approximation schemes that we illustrate numerically, (ii) representations of certain Fourier-Laplace transforms in terms of a non-standard infinite-dimensional Riccati equation with important applications for pricing and hedging in quantitative finance. In the second part, we introduce a time-invariant version of the signature, the fading-memory signature, and establish powerful algebraic, analytic and probabilistic properties with applications to learning stationary relationships in time series. This is based on joint works with Paul Gassiat, Louis-Amand Gérard, Yuxing Huang, Dimitri Sotnikov.
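
For readers new to path signatures, the sketch below computes the first two signature levels of a piecewise-linear path, for example a time-augmented path $(t, x_t)$ sampled on a grid. It relies only on two standard facts: a straight segment with increment $\Delta$ has levels $\Delta$ and $\Delta \otimes \Delta / 2$, and segments are concatenated via Chen's identity. The function name and data layout are assumptions for illustration; the sketch does not cover the series expansions or the fading-memory signature discussed in the lecture.

```python
def signature_level2(points):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    points: sequence of d-dimensional points (tuples or lists).
    Returns (s1, s2), where s1[i] is the total increment of coordinate i and
    s2[i][j] is the iterated integral of dX^i followed by dX^j.
    """
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(points, points[1:]):
        inc = [c - p for p, c in zip(prev, cur)]
        # Chen's identity: new level 2 = old level 2
        #   + (old level 1) tensor (increment) + level 2 of the segment
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


# Time-augmented path (t, x_t) through x = 0, 1, 0 at times t = 0, 1, 2.
level1, level2 = signature_level2([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)])
```

The symmetric part of the second level is determined by the first level (the shuffle identity $S^{ij} + S^{ji} = S^i S^j$), so the genuinely new information at level 2 is the antisymmetric part, the Lévy area.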
References
[1] Abi Jaber, E., and Gérard, L.-A. Signature volatility models: pricing and hedging with Fourier. SIAM Journal on Financial Mathematics 16.2 (2025): 606-642, Link
[2] Abi Jaber, E., and Sotnikov, D. Exponentially Fading Memory Signature. arXiv:2507.03700 (2025), Link
Slides: Slides

Giorgia Callegaro: A stochastic Gordon-Loeb model for optimal cybersecurity investment under clustered attacks

We develop a continuous-time stochastic model for optimal cybersecurity investment under the threat of cyberattacks. The arrival of attacks is modeled using a Hawkes process, capturing the empirically relevant feature of clustering in cyberattacks. Extending the Gordon-Loeb model, each attack may result in a breach, with breach probability depending on the system's vulnerability. We aim to determine the optimal cybersecurity investment to reduce vulnerability. The problem is cast as a two-dimensional Markovian stochastic optimal control problem and solved using dynamic programming methods. Numerical results illustrate how accounting for attack clustering leads to more responsive and effective investment policies, offering significant improvements over static and Poisson-based benchmark strategies. Our findings underscore the value of incorporating realistic threat dynamics into cybersecurity risk management. This is joint work with C. Fontana, C. Hillairet and B. Ongarato.
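
The clustering effect of a Hawkes process can be seen in a short simulation. The sketch below uses Ogata's thinning algorithm with an exponential excitation kernel, so that the intensity is $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. The kernel choice, the parameter values and the function name are assumptions for illustration only; the abstract does not specify them, and the sketch models attack arrivals only, not breaches or the investment decision.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of an exponential-kernel Hawkes process on [0, horizon].

    mu: baseline intensity, alpha: jump in intensity per event,
    beta: decay rate of the excitation (alpha < beta for stationarity).
    """
    rng = random.Random(seed)
    t = 0.0
    excitation = 0.0  # current value of sum_i alpha * exp(-beta * (t - t_i))
    events = []
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for thinning.
        bound = mu + excitation
        wait = rng.expovariate(bound)
        t += wait
        if t > horizon:
            break
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability intensity(t) / bound.
        if rng.random() * bound <= mu + excitation:
            events.append(t)
            excitation += alpha
    return events


attack_times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=1000.0, seed=1)
```

With these assumed parameters the long-run average intensity is $\mu / (1 - \alpha/\beta) = 2$, twice the baseline rate, which is the sense in which past attacks make further attacks more likely.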
References
Callegaro, Giorgia, et al. "A stochastic Gordon-Loeb model for optimal cybersecurity investment under clustered attacks." arXiv preprint arXiv:2505.01221 (2025), Link
Slides: Slides

Short lectures

Josha Dekker: Stochastic optimal control with randomly arriving control moments

Control problems with randomly arriving control moments occur naturally. Financial situations in which control moments may arrive randomly include, e.g., asset-liquidity spirals and optimal hedging in illiquid markets. We develop methods and algorithms to analyze such problems in a continuous-time, finite-horizon setting, under mild conditions on the arrival process of control moments. Operating on the random timescale implied by the control moments, we obtain a discrete-time, infinite-horizon problem. This problem may be solved accordingly or suitably truncated to a finite-horizon problem. We develop a stochastic primal-dual simulation-and-regression algorithm that does not require knowledge of the transition probabilities, as these may not be readily available for such problems. To this end, we present a corresponding dual representation result. We also discuss some insights regarding choices of regression functions and sampling methods and illustrate their effect on the duality gaps. We then apply our methods to several examples, where we explore in particular the effect of randomly arriving control moments on the optimal control policies. Joint work with Roger J.A. Laeven, John G.M. Schoenmakers and Michel H. Vellekoop.

Guanyu Jin: Constructing Uncertainty Sets for Robust Risk Measures: A Composition of $\phi$-Divergences Approach to Combat Tail Uncertainty

Risk measures, which typically evaluate the impact of large losses, are highly sensitive to model misspecification in the tails. In this talk, we discuss a robust optimization approach to combat tail uncertainty by proposing a unifying framework to construct uncertainty sets for a broad class of risk measures, given a specified nominal model. Our framework is based on a parametrization of robust risk measures using two (or multiple) $\phi$-divergence functions, which enables us to provide uncertainty sets that are tailored to both the sensitivity of each risk measure to tail losses and the tail behavior of the nominal distribution. In addition, our formulation allows for a tractable computation of robust risk measures, and elicitation of the $\phi$-divergences that describe a decision maker's risk and ambiguity preferences. We illustrate and implement our results in several examples, including a newsvendor problem and a financial hedging problem.

Gijs Mast: A COS-tensor Framework for Credit Exposure Calculations

Monte Carlo (MC) simulation methods remain the predominant approach for computing credit exposures in the pricing and risk management practices of the financial industry, owing to their flexibility, implementation simplicity, and transparency. However, accurate computation of high-quantile exposure metrics for large portfolios remains time-consuming due to the intrinsically slow convergence of MC simulation. This paper introduces a novel "COS-tensor" framework that transcends MC simulation. It has the potential to serve as a computationally efficient alternative, particularly for large, liquid portfolios, while maintaining sufficient flexibility and transparency. Our key insight is that the problem can be transformed and solved in the Fourier domain through two steps: First, rather than generating total exposure samples as in MC methods, we numerically compute the characteristic function (ch.f.) of the total exposure and subsequently recover the cumulative distribution function via the one-dimensional COS method, see Fang and Oosterlee (2009). Second, to circumvent the curse of dimensionality in ch.f. computation, we apply tensor decomposition to the Fourier-cosine coefficient tensor of the joint density function. This "COS-tensor" approach constitutes a general framework that generates distinct dimensionally reduced cosine expansions for different tensor decomposition techniques, effectively shifting the curse of dimensionality to the offline training phase for tensor decomposition. The main part of this paper builds and studies the COS-CPD method, resulting from inserting low-rank Canonical Polyadic (CP) decomposition into the COS-tensor framework. A secondary innovation is our Fourier-domain training algorithm for offline CP decomposition, which demonstrates over 100-fold improvements in speed and accuracy compared to physical-domain backpropagation with gradient descent.
Extensive testing on realistically sized portfolios shows that achieving 0.1% error for portfolios of thousands of trades under seven risk factors requires only a fraction of the computation time of MC simulation. Results confirm our theoretical error analysis: exponential convergence with respect to Fourier-cosine terms and quadrature points at netting-set level, versus algebraic convergence at counterparty level. Notably, the computational performance remains largely unaffected as portfolio size increases, which is a stark contrast to MC methods. The computational bottleneck lies in the offline training, where the dimensionality curse for risk factors persists. This limitation, combined with diverse extension avenues, indicates substantial potential for further research. For instance, as we will demonstrate in a companion paper, using Tensor Train decomposition can markedly alleviate high-dimensional training constraints. We further note that existing instrument-level acceleration methods remain compatible with our framework, and for portfolios with numerous risk factors, the COS-tensor methods can serve as effective variance reduction techniques for MC simulation.
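
The first of the two steps described above, recovering a cumulative distribution function from a characteristic function with the one-dimensional COS method of Fang and Oosterlee (2009), can be sketched in a few lines. The function name, the truncation interval $[a, b]$ and the number of terms are assumptions for illustration, and the standard normal distribution stands in for the total exposure; the tensor decomposition step is not shown.

```python
import cmath
import math


def cos_cdf(char_fn, x, a, b, n_terms=64):
    """Approximate the CDF at x from the characteristic function char_fn.

    The density is expanded in a Fourier-cosine series on [a, b] with
    coefficients F_k = 2/(b-a) * Re(char_fn(u_k) * exp(-i u_k a)),
    u_k = k pi / (b - a), and the series is integrated term by term.
    """
    width = b - a
    # k = 0 term (weighted by 1/2), integrated from a to x
    total = 0.5 * (2.0 / width) * char_fn(0.0).real * (x - a)
    for k in range(1, n_terms):
        u = k * math.pi / width
        coeff = (2.0 / width) * (char_fn(u) * cmath.exp(-1j * u * a)).real
        # integral of cos(u * (y - a)) over y in [a, x]
        total += coeff * math.sin(u * (x - a)) / u
    return total


def normal_char_fn(u):
    """Characteristic function of the standard normal distribution."""
    return cmath.exp(-0.5 * u * u)


cdf_at_quantile = cos_cdf(normal_char_fn, 1.96, a=-10.0, b=10.0)
exact_cdf = 0.5 * (1.0 + math.erf(1.96 / math.sqrt(2.0)))
```

For smooth densities the error decays exponentially in the number of cosine terms, which is the convergence behavior the abstract contrasts with the slow convergence of MC simulation.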

Marco Zullino: Dynamic star-shaped risk measures via BSDEs

In this talk, we present characterization results for dynamic return and star-shaped risk measures induced by backward stochastic differential equations (BSDEs). We begin by characterizing a broad family of static star-shaped functionals in a locally convex Fréchet lattice. Then, using the Pasch-Hausdorff envelope, we construct a suitable family of convex drivers of BSDEs, which induce a corresponding family of dynamic convex risk measures. The dynamic return and star-shaped risk measures arise as their essential minimum. Moreover, we prove that if the set of star-shaped supersolutions of a BSDE is non-empty, then for each terminal condition there exists at least one convex BSDE with a non-empty set of supersolutions, yielding the minimal star-shaped supersolution. Finally, we illustrate our theoretical results with a few examples.

Poster presentations

  • Cyril Nefzaoui Blanchard (University of Evry-Paris-Saclay): Deep BSDE method for Quantile Hedging
  • Linn Engström (KTH Royal Institute of Technology): Computation of Robust Option Prices via Structured Multimarginal Martingale Optimal Transport
  • Diogo Sousa Franquinho (University of Lisbon): Neural network empowered liquidity pricing in a two-price economy under conic finance settings
  • Markus Karl (The London School of Economics and Political Science): Reconstructing Financial Networks Under Netting Constraints
  • Ivo Richert (Kiel University): Estimation of dynamically recalibrated affine models in finance
  • Truong Nguyen (Utrecht University): Single- and Multilevel Fourier–RQMC Methods for Estimating Multivariate Shortfall Risk
  • Laura Voss (CAU Kiel): Data-Driven Optimal Stopping of Multidimensional Diffusions via Non-Parametric Estimation
  • Chaorui Wang (University of Bath): A measure-valued HJB perspective on Bayesian optimal adaptive control
