Herke van Hoof is currently assistant professor at the University of Amsterdam in the Netherlands, where he is part of the Amlab. He is interested in reinforcement learning with structured data and prior knowledge. Reinforcement learning is a very general framework, but this generality tends to make algorithms extremely data-hungry. Exploiting structured prior knowledge, or using value function or policy parametrizations that respect known structural properties, is a promising avenue to learn more with less data. Examples of this line of work include reinforcement learning (RL) for combinatorial optimisation, RL with symbolic prior knowledge, and equivariant RL.

Before joining the University of Amsterdam, Herke van Hoof was a postdoc at McGill University in Montreal, Canada, where he worked with Professors Joelle Pineau, Dave Meger, and Gregory Dudek. He obtained his PhD at TU Darmstadt, Germany, under the supervision of Professor Jan Peters, graduating in November 2016. He received his bachelor's and master's degrees in Artificial Intelligence at the University of Groningen in the Netherlands.

Recent news

  • AAMAS paper accepted! (2/8/2024)

    Our paper on Uncoupled Learning of Differential Stackelberg Equilibria with Commitments has been accepted for publication at AAMAS. Congratulations, Robert and Mert!

    The preprint is accessible here.

  • PhD positions on machine learning for fintech (1/26/2024)

    At Amlab, we have two open PhD positions on machine learning in the fintech domain (in collaboration with Adyen). One student will work with Sara Magliacane on causal machine learning, and the other will work with me on reinforcement learning. The deadline for applications is March 11th. For all details and instructions on how to apply, please see the official vacancy.

  • PhD position on reinforcement learning for controlling critical infrastructure (11/27/2023)

    We are looking for a PhD candidate to work on several fundamental questions that must be addressed before AI methods can support human operators in critical infrastructure. For example, can machine learning tools be combined with conventional optimisers to improve safety and data efficiency? Can complex, structured decisions be made jointly by a human operator and an artificial agent? Can such algorithms deal with hierarchies in decision making, and how can decisions or models be explained and verified? Full details and instructions to apply can be found on this page.

An archive of news items can be found on the News page.

Highlighted publications
Kuric, David; van Hoof, Herke: Reusable Options through Gradient-based Meta Learning. In: Transactions on Machine Learning Research, vol. 03/2023, 2023. (Journal Article)
Gagrani, Mukul; Rainone, Corrado; Yang, Yang; Teague, Harris; Jeon, Wonseok; van Hoof, Herke; Zeng, Weiliang Will; Zappi, Piero; Lott, Christopher; Bondesan, Roberto: Neural Topological Ordering for Computation Graphs. In: Advances in Neural Information Processing Systems, 2022. (Proceedings Article)
van der Pol, Elise; van Hoof, Herke; Oliehoek, Frans; Welling, Max: Multi-Agent MDP Homomorphic Networks. In: Proceedings of the International Conference on Learning Representations, 2022. (Proceedings Article)
Kool, Wouter; van Hoof, Herke; Welling, Max: Estimating Gradients for Discrete Random Variables by Sampling without Replacement. In: International Conference on Learning Representations, 2020. (Proceedings Article)
Smith, M.; van Hoof, H.; Pineau, J.: An Inference-Based Policy Gradient Method for Learning Options. In: International Conference on Machine Learning, pp. 4703-4712, 2018. (Proceedings Article)
van Hoof, H.; Neumann, G.; Peters, J.: Non-parametric Policy Search with Limited Information Loss. In: Journal of Machine Learning Research, vol. 18, no. 73, pp. 1-46, 2017. (Journal Article)