Evangelos Kanoulas

Professor

University of Amsterdam, Informatics Institute

Lab42 Room: L5.57

Science Park 904, 1098 XH Amsterdam, The Netherlands

Email: e dot kanoulas at uva.nl | https://staff.fnwi.uva.nl/e.kanoulas/

Biography

I am a professor of Artificial Intelligence at the University of Amsterdam, where my research focuses on Information Retrieval, Language Technology, and Evaluation. I work on developing AI systems that retrieve, reason, interact, and generate in more autonomous and reliable ways, and I lead research on robustness, multimodal information access, and evaluation science. My contributions span conversational and agentic information access, adversarial robustness for retrieval models, speech- and vision-based retrieval, and new approaches to evaluating both retrieval and generative systems. I also co-lead several international benchmarking efforts, including TREC and CLEF, and collaborate regularly with industry on applied AI problems.

Beyond academia, I am involved in several entrepreneurial initiatives. Through LYDS.AI, I work with organizations on applied AI projects that require deep technical expertise. I co-founded Calli Labs, a joint venture focused on AI-driven tools for video content analysis and creation. Previously, I co-founded Ellogon.AI, a startup dedicated to improving cancer patient treatment. These activities reflect my broader interest in connecting research with real-world applications across domains such as media, diagnostics, and public-sector innovation.

Research Interests

  • Information Retrieval
  • Agentic AI
  • Evaluation Methods
  • Language Technology
  • Recommender Systems

Education

  • Ph.D., Computer Science, 2004-2009
    Northeastern University, Boston, MA, USA
    Dissertation: Building Reliable Test and Training Collections in Information Retrieval
  • M.Sc., Computer Science, 2002-2004
    Northeastern University, Boston, MA, USA
    Specialization: Database Systems
  • B.Sc., Applied Informatics, 1998-2002
    University of Macedonia, Thessaloniki, Greece
    Specialization: Database Systems

Research Pillars

Agent-Centric Information Retrieval

As large language models become increasingly specialized, information access is evolving from retrieving documents to orchestrating networks of intelligent agents. My research explores this transition: not only how agents use retrieval as a tool, but how retrieval itself becomes the mechanism for identifying the right agents, coordinating them, and enabling structured reasoning across their outputs.

This vision is articulated in Agent-Centric Information Access, which proposes a future ecosystem of millions of expert LLMs—each with unique, domain-specific capabilities. In such a system, answering a query requires dynamically estimating expertise, retrieving the most relevant agents, querying them efficiently, and synthesizing their responses. The paper introduces the first scalable evaluation framework for this setting, enabling experimentation with thousands of synthetic expert agents generated via clustering and retrieval-augmented generation.
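
A minimal sketch of the routing step (in Python with numpy; the agent names, profiles, and data below are illustrative, not taken from the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy expertise profiles: one embedding per expert agent (hypothetical).
    agent_names = ["bio-agent", "law-agent", "code-agent", "geo-agent"]
    agent_embs = rng.normal(size=(4, 64))
    agent_embs /= np.linalg.norm(agent_embs, axis=1, keepdims=True)

    def route(query_emb, k=2):
        """Return the k agents whose expertise best matches the query."""
        q = query_emb / np.linalg.norm(query_emb)
        scores = agent_embs @ q                      # cosine similarity
        top = np.argsort(-scores)[:k]
        return [(agent_names[i], float(scores[i])) for i in top]

    query_emb = rng.normal(size=64)
    print(route(query_emb, k=2))
    # Each selected agent would then be queried and its answers synthesized.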

To make agent-centric retrieval effective, however, agents must operate over reasoning-friendly representations. Complex queries involving negation, sets, and logical operators cannot be handled reliably by today's dense retrieval systems. In A Comprehensive Taxonomy of Negation for NLP and Neural Retrievers (preprint), we formalize the different forms of negation encountered in natural language and map out where current neural retrievers systematically break. Complementing this, Constructing Set-Compositional and Negated Representations for First-Stage Ranking proposes new representation-learning strategies that encode set operations and negation directly into embeddings—laying the representational foundations necessary for logical and compositional reasoning.
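
One illustrative strategy, not necessarily the paper's method, is to remove the negated concept's component from the query embedding at query time (Python with numpy; the encoder below is a random stand-in, so the demo shows the mechanics only):

    import numpy as np

    def encode(text, dim=64):
        # Stand-in encoder: deterministic-within-a-run random vectors.
        # A real sentence encoder would replace this.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.normal(size=dim)
        return v / np.linalg.norm(v)

    def negated_query(positive, negated, alpha=1.0):
        q, n = encode(positive), encode(negated)
        q = q - alpha * (q @ n) * n      # project out the negated concept
        return q / np.linalg.norm(q)

    # "jaguar NOT car": with a real encoder, car-related passages should
    # lose score after the projection.
    q = negated_query("jaguar", negated="car")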

These representational advances connect directly to agent-centric retrieval: agents operating over reasoning-aware embeddings can interpret complex queries, delegate sub-tasks, and coordinate through structured operations, making them far more capable collaborators.

Reasoning also requires agents to decide how to search. In Query Decomposition for RAG: Balancing Exploration–Exploitation, we study how LLMs break down multi-faceted queries into sub-queries that balance known relevant directions with the need to explore new ones. Decomposition naturally supports multi-agent retrieval pipelines, where different sub-queries can be routed to highly specialized expert agents.
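
A hedged sketch of one such trade-off (an MMR-style greedy selection in Python with numpy; illustrative, not the paper's algorithm): each candidate sub-query is scored by its expected relevance minus its redundancy with sub-queries already chosen.

    import numpy as np

    def select_subqueries(cand_embs, relevance, k=3, lam=0.7):
        """Greedy selection: lam weights exploitation (relevance),
        (1 - lam) weights exploration (novelty vs. already selected)."""
        selected, remaining = [], list(range(len(cand_embs)))
        while remaining and len(selected) < k:
            def score(i):
                novelty = 0.0 if not selected else max(
                    float(cand_embs[i] @ cand_embs[j]) for j in selected)
                return lam * relevance[i] - (1 - lam) * novelty
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

    # Toy unit-length embeddings: candidates 0 and 1 are near-duplicates,
    # so the second pick explores the orthogonal direction instead.
    cands = np.array([[1.0, 0.0], [0.99, 0.11], [0.0, 1.0]])
    print(select_subqueries(cands, relevance=[0.9, 0.85, 0.6], k=2))  # [0, 2]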

Finally, retrieval becomes an action within a broader agentic decision-making process. ChatR1: Reinforcement Learning for Conversational Reasoning and Retrieval-Augmented Question Answering demonstrates how reinforcement learning can train agents to choose when to issue retrieval queries, how to integrate retrieved evidence, and how to conduct multi-turn reasoning with external knowledge sources.
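
The core training signal can be caricatured with a two-action toy (Python with numpy; the environment and rewards are mocked, so this illustrates only a REINFORCE-style update, not ChatR1 itself):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                  # logits for [ANSWER, SEARCH]

    def policy():
        p = np.exp(theta - theta.max())
        return p / p.sum()

    for step in range(500):
        p = policy()
        a = rng.choice(2, p=p)           # 0 = answer now, 1 = search first
        # Mock environment: searching first succeeds more often.
        reward = float(rng.random() < (0.9 if a == 1 else 0.4))
        grad = -p
        grad[a] += 1.0                   # gradient of log pi(a) wrt logits
        theta += 0.1 * reward * grad     # REINFORCE update
    print(policy())                      # probability mass shifts to SEARCH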

Together, these works outline a cohesive agenda: to transform information retrieval into an agentic ecosystem where reasoning-aware representations, expert-agent retrieval, and retrieval-as-action combine to support scalable, collaborative multi-agent intelligence.

Robustness & Reliability of Neural Retrieval Systems

Neural retrieval models have become the dominant paradigm in modern search, yet they remain surprisingly brittle. My research investigates why these models fail when exposed to real-world conditions—and how we can build retrieval systems that remain accurate under noise, distribution shift, and adversarial pressure.

A first source of fragility arises from natural perturbations found in realistic data pipelines. In speech-based retrieval, for example, automatic speech recognition (ASR) injects systematic errors that degrade dense retriever performance. I explore this through multimodal and multilingual settings in Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering and in the FIRE shared task report Findings of Shared Task on Spoken Query Cross-Lingual Information Retrieval for the Indic Languages at FIRE 2024. Similarly, OCR errors and irregular linguistic forms in historical documents expose weaknesses in standard text models, as shown in Too Young to NER: Improving Entity Recognition on Dutch Historical Documents.

Even small textual perturbations, such as typos, can lead to dramatic retrieval failures. To address this, Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning demonstrates how noise-aware contrastive training can significantly harden dense retrievers against minor surface variations.
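
A hedged sketch of one common multi-positive formulation (Python with numpy; the exact loss in the paper may differ): typo-perturbed variants of a query are all treated as positives for the relevant passage.

    import numpy as np

    def multi_positive_infonce(q_embs, d_emb, neg_embs, tau=0.05):
        """q_embs: clean query plus typo variants (the multiple positives);
        d_emb: relevant passage; neg_embs: irrelevant passages."""
        def sim(a, b):
            return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        losses = []
        for q in q_embs:
            pos = np.exp(sim(q, d_emb) / tau)
            neg = sum(np.exp(sim(q, n) / tau) for n in neg_embs)
            losses.append(-np.log(pos / (pos + neg)))
        return float(np.mean(losses))

    rng = np.random.default_rng(0)
    d = rng.normal(size=32)
    queries = [d + 0.1 * rng.normal(size=32) for _ in range(3)]  # clean + typos
    negatives = [rng.normal(size=32) for _ in range(8)]
    print(multi_positive_infonce(queries, d, negatives))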

Beyond natural noise, modern retrieval models also suffer from adversarial vulnerabilities, particularly in the form of corpus poisoning. Unlike traditional classifiers, retrieval systems are vulnerable not only through malicious queries, but also through manipulations of the indexed collection itself. My work in Unsupervised Corpus Poisoning Attacks in Continuous Space for Dense Retrieval and Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval reveals that subtle manipulations in embedding space can systematically redirect retrieval toward attacker-selected documents, even without query access or supervision. These studies highlight a significant, underexplored threat model for real-world search systems.
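
The flavor of the continuous-space attack can be sketched as projected gradient ascent on a document embedding (Python with numpy; the budget, step size, and data are illustrative, and a real attack must also map the perturbed embedding back to text):

    import numpy as np

    rng = np.random.default_rng(0)
    queries = rng.normal(size=(16, 64))  # target queries the attacker covers
    doc = rng.normal(size=64)            # embedding of the attacker's document
    centroid = queries.mean(axis=0)

    budget, lr = 2.0, 0.1
    delta = np.zeros(64)
    for _ in range(100):
        grad = centroid - (doc + delta)  # ascent on -0.5 * ||c - (d + delta)||^2
        delta += lr * grad
        norm = np.linalg.norm(delta)
        if norm > budget:                # project back onto the L2 ball
            delta *= budget / norm

    print(np.linalg.norm(centroid - doc),            # distance before
          np.linalg.norm(centroid - (doc + delta)))  # distance after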

Together, these works motivate a broader scientific agenda: to develop benchmarks, simulation frameworks, adversarial testbeds, and robust training methodologies that ensure retrieval models behave reliably under realistic and hostile conditions.

Going forward, I aim to build unified robustness evaluations spanning typos, OCR/ASR noise, historical document variability, paraphrasing, and adversarial perturbations, as well as semantically challenging cases such as negation and set-based compositional queries. This direction supports a long-term vision of retrieval systems that remain faithful, stable, and interpretable, even when deployed in messy, imperfect, multimodal, or adversarial environments.

Modeling, Quantifying, and Acting on Uncertainty in Retrieval

Even the most advanced neural retrievers operate under inherent uncertainty: uncertainty about the query's intent, the relevance of retrieved documents, and the reliability of the model's own internal representations. My work in this pillar seeks to answer two fundamental questions: How can we formally quantify uncertainty in retrieval models? How should retrieval systems act when uncertainty cannot be eliminated?

A first step is establishing what uncertainty means in Retrieval-Augmented Generation (RAG) and dense retrieval pipelines. In Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis, we show that many popular uncertainty estimation methods fail even basic axiomatic criteria when applied to retrieval. This work demonstrates that uncertainty in RAG is fundamentally different from uncertainty in classification and calls for new, retrieval-centric foundations.
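
A small illustration of the gap (Python, standard library; data are made up): predictive entropy over sampled answers depends only on the generator's outputs, so it cannot distinguish confident answers grounded in good evidence from confident answers produced despite poor retrieval.

    from collections import Counter
    import math

    def answer_entropy(sampled_answers):
        counts = Counter(sampled_answers)
        n = len(sampled_answers)
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # The estimate is identical whether retrieval surfaced strong evidence
    # or none at all, since retrieved passages never enter the computation.
    print(answer_entropy(["Paris", "Paris", "Lyon", "Paris"]))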

Understanding retrieval uncertainty also requires studying how systems behave in constrained or perturbed environments. Multivariate Dense Retrieval: A Reproducibility Study Under a Memory-Limited Setup investigates how multivariate formulations of dense retrieval degrade or stabilize under strict memory constraints. This work reveals that uncertainty is shaped not only by data and models but also by the practical resource limits common in real-world deployments.

When uncertainty cannot be fully resolved internally, retrieval systems must interact with users to reduce it. This motivates a line of research on clarifying questions, aiming to close the gap between ambiguous user intent and system predictions. In Corpus-Informed Retrieval Augmented Generation of Clarifying Questions, we propose a retrieval-guided, evidence-grounded method for generating clarifying questions that reduce ambiguity before answer generation.
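
A hedged sketch of the decision to clarify (Python with numpy; the threshold, facet labels, and embeddings are illustrative, not the paper's pipeline): if the top retrieved documents are dispersed across distinct facets, surface those facets in a question.

    import numpy as np

    def maybe_clarify(query, doc_embs, doc_facets, threshold=0.5):
        """doc_embs: unit-normalized embeddings of top retrieved docs;
        doc_facets: a facet label per doc (assumed given here)."""
        sims = doc_embs @ doc_embs.T
        pairwise = sims[np.triu_indices(len(doc_embs), k=1)]
        dispersion = 1.0 - pairwise.mean()
        if dispersion > threshold:
            facets = ", ".join(dict.fromkeys(doc_facets))  # unique, ordered
            return f"For '{query}', did you mean: {facets}?"
        return None

    embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.96, 0.28]])
    print(maybe_clarify("jaguar", embs, ["the animal", "the car", "the animal"]))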

This builds upon a substantial body of user-centric research examining how people perceive, respond to, and benefit from clarification in search interactions. Through controlled user studies—A User Study on Asking Clarifying Questions in Web Search and Asking Clarifying Questions: To Benefit or to Disturb Users in Web Search?—we investigate when clarification helps, when it harms, and how users interpret system uncertainty. In Users Meet Clarifying Questions: Toward a Better Understanding of User Interactions for Search Clarification, we further show that effective clarification requires aligning question generation with user expectations, cognitive load, and interaction flow.

Taken together, these works outline an emerging research agenda: to build retrieval systems that are not only robust and accurate, but also self-aware—capable of estimating their own uncertainty, recognizing when they do not know enough, and taking principled actions (such as asking clarifying questions) to reduce ambiguity. This pillar connects formal axiomatic analysis, empirical reproducibility, user behavior studies, and interactive retrieval to move toward retrieval systems that reason about their own uncertainty and collaborate with users to overcome it.