$ whoami

Carel van Niekerk

Machine Learning Researcher & Engineer

I build and align large language models — and the tools that train them.

Düsseldorf, Germany

about

Research scientist and engineer specialising in reinforcement learning post-training and LLM alignment. I work on model trustworthiness, uncertainty quantification, and hallucination reduction — bridging rigorous mathematical theory and production-level engineering through scalable, reliable, and modular training frameworks.

I hold a PhD (magna cum laude) in computer science from Heinrich Heine University Düsseldorf, where I also worked as a postdoctoral researcher on reinforcement learning post-training and LLM alignment. With 10+ years across academia and industry, my work spans publications at NeurIPS, ACL, and EMNLP — and the open-source tooling that made them possible.

def research_themes():

Agentic & Tool-Augmented Systems

Reinforcement learning and multi-agent coordination for long-horizon decision making in agentic dialogue and tool-using systems.

Alignment-Oriented Post-Training

Reinforcement learning methods for aligning large language models using intrinsic and self-supervised reward signals, reducing reliance on external human preference data.

Scalable Research Infrastructure

Reproducible, configurable, and distributed training systems enabling rapid experimentation across HPC and cloud environments.

Uncertainty-Aware Reasoning

Bayesian and distributional methods for uncertainty estimation, calibration, and robustness — applied to trustworthy and controllable AI systems.

highlights

$ tail -n 4 ~/highlights.log

Oct 2025

Paper accepted at NeurIPS 2025

"Less is More: Local Intrinsic Dimensions of Contextual Language Models" uses the geometry of contextual embeddings to study LLM training dynamics and generalisation.

Jul 2025

RLSF preprint released

"Post-Training Large Language Models via Reinforcement Learning from Self-Feedback" uses the model's own confidence as an intrinsic reward, with no human feedback required.

Jul 2025

CAMELL published in TACL & presented at ACL 2025

"A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction" introduces confidence-driven active learning and label validation for sequential multi-output tasks.

Apr 2024

PhD awarded (magna cum laude)

Dissertation "Uncertainty Estimation, Management, and Utilisation in Human-Computer Dialogue" defended at Heinrich Heine University Düsseldorf.

experience

$ git log --oneline --career

bd93f9 (HEAD -> now)

Postdoctoral Researcher @ Heinrich Heine University

Mar 2024 → Jun 2026 · Düsseldorf, Germany
- Agentic Multi-Agent Reinforcement Learning — Led the development of a MARL framework for telephonic dialogue systems, enabling coordinated decision-making between router and expert agents with explicit credit assignment — improving routing accuracy by over 15 percentage points in a production-level dialogue product.
- HydraXcel — Sole developer and maintainer of an open-source, configuration-driven deep learning experiment launcher integrating Hydra, Hugging Face Accelerate, and the UV workflow — enabling seamless, scalable multi-GPU and distributed training for the research team.
- HPC & Cloud Training Infrastructure — Designed Hydra launcher plugins for transparent experiment execution on SLURM-managed HPC clusters and SkyPilot-orchestrated cloud platforms, enabling high-throughput experimentation and rapid switching between compute backends without code changes.
- Academic Leadership — Supervised multiple Master's theses on MARL and task-oriented dialogue. Designed and taught the "Implementing Transformers" course — building the Attention Is All You Need architecture from first principles in PyTorch — achieving a 95% course pass rate.
ff79c6

PhD Candidate @ Heinrich Heine University

Jul 2019 → Mar 2024 · Düsseldorf, Germany
- Uncertainty-Aware Decision Making — Developed computationally efficient uncertainty quantification methods for intent classification in collaboration with Yandex Research. Integrated uncertainty features into RL policies, improving real-user interaction success by 5 percentage points, and designed an active learning strategy that matched full-dataset performance using only 16% of expert annotations.
- ConvLab-3 Dialogue Systems Toolkit — Core developer of a large-scale dialogue system toolkit in collaboration with Tsinghua University and Microsoft Research. Architected a unified data format enabling seamless integration of heterogeneous datasets and models — adopted in 30+ research papers spanning RL- and LLM-based dialogue agents.
- YRRSDS 2022 Co-organiser — Co-organised the Young Researchers Roundtable on Spoken Dialogue Systems, collocated with SIGDIAL in Edinburgh — managing digital infrastructure, branding, and sponsorship acquisition.
8be9fd

AI Applications Consultant @ NGA Risksecure

Jun 2018 → May 2019 · Pretoria, South Africa
- Named Entity Sentiment Analysis — Co-developed a sentiment scoring system for news-based entities, delivering reliable quantitative metrics to banking clients at under 70% of the cost of manual analysis.
- Multimodal Computer Vision — Built a proof-of-concept application combining visual and sensor data to monitor greenhouse plant health for a CBD producer in Southern Africa.

projects

HydraXcel

Configuration-driven deep learning experiment launcher

Open-source experiment launcher unifying Facebook Hydra, Hugging Face Accelerate, and the UV workflow. One config launches anything from a local debug run to multi-GPU distributed training on SLURM clusters or SkyPilot-managed cloud — no code changes between backends.

Python
Hydra
Accelerate
UV
SLURM
SkyPilot

RLSF

Reinforcement Learning from Self-Feedback

Post-training method that uses a language model's own confidence as an intrinsic reward signal — aligning LLMs and improving calibration and reasoning without external human preference labels.

PyTorch
TRL
Transformers

coming soon

Agentic RL

Multi-agent RL for long-horizon agentic systems

Coordinated decision-making between router and expert agents with explicit credit assignment. Write-up in progress.

MARL
LangGraph

publications

selected work — NeurIPS · ACL · EMNLP · TACL

arXiv 2026

Post-Training Large Language Models via Reinforcement Learning from Self-Feedback

Carel van Niekerk, Renato Vukovic, Benjamin Matthias Ruppik, Hsien-chin Lin, Milica Gašić

TACL 2025

A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić

NeurIPS 2025

Less is More: Local Intrinsic Dimensions of Contextual Language Models

Benjamin Matthias Ruppik, Julius von Rohrscheidt, Carel van Niekerk, Michael Heck, Renato Vukovic, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Bastian Rieck, Marcus Zibrowius, Milica Gašić

SIGDIAL 2020 · Best Paper Award

TripPy: A Triple Copy Strategy for Value Independent Neural Dialog State Tracking

Michael Heck, Carel van Niekerk, Nurul Lubis, Christian Geishauser, Hsien-Chin Lin, Marco Moresi, Milica Gašić

education & skills

class Education:

BSc in Actuarial and Financial Mathematics

University of Pretoria · 2013–2015 · Pretoria, South Africa

BSc (Hons) in Mathematical Statistics

University of Pretoria · 2016 · Pretoria, South Africa

MSc in Mathematical Statistics

University of Pretoria · 2017–2018 · Pretoria, South Africa

Statistical learning, data analytics and visualisation.

PhD in Computer Science

Heinrich Heine University · 2019–2024 · Düsseldorf, Germany

Magna cum laude. Thesis: Uncertainty Estimation, Management, and Utilisation in Human-Computer Dialogue.

class Skills:

Research

Reinforcement Learning (RLHF / intrinsic feedback)
Uncertainty Quantification
Self-supervised Learning
LLM Evaluation & Benchmarking
Human-in-the-loop
Model Debugging
Distribution Theory

Deep Learning

PyTorch
Transformers
TRL
Accelerate
Datasets

Agentic Systems & LLM APIs

LangGraph
DeepEval
OpenAI API
Vertex AI

Programming

Python (Advanced)
C++
Rust
JavaScript / TypeScript
SQL
Bash / Zsh
MyPy / Ty

Infrastructure & Cloud

DeepSpeed
SLURM
SkyPilot
Hydra
Distributed Training
Docker
Google Cloud / Cloud Run
Microsoft Azure
MongoDB

Engineering

Design Patterns
PyTest
Ruff
FastAPI
Pydantic
UV / Poetry

Languages

English (Native)
Afrikaans (Native)
German (Fluent)