$ whoami

Carel van Niekerk

Machine Learning Researcher & Engineer

I build and align large language models — and the tools that train them.

Düsseldorf, Germany

about

Research scientist and engineer specialising in reinforcement learning post-training and LLM alignment. I work on model trustworthiness, uncertainty quantification, and hallucination reduction — bridging rigorous mathematical theory and production-level engineering through scalable, reliable, and modular training frameworks.

I hold a PhD (magna cum laude) in computer science from Heinrich Heine University Düsseldorf, where I also worked as a postdoctoral researcher on reinforcement learning post-training and LLM alignment. With 10+ years across academia and industry, my work spans publications at NeurIPS, ACL, and EMNLP — and the open-source tooling that made them possible.

def research_themes():

Agentic & Tool-Augmented Systems

Reinforcement learning and multi-agent coordination for long-horizon decision making in agentic dialogue and tool-using systems.

Alignment-Oriented Post-Training

Reinforcement learning methods for aligning large language models using intrinsic and self-supervised reward signals, reducing reliance on external human preference data.

Scalable Research Infrastructure

Reproducible, configurable, and distributed training systems enabling rapid experimentation across HPC and cloud environments.

Uncertainty-Aware Reasoning

Bayesian and distributional methods for uncertainty estimation, calibration, and robustness — applied to trustworthy and controllable AI systems.

highlights

$ tail -n 4 ~/highlights.log

  1. Oct 2025

    Paper accepted at NeurIPS 2025

    "Less is More: Local Intrinsic Dimensions of Contextual Language Models" — using the geometry of contextual embeddings to study LLM training dynamics and generalisation.

  2. Jul 2025

    RLSF preprint released

    "Post-Training Large Language Models via Reinforcement Learning from Self-Feedback" — using the model's own confidence as an intrinsic reward, no human feedback required.

  3. Jul 2025

    CAMELL published in TACL & presented at ACL 2025

    "A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction" — confidence-driven active learning and label validation for sequential multi-output tasks.

  4. Apr 2024

    PhD awarded (magna cum laude)

    Dissertation "Uncertainty Estimation, Management, and Utilisation in Human-Computer Dialogue" defended at Heinrich Heine University Düsseldorf.

experience

$ git log --oneline --career

  1. bd93f9 (HEAD -> now)

    Postdoctoral Researcher @ Heinrich Heine University

    Mar 2024 → Jun 2026 · Düsseldorf, Germany

    • Agentic Multi-Agent Reinforcement Learning: Led the development of a MARL framework for telephonic dialogue systems, enabling coordinated decision-making between router and expert agents with explicit credit assignment and improving routing accuracy by over 15 percentage points in a production-level dialogue product.
    • HydraXcel: Sole developer and maintainer of an open-source, configuration-driven deep learning experiment launcher integrating Hydra, Hugging Face Accelerate, and the UV workflow, enabling seamless, scalable multi-GPU and distributed training for the research team.
    • HPC & Cloud Training Infrastructure: Designed Hydra launcher plugins for transparent experiment execution on SLURM-managed HPC clusters and SkyPilot-orchestrated cloud platforms, enabling high-throughput experimentation and rapid switching between compute backends without code changes.
    • Academic Leadership: Supervised multiple Master's theses on MARL and task-oriented dialogue. Designed and taught the "Implementing Transformers" course, which builds the Attention Is All You Need architecture from first principles in PyTorch and achieved a 95% course pass rate.

  2. ff79c6

    PhD Candidate @ Heinrich Heine University

    Jul 2019 → Mar 2024 · Düsseldorf, Germany

    • Uncertainty-Aware Decision Making: Developed computationally efficient uncertainty quantification methods for intent classification in collaboration with Yandex Research. Integrated uncertainty features into RL policies, improving real-user interaction success by 5 percentage points, and designed an active learning strategy that matched full-dataset performance using only 16% of expert annotations.
    • ConvLab-3 Dialogue Systems Toolkit: Core developer of a large-scale dialogue system toolkit in collaboration with Tsinghua University and Microsoft Research. Architected a unified data format enabling seamless integration of heterogeneous datasets and models, adopted in 30+ research papers spanning RL- and LLM-based dialogue agents.
    • YRRSDS 2022 Co-organiser: Co-organised the Young Researchers Roundtable on Spoken Dialogue Systems, collocated with SIGDIAL in Edinburgh, managing digital infrastructure, branding, and sponsorship acquisition.

  3. 8be9fd

    AI Applications Consultant @ NGA Risksecure

    Jun 2018 → May 2019 · Pretoria, South Africa

    • Named Entity Sentiment Analysis: Co-developed a sentiment scoring system for news-based entities, delivering reliable quantitative metrics to banking clients at under 70% of the cost of manual analysis.
    • Multimodal Computer Vision: Built a proof-of-concept application combining visual and sensor data to monitor greenhouse plant health for a CBD producer in Southern Africa.

projects

HydraXcel

Configuration-driven deep learning experiment launcher

Open-source experiment launcher unifying Facebook Hydra, Hugging Face Accelerate, and the UV workflow. One config launches anything from a local debug run to multi-GPU distributed training on SLURM clusters or SkyPilot-managed cloud — no code changes between backends.

  • Python
  • Hydra
  • Accelerate
  • UV
  • SLURM
  • SkyPilot

RLSF

Reinforcement Learning from Self-Feedback

Post-training method that uses a language model's own confidence as an intrinsic reward signal — aligning LLMs and improving calibration and reasoning without external human preference labels.

  • PyTorch
  • TRL
  • Transformers
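
As a rough illustration of the idea of confidence as an intrinsic reward, the sketch below scores a generated sequence by the average probability of its most likely token and ranks candidates by that score. This is a minimal sketch under stated assumptions, not the RLSF implementation: the function names `sequence_confidence` and `rank_by_confidence`, the max-probability confidence measure, and the toy logits are all hypothetical choices made for illustration.

```python
import math


def sequence_confidence(token_logits):
    """Mean probability of the most likely token at each step (illustrative measure)."""
    confidences = []
    for logits in token_logits:
        peak = max(logits)
        # Subtract the peak before exponentiating for numerical stability.
        exps = [math.exp(x - peak) for x in logits]
        confidences.append(max(exps) / sum(exps))
    return sum(confidences) / len(confidences)


def rank_by_confidence(candidates):
    """Sort (text, token_logits) pairs from most to least confident."""
    return sorted(candidates, key=lambda c: sequence_confidence(c[1]), reverse=True)


# Toy example: two candidate answers with made-up per-token logits.
candidates = [
    ("uncertain answer", [[0.0, 0.0], [0.1, 0.0]]),
    ("confident answer", [[10.0, 0.0], [8.0, 0.0]]),
]
ranked = rank_by_confidence(candidates)
```

A ranking like this could, in principle, supply preference pairs for a post-training step without human labels; how the actual method defines confidence and uses it is described in the preprint.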

Agentic RL (coming soon)

Multi-agent RL for long-horizon agentic systems

Coordinated decision-making between router and expert agents with explicit credit assignment. Write-up in progress.

  • MARL
  • LangGraph

publications

selected work — NeurIPS · ACL · EMNLP · TACL

education & skills

class Education:

  1. BSc in Actuarial and Financial Mathematics

    University of Pretoria · 2013–2015 · Pretoria, South Africa

  2. BSc (Hons) in Mathematical Statistics

    University of Pretoria · 2016 · Pretoria, South Africa

  3. MSc in Mathematical Statistics

    University of Pretoria · 2017–2018 · Pretoria, South Africa

    Statistical learning, data analytics, and visualisation.

  4. PhD in Computer Science

    Heinrich Heine University · 2019–2024 · Düsseldorf, Germany

    Magna cum laude. Thesis: Uncertainty Estimation, Management, and Utilisation in Human-Computer Dialogue.

class Skills:

Research

  • Reinforcement Learning (RLHF / intrinsic feedback)
  • Uncertainty Quantification
  • Self-supervised Learning
  • LLM Evaluation & Benchmarking
  • Human-in-the-loop
  • Model Debugging
  • Distribution Theory

Deep Learning

  • PyTorch
  • Transformers
  • TRL
  • Accelerate
  • Datasets

Agentic Systems & LLM APIs

  • LangGraph
  • DeepEval
  • OpenAI API
  • Vertex AI

Programming

  • Python (Advanced)
  • C++
  • Rust
  • JavaScript / TypeScript
  • SQL
  • Bash / Zsh
  • MyPy / Ty

Infrastructure & Cloud

  • DeepSpeed
  • SLURM
  • SkyPilot
  • Hydra
  • Distributed Training
  • Docker
  • Google Cloud / Cloud Run
  • Microsoft Azure
  • MongoDB

Engineering

  • Design Patterns
  • PyTest
  • Ruff
  • FastAPI
  • Pydantic
  • UV / Poetry

Languages

  • English (Native)
  • Afrikaans (Native)
  • German (Fluent)