ML Engineer

Department: Engineering
Location: Stockholm
Type: Full-time
Workplace: On-site

Contact: contact@brayns.ai

About Brayns

We are a small team teaching machines to run compliance work end-to-end. Brayns reads how operators actually work — documents, case history, the judgment of senior people — and turns that structure into agents that execute continuously, traceably, at scale.

The role

We’re building what we think will be one of the next unicorns out of Stockholm, and our ML work is one of the things that has to be world-class for that to happen.

This role is for someone who lives and breathes the model layer — not a product engineer who happens to work with LLMs. You’ll spend your time on retrieval, long-context reasoning, prompt and pipeline optimization, and squeezing more out of the models we run. We want someone who is genuinely curious about how these systems work under the hood and who keeps up with the field as it moves.

This role is on-site in Stockholm and open only to candidates who already live in Stockholm or are ready to relocate before starting. We are not considering remote applicants.

What you will do

Design, build, and improve our RAG systems end-to-end — retrieval, ranking, context construction, evaluation.
Work with long-context LLMs and figure out how to use that context well rather than just dumping tokens into it.
Use DSPy (and similar tools) to systematically optimize prompts, pipelines, and behavior — instead of hand-tuning by feel.
Optimize LLM inference — latency, throughput, cost, batching, caching, quantization where relevant.
Run experiments rigorously: build evals, measure honestly, and let the numbers drive decisions.

What we look for

A model-layer person, not a product person. You’re happiest deep in retrieval quality, eval design, prompt optimization, and inference internals. Shipping product features is not where you want to spend most of your time — and that’s exactly what we want from this role.
RAG expertise. You’ve built non-trivial RAG systems and have strong opinions about chunking, embeddings, retrieval strategies, rerankers, and evaluation. You know where naive RAG breaks and how to fix it.
DSPy experience. You’ve actually used DSPy in real work — not just read the docs — and understand when it’s the right tool and when it isn’t.
Long-context LLMs. You’ve worked with long-context models in practice and understand how they actually behave — attention degradation, position effects, retrieval-vs-context tradeoffs, and what tends to work versus what merely sounds good in a paper.
LLM inference optimization. You’ve worked on making LLMs faster, cheaper, or both — batching, caching, speculative decoding, quantization, serving stack choices, or similar.

Nice to have

Research sensibility. A research background is a plus. We like people who read papers, can tell the difference between hype and signal, and can take an idea from a paper and turn it into something that works for us.

Don't check every box? Apply anyway. We weigh trajectory and taste over a perfect résumé.