Lab · Research

Notes from the work of making AI reliable.

We don’t do research for papers — we do it so the systems we ship hold up in production. These are the questions we keep returning to, and what we’ve learned answering them on real projects.

What we research

01

Reliability & evaluation

How do you know an agent is good enough to ship? We build task-specific eval suites so every system is measured against real work before it touches a customer, and monitored after it does.

02

Retrieval that grounds

Getting a model to answer from your documents, not its imagination. We research chunking, hybrid search and citation so every answer can be traced back to a source a human can check.

03

Agents & tool-use

Narrow agents that take real actions safely — calling APIs, booking jobs, drafting replies — with clear limits and a human in the loop on anything that matters.

04

Sovereign & on-prem

Running capable models inside a customer’s own boundary, so regulated industries keep their data in New Zealand. The work behind Lux 1.0 and private deployment lives here.

05

Voice

Latency, turn-taking and accent handling for phone agents that sound natural in a Kiwi context, and that route cleanly to a person when they should.

06

Cost & latency

Making production AI economical: routing, caching, distillation and smaller fine-tuned models that hit the bar without the bill.

Recent notes

Read the deeper write-ups on the blog, or see how this research ships in our work.

Got a hard problem worth researching together?

Talk to us