A working LLM feature, wired into your product
Not a notebook. A shipped path: retrieval, evals, guardrails, rollout plan. Your engineers own it by the end.
Your team is staring at a pile of LLM prototypes, a hype-allergic CTO, and a launch date. You want answers that survive real traffic — not another slide deck. Good. That's what gets built here. A small senior team plugs into yours for a few focused weeks, removes the vague bits, and leaves you with a system your engineers can run on Monday morning.
You've run the workshops. You've read the threads. You've watched three vendors demo the same chatbot with different names on the tab. Meanwhile the actual product still hallucinates, the retrieval is flaky, costs are creeping, and nobody on your team has the bandwidth to fix it properly without falling behind on the roadmap.
That's where we come in. Not to tell you AI is the future; you already know. To sit at your workbench for a few weeks, get our hands dirty, and hand you something testable, observable, and honest.
Tool-use, memory, traces, and the boring parts — budgets, fallbacks, replay. Built to debug, not to demo.
Golden sets, scorer rubrics, CI hooks. Stops the "it worked on my prompt" cycle before launch.
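As a sketch of what that looks like in practice (the cases, the keyword scorer, and the threshold below are illustrative stand-ins, not our production rubric): a small fixed set of question/expected-fact pairs, a scorer, and a pass bar a CI job can enforce.

```python
# Minimal golden-set eval sketch: fixed cases, a simple scorer,
# and a pass threshold a CI job can enforce. Cases and the keyword
# scorer are illustrative assumptions, not a real rubric.

GOLDEN_SET = [
    {"question": "What is our refund window?", "must_mention": ["30 days"]},
    {"question": "Which plans include SSO?",   "must_mention": ["Enterprise"]},
    {"question": "Do we store card numbers?",  "must_mention": ["no", "Stripe"]},
]

def score(answer: str, must_mention: list[str]) -> float:
    """Fraction of required facts the answer actually contains."""
    hits = sum(1 for fact in must_mention if fact.lower() in answer.lower())
    return hits / len(must_mention)

def run_eval(model, threshold: float = 0.9) -> bool:
    """True iff the mean score over the golden set clears the bar."""
    scores = [score(model(case["question"]), case["must_mention"])
              for case in GOLDEN_SET]
    mean = sum(scores) / len(scores)
    print(f"golden-set score: {mean:.2f} (threshold {threshold})")
    return mean >= threshold

# A stub "model" so the harness runs without any API key:
def stub_model(question: str) -> str:
    canned = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Which plans include SSO?":   "SSO ships with the Enterprise plan.",
        "Do we store card numbers?":  "No, payments go through Stripe.",
    }
    return canned.get(question, "")

run_eval(stub_model)
```

Swap the stub for a real model call and wire `run_eval` into CI, and a regression on the golden set fails the build instead of surfacing in production.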
Model routing, caching, streaming, batching. We'll show the math before touching production.
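Here is the shape of that math, as a sketch. Every price, traffic number, and ratio below is an illustrative assumption, not a quote: routing easy requests to a small model and caching repeats can move the bill by an order of magnitude.

```python
# Illustrative cost model: route easy requests to a small model,
# cache repeats, and compare against sending everything to the
# big model. All prices and traffic numbers are assumptions.

REQUESTS_PER_DAY = 50_000
TOKENS_PER_REQUEST = 1_500      # prompt + completion, averaged (assumed)
BIG_MODEL_PER_MTOK = 10.00      # $ per million tokens (assumed)
SMALL_MODEL_PER_MTOK = 0.50     # $ per million tokens (assumed)

def monthly_cost(big_share: float, cache_hit_rate: float) -> float:
    """Monthly spend given the share of traffic on the big model
    and the share of requests served from cache."""
    billable_requests = REQUESTS_PER_DAY * 30 * (1 - cache_hit_rate)
    mtok = billable_requests * TOKENS_PER_REQUEST / 1_000_000
    blended_price = (big_share * BIG_MODEL_PER_MTOK
                     + (1 - big_share) * SMALL_MODEL_PER_MTOK)
    return mtok * blended_price

naive = monthly_cost(big_share=1.0, cache_hit_rate=0.0)
routed = monthly_cost(big_share=0.3, cache_hit_rate=0.25)
print(f"everything-to-big-model: ${naive:,.0f}/month")
print(f"routed + cached:         ${routed:,.0f}/month")
```

Under these assumptions the naive setup runs about $22,500 a month and the routed-plus-cached one about $5,650; the point is not the exact numbers but that the model is three lines of arithmetic you can check before anything touches production.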
Pairing over presentations. When we leave, your engineers have the muscle — and the notes — to keep shipping.
If a problem doesn't need an LLM, we say so. If it does, we tell you where the dragons live. No hand-waving.
We listen to your context, tell you straight whether we can help, and share a written sketch of what a pilot would look like.
One week, one question: is this AI idea worth building? You leave with a prototype, a cost model, and a "keep going / stop" recommendation.
A focused team sits next to yours and ships a feature end-to-end. Retrieval, agents, evals, observability — whatever the thing needs.
Your team keeps the keyboard. We review PRs, unblock tough calls, and keep the architecture from drifting into fashion.
Your engineers build a real thing in our codebase: an agent, a RAG pipeline, or an eval harness. No slideware. Working code at the end.
Vide (incremental UI), FsHttp (HTTP DSL), Trulla (templating), LocSta (state machines), FluX (data flow). Not weekend hacks — published, maintained, tested in production by teams we've never met.
A 24×24 RGB display programmable in C#, with a browser simulator, OTA firmware, a mobile app, and a community of makers. Two people built it. That's the operating model we bring to your team: small, senior, accountable.
Fifteen-plus years building software that ends up in production: distributed systems, SDKs, compilers, and, for the last few years, LLM pipelines, agents, and retrieval-heavy features. Mostly .NET / F# / C#. Public body of work on GitHub. A habit of finishing.
Alongside consulting, Ronald and one other engineer run Cumin & Potato GmbH — the company behind PXL Clock. That's not a side note; it's the operating model. Small. Hands-on. Allergic to decks that outweigh the code.
One week. One question answered. One working prototype. If it's not worth building, we say so — and you'll have saved a quarter.