Own Tailor's AI product surface. Extend the PACT protocol's reference implementation, ship production RAG and agent systems, set the eval discipline that proves we're not just LLM-prompted glue.
Brisbane HQ · Hybrid · $200k + equity · Sovereign-routed · stored in Australia
About Tailor
Tailor builds document collaboration software for teams that need reviews, approvals, evidence, ownership changes, and decision history to stay connected in one product.
The product is live across local government, construction, and mining; in pilot with a major state-government department; and being rolled out with a tier-1 systems integrator. The AI surface is practical product work: PACT protocol, RAG pipelines, agent coordination, model routing, and eval discipline.
We are an AI-native engineering team. Cursor and Claude Code are our daily drivers. We expect you to leverage AI to write the boilerplate so you can focus on architecture, evals, and the genuinely hard problems.
The Role
You will be Tailor's first dedicated AI / ML hire at Staff level. Your job is to own the AI product surface — the technical work that makes Tailor more than "an LLM with a UI".
You will architect and own the LLM, RAG, and agent layer end-to-end. You will extend the PACT protocol — our MIT-licensed open spec for multi-agent consensus — and ship the reference implementation other vendors will judge their own work against.
You will define the eval discipline that catches regressions before they ship. You will set the model-routing strategy that keeps cost per tenant predictable as the model market churns. You will pair with the Senior Full-Stack Developer on platform-side integration and with product leadership on direction.
This is a Staff-level seat. You set technical direction. You do not just clear tickets.
What You Will Do
Architect and own the LLM / RAG / agent layer end-to-end — chunking, embedding, retrieval, reranking, hallucination control
Extend the PACT protocol — shape how multi-agent consensus actually works in production
Build and own the prompt-eval discipline — offline + online evals, regression suites, drift detection, model-rotation playbooks
Lead model-routing strategy — which model for which task, which region, which sovereignty constraint, which cost budget
Drive AI cost discipline — per-tenant cost, COSD attribution (#1215), instrumentation against real billing data
Mentor the Senior FSD on AI patterns; lead architecture reviews on AI-touching features
Contribute to PACT and adjacent open-source as the protocol opens up
What You Bring
Senior engineering experience. 5+ years of professional engineering, with experience setting technical direction on AI-touching systems.
Production RAG. You have built and operated RAG in production — chunking, embedding, retrieval, reranking, hallucination control. You know what breaks.
Agent systems beyond toy demos. You have shipped function-calling, tool-use, or multi-agent coordination in real customer hands — not just a notebook.
Eval discipline. You own offline + online evals, regression suites, drift detection, and model-rotation playbooks. You can quantify when an LLM change is better, not just "feels better".
Strong opinions on model routing. You know when to use 4o-mini vs Opus vs a fine-tune vs an open-weight model on your own GPUs, and you can defend the call.
Comfortable across the LLM ecosystem. Azure OpenAI, Anthropic, open weights, Semantic Kernel / LangChain / LlamaIndex equivalents — you have used several and have opinions.
Bonus Points
Built or contributed to MCP (Model Context Protocol) or other agent-coordination protocols
Production experience with vector databases and hybrid retrieval (BM25 + dense)
Ran AI cost optimisation against real billing data — not just back-of-napkin estimates
Open-source maintainer for an AI / ML library
Background in .NET (we run .NET 9; the agent loop runs in C#)
Why Tailor
Staff-level architectural ownership of the AI surface — direct line to product and engineering leadership, no red tape, the protocol is yours to shape.
PACT is open under MIT. Your work has reach beyond Tailor's customers — you ship the reference implementation other vendors will judge their own work against.
Bleeding-edge stack (.NET 9, React 19, Semantic Kernel, Azure OpenAI). Real production load, real customers, real cost discipline.
AI-native team. Cursor and Claude Code are our daily drivers. You will leverage AI to focus on the genuinely hard problems — routing, evals, multi-agent consensus.
Competitive comp + meaningful equity (4-year vest, 1-year cliff). Equity weighted enough that the right person feels ownership.
How to Apply
Drop your CV (and a cover letter if you have one) into the Apply form at the top of this page. In the cover letter, link to a production RAG or agent system you have shipped — tell us what broke, what you measured, and what you would do differently.
If you would rather email us directly, reach us at careers@tailor.au.
What we look for · transparency
These are the criteria a human reviewer compares your application against. Same rubric for every applicant; nothing hidden.
Production RAG — shipped, operated, debugged · weight 3/3
Has built and operated a RAG system in production with real customers
Can describe specific failure modes — embedding drift, retrieval quality, hallucinations — and how they fixed them
Knows hybrid retrieval (BM25 + dense) and reranking trade-offs
Agent / function-calling systems beyond toy demos · weight 3/3
Has shipped function-calling, tool-use, or multi-agent coordination in real customer hands
Comfortable with structured-output enforcement and recovery from malformed outputs
Has handled long-horizon agent tasks with state management
Eval discipline — offline + online + regression · weight 3/3
Owns evaluation suites that catch regressions before they ship
Can quantify a model change as better/worse with real metrics, not just vibes
Has built drift detection and model-rotation playbooks
Model routing — opinions and trade-offs · weight 2/3
Has chosen between Azure OpenAI, Anthropic, open weights, fine-tunes for production tasks
Can defend a routing decision with cost / latency / quality trade-offs
Comfortable with sovereignty and residency constraints
Senior engineering experience · weight 2/3
5+ years of professional engineering
Has set technical direction for an AI-touching system, not just shipped features inside one
Mentors more junior engineers; leads architecture reviews
MCP / agent-coordination protocol experience · weight 1/3
Has built or contributed to MCP (Model Context Protocol) or similar
Familiar with the broader agent-protocol ecosystem (A2A, ACP, etc.)
AI cost discipline — against real billing data · weight 1/3
Has run AI cost optimisation against real billing data, not back-of-napkin estimates
Knows per-tenant cost attribution and how to instrument it