Own Tailor's AI product surface. Extend the PACT protocol's reference implementation, ship production RAG and agent systems, set the eval discipline that proves we're not just LLM-prompted glue.
Brisbane HQ · Hybrid · $200k + equity · Sovereign-routed · stored in Australia
About Tailor
Tailor builds document collaboration software for teams that need reviews, approvals, evidence, ownership changes, and decision history to stay connected in one product.
The product is live across local government, construction, and mining; in pilot with a major state-government department; and being rolled out with a tier-1 systems integrator. The AI surface is practical product work: PACT protocol, RAG pipelines, agent coordination, model routing, and eval discipline.
We are an AI-native engineering team. Cursor and Claude Code are our daily drivers. We expect you to leverage AI to write the boilerplate so you can focus on architecture, evals, and the genuinely hard problems.
The Role
You will be Tailor's first dedicated AI / ML hire at Staff level. Your job is to own the AI product surface — the technical work that makes Tailor more than "an LLM with a UI".
You will architect and own the LLM, RAG, and agent layer end-to-end. You will extend the PACT protocol — our MIT-licensed open spec for multi-agent consensus — and ship the reference implementation other vendors will judge their own work against.
You will define the eval discipline that catches regressions before they ship. You will set the model-routing strategy that keeps cost per tenant predictable as the model market churns. You will pair with the Senior Full-Stack Developer on platform-side integration and with product leadership on direction.
This is a Staff-level seat. You set technical direction. You do not just clear tickets.
What You Will Do
Architect and own the LLM / RAG / agent layer end-to-end — chunking, embedding, retrieval, reranking, hallucination control
Extend the PACT protocol — shape how multi-agent consensus actually works in production
Build and own the prompt-eval discipline — offline + online evals, regression suites, drift detection, model-rotation playbooks
Lead model-routing strategy — which model for which task, which region, which sovereignty constraint, which cost budget
Drive AI cost discipline — per-tenant cost, COSD attribution (#1215), instrumentation against real billing data
Mentor the Senior FSD on AI patterns; lead architecture reviews on AI-touching features
Contribute to PACT and adjacent open-source as the protocol opens up
What You Bring
Senior engineering experience. 5+ years of professional engineering, with experience setting technical direction on AI-touching systems.
Production RAG. You have built and operated RAG in production — chunking, embedding, retrieval, reranking, hallucination control. You know what breaks.
Agent systems beyond toy demos. You have shipped function-calling, tool-use, or multi-agent coordination in real customer hands — not just a notebook.
Eval discipline. You own offline + online evals, regression suites, drift detection, and model-rotation playbooks. You can quantify when an LLM change is better, not just "feels better".
Strong opinions on model routing. You know when to use 4o-mini vs Opus vs a fine-tune vs an open-weight model on your own GPUs, and you can defend the call.
Comfortable across the LLM ecosystem. Azure OpenAI, Anthropic, open weights, Semantic Kernel / LangChain / LlamaIndex equivalents — you have used several and have opinions.
Bonus Points
Built or contributed to MCP (Model Context Protocol) or other agent-coordination protocols
Production experience with vector databases and hybrid retrieval (BM25 + dense)
Ran AI cost optimisation against real billing data — not just back-of-napkin estimates
Open-source maintainer for an AI / ML library
Background in .NET (we run .NET 9; the agent loop runs in C#)
Why Tailor
Staff-level architectural ownership of the AI surface — direct line to product and engineering leadership, no red tape, the protocol is yours to shape.
PACT is open under MIT. Your work has reach beyond Tailor's customers — you ship the reference implementation other vendors will judge their own work against.
Bleeding-edge stack (.NET 9, React 19, Semantic Kernel, Azure OpenAI). Real production load, real customers, real cost discipline.
AI-native team. Cursor and Claude Code are our daily drivers. You will leverage AI to focus on the genuinely hard problems — routing, evals, multi-agent consensus.
Competitive comp + meaningful equity (4-year vest, 1-year cliff). Equity weighted enough that the right person feels ownership.
How to Apply
Drop your CV (and a cover letter if you have one) into the Apply form at the top of this page. In the cover letter, link to a production RAG or agent system you have shipped — tell us what broke, what you measured, and what you would do differently.
If you would rather email us directly, reach us at careers@tailor.au.
What we look for · transparency
These are the criteria a human reviewer compares your application against. Same rubric for every applicant; nothing hidden.
Production RAG — shipped, operated, debugged · weight 3/3
Has built and operated a RAG system in production with real customers
Can describe specific failure modes — embedding drift, retrieval quality, hallucinations — and how they fixed them
Knows hybrid retrieval (BM25 + dense) and reranking trade-offs
Agent / function-calling systems beyond toy demos · weight 3/3
Has shipped function-calling, tool-use, or multi-agent coordination in real customer hands
Comfortable with structured-output enforcement and recovery from malformed outputs
Has handled long-horizon agent tasks with state management
Eval discipline — offline + online + regression · weight 3/3
Owns evaluation suites that catch regressions before they ship
Can quantify a model change as better/worse with real metrics, not just vibes
Has built drift detection and model-rotation playbooks
Model routing — opinions and trade-offs · weight 2/3
Has chosen between Azure OpenAI, Anthropic, open weights, fine-tunes for production tasks
Can defend a routing decision with cost / latency / quality trade-offs
Comfortable with sovereignty and residency constraints
Senior engineering experience · weight 2/3
5+ years of professional engineering
Has set technical direction for an AI-touching system, not just shipped features inside one
Mentors more junior engineers; leads architecture reviews
MCP / agent-coordination protocol experience · weight 1/3
Has built or contributed to MCP (Model Context Protocol) or similar
Familiar with the broader agent-protocol ecosystem (A2A, ACP, etc.)
AI cost discipline — against real billing data · weight 1/3
Has run AI cost optimisation against real billing data, not back-of-napkin estimates
Knows per-tenant cost attribution and how to instrument it