shipping production AI · since 2020 · NAICS 541511 / 541512 / 541519 · CMMC-aware

§00 · The firm · est. operator-run · v2026.05

senior-only bench · operator-led · staffed-to-fit

We ship production AI systems.

For mid-market and federal clients. Operator-led. Senior-only bench. Fixed-fee, fixed-scope, 4–12 week sprints. 99.9% uptime across what we run, verifiable on GitHub.

Book a 30-min scoping call → · Read the receipts · avg reply < 24h

uptime · 12mo · 99.9% · across prod systems
p95 latency · 181ms · shipped LLM platforms
endpoints · in production
MCP servers · running live
founder · 10+yr · hands-on AI eng
engagement · 4–12wk · fixed-fee, fixed-scope

fig 01 · what we run, end-to-end · not a marquee, a real tail of this week

github.com/dsee ↗

~/dsee tail -f /var/log/shipping streaming · last 7d

[mon 14:02] deploy ok privatestack/api p95=181ms v0.84.2 · 2 commits

[mon 13:40] eval pass rag/v3 golden 842/842 · drift 0.3%

[mon 11:18] deploy ok fedgov/ingest sam.gov sync · 9,412 rows

[sun 22:55] scan bedrock-iam 0 findings ✓

[sun 19:02] release mcp/gemini-bridge v0.7.2 · github

[sat 09:11] deploy ok dsealgo risk-circuit v12 · zero downtime

[fri 17:48] red-team client-X 14 findings · 3H 9M 2L

[fri 14:02] release mcp/perplexity-async v1.1.0 · github

[thu 16:22] deploy ok foodee/payments stripe v2 · 0 PII leaks

What we run, end-to-end.

fig 01 · stack

Auth / Identity: Clerk · JWT · OIDC · least-priv IAM
API Gateway: 66 routes · p95 < 200ms
LLM Routing: LiteLLM · 5 providers · cost-routed
Retrieval: pgvector · BM25+dense · hybrid
Evals: CI gates · 842 golden · drift < 0.5%
Inference: Bedrock · vLLM · GPU-shared
Observability: Traces · logs · cost · per-tenant
AI Security: Red-team · STRIDE/AI · NIST AI RMF
Governance: EU AI Act · ISO 42001 · CMMC-ready

every cell = a thing we'll write, run, and document for you · AWS-native
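For readers wondering what the "BM25+dense" retrieval cell can look like in practice, here is a minimal sketch of one common way to merge a keyword ranking with a vector ranking: reciprocal rank fusion. The fusion method is our assumption for illustration; the page does not say which method the production stack uses. The function name `rrf`, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores 1 / (k + rank) in every list where it appears;
    the scores are summed and the IDs returned best-first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from a BM25 query, one from a dense
# (embedding) query, for example against pgvector.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = rrf([bm25_hits, dense_hits])
```

Rank-based fusion avoids having to normalize BM25 scores against cosine distances, which is one reason it is a common default for hybrid search.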

deploys / 7d · last week, across systems
eval pass rate · 842/842 · golden tests in CI
model providers · routed cost-optimal
PII leaks · since launch · ever
requests / 24h · 1.4M · across managed services
OSS stars · 1,107 · across our repos

§01 Who we work with.
Two buyers, same bench.

A rare combo: commercial and federal, delivered by the same engineers.

Mid-market CTOs and federal contracting officers don't usually share a vendor. They share ours. The same team that builds your multi-tenant SaaS is the team your CO can clear.

same team · same posture

Track A · Commercial · mid-market → enterprise

Ship the AI system your CTO promised.

CTOs and Heads of AI who need production this quarter, not a strategy deck. We embed, ship, hand over IP, leave runbooks. No pyramid leverage — principals on your problem.

Move fast: Two-pizza teams, weekly demo cadence.

Senior-only: Senior-only bench. Surge from a vetted contractor network.

Fixed-fee: Scope quoted in 48h. No T&M.

Full IP transfer: Code, docs, runbook, 30-day support.

Track B · Federal & Public Sector · SAM.gov · CMMC · NIST AI RMF

Deliver AI under FAR, the RMF, and a clock.

SAM.gov registered. CMMC-aware. Cleared-staff capable. The same bench that ships commercial — under the controls federal buyers actually need. No subcontracting the engineering out.

Registrations: SAM.gov · CAGE · NAICS 541511/12/19.

Frameworks: NIST AI RMF · EU AI Act · ISO 42001 fluent.

Posture: CMMC L2 narrative-ready. Clearance-capable.

Vehicles: Sub on GSA MAS, CIO-SP4, OASIS+, SeaPort.

§02 What we do.
Four services. One bench.

Every engagement ends in a named deliverable, not a status update.

We don't sell "AI transformation." We sell a service in production, a threat model, a roadmap, a strategy doc — and a runbook a third party can operate. Below is the menu.

AI Engineering

Production LLM systems, not pilots.

RAG pipelines, agentic workflows, multi-tenant SaaS. We ship the service, the eval harness, the observability, and the runbook that outlives the engagement.

LLM applications & agents
MCP server design & integration
Fine-tuning & eval harnesses
Vector retrieval (pgvector, Atlas, Pinecone)
Inference infra (Bedrock, vLLM, LiteLLM)
CI gates & cost-routed model selection

You receive: Production service · eval harness + CI gates · observability stack · 23-pg runbook · IP transfer.
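To make "eval harness + CI gates" concrete, here is a minimal sketch of a gate that blocks a deploy when any golden case fails or when quality drifts past a threshold. It is an illustration under stated assumptions, not the harness we ship: `EvalRun`, `ci_gate`, and the `mean_score` field are hypothetical names, and "drift" is assumed here to mean the relative change in mean score against a stored baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    passed: int        # golden cases that passed
    total: int         # golden cases that ran
    mean_score: float  # hypothetical aggregate quality score, higher is better


def ci_gate(current, baseline, max_drift=0.005):
    """Return (ok, reasons). The build should fail when ok is False."""
    reasons = []
    if current.total == 0:
        reasons.append("no golden cases ran")
    elif current.passed < current.total:
        reasons.append(f"{current.total - current.passed} golden case(s) failed")
    if baseline.mean_score <= 0:
        reasons.append("baseline score must be positive to measure drift")
    else:
        # Assumption: drift is the relative change in mean score vs. baseline.
        drift = abs(current.mean_score - baseline.mean_score) / baseline.mean_score
        if drift >= max_drift:
            reasons.append(f"drift {drift:.2%} is not below {max_drift:.2%}")
    return (not reasons, reasons)


# Illustrative numbers only: 842 golden cases, a small score change.
baseline = EvalRun(passed=842, total=842, mean_score=0.915)
candidate = EvalRun(passed=842, total=842, mean_score=0.912)
ok, reasons = ci_gate(candidate, baseline)
```

In a pipeline, a step like this would typically exit non-zero when `ok` is false and print `reasons` into the build log, so a failed gate explains itself.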

AI Security

Security baked in, not bolted on.

Threat models adapted for LLM and agent systems. Red-team reports with findings and remediation, not a slide deck. Cleared-staff capable for sensitive work.

AI red-teaming (prompt injection, jailbreak, exfil)
Threat modeling — STRIDE for AI, supply-chain
Governance — NIST AI RMF, EU AI Act, ISO 42001
PII/PHI data-flow audits & DPIAs
IAM hardening (Bedrock, SageMaker, OIDC)
Secrets, model + dataset provenance, SBOM

You receive: Threat model · red-team report w/ remediation · AI use policy · governance charter · 30-day re-test.

AI Consulting

Engineering-adjacent advisory.

Readiness, architecture review, build-vs-buy, fractional CDO/CAIO. The person reviewing your stack is the person who'd build it — not a partner with a thesis to push.

Readiness assessment (data · infra · talent · gov)
Vendor + build-vs-buy memos, TCO modeling
Architecture review & cost passes
Embedded fractional CDO / CAIO (10–20 h/wk)
Federal AI advisory (CMMC, FedRAMP, FAR)
Audit, eval, board-level briefings

You receive: Maturity scorecard · 90-day roadmap · vendor memo · board deck · optional fractional retainer.

AI Strategy

Multi-quarter, executive-level.

Roadmaps, operating-model design, investment theses, M&A due diligence. We model the ROI in numbers your CFO will defend and your board will sign.

12-month data + AI roadmap, phased
Operating model — centralized vs federated
Investment thesis & ROI/NPV modeling
M&A — tech DD on AI/data assets
Policy & acceptable-use frameworks
Customer-facing AI disclosures

You receive: Strategy doc · executive presentation · financial model · governance framework · quarterly review.

§03 One we'll show you.
The rest, on the call.

Eleven weeks, zero to production. The runbook outlived three engineers.

A representative engagement. Anonymized where we have to be — but the receipts (uptime, p95, endpoint counts, source) are all verifiable.

case-01 · commercial multi-tenant LLM SaaS · AWS · Bedrock · Lambda

PrivateStack — a private LLM platform for regulated industries.

Auth, billing, retrieval, routing across five providers, observability — built end-to-end and handed off with a runbook a third party could operate.

routes/day · last 7d · total 4.2M

/chat 1.8M
/embed 1.1M
/eval 684k
/admin 412k
/billing 198k

§ engagement at a glance

From a Notion doc to customer-zero in eleven weeks.

Clerk JWT auth, 66-endpoint API Gateway, Lambda backends. LiteLLM routing across Bedrock, OpenAI, Anthropic, Cohere, Mistral with per-tenant cost ceilings. pgvector retrieval, eval harness with 842 golden cases, observability per tenant.
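The routing idea above, cheapest acceptable provider with a hard per-tenant spend ceiling, can be sketched in a few lines. This is a simplified illustration, not the PrivateStack code and not LiteLLM's API: `Route`, `pick_route`, `CeilingExceeded`, the route names, prices, and quality scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str                 # hypothetical route label, not a real model ID
    usd_per_1k_tokens: float  # hypothetical blended price
    quality: float            # hypothetical eval score in [0, 1]


class CeilingExceeded(Exception):
    """Raised when a request would push a tenant past its cost ceiling."""


def pick_route(routes, est_tokens, spent_usd, ceiling_usd, min_quality=0.0):
    """Pick the cheapest route that meets the quality floor, within budget."""
    eligible = [r for r in routes if r.quality >= min_quality]
    if not eligible:
        raise ValueError("no route meets the quality floor")
    cheapest = min(eligible, key=lambda r: r.usd_per_1k_tokens)
    est_cost = cheapest.usd_per_1k_tokens * est_tokens / 1000
    if spent_usd + est_cost > ceiling_usd:
        raise CeilingExceeded(
            f"estimated ${est_cost:.2f} would exceed the ${ceiling_usd:.2f} ceiling"
        )
    return cheapest, est_cost


# Made-up prices and scores, chosen only to show the selection logic.
ROUTES = [
    Route("large", usd_per_1k_tokens=0.5, quality=0.9),
    Route("mid", usd_per_1k_tokens=0.25, quality=0.8),
    Route("small", usd_per_1k_tokens=0.125, quality=0.6),
]
route, cost = pick_route(
    ROUTES, est_tokens=2000, spent_usd=10.0, ceiling_usd=50.0, min_quality=0.75
)
```

Tying the quality floor to eval scores is what keeps cost routing from quietly degrading answers: a cheaper route only becomes eligible once it clears the same golden cases.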

Two engineers, weekly demos, written decision log. Full IP transfer with a 23-page runbook. 99.9% uptime over the twelve months since.

Read the full case → · Talk to the principal

delivery · 11 wks
endpoints
p95 · < 200ms
uptime 12mo · 99.9%

§04 How we work.
A standard engagement.

Two-pizza teams. Weekly demos. Full IP transfer.

Fixed-fee scope. Named deliverables. Written decision logs. CI/CD, observability, and a runbook ship with every engagement — they aren't a separate phase, they're how we work.

Week 0 · Scope.

Discovery call. Written scope, deliverables, milestones, fee. Fixed-fee quote inside 48 hours of the call.

Week 1 · Plan.

Architecture diagram, threat model, ADRs, eval plan. End-of-week demo of the scaffolding. No surprises after this.

Weeks 2–N · Build.

Two-pizza team. Weekly demo. Written decision log. CI from day one. Observability before features. Done means deployable.

Final · Hand-off.

Full IP transfer. Runbook. Observability dashboards. 30-day support window. Optional fractional retainer.

§05 What we won't take on

Listing what we decline is the strongest signal we have standards.

Nobody publishes this. We do. If you're looking for one of the engagements below, we'll happily refer you elsewhere — and the rest of this site means more because of it.

If your problem isn't here, it's likely one of ours.

~~ChatGPT-wrapper MVPs with no production path~~
~~Generic "AI chatbot" rebuilds of existing products~~
~~Pure prompt-engineering retainers~~
~~Pilot-purgatory PoCs the org won't operate~~
~~Staff-augmentation with pyramid leverage~~
~~T&M where scope is "discovered"~~
~~Industries we haven't shipped in~~
~~Engagements without a named technical owner~~
~~Logo-strip "trusted partner" badge work~~
~~Generative-art branding deliverables~~

§06 What we publish.
Receipts you can read.

Engineering credibility is verifiable in ten seconds.

Most consultancies hide their code. Ours is on GitHub. The Refinery Report is where we work in the open — eval harnesses, MCP server internals, red-team field notes.

The Refinery Report · Substack · all posts ↗

Eval harnesses are the moat, not the model. · 12 min · Apr 30, 2026
Why we run six MCP servers in production. · 9 min · Apr 18, 2026
Red-teaming an agentic workflow: field notes. · 14 min · Apr 04, 2026
Cost-routing across five LLM providers without losing your evals. · 18 min · Mar 21, 2026
A fixed-fee playbook for AI engagements. · 7 min · Mar 09, 2026

github.com/dsee · OSS repos ↗

gemini-bridge · MCP server · ★ 412
perplexity-async · MCP server · ★ 287
eval-harness · Python lib · ★ 198
litellm-cost-router · middleware · ★ 124
dask-* contribs · upstream · 12 PRs
rmf-checklist · NIST AI RMF · ★ 86

We commit to upstream weekly.
Every PR is signed by the engineer who wrote it.
