shipping production AI · since 2020 · NAICS 541511 / 541512 / 541519 · CMMC-aware
§00·The firm·est. operator-run·v2026.05
senior-only bench · operator-led · staffed-to-fit

We ship production AI systems.
For mid-market and federal clients. Operator-led. Senior-only bench. Fixed-fee, fixed-scope, 4–12 week sprints. 99.9% uptime across what we run, verifiable on GitHub.

uptime · 12mo: 99.9% (across prod systems)
p95 latency: 181ms (shipped LLM platforms)
endpoints: 66 (in production)
MCP servers: 6 (running live)
founder: 10+yr (hands-on AI eng)
engagement: 4–12wk (fixed-fee, fixed-scope)
fig 01·what we run, end-to-end·not a marquee — a real tail of this week
github.com/dsee ↗
~/dsee   tail -f /var/log/shipping streaming · last 7d
[mon 14:02] deploy ok privatestack/api p95=181ms v0.84.2 · 2 commits
[mon 13:40] eval pass rag/v3 golden 842/842 · drift 0.3%
[mon 11:18] deploy ok fedgov/ingest sam.gov sync · 9,412 rows
[sun 22:55] scan bedrock-iam 0 findings ✓
[sun 19:02] release mcp/gemini-bridge v0.7.2 · github
[sat 09:11] deploy ok dsealgo risk-circuit v12 · zero downtime
[fri 17:48] red-team client-X 14 findings · 3H 9M 2L
[fri 14:02] release mcp/perplexity-async v1.1.0 · github
[thu 16:22] deploy ok foodee/payments stripe v2 · 0 PII leaks
What we run, end-to-end.
fig 01 · stack
Auth / Identity: Clerk · JWT · OIDC · least-priv IAM
API Gateway: 66 routes · p95 < 200ms
LLM Routing: LiteLLM · 5 providers · cost-routed
Retrieval: pgvector · BM25+dense · hybrid
Evals: CI gates · 842 golden · drift < 0.5%
Inference: Bedrock · vLLM · GPU-shared
Observability: Traces · logs · cost · per-tenant
AI Security: Red-team · STRIDE/AI · NIST AI RMF
Governance: EU AI Act · ISO 42001 · CMMC-ready
every cell = a thing we'll write, run, and document for you · AWS-native
deploys / 7d: 14 (last week, across systems)
eval pass rate: 842/842 (golden tests in CI)
model providers: 5 (routed cost-optimal)
PII leaks: 0 (since launch · ever)
requests / 24h: 1.4M (across managed services)
OSS stars: 1,107 (across our repos)
§01 Who we work with.
Two buyers, same bench.

A rare combo: commercial and federal, delivered by the same engineers.

Mid-market CTOs and federal contracting officers don't usually share a vendor. They share ours. The same team that builds your multi-tenant SaaS is the team your CO can clear.

same team · same posture
Track A · Commercial · mid-market → enterprise

Ship the AI system your CTO promised.

CTOs and Heads of AI who need production this quarter, not a strategy deck. We embed, ship, hand over IP, leave runbooks. No pyramid leverage — principals on your problem.

Move fast: Two-pizza teams, weekly demo cadence.
Senior-only: Senior-only bench. Surge from a vetted contractor network.
Fixed-fee: Scope quoted in 48h. No T&M.
Full IP transfer: Code, docs, runbook, 30-day support.
Track B · Federal & Public Sector · SAM.gov · CMMC · NIST AI RMF

Deliver AI under FAR, the RMF, and a clock.

SAM.gov registered. CMMC-aware. Cleared-staff capable. The same bench that ships commercial — under the controls federal buyers actually need. No subcontracting the engineering out.

Registrations: SAM.gov · CAGE · NAICS 541511 / 541512 / 541519.
Frameworks: NIST AI RMF · EU AI Act · ISO 42001 fluent.
Posture: CMMC L2 narrative-ready. Clearance-capable.
Vehicles: Subcontractor on GSA MAS, CIO-SP4, OASIS+, SeaPort.
§02 What we do.
Four services. One bench.

Every engagement ends in a named deliverable, not a status update.

We don't sell "AI transformation." We sell a service in production, a threat model, a roadmap, a strategy doc — and a runbook a third party can operate. Below is the menu.

01
AI Engineering

Production LLM systems, not pilots.

RAG pipelines, agentic workflows, multi-tenant SaaS. We ship the service, the eval harness, the observability, and the runbook that outlives the engagement.

  • LLM applications & agents
  • MCP server design & integration
  • Fine-tuning & eval harnesses
  • Vector retrieval (pgvector, Atlas, Pinecone)
  • Inference infra (Bedrock, vLLM, LiteLLM)
  • CI gates & cost-routed model selection
You receive: Production service · eval harness + CI gates · observability stack · 23-page runbook · IP transfer.
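To make the "eval harness + CI gates" deliverable concrete, here is a minimal Python sketch of a merge gate. It assumes each golden case reduces to a pass/fail boolean and that drift means the relative change in an aggregate eval score against a stored baseline. That drift definition, the 0.5% default, and the names `ci_gate` and `GateResult` are illustrative assumptions, not the shipped harness.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GateResult:
    passed: bool      # True only if every golden case passed and drift is in bounds
    pass_count: int
    total: int
    drift: float      # relative change vs. baseline (assumed definition)


def ci_gate(
    case_results: Sequence[bool],
    baseline_score: float,
    current_score: float,
    max_drift: float = 0.005,  # 0.5%, an assumed default for illustration
) -> GateResult:
    """Fail the build unless all golden cases pass and score drift stays under max_drift."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    total = len(case_results)
    pass_count = sum(1 for ok in case_results if ok)
    drift = abs(current_score - baseline_score) / baseline_score
    passed = total > 0 and pass_count == total and drift < max_drift
    return GateResult(passed, pass_count, total, drift)


if __name__ == "__main__":
    # Hypothetical run: 842 golden cases, aggregate score moved from 0.9000 to 0.9027.
    result = ci_gate([True] * 842, baseline_score=0.9000, current_score=0.9027)
    print(f"{result.pass_count}/{result.total} · drift {result.drift:.1%} · passed={result.passed}")
```

In CI, the script would exit non-zero when `passed` is false, so a failing golden case or excess drift blocks the merge.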
02
AI Security

Security baked in, not bolted on.

Threat models adapted for LLM and agent systems. Red-team reports with findings and remediation, not a slide deck. Cleared-staff capable for sensitive work.

  • AI red-teaming (prompt injection, jailbreak, exfil)
  • Threat modeling — STRIDE for AI, supply-chain
  • Governance — NIST AI RMF, EU AI Act, ISO 42001
  • PII/PHI data-flow audits & DPIAs
  • IAM hardening (Bedrock, SageMaker, OIDC)
  • Secrets, model + dataset provenance, SBOM
You receive: Threat model · red-team report with remediation · AI use policy · governance charter · 30-day re-test.
03
AI Consulting

Engineering-adjacent advisory.

Readiness, architecture review, build-vs-buy, fractional CDO/CAIO. The person reviewing your stack is the person who'd build it — not a partner with a thesis to push.

  • Readiness assessment (data · infra · talent · gov)
  • Vendor + build-vs-buy memos, TCO modeling
  • Architecture review & cost passes
  • Embedded fractional CDO / CAIO (10–20 h/wk)
  • Federal AI advisory (CMMC, FedRAMP, FAR)
  • Audit, eval, board-level briefings
You receive: Maturity scorecard · 90-day roadmap · vendor memo · board deck · optional fractional retainer.
04
AI Strategy

Multi-quarter, executive-level.

Roadmaps, operating-model design, investment theses, M&A due diligence. We model the ROI in numbers your CFO will defend and your board will sign.

  • 12-month data + AI roadmap, phased
  • Operating model — centralized vs federated
  • Investment thesis & ROI/NPV modeling
  • M&A — tech DD on AI/data assets
  • Policy & acceptable-use frameworks
  • Customer-facing AI disclosures
You receive: Strategy doc · executive presentation · financial model · governance framework · quarterly review.
§03 One we'll show you.
The rest, on the call.

Eleven weeks, zero to production. The runbook outlived three engineers.

A representative engagement. Anonymized where we have to be — but the receipts (uptime, p95, endpoint counts, source) are all verifiable.

case-01 · commercial multi-tenant LLM SaaS · AWS · Bedrock · Lambda

PrivateStack — a private LLM platform for regulated industries.

Auth, billing, retrieval, routing across five providers, observability — built end-to-end and handed off with a runbook a third party could operate.

routes/day · last 7d · total 4.2M
/chat · 1.8M
/embed · 1.1M
/eval · 684k
/admin · 412k
/billing · 198k
§ engagement at a glance

From a Notion doc to customer-zero in eleven weeks.

Clerk JWT auth, 66-endpoint API Gateway, Lambda backends. LiteLLM routing across Bedrock, OpenAI, Anthropic, Cohere, Mistral with per-tenant cost ceilings. pgvector retrieval, eval harness with 842 golden cases, observability per tenant.
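As a rough illustration of cost-routing under a per-tenant ceiling, the sketch below picks the cheapest provider and refuses the request once a tenant's budget is exhausted. It is plain Python and does not call LiteLLM. The provider names, prices, and the `Provider`, `TenantBudget`, and `route` names are hypothetical placeholders, not PrivateStack's actual configuration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Provider:
    name: str
    usd_per_1k_tokens: float  # hypothetical blended price, not a real quote


@dataclass
class TenantBudget:
    ceiling_usd: float        # per-tenant cost ceiling for the billing period
    spent_usd: float = 0.0


class CeilingExceeded(Exception):
    """Raised when even the cheapest provider would push the tenant past its ceiling."""


def route(
    providers: Sequence[Provider], budget: TenantBudget, est_tokens: int
) -> Tuple[Provider, float]:
    """Choose the cheapest provider and charge the estimated cost to the tenant."""
    if not providers:
        raise ValueError("no providers configured")
    cheapest = min(providers, key=lambda p: p.usd_per_1k_tokens)
    cost = cheapest.usd_per_1k_tokens * est_tokens / 1000
    remaining = budget.ceiling_usd - budget.spent_usd
    if cost > remaining:
        raise CeilingExceeded(
            f"estimated ${cost:.4f} exceeds remaining ${remaining:.4f}"
        )
    budget.spent_usd += cost
    return cheapest, cost


if __name__ == "__main__":
    providers = [Provider("provider-a", 0.0100), Provider("provider-b", 0.0020)]
    tenant = TenantBudget(ceiling_usd=5.00)
    chosen, cost = route(providers, tenant, est_tokens=1500)
    print(chosen.name, f"${cost:.4f}", f"spent=${tenant.spent_usd:.4f}")
```

A production router would also weigh latency, model quality, and provider health. This sketch isolates only the cost and ceiling logic.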

Two engineers, weekly demos, written decision log. Full IP transfer with a 23-page runbook. 99.9% uptime over the twelve months since.

Read the full case · Talk to the principal
delivery: 11 wks
endpoints: 66
p95: < 200ms
uptime 12mo: 99.9%
§04 How we work.
A standard engagement.

Two-pizza teams. Weekly demos. Full IP transfer.

Fixed-fee scope. Named deliverables. Written decision logs. CI/CD, observability, and a runbook ship with every engagement — they aren't a separate phase, they're how we work.

01
Week 0

Scope.

Discovery call. Written scope, deliverables, milestones, fee. Fixed-fee quote inside 48 hours of the call.

02
Week 1

Plan.

Architecture diagram, threat model, ADRs, eval plan. End-of-week demo of the scaffolding. No surprises after this.

03
Weeks 2 — N

Build.

Two-pizza team. Weekly demo. Written decision log. CI from day one. Observability before features. Done means deployable.

04
Final

Hand-off.

Full IP transfer. Runbook. Observability dashboards. 30-day support window. Optional fractional retainer.

§05  What we won't take on

Listing what we decline is the strongest signal we have standards.

Nobody publishes this. We do. If you're looking for one of the engagements below, we'll happily refer you elsewhere — and the rest of this site means more because of it.

If your problem isn't here, it's likely one of ours.
§06 What we publish.
Receipts you can read.

Engineering credibility is verifiable in ten seconds.

Most consultancies hide their code. Ours is on GitHub. The Refinery Report is where we work in the open — eval harnesses, MCP server internals, red-team field notes.

The Refinery Report · Substack all posts ↗
Eval harnesses are the moat, not the model. · 12 min
Apr 30, 2026
Why we run six MCP servers in production. · 9 min
Apr 18, 2026
Red-teaming an agentic workflow — field notes. · 14 min
Apr 04, 2026
Cost-routing across five LLM providers without losing your evals. · 18 min
Mar 21, 2026
A fixed-fee playbook for AI engagements. · 7 min
Mar 09, 2026
github.com/dsee · OSS repos ↗
gemini-bridge · MCP server · ★ 412
perplexity-async · MCP server · ★ 287
eval-harness · Python lib · ★ 198
litellm-cost-router · middleware · ★ 124
dask-* contribs · upstream · 12 PRs
rmf-checklist · NIST AI RMF · ★ 86
We commit to upstream weekly.
Every PR is signed by the engineer who wrote it.