tokenopti, est. 2026

Lower the cost of running large language models in production.

Thesis

Most production LLM bills pay for frontier-model intelligence on tasks that don't need it — classification, tagging, simple summarization. Builders default to the largest model because the engineering cost of optimizing — pipelines, training, evals, routing, drift management — eats the savings for everyone but the largest companies.

What we do

tokenopti is a self-hosted proxy that sits in front of your existing LLM calls. Change the base URL and nothing else. We capture traces inside your environment, fine-tune smaller open models against your real traffic, and serve them behind the same API endpoint. No application code changes. No data leaves your VPC.
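The integration surface above can be sketched in a few lines. This is a hypothetical illustration, not tokenopti's actual API: the proxy URL, model name, and `chat_completion` helper are invented, and the request shape shown is the common OpenAI-style chat-completions format that such proxies typically mirror.

```python
import json
import urllib.request

# The only change to an existing integration: point the base URL at the
# self-hosted proxy inside your VPC instead of the upstream provider.
# (URL is illustrative; "was" line shows the typical upstream default.)
PROXY_URL = "https://tokenopti.internal.example/v1"  # was: https://api.openai.com/v1

def chat_completion(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style chat request; nothing but base_url differs
    between a direct call and a proxied one."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_completion(PROXY_URL, "gpt-4o", [{"role": "user", "content": "hi"}])
```

Because the request body and response shape are unchanged, the proxy can transparently swap the serving model behind the same endpoint without any application-side edits.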

How it ships

Promotions move through a shadow → canary → ramp → full ladder with guardrail-gated auto-promote — the same experimentation pattern we used at our former employers. If quality regresses on any metric, we roll back before the customer notices. A dashboard and CLI surface what's being optimized; humans approve, reject, or pin any swap.
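The ladder above can be sketched as a small state machine. This is a minimal illustration under invented assumptions: the stage names come from the document, but the metric dictionaries, tolerance parameter, and `next_stage` function are hypothetical, not tokenopti's implementation.

```python
# Promotion rungs, in order, as described in the text.
STAGES = ["shadow", "canary", "ramp", "full"]

def next_stage(current: str, metrics: dict, baseline: dict, tolerance: float = 0.01) -> str:
    """Advance one rung only if no guardrail metric regresses past the
    tolerance versus the incumbent baseline; otherwise roll back to shadow."""
    for name, base_value in baseline.items():
        if metrics.get(name, 0.0) < base_value - tolerance:
            return "shadow"  # quality regressed: roll back before customers notice
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]  # "full" is terminal
```

A human-in-the-loop layer, as the dashboard and CLI describe, would sit on top of this: auto-promote only fires when the guardrail check passes, and an operator can pin or reject any swap regardless of the metrics.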

Business

Transparent margin on the open-source models we serve. We earn only when our model wins the A/B against the incumbent. If we don't beat the baseline, we don't get paid.

Status

In development. Founded by operators from Gumloop and major public-company ads platforms. founders@tokenopti.com