Lower the cost of running large language models in production.
Most production LLM bills pay for frontier-model intelligence on tasks that don't need it — classification, tagging, simple summarization. Builders default to the largest model because the engineering cost of optimizing — pipelines, training, evals, routing, drift management — eats the savings for everyone but the largest companies.
tokenopti is a self-hosted proxy that sits in front of your existing LLM calls. Change the base URL and nothing else. We capture traces inside your environment, fine-tune smaller open models against your real traffic, and serve them behind the same API endpoint. No application code changes. No data leaves your VPC.
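The "change the base URL and nothing else" claim can be pictured as a one-line config change, sketched here against an OpenAI-style SDK; the URL is a hypothetical in-VPC deployment, not a real tokenopti endpoint.

```python
import os
from openai import OpenAI  # third-party SDK; any OpenAI-compatible client works the same way

# Before: the client talks straight to the upstream provider.
# client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After: only the base URL changes. Model names, prompts, and response
# handling stay exactly as they were; the URL below is a placeholder
# for a proxy running inside your own VPC.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://tokenopti.internal.example/v1",
)
```

Because the proxy speaks the same API, traffic capture and model swaps happen behind this endpoint with no further application changes.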
Promotions move through a shadow → canary → ramp → full ladder with guardrail-gated auto-promote — the same experimentation pattern we used at our former employers. If quality regresses on any metric, we roll back before the customer notices. A dashboard and CLI surface what's being optimized; humans approve, reject, or pin any swap.
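The ladder above can be sketched as a tiny state machine: promote one rung only while every guardrail metric clears its floor, and roll back to shadow the moment any metric regresses. Stage names, metric names, and thresholds here are illustrative assumptions, not tokenopti's actual implementation.

```python
# Hypothetical sketch of guardrail-gated auto-promotion.
STAGES = ["shadow", "canary", "ramp", "full"]

def next_stage(stage: str, metrics: dict[str, float], guardrails: dict[str, float]) -> str:
    """Advance one rung if every guardrail metric meets its floor;
    otherwise roll all the way back to shadow."""
    if any(metrics.get(name, 0.0) < floor for name, floor in guardrails.items()):
        return "shadow"  # roll back before the regression reaches customers
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]  # "full" stays at "full"

# Illustrative guardrails: candidate must match baseline quality and
# keep structured-output validity above 99%.
guardrails = {"quality_vs_baseline": 1.0, "schema_valid_rate": 0.99}

# Healthy metrics climb one rung per evaluation window...
next_stage("canary", {"quality_vs_baseline": 1.02, "schema_valid_rate": 0.995}, guardrails)  # -> "ramp"
# ...while a regression on any single metric triggers rollback.
next_stage("ramp", {"quality_vs_baseline": 0.97, "schema_valid_rate": 0.995}, guardrails)  # -> "shadow"
```

In a real system the human approve/reject/pin controls would sit on top of this loop, overriding the automatic transition.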
Pricing is a transparent margin on the open-source models we serve, and we earn it only when our model wins the A/B against the incumbent. If we don't beat the baseline, we don't get paid.
In development. Founded by operators from Gumloop and major public-company ads platforms. founders@tokenopti.com