Systems • 2024
MultiLLM Proxy
Unified proxy server for multiple LLM providers with one consistent API.
Auth, rate limiting, streaming, monitoring, and provider health checks behind one gateway.
TL;DR
- Single endpoint that normalizes provider APIs.
- Routing policy with failover and caching for resilience.
- Operational visibility with structured logs and metrics.
Artifacts
Architecture diagram
Client -> proxy -> router -> provider adapters, with auth, rate limits, and cache.
Request trace view
Latency breakdown and fallback paths per request.
Context
Teams were shipping LLM features across multiple providers, each with different APIs, limits, and reliability characteristics.
Problem
Direct integrations created brittle code paths, inconsistent observability, and no safe way to switch providers during outages.
Approach
- Define a provider-agnostic request/response contract (see the contract sketch after this list).
- Implement adapter modules per provider with retry and fallback logic (failover sketch below).
- Enforce auth and rate limits at the edge (middleware sketch below).
- Add a Redis-backed cache keyed by model, prompt, and params.
- Instrument requests with trace IDs and structured logging (tracing sketch below).
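The contract is essentially a normalized chat request/response pair that clients use regardless of which provider ends up serving the call. Below is a minimal Go sketch of what such a contract could look like; the type and field names (ChatRequest, ChatResponse, Usage) are illustrative assumptions, not the project's actual schema.

```go
// Illustrative provider-agnostic contract. These names are hypothetical,
// not the project's real types; they show the shape of a normalized API.
package proxy

type ChatRequest struct {
	Model       string    `json:"model"`        // logical model name, mapped per provider
	Messages    []Message `json:"messages"`     // normalized chat history
	Temperature float64   `json:"temperature,omitempty"`
	MaxTokens   int       `json:"max_tokens,omitempty"`
	Stream      bool      `json:"stream,omitempty"`
}

type Message struct {
	Role    string `json:"role"` // "system", "user", or "assistant"
	Content string `json:"content"`
}

type ChatResponse struct {
	Model    string `json:"model"`    // provider-resolved model actually used
	Content  string `json:"content"`  // normalized completion text
	Provider string `json:"provider"` // which adapter served the request
	Usage    Usage  `json:"usage"`
}

type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}
```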
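Building on that contract, the adapter-and-failover idea can be sketched as a common interface per provider plus a router that walks an ordered list with bounded retries. The Adapter interface, retry count, and backoff values here are assumptions for illustration, not the real implementation.

```go
// Sketch of per-provider adapters with retry and fallback.
package proxy

import (
	"context"
	"errors"
	"time"
)

type Adapter interface {
	Name() string
	Complete(ctx context.Context, req ChatRequest) (ChatResponse, error)
}

// completeWithFallback walks the ordered adapter list, retrying transient
// failures a few times before moving on to the next provider.
func completeWithFallback(ctx context.Context, adapters []Adapter, req ChatRequest) (ChatResponse, error) {
	var lastErr error
	for _, a := range adapters {
		for attempt := 0; attempt < 3; attempt++ {
			resp, err := a.Complete(ctx, req)
			if err == nil {
				return resp, nil
			}
			lastErr = err
			// Linear backoff between attempts; values are illustrative.
			select {
			case <-ctx.Done():
				return ChatResponse{}, ctx.Err()
			case <-time.After(time.Duration(200*(attempt+1)) * time.Millisecond):
			}
		}
	}
	return ChatResponse{}, errors.Join(errors.New("all providers failed"), lastErr)
}
```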
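Auth and rate limiting at the edge could look like the following middleware sketch, using a per-API-key token bucket from golang.org/x/time/rate. The key scheme, limits, and static key check are simplifying assumptions for illustration.

```go
// Sketch of edge auth plus per-key rate limiting.
package proxy

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

type rateLimiter struct {
	mu     sync.Mutex
	perKey map[string]*rate.Limiter
	rps    rate.Limit
	burst  int
}

func newRateLimiter(rps rate.Limit, burst int) *rateLimiter {
	return &rateLimiter{perKey: map[string]*rate.Limiter{}, rps: rps, burst: burst}
}

func (rl *rateLimiter) limiterFor(key string) *rate.Limiter {
	rl.mu.Lock()
	defer rl.mu.Unlock()
	l, ok := rl.perKey[key]
	if !ok {
		l = rate.NewLimiter(rl.rps, rl.burst)
		rl.perKey[key] = l
	}
	return l
}

// Middleware rejects unauthenticated or over-limit requests before they reach
// the router. Auth is reduced to a static key lookup here for brevity.
func (rl *rateLimiter) Middleware(validKeys map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := r.Header.Get("Authorization")
		if !validKeys[key] {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		if !rl.limiterFor(key).Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```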
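For instrumentation, a minimal sketch of trace-ID propagation plus structured logging with the standard log/slog package; the header name and log fields are illustrative assumptions.

```go
// Sketch of trace IDs and structured request logs.
package proxy

import (
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"net/http"
	"os"
	"time"
)

var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil))

func newTraceID() string {
	b := make([]byte, 8)
	rand.Read(b)
	return hex.EncodeToString(b)
}

// Tracing tags each request with a trace ID (reusing an inbound one if present)
// and emits one structured log entry with method, path, and latency.
func Tracing(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		traceID := r.Header.Get("X-Trace-Id") // header name is an assumption
		if traceID == "" {
			traceID = newTraceID()
		}
		w.Header().Set("X-Trace-Id", traceID)
		start := time.Now()
		next.ServeHTTP(w, r)
		logger.Info("request completed",
			"trace_id", traceID,
			"method", r.Method,
			"path", r.URL.Path,
			"duration_ms", time.Since(start).Milliseconds(),
		)
	})
}
```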
Tradeoffs
- Limited the API surface to features shared across providers to avoid vendor lock-in.
- Used deterministic cache keys to keep results predictable (key derivation sketched after this list).
- Chose Go for throughput, accepting a slightly slower iteration loop.
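A sketch of how deterministic cache keys can be derived from the normalized request (reusing the hypothetical ChatRequest type from the contract sketch above): the canonical JSON encoding of the struct is hashed, so the same model, prompt, and params always map to the same Redis key. The key prefix and hash choice are assumptions.

```go
// Sketch of deterministic cache key derivation.
package proxy

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// cacheKey serializes the request and hashes it. JSON encoding of a struct
// preserves field declaration order, which keeps the key stable across processes.
func cacheKey(req ChatRequest) (string, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(payload)
	// "llm:" prefix is illustrative; any stable namespace works.
	return fmt.Sprintf("llm:%s:%s", req.Model, hex.EncodeToString(sum[:])), nil
}
```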
Testing and Reliability
- Contract tests with provider mocks (example sketched after this list).
- Load testing for streaming and concurrency.
- Replay suite for regression coverage.
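As an example of a contract-style test with a provider mock, the sketch below (using the hypothetical types and completeWithFallback helper from the earlier sketches) checks that a failing provider is skipped and the normalized response from a healthy one is returned.

```go
// Sketch of a contract-style test with an in-process provider mock.
package proxy

import (
	"context"
	"errors"
	"testing"
)

// fakeAdapter is a hypothetical test double that satisfies the Adapter interface.
type fakeAdapter struct {
	name string
	resp ChatResponse
	err  error
}

func (f fakeAdapter) Name() string { return f.name }
func (f fakeAdapter) Complete(ctx context.Context, req ChatRequest) (ChatResponse, error) {
	return f.resp, f.err
}

func TestFallbackSkipsFailingProvider(t *testing.T) {
	adapters := []Adapter{
		fakeAdapter{name: "flaky", err: errors.New("upstream unavailable")},
		fakeAdapter{name: "healthy", resp: ChatResponse{Provider: "healthy", Content: "ok"}},
	}
	resp, err := completeWithFallback(context.Background(), adapters, ChatRequest{Model: "test-model"})
	if err != nil {
		t.Fatalf("expected fallback to succeed, got error: %v", err)
	}
	if resp.Provider != "healthy" {
		t.Fatalf("expected response from healthy provider, got %q", resp.Provider)
	}
}
```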
Deployment and Ops
- Dockerized service deployable behind an API gateway.
- Environment-driven configuration for routing policies (sketched below).
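Environment-driven routing configuration might look like the sketch below; the variable names (PROXY_PROVIDER_ORDER, PROXY_CACHE_ENABLED) and defaults are illustrative, not the service's real config surface.

```go
// Sketch of reading routing policy from environment variables.
package proxy

import (
	"os"
	"strings"
)

type RoutingPolicy struct {
	ProviderOrder []string // failover order; first entry is the primary provider
	CacheEnabled  bool
}

// policyFromEnv reads the routing policy from the environment so it can change
// per deployment without rebuilding the image. Names are hypothetical.
func policyFromEnv() RoutingPolicy {
	order := strings.Split(os.Getenv("PROXY_PROVIDER_ORDER"), ",") // e.g. "provider_a,provider_b"
	if len(order) == 1 && order[0] == "" {
		order = []string{"provider_a"} // illustrative default
	}
	return RoutingPolicy{
		ProviderOrder: order,
		CacheEnabled:  os.Getenv("PROXY_CACHE_ENABLED") != "false",
	}
}
```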
Outcome
- Simplified client integrations to one endpoint.
- Safer experimentation with routing policies.
- Lower tail latency for repeated prompts through caching.
If I had two more weeks
- Add budget-aware routing policies.
- Expose a lightweight policy dashboard.