AI Chat Blog

Insights on AI, machine learning, and developer tools.

Building smarter applications with large language models

A practical guide to integrating LLMs into your applications — architecture patterns, prompt engineering, and common pitfalls.

What we learned cutting our chat latency in half

Speculative decoding, smarter batching, and the unglamorous work of fixing one slow tokenizer that dominated our P99 latency.

Designing tool-use without losing the plot

Why the agent that confidently calls the wrong tool is worse than the one that says “I don’t know” — and how we evaluate for it.

Notes from running an LLM gateway in production

Caching, fallbacks, cost controls, and the metric that matters more than tokens-per-second: tokens-per-dollar-per-useful-answer.