Mohith G · Writing

Essays on AI product engineering, LLM evals, agents, prompts, and the napkin math of running models in production.
https://mohithg.com/ · en-us

Setting the quality bar for AI features: how good is good enough
https://mohithg.com/writing/ai-product-quality-bar/
AI features are non-deterministic. They will make mistakes. The product question is how often, on which inputs, with what user-visible consequences. Here's the framework.
Tue, 02 Jun 2026 00:00:00 GMT · AI product engineering

Versioning AI products: who pays when behavior changes
https://mohithg.com/writing/ai-product-versioning/
AI product behavior changes when models change. Users notice. The versioning model determines who absorbs the change. Get this wrong and your users feel like the product is randomly different.
Mon, 01 Jun 2026 00:00:00 GMT · AI product engineering

From AI demo to production: the gap is bigger than it looks
https://mohithg.com/writing/from-demo-to-production/
A working AI demo is maybe 20% of the work. The other 80% is everything that makes it survive contact with real users. Here's the punch list.
Sun, 31 May 2026 00:00:00 GMT · AI product engineering

Team shapes for AI products: who owns what
https://mohithg.com/writing/ai-product-team-shape/
Building AI products requires combinations of skill that don't fit traditional team structures. Here's the team shape that actually works and the dysfunction patterns to avoid.
Sat, 30 May 2026 00:00:00 GMT · AI product engineering

Roadmapping AI products: planning for a moving foundation
https://mohithg.com/writing/ai-product-roadmap/
Traditional roadmaps assume the technology underneath is stable. AI products live on a substrate that changes every few months. Here's the planning approach that adapts.
Fri, 29 May 2026 00:00:00 GMT · AI product engineering

Building user trust in AI features
https://mohithg.com/writing/ai-product-trust/
AI features have a trust problem most software features don't. Users have learned to be skeptical. The features that earn trust do specific things. Here's the list.
Thu, 28 May 2026 00:00:00 GMT · AI product engineering

Measuring AI product success: which metrics actually mean something
https://mohithg.com/writing/measuring-ai-product-success/
Most AI product dashboards track the wrong things. Engagement is misleading; AI-feature usage is decoration. Here are the metrics that actually tell you whether your AI feature is working.
Wed, 27 May 2026 00:00:00 GMT · AI product engineering

Feature flags for AI features: rolling out the unpredictable
https://mohithg.com/writing/ai-feature-flags/
AI features fail differently from regular features. Standard rollout patterns leave you exposed to model regressions and traffic-driven failures. Here's the gating model that fits.
Tue, 26 May 2026 00:00:00 GMT · AI product engineering

Onboarding for AI products: setting expectations the model can meet
https://mohithg.com/writing/ai-product-onboarding/
First-touch experience determines whether users come back. AI products have a unique onboarding problem: managing expectations the model may or may not meet. Here's the playbook.
Mon, 25 May 2026 00:00:00 GMT · AI product engineering

AI features that disappear (and why that's the goal)
https://mohithg.com/writing/ai-features-that-disappear/
The best AI features in 2026 don't have an 'AI' label. They're invisible improvements to existing flows. Here's why most AI-branded features fail and the disappearing ones succeed.
Sun, 24 May 2026 00:00:00 GMT · AI product engineering

Optimizing LLM spend after the bill is already big
https://mohithg.com/writing/optimizing-llm-spend-late-stage/
Most cost-optimization advice assumes you're starting from scratch. What if you already have a $100K/month bill and need to bring it down without breaking the product? Here's the order of operations.
Sat, 23 May 2026 00:00:00 GMT · The napkin math of AI in production

Batch vs realtime LLM workloads: pick the right surface
https://mohithg.com/writing/llm-batch-vs-realtime/
Many LLM workloads that run synchronously in production should be running asynchronously, and vice versa. The cost and reliability difference is large. Here's the framing.
Fri, 22 May 2026 00:00:00 GMT · The napkin math of AI in production

Cost attribution for LLM features: knowing where your bill comes from
https://mohithg.com/writing/cost-attribution-llm/
An aggregate API bill tells you nothing about which features, users, or queries drive cost. Without attribution, you can't optimize. Here's the model that works.
Thu, 21 May 2026 00:00:00 GMT · The napkin math of AI in production

LLM build vs buy: the questions that actually matter
https://mohithg.com/writing/llm-build-vs-buy/
Should you build your own model, fine-tune, host open-source, or call APIs? The decision depends on a few specific questions, and the answer is usually 'call APIs.'
Wed, 20 May 2026 00:00:00 GMT · The napkin math of AI in production

LLM rate limits: budgeting for the throughput you actually need
https://mohithg.com/writing/llm-rate-limits/
Provider rate limits constrain what you can ship more often than they should. Most teams hit the limits at the wrong time and don't have a plan. Here's the planning framework.
Tue, 19 May 2026 00:00:00 GMT · The napkin math of AI in production

The cost of context: why bigger windows aren't free
https://mohithg.com/writing/cost-of-context/
Long context windows let you stuff more into a prompt. They don't let you do it for free. The cost scales superlinearly with context size in ways that surprise teams.
Mon, 18 May 2026 00:00:00 GMT · The napkin math of AI in production

Pricing tiers for AI features: matching limits to economics
https://mohithg.com/writing/llm-pricing-tier-design/
Flat-rate AI pricing leaves you exposed to the heavy users. Pure pay-per-use is hostile to most users. The middle ground is tiers with clear limits, designed around your cost distribution.
Sun, 17 May 2026 00:00:00 GMT · The napkin math of AI in production

Model routing: spending the right amount of intelligence
https://mohithg.com/writing/model-routing-cost/
Not every request needs the frontier model. Routing requests to the right model tier is one of the highest-leverage cost optimizations and one of the most underused.
Sat, 16 May 2026 00:00:00 GMT · The napkin math of AI in production

Prompt caching: the optimization most teams underuse
https://mohithg.com/writing/prompt-caching-economics/
Modern LLM APIs let you cache the static parts of your prompt. Most teams enable it, then design prompts that defeat it. Here's how to get the actual savings.
Fri, 15 May 2026 00:00:00 GMT · The napkin math of AI in production

LLM unit economics: the math your CFO will eventually ask about
https://mohithg.com/writing/llm-unit-economics/
Unit economics for LLM features look different from regular software unit economics. The variable costs are real, the gross margins can flip with usage patterns, and the questions are coming. Here's how to think about them.
Thu, 14 May 2026 00:00:00 GMT · The napkin math of AI in production

Agent cost control: where the money actually goes
https://mohithg.com/writing/agent-cost-control/
An agent that costs $0.10 per run becomes a $30K monthly bill at meaningful traffic. Here's where the cost concentrates and which controls keep it sustainable.
Wed, 13 May 2026 00:00:00 GMT · Agent architecture

Observability for agents: what to instrument from day one
https://mohithg.com/writing/agent-observability/
An agent without observability is a black box that occasionally produces output. Here's what to instrument, what to alert on, and what to keep out of your dashboards.
Tue, 12 May 2026 00:00:00 GMT · Agent architecture

Agent latency: where the seconds actually go
https://mohithg.com/writing/agent-latency/
An agent that takes 30 seconds to answer is unusable for most product surfaces. Here's where the time actually goes and which optimizations move the needle.
Mon, 11 May 2026 00:00:00 GMT · Agent architecture

Tool permissions for agents: the principle of least privilege
https://mohithg.com/writing/agent-tool-permissions/
An agent with the wrong tool permissions is a security incident waiting to happen. Here's the permission model that keeps agents capable without giving them the keys to everything.
Sun, 10 May 2026 00:00:00 GMT · Agent architecture

Evaluating agents: trajectory matters as much as outcome
https://mohithg.com/writing/agent-evals/
Eval frameworks for single-prompt LLM features don't translate cleanly to agents. Agents have process. The bench needs to grade the process, not just the result.
Sat, 09 May 2026 00:00:00 GMT · Agent architecture

Multi-agent vs single-agent: when the orchestra is worth it
https://mohithg.com/writing/multi-agent-vs-single-agent/
Multi-agent architectures look elegant in diagrams. In production, they're more often a tax than a benefit. Here's when the orchestra actually beats the soloist.
Fri, 08 May 2026 00:00:00 GMT · Agent architecture

The five most common agent failure modes (and how to fix each)
https://mohithg.com/writing/agent-failure-modes/
Production agents fail in predictable ways. Knowing the patterns saves weeks of debugging. Here are the five I see most often and what actually fixes them.
Thu, 07 May 2026 00:00:00 GMT · Agent architecture

When to use an agent (and when not to)
https://mohithg.com/writing/when-to-use-agents/
The 'agent' label has been applied to almost every LLM feature. Most of them shouldn't be agents. Here's the actual decision criteria.
Wed, 06 May 2026 00:00:00 GMT · Agent architecture

Agent state management: the part nobody writes about
https://mohithg.com/writing/agent-state-management/
Most agent tutorials skip past the question of where state lives. In production, state management is half the work. Here's the model that scales.
Tue, 05 May 2026 00:00:00 GMT · Agent architecture

Tool design for agents: APIs the model can actually use
https://mohithg.com/writing/tool-design-for-agents/
An agent is only as good as the tools you give it. Most teams design tools the way they design APIs for other engineers, and pay for it. Here's the difference that matters.
Mon, 04 May 2026 00:00:00 GMT · Agent architecture

The AI's vocabulary is a hidden API contract
https://mohithg.com/writing/the-ai-vocabulary-problem/
Every word your LLM is allowed to say imposes obligations on the systems beneath it. Treat the prompt's vocabulary like an interface or pay for it later.
Sun, 03 May 2026 00:00:00 GMT · Prompts as API contracts

Human-in-the-loop evals: where it's still essential in 2026
https://mohithg.com/writing/human-in-the-loop-evals/
Automated evals can do a lot, but not everything. Here's where humans still beat any LLM judge, and how to set up the human review loop without breaking the bank.
Sun, 03 May 2026 00:00:00 GMT · LLM eval engineering

What an LLM eval bench actually needs to do
https://mohithg.com/writing/llm-eval-bench-actually-needs/
Most eval frameworks measure whether the model returned a string. Production eval benches measure whether shipping the change is safe. The gap is everything.
Sat, 02 May 2026 00:00:00 GMT · LLM eval engineering

Adversarial evals: what to break before users do
https://mohithg.com/writing/adversarial-evals/
The friendly cases will tell you the model usually works. The adversarial cases will tell you what happens when things go wrong. Most teams don't have enough of the second kind.
Sat, 02 May 2026 00:00:00 GMT · LLM eval engineering

Eval drift: when your bench stops measuring what you care about
https://mohithg.com/writing/eval-drift/
An eval bench can pass with flying colors while production quality declines. The gap is called eval drift, and it's the most common silent failure in LLM ops.
Fri, 01 May 2026 00:00:00 GMT · LLM eval engineering

Agent loops are just function-call graphs
https://mohithg.com/writing/agent-loops-are-graphs/
Strip away the agent terminology and you're left with a graph of function calls with conditional edges. The patterns that ship treat them that way.
Fri, 01 May 2026 00:00:00 GMT · Agent architecture

The hidden cost of evals (and how to keep them affordable)
https://mohithg.com/writing/hidden-cost-of-evals/
Eval pipelines are easy to start and expensive to run at scale. Here's where the cost actually comes from and how to keep it under control without losing the safety net.
Thu, 30 Apr 2026 00:00:00 GMT · LLM eval engineering

The economics of running an LLM agent at scale
https://mohithg.com/writing/economics-of-llm-in-production/
Napkin math for the unit cost of an AI feature: tokens, latency, caching, model routing, and the surprising line items nobody publishes.
Thu, 30 Apr 2026 00:00:00 GMT · The napkin math of AI in production

Eval datasets that hold up over time
https://mohithg.com/writing/eval-datasets-that-hold-up/
Most eval datasets rot. The cases drift, the rubrics get stale, the bench becomes a museum piece. Here's how to build one that stays useful for years.
Wed, 29 Apr 2026 00:00:00 GMT · LLM eval engineering

Build the substance, then the surface
https://mohithg.com/writing/build-substance-then-surface/
Most AI product failures are LLM wrappers shipped before there's anything underneath worth wrapping. The hard part of an AI product is almost never the prompt.
Wed, 29 Apr 2026 00:00:00 GMT · AI product engineering

Prompts as type signatures
https://mohithg.com/writing/prompts-as-type-signatures/
The quickest mental model improvement for prompt engineering: stop thinking of prompts as instructions, start thinking of them as type signatures for the model's output.
Tue, 28 Apr 2026 00:00:00 GMT · Prompts as API contracts

Three kinds of evals: continuous, deep, and shadow
https://mohithg.com/writing/continuous-vs-deep-vs-shadow-evals/
Most teams treat 'evals' as one thing. The teams shipping reliable AI products run three distinct eval loops at different cadences. Here's the breakdown.
Tue, 28 Apr 2026 00:00:00 GMT · LLM eval engineering

System prompts that age well
https://mohithg.com/writing/system-prompts-that-age-well/
A system prompt is shipped code. It needs the same discipline. Here are the patterns that survive a year of model upgrades, prompt edits, and team turnover.
Mon, 27 Apr 2026 00:00:00 GMT · Prompts as API contracts

The eval rubric is the work
https://mohithg.com/writing/eval-rubrics-are-the-work/
Most teams treat the eval rubric as paperwork. The teams shipping reliable LLM products treat the rubric as the actual product specification. Here's the difference.
Mon, 27 Apr 2026 00:00:00 GMT · LLM eval engineering

Prompt versioning that doesn't suck
https://mohithg.com/writing/prompt-versioning-practical/
Versioning prompts is harder than versioning code because the artifact is a string and the test suite is fuzzy. Here's the workflow that ships.
Sun, 26 Apr 2026 00:00:00 GMT · Prompts as API contracts

LLM-as-judge: what actually works in 2026
https://mohithg.com/writing/llm-as-judge-what-works/
Using one LLM to grade another LLM's output is the most over-deployed and under-evaluated eval pattern in production. Here's when it works, when it fails, and how to use it well.
Sun, 26 Apr 2026 00:00:00 GMT · LLM eval engineering

When to ship a prompt change
https://mohithg.com/writing/when-to-ship-a-prompt-change/
The decision rule that separates teams who ship prompt changes confidently from teams who hover their finger over the button.
Sat, 25 Apr 2026 00:00:00 GMT · Prompts as API contracts

The minimum viable eval bench (and why most teams skip it)
https://mohithg.com/writing/minimum-viable-eval-bench/
Most LLM teams ship without a real eval bench. The reason isn't that benches are hard. It's that the first one feels too small to matter. Here's the smallest useful one.
Sat, 25 Apr 2026 00:00:00 GMT · LLM eval engineering

Few-shot design: the prompt technique that's underused in 2026
https://mohithg.com/writing/few-shot-design/
Few-shot examples are the most reliable way to shape model behavior. Most production prompts use them badly or skip them entirely. Here's how to use them well.
Fri, 24 Apr 2026 00:00:00 GMT · Prompts as API contracts

Schema-first prompts: stop asking the model nicely
https://mohithg.com/writing/schema-first-prompts/
Constrained generation, structured output APIs, and JSON schema have made prompt engineering more like API design and less like creative writing. Lean in.
Thu, 23 Apr 2026 00:00:00 GMT · Prompts as API contracts

Personas in prompts: useful or theatre?
https://mohithg.com/writing/personas-useful-or-theatre/
Almost every system prompt starts with 'You are a helpful assistant.' Most personas in prompts are decorative. Here's when they actually move the needle, and when they're padding.
Wed, 22 Apr 2026 00:00:00 GMT · Prompts as API contracts

Debugging LLM apps: the trace-everything approach
https://mohithg.com/writing/trace-everything-debugging/
You cannot debug what you cannot replay. The single highest-leverage habit in LLM engineering is making every model call inspectable after the fact.
Tue, 21 Apr 2026 00:00:00 GMT · Prompts as API contracts

System, user, developer: which message goes where
https://mohithg.com/writing/system-vs-user-vs-developer/
Modern LLM APIs distinguish between system, user, developer, and assistant roles. The rules for which content goes in which slot aren't intuitive. Here's the working model.
Sun, 19 Apr 2026 00:00:00 GMT · Prompts as API contracts