<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Mohith G · Writing</title><description>Essays on AI product engineering, LLM evals, agents, prompts, and the napkin math of running models in production.</description><link>https://mohithg.com/</link><language>en-us</language><item><title>Shipping AI products in 2026: the playbook by phase</title><link>https://mohithg.com/writing/ai-product-playbook-2026/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-playbook-2026/</guid><description>If I were building a new AI product today, here&apos;s the order I&apos;d do it in. Phase by phase, with the specific decisions that matter at each stage.</description><pubDate>Tue, 07 Jul 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>The AI engineering stack in 2026: a map of the discipline</title><link>https://mohithg.com/writing/ai-engineering-stack-2026/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-engineering-stack-2026/</guid><description>AI product engineering has become a real discipline with its own stack. Here&apos;s the map I&apos;ve ended up with after 18 months of writing about each piece, and where each piece fits.</description><pubDate>Mon, 06 Jul 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Deploying AI changes safely: rollouts that don&apos;t surprise users</title><link>https://mohithg.com/writing/ai-deployment-rollouts/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-deployment-rollouts/</guid><description>AI deployments have unique risks. Standard CI/CD patterns leave gaps. 
Here&apos;s the rollout discipline that catches problems before they reach all users.</description><pubDate>Sun, 05 Jul 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>Load testing AI features: what breaks first under load</title><link>https://mohithg.com/writing/llm-load-testing/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-load-testing/</guid><description>AI features fail differently under load than regular APIs. Standard load tests miss the failure modes that matter. Here&apos;s the load testing approach that finds real problems.</description><pubDate>Sat, 04 Jul 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>Multi-region AI deployment: latency, residency, and reliability</title><link>https://mohithg.com/writing/multi-region-ai/</link><guid isPermaLink="true">https://mohithg.com/writing/multi-region-ai/</guid><description>Once your AI product has users worldwide, single-region deployment hurts. Multi-region adds complexity but solves real problems. Here&apos;s the architecture that works.</description><pubDate>Fri, 03 Jul 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>LLM caching layers: prompt cache, response cache, semantic cache</title><link>https://mohithg.com/writing/llm-caching-layers/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-caching-layers/</guid><description>Caching for LLM products has more layers than caching for regular APIs. Each layer has different tradeoffs. 
Here&apos;s the stack and the patterns that compound.</description><pubDate>Thu, 02 Jul 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>Streaming LLM responses: the UX win that&apos;s harder than it looks</title><link>https://mohithg.com/writing/streaming-llm-responses/</link><guid isPermaLink="true">https://mohithg.com/writing/streaming-llm-responses/</guid><description>Streaming the model&apos;s tokens to the user as they&apos;re generated dramatically improves perceived latency. The implementation has more gotchas than tutorials suggest.</description><pubDate>Wed, 01 Jul 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>GPU economics for AI inference: where the money actually goes</title><link>https://mohithg.com/writing/gpu-economics/</link><guid isPermaLink="true">https://mohithg.com/writing/gpu-economics/</guid><description>Self-hosting LLMs means renting GPUs. The cost calculation isn&apos;t just $/hour. Utilization, batching, quantization, and cold starts all change the picture. Here&apos;s the real math.</description><pubDate>Tue, 30 Jun 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>Inference serving in 2026: vLLM, TGI, SGLang, and the choice that matters</title><link>https://mohithg.com/writing/inference-serving-frameworks/</link><guid isPermaLink="true">https://mohithg.com/writing/inference-serving-frameworks/</guid><description>If you&apos;re self-hosting LLMs, the inference server is one of the highest-leverage choices. 
Here&apos;s the landscape and the criteria that actually drive the decision.</description><pubDate>Mon, 29 Jun 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>Model Context Protocol (MCP): what it actually is and why it matters</title><link>https://mohithg.com/writing/mcp-explained/</link><guid isPermaLink="true">https://mohithg.com/writing/mcp-explained/</guid><description>MCP is the protocol decoupling AI models from the tools and data they use. In 2026 it&apos;s becoming a baseline. Here&apos;s what it is and what to actually do about it.</description><pubDate>Sun, 28 Jun 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>The LLM gateway pattern: one API for all your AI</title><link>https://mohithg.com/writing/llm-gateway-pattern/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-gateway-pattern/</guid><description>Calling LLM APIs directly from product code is fine until it isn&apos;t. The gateway pattern centralizes the cross-cutting concerns. Here&apos;s how to build one without overengineering.</description><pubDate>Sat, 27 Jun 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>AI infrastructure: the boring layer that decides if you scale</title><link>https://mohithg.com/writing/ai-infrastructure-decides-scale/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-infrastructure-decides-scale/</guid><description>Prompts and models get attention. Infrastructure decides whether the product survives. 
Here&apos;s the infrastructure thinking that separates teams that scale from teams that don&apos;t.</description><pubDate>Fri, 26 Jun 2026 00:00:00 GMT</pubDate><category>AI infrastructure</category></item><item><title>Abuse detection for AI products: spotting bad actors at scale</title><link>https://mohithg.com/writing/ai-abuse-detection/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-abuse-detection/</guid><description>Some users will try to abuse your AI product. The volume of normal usage hides the abusive patterns until they&apos;re costly. Here&apos;s how to detect abuse without spying on legitimate users.</description><pubDate>Thu, 25 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Incident response for AI features: the playbook</title><link>https://mohithg.com/writing/ai-incident-response/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-incident-response/</guid><description>AI incidents look different from regular software incidents. The playbook is similar but with AI-specific steps. Here&apos;s the runbook I&apos;ve seen teams use successfully.</description><pubDate>Wed, 24 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Audit trails for AI: who decided what, when</title><link>https://mohithg.com/writing/audit-trails-ai/</link><guid isPermaLink="true">https://mohithg.com/writing/audit-trails-ai/</guid><description>When something goes wrong with an AI system, the audit trail is what tells you what happened. Most AI systems don&apos;t have one. 
Here&apos;s the structure that holds up under investigation.</description><pubDate>Tue, 23 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Designing refusal: how AI says no without alienating users</title><link>https://mohithg.com/writing/refusal-design/</link><guid isPermaLink="true">https://mohithg.com/writing/refusal-design/</guid><description>Refusing user requests is part of every safe AI product. How the refusal is communicated determines whether users tolerate the limit or abandon the product. Here&apos;s the design.</description><pubDate>Mon, 22 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Hallucination mitigation: not &apos;fewer hallucinations&apos; but &apos;no harmful ones&apos;</title><link>https://mohithg.com/writing/hallucination-mitigation/</link><guid isPermaLink="true">https://mohithg.com/writing/hallucination-mitigation/</guid><description>Eliminating hallucination is unrealistic. Preventing hallucinations from causing harm is achievable. Here&apos;s the reframing and the patterns that work.</description><pubDate>Sun, 21 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>PII handling in LLM products: where the data actually goes</title><link>https://mohithg.com/writing/pii-handling-llm/</link><guid isPermaLink="true">https://mohithg.com/writing/pii-handling-llm/</guid><description>AI products handle user data. Most teams don&apos;t have a clear picture of where PII flows in their stack. 
Here&apos;s the audit and the patterns that actually keep data safe.</description><pubDate>Sat, 20 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Jailbreak resistance: how production systems hold up</title><link>https://mohithg.com/writing/jailbreak-resistance/</link><guid isPermaLink="true">https://mohithg.com/writing/jailbreak-resistance/</guid><description>Jailbreaks are attempts to make the AI ignore its constraints. They keep evolving. Defending against them requires more than the model&apos;s built-in resistance. Here&apos;s how.</description><pubDate>Fri, 19 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Content moderation for AI: the pipeline that holds up</title><link>https://mohithg.com/writing/content-moderation-pipeline/</link><guid isPermaLink="true">https://mohithg.com/writing/content-moderation-pipeline/</guid><description>Models can produce content you don&apos;t want users to see. A moderation pipeline catches it before it reaches them. Here&apos;s the architecture and the patterns that work.</description><pubDate>Thu, 18 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Red-teaming your own AI: how to break it before users do</title><link>https://mohithg.com/writing/red-teaming-ai/</link><guid isPermaLink="true">https://mohithg.com/writing/red-teaming-ai/</guid><description>The cheapest safety incident is the one you found yourself. Most teams don&apos;t red-team their AI products. 
Here&apos;s how to do it without a dedicated security team.</description><pubDate>Wed, 17 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Prompt injection: the actual threat model</title><link>https://mohithg.com/writing/prompt-injection-defense/</link><guid isPermaLink="true">https://mohithg.com/writing/prompt-injection-defense/</guid><description>Prompt injection gets discussed as a generic risk. The actual threats are specific and the defenses are specific. Here&apos;s the threat model and the defenses that work.</description><pubDate>Tue, 16 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>AI safety as engineering discipline, not philosophy</title><link>https://mohithg.com/writing/ai-safety-as-engineering/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-safety-as-engineering/</guid><description>Most AI safety conversations stay abstract. The teams shipping reliable AI products treat safety as concrete engineering: architecture, eval, instrumentation. Here&apos;s the discipline.</description><pubDate>Mon, 15 Jun 2026 00:00:00 GMT</pubDate><category>AI safety and guardrails</category></item><item><title>Freshness in RAG: keeping the index in sync with the world</title><link>https://mohithg.com/writing/freshness-in-rag/</link><guid isPermaLink="true">https://mohithg.com/writing/freshness-in-rag/</guid><description>A RAG system that returns yesterday&apos;s data on questions about today&apos;s reality is a liability. Keeping the index fresh is harder than it sounds. 
Here are the patterns.</description><pubDate>Sun, 14 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>RAG with permissions: keeping users out of each other&apos;s data</title><link>https://mohithg.com/writing/rag-with-permissions/</link><guid isPermaLink="true">https://mohithg.com/writing/rag-with-permissions/</guid><description>A multi-tenant RAG system has to enforce permissions at retrieval time, not after. Get this wrong and you have a data leak. Here&apos;s the architecture that holds up.</description><pubDate>Sat, 13 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Long context vs RAG: when to retrieve and when to stuff</title><link>https://mohithg.com/writing/long-context-vs-rag/</link><guid isPermaLink="true">https://mohithg.com/writing/long-context-vs-rag/</guid><description>Modern models support 200K+ token contexts. Some say RAG is dead. The reality is more nuanced. Here&apos;s the framing for when each approach actually wins.</description><pubDate>Fri, 12 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Document preprocessing for RAG: garbage in, garbage out</title><link>https://mohithg.com/writing/document-preprocessing-rag/</link><guid isPermaLink="true">https://mohithg.com/writing/document-preprocessing-rag/</guid><description>RAG systems are downstream of your document preprocessing. Bad text extraction, lost structure, broken tables: each one degrades retrieval. Here&apos;s the pipeline that matters.</description><pubDate>Thu, 11 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Choosing a vector database: the criteria that actually matter</title><link>https://mohithg.com/writing/vector-db-choice/</link><guid isPermaLink="true">https://mohithg.com/writing/vector-db-choice/</guid><description>Vector DB choice gets discussed at length and decided poorly. 
Most teams pick by feature checklist; the actual tradeoffs are different. Here&apos;s the framework.</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Query rewriting: the underused RAG optimization</title><link>https://mohithg.com/writing/query-rewriting-rag/</link><guid isPermaLink="true">https://mohithg.com/writing/query-rewriting-rag/</guid><description>User queries are not optimal retrieval queries. Rewriting the query before retrieval, often with an LLM, can dramatically improve recall. Most teams don&apos;t do it.</description><pubDate>Tue, 09 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Evaluating RAG: separating retrieval quality from answer quality</title><link>https://mohithg.com/writing/rag-evaluation/</link><guid isPermaLink="true">https://mohithg.com/writing/rag-evaluation/</guid><description>Most teams evaluate the final answer their RAG system produces. That&apos;s necessary but not sufficient. Without evaluating retrieval separately, you can&apos;t tell what to fix.</description><pubDate>Mon, 08 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Reranking: the second-stage retrieval most teams skip</title><link>https://mohithg.com/writing/reranking-in-rag/</link><guid isPermaLink="true">https://mohithg.com/writing/reranking-in-rag/</guid><description>First-pass retrieval is fast and noisy. A reranker on top cleans up the order in tens of milliseconds. 
Skipping it leaves quality on the table.</description><pubDate>Sun, 07 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Choosing an embedding model: the decision that compounds</title><link>https://mohithg.com/writing/embedding-model-choice/</link><guid isPermaLink="true">https://mohithg.com/writing/embedding-model-choice/</guid><description>Your embedding model decision affects retrieval quality, cost, and the cost of every future migration. Most teams pick by leaderboard. Here&apos;s the decision that actually fits your product.</description><pubDate>Sat, 06 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Hybrid search: why pure vector retrieval isn&apos;t enough</title><link>https://mohithg.com/writing/hybrid-search/</link><guid isPermaLink="true">https://mohithg.com/writing/hybrid-search/</guid><description>Vector search is great until it isn&apos;t. The cases it misses are the ones BM25 catches. Combining both is the right default for most production RAG, and it&apos;s not as hard as it looks.</description><pubDate>Fri, 05 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Chunking strategies that hold up in production</title><link>https://mohithg.com/writing/chunking-strategies/</link><guid isPermaLink="true">https://mohithg.com/writing/chunking-strategies/</guid><description>How you split documents for retrieval is one of the highest-leverage RAG decisions and one of the most under-discussed. Here&apos;s the chunking playbook that actually works.</description><pubDate>Thu, 04 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Retrieval is the unsexy half of every AI product</title><link>https://mohithg.com/writing/retrieval-is-the-product/</link><guid isPermaLink="true">https://mohithg.com/writing/retrieval-is-the-product/</guid><description>Generative AI gets the attention. Retrieval does the work. 
The teams shipping reliable AI products spend most of their effort on the indexing, chunking, and ranking that nobody writes about.</description><pubDate>Wed, 03 Jun 2026 00:00:00 GMT</pubDate><category>Retrieval and RAG</category></item><item><title>Setting the quality bar for AI features: how good is good enough</title><link>https://mohithg.com/writing/ai-product-quality-bar/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-quality-bar/</guid><description>AI features are non-deterministic. They will make mistakes. The product question is how often, on which inputs, with what user-visible consequences. Here&apos;s the framework.</description><pubDate>Tue, 02 Jun 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Versioning AI products: who pays when behavior changes</title><link>https://mohithg.com/writing/ai-product-versioning/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-versioning/</guid><description>AI product behavior changes when models change. Users notice. The versioning model determines who absorbs the change. Get this wrong and your users feel like the product is randomly different.</description><pubDate>Mon, 01 Jun 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>From AI demo to production: the gap is bigger than it looks</title><link>https://mohithg.com/writing/from-demo-to-production/</link><guid isPermaLink="true">https://mohithg.com/writing/from-demo-to-production/</guid><description>A working AI demo is maybe 20% of the work. The other 80% is everything that makes it survive contact with real users. 
Here&apos;s the punch list.</description><pubDate>Sun, 31 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Team shapes for AI products: who owns what</title><link>https://mohithg.com/writing/ai-product-team-shape/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-team-shape/</guid><description>Building AI products requires combinations of skill that don&apos;t fit traditional team structures. Here&apos;s the team shape that actually works and the dysfunction patterns to avoid.</description><pubDate>Sat, 30 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Roadmapping AI products: planning for a moving foundation</title><link>https://mohithg.com/writing/ai-product-roadmap/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-roadmap/</guid><description>Traditional roadmaps assume the technology underneath is stable. AI products live on a substrate that changes every few months. Here&apos;s the planning approach that adapts.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Building user trust in AI features</title><link>https://mohithg.com/writing/ai-product-trust/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-trust/</guid><description>AI features have a trust problem most software features don&apos;t. Users have learned to be skeptical. The features that earn trust do specific things. Here&apos;s the list.</description><pubDate>Thu, 28 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Measuring AI product success: which metrics actually mean something</title><link>https://mohithg.com/writing/measuring-ai-product-success/</link><guid isPermaLink="true">https://mohithg.com/writing/measuring-ai-product-success/</guid><description>Most AI product dashboards track the wrong things. 
Engagement is misleading; AI-feature usage is decoration. Here are the metrics that actually tell you whether your AI feature is working.</description><pubDate>Wed, 27 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Feature flags for AI features: rolling out the unpredictable</title><link>https://mohithg.com/writing/ai-feature-flags/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-feature-flags/</guid><description>AI features fail differently from regular features. Standard rollout patterns leave you exposed to model regressions and traffic-driven failures. Here&apos;s the gating model that fits.</description><pubDate>Tue, 26 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Onboarding for AI products: setting expectations the model can meet</title><link>https://mohithg.com/writing/ai-product-onboarding/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-product-onboarding/</guid><description>First-touch experience determines whether users come back. AI products have a unique onboarding problem: managing expectations the model may or may not meet. Here&apos;s the playbook.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>AI features that disappear (and why that&apos;s the goal)</title><link>https://mohithg.com/writing/ai-features-that-disappear/</link><guid isPermaLink="true">https://mohithg.com/writing/ai-features-that-disappear/</guid><description>The best AI features in 2026 don&apos;t have an &apos;AI&apos; label. They&apos;re invisible improvements to existing flows. 
Here&apos;s why most AI-branded features fail and the disappearing ones succeed.</description><pubDate>Sun, 24 May 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Optimizing LLM spend after the bill is already big</title><link>https://mohithg.com/writing/optimizing-llm-spend-late-stage/</link><guid isPermaLink="true">https://mohithg.com/writing/optimizing-llm-spend-late-stage/</guid><description>Most cost-optimization advice assumes you&apos;re starting from scratch. What if you already have a $100K/month bill and need to bring it down without breaking the product? Here&apos;s the order of operations.</description><pubDate>Sat, 23 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Batch vs realtime LLM workloads: pick the right surface</title><link>https://mohithg.com/writing/llm-batch-vs-realtime/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-batch-vs-realtime/</guid><description>Many LLM workloads that run synchronously in production should be running asynchronously, and vice versa. The cost and reliability difference is large. Here&apos;s the framing.</description><pubDate>Fri, 22 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Cost attribution for LLM features: knowing where your bill comes from</title><link>https://mohithg.com/writing/cost-attribution-llm/</link><guid isPermaLink="true">https://mohithg.com/writing/cost-attribution-llm/</guid><description>An aggregate API bill tells you nothing about which features, users, or queries drive cost. Without attribution, you can&apos;t optimize. 
Here&apos;s the model that works.</description><pubDate>Thu, 21 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>LLM build vs buy: the questions that actually matter</title><link>https://mohithg.com/writing/llm-build-vs-buy/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-build-vs-buy/</guid><description>Should you build your own model, fine-tune, host open-source, or call APIs? The decision depends on a few specific questions, and the answer is usually &apos;call APIs.&apos;</description><pubDate>Wed, 20 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>LLM rate limits: budgeting for the throughput you actually need</title><link>https://mohithg.com/writing/llm-rate-limits/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-rate-limits/</guid><description>Provider rate limits constrain what you can ship more often than they should. Most teams hit the limits at the wrong time and don&apos;t have a plan. Here&apos;s the planning framework.</description><pubDate>Tue, 19 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>The cost of context: why bigger windows aren&apos;t free</title><link>https://mohithg.com/writing/cost-of-context/</link><guid isPermaLink="true">https://mohithg.com/writing/cost-of-context/</guid><description>Long context windows let you stuff more into a prompt. They don&apos;t let you do it for free. 
The cost scales superlinearly with context size in ways that surprise teams.</description><pubDate>Mon, 18 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Pricing tiers for AI features: matching limits to economics</title><link>https://mohithg.com/writing/llm-pricing-tier-design/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-pricing-tier-design/</guid><description>Flat-rate AI pricing leaves you exposed to the heavy users. Pure pay-per-use is hostile to most users. The middle ground is tiers with clear limits, designed around your cost distribution.</description><pubDate>Sun, 17 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Model routing: spending the right amount of intelligence</title><link>https://mohithg.com/writing/model-routing-cost/</link><guid isPermaLink="true">https://mohithg.com/writing/model-routing-cost/</guid><description>Not every request needs the frontier model. Routing requests to the right model tier is one of the highest-leverage cost optimizations and one of the most underused.</description><pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Prompt caching: the optimization most teams underuse</title><link>https://mohithg.com/writing/prompt-caching-economics/</link><guid isPermaLink="true">https://mohithg.com/writing/prompt-caching-economics/</guid><description>Modern LLM APIs let you cache the static parts of your prompt. Most teams enable it, then design prompts that defeat it. 
Here&apos;s how to get the actual savings.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>LLM unit economics: the math your CFO will eventually ask about</title><link>https://mohithg.com/writing/llm-unit-economics/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-unit-economics/</guid><description>Unit economics for LLM features look different from regular software unit economics. The variable costs are real, the gross margins can flip with usage patterns, and the questions are coming. Here&apos;s how to think about them.</description><pubDate>Thu, 14 May 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Agent cost control: where the money actually goes</title><link>https://mohithg.com/writing/agent-cost-control/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-cost-control/</guid><description>An agent that costs $0.10 per run becomes a $30K monthly bill at meaningful traffic. Here&apos;s where the cost concentrates and which controls keep it sustainable.</description><pubDate>Wed, 13 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Observability for agents: what to instrument from day one</title><link>https://mohithg.com/writing/agent-observability/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-observability/</guid><description>An agent without observability is a black box that occasionally produces output. 
Here&apos;s what to instrument, what to alert on, and what to keep out of your dashboards.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Agent latency: where the seconds actually go</title><link>https://mohithg.com/writing/agent-latency/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-latency/</guid><description>An agent that takes 30 seconds to answer is unusable for most product surfaces. Here&apos;s where the time actually goes and which optimizations move the needle.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Tool permissions for agents: the principle of least privilege</title><link>https://mohithg.com/writing/agent-tool-permissions/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-tool-permissions/</guid><description>An agent with the wrong tool permissions is a security incident waiting to happen. Here&apos;s the permission model that keeps agents capable without giving them the keys to everything.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Evaluating agents: trajectory matters as much as outcome</title><link>https://mohithg.com/writing/agent-evals/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-evals/</guid><description>Eval frameworks for single-prompt LLM features don&apos;t translate cleanly to agents. Agents have process. The bench needs to grade the process, not just the result.</description><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Multi-agent vs single-agent: when the orchestra is worth it</title><link>https://mohithg.com/writing/multi-agent-vs-single-agent/</link><guid isPermaLink="true">https://mohithg.com/writing/multi-agent-vs-single-agent/</guid><description>Multi-agent architectures look elegant in diagrams. 
In production, they&apos;re more often a tax than a benefit. Here&apos;s when the orchestra actually beats the soloist.</description><pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>The five most common agent failure modes (and how to fix each)</title><link>https://mohithg.com/writing/agent-failure-modes/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-failure-modes/</guid><description>Production agents fail in predictable ways. Knowing the patterns saves weeks of debugging. Here are the five I see most often and what actually fixes them.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>When to use an agent (and when not to)</title><link>https://mohithg.com/writing/when-to-use-agents/</link><guid isPermaLink="true">https://mohithg.com/writing/when-to-use-agents/</guid><description>The &apos;agent&apos; label has been applied to almost every LLM feature. Most of them shouldn&apos;t be agents. Here are the actual decision criteria.</description><pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Agent state management: the part nobody writes about</title><link>https://mohithg.com/writing/agent-state-management/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-state-management/</guid><description>Most agent tutorials skip past the question of where state lives. In production, state management is half the work. Here&apos;s the model that scales.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>Tool design for agents: APIs the model can actually use</title><link>https://mohithg.com/writing/tool-design-for-agents/</link><guid isPermaLink="true">https://mohithg.com/writing/tool-design-for-agents/</guid><description>An agent is only as good as the tools you give it. 
Most teams design tools the way they design APIs for other engineers, and pay for it. Here&apos;s the difference that matters.</description><pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>The AI&apos;s vocabulary is a hidden API contract</title><link>https://mohithg.com/writing/the-ai-vocabulary-problem/</link><guid isPermaLink="true">https://mohithg.com/writing/the-ai-vocabulary-problem/</guid><description>Every word your LLM is allowed to say imposes obligations on the systems beneath it. Treat the prompt&apos;s vocabulary like an interface or pay for it later.</description><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>Human-in-the-loop evals: where it&apos;s still essential in 2026</title><link>https://mohithg.com/writing/human-in-the-loop-evals/</link><guid isPermaLink="true">https://mohithg.com/writing/human-in-the-loop-evals/</guid><description>Automated evals can do a lot, but not everything. Here&apos;s where humans still beat any LLM judge, and how to set up the human review loop without breaking the bank.</description><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>What an LLM eval bench actually needs to do</title><link>https://mohithg.com/writing/llm-eval-bench-actually-needs/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-eval-bench-actually-needs/</guid><description>Most eval frameworks measure whether the model returned a string. Production eval benches measure whether shipping the change is safe. 
The gap is everything.</description><pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>Adversarial evals: what to break before users do</title><link>https://mohithg.com/writing/adversarial-evals/</link><guid isPermaLink="true">https://mohithg.com/writing/adversarial-evals/</guid><description>The friendly cases will tell you the model usually works. The adversarial cases will tell you what happens when things go wrong. Most teams don&apos;t have enough of the second kind.</description><pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>Eval drift: when your bench stops measuring what you care about</title><link>https://mohithg.com/writing/eval-drift/</link><guid isPermaLink="true">https://mohithg.com/writing/eval-drift/</guid><description>An eval bench can pass with flying colors while production quality declines. The gap is called eval drift, and it&apos;s the most common silent failure in LLM ops.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>Agent loops are just function-call graphs</title><link>https://mohithg.com/writing/agent-loops-are-graphs/</link><guid isPermaLink="true">https://mohithg.com/writing/agent-loops-are-graphs/</guid><description>Strip away the agent terminology and you&apos;re left with a graph of function calls with conditional edges. The patterns that ship treat them that way.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>Agent architecture</category></item><item><title>The hidden cost of evals (and how to keep them affordable)</title><link>https://mohithg.com/writing/hidden-cost-of-evals/</link><guid isPermaLink="true">https://mohithg.com/writing/hidden-cost-of-evals/</guid><description>Eval pipelines are easy to start and expensive to run at scale. 
Here&apos;s where the cost actually comes from and how to keep it under control without losing the safety net.</description><pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>The economics of running an LLM agent at scale</title><link>https://mohithg.com/writing/economics-of-llm-in-production/</link><guid isPermaLink="true">https://mohithg.com/writing/economics-of-llm-in-production/</guid><description>Napkin math for the unit cost of an AI feature: tokens, latency, caching, model routing, and the surprising line items nobody publishes.</description><pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate><category>The napkin math of AI in production</category></item><item><title>Eval datasets that hold up over time</title><link>https://mohithg.com/writing/eval-datasets-that-hold-up/</link><guid isPermaLink="true">https://mohithg.com/writing/eval-datasets-that-hold-up/</guid><description>Most eval datasets rot. The cases drift, the rubrics get stale, the bench becomes a museum piece. Here&apos;s how to build one that stays useful for years.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>Build the substance, then the surface</title><link>https://mohithg.com/writing/build-substance-then-surface/</link><guid isPermaLink="true">https://mohithg.com/writing/build-substance-then-surface/</guid><description>Most AI product failures are LLM wrappers shipped before there&apos;s anything underneath worth wrapping. 
The hard part of an AI product is almost never the prompt.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><category>AI product engineering</category></item><item><title>Prompts as type signatures</title><link>https://mohithg.com/writing/prompts-as-type-signatures/</link><guid isPermaLink="true">https://mohithg.com/writing/prompts-as-type-signatures/</guid><description>The quickest mental model improvement for prompt engineering: stop thinking of prompts as instructions, start thinking of them as type signatures for the model&apos;s output.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>Three kinds of evals: continuous, deep, and shadow</title><link>https://mohithg.com/writing/continuous-vs-deep-vs-shadow-evals/</link><guid isPermaLink="true">https://mohithg.com/writing/continuous-vs-deep-vs-shadow-evals/</guid><description>Most teams treat &apos;evals&apos; as one thing. The teams shipping reliable AI products run three distinct eval loops at different cadences. Here&apos;s the breakdown.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>System prompts that age well</title><link>https://mohithg.com/writing/system-prompts-that-age-well/</link><guid isPermaLink="true">https://mohithg.com/writing/system-prompts-that-age-well/</guid><description>A system prompt is shipped code. It needs the same discipline. Here are the patterns that survive a year of model upgrades, prompt edits, and team turnover.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>The eval rubric is the work</title><link>https://mohithg.com/writing/eval-rubrics-are-the-work/</link><guid isPermaLink="true">https://mohithg.com/writing/eval-rubrics-are-the-work/</guid><description>Most teams treat the eval rubric as paperwork. 
The teams shipping reliable LLM products treat the rubric as the actual product specification. Here&apos;s the difference.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>Prompt versioning that doesn&apos;t suck</title><link>https://mohithg.com/writing/prompt-versioning-practical/</link><guid isPermaLink="true">https://mohithg.com/writing/prompt-versioning-practical/</guid><description>Versioning prompts is harder than versioning code because the artifact is a string and the test suite is fuzzy. Here&apos;s the workflow that ships.</description><pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>LLM-as-judge: what actually works in 2026</title><link>https://mohithg.com/writing/llm-as-judge-what-works/</link><guid isPermaLink="true">https://mohithg.com/writing/llm-as-judge-what-works/</guid><description>Using one LLM to grade another LLM&apos;s output is the most over-deployed and under-evaluated eval pattern in production. Here&apos;s when it works, when it fails, and how to use it well.</description><pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>When to ship a prompt change</title><link>https://mohithg.com/writing/when-to-ship-a-prompt-change/</link><guid isPermaLink="true">https://mohithg.com/writing/when-to-ship-a-prompt-change/</guid><description>The decision rule that separates teams who ship prompt changes confidently from teams who hover their finger over the button.</description><pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>The minimum viable eval bench (and why most teams skip it)</title><link>https://mohithg.com/writing/minimum-viable-eval-bench/</link><guid isPermaLink="true">https://mohithg.com/writing/minimum-viable-eval-bench/</guid><description>Most LLM teams ship without a real eval bench. 
The reason isn&apos;t that benches are hard. It&apos;s that the first one feels too small to matter. Here&apos;s the smallest useful one.</description><pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate><category>LLM eval engineering</category></item><item><title>Few-shot design: the prompt technique that&apos;s underused in 2026</title><link>https://mohithg.com/writing/few-shot-design/</link><guid isPermaLink="true">https://mohithg.com/writing/few-shot-design/</guid><description>Few-shot examples are the most reliable way to shape model behavior. Most production prompts use them badly or skip them entirely. Here&apos;s how to use them well.</description><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>Schema-first prompts: stop asking the model nicely</title><link>https://mohithg.com/writing/schema-first-prompts/</link><guid isPermaLink="true">https://mohithg.com/writing/schema-first-prompts/</guid><description>Constrained generation, structured output APIs, and JSON schema have made prompt engineering more like API design and less like creative writing. Lean in.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>Personas in prompts: useful or theatre?</title><link>https://mohithg.com/writing/personas-useful-or-theatre/</link><guid isPermaLink="true">https://mohithg.com/writing/personas-useful-or-theatre/</guid><description>Almost every system prompt starts with &apos;You are a helpful assistant.&apos; Most personas in prompts are decorative. 
Here&apos;s when they actually move the needle, and when they&apos;re padding.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>Debugging LLM apps: the trace-everything approach</title><link>https://mohithg.com/writing/trace-everything-debugging/</link><guid isPermaLink="true">https://mohithg.com/writing/trace-everything-debugging/</guid><description>You cannot debug what you cannot replay. The single highest-leverage habit in LLM engineering is making every model call inspectable after the fact.</description><pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item><item><title>System, user, developer: which message goes where</title><link>https://mohithg.com/writing/system-vs-user-vs-developer/</link><guid isPermaLink="true">https://mohithg.com/writing/system-vs-user-vs-developer/</guid><description>Modern LLM APIs distinguish between system, user, developer, and assistant roles. The rules for which content goes in which slot aren&apos;t intuitive. Here&apos;s the working model.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate><category>Prompts as API contracts</category></item></channel></rss>