How DoorDash actually tests LLMs in production
DoorDash engineered a dedicated testing flywheel to systematically evaluate large language models (LLMs) before deploying them to production. By building an automated evaluation pipeline, the team moved from informal "vibe checks" to deterministic quality metrics for its generative AI features....
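The summary does not describe DoorDash's actual pipeline. The following is only a minimal sketch of what an automated evaluation gate with a deterministic metric can look like. It assumes a fixed, labeled test set and a simple exact-match scorer. Every name here (`EvalCase`, `exact_match`, `evaluate`, `release_gate`, the 0.9 threshold, and the restaurant-category task) is hypothetical and is not DoorDash's code.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: none of these names come from DoorDash.

@dataclass
class EvalCase:
    prompt: str    # input sent to the model
    expected: str  # reference answer for this case


def exact_match(output: str, expected: str) -> float:
    """Deterministic scorer: case- and whitespace-insensitive equality."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(model: Callable[[str], str], cases: list[EvalCase],
             scorer: Callable[[str, str], float] = exact_match) -> float:
    """Run every case through the model and return the mean score."""
    if not cases:
        raise ValueError("evaluation set must not be empty")
    scores = [scorer(model(c.prompt), c.expected) for c in cases]
    return sum(scores) / len(scores)


def release_gate(candidate: Callable[[str], str], baseline: Callable[[str], str],
                 cases: list[EvalCase], min_score: float = 0.9) -> bool:
    """Allow deployment only if the candidate clears an absolute bar
    and does not regress against the current baseline model."""
    cand = evaluate(candidate, cases)
    base = evaluate(baseline, cases)
    return cand >= min_score and cand >= base


def lookup_model(answers: dict[str, str]) -> Callable[[str], str]:
    """Stub 'model' that returns canned answers, standing in for an LLM call."""
    return lambda prompt: answers.get(prompt, "")


# Hypothetical test set: classify a restaurant name into a cuisine.
cases = [
    EvalCase("category: Joe's Pizza", "pizza"),
    EvalCase("category: Golden Wok", "chinese"),
    EvalCase("category: Taco Town", "mexican"),
    EvalCase("category: Sushi Bar", "japanese"),
]

baseline = lookup_model({
    "category: Joe's Pizza": "pizza",
    "category: Golden Wok": "thai",
    "category: Taco Town": "Mexican",
    "category: Sushi Bar": "sushi",
})
candidate = lookup_model({
    "category: Joe's Pizza": " Pizza ",
    "category: Golden Wok": "Chinese",
    "category: Taco Town": "mexican",
    "category: Sushi Bar": "Japanese\n",
})

print(evaluate(baseline, cases))   # 0.5
print(evaluate(candidate, cases))  # 1.0
print(release_gate(candidate, baseline, cases))  # True
```

Exact match keeps the score reproducible for a given set of model outputs. The outputs themselves are only repeatable if the model is called with fixed sampling settings. A fuzzier scorer would tolerate paraphrases but could make the metric harder to reproduce, so the choice of scorer is a design trade-off.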