AI Benchmarks
Standard LLM benchmarks fail at science, so OpenAI built LifeSciBench
Jun 25, 2026, 6 min read