Topic
Evals and Benchmarks
Methods for measuring model and system performance, including benchmarks, task-specific evals, red teaming, reliability tests, and quality measurement.