Feature
Full visibility into every evaluation run.
The Bible Bench Runs interface gives you real-time monitoring, structured trace inspection, and one-click retry management for every LLM evaluation. When something goes wrong, you'll know exactly where, why, and how to fix it.
Each Campaign, Bible, Book, and Chapter is tracked as an atomic unit per LLM. That granularity lets you inspect or re-run any individual failure without touching the rest of the run.
Execution monitoring built for reliability
Runs are designed to be observable, recoverable, and repeatable, so partial failures never leave your benchmarks in an inconsistent state. Bible Bench executes ETL runs one chapter at a time: all verses in a chapter are collected and processed in a single API call.
Run Execution Monitoring
Track every evaluation run from submission to completion. See which chapters have been processed, which are queued, and which encountered errors — in real time.
Retry Management
Automatic retry and fallback logic works for both verse and chapter evaluations. Retry logic is idempotent — re-running a verse never creates duplicates or corrupts existing results.
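To make the idempotency claim concrete, here is a minimal sketch of one way it could work: results are stored under a key that identifies the verse evaluation, so a retry overwrites the earlier attempt rather than adding a second record. The names `VerseKey`, `ResultStore`, and `record` are hypothetical and are not taken from Bible Bench itself.

```python
from dataclasses import dataclass


# Hypothetical names throughout: VerseKey, ResultStore and record() are
# illustrative only and do not describe Bible Bench's actual storage layer.
@dataclass(frozen=True)
class VerseKey:
    run_id: str
    model: str
    book: str
    chapter: int
    verse: int


class ResultStore:
    """In-memory store with exactly one slot per verse evaluation."""

    def __init__(self):
        self._results = {}

    def record(self, key, score):
        # Writing by key replaces the previous attempt instead of appending,
        # so a retry cannot create a second entry for the same verse.
        self._results[key] = score

    def get(self, key):
        return self._results[key]

    def __len__(self):
        return len(self._results)


store = ResultStore()
key = VerseKey("run-1", "model-a", "Genesis", 1, 1)
store.record(key, 0.82)  # first attempt
store.record(key, 0.91)  # retry of the same verse
```

After both calls the store still holds a single entry for Genesis 1:1, carrying the score from the most recent attempt.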
Trace Inspection
Drill into the execution trace for any verse evaluation: see the raw prompt sent, the model's response, processing steps applied, and the computed fidelity score.
Throughput Metrics
Per-run statistics track verses attempted per minute, success rate, error rate, and estimated time to completion — so you can plan resources accordingly.
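As a rough illustration of how these statistics relate to one another, the sketch below derives them from four counters. The `RunStats` class and its field names are assumptions made for this example, and the time estimate assumes the current pace holds and that failed verses are not retried.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical structure: RunStats and its fields are illustrative only.
@dataclass
class RunStats:
    total_verses: int
    succeeded: int
    failed: int
    elapsed_seconds: float

    @property
    def attempted(self) -> int:
        # Every verse that has finished, whether it succeeded or failed.
        return self.succeeded + self.failed

    @property
    def verses_per_minute(self) -> float:
        if self.elapsed_seconds <= 0:
            return 0.0
        return self.attempted / (self.elapsed_seconds / 60.0)

    @property
    def success_rate(self) -> float:
        return self.succeeded / self.attempted if self.attempted else 0.0

    @property
    def error_rate(self) -> float:
        return self.failed / self.attempted if self.attempted else 0.0

    @property
    def eta_minutes(self) -> Optional[float]:
        # Assumes the current pace holds and failed verses are not retried.
        rate = self.verses_per_minute
        if rate == 0:
            return None  # no progress yet, so no estimate is possible
        return (self.total_verses - self.attempted) / rate


stats = RunStats(total_verses=1000, succeeded=190, failed=10, elapsed_seconds=120.0)
```

With these sample numbers, 200 verses attempted in two minutes gives 100 verses per minute, a 95% success rate, and about 8 minutes remaining for the other 800 verses.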
Error Surface and Diagnosis
Errors are categorized by type — timeout, API error, empty response, parse failure — and surfaced with the full request context needed to diagnose and resolve them.
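The four categories above could be assigned with a small classifier along these lines. This is a sketch under stated assumptions: `ErrorCategory` and `categorize` are hypothetical names, the model response is assumed to be JSON, and any exception other than a timeout is treated as an API error.

```python
import json
from enum import Enum
from typing import Optional


# Hypothetical names: ErrorCategory and categorize() are illustrative only.
class ErrorCategory(Enum):
    TIMEOUT = "timeout"
    API_ERROR = "api_error"
    EMPTY_RESPONSE = "empty_response"
    PARSE_FAILURE = "parse_failure"


def categorize(exc: Optional[Exception], body: Optional[str]) -> Optional[ErrorCategory]:
    """Return the error category for one request, or None if it succeeded."""
    if isinstance(exc, TimeoutError):
        return ErrorCategory.TIMEOUT
    if exc is not None:
        # Assumption: any non-timeout exception came from the API call.
        return ErrorCategory.API_ERROR
    if body is None or not body.strip():
        return ErrorCategory.EMPTY_RESPONSE
    try:
        # Assumption: a valid response body is JSON.
        json.loads(body)
    except json.JSONDecodeError:
        return ErrorCategory.PARSE_FAILURE
    return None
```

Checking the timeout first matters because `TimeoutError` is itself an exception and would otherwise fall into the broader API error bucket.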
Monitor, trace, and recover
Full execution observability from run-level summary down to individual verse traces.
Run your first benchmark with confidence.
Join the waitlist for early access and get the execution observability your LLM evaluation workflow deserves.