Feature
Focus your benchmarks with scoped evaluation campaigns.
Campaigns let you group LLM evaluations by translation, book set, or time period. Run focused, repeatable benchmarks that answer the questions that matter most — and come back to them whenever the model landscape changes.
Campaigns are designed to capture new models as they are released and to support long-term tracking of model performance, including any improvement or degradation in an LLM's ability to reproduce the King James Version of the Bible.
Built for structured, repeatable evaluation
Every campaign is a reusable evaluation contract — define the scope once and re-run it as often as needed.
Evaluation Scoping
Define exactly which Bible translations, books, and verse ranges a campaign should evaluate. Narrow the focus to a single testament or compare across the entire canon.
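To make the scoping idea concrete, here is a minimal sketch of what a scoped campaign definition could look like, written as a small Python data class. The CampaignScope name and its fields are illustrative assumptions, not the platform's actual API.

    from dataclasses import dataclass, field

    # Hypothetical sketch of a campaign scope; field names are illustrative,
    # not the platform's actual API.
    @dataclass
    class CampaignScope:
        translation: str                                      # e.g. "KJV"
        books: list[str] = field(default_factory=list)        # e.g. ["Genesis", "Exodus"]
        verse_ranges: list[str] = field(default_factory=list) # e.g. ["John 3:1-21"]

    # A campaign scoped to the four Gospels in the King James Version.
    gospels_scope = CampaignScope(
        translation="KJV",
        books=["Matthew", "Mark", "Luke", "John"],
    )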
Time-Bounded Analysis
Campaigns carry a creation timestamp and run history, making it trivial to compare how a model's performance on the same verses changes over time.
Model Grouping
Assign one or more models to a campaign and receive side-by-side fidelity results — no need to manually correlate data across separate runs.
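In the same illustrative spirit (again an assumption, not the platform's actual API), a single campaign object could carry both the scope and the set of models, so one run yields results keyed by model rather than scattered across separate evaluations.

    # Hypothetical sketch; names and structure are illustrative placeholders.
    campaign = {
        "name": "Gospels fidelity",
        "scope": {"translation": "KJV", "books": ["Matthew", "Mark", "Luke", "John"]},
        "models": ["model-a", "model-b", "model-c"],  # placeholder model identifiers
    }

    # Every model shares the same scope, so fidelity results can be collected in
    # one structure keyed by model, ready for side-by-side comparison.
    results = {model_id: {} for model_id in campaign["models"]}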
Persistent Campaign State
Campaigns persist in the platform indefinitely. Revisit historical campaigns to audit past evaluations or compare against a new model version.
Translation-Aware Comparison
Pin a campaign to a specific Bible translation (currently only KJV) and ensure every model is evaluated against the same canonical source.
Campaigns at a glance
Organize your benchmarks with campaigns designed for clarity and repeatability.
Start building structured evaluation campaigns.
Join the waitlist for early access and bring focused, repeatable scripture benchmarking to your research workflow.