Feature

Register any LLM and benchmark it against scripture.

Bible Bench supports any large language model from any provider, as long as the model supports structured output. Register a model once, configure its evaluation parameters, and its scripture-fidelity scores appear on the leaderboard in minutes, no matter who built it or how it's hosted.

This includes local models running on your own hardware, as long as the model fits in GPU (or unified) memory. For example, LM Studio can download and run any open-weight model, dramatically reducing token costs for ETL processing.

Provider-agnostic model management

Every model in Bible Bench is treated the same: same prompts, same scoring logic, same canonical comparison. Fair, consistent, and fully auditable.

LLM Registration

Register any LLM foundation model from any provider, as long as it is addressable via an API endpoint. Bible Bench needs only the model name, provider tag, and model identifier. No vendor lock-in.
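As an illustrative sketch only (the field names below are hypothetical, not Bible Bench's actual schema), a registration record built from those three pieces of information might look like:

```python
# Hypothetical registration record; Bible Bench's real API may differ.
REQUIRED_FIELDS = ("name", "provider", "model_id")

def validate_registration(record: dict) -> bool:
    """Check that a registration carries all three required fields, non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

registration = {
    "name": "GPT-5 Mini",      # human-readable model name
    "provider": "openai",      # provider tag
    "model_id": "gpt-5-mini",  # provider-side model identifier
}
```

Because only these three fields are required, the same record shape works for a hosted frontier model and a local open-weight model alike.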

Multi-Provider Support

Bible Bench is provider-agnostic. Connect models from OpenAI, Anthropic, Google, Meta, or any custom endpoint. Evaluation logic remains consistent across all providers.

Version Tracking

Register multiple versions of the same model family (e.g., gpt-5-mini vs. gpt-5.4) and compare how architecture improvements affect scripture fidelity over time.

Admin-Controlled Registry

Only platform administrators can register or remove models, ensuring the leaderboard reflects intentional, auditable evaluations rather than ad-hoc submissions. This matters all the more because API costs can be significant.

Per-Model Configuration

Set temperature, system prompt, and sampling parameters per model. Standardized prompting ensures apples-to-apples comparisons across very different model architectures.
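A minimal sketch of what a per-model configuration could hold, assuming hypothetical field names and defaults (Bible Bench's real parameters may differ):

```python
from dataclasses import dataclass

# Hypothetical per-model evaluation config; field names and defaults
# are illustrative, not Bible Bench's actual settings.
@dataclass(frozen=True)
class ModelConfig:
    temperature: float = 0.0  # deterministic sampling for reproducible scores
    top_p: float = 1.0        # nucleus-sampling parameter
    system_prompt: str = "Answer using only the canonical scripture text."

# Applying one shared default config to every registered model is what
# makes the comparisons apples-to-apples.
default_config = ModelConfig()
```

A frozen dataclass is a natural fit here: once a model's evaluation parameters are set, they should not change silently mid-benchmark.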

Model registry and configuration

A clean, auditable registry of every LLM evaluated on the platform.

Add your models to the benchmark.

Join the waitlist to register your LLMs and see how they measure up against the canonical scripture standard.