Feature

Drill down to every verse with surgical precision.

The Bible Bench Explorer provides a hierarchical drill-down interface that takes you from a whole-Bible fidelity summary all the way to the individual character differences in a single verse — with diff visualization and normalization previews at every step.

Forensic visibility into model outputs

The Explorer is built for researchers who need to understand not just whether a model scored well, but exactly why it did or did not.

Hierarchical Drill-Down

Navigate the full scripture hierarchy — Bible → Book → Chapter → Verse — with fidelity scores aggregated at every level so you can zero in on problem areas fast.
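As a rough illustration of how scores could roll up the hierarchy, the sketch below averages verse-level scores into chapter, book, and whole-Bible figures. The unweighted mean, the `aggregate` function, and the sample scores are assumptions made for this example; the Explorer's actual aggregation method is not specified here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical verse-level fidelity scores keyed by (book, chapter, verse).
# The values are illustrative, not real benchmark results.
verse_scores = {
    ("Genesis", 1, 1): 100.0,
    ("Genesis", 1, 2): 80.0,
    ("Genesis", 2, 1): 60.0,
    ("John", 3, 16): 90.0,
}


def aggregate(scores):
    """Roll verse scores up to chapter, book, and Bible level.

    Assumption: each level is the unweighted mean of the verse scores
    beneath it.
    """
    chapters = defaultdict(list)
    books = defaultdict(list)
    for (book, chapter, _verse), score in scores.items():
        chapters[(book, chapter)].append(score)
        books[book].append(score)
    return {
        "bible": mean(list(scores.values())),
        "books": {book: mean(vals) for book, vals in books.items()},
        "chapters": {key: mean(vals) for key, vals in chapters.items()},
    }


summary = aggregate(verse_scores)
```

With a summary like this, a low book score points to its chapters, and a low chapter score points to the verses that pulled it down.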

Verse-Level Comparison

View the canonical verse text and the model's response side by side, with character-level diff highlighting that shows every substitution, omission, and addition.
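A character-level diff of this kind can be sketched with Python's standard `difflib` module. The `char_diff` helper and its labels are hypothetical names chosen to match the three kinds of difference listed above; the Explorer's own diff algorithm may differ.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags onto the three kinds of difference named above.
LABELS = {"replace": "substitution", "delete": "omission", "insert": "addition"}


def char_diff(canonical, response):
    """Return (kind, canonical_chars, response_chars) for each differing span."""
    # autojunk=False keeps difflib from treating frequent characters as junk.
    matcher = SequenceMatcher(None, canonical, response, autojunk=False)
    return [
        (LABELS[tag], canonical[i1:i2], response[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


differences = char_diff(
    "In the beginning God created the heaven and the earth.",
    "In the beginning God created the heavens and the earth.",
)
for kind, expected, actual in differences:
    print(f"{kind}: expected {expected!r}, got {actual!r}")
```

Each returned tuple carries enough information to highlight the affected characters on either side of a side-by-side view.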

Fidelity Score Visualization

Each verse, chapter, and book carries a computed fidelity score (0–100). Heatmaps and sorted lists surface the lowest-scoring verses across any model or campaign.
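Surfacing the lowest-scoring verses amounts to a sort on the score. The sketch below assumes scores are available as a mapping from a verse reference to a 0–100 value; `lowest_scoring` and the sample data are hypothetical.

```python
import heapq


def lowest_scoring(verse_scores, n=3):
    """Return the n (reference, score) pairs with the lowest scores, lowest first."""
    return heapq.nsmallest(n, verse_scores.items(), key=lambda item: item[1])


# Illustrative scores only, not real benchmark output.
scores = {
    "Genesis 1:1": 100.0,
    "Genesis 1:2": 72.5,
    "John 3:16": 41.0,
    "Psalm 23:1": 88.0,
}
worst = lowest_scoring(scores, n=2)
```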

Raw vs. Processed Response

Toggle between the raw LLM response and the processed (normalized) form that was actually used for scoring — so you can audit exactly how normalization affects results.

Client-Side Normalization Preview

Experiment with text normalization transforms in real time, previewing how different processing strategies would affect fidelity scores without persisting any changes.
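To make the idea concrete, here is a minimal sketch of a non-persisting preview: it applies a chosen list of transforms to copies of the text and reports the score before and after. The transform names, the `preview` function, the choice to normalize both the canonical text and the response, and the similarity-ratio scoring are all assumptions for illustration, not the Explorer's actual scoring method.

```python
import re
import unicodedata
from difflib import SequenceMatcher


# Hypothetical normalization transforms; each takes and returns a string.
def lowercase(text):
    return text.lower()


def strip_punctuation(text):
    return re.sub(r"[^\w\s]", "", text)


def collapse_whitespace(text):
    return " ".join(text.split())


def unicode_nfkc(text):
    return unicodedata.normalize("NFKC", text)


def normalize(text, transforms):
    """Apply the transforms in order and return the new string."""
    for transform in transforms:
        text = transform(text)
    return text


def fidelity(canonical, response):
    """Assumed score: difflib similarity ratio scaled to 0-100."""
    ratio = SequenceMatcher(None, canonical, response, autojunk=False).ratio()
    return 100.0 * ratio


def preview(canonical, raw_response, transforms):
    """Score the raw and the normalized response; nothing is stored or mutated."""
    return {
        "raw": fidelity(canonical, raw_response),
        "processed": fidelity(
            normalize(canonical, transforms),
            normalize(raw_response, transforms),
        ),
    }


canonical = "In the beginning God created the heaven and the earth."
raw = "in the beginning,  God created the heaven and the earth"
result = preview(canonical, raw, [lowercase, strip_punctuation, collapse_whitespace])
```

Because `preview` only returns new values, swapping the transform list in and out changes what is displayed without touching the stored response or its recorded score.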

Explorer in action

Navigate the hierarchy and inspect every verse with rich diff visualization.

Explore scripture fidelity at the verse level.

Get early access to the Explorer and start understanding exactly how your models handle scripture reproduction.