Report

Find every model's weakest verses.

The Worst by Model Report identifies the verses where each registered LLM performed worst. Cross-model and per-model views reveal whether poor performance clusters around specific books, verse styles, or linguistic patterns — giving you actionable data for targeted model improvement.

What the Worst by Model Report surfaces

Per-model and cross-model analysis of the weakest verse performances — revealing patterns, clusters, and universal failure points across all evaluated LLMs.

Weakest Percentile Threshold

The editDistance score that marks off each model's weakest 10% of verses, defining the boundary beyond which verse performance is considered critically poor.
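As a rough illustration, a per-model cutoff of this kind could be computed as sketched below. The sketch assumes a higher editDistance means a worse reproduction, so the weakest 10% are the verses with the largest distances. The function names and the input shape (a mapping from verse reference to editDistance) are hypothetical and are not part of the report's actual interface.

```python
def weakest_threshold(edit_distances, worst_percent=10):
    """Return the edit distance that bounds the worst `worst_percent` of verses.

    Assumption: a higher edit distance means a worse reproduction.
    """
    if not edit_distances:
        raise ValueError("at least one score is required")
    ranked = sorted(edit_distances, reverse=True)  # worst scores first
    # Integer ceiling of n * worst_percent / 100, with at least one verse kept.
    k = max(1, (len(ranked) * worst_percent + 99) // 100)
    return ranked[k - 1]


def weakest_verses(scores, worst_percent=10):
    """Return the verse references at or beyond the cutoff.

    `scores` maps a verse reference to its edit distance (hypothetical shape).
    """
    cutoff = weakest_threshold(list(scores.values()), worst_percent)
    return {ref for ref, distance in scores.items() if distance >= cutoff}
```

Ties at the cutoff are all included, so the returned set can be slightly larger than 10% of the verses.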

Poor Performance Clusters

The 10 worst chapters for each model, ranked by Fidelity score or Perfect Match rate. These show where the most corrupted verses cluster.

Cross-Model Overlap

Percentage of worst-performing verses shared by two or more models, surfacing verses that represent universal challenges for LLM scripture reproduction.
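One way such an overlap figure could be derived is sketched below. It assumes the denominator is the number of distinct verses that appear in at least one model's worst list; the report may define it differently. The function name, model names, and verse references are illustrative only.

```python
from collections import Counter


def cross_model_overlap(worst_by_model):
    """Return (percentage, shared_verses) for verses in two or more worst lists.

    `worst_by_model` maps a model name to an iterable of verse references.
    Assumption: the percentage is taken over all distinct worst verses.
    """
    # Count how many models list each verse; set() guards against duplicates.
    counts = Counter(ref for refs in worst_by_model.values() for ref in set(refs))
    if not counts:
        return 0.0, set()
    shared = {ref for ref, n in counts.items() if n >= 2}
    return 100.0 * len(shared) / len(counts), shared


# Example with made-up data: one of four distinct verses is shared.
example = {
    "model_a": ["John 3:16", "Genesis 1:1"],
    "model_b": ["John 3:16", "Psalm 23:1"],
    "model_c": ["Romans 8:28"],
}
percentage, shared = cross_model_overlap(example)
```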

Worst Single Verse

The verse with the highest editDistance score across all models, shown with the full canonical text and the model response for direct comparison.

Worst by model in action

Per-model rankings, cross-model heatmaps, and detailed verse drill-downs — every view you need to understand where each model falls short.

See a sample worst-by-model report

Download a sample PDF to see how worst-verse rankings and cross-model comparisons are presented for offline analysis.

Discover where each model struggles.

Join the waitlist for early access and run per-model worst-verse analysis across your entire scripture fidelity dataset.