Report
Find every model's weakest verses.
The Worst by Model Report identifies the verses where each registered LLM performed worst. Cross-model and per-model views reveal whether poor performance clusters around specific books, verse styles, or linguistic patterns — giving you actionable data for targeted model improvement.
What the Worst by Model Report surfaces
Per-model and cross-model analysis of the weakest verse performances — revealing patterns, clusters, and universal failure points across all evaluated LLMs.
Weakest Percentile Threshold
The editDistance score at the boundary of each model's weakest 10% of verses, the cutoff beyond which verse performance is considered critically poor.
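As a rough illustration of how such a cutoff could be computed, the sketch below uses a nearest-rank percentile over one model's per-verse scores. It assumes a higher editDistance means a worse reproduction, so the weakest tenth starts at the 90th percentile of the scores. The function name `weakest_threshold` and the sample scores are hypothetical and not taken from the report itself.

```python
import math


def weakest_threshold(edit_distances, worst_percent=10):
    """Return the edit distance at which a model's worst `worst_percent` of verses begins.

    Assumption: higher edit distance = worse reproduction, so the weakest
    verses sit at the top of the sorted scores.
    """
    if not edit_distances:
        raise ValueError("edit_distances must not be empty")
    ordered = sorted(edit_distances)
    # Nearest-rank percentile: the smallest score with at least
    # (100 - worst_percent)% of all scores at or below it.
    rank = math.ceil((100 - worst_percent) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical per-verse edit distances for a single model.
sample_scores = [0, 0, 1, 0, 2, 0, 0, 5, 0, 12]
threshold = weakest_threshold(sample_scores)
critical = [s for s in sample_scores if s >= threshold]
```

Nearest-rank is only one of several percentile conventions; an interpolating method would give a slightly different cutoff on small samples.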
Poor Performance Clusters
The 10 worst chapters for each model, ranked by Fidelity score or Perfect Match rate. These show where the most corrupted verses are concentrated.
Cross-Model Overlap
Percentage of worst-performing verses shared by two or more models, surfacing verses that represent universal challenges for LLM scripture reproduction.
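One plausible way to compute this overlap is sketched below. It assumes the denominator is the set of all verses that appear in at least one model's worst list; the report may define it differently. The function name `cross_model_overlap`, the model names, and the verse references are illustrative only.

```python
from collections import Counter


def cross_model_overlap(worst_by_model):
    """Percentage of worst verses that appear in two or more models' worst sets.

    Assumption: the denominator is every verse flagged as worst by at
    least one model.
    """
    # Count how many models flagged each verse (set() guards against duplicates).
    counts = Counter(ref for refs in worst_by_model.values() for ref in set(refs))
    if not counts:
        return 0.0
    shared = sum(1 for n in counts.values() if n >= 2)
    return 100.0 * shared / len(counts)


# Hypothetical worst-verse sets for three models.
worst = {
    "model-a": {"Gen 1:1", "Ps 119:1", "John 3:16"},
    "model-b": {"Ps 119:1", "Rev 22:21"},
    "model-c": {"Ps 119:1", "John 3:16"},
}
overlap_pct = cross_model_overlap(worst)
```

In this made-up sample, two of the four distinct verses are flagged by more than one model.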
Worst Single Verse
The highest editDistance score observed across all models and verses, with the full canonical text and model response for direct comparison.
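The sketch below shows how the single worst verse could be located. It assumes editDistance is a character-level Levenshtein distance between the canonical text and the model response, which the report description does not confirm. The record layout and the placeholder strings are hypothetical.

```python
def levenshtein(a, b):
    """Character-level Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def worst_single_verse(records):
    """Return (model, reference, canonical, response, distance) with the highest distance."""
    scored = [(m, ref, canon, resp, levenshtein(canon, resp))
              for m, ref, canon, resp in records]
    return max(scored, key=lambda row: row[4])


# Hypothetical records: (model, verse reference, canonical text, model response).
records = [
    ("model-a", "verse-1", "the quick brown fox", "the quick brown fox"),
    ("model-a", "verse-2", "kitten", "sitting"),
    ("model-b", "verse-1", "the quick brown fox", "the quick brown fix"),
]
worst = worst_single_verse(records)
```

Returning the canonical text and the response alongside the score keeps the side-by-side comparison available without a second lookup.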
Worst by model in action
Per-model rankings, cross-model heatmaps, and detailed verse drill-downs — every view you need to understand where each model falls short.
See a sample worst-by-model report
Download a sample PDF to see how worst-verse rankings and cross-model comparisons are presented for offline analysis.
Discover where each model struggles.
Join the waitlist for early access and run per-model worst-verse analysis across your entire scripture fidelity dataset.