Safety · 23.01.2026
MoReBench: Evaluating the Process of AI Moral Reasoning
MoReBench is a benchmark for evaluating the procedural moral reasoning of large language models. Using expert-authored rubrics across diverse ethical scenarios, it scores models on the structure and coherence of their reasoning rather than on task outcomes. Our findings show that moral reasoning correlates only weakly with performance on established benchmarks and therefore warrants targeted evaluation and training.
Brandon Handoko, Matthew Siegel, Mike Lee