The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems - Scale Labs