Research to Advance AI
Scale Labs advances AI through research. Our research focuses on agents; post-training; reasoning; safety, evaluation, and alignment; and the science of data.
[LEADERBOARDS]
Benchmarks for frontier, agentic, and safety capabilities.
[SHOWDOWN]
Model-preference rankings from real-world usage.
[PAPERS]
Research papers and publications covering agents; post-training; reasoning; safety, evaluation, and alignment; and the science of data.
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR
[BLOG]
Insights, analysis, and updates from Scale Labs.
HiL-Dynamics: Understanding Agents That Don’t Know What They Don’t Know
HiL-Dynamics is our new diagnostic tool for studying how coding agents handle underspecified tasks. Across four modern harnesses, the verdict is the same: agents have learned to ask well, but not when to ask.
The Path to Large Scale Dense Video Captioning
We ran dozens of experiments on dense captioning for robot manipulation video. The biggest lever turned out to be how we represented the video to the model. Most techniques from the literature added noise on smaller models.
57 Healthcare Professionals Told Us What They Need from AI
We surveyed 57 healthcare professionals about what they actually want from AI. Their answers point to three capability gaps that current evaluations miss.
Coverage Not Averages: Rethinking Retrieval Evaluation
A single benchmark score suggests stability and completeness. In reality, it may reflect performance on a narrow and biased slice of the problem.