Research to Advance AI
Scale Labs advances AI through research. Our research focuses on agents; post-training; reasoning; safety, evaluation, and alignment; and the science of data.
[LEADERBOARDS]
Benchmarks for frontier, agentic, and safety capabilities.
[SHOWDOWN]
Model-preference rankings from real-world usage.
[PAPERS]
Research papers and publications covering agents; post-training; reasoning; safety, evaluation, and alignment; and the science of data.






Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents
[BLOG]
Insights, analysis, and updates from Scale Labs
Insights Generator: Automated Failure Mode Analysis for Agents
Insights Generator (IG) analyzes thousands of agent execution traces at once and surfaces the behavioral patterns behind agent failures, with grounded evidence and prevalence estimates for each finding.
Can Coding Agents Tackle Early-Stage Drug Discovery?
Across 66 expert-curated drug-discovery tasks, three frontier coding agents each show distinct strengths but share one weakness: the long, multi-step pipelines that demand high-level planning rather than scientific knowledge.
HiL-Dynamics: Understanding Agents That Don’t Know What They Don’t Know
HiL-Dynamics is our new diagnostic tool for studying how coding agents handle underspecified tasks. Across four modern harnesses, the verdict is the same: agents have learned to ask well, but not when to ask.
The Path to Large Scale Dense Video Captioning
We ran dozens of experiments on dense captioning for robot manipulation video. The biggest lever turned out to be how we represented the video to the model. Most techniques from the literature added noise on smaller models.