Posts by George Pu
Research · 17.02.2026
Introducing Long Horizon Augmented Workflows: Controllable Underspecification for Long-Horizon Tasks
LHAW is a dataset-agnostic pipeline for generating underspecified long-horizon tasks and evaluating strategic clarification. Across MCP-Atlas, TAC, and SWE-Bench Pro, we find large differences in how frontier models detect missing information and recover performance under ambiguity.
George Pu, Mike Lee, Sam Denton
Research · 17.11.2025
Scaling Enterprise Agent Performance with Reinforcement Learning via Verifiable Feedback Loops
We demonstrate that reinforcement learning can be used to fine-tune agents within realistic enterprise environments, leveraging task-specific feedback and structured rewards to substantially improve performance over baseline models.
Jerry Chan, Vijay Kalmath, George Pu, Sam Denton