The Art of Building Verifiers for Computer Use Agents

Corby Rosset, Pratyusha Sharma, Andrew Zhao, Miguel Gonzalez-Fernandez, Ahmed Awadallah
4/5/2026
cs.CR, cs.AI, cs.MA

Abstract

Verifying the success of computer use agent (CUA) trajectories is a critical challenge: without reliable verification, neither evaluation nor training signal can be trusted. In this paper, we present lessons learned from building a best-in-class verifier for web tasks, which we call the Universal Verifier. We design the Universal Verifier around four key principles: 1) constructing rubrics with meaningful, non-overlapping criteria to reduce noise; 2) separating process and outcome rewards, which yield complementary signals and capture cases where an agent follows the right steps but gets blocked, or succeeds through an unexpected path; 3) distinguishing between controllable and uncontrollable failures, scored via a cascading-error-free strategy for finer-grained failure understanding; and 4) using a divide-and-conquer context management scheme that attends to all screenshots in a trajectory, improving reliability on longer task horizons. We validate these findings on CUAVerifierBench, a new set of CUA trajectories with both process and outcome human labels, showing that our Universal Verifier agrees with humans as often as humans agree with each other. We report a reduction in false positive rates to near zero, compared to baselines like WebVoyager (≥ 45%) and WebJudge (≥ 22%). We emphasize that these gains stem from the cumulative effect of the design choices above. We also find that an auto-research agent achieves 70% of expert quality in 5% of the time, but fails to discover all strategies required to replicate the Universal Verifier. We open-source our Universal Verifier system along with CUAVerifierBench; both are available at https://github.com/microsoft/fara.
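
The false positive rate and human agreement figures quoted in the abstract can be made concrete with a minimal sketch. This is not the authors' evaluation code: the function names, the boolean label encoding (True = task judged successful), and the toy labels are all assumptions made for illustration, and the abstract does not state which agreement statistic the paper uses, so plain raw agreement is assumed here.

```python
from typing import Sequence


def false_positive_rate(verifier: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of human-labeled failures that the verifier marked as success."""
    if len(verifier) != len(human):
        raise ValueError("label sequences must have the same length")
    # Keep only the verifier verdicts on trajectories that humans labeled as failures.
    verdicts_on_failures = [v for v, h in zip(verifier, human) if not h]
    if not verdicts_on_failures:
        raise ValueError("no human-labeled failures; the rate is undefined")
    return sum(verdicts_on_failures) / len(verdicts_on_failures)


def agreement_rate(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Raw agreement: fraction of trajectories on which two labelers match."""
    if len(a) != len(b):
        raise ValueError("label sequences must have the same length")
    if not a:
        raise ValueError("cannot compute agreement on empty label sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy labels for five trajectories (not data from the paper).
human_labels = [True, False, False, True, False]
verifier_labels = [True, False, True, True, False]

fpr = false_positive_rate(verifier_labels, human_labels)
agreement = agreement_rate(verifier_labels, human_labels)
```

In this toy data the verifier accepts one of the three trajectories that humans marked as failed, so a verifier with a near-zero false positive rate would accept almost none of them.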

Cite this paper

@article{rosset2026building,
  title  = {The Art of Building Verifiers for Computer Use Agents},
  author = {Corby Rosset and Pratyusha Sharma and Andrew Zhao and Miguel Gonzalez-Fernandez and Ahmed Awadallah},
  year   = {2026},
  eprint = {2604.06240},
  archivePrefix = {arXiv},
  url    = {http://arxiv.org/abs/2604.06240v1}
}
