Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History

Yonsei University
Figure: Persona2Web overview

Large language models have advanced web agents, yet current agents lack personalization capabilities. In real-world scenarios, users rarely specify every detail of their intent, assuming that systems understand their implicit context. To be genuinely practical, web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. Persona2Web is the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle.

Why Persona2Web?

Personalization on the Real Open Web

Persona2Web is designed specifically to evaluate the personalization capability of web agents on the real open web.

Implicit User History

Persona2Web provides rigorously constructed user histories that reveal user preferences in implicit and indirect ways over long time spans, rather than providing them explicitly at once.

Clarify-to-Personalize Queries

Persona2Web deliberately embeds ambiguity in its queries, requiring agents to infer implicit contexts from user history without explicit cues.

Reasoning-aware Evaluation

Beyond simple task completion, Persona2Web examines full trajectories and reasoning traces through structured rubrics to distinguish personalization failures from navigation failures.
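
To illustrate why separating the two failure types matters, here is a minimal sketch of a rubric-style diagnosis. It is an assumption for illustration only: the names `RubricScores` and `diagnose` and the two boolean criteria are hypothetical, and the benchmark's actual rubrics are not specified in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScores:
    """Hypothetical per-trajectory rubric outcome (not the benchmark's schema)."""
    personalization_ok: bool  # agent inferred the right website/preferences from history
    navigation_ok: bool       # agent carried out the web actions correctly


def diagnose(scores: RubricScores) -> list[str]:
    """Return the failure types present in a trajectory (empty list means success)."""
    failures = []
    if not scores.personalization_ok:
        failures.append("personalization")
    if not scores.navigation_ok:
        failures.append("navigation")
    return failures


# An agent that reached a checkout page but chose the wrong store has failed
# at personalization, not navigation.
print(diagnose(RubricScores(personalization_ok=False, navigation_ok=True)))
```

A task-completion score alone would mark this trajectory as a plain failure; scoring the two aspects separately shows where the agent went wrong.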

Constructing Persona2Web

Figure: Persona2Web construction pipeline

Persona2Web is constructed through a multi-stage generation pipeline that starts from 50 distinct user profiles with demographic information and domain-specific preferences. Based on these profiles, event seeds are generated to define recurring activity patterns; these are then decomposed into web actions and spread across different points in time over a year. Each entry in the resulting user history records a timestamp, type, object, and website, so that preferences are revealed implicitly through behavioral patterns rather than explicit statements. Finally, query sets are created by first generating clear queries that contain all explicit cues, then masking the website and preference constraints to produce ambiguous queries under the clarify-to-personalize principle.