Authors: Hiroki Furuta, Yutaka Matsuo, Aleksandra Faust, Izzeddin Gur
Published on: November 30, 2023
Impact Score: 8.15
arXiv code: arXiv:2311.18751
Summary
- What is new: Introduction of a new benchmark, CompWoB, for assessing language model agents (LMAs) on compositional web automation tasks.
- Why this is important: Existing LMAs handle individual base tasks well, but their performance drops on real-world applications that combine several tasks.
- What the research proposes: A trained model, HTML-T5++, that improves performance on both base and compositional tasks and surpasses human-level performance on the base benchmark, MiniWoB.
- Results: HTML-T5++ achieves a 61.5% success rate on CompWoB, with a significantly smaller generalization gap than other models.
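
To make the "generalization gap" in the results concrete, here is a minimal Python sketch. It assumes the gap is simply the success rate on base tasks minus the success rate on compositional tasks; the function names and the episode outcomes are hypothetical, made up for illustration, and are not taken from the paper.

```python
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of episodes that ended in success, in [0, 1]."""
    if not outcomes:
        raise ValueError("need at least one episode")
    return sum(outcomes) / len(outcomes)


def generalization_gap(base: Sequence[bool], compositional: Sequence[bool]) -> float:
    """Assumed definition: base success rate minus compositional success rate."""
    return success_rate(base) - success_rate(compositional)


# Hypothetical episode outcomes (illustrative only, not results from the paper).
base_outcomes = [True, True, True, False]   # 3 of 4 episodes succeed
comp_outcomes = [True, False, False, True]  # 2 of 4 episodes succeed

gap = generalization_gap(base_outcomes, comp_outcomes)
print(
    f"base={success_rate(base_outcomes):.2f} "
    f"compositional={success_rate(comp_outcomes):.2f} gap={gap:.2f}"
)
```

Under this definition, a smaller gap means an agent keeps more of its base-task performance when the tasks are combined.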
Technical Details
Technological frameworks used: CompWoB for compositional task assessment, HTML-T5++ model architecture.
Models used: prompted LMAs (gpt-3.5-turbo, gpt-4), transferred LMAs (fine-tuned only on base tasks)
Data used: 50 new compositional web automation tasks
Potential Impact
Web automation, AI development firms, businesses reliant on complex web-based task automation