Agents
Browser Agents
Definition
Browser agents are AI systems that can autonomously navigate websites, fill forms, click buttons, and extract information from web pages to complete tasks on behalf of users.
Why It Matters
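Many tasks require interacting with websites: booking appointments, filling forms, researching products, monitoring prices. Browser agents automate these interactions by choosing each step from the current state of the page rather than from a fixed script. Traditional automation tools (Selenium, Puppeteer) follow hard-coded selectors and step sequences, so they tend to break when a site's layout changes. A browser agent works from the user's intent and can handle unexpected situations such as pop-ups or rearranged pages.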
How They Work
Browser agents combine:
- Vision: Screenshot understanding to “see” the page
- DOM Access: Reading page structure and content
- Action Space: Click, type, scroll, navigate
- Reasoning: LLM deciding what action to take
- Memory: Tracking progress through multi-step tasks
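The components above can be sketched as a simple observe, reason, act loop. This is a minimal illustration, not any framework's actual API: `FakeBrowser`, `scripted_policy`, and `run_agent` are hypothetical names, the in-memory browser stands in for a real one, and the rule-based policy stands in for the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    kind: str          # one of: "click", "type", "navigate", "done"
    target: str = ""   # element name or URL (hypothetical naming)
    text: str = ""     # text to enter for "type" actions


@dataclass
class Observation:
    url: str
    dom_text: str      # stands in for DOM content and/or a screenshot description


class FakeBrowser:
    """Hypothetical in-memory browser: a login form with two fields and a button."""

    def __init__(self) -> None:
        self.url = "https://example.test/login"
        self.fields: Dict[str, str] = {"email": "", "password": ""}
        self.submitted = False

    def observe(self) -> Observation:
        # Vision / DOM access: summarize what the page currently shows.
        state = ", ".join(
            f"{name}={'filled' if value else 'empty'}"
            for name, value in self.fields.items()
        )
        return Observation(self.url, f"{state}, submitted={self.submitted}")

    def apply(self, action: Action) -> None:
        # Action space: execute one step against the page.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True
            self.url = "https://example.test/home"
        elif action.kind == "navigate":
            self.url = action.target


def scripted_policy(goal: str, obs: Observation, memory: List[Action]) -> Action:
    """Stand-in for the LLM: pick the next action from the observation."""
    if "email=empty" in obs.dom_text:
        return Action("type", "email", "user@example.test")
    if "password=empty" in obs.dom_text:
        return Action("type", "password", "secret")
    if "submitted=False" in obs.dom_text:
        return Action("click", "submit")
    return Action("done")


Policy = Callable[[str, Observation, List[Action]], Action]


def run_agent(goal: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 10) -> List[Action]:
    memory: List[Action] = []              # Memory: actions taken so far
    for _ in range(max_steps):             # cap steps so the loop always ends
        obs = browser.observe()            # observe the page
        action = policy(goal, obs, memory) # Reasoning: decide the next action
        memory.append(action)
        if action.kind == "done":
            break
        browser.apply(action)              # act on the page
    return memory
```

In a real agent, the policy would send the goal, the observation, and the action history to a model and parse its reply into an `Action`; the step cap guards against a policy that never reports completion.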
Key Tools & Frameworks
- Browser Use: Open-source library for LLM-driven browser automation
- Playwright + LLM: Traditional automation with AI reasoning
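- Anthropic Computer Use: Claude's tool for controlling a computer, including a browser, through screenshots and mouse/keyboard actions
- OpenAI CUA: Computer-Using Agent, OpenAI's model for operating graphical interfaces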
- Claude Chrome Extension: Consumer browser integration
Use Cases
Research automation, form filling, price monitoring, data extraction, testing/QA, and personal assistants that interact with web services.