Agent TARS Complete Setup Guide: Open-Source Multimodal AI Agent for Browser Automation
Published on January 12, 2026
Introduction
Agent TARS is an open-source multimodal AI agent stack developed by ByteDance that brings the power of GUI agents and vision capabilities to your terminal, browser, and applications. Unlike traditional automation tools, Agent TARS uses cutting-edge Vision-Language Models (VLMs) to understand and interact with graphical interfaces the way humans doโby seeing and interpreting visual content.
This comprehensive guide covers everything from basic installation to advanced configurations, including MCP server integration, model provider setup, and browser automation strategies.
What is Agent TARS?
Agent TARS evolved from the UI-TARS-desktop project, shifting from an Electron-based desktop app to a lightweight CLI with Web UI for better portability and efficiency. This allows it to run โanytime, anywhereโ without heavy dependencies like bundled Chromium, reducing installation size and improving iteration speed.
Agent TARS is designed to complete tasks in a human-like manner through:
- Visual Understanding: Uses Vision-Language Models to see and interpret screenshots
- Browser Control: Automates web browsing using visual grounding, DOM manipulation, or hybrid strategies
- Code Execution: Runs shell commands, Jupyter notebooks, and file editing in sandboxed environments
- MCP Integration: Connects to Model Context Protocol (MCP) servers for extensible tool access
- Multi-Interface: Provides both CLI and Web UI for different use cases
- Context Engineering: Manages long-running tasks with dynamic sliding windows and hierarchical memory
- Observability: Uses a Snapshot framework for deterministic replays and automated benchmarking
- Event Streaming: Real-time Agent Event Stream for monitoring agent status, tool calls, and responses
Context Engineering
Agent TARS implements sophisticated context management to prevent context overflow in models with limited token windows (e.g., 128k tokens):
| Memory Level | Name | Purpose |
|---|---|---|
| L0 | Permanent | Core system instructions, always retained |
| L1 | Run | Session-level context, persists across loops |
| L2 | Loop | Current task iteration, may be summarized |
| L3 | Ephemeral | Temporary data, discarded after use |
This hierarchical memory system enables efficient handling of complex, multi-step tasks without exceeding token limits.
Agent TARS vs UI-TARS Desktop
These are two distinct products in the same repository:
| Feature | Agent TARS | UI-TARS Desktop |
|---|---|---|
| Interface | CLI + Web UI | Native Desktop App (Electron) |
| Primary Use | Browser automation, code execution | Local computer GUI control |
| Model Backend | Cloud APIs (OpenAI, Claude, etc.) | Local/Remote VLM models (UI-TARS series) |
| Architecture | Lightweight, no bundled browser | Electron with bundled Chromium |
| Best For | Web tasks, terminal-based automation | Desktop automation, direct computer control |
| Installation | npm/npx | Download installer |
Agent TARS is a general-purpose agent stack focused on multimodal workflows and CLI-driven tasks, while UI-TARS Desktop is specialized for native GUI automation using the UI-TARS model series.
๐ Note: If you need to control your local computer desktop (not just browser), see our separate UI-TARS Desktop Complete Guide.
Part 1: System Requirements
Before installing Agent TARS, ensure your system meets these requirements.
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB+ |
| CPU | 2 cores | 4+ cores |
| Storage | 2 GB free | 5 GB+ free |
| Network | Stable internet | Low-latency connection |
| Browser | Google Chrome | Google Chrome (latest) |
Software Requirements
| Software | Required Version | Notes |
|---|---|---|
| Node.js | 22.x or higher | Critical: Version 22+ is mandatory |
| npm | Comes with Node.js | Used for package installation |
| Google Chrome | Latest stable | Required for browser automation |
| Git | Latest | Optional, for development |
API Key Requirements
You need an API key from at least one of these providers:
| Provider | Recommended Models | Pricing |
|---|---|---|
| Anthropic | claude-3-7-sonnet-latest, claude-3-5-sonnet-20241022 | Pay-per-use |
| OpenAI | gpt-4o, gpt-4-turbo-vision | Pay-per-use |
| VolcEngine | doubao-1-5-thinking-vision-pro-250428 | Pay-per-use |
| gemini-2.5-pro-preview-03-25 | Free tier available | |
| Azure OpenAI | gpt-4o (deployed) | Enterprise pricing |
| Mistral | Various vision models | Pay-per-use |
Part 2: Node.js Installation
Agent TARS requires Node.js version 22 or higher. Hereโs how to install it on each platform.
Node.js on Linux (Ubuntu/Debian)
Option A: Using nvm (Recommended)
nvm (Node Version Manager) allows easy installation and switching between Node.js versions.
# Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
# Reload shell configuration
source ~/.bashrc
# Or for zsh users:
# source ~/.zshrc
# Verify nvm installation
nvm --version What this does: Downloads and runs the nvm installation script, which adds nvm to your shell configuration.
# Install Node.js 22 (LTS)
nvm install 22
# Set Node.js 22 as default
nvm alias default 22
# Verify installation
node --version
# Should output: v22.x.x
npm --version
# Should output: 10.x.x or higher What this does: Installs Node.js version 22 and sets it as your default Node.js version.
Option B: Using NodeSource Repository
# Add NodeSource repository for Node.js 22
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
# Install Node.js
sudo apt-get install -y nodejs
# Verify installation
node --version
npm --version What this does: Adds the official NodeSource repository and installs the latest Node.js 22.x version.
Node.js on macOS
Option A: Using nvm (Recommended)
# Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
# Reload terminal or run:
source ~/.zshrc
# Install and use Node.js 22
nvm install 22
nvm use 22
nvm alias default 22
# Verify
node --version Option B: Using Homebrew
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install Node.js 22
brew install node@22
# Add to PATH (if not automatically done)
echo 'export PATH="/opt/homebrew/opt/node@22/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
# Verify
node --version What this does: Uses Homebrew package manager to install Node.js 22 and adds it to your system PATH.
Node.js on Windows
Option A: Using nvm-windows (Recommended)
- Download
nvm-windowsfrom GitHub releases - Run the installer (
nvm-setup.exe) - Accept defaults and complete installation
Open PowerShell (or Command Prompt) as Administrator:
# List available Node.js versions
nvm list available
# Install Node.js 22 (latest LTS)
nvm install 22
# Use Node.js 22
nvm use 22
# Verify installation
node --version
npm --version What this does: Installs nvm-windows which allows managing multiple Node.js versions on Windows.
Option B: Direct Installation
- Visit nodejs.org
- Download the Node.js 22.x LTS installer for Windows
- Run the installer:
- Accept license agreement
- Keep default installation directory
- Ensure โnpm package managerโ is checked
- Enable โAdd to PATHโ option
- Restart your terminal/PowerShell
# Verify installation
node --version
npm --version Part 3: Installing Google Chrome
Agent TARS uses Google Chrome for browser automation. Ensure you have it installed:
Chrome on Linux (Ubuntu/Debian)
# Download and install Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb
# Verify installation
google-chrome --version Chrome on macOS
Download from google.com/chrome or use Homebrew:
brew install --cask google-chrome Chrome on Windows
Download from google.com/chrome and run the installer.
Part 4: Installing Agent TARS CLI
There are three main ways to run Agent TARS CLI:
Method 1: Using npx (Quickest)
This method runs Agent TARS without installing it globally:
# Run Agent TARS directly with npx
npx @agent-tars/cli@latest What this does: Downloads and runs the latest version of Agent TARS CLI without installing it permanently. The package is cached locally for faster subsequent runs.
๐ก Tip: Use this method for quick testing or one-time use. For regular use, global installation is recommended.
Method 2: Global npm Installation (Recommended)
Install Agent TARS globally for easy access:
Linux/macOS:
# Install globally (latest stable)
npm install -g @agent-tars/cli@latest
# Or install beta version for newest features
npm install -g @agent-tars/cli@beta
# Verify installation
agent-tars --version
# Run Agent TARS
agent-tars Windows (PowerShell as Administrator):
# Install globally (latest stable)
npm install -g @agent-tars/cli@latest
# Or install beta version for newest features
npm install -g @agent-tars/cli@beta
# Verify installation
agent-tars --version
# Run Agent TARS
agent-tars What this does: Installs Agent TARS CLI globally, making the agent-tars command available from any directory.
Updating Agent TARS
To update to the latest version:
# Update to latest stable
npm update -g @agent-tars/cli
# Or reinstall latest
npm install -g @agent-tars/cli@latest Method 3: Development Installation (From Source)
For contributors or those who want the latest development version:
# Clone the repository
git clone https://github.com/bytedance/UI-TARS-desktop.git
cd UI-TARS-desktop
# Install pnpm package manager
npm install -g pnpm
# Install dependencies
pnpm install
# Navigate to CLI package
cd multimodal/agent-tars/cli
# Run in development mode
pnpm dev What this does: Clones the entire UI-TARS-desktop monorepo and runs Agent TARS CLI from source.
Method 4: Docker Installation (Custom Setup)
Official Docker support isnโt provided, but you can containerize Agent TARS for portable deployments.
Create a Dockerfile
FROM node:22-alpine
# Set working directory
WORKDIR /app
# Install Agent TARS CLI globally
RUN npm install -g @agent-tars/cli@latest
# Set default environment variables (override at runtime)
ENV AGENT_PROVIDER=volcengine
ENV AGENT_MODEL=doubao-1-5-thinking-vision-pro-250428
# Expose Web UI port
EXPOSE 8888
# Default command to run Agent TARS
CMD ["agent-tars", "--provider", "${AGENT_PROVIDER}", "--model", "${AGENT_MODEL}", "--apiKey", "${AGENT_API_KEY}"] Dockerfile explained:
FROM node:22-alpine: Lightweight Node.js 22 base imageWORKDIR /app: Sets working directory inside containerRUN npm install -g: Installs Agent TARS CLIEXPOSE 8888: Exposes the default Web UI portCMD: Default command to run Agent TARS
Build and Run
# Build the Docker image
docker build -t agent-tars .
# Run with API key as environment variable
docker run -p 8888:8888
-e AGENT_PROVIDER=anthropic
-e AGENT_MODEL=claude-3-7-sonnet-latest
-e AGENT_API_KEY=your-api-key
agent-tars
# Access Web UI at http://localhost:8888 Platform notes:
- Linux: Works directly
- macOS/Windows: Ensure Docker Desktop is installed and allocate sufficient RAM (4GB+ recommended)
Part 5: Running Agent TARS
Basic Usage
Run Agent TARS with your preferred model provider:
Using Anthropic Claude:
agent-tars --provider anthropic
--model claude-3-7-sonnet-latest
--apiKey YOUR_ANTHROPIC_API_KEY Using OpenAI:
agent-tars --provider openai
--model gpt-4o
--apiKey YOUR_OPENAI_API_KEY Using VolcEngine (ByteDance):
agent-tars --provider volcengine
--model doubao-1-5-thinking-vision-pro-250428
--apiKey YOUR_VOLCENGINE_API_KEY Using Google Gemini:
agent-tars --provider google
--model gemini-2.5-pro-preview-03-25
--apiKey YOUR_GOOGLE_API_KEY What Happens When You Run Agent TARS
- Web UI Launches: A browser window opens with the Agent TARS interface
- Agent Ready: The agent waits for your natural language instructions
- Execution: When you give a task, the agent:
- Takes screenshots of the browser
- Analyzes the visual content using the VLM
- Plans and executes actions (click, type, navigate)
- Provides real-time feedback
Example Tasks
# Book a flight
"Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline"
# Research task
"Search for the top 5 programming languages in 2025 and create a summary"
# Web automation
"Go to GitHub, find the ByteDance/UI-TARS-desktop repository, and tell me how many stars it has" Part 6: Configuration Options
Command Line Arguments
| Argument | Description | Default |
|---|---|---|
--provider | Model provider (anthropic, openai, volcengine, google, azure, mistral) | Required |
--model | Model name | Required |
--apiKey | API key for the provider | Required |
--baseURL | Custom API endpoint | Provider default |
--headless | Run browser in headless mode | false |
--port | Port for Web UI server | 8888 |
--config | Path to configuration file | None |
Using Environment Variables
Instead of passing arguments every time, set environment variables:
Linux/macOS:
# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="your-api-key"
export OPENAI_API_KEY="your-api-key"
export VOLCENGINE_API_KEY="your-api-key"
# Reload configuration
source ~/.bashrc Windows:
# Set permanently
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "your-api-key", "User")
[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "your-api-key", "User")
# Or temporarily for current session
$env:ANTHROPIC_API_KEY = "your-api-key" Then run without specifying the API key:
agent-tars --provider anthropic --model claude-3-7-sonnet-latest Workspace Initialization
For persistent configurations, initialize a workspace:
# Initialize a new workspace with config files
agent-tars workspace --init What this does: Creates a directory with configuration files (agent-tars.config.ts) and prompts you for initial setup options like provider and API key.
Configuration File
Create a configuration file for complex setups:
agent-tars.config.ts (in your project directory):
import { defineConfig } from '@agent-tars/cli';
export default defineConfig({
provider: 'anthropic',
model: {
id: 'claude-3-7-sonnet-latest',
// apiKey is read from environment variable
},
browser: {
headless: false,
viewport: {
width: 1920,
height: 1080
}
},
server: {
port: 8888, // Default Web UI port
storage: {
type: 'sqlite',
uri: './agent-sessions.db'
}
},
mcp: [
// MCP server configurations
]
}); Run with configuration file:
agent-tars --config ./agent-tars.config.ts Or if youโve initialized a workspace, simply run:
agent-tars
# Uses workspace config automatically Part 7: Web UI Overview
When Agent TARS launches, it opens a Web UI with these main components:
Interface Elements
| Component | Description |
|---|---|
| Chat Input | Enter natural language instructions |
| Browser View | Live preview of what the agent sees |
| Event Stream | Real-time log of agent actions |
| Tool Calls | Display of MCP tools being used |
| Settings | Configure model, providers, and options |
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
Enter | Send message |
Shift + Enter | New line in message |
Ctrl/Cmd + K | Clear conversation |
Escape | Stop current action |
Part 8: Browser Control Strategies
Agent TARS uses three strategies for browser control:
1. Visual Grounding (GUI Agent)
Uses Vision-Language Models to:
- Identify clickable elements by their visual appearance
- Understand the layout and context of the page
- Make decisions based on what it โseesโ
Best for: Complex UIs, dynamically generated content, visual recognition tasks
2. DOM Manipulation
Uses traditional web automation to:
- Query elements by CSS selectors
- Extract text content and attributes
- Interact with JavaScript-heavy applications
Best for: Speed, reliability, well-structured websites
3. Hybrid Mode (Default)
Combines both approaches:
- Uses visual grounding for understanding and planning
- Falls back to DOM manipulation for reliable execution
- Adapts strategy based on page complexity
Configuration:
# Visual grounding only
agent-tars --browser-strategy visual
# DOM only
agent-tars --browser-strategy dom
# Hybrid (default)
agent-tars --browser-strategy hybrid Part 9: MCP Server Integration
Model Context Protocol (MCP) allows Agent TARS to connect to external tools and services.
What is MCP?
MCP is an open protocol for connecting AI systems to external resources:
- Tools: Functions the AI can call (search, file operations, etc.)
- Resources: Data sources the AI can access
- Prompts: Pre-defined templates for common tasks
Built-in MCP Servers
Agent TARS includes several MCP servers:
| Server | Purpose | Tools |
|---|---|---|
mcp-server-browser | Browser automation | navigate, click, screenshot, extract_content |
mcp-server-filesystem | File operations | read_file, write_file, list_directory |
mcp-server-commands | Shell commands | execute_command |
mcp-server-search | Web search | search |
Configuring MCP Servers
In your configuration file:
export default defineConfig({
// ... other config
mcp: [
{
name: 'filesystem',
transport: 'stdio',
command: 'npx',
args: ['@agent-infra/mcp-server-filesystem', '--root', '/allowed/directory']
},
{
name: 'search',
transport: 'sse',
url: 'http://localhost:8080/mcp'
}
]
}); Adding Custom MCP Servers
You can add any MCP-compatible server:
mcp: [
{
name: 'custom-tool',
transport: 'stdio',
command: 'node',
args: ['./my-custom-mcp-server.js'],
// Optional: filter which tools to expose
include: ['tool1', 'tool2'],
exclude: ['sensitive-tool']
}
] Part 10: Code Execution (Sandbox)
Agent TARS can execute code in isolated sandboxed environments using the @agent-infra/sandbox package.
Sandbox Capabilities
| Feature | Description |
|---|---|
| Bash Commands | Execute shell commands securely |
| Jupyter Notebooks | Run Python code interactively |
| File Editing | Create and modify files |
| Environment Isolation | Each session is isolated |
Enabling Sandbox
export default defineConfig({
// ... other config
sandbox: {
enabled: true,
url: 'http://localhost:8081', // AIO Sandbox URL
// Or use Docker-based sandbox
docker: {
image: 'agent-infra/sandbox:latest',
autoStart: true
}
}
}); Starting AIO Sandbox
The AIO (All-in-One) Sandbox provides an isolated execution environment:
# Using Docker
docker run -d -p 8081:8081 ghcr.io/agent-infra/sandbox:latest
# Or using npm
npx @agent-infra/sandbox Part 11: Running as a Server
Agent TARS can run as a persistent server for multi-session management.
Server Mode
agent-tars server --port 3000 API Endpoints
| Endpoint | Method | Description |
|---|---|---|
/api/v1/sessions | POST | Create new session |
/api/v1/sessions/:id | GET | Get session status |
/api/v1/sessions/:id/messages | POST | Send message to session |
/api/v1/sessions/:id/events | GET (SSE) | Stream session events |
/api/v1/sessions/:id | DELETE | End session |
Example API Usage
// Create a new session
const response = await fetch('http://localhost:3000/api/v1/sessions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: {
provider: 'anthropic',
id: 'claude-3-7-sonnet-latest'
}
})
});
const { sessionId } = await response.json();
// Send a message
await fetch(`http://localhost:3000/api/v1/sessions/${sessionId}/messages`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
content: 'Search for the weather in New York'
})
}); Part 12: Headless Mode
Run Agent TARS without a visible browser for automation scripts:
agent-tars --provider anthropic
--model claude-3-7-sonnet-latest
--headless Use Cases for Headless Mode
- CI/CD Pipelines: Automated testing
- Scheduled Tasks: Cron jobs for web scraping
- Server Environments: Running on headless servers
- Batch Processing: Processing multiple tasks sequentially
Part 13: Storage and Persistence
Agent TARS supports multiple storage backends for session persistence.
SQLite (Default)
export default defineConfig({
server: {
storage: {
type: 'sqlite',
uri: './sessions.db'
}
}
}); MongoDB
export default defineConfig({
server: {
storage: {
type: 'mongodb',
uri: 'mongodb://localhost:27017/agent-tars'
}
}
}); File-based (JSON)
export default defineConfig({
server: {
storage: {
type: 'file',
path: './sessions-data'
}
}
}); Part 14: Troubleshooting
Common Issues and Solutions
Issue: โNode.js version too oldโ
Error: Agent TARS requires Node.js >= 22 Solution: Update Node.js to version 22 or higher (see Part 2).
Issue: โChrome not foundโ
Error: Could not find Chrome installation Solution: Install Google Chrome or specify the path:
agent-tars --browser-executable-path /path/to/chrome Issue: โAPI key invalidโ
Error: Invalid API key for provider Solution:
- Verify your API key is correct
- Check that the key has necessary permissions
- Ensure the key is not expired
Issue: โWebSocket connection failedโ
Error: Failed to connect to browser websocket Solution:
- Ensure Chrome isnโt running with conflicting flags
- Close any existing Chrome processes
- Try running with
--no-sandboxflag (Linux):
agent-tars --browser-args="--no-sandbox" Issue: Permission denied (Linux)
Error: EACCES: permission denied Solution: Fix npm global installation permissions:
# Create npm global directory in home
mkdir ~/.npm-global
npm config set prefix '~/.npm-global'
# Add to PATH
echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.bashrc
source ~/.bashrc Part 15: Best Practices
1. Security Considerations
- Never share API keys in public repositories
- Use environment variables for sensitive data
- Limit sandbox permissions to required directories
- Monitor usage to avoid unexpected API costs
2. Performance Optimization
- Use headless mode for automated tasks
- Enable DOM-only mode for simple, structured websites
- Configure appropriate timeouts for slow networks
- Use session persistence for long-running tasks
3. Cost Management
- Choose smaller models for simple tasks
- Use Claude Haiku or GPT-4o-mini for cost-sensitive operations
- Implement rate limiting in production
- Monitor token usage regularly
Part 16: Advanced Use Cases
Automated Testing
# Run test suite with Agent TARS
agent-tars --headless
--task "Navigate to example.com, click login, verify dashboard loads"
--output-format json
--output-file test-results.json Web Scraping
# Extract data from websites
agent-tars --task "Go to news.ycombinator.com, extract top 10 headlines with links"
--output-format csv Form Automation
# Fill complex forms
agent-tars --task "Fill out the job application form with: Name=John Doe, Email=john@example.com, Resume=upload resume.pdf" Part 17: CLI Reference
Full Command Reference
agent-tars [options]
Options:
-V, --version Output version number
-p, --provider <provider> Model provider
-m, --model <model> Model name
-k, --apiKey <key> API key
-b, --baseURL <url> Custom API base URL
--headless Run browser in headless mode
--port <port> Web UI port
--config <path> Configuration file path
--browser-strategy <strategy> Browser control strategy
--browser-executable-path <path> Chrome executable path
--browser-args <args> Additional browser arguments
--sandbox-url <url> Sandbox service URL
--storage-type <type> Storage backend type
--storage-uri <uri> Storage connection URI
-h, --help Display help Conclusion
Agent TARS represents a significant advancement in AI-powered browser automation. By combining Vision-Language Models with traditional web automation techniques, it offers a powerful and flexible solution for complex web tasks.
Key Takeaways
- Easy Installation: Single npm command to get started
- Flexible Configuration: Multiple model providers and configuration options
- Extensible: MCP integration allows connecting to any external tool
- Production Ready: Server mode with session persistence
Next Steps
- Join the Discord community for support
- Check out example use cases
- Read the official documentation
Related Guides
- UI-TARS Desktop Complete Guide - For local computer desktop automation
- MCP Server Development Guide - Build custom tools
Appendix A: Comparison of Model Providers
| Provider | Best Model for Agent TARS | Strengths | Limitations |
|---|---|---|---|
| Anthropic | claude-3-7-sonnet-latest | Best vision understanding, reliable | Usage limits on free tier |
| OpenAI | gpt-4o | Fast, good general performance | Higher API costs |
| gemini-2.5-pro-preview | Long context, free tier | Beta features may change | |
| VolcEngine | doubao-1-5-thinking-vision-pro | Optimized for Agent TARS | Requires Chinese API access |
| Azure OpenAI | gpt-4o (deployed) | Enterprise compliance | Complex setup |
Appendix B: Changelog Highlights
- v0.3.0 (Nov 2025): Streaming support, Event Stream Viewer, AIO Sandbox integration
- Beta Release (Jun 2025): Initial public release with CLI and Web UI
- Ongoing: Active development with regular updates
Last updated: January 2026
Comments
Sign in to join the discussion!
Your comments help others in the community.