Agent TARS Complete Setup Guide: Open-Source Multimodal AI Agent for Browser Automation

Published on January 12, 2026

Introduction

Agent TARS is an open-source multimodal AI agent stack developed by ByteDance that brings GUI-agent and vision capabilities to your terminal, browser, and applications. Unlike traditional automation tools, Agent TARS uses Vision-Language Models (VLMs) to interact with graphical interfaces the way humans do: by looking at the screen and interpreting what is on it.

This comprehensive guide covers everything from basic installation to advanced configurations, including MCP server integration, model provider setup, and browser automation strategies.

What is Agent TARS?

Agent TARS evolved from the UI-TARS-desktop project, shifting from an Electron-based desktop app to a lightweight CLI with Web UI for better portability and efficiency. This allows it to run “anytime, anywhere” without heavy dependencies like bundled Chromium, reducing installation size and improving iteration speed.

Agent TARS is designed to complete tasks in a human-like manner through:

Visual Understanding: Uses Vision-Language Models to see and interpret screenshots
Browser Control: Automates web browsing using visual grounding, DOM manipulation, or hybrid strategies
Code Execution: Runs shell commands, Jupyter notebooks, and file editing in sandboxed environments
MCP Integration: Connects to Model Context Protocol (MCP) servers for extensible tool access
Multi-Interface: Provides both CLI and Web UI for different use cases
Context Engineering: Manages long-running tasks with dynamic sliding windows and hierarchical memory
Observability: Uses a Snapshot framework for deterministic replays and automated benchmarking
Event Streaming: Real-time Agent Event Stream for monitoring agent status, tool calls, and responses

Context Engineering

Agent TARS manages context in layers so that long-running tasks do not overflow the model's context window (for example, 128k tokens). Each piece of context is assigned to one of four memory levels:

Memory Level	Name	Purpose
L0	Permanent	Core system instructions, always retained
L1	Run	Session-level context, persists across loops
L2	Loop	Current task iteration, may be summarized
L3	Ephemeral	Temporary data, discarded after use

This hierarchical memory system enables efficient handling of complex, multi-step tasks without exceeding token limits.
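The table above can be read as an eviction policy. The following is a minimal sketch of that idea, not Agent TARS's actual implementation: the type names, the `buildContext` function, and the exact policy (pin L0 and L1, drop L3, keep the newest L2 entries that fit) are assumptions made for illustration, and token counts are assumed to be computed elsewhere.

```typescript
// Hypothetical types: these names are illustrative and do not come from Agent TARS.
type MemoryLevel = "L0" | "L1" | "L2" | "L3";

interface MemoryEntry {
  level: MemoryLevel;
  text: string;
  tokens: number; // pre-computed token estimate for this entry
}

// Build the context for the next model call within a token budget.
// Assumed policy: L0 and L1 are always kept, L3 is dropped once used,
// and L2 entries are kept newest-first until the budget runs out (a sliding window).
function buildContext(entries: MemoryEntry[], budget: number): MemoryEntry[] {
  const pinned = entries.filter((e) => e.level === "L0" || e.level === "L1");
  let used = pinned.reduce((sum, e) => sum + e.tokens, 0);

  const keptLoop: MemoryEntry[] = [];
  const loopEntries = entries.filter((e) => e.level === "L2");
  // Walk from newest to oldest so the most recent iterations survive.
  for (let i = loopEntries.length - 1; i >= 0; i--) {
    const entry = loopEntries[i];
    if (used + entry.tokens > budget) break;
    keptLoop.unshift(entry); // restore chronological order
    used += entry.tokens;
  }
  return [...pinned, ...keptLoop];
}
```

A real system would probably summarize older L2 entries rather than drop them outright, as the "may be summarized" note in the table suggests.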

Agent TARS vs UI-TARS Desktop

These are two distinct products in the same repository:

Feature	Agent TARS	UI-TARS Desktop
Interface	CLI + Web UI	Native Desktop App (Electron)
Primary Use	Browser automation, code execution	Local computer GUI control
Model Backend	Cloud APIs (OpenAI, Claude, etc.)	Local/Remote VLM models (UI-TARS series)
Architecture	Lightweight, no bundled browser	Electron with bundled Chromium
Best For	Web tasks, terminal-based automation	Desktop automation, direct computer control
Installation	npm/npx	Download installer

Agent TARS is a general-purpose agent stack focused on multimodal workflows and CLI-driven tasks, while UI-TARS Desktop is specialized for native GUI automation using the UI-TARS model series.

📝 Note: If you need to control your local computer desktop (not just browser), see our separate UI-TARS Desktop Complete Guide.

Part 1: System Requirements

Before installing Agent TARS, ensure your system meets these requirements.

Hardware Requirements

Component	Minimum	Recommended
RAM	4 GB	8 GB+
CPU	2 cores	4+ cores
Storage	2 GB free	5 GB+ free
Network	Stable internet	Low-latency connection
Browser	Google Chrome	Google Chrome (latest)

Software Requirements

Software	Required Version	Notes
Node.js	22.x or higher	Critical: Version 22+ is mandatory
npm	Comes with Node.js	Used for package installation
Google Chrome	Latest stable	Required for browser automation
Git	Latest	Optional, for development

API Key Requirements

You need an API key from at least one of these providers:

Provider	Recommended Models	Pricing
Anthropic	claude-3-7-sonnet-latest, claude-3-5-sonnet-20241022	Pay-per-use
OpenAI	gpt-4o, gpt-4-turbo	Pay-per-use
VolcEngine	doubao-1-5-thinking-vision-pro-250428	Pay-per-use
Google	gemini-2.5-pro-preview-03-25	Free tier available
Azure OpenAI	gpt-4o (deployed)	Enterprise pricing
Mistral	Various vision models	Pay-per-use

Part 2: Node.js Installation

Agent TARS requires Node.js version 22 or higher. Here’s how to install it on each platform.

Node.js on Linux (Ubuntu/Debian)

Option A: Using nvm (Recommended)

nvm (Node Version Manager) allows easy installation and switching between Node.js versions.

# Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload shell configuration
source ~/.bashrc
# Or for zsh users:
# source ~/.zshrc

# Verify nvm installation
nvm --version

What this does: Downloads and runs the nvm installation script, which adds nvm to your shell configuration.

# Install Node.js 22 (LTS)
nvm install 22

# Set Node.js 22 as default
nvm alias default 22

# Verify installation
node --version
# Should output: v22.x.x

npm --version
# Should output: 10.x.x or higher

What this does: Installs Node.js version 22 and sets it as your default Node.js version.

Option B: Using NodeSource Repository

# Add NodeSource repository for Node.js 22
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -

# Install Node.js
sudo apt-get install -y nodejs

# Verify installation
node --version
npm --version

What this does: Adds the official NodeSource repository and installs the latest Node.js 22.x version.

Node.js on macOS

Option A: Using nvm (Recommended)

# Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload terminal or run:
source ~/.zshrc

# Install and use Node.js 22
nvm install 22
nvm use 22
nvm alias default 22

# Verify
node --version

Option B: Using Homebrew

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Node.js 22
brew install node@22

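# Add to PATH (node@22 is keg-only, so Homebrew does not link it automatically).
# brew --prefix resolves the right location on both Apple Silicon and Intel Macs.
echo 'export PATH="$(brew --prefix node@22)/bin:$PATH"' >> ~/.zshrc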
source ~/.zshrc

# Verify
node --version

What this does: Uses Homebrew package manager to install Node.js 22 and adds it to your system PATH.

Node.js on Windows

Option A: Using nvm-windows (Recommended)

Download nvm-windows from GitHub releases
Run the installer (nvm-setup.exe)
Accept defaults and complete installation

Open PowerShell (or Command Prompt) as Administrator:

# List available Node.js versions
nvm list available

# Install Node.js 22 (LTS)
nvm install 22

# Use Node.js 22
nvm use 22

# Verify installation
node --version
npm --version

What this does: Installs nvm-windows which allows managing multiple Node.js versions on Windows.

Option B: Direct Installation

Visit nodejs.org
Download the Node.js 22.x LTS installer for Windows
Run the installer:
- Accept license agreement
- Keep default installation directory
- Ensure “npm package manager” is checked
- Enable “Add to PATH” option
Restart your terminal/PowerShell

# Verify installation
node --version
npm --version

Part 3: Installing Google Chrome

Agent TARS uses Google Chrome for browser automation. Ensure you have it installed:

Chrome on Linux (Ubuntu/Debian)

# Download and install Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

# Verify installation
google-chrome --version

Chrome on macOS

Download from google.com/chrome or use Homebrew:

brew install --cask google-chrome

Chrome on Windows

Download from google.com/chrome and run the installer.

Part 4: Installing Agent TARS CLI

There are four ways to run Agent TARS CLI:

Method 1: Using npx (Quickest)

This method runs Agent TARS without installing it globally:

# Run Agent TARS directly with npx
npx @agent-tars/cli@latest

What this does: Downloads and runs the latest version of Agent TARS CLI without installing it permanently. The package is cached locally for faster subsequent runs.

💡 Tip: Use this method for quick testing or one-time use. For regular use, global installation is recommended.

Method 2: Global npm Installation (Recommended)

Install Agent TARS globally for easy access:

Linux/macOS:

# Install globally (latest stable)
npm install -g @agent-tars/cli@latest

# Or install beta version for newest features
npm install -g @agent-tars/cli@beta

# Verify installation
agent-tars --version

# Run Agent TARS
agent-tars

Windows (PowerShell as Administrator):

# Install globally (latest stable)
npm install -g @agent-tars/cli@latest

# Or install beta version for newest features
npm install -g @agent-tars/cli@beta

# Verify installation
agent-tars --version

# Run Agent TARS
agent-tars

What this does: Installs Agent TARS CLI globally, making the agent-tars command available from any directory.

Updating Agent TARS

To update to the latest version:

# Update to latest stable
npm update -g @agent-tars/cli

# Or reinstall latest
npm install -g @agent-tars/cli@latest

Method 3: Development Installation (From Source)

For contributors or those who want the latest development version:

# Clone the repository
git clone https://github.com/bytedance/UI-TARS-desktop.git
cd UI-TARS-desktop

# Install pnpm package manager
npm install -g pnpm

# Install dependencies
pnpm install

# Navigate to CLI package
cd multimodal/agent-tars/cli

# Run in development mode
pnpm dev

What this does: Clones the entire UI-TARS-desktop monorepo and runs Agent TARS CLI from source.

Method 4: Docker Installation (Custom Setup)

Official Docker support isn’t provided, but you can containerize Agent TARS for portable deployments.

Create a Dockerfile

FROM node:22-alpine

# Set working directory
WORKDIR /app

# Install Agent TARS CLI globally
RUN npm install -g @agent-tars/cli@latest

# Set default environment variables (override at runtime)
ENV AGENT_PROVIDER=volcengine
ENV AGENT_MODEL=doubao-1-5-thinking-vision-pro-250428

# Expose Web UI port
EXPOSE 8888

# Default command to run Agent TARS.
# Shell form is used because exec-form CMD (a JSON array) does not expand ${VARS}.
CMD agent-tars --provider "$AGENT_PROVIDER" --model "$AGENT_MODEL" --apiKey "$AGENT_API_KEY"

Dockerfile explained:

FROM node:22-alpine: Lightweight Node.js 22 base image
WORKDIR /app: Sets working directory inside container
RUN npm install -g: Installs Agent TARS CLI
EXPOSE 8888: Exposes the default Web UI port
CMD: Default command; written in shell form so the environment variables are expanded when the container starts

Build and Run

# Build the Docker image
docker build -t agent-tars .

# Run with API key as environment variable
docker run -p 8888:8888 \
    -e AGENT_PROVIDER=anthropic \
    -e AGENT_MODEL=claude-3-7-sonnet-latest \
    -e AGENT_API_KEY=your-api-key \
    agent-tars

# Access Web UI at http://localhost:8888

Platform notes:

Linux: Works directly
macOS/Windows: Ensure Docker Desktop is installed and allocate sufficient RAM (4GB+ recommended)

Part 5: Running Agent TARS

Basic Usage

Run Agent TARS with your preferred model provider:

Using Anthropic Claude:

agent-tars --provider anthropic \
           --model claude-3-7-sonnet-latest \
           --apiKey YOUR_ANTHROPIC_API_KEY

Using OpenAI:

agent-tars --provider openai \
           --model gpt-4o \
           --apiKey YOUR_OPENAI_API_KEY

Using VolcEngine (ByteDance):

agent-tars --provider volcengine \
           --model doubao-1-5-thinking-vision-pro-250428 \
           --apiKey YOUR_VOLCENGINE_API_KEY

Using Google Gemini:

agent-tars --provider google \
           --model gemini-2.5-pro-preview-03-25 \
           --apiKey YOUR_GOOGLE_API_KEY

What Happens When You Run Agent TARS

Web UI Launches: A browser window opens with the Agent TARS interface
Agent Ready: The agent waits for your natural language instructions
Execution: When you give a task, the agent:
- Takes screenshots of the browser
- Analyzes the visual content using the VLM
- Plans and executes actions (click, type, navigate)
- Provides real-time feedback
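The execution steps above form a loop: observe, plan, act, report. The sketch below shows that loop in isolation. It is not Agent TARS's code: the `AgentAction` shape, the `AgentDeps` interface, and `runTask` are hypothetical names, and the screenshot, model, and browser calls are injected as plain functions so the control flow can be read (and tested) without a real browser or API key.

```typescript
// Hypothetical action shape; Agent TARS's real action schema may differ.
type AgentAction =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "navigate"; url: string }
  | { kind: "done"; summary: string };

// Dependencies are injected so the loop itself stays independent of any SDK.
interface AgentDeps {
  takeScreenshot: () => Promise<string>; // e.g. a base64-encoded image
  planNextAction: (task: string, screenshot: string) => Promise<AgentAction>; // VLM call
  execute: (action: AgentAction) => Promise<void>; // browser driver
  report: (message: string) => void; // real-time feedback
}

// Observe -> plan -> act until the planner says "done" or the step cap is hit.
async function runTask(task: string, deps: AgentDeps, maxSteps = 20): Promise<string> {
  for (let step = 1; step <= maxSteps; step++) {
    const screenshot = await deps.takeScreenshot();
    const action = await deps.planNextAction(task, screenshot);
    deps.report(`step ${step}: ${action.kind}`);
    if (action.kind === "done") return action.summary;
    await deps.execute(action);
  }
  throw new Error(`Task did not finish within ${maxSteps} steps`);
}
```

The step cap is a design choice worth keeping in any variant of this loop: each iteration costs a model call, so an unbounded loop can run up API charges on a task the agent cannot complete.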

Example Tasks

# Book a flight
"Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline"

# Research task
"Search for the top 5 programming languages in 2025 and create a summary"

# Web automation
"Go to GitHub, find the ByteDance/UI-TARS-desktop repository, and tell me how many stars it has"

Part 6: Configuration Options

Command Line Arguments

Argument	Description	Default
`--provider`	Model provider (anthropic, openai, volcengine, google, azure, mistral)	Required
`--model`	Model name	Required
`--apiKey`	API key for the provider	Required
`--baseURL`	Custom API endpoint	Provider default
`--headless`	Run browser in headless mode	false
`--port`	Port for Web UI server	8888
`--config`	Path to configuration file	None
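If you launch Agent TARS from a script, it can help to build the argument list from a typed object instead of concatenating strings. The sketch below uses only the flags listed in the table above; the `AgentTarsOptions` interface and `buildArgs` function are hypothetical helpers, not part of the CLI package.

```typescript
// Hypothetical helper type mirroring the flags in the table above.
interface AgentTarsOptions {
  provider: string;
  model: string;
  apiKey?: string;
  baseURL?: string;
  headless?: boolean;
  port?: number;
  config?: string;
}

// Turn an options object into an argv array for the agent-tars command.
function buildArgs(options: AgentTarsOptions): string[] {
  const args = ["--provider", options.provider, "--model", options.model];
  if (options.apiKey) args.push("--apiKey", options.apiKey);
  if (options.baseURL) args.push("--baseURL", options.baseURL);
  if (options.headless) args.push("--headless"); // boolean flag, no value
  if (options.port !== undefined) args.push("--port", String(options.port));
  if (options.config) args.push("--config", options.config);
  return args;
}
```

The resulting array can be passed to `spawn("agent-tars", buildArgs(options))` from `node:child_process`. Passing arguments as an array rather than a single command string avoids shell quoting problems with values that contain spaces.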

Using Environment Variables

Instead of passing arguments every time, set environment variables:

Linux/macOS:

# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="your-api-key"
export OPENAI_API_KEY="your-api-key"
export VOLCENGINE_API_KEY="your-api-key"

# Reload configuration
source ~/.bashrc

Windows:

# Set permanently
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "your-api-key", "User")
[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "your-api-key", "User")

# Or temporarily for current session
$env:ANTHROPIC_API_KEY = "your-api-key"

Then run without specifying the API key:

agent-tars --provider anthropic --model claude-3-7-sonnet-latest
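The lookup this implies can be sketched as follows. The variable names come from the examples above; the `resolveApiKey` function and the precedence it applies (an explicit key wins over the environment) are assumptions for illustration, not documented Agent TARS behavior.

```typescript
// Environment variable names taken from the examples above. Other providers
// may use different names, so check the provider's documentation.
const API_KEY_ENV_VARS: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  volcengine: "VOLCENGINE_API_KEY",
};

// Assumed precedence: an explicit key (e.g. from --apiKey) first, then the environment.
function resolveApiKey(
  provider: string,
  env: Record<string, string | undefined>,
  explicitKey?: string,
): string {
  if (explicitKey) return explicitKey;
  const varName = API_KEY_ENV_VARS[provider];
  const value = varName ? env[varName] : undefined;
  if (!value) {
    const hint = varName ?? "the provider's key variable";
    throw new Error(`No API key for "${provider}": pass --apiKey or set ${hint}`);
  }
  return value;
}
```

In a Node.js script you would call it as `resolveApiKey("anthropic", process.env)`. Taking the environment as a parameter keeps the function easy to test.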

Workspace Initialization

For persistent configurations, initialize a workspace:

# Initialize a new workspace with config files
agent-tars workspace --init

What this does: Creates a directory with configuration files (agent-tars.config.ts) and prompts you for initial setup options like provider and API key.

Configuration File

Create a configuration file for complex setups:

agent-tars.config.ts (in your project directory):

import { defineConfig } from '@agent-tars/cli';

export default defineConfig({
  provider: 'anthropic',
  model: {
    id: 'claude-3-7-sonnet-latest',
    // apiKey is read from environment variable
  },
  browser: {
    headless: false,
    viewport: {
      width: 1920,
      height: 1080
    }
  },
  server: {
    port: 8888, // Default Web UI port
    storage: {
      type: 'sqlite',
      uri: './agent-sessions.db'
    }
  },
  mcp: [
    // MCP server configurations
  ]
});

Run with configuration file:

agent-tars --config ./agent-tars.config.ts

Or if you’ve initialized a workspace, simply run:

agent-tars
# Uses workspace config automatically

Part 7: Web UI Overview

When Agent TARS launches, it opens a Web UI with these main components:

Interface Elements

Component	Description
Chat Input	Enter natural language instructions
Browser View	Live preview of what the agent sees
Event Stream	Real-time log of agent actions
Tool Calls	Display of MCP tools being used
Settings	Configure model, providers, and options

Keyboard Shortcuts

Shortcut	Action
`Enter`	Send message
`Shift + Enter`	New line in message
`Ctrl/Cmd + K`	Clear conversation
`Escape`	Stop current action

Part 8: Browser Control Strategies

Agent TARS uses three strategies for browser control:

1. Visual Grounding (GUI Agent)

Uses Vision-Language Models to:

Identify clickable elements by their visual appearance
Understand the layout and context of the page
Make decisions based on what it “sees”

Best for: Complex UIs, dynamically generated content, visual recognition tasks

2. DOM Manipulation

Uses traditional web automation to:

Query elements by CSS selectors
Extract text content and attributes
Interact with JavaScript-heavy applications

Best for: Speed, reliability, well-structured websites

3. Hybrid Mode (Default)

Combines both approaches:

Uses visual grounding for understanding and planning
Falls back to DOM manipulation for reliable execution
Adapts strategy based on page complexity

Configuration:

# Visual grounding only
agent-tars --browser-strategy visual

# DOM only
agent-tars --browser-strategy dom

# Hybrid (default)
agent-tars --browser-strategy hybrid
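To make the difference between the three strategies concrete, here is a small sketch of how a single click might be dispatched under each one. It is an illustration only: `ClickTarget`, `ClickPlan`, and `planClick` are hypothetical names, and the hybrid rule shown (use a DOM selector when one is available, otherwise use visually grounded coordinates) is one plausible reading of the description above, not the project's actual logic.

```typescript
type BrowserStrategy = "visual" | "dom" | "hybrid";

// Hypothetical description of something the planner wants to click.
interface ClickTarget {
  description: string; // e.g. "Login button"
  selector?: string; // CSS selector, when the DOM exposes a usable one
  point?: { x: number; y: number }; // screen coordinates from visual grounding
}

type ClickPlan =
  | { via: "dom"; selector: string }
  | { via: "visual"; x: number; y: number };

// Decide how to perform a click under each strategy.
function planClick(strategy: BrowserStrategy, target: ClickTarget): ClickPlan {
  const dom = target.selector
    ? { via: "dom" as const, selector: target.selector }
    : undefined;
  const visual = target.point
    ? { via: "visual" as const, x: target.point.x, y: target.point.y }
    : undefined;

  let plan: ClickPlan | undefined;
  if (strategy === "dom") plan = dom;
  else if (strategy === "visual") plan = visual;
  else plan = dom ?? visual; // hybrid: prefer the DOM for execution, else use coordinates

  if (!plan) {
    throw new Error(`Cannot click "${target.description}" with strategy "${strategy}"`);
  }
  return plan;
}
```

The failure case is the useful part: a DOM-only run has no fallback for a canvas-rendered button with no selector, and a visual-only run cannot benefit from a stable selector even when one exists.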

Part 9: MCP Server Integration

Model Context Protocol (MCP) allows Agent TARS to connect to external tools and services.

What is MCP?

MCP is an open protocol for connecting AI systems to external resources:

Tools: Functions the AI can call (search, file operations, etc.)
Resources: Data sources the AI can access
Prompts: Pre-defined templates for common tasks
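Under the hood, MCP messages are JSON-RPC 2.0, and a tool invocation is a request with the method `tools/call`. The sketch below builds such a request. The message shape follows the MCP specification; the `makeToolCall` helper and the `query` argument in the usage note are illustrative assumptions, since each server defines its own tool arguments.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0 request).
interface McpToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that builds a tools/call request for a named tool.
function makeToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown>,
): McpToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}
```

For example, `makeToolCall(1, "search", { query: "weather in New York" })` produces the message a client would send to a server exposing a `search` tool. Agent TARS handles this for you; the sketch is only meant to show what "calling a tool" means on the wire.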

Built-in MCP Servers

Agent TARS includes several MCP servers:

Server	Purpose	Tools
`mcp-server-browser`	Browser automation	navigate, click, screenshot, extract_content
`mcp-server-filesystem`	File operations	read_file, write_file, list_directory
`mcp-server-commands`	Shell commands	execute_command
`mcp-server-search`	Web search	search

Configuring MCP Servers

In your configuration file:

export default defineConfig({
  // ... other config
  mcp: [
    {
      name: 'filesystem',
      transport: 'stdio',
      command: 'npx',
      args: ['@agent-infra/mcp-server-filesystem', '--root', '/allowed/directory']
    },
    {
      name: 'search',
      transport: 'sse',
      url: 'http://localhost:8080/mcp'
    }
  ]
});

Adding Custom MCP Servers

You can add any MCP-compatible server:

mcp: [
  {
    name: 'custom-tool',
    transport: 'stdio',
    command: 'node',
    args: ['./my-custom-mcp-server.js'],
    // Optional: filter which tools to expose
    include: ['tool1', 'tool2'],
    exclude: ['sensitive-tool']
  }
]
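The `include` and `exclude` options in the example above can be thought of as an allowlist and a denylist. The sketch below shows one reasonable way to combine them. The precedence (exclude wins when a tool appears in both lists) is an assumption, so verify it against the official documentation before relying on it to hide a sensitive tool.

```typescript
interface ToolFilter {
  include?: string[];
  exclude?: string[];
}

// Assumed semantics: `include` is an allowlist when present, and `exclude`
// is applied afterwards, so a tool listed in both ends up hidden.
function filterTools(available: string[], filter: ToolFilter): string[] {
  const allow = filter.include ? new Set(filter.include) : undefined;
  const deny = new Set(filter.exclude ?? []);
  return available.filter(
    (tool) => (allow ? allow.has(tool) : true) && !deny.has(tool),
  );
}
```

With the configuration shown above, a server that offers `tool1`, `tool2`, `tool3`, and `sensitive-tool` would expose only `tool1` and `tool2`.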

Part 10: Code Execution (Sandbox)

Agent TARS can execute code in isolated sandboxed environments using the @agent-infra/sandbox package.

Sandbox Capabilities

Feature	Description
Bash Commands	Execute shell commands securely
Jupyter Notebooks	Run Python code interactively
File Editing	Create and modify files
Environment Isolation	Each session is isolated

Enabling Sandbox

export default defineConfig({
  // ... other config
  sandbox: {
    enabled: true,
    url: 'http://localhost:8081', // AIO Sandbox URL
    // Or use Docker-based sandbox
    docker: {
      image: 'agent-infra/sandbox:latest',
      autoStart: true
    }
  }
});

Starting AIO Sandbox

The AIO (All-in-One) Sandbox provides an isolated execution environment:

# Using Docker
docker run -d -p 8081:8081 ghcr.io/agent-infra/sandbox:latest

# Or using npm
npx @agent-infra/sandbox

Part 11: Running as a Server

Agent TARS can run as a persistent server for multi-session management.

Server Mode

agent-tars server --port 3000

API Endpoints

Endpoint	Method	Description
`/api/v1/sessions`	POST	Create new session
`/api/v1/sessions/:id`	GET	Get session status
`/api/v1/sessions/:id/messages`	POST	Send message to session
`/api/v1/sessions/:id/events`	GET (SSE)	Stream session events
`/api/v1/sessions/:id`	DELETE	End session

Example API Usage

// Create a new session
const response = await fetch('http://localhost:3000/api/v1/sessions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: {
      provider: 'anthropic',
      id: 'claude-3-7-sonnet-latest'
    }
  })
});

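// Fail early with a clear message if the server rejected the request
if (!response.ok) {
  throw new Error(`Failed to create session: HTTP ${response.status}`);
}

const { sessionId } = await response.json();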

// Send a message
await fetch(`http://localhost:3000/api/v1/sessions/${sessionId}/messages`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'Search for the weather in New York'
  })
});
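The `/events` endpoint in the table above streams Server-Sent Events (SSE), a line-based text format in which each event is a block of `field: value` lines ending with a blank line. The sketch below parses that format. The parser follows the standard SSE rules; the event names and JSON payloads that Agent TARS actually emits are not covered in this guide, so anything you do with `event` and `data` afterwards is up to you to verify.

```typescript
interface SseMessage {
  event: string; // defaults to "message" per the SSE format
  data: string; // multiple data lines are joined with "\n"
}

// Parse text/event-stream content that contains only complete events.
function parseSse(chunk: string): SseMessage[] {
  const messages: SseMessage[] = [];
  for (const block of chunk.split(/\r?\n\r?\n/)) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Blocks without data (e.g. keep-alive comments) produce no message.
    if (dataLines.length > 0) messages.push({ event, data: dataLines.join("\n") });
  }
  return messages;
}
```

When reading from a live connection, network chunks can end in the middle of an event, so a real client should buffer incoming text and only pass complete blocks (up to the last blank line) to a parser like this. In a browser, the built-in `EventSource` API does this for you.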

Part 12: Headless Mode

Run Agent TARS without a visible browser for automation scripts:

agent-tars --provider anthropic \
           --model claude-3-7-sonnet-latest \
           --headless

Use Cases for Headless Mode

CI/CD Pipelines: Automated testing
Scheduled Tasks: Cron jobs for web scraping
Server Environments: Running on headless servers
Batch Processing: Processing multiple tasks sequentially

Part 13: Storage and Persistence

Agent TARS supports multiple storage backends for session persistence.

SQLite (Default)

export default defineConfig({
  server: {
    storage: {
      type: 'sqlite',
      uri: './sessions.db'
    }
  }
});

MongoDB

export default defineConfig({
  server: {
    storage: {
      type: 'mongodb',
      uri: 'mongodb://localhost:27017/agent-tars'
    }
  }
});

File-based (JSON)

export default defineConfig({
  server: {
    storage: {
      type: 'file',
      path: './sessions-data'
    }
  }
});
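The three storage examples differ in which field carries the location (`uri` for SQLite and MongoDB, `path` for file storage), which is easy to mix up. A discriminated union makes that difference explicit and lets the compiler catch it. The `StorageConfig` type and `checkStorage` function below are a hypothetical sketch based on the three examples in this section, not types exported by Agent TARS.

```typescript
// Hypothetical type modeled on the three examples in this section.
type StorageConfig =
  | { type: "sqlite"; uri: string }
  | { type: "mongodb"; uri: string }
  | { type: "file"; path: string };

// Return a list of human-readable problems; an empty list means the config looks sane.
function checkStorage(config: StorageConfig): string[] {
  const problems: string[] = [];
  if (config.type === "file") {
    if (config.path.trim() === "") problems.push("file storage needs a non-empty path");
  } else {
    if (config.uri.trim() === "") problems.push(`${config.type} storage needs a non-empty uri`);
    // MongoDB connection strings use the mongodb:// or mongodb+srv:// scheme.
    if (config.type === "mongodb" && !/^mongodb(\+srv)?:\/\//.test(config.uri)) {
      problems.push("mongodb uri should start with mongodb:// or mongodb+srv://");
    }
  }
  return problems;
}
```

Running a check like this at startup turns a confusing connection error later into an immediate, readable message.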

Part 14: Troubleshooting

Common Issues and Solutions

Issue: “Node.js version too old”

Error: Agent TARS requires Node.js >= 22

Solution: Update Node.js to version 22 or higher (see Part 2).
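If you wrap Agent TARS in your own scripts, you can run the same check up front and fail with a clearer message. This is a minimal sketch; the `meetsMinimumNode` helper is hypothetical and only compares the major version.

```typescript
// Returns true when a version string such as "v22.3.0" meets the minimum major version.
function meetsMinimumNode(version: string, minMajor = 22): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (!match) return false; // not a recognizable version string
  return Number(match[1]) >= minMajor;
}
```

In a Node.js script, call it as `meetsMinimumNode(process.version)` and exit early with an upgrade hint when it returns `false`.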

Issue: “Chrome not found”

Error: Could not find Chrome installation

Solution: Install Google Chrome or specify the path:

agent-tars --browser-executable-path /path/to/chrome

Issue: “API key invalid”

Error: Invalid API key for provider

Solution:

Verify your API key is correct
Check that the key has necessary permissions
Ensure the key is not expired

Issue: “WebSocket connection failed”

Error: Failed to connect to browser websocket

Solution:

Ensure Chrome isn’t running with conflicting flags
Close any existing Chrome processes
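Try running with the --no-sandbox flag (Linux). This disables Chrome's security sandbox, so use it only in trusted environments such as containers: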

agent-tars --browser-args="--no-sandbox"

Issue: Permission denied (Linux)

Error: EACCES: permission denied

Solution: Fix npm global installation permissions:

# Create npm global directory in home
mkdir -p ~/.npm-global
npm config set prefix '~/.npm-global'

# Add to PATH
echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

Part 15: Best Practices

1. Security Considerations

Never share API keys in public repositories
Use environment variables for sensitive data
Limit sandbox permissions to required directories
Monitor usage to avoid unexpected API costs

2. Performance Optimization

Use headless mode for automated tasks
Enable DOM-only mode for simple, structured websites
Configure appropriate timeouts for slow networks
Use session persistence for long-running tasks

3. Cost Management

Choose smaller models for simple tasks
Use Claude Haiku or GPT-4o-mini for cost-sensitive operations
Implement rate limiting in production
Monitor token usage regularly
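For the rate-limiting point, a token bucket is a common choice because it allows short bursts while capping the sustained request rate. The sketch below is a generic, in-memory implementation and is not tied to Agent TARS; the class name and parameters are illustrative. Time is passed in explicitly so the behavior is deterministic and easy to test.

```typescript
// Generic in-memory token bucket; not part of Agent TARS.
class TokenBucket {
  private readonly capacity: number; // maximum burst size
  private readonly refillPerSecond: number; // sustained requests per second
  private tokens: number;
  private lastRefill: number; // timestamp in milliseconds

  constructor(capacity: number, refillPerSecond: number, now: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Returns true if a request may proceed at time `now` (milliseconds).
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

For example, `new TokenBucket(5, 0.5)` would allow a burst of five model calls and then one call every two seconds. A limiter like this lives in one process only; a multi-instance deployment needs a shared store.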

Part 16: Advanced Use Cases

Automated Testing

# Run test suite with Agent TARS
agent-tars --headless \
           --task "Navigate to example.com, click login, verify dashboard loads" \
           --output-format json \
           --output-file test-results.json

Web Scraping

# Extract data from websites
agent-tars --task "Go to news.ycombinator.com, extract top 10 headlines with links" \
           --output-format csv

Form Automation

# Fill complex forms
agent-tars --task "Fill out the job application form with: Name=John Doe, Email=john@example.com, Resume=upload resume.pdf"

Part 17: CLI Reference

Full Command Reference

agent-tars [options]

Options:
  -V, --version                       Output version number
  -p, --provider <provider>           Model provider
  -m, --model <model>                 Model name
  -k, --apiKey <key>                  API key
  -b, --baseURL <url>                 Custom API base URL
  --headless                          Run browser in headless mode
  --port <port>                       Web UI port
  --config <path>                     Configuration file path
  --browser-strategy <strategy>       Browser control strategy
  --browser-executable-path <path>    Chrome executable path
  --browser-args <args>               Additional browser arguments
  --sandbox-url <url>                 Sandbox service URL
  --storage-type <type>               Storage backend type
  --storage-uri <uri>                 Storage connection URI
  -h, --help                          Display help

Conclusion

Agent TARS combines Vision-Language Models with traditional web automation techniques, which makes it a flexible option for complex web tasks that are hard to script with selectors alone.

Key Takeaways

Easy Installation: Single npm command to get started
Flexible Configuration: Multiple model providers and configuration options
Extensible: MCP integration allows connecting to any external tool
Production Ready: Server mode with session persistence

Next Steps

Join the Discord community for support
Check out example use cases
Read the official documentation

UI-TARS Desktop Complete Guide - For local computer desktop automation
MCP Server Development Guide - Build custom tools

Appendix A: Comparison of Model Providers

Provider	Best Model for Agent TARS	Strengths	Limitations
Anthropic	claude-3-7-sonnet-latest	Best vision understanding, reliable	Usage limits on free tier
OpenAI	gpt-4o	Fast, good general performance	Higher API costs
Google	gemini-2.5-pro-preview	Long context, free tier	Beta features may change
VolcEngine	doubao-1-5-thinking-vision-pro	Optimized for Agent TARS	Requires Chinese API access
Azure OpenAI	gpt-4o (deployed)	Enterprise compliance	Complex setup

Appendix B: Changelog Highlights

v0.3.0 (Nov 2025): Streaming support, Event Stream Viewer, AIO Sandbox integration
Beta Release (Jun 2025): Initial public release with CLI and Web UI
Ongoing: Active development with regular updates

Last updated: January 2026
