
Agent TARS Complete Setup Guide: Open-Source Multimodal AI Agent for Browser Automation

Published on January 12, 2026


Introduction

Agent TARS is an open-source multimodal AI agent stack developed by ByteDance that brings GUI-agent and vision capabilities to your terminal, browser, and applications. Unlike traditional automation tools, Agent TARS uses Vision-Language Models (VLMs) to understand and interact with graphical interfaces the way humans do: by seeing and interpreting visual content.

This comprehensive guide covers everything from basic installation to advanced configurations, including MCP server integration, model provider setup, and browser automation strategies.

What is Agent TARS?

Agent TARS evolved from the UI-TARS-desktop project, shifting from an Electron-based desktop app to a lightweight CLI with a Web UI for better portability and efficiency. This lets it run "anytime, anywhere" without heavy dependencies such as a bundled Chromium, which reduces installation size and speeds up iteration.

Agent TARS is designed to complete tasks in a human-like manner through:

  • Visual Understanding: Uses Vision-Language Models to see and interpret screenshots
  • Browser Control: Automates web browsing using visual grounding, DOM manipulation, or hybrid strategies
  • Code Execution: Runs shell commands, Jupyter notebooks, and file editing in sandboxed environments
  • MCP Integration: Connects to Model Context Protocol (MCP) servers for extensible tool access
  • Multi-Interface: Provides both CLI and Web UI for different use cases
  • Context Engineering: Manages long-running tasks with dynamic sliding windows and hierarchical memory
  • Observability: Uses a Snapshot framework for deterministic replays and automated benchmarking
  • Event Streaming: Real-time Agent Event Stream for monitoring agent status, tool calls, and responses

Context Engineering

Agent TARS implements sophisticated context management to prevent context overflow in models with limited token windows (e.g., 128k tokens):

| Memory Level | Name | Purpose |
|---|---|---|
| L0 | Permanent | Core system instructions, always retained |
| L1 | Run | Session-level context, persists across loops |
| L2 | Loop | Current task iteration, may be summarized |
| L3 | Ephemeral | Temporary data, discarded after use |

This hierarchical memory system enables efficient handling of complex, multi-step tasks without exceeding token limits.
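To make the layering concrete, here is a minimal sketch of how such a budget could be enforced. It is not Agent TARS's implementation: the `MemoryItem` type, the `buildContext` function, and the rough 4-characters-per-token estimate are all assumptions made for illustration. L0 and L1 entries are always kept, L2 entries form a sliding window that keeps the newest items that still fit, and L3 entries are left out because they are discarded after use.

```typescript
type Level = 0 | 1 | 2 | 3;

interface MemoryItem {
  level: Level; // 0 = Permanent, 1 = Run, 2 = Loop, 3 = Ephemeral
  text: string;
}

// Rough token estimate (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical context builder. L0 and L1 are always kept, even if they alone
// exceed the budget; L2 is a sliding window over the newest entries that still
// fit; L3 entries are excluded because they are discarded after use.
function buildContext(items: MemoryItem[], budget: number): MemoryItem[] {
  const pinned = items.filter((item) => item.level <= 1);
  let used = pinned.reduce((sum, item) => sum + estimateTokens(item.text), 0);

  const loop = items.filter((item) => item.level === 2);
  const kept: MemoryItem[] = [];
  // Walk from newest to oldest so recent loop entries win when space is tight.
  for (let k = loop.length - 1; k >= 0; k--) {
    const cost = estimateTokens(loop[k].text);
    if (used + cost > budget) break;
    kept.unshift(loop[k]);
    used += cost;
  }
  return [...pinned, ...kept];
}
```

A fuller version would summarize the L2 entries that fall out of the window instead of simply dropping them, as the table above suggests.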

Agent TARS vs UI-TARS Desktop

These are two distinct products in the same repository:

| Feature | Agent TARS | UI-TARS Desktop |
|---|---|---|
| Interface | CLI + Web UI | Native desktop app (Electron) |
| Primary Use | Browser automation, code execution | Local computer GUI control |
| Model Backend | Cloud APIs (OpenAI, Claude, etc.) | Local/remote VLMs (UI-TARS series) |
| Architecture | Lightweight, no bundled browser | Electron with bundled Chromium |
| Best For | Web tasks, terminal-based automation | Desktop automation, direct computer control |
| Installation | npm/npx | Download installer |

Agent TARS is a general-purpose agent stack focused on multimodal workflows and CLI-driven tasks, while UI-TARS Desktop is specialized for native GUI automation using the UI-TARS model series.

Note: If you need to control your local desktop (not just the browser), see our separate UI-TARS Desktop Complete Guide.


Part 1: System Requirements

Before installing Agent TARS, ensure your system meets these requirements.

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 4 GB | 8 GB+ |
| CPU | 2 cores | 4+ cores |
| Storage | 2 GB free | 5 GB+ free |
| Network | Stable internet | Low-latency connection |
| Browser | Google Chrome | Google Chrome (latest) |

Software Requirements

| Software | Required Version | Notes |
|---|---|---|
| Node.js | 22.x or higher | Critical: version 22+ is mandatory |
| npm | Comes with Node.js | Used for package installation |
| Google Chrome | Latest stable | Required for browser automation |
| Git | Latest | Optional, for development |
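If you want a setup script to flag an outdated runtime before installing anything, a small check like the following is enough. It is a sketch, not part of Agent TARS; `meetsMinimumNode` is a hypothetical helper, and the only runtime API it relies on is `process.versions.node`, which every Node.js release provides.

```typescript
// Minimum major version, taken from the requirements table above.
const MIN_NODE_MAJOR = 22;

// Returns true when a version string such as "22.11.0" or "v22.11.0"
// has a major version of at least `min`.
function meetsMinimumNode(version: string, min: number = MIN_NODE_MAJOR): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= min;
}

const currentNode = process.versions.node;
console.log(
  meetsMinimumNode(currentNode)
    ? `Node.js ${currentNode} meets the requirement.`
    : `Node.js ${currentNode} is too old; Agent TARS needs ${MIN_NODE_MAJOR} or higher.`,
);
```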

API Key Requirements

You need an API key from at least one of these providers:

| Provider | Recommended Models | Pricing |
|---|---|---|
| Anthropic | claude-3-7-sonnet-latest, claude-3-5-sonnet-20241022 | Pay-per-use |
| OpenAI | gpt-4o, gpt-4-turbo | Pay-per-use |
| VolcEngine | doubao-1-5-thinking-vision-pro-250428 | Pay-per-use |
| Google | gemini-2.5-pro-preview-03-25 | Free tier available |
| Azure OpenAI | gpt-4o (deployed) | Enterprise pricing |
| Mistral | Various vision models | Pay-per-use |

Part 2: Node.js Installation

Agent TARS requires Node.js version 22 or higher. Here's how to install it on each platform.

Node.js on Linux (Ubuntu/Debian)

Option A: Using nvm

nvm (Node Version Manager) allows easy installation and switching between Node.js versions.

# Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload shell configuration
source ~/.bashrc
# Or for zsh users:
# source ~/.zshrc

# Verify nvm installation
nvm --version

What this does: Downloads and runs the nvm installation script, which adds nvm to your shell configuration.

# Install Node.js 22 (LTS)
nvm install 22

# Set Node.js 22 as default
nvm alias default 22

# Verify installation
node --version
# Should output: v22.x.x

npm --version
# Should output: 10.x.x or higher

What this does: Installs Node.js version 22 and sets it as your default Node.js version.

Option B: Using NodeSource Repository

# Add NodeSource repository for Node.js 22
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -

# Install Node.js
sudo apt-get install -y nodejs

# Verify installation
node --version
npm --version

What this does: Adds the official NodeSource repository and installs the latest Node.js 22.x version.


Node.js on macOS

Option A: Using nvm

# Install nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload terminal or run:
source ~/.zshrc

# Install and use Node.js 22
nvm install 22
nvm use 22
nvm alias default 22

# Verify
node --version

Option B: Using Homebrew

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Node.js 22
brew install node@22

# Add to PATH (if not automatically done); this is the Apple Silicon path,
# Intel Macs use /usr/local/opt/node@22/bin instead
echo 'export PATH="/opt/homebrew/opt/node@22/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

# Verify
node --version

What this does: Uses Homebrew package manager to install Node.js 22 and adds it to your system PATH.


Node.js on Windows

Option A: Using nvm-windows

  1. Download nvm-windows from GitHub releases
  2. Run the installer (nvm-setup.exe)
  3. Accept defaults and complete installation

Open PowerShell (or Command Prompt) as Administrator:

# List available Node.js versions
nvm list available

# Install Node.js 22
nvm install 22

# Use Node.js 22
nvm use 22

# Verify installation
node --version
npm --version

What this does: Installs nvm-windows which allows managing multiple Node.js versions on Windows.

Option B: Direct Installation

  1. Visit nodejs.org
  2. Download the Node.js 22.x LTS installer for Windows
  3. Run the installer:
    • Accept license agreement
    • Keep default installation directory
    • Ensure "npm package manager" is checked
    • Enable "Add to PATH" option
  4. Restart your terminal/PowerShell
# Verify installation
node --version
npm --version

Part 3: Installing Google Chrome

Agent TARS uses Google Chrome for browser automation. Ensure you have it installed:

Chrome on Linux (Ubuntu/Debian)

# Download and install Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install -y ./google-chrome-stable_current_amd64.deb

# Verify installation
google-chrome --version

Chrome on macOS

Download from google.com/chrome or use Homebrew:

brew install --cask google-chrome

Chrome on Windows

Download from google.com/chrome and run the installer.


Part 4: Installing Agent TARS CLI

There are four main ways to run Agent TARS CLI:

Method 1: Using npx (Quickest)

This method runs Agent TARS without installing it globally:

# Run Agent TARS directly with npx
npx @agent-tars/cli@latest

What this does: Downloads and runs the latest version of Agent TARS CLI without installing it permanently. The package is cached locally for faster subsequent runs.

Tip: Use this method for quick testing or one-time use. For regular use, global installation is recommended.


Method 2: Global Installation (Recommended)

Install Agent TARS globally for easy access:

Linux/macOS:

# Install globally (latest stable)
npm install -g @agent-tars/cli@latest

# Or install beta version for newest features
npm install -g @agent-tars/cli@beta

# Verify installation
agent-tars --version

# Run Agent TARS
agent-tars

Windows (PowerShell as Administrator):

# Install globally (latest stable)
npm install -g @agent-tars/cli@latest

# Or install beta version for newest features
npm install -g @agent-tars/cli@beta

# Verify installation
agent-tars --version

# Run Agent TARS
agent-tars

What this does: Installs Agent TARS CLI globally, making the agent-tars command available from any directory.

Updating Agent TARS

To update to the latest version:

# Update to latest stable
npm update -g @agent-tars/cli

# Or reinstall latest
npm install -g @agent-tars/cli@latest

Method 3: Development Installation (From Source)

For contributors or those who want the latest development version:

# Clone the repository
git clone https://github.com/bytedance/UI-TARS-desktop.git
cd UI-TARS-desktop

# Install pnpm package manager
npm install -g pnpm

# Install dependencies
pnpm install

# Navigate to CLI package
cd multimodal/agent-tars/cli

# Run in development mode
pnpm dev

What this does: Clones the entire UI-TARS-desktop monorepo and runs Agent TARS CLI from source.


Method 4: Docker Installation (Custom Setup)

Official Docker support isn't provided, but you can containerize Agent TARS for portable deployments.

Create a Dockerfile

FROM node:22-alpine

# Set working directory
WORKDIR /app

# Install Agent TARS CLI globally
RUN npm install -g @agent-tars/cli@latest

# Note: node:22-alpine does not include Chrome (see Part 3), so browser
# automation needs a browser installed in the image as well

# Set default environment variables (override at runtime)
ENV AGENT_PROVIDER=volcengine
ENV AGENT_MODEL=doubao-1-5-thinking-vision-pro-250428

# Expose Web UI port
EXPOSE 8888

# Shell form, so the variables are expanded when the container starts
# (the JSON/exec form would pass "${AGENT_PROVIDER}" through literally)
CMD agent-tars --provider "$AGENT_PROVIDER" --model "$AGENT_MODEL" --apiKey "$AGENT_API_KEY"

Dockerfile explained:

  • FROM node:22-alpine: Lightweight Node.js 22 base image
  • WORKDIR /app: Sets working directory inside container
  • RUN npm install -g: Installs Agent TARS CLI
  • EXPOSE 8888: Exposes the default Web UI port
  • CMD: Starts Agent TARS through a shell so the environment variables are expanded at runtime

Build and Run

# Build the Docker image
docker build -t agent-tars .

# Run with API key as environment variable
docker run -p 8888:8888 \
    -e AGENT_PROVIDER=anthropic \
    -e AGENT_MODEL=claude-3-7-sonnet-latest \
    -e AGENT_API_KEY=your-api-key \
    agent-tars

# Access Web UI at http://localhost:8888

Platform notes:

  • Linux: Works directly
  • macOS/Windows: Ensure Docker Desktop is installed and allocate sufficient RAM (4GB+ recommended)

Part 5: Running Agent TARS

Basic Usage

Run Agent TARS with your preferred model provider:

Using Anthropic Claude:

agent-tars --provider anthropic \
           --model claude-3-7-sonnet-latest \
           --apiKey YOUR_ANTHROPIC_API_KEY

Using OpenAI:

agent-tars --provider openai \
           --model gpt-4o \
           --apiKey YOUR_OPENAI_API_KEY

Using VolcEngine (ByteDance):

agent-tars --provider volcengine \
           --model doubao-1-5-thinking-vision-pro-250428 \
           --apiKey YOUR_VOLCENGINE_API_KEY

Using Google Gemini:

agent-tars --provider google \
           --model gemini-2.5-pro-preview-03-25 \
           --apiKey YOUR_GOOGLE_API_KEY

What Happens When You Run Agent TARS

  1. Web UI Launches: A browser window opens with the Agent TARS interface
  2. Agent Ready: The agent waits for your natural language instructions
  3. Execution: When you give a task, the agent:
    • Takes screenshots of the browser
    • Analyzes the visual content using the VLM
    • Plans and executes actions (click, type, navigate)
    • Provides real-time feedback

Example Tasks

# Book a flight
"Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline"

# Research task
"Search for the top 5 programming languages in 2025 and create a summary"

# Web automation
"Go to GitHub, find the ByteDance/UI-TARS-desktop repository, and tell me how many stars it has"

Part 6: Configuration Options

Command Line Arguments

| Argument | Description | Default |
|---|---|---|
| --provider | Model provider (anthropic, openai, volcengine, google, azure, mistral) | Required |
| --model | Model name | Required |
| --apiKey | API key for the provider | Required unless set via an environment variable |
| --baseURL | Custom API endpoint | Provider default |
| --headless | Run browser in headless mode | false |
| --port | Port for Web UI server | 8888 |
| --config | Path to configuration file | None |
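To show how the Required and Default columns fit together, here is a hypothetical parser for the same flags built on `parseArgs` from Node's built-in `node:util` module (available in the Node.js 22 runtime this guide requires). It is not Agent TARS's own argument handling, which also reads configuration files; `parseCli` and `CliOptions` are illustrative names. `--apiKey` is treated as optional here because the Using Environment Variables section below shows it being omitted.

```typescript
import { parseArgs } from "node:util";

interface CliOptions {
  provider: string;
  model: string;
  apiKey?: string; // may instead come from an environment variable
  baseURL?: string;
  headless: boolean;
  port: number;
  config?: string;
}

// Hypothetical parser mirroring the argument table above.
function parseCli(argv: string[]): CliOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      provider: { type: "string" },
      model: { type: "string" },
      apiKey: { type: "string" },
      baseURL: { type: "string" },
      headless: { type: "boolean", default: false },
      port: { type: "string", default: "8888" },
      config: { type: "string" },
    },
    strict: true, // unknown flags throw an error
  });

  if (!values.provider || !values.model) {
    throw new Error("--provider and --model are required");
  }

  const port = Number(values.port);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid --port: ${values.port}`);
  }

  return {
    provider: values.provider,
    model: values.model,
    apiKey: values.apiKey,
    baseURL: values.baseURL,
    headless: values.headless ?? false,
    port,
    config: values.config,
  };
}
```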

Using Environment Variables

Instead of passing arguments every time, set environment variables:

Linux/macOS:

# Add to ~/.bashrc or ~/.zshrc
export ANTHROPIC_API_KEY="your-api-key"
export OPENAI_API_KEY="your-api-key"
export VOLCENGINE_API_KEY="your-api-key"

# Reload configuration
source ~/.bashrc

Windows:

# Set permanently
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "your-api-key", "User")
[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", "your-api-key", "User")

# Or temporarily for current session
$env:ANTHROPIC_API_KEY = "your-api-key"

Then run without specifying the API key:

agent-tars --provider anthropic --model claude-3-7-sonnet-latest
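The lookup this enables can be sketched as follows. The variable names are the ones used in the examples above; the precedence (an explicit `--apiKey` wins over the environment variable) and the `resolveApiKey` helper are assumptions for illustration, not documented Agent TARS behavior. Providers without a listed variable simply return `undefined`.

```typescript
// Environment variable names used in the examples above (other providers omitted).
const API_KEY_ENV: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  volcengine: "VOLCENGINE_API_KEY",
};

// Hypothetical resolution order: explicit flag first, then the provider's variable.
function resolveApiKey(
  provider: string,
  flagValue: string | undefined,
  env: Record<string, string | undefined> = process.env,
): string | undefined {
  if (flagValue) return flagValue;
  const varName = API_KEY_ENV[provider];
  return varName ? env[varName] : undefined;
}

console.log(resolveApiKey("anthropic", undefined) ? "API key found" : "No API key configured");
```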

Workspace Initialization

For persistent configurations, initialize a workspace:

# Initialize a new workspace with config files
agent-tars workspace --init

What this does: Creates a directory with configuration files (agent-tars.config.ts) and prompts you for initial setup options like provider and API key.

Configuration File

Create a configuration file for complex setups:

agent-tars.config.ts (in your project directory):

import { defineConfig } from '@agent-tars/cli';

export default defineConfig({
  provider: 'anthropic',
  model: {
    id: 'claude-3-7-sonnet-latest',
    // apiKey is read from environment variable
  },
  browser: {
    headless: false,
    viewport: {
      width: 1920,
      height: 1080
    }
  },
  server: {
    port: 8888, // Default Web UI port
    storage: {
      type: 'sqlite',
      uri: './agent-sessions.db'
    }
  },
  mcp: [
    // MCP server configurations
  ]
});

Run with configuration file:

agent-tars --config ./agent-tars.config.ts

Or if you've initialized a workspace, simply run:

agent-tars
# Uses workspace config automatically

Part 7: Web UI Overview

When Agent TARS launches, it opens a Web UI with these main components:

Interface Elements

| Component | Description |
|---|---|
| Chat Input | Enter natural language instructions |
| Browser View | Live preview of what the agent sees |
| Event Stream | Real-time log of agent actions |
| Tool Calls | Display of MCP tools being used |
| Settings | Configure model, providers, and options |

Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Enter | Send message |
| Shift + Enter | New line in message |
| Ctrl/Cmd + K | Clear conversation |
| Escape | Stop current action |

Part 8: Browser Control Strategies

Agent TARS uses three strategies for browser control:

1. Visual Grounding (GUI Agent)

Uses Vision-Language Models to:

  • Identify clickable elements by their visual appearance
  • Understand the layout and context of the page
  • Make decisions based on what it "sees"

Best for: Complex UIs, dynamically generated content, visual recognition tasks

2. DOM Manipulation

Uses traditional web automation to:

  • Query elements by CSS selectors
  • Extract text content and attributes
  • Interact with JavaScript-heavy applications

Best for: Speed, reliability, well-structured websites

3. Hybrid Mode (Default)

Combines both approaches:

  • Uses visual grounding for understanding and planning
  • Falls back to DOM manipulation for reliable execution
  • Adapts strategy based on page complexity

Configuration:

# Visual grounding only
agent-tars --browser-strategy visual

# DOM only
agent-tars --browser-strategy dom

# Hybrid (default)
agent-tars --browser-strategy hybrid
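How the three modes differ can be illustrated with a small, self-contained sketch. The `Target` type and the two locator callbacks are hypothetical stand-ins (one for a CSS query, one for a VLM that returns screen coordinates), not Agent TARS interfaces. The sketch also assumes one specific reading of hybrid mode: try a DOM selector first for execution, and fall back to the visually grounded coordinates when no element matches.

```typescript
// A click target is either a DOM element (by selector) or a screen point.
type Target =
  | { kind: "dom"; selector: string }
  | { kind: "visual"; x: number; y: number };

type Strategy = "visual" | "dom" | "hybrid";

// Hypothetical locators: one returns a CSS selector, one returns coordinates.
type DomLocator = (description: string) => string | null;
type VisualLocator = (description: string) => { x: number; y: number } | null;

function locate(
  description: string,
  strategy: Strategy,
  findInDom: DomLocator,
  findVisually: VisualLocator,
): Target | null {
  if (strategy !== "visual") {
    const selector = findInDom(description);
    if (selector) return { kind: "dom", selector };
    if (strategy === "dom") return null; // DOM-only mode has no fallback
  }
  const point = findVisually(description);
  return point ? { kind: "visual", ...point } : null;
}
```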

Part 9: MCP Server Integration

Model Context Protocol (MCP) allows Agent TARS to connect to external tools and services.

What is MCP?

MCP is an open protocol for connecting AI systems to external resources:

  • Tools: Functions the AI can call (search, file operations, etc.)
  • Resources: Data sources the AI can access
  • Prompts: Pre-defined templates for common tasks
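Under the hood, MCP messages are JSON-RPC 2.0, and with the stdio transport each message is a single line of JSON written to the server's stdin or read from its stdout. The sketch below builds a `tools/call` request for the `read_file` tool listed in the table that follows. The argument name `path` is an assumption, since every server defines its own input schema (which a client discovers through `tools/list`), and `toolCall`/`encodeForStdio` are illustrative helpers.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Builds a `tools/call` request; `args` must match the tool's input schema.
function toolCall(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name: toolName, arguments: args } };
}

// The stdio transport frames messages as newline-delimited JSON.
// JSON.stringify without indentation escapes any newlines inside strings.
function encodeForStdio(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = encodeForStdio(toolCall("read_file", { path: "/allowed/directory/notes.txt" }));
process.stdout.write(line);
```

A real client sends an `initialize` request and usually `tools/list` before calling tools; an MCP SDK handles this handshake and the framing for you.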

Built-in MCP Servers

Agent TARS includes several MCP servers:

| Server | Purpose | Tools |
|---|---|---|
| mcp-server-browser | Browser automation | navigate, click, screenshot, extract_content |
| mcp-server-filesystem | File operations | read_file, write_file, list_directory |
| mcp-server-commands | Shell commands | execute_command |
| mcp-server-search | Web search | search |

Configuring MCP Servers

In your configuration file:

export default defineConfig({
  // ... other config
  mcp: [
    {
      name: 'filesystem',
      transport: 'stdio',
      command: 'npx',
      args: ['@agent-infra/mcp-server-filesystem', '--root', '/allowed/directory']
    },
    {
      name: 'search',
      transport: 'sse',
      url: 'http://localhost:8080/mcp'
    }
  ]
});

Adding Custom MCP Servers

You can add any MCP-compatible server:

mcp: [
  {
    name: 'custom-tool',
    transport: 'stdio',
    command: 'node',
    args: ['./my-custom-mcp-server.js'],
    // Optional: filter which tools to expose
    include: ['tool1', 'tool2'],
    exclude: ['sensitive-tool']
  }
]
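The `include` and `exclude` options read like an allow-list and a deny-list. One plausible interpretation is sketched below; the precedence (apply `include` first, then remove anything in `exclude`) is an assumption, so check the Agent TARS documentation for the exact rules. `filterTools` is a hypothetical helper.

```typescript
interface ToolFilter {
  include?: string[];
  exclude?: string[];
}

// Hypothetical filter: keep only included tools when an allow-list is given,
// then drop anything on the deny-list.
function filterTools(tools: string[], filter: ToolFilter): string[] {
  const allowed = filter.include ? new Set(filter.include) : null;
  const denied = new Set(filter.exclude ?? []);
  return tools.filter((tool) => (allowed === null || allowed.has(tool)) && !denied.has(tool));
}

console.log(
  filterTools(["tool1", "tool2", "tool3", "sensitive-tool"], {
    include: ["tool1", "tool2"],
    exclude: ["sensitive-tool"],
  }),
);
```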

Part 10: Code Execution (Sandbox)

Agent TARS can execute code in isolated sandboxed environments using the @agent-infra/sandbox package.

Sandbox Capabilities

| Feature | Description |
|---|---|
| Bash Commands | Execute shell commands securely |
| Jupyter Notebooks | Run Python code interactively |
| File Editing | Create and modify files |
| Environment Isolation | Each session is isolated |

Enabling Sandbox

export default defineConfig({
  // ... other config
  sandbox: {
    enabled: true,
    url: 'http://localhost:8081', // AIO Sandbox URL
    // Or use Docker-based sandbox
    docker: {
      image: 'agent-infra/sandbox:latest',
      autoStart: true
    }
  }
});

Starting AIO Sandbox

The AIO (All-in-One) Sandbox provides an isolated execution environment:

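# Using Docker (the image listens on port 8080 inside the container;
# map it to host port 8081 to match the config above)
docker run -d -p 8081:8080 ghcr.io/agent-infra/sandbox:latest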

# Or using npm
npx @agent-infra/sandbox
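Because the sandbox can take a few seconds to start, a launcher script may want to wait until it answers before starting the agent. The following is a hedged sketch using the `fetch` built into Node.js 22. It only assumes that the sandbox responds to an HTTP GET on the URL from the config above; `waitForService` and `httpProbe` are hypothetical helpers, and the probe is injectable so the retry logic can be tested without a running sandbox.

```typescript
type Probe = () => Promise<boolean>;

// Default probe: any HTTP response (even a 404) means the server is listening.
function httpProbe(url: string): Probe {
  return async () => {
    try {
      await fetch(url);
      return true;
    } catch {
      return false; // connection refused, DNS failure, etc.
    }
  };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries the probe with exponential backoff; resolves to false after `attempts` failures.
async function waitForService(probe: Probe, attempts = 5, initialDelayMs = 500): Promise<boolean> {
  let delay = initialDelayMs;
  for (let i = 0; i < attempts; i++) {
    if (await probe()) return true;
    if (i < attempts - 1) {
      await sleep(delay);
      delay *= 2;
    }
  }
  return false;
}
```

For example, a launcher could call `await waitForService(httpProbe("http://localhost:8081"))` and only start `agent-tars` once it resolves to `true`.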

Part 11: Running as a Server

Agent TARS can run as a persistent server for multi-session management.

Server Mode

agent-tars server --port 3000

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/v1/sessions | POST | Create new session |
| /api/v1/sessions/:id | GET | Get session status |
| /api/v1/sessions/:id/messages | POST | Send message to session |
| /api/v1/sessions/:id/events | GET (SSE) | Stream session events |
| /api/v1/sessions/:id | DELETE | End session |

Example API Usage

// Create a new session
const response = await fetch('http://localhost:3000/api/v1/sessions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: {
      provider: 'anthropic',
      id: 'claude-3-7-sonnet-latest'
    }
  })
});

const { sessionId } = await response.json();

// Send a message
await fetch(`http://localhost:3000/api/v1/sessions/${sessionId}/messages`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    content: 'Search for the weather in New York'
  })
});
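The `/events` endpoint is listed as Server-Sent Events, so a client has to split the stream into individual events. The parser below is a sketch of the standard SSE text format (`event:` and `data:` fields, blank-line separators, `:` comment lines), not an Agent TARS client. The structure of each event's payload is not documented here, so `data` is left as a raw string.

```typescript
interface SseEvent {
  event: string; // "message" when no event field is present
  data: string;
}

// Parses a complete chunk of SSE text into events.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let eventName = "message";
    const data: string[] = [];
    for (const rawLine of block.split(/\r?\n/)) {
      if (rawLine === "" || rawLine.startsWith(":")) continue; // skip comments
      const colon = rawLine.indexOf(":");
      const field = colon === -1 ? rawLine : rawLine.slice(0, colon);
      let value = colon === -1 ? "" : rawLine.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one leading space is optional
      if (field === "event") eventName = value;
      else if (field === "data") data.push(value);
    }
    // Blocks without data (e.g. keep-alive comments) are not dispatched.
    if (data.length > 0) events.push({ event: eventName, data: data.join("\n") });
  }
  return events;
}
```

With `fetch`, you would read `response.body` incrementally, buffer text until a blank line arrives, and pass each complete block to `parseSse`.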

Part 12: Headless Mode

Run Agent TARS without a visible browser for automation scripts:

agent-tars --provider anthropic \
           --model claude-3-7-sonnet-latest \
           --headless

Use Cases for Headless Mode

  • CI/CD Pipelines: Automated testing
  • Scheduled Tasks: Cron jobs for web scraping
  • Server Environments: Running on headless servers
  • Batch Processing: Processing multiple tasks sequentially

Part 13: Storage and Persistence

Agent TARS supports multiple storage backends for session persistence.

SQLite (Default)

export default defineConfig({
  server: {
    storage: {
      type: 'sqlite',
      uri: './sessions.db'
    }
  }
});

MongoDB

export default defineConfig({
  server: {
    storage: {
      type: 'mongodb',
      uri: 'mongodb://localhost:27017/agent-tars'
    }
  }
});

File-based (JSON)

export default defineConfig({
  server: {
    storage: {
      type: 'file',
      path: './sessions-data'
    }
  }
});
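The three storage examples use slightly different keys (`uri` for SQLite and MongoDB, `path` for file storage). A discriminated union captures that shape and lets the TypeScript compiler catch a mismatched key in your config. These types are written for illustration from the examples in this section; they are not exported by `@agent-tars/cli`.

```typescript
// Hypothetical types mirroring the three storage examples above.
type StorageConfig =
  | { type: "sqlite"; uri: string }
  | { type: "mongodb"; uri: string }
  | { type: "file"; path: string };

// Describes where sessions will be stored; the switch covers every variant.
function describeStorage(storage: StorageConfig): string {
  switch (storage.type) {
    case "sqlite":
      return `SQLite database at ${storage.uri}`;
    case "mongodb":
      return `MongoDB at ${storage.uri}`;
    case "file":
      return `JSON files under ${storage.path}`;
    default: {
      // Compile-time exhaustiveness check: fails to compile if a variant is missed.
      const unreachable: never = storage;
      throw new Error(`Unknown storage config: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(describeStorage({ type: "sqlite", uri: "./sessions.db" }));
```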

Part 14: Troubleshooting

Common Issues and Solutions

Issue: "Node.js version too old"

Error: Agent TARS requires Node.js >= 22

Solution: Update Node.js to version 22 or higher (see Part 2).


Issue: "Chrome not found"

Error: Could not find Chrome installation

Solution: Install Google Chrome or specify the path:

agent-tars --browser-executable-path /path/to/chrome
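If you are unsure which path to pass, the usual default install locations are worth checking first. This sketch lists common locations for a standard Google Chrome install (custom installs, Chromium, or Snap packages live elsewhere) and prints the first one that exists. `chromeCandidates` and `findChrome` are hypothetical helpers, not part of Agent TARS.

```typescript
import { existsSync } from "node:fs";
import { win32 } from "node:path";

// Default install locations for a standard Google Chrome install (not exhaustive).
function chromeCandidates(
  platform: string,
  env: Record<string, string | undefined> = process.env,
): string[] {
  switch (platform) {
    case "darwin":
      return ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"];
    case "win32":
      // System-wide and per-user install roots.
      return [env["ProgramFiles"], env["ProgramFiles(x86)"], env["LOCALAPPDATA"]]
        .filter((root): root is string => Boolean(root))
        .map((root) => win32.join(root, "Google", "Chrome", "Application", "chrome.exe"));
    default: // Linux and other Unix-like systems
      return ["/usr/bin/google-chrome", "/usr/bin/google-chrome-stable"];
  }
}

// Returns the first candidate that exists on this machine, if any.
function findChrome(): string | undefined {
  return chromeCandidates(process.platform).find((candidate) => existsSync(candidate));
}

console.log(findChrome() ?? "Chrome not found in the default locations");
```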

Issue: "API key invalid"

Error: Invalid API key for provider

Solution:

  1. Verify your API key is correct
  2. Check that the key has necessary permissions
  3. Ensure the key is not expired

Issue: "WebSocket connection failed"

Error: Failed to connect to browser websocket

Solution:

  1. Ensure Chrome isn't running with conflicting flags
  2. Close any existing Chrome processes
  3. Try running with --no-sandbox flag (Linux):
agent-tars --browser-args="--no-sandbox"

Issue: Permission denied (Linux)

Error: EACCES: permission denied

Solution: Fix npm global installation permissions:

# Create npm global directory in home
mkdir ~/.npm-global
npm config set prefix '~/.npm-global'

# Add to PATH
echo 'export PATH=~/.npm-global/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

Part 15: Best Practices

1. Security Considerations

  • Never share API keys in public repositories
  • Use environment variables for sensitive data
  • Limit sandbox permissions to required directories
  • Monitor usage to avoid unexpected API costs

2. Performance Optimization

  • Use headless mode for automated tasks
  • Enable DOM-only mode for simple, structured websites
  • Configure appropriate timeouts for slow networks
  • Use session persistence for long-running tasks

3. Cost Management

  • Choose smaller models for simple tasks
  • Use Claude Haiku or GPT-4o-mini for cost-sensitive operations
  • Implement rate limiting in production
  • Monitor token usage regularly
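Monitoring token usage is easier with a quick cost estimate: divide each token count by one million, multiply by the per-million-token price, and add the input and output parts. The prices below are placeholders, not real rates; substitute your provider's current pricing. `estimateCost` is a hypothetical helper.

```typescript
interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens (placeholder)
  outputPerMillion: number; // USD per 1M output tokens (placeholder)
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Estimated cost in USD for one run.
function estimateCost(usage: Usage, price: Pricing): number {
  return (
    (usage.inputTokens / 1_000_000) * price.inputPerMillion +
    (usage.outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// Placeholder prices and usage, for illustration only.
const examplePricing: Pricing = { inputPerMillion: 3, outputPerMillion: 15 };
const exampleRun: Usage = { inputTokens: 250_000, outputTokens: 20_000 };
console.log(`Estimated cost: $${estimateCost(exampleRun, examplePricing).toFixed(2)}`);
```

Screenshot-heavy browser tasks tend to consume many input tokens, so the input side usually dominates this estimate for vision models.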

Part 16: Advanced Use Cases

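Note: The --task and --output-* flags used in these examples are not among the options listed in Part 17. Run agent-tars --help to confirm that your installed version supports them.

Automated Testing

# Run test suite with Agent TARS
agent-tars --headless \
           --task "Navigate to example.com, click login, verify dashboard loads" \
           --output-format json \
           --output-file test-results.json

Web Scraping

# Extract data from websites
agent-tars --task "Go to news.ycombinator.com, extract top 10 headlines with links" \
           --output-format csv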

Form Automation

# Fill complex forms
agent-tars --task "Fill out the job application form with: Name=John Doe, Email=john@example.com, Resume=upload resume.pdf"

Part 17: CLI Reference

Full Command Reference

agent-tars [options]

Options:
  -V, --version                       Output version number
  -p, --provider <provider>           Model provider
  -m, --model <model>                 Model name
  -k, --apiKey <key>                  API key
  -b, --baseURL <url>                 Custom API base URL
  --headless                          Run browser in headless mode
  --port <port>                       Web UI port
  --config <path>                     Configuration file path
  --browser-strategy <strategy>       Browser control strategy
  --browser-executable-path <path>    Chrome executable path
  --browser-args <args>               Additional browser arguments
  --sandbox-url <url>                 Sandbox service URL
  --storage-type <type>               Storage backend type
  --storage-uri <uri>                 Storage connection URI
  -h, --help                          Display help

Conclusion

Agent TARS brings Vision-Language Models to browser automation. By combining VLMs with traditional web automation techniques, it offers a flexible option for complex, multi-step web tasks.

Key Takeaways

  1. Easy Installation: Single npm command to get started
  2. Flexible Configuration: Multiple model providers and configuration options
  3. Extensible: MCP integration allows connecting to any external tool
  4. Production Ready: Server mode with session persistence


Appendix A: Comparison of Model Providers

| Provider | Best Model for Agent TARS | Strengths | Limitations |
|---|---|---|---|
| Anthropic | claude-3-7-sonnet-latest | Best vision understanding, reliable | Usage limits on free tier |
| OpenAI | gpt-4o | Fast, good general performance | Higher API costs |
| Google | gemini-2.5-pro-preview | Long context, free tier | Beta features may change |
| VolcEngine | doubao-1-5-thinking-vision-pro | Optimized for Agent TARS | Requires Chinese API access |
| Azure OpenAI | gpt-4o (deployed) | Enterprise compliance | Complex setup |

Appendix B: Changelog Highlights

  • v0.3.0 (Nov 2025): Streaming support, Event Stream Viewer, AIO Sandbox integration
  • Beta Release (Jun 2025): Initial public release with CLI and Web UI
  • Ongoing: Active development with regular updates

Last updated: January 2026
