
UI-TARS Desktop Complete Setup Guide: Native GUI Agent for Computer Control

Published on January 12, 2026


Introduction

UI-TARS Desktop is an open-source native desktop application developed by ByteDance that enables you to control your computer using natural language commands. Built on Electron and powered by the UI-TARS Vision-Language Model, it can see your screen, understand UI elements, and perform mouse and keyboard actions autonomously.

This comprehensive guide covers everything from downloading the application to deploying your own local model, including detailed VRAM requirements and hardware recommendations.

What is UI-TARS Desktop?

UI-TARS Desktop is a GUI (Graphical User Interface) agent that:

  • Sees Your Screen: Takes screenshots and analyzes visual content
  • Understands Context: Uses Vision-Language Models to comprehend UI elements
  • Executes Actions: Performs precise mouse clicks, keyboard input, and navigation
  • Works Locally: Supports fully local operation for privacy-sensitive environments
  • Offers Remote Options: Free trial remote operators for quick testing

Key Features

| Feature | Description |
| --- | --- |
| Natural Language Control | Describe tasks in plain English |
| Screenshot Recognition | AI-powered visual understanding |
| Precise Control | Mouse and keyboard automation |
| Cross-Platform | Windows, macOS, and Linux support |
| Real-time Feedback | Live status and action display |
| Private & Secure | Option for fully local processing |
| Multiple Operators | Local computer, browser, and remote options |

UI-TARS Desktop vs Agent TARS

| Feature | UI-TARS Desktop | Agent TARS |
| --- | --- | --- |
| Interface | Native desktop app (Electron) | CLI + Web UI |
| Primary Use | Local computer GUI control | Browser automation, code execution |
| Model Backend | UI-TARS VLM (local or cloud) | Cloud APIs (OpenAI, Claude, etc.) |
| Best For | Desktop automation, privacy-focused | Web tasks, scripting |
| Installation | Download installer | npm install |

📝 Note: If you need browser-focused automation with MCP tools and code execution, see our separate Agent TARS Complete Guide.


Part 1: System Requirements

Hardware Requirements

UI-TARS Desktop has different requirements depending on whether you run models locally or use cloud/remote services.

For Local Model Deployment (Running UI-TARS model on your machine)

| Component | UI-TARS-7B (Minimum) | UI-TARS-7B (Recommended) | UI-TARS-72B |
| --- | --- | --- | --- |
| GPU VRAM | 16 GB (FP16) | 24 GB | 80+ GB (multi-GPU required) |
| System RAM | 16 GB | 32 GB | 64+ GB |
| CPU | 8 cores | 12+ cores | 16+ cores |
| Storage | 40 GB SSD (~14 GB model) | 100 GB NVMe SSD | 200+ GB NVMe (~144 GB model) |

For Cloud/Remote Model Usage (Using API or remote operators)

| Component | Minimum | Recommended |
| --- | --- | --- |
| System RAM | 4 GB | 8 GB |
| CPU | 2 cores | 4 cores |
| Storage | 500 MB | 1 GB |
| Network | Stable internet | Low-latency connection |

VRAM Requirements by Quantization

The UI-TARS-7B model can run with different quantization levels:

| Quantization | VRAM Required | Model Size | Quality Impact |
| --- | --- | --- | --- |
| FP32 (Full) | ~28 GB | ~33 GB | Best quality |
| FP16 | ~14-16 GB | ~15.2 GB | Near-best quality |
| Q8_0 | ~8 GB | ~8.1 GB | Good quality |
| Q6_K | ~6.5 GB | ~6.25 GB | Good quality |
| Q4_K_S | ~4.5 GB | ~4.46 GB | Moderate quality |
| Q2_K | ~3.5 GB | ~3.02 GB | Lower quality |

💡 Recommendation: For best results, use FP16 with a 24GB GPU (RTX 3090, RTX 4090, A5000). For 8-12GB GPUs (RTX 3080/4070), use Q8_0 or Q4_K quantization.
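
If you are unsure which row applies to your machine, you can query your GPU's name plus total and free VRAM from the command line before downloading anything:

# Print GPU name, total VRAM, and currently free VRAM (requires NVIDIA drivers)
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv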

Compatible GPUs

| GPU | VRAM | Recommended Quantization |
| --- | --- | --- |
| RTX 4090 | 24 GB | FP16 (best experience) |
| RTX 3090 | 24 GB | FP16 (best experience) |
| RTX 4080 | 16 GB | FP16 (tight fit) |
| RTX 3080 | 10-12 GB | Q8_0 or Q4_K |
| RTX 4070 | 12 GB | Q8_0 |
| RTX 3070 | 8 GB | Q4_K_S |
| RTX 4060 | 8 GB | Q4_K_S |

Software Requirements

| Requirement | Windows | macOS | Linux |
| --- | --- | --- | --- |
| Operating System | Windows 10/11 (64-bit) | macOS 12+ (Monterey or later) | 64-bit distro with X11 (AppImage or .deb) |
| Permissions | Administrator access | Accessibility + Screen Recording | X11 display access |

Part 2: Installing UI-TARS Desktop

Method 1: Pre-built Installer (Recommended)

The easiest way to get UI-TARS Desktop is to download the pre-built installer for your platform.

Windows Installation

  1. Download the installer:

    • Visit GitHub Releases
    • Download the latest .exe file (e.g., UI-TARS-Desktop-X.X.X-Setup.exe)
  2. Run the installer:

    • Double-click the downloaded .exe file
    • If Windows Defender shows a warning, click “More info” → “Run anyway”
    • Follow the installation wizard
  3. Launch the application:

    • Find “UI-TARS” in your Start Menu
    • Or launch from the desktop shortcut
  4. Allow through firewall (if prompted):

    • Click “Allow access” for private networks

Linux Installation

UI-TARS Desktop provides AppImage and .deb packages for Linux distributions.

Option A: AppImage (Universal)
# Download the latest AppImage from GitHub releases
wget https://github.com/bytedance/UI-TARS-desktop/releases/latest/download/UI-TARS-Desktop-x.x.x.AppImage

# Make it executable
chmod +x UI-TARS-Desktop-*.AppImage

# Run the application
./UI-TARS-Desktop-*.AppImage

What this does: AppImage is a portable format that runs on most Linux distributions without installation.

Option B: Debian/Ubuntu (.deb)
# Download the .deb package
wget https://github.com/bytedance/UI-TARS-desktop/releases/latest/download/ui-tars-desktop_x.x.x_amd64.deb

# Install with dpkg
sudo dpkg -i ui-tars-desktop_*.deb

# Fix any dependency issues
sudo apt-get install -f

# Launch from applications menu or run:
ui-tars-desktop

What this does: Installs UI-TARS Desktop system-wide via the Debian package manager.


macOS Installation

Option A: Direct Download
  1. Download the installer:

    • Visit GitHub Releases
    • Download the .dmg file for your chip:
      • Apple Silicon (M1/M2/M3/M4): Download the arm64 version
      • Intel Macs: Download the x64 version
  2. Install the application:

    • Open the downloaded .dmg file
    • Drag “UI-TARS” to your Applications folder
  3. Grant required permissions: This is critical for UI-TARS to control your computer.

    Accessibility Permission:

    • Open System Settings → Privacy & Security → Accessibility
    • Click the lock icon to make changes
    • Click + and add UI-TARS
    • Toggle UI-TARS ON

    Screen Recording Permission:

    • Open System Settings → Privacy & Security → Screen Recording
    • Click the lock icon to make changes
    • Click + and add UI-TARS
    • Toggle UI-TARS ON
  4. Launch the application:

    • Double-click UI-TARS in Applications
    • If you see “UI-TARS can’t be opened because Apple cannot check it for malicious software”:
      • Go to System Settings → Privacy & Security
      • Scroll down and click “Open Anyway”
Option B: Homebrew Installation
# Install via Homebrew Cask
brew install --cask ui-tars

# Update to latest version
brew upgrade --cask ui-tars

What this does: Downloads and installs UI-TARS Desktop from the Homebrew Cask repository.

⚠️ Important: After Homebrew installation, you still need to grant Accessibility and Screen Recording permissions manually.


Method 2: Build from Source (For Developers)

If you want the latest development version or plan to contribute:

Prerequisites

Install these tools first:

# Install Node.js 20+ (required)
# Using nvm:
nvm install 20
nvm use 20

# Install pnpm package manager
npm install -g pnpm

Clone and Build

# Clone the repository
git clone https://github.com/bytedance/UI-TARS-desktop.git
cd UI-TARS-desktop

# Install dependencies
pnpm install

# Build the UI-TARS Desktop app
pnpm build:ui-tars

# Run in development mode
pnpm dev:ui-tars

# Create distributable package
pnpm package

Build commands explained:

  • pnpm install: Downloads all required packages
  • pnpm build:ui-tars: Compiles the application
  • pnpm dev:ui-tars: Runs with hot-reload for development
  • pnpm package: Creates installer for your platform
  • pnpm make: Creates platform-specific installers (DMG, EXE, AppImage)

Platform notes:

  • macOS: May need to edit plist for permissions configuration
  • Windows: May require UAC (User Account Control) for certain operations
  • Linux: Ensure required X11 libraries are installed
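
For the Linux note above, the exact packages vary by distribution. On Debian/Ubuntu, something like the following usually covers the X11 automation libraries that nut.js-based builds expect (the package list is an assumption; check your distribution's documentation):

# Debian/Ubuntu: X11 development libraries commonly needed for GUI automation builds
# (package names assumed; adjust for your distribution)
sudo apt-get install -y libx11-dev libxtst-dev libpng-dev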

Part 3: Understanding Operator Modes

UI-TARS Desktop supports four operator modes:

1. Local Computer Operator

Controls your local computer’s desktop:

  • Uses NutJS for mouse/keyboard automation
  • Takes screenshots of your screen
  • Requires local VLM model or cloud API

Best for: Desktop automation, application control, file management

2. Local Browser Operator

Controls a local Chromium browser:

  • Uses Puppeteer for browser automation
  • Isolated browser instance
  • Requires local VLM model or cloud API

Best for: Web automation with local processing

3. Remote Computer Operator (Free Trial)

Control a cloud-hosted virtual computer:

  • No local model needed
  • 30-minute free trial sessions
  • VNC streaming preview

Best for: Testing UI-TARS without local setup

4. Remote Browser Operator (Free Trial)

Control a cloud-hosted browser:

  • No local model needed
  • 30-minute free trial sessions
  • Browser automation in the cloud

Best for: Quick testing and evaluation


Part 4: VLM Provider Configuration

To use Local Computer or Local Browser operators, you need a Vision-Language Model backend.

Option A: Using HuggingFace Inference Endpoints (Cloud)

HuggingFace offers hosted endpoints for UI-TARS models.

Step 1: Create HuggingFace Account

  1. Go to huggingface.co
  2. Create an account or sign in
  3. Navigate to your Settings → Access Tokens
  4. Create a new token with read access

Step 2: Deploy UI-TARS Endpoint

  1. Go to HuggingFace Inference Endpoints
  2. Click “Deploy new model”
  3. Search for ByteDance-Seed/UI-TARS-1.5-7B
  4. Select a GPU instance:
    • A100 (80GB): Best performance
    • A10G (24GB): Good balance
    • T4 (16GB): Budget option with FP16
  5. Deploy and wait for the endpoint to be ready (5-10 minutes)
  6. Copy your endpoint URL (format: https://<your-endpoint>.endpoints.huggingface.cloud/v1/)

Step 3: Configure in UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click the Settings icon (gear icon)
  3. Select HuggingFace as provider
  4. Enter:
    • Base URL: Your HuggingFace endpoint URL
    • API Key: Your HuggingFace access token
    • Model: UI-TARS-1.5-7B
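
Before entering these values, it helps to smoke-test the endpoint directly. A minimal check with curl, substituting your endpoint URL and token (the model field may need to match whatever name your endpoint registers, which varies by deployment):

# Replace the URL and token with your own endpoint values
curl https://<your-endpoint>.endpoints.huggingface.cloud/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_HF_TOKEN" \
    -d '{"model":"UI-TARS-1.5-7B","messages":[{"role":"user","content":"Hello"}],"max_tokens":32}'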

Option B: Using VolcEngine (ByteDance Cloud)

VolcEngine is ByteDance’s cloud platform with optimized UI-TARS support.

Step 1: Create VolcEngine Account

  1. Visit VolcEngine Console
  2. Sign up for an account
  3. Navigate to the AI/ML services section
  4. Subscribe to the Doubao model service

Step 2: Get API Credentials

  1. Go to API Key management
  2. Create a new API key
  3. Note down your:
    • API Key
    • Endpoint URL (usually https://ark.cn-beijing.volces.com/api/v3)

Step 3: Configure in UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click the Settings icon
  3. Select VolcEngine as provider
  4. Enter:
    • Base URL: https://ark.cn-beijing.volces.com/api/v3
    • API Key: Your VolcEngine API key
    • Model: doubao-1.5-ui-tars-250328
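
As with HuggingFace, you can verify the credentials with a direct request before configuring the app (URL and model name taken from the values above):

curl https://ark.cn-beijing.volces.com/api/v3/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_VOLCENGINE_API_KEY" \
    -d '{"model":"doubao-1.5-ui-tars-250328","messages":[{"role":"user","content":"Hello"}]}'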

Option C: Using OpenAI-Compatible Endpoints

Any OpenAI-compatible API can be used (Ollama, vLLM, LocalAI, etc.).

Configuration in UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click the Settings icon
  3. Select Custom or OpenAI-Compatible as provider
  4. Enter:
    • Base URL: Your local endpoint (e.g., http://localhost:8000/v1)
    • API Key: Set any value if not required (some servers need a placeholder)
    • Model: Model name as registered in your backend

Part 5: Local Model Deployment with vLLM (Docker)

For privacy or offline use, deploy UI-TARS locally using vLLM.

Prerequisites

  • NVIDIA GPU with 16+ GB VRAM
  • NVIDIA Docker runtime installed
  • 50+ GB free disk space
  • Docker installed

Option A: Direct Python vLLM Installation (Non-Docker)

For users who prefer not to use Docker:

# Install Python 3.10+ and pip
sudo apt-get install python3 python3-pip

# Install vLLM
pip install vllm

# Download model from HuggingFace (requires huggingface-cli)
pip install huggingface_hub
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B

# Start the OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --port 8000

What this does: Runs vLLM directly on your system without Docker containerization. Uses 90% of available GPU memory for optimal throughput.

Option B: Docker-based vLLM Installation

Step 1: Install NVIDIA Container Toolkit

Linux (Ubuntu/Debian):

# Add the NVIDIA Container Toolkit repository (the older nvidia-docker
# repository is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify: should print your GPU details from inside a container
docker run --rm --gpus all ubuntu nvidia-smi

What this does: Installs the NVIDIA Container Toolkit which allows Docker containers to access your GPU.

Windows (WSL2):

# Ensure latest NVIDIA drivers are installed
# Install WSL2 with Ubuntu
wsl --install -d Ubuntu

# Inside WSL2, the NVIDIA Container Toolkit should already be available
# Verify with:
docker run --rm --gpus all ubuntu nvidia-smi

Step 2: Run vLLM Server with UI-TARS

# Create a directory for model cache
mkdir -p ~/.cache/huggingface

# Run vLLM with UI-TARS-1.5-7B
docker run -d \
    --name ui-tars-vllm \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --max-model-len 32768 \
    --dtype auto \
    --api-key "local-api-key"

Command breakdown:

  • --gpus all: Use all available GPUs
  • -v ~/.cache/huggingface:/root/.cache/huggingface: Cache downloaded models
  • -p 8000:8000: Expose API on port 8000
  • --model: The UI-TARS model from HuggingFace
  • --trust-remote-code: Required for UI-TARS model
  • --max-model-len: Maximum context length
  • --dtype auto: Automatic precision selection
  • --api-key: Set a local API key for security

For 8GB GPUs (Using Quantization):

# Note: --quantization awq expects an AWQ-quantized checkpoint;
# substitute a community AWQ build of the model if one is available
docker run -d \
    --name ui-tars-vllm \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --quantization awq \
    --max-model-len 16384 \
    --api-key "local-api-key"

What this does: Uses AWQ 4-bit quantization to reduce VRAM usage to approximately 4-5 GB while maintaining reasonable quality. Note that the --quantization awq flag requires an AWQ-quantized checkpoint, so point --model at an AWQ build of the model if one is available.


Step 3: Verify vLLM Server

# Check if server is running
curl http://localhost:8000/v1/models

# Expected output:
# {"data":[{"id":"ByteDance-Seed/UI-TARS-1.5-7B","object":"model"...}]}

Step 4: Configure UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click Settings
  3. Select Custom/OpenAI-Compatible
  4. Configure:
    • Base URL: http://localhost:8000/v1
    • API Key: local-api-key (or what you set)
    • Model: ByteDance-Seed/UI-TARS-1.5-7B
  5. Click Save

Part 6: Local Model Deployment with Ollama

Ollama provides an easier way to run local models on your machine.

Step 1: Install Ollama

Linux:

curl -fsSL https://ollama.ai/install.sh | sh

macOS:

# Using Homebrew
brew install ollama

# Or download from ollama.ai

Windows:

Download and install from ollama.ai/download.

Step 2: Pull UI-TARS Model

As of early 2026, UI-TARS models may not be directly available in Ollama’s registry. You can use GGUF versions:

# Check available models (UI-TARS may need to be imported)
ollama list

# If a community GGUF build is available, pull it directly from Hugging Face:
ollama pull hf.co/mradermacher/UI-TARS-1.5-7B-GGUF:Q8_0

# Or import a GGUF file manually
ollama create ui-tars -f ./Modelfile

Modelfile example:

FROM ./UI-TARS-1.5-7B-Q8_0.gguf

TEMPLATE """{{ .System }}
{{ .Prompt }}"""

PARAMETER temperature 0.1
PARAMETER num_ctx 32768
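
After saving the Modelfile next to your downloaded GGUF file, build the model and sanity-check that it loads (the name ui-tars matches the create command above):

# Build the Ollama model from the Modelfile, then run a quick prompt to verify it loads
ollama create ui-tars -f ./Modelfile
ollama run ui-tars "Reply with OK if you can read this."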

Step 3: Run Ollama Server

# Start Ollama server (usually runs automatically)
ollama serve

# The API will be available at http://localhost:11434
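
Ollama also exposes an OpenAI-compatible API under /v1, which is what UI-TARS Desktop will use. A quick request confirms the model responds:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"ui-tars","messages":[{"role":"user","content":"Hello"}]}'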

Step 4: Configure UI-TARS Desktop for Ollama

  1. Open UI-TARS Desktop
  2. Click Settings
  3. Select Custom/OpenAI-Compatible
  4. Configure:
    • Base URL: http://localhost:11434/v1
    • API Key: ollama (placeholder)
    • Model: ui-tars (or your model name)
  5. Click Save

Part 7: Using Remote Operators (Free Trial)

The quickest way to try UI-TARS Desktop is using the free remote operators.

Remote Computer Operator

  1. Open UI-TARS Desktop
  2. Select Remote Computer from the operator dropdown
  3. Click Start Free Trial
  4. Wait for resource allocation (1-2 minutes)
  5. A VNC preview window will show the remote desktop
  6. Enter your task in the chat input
  7. Watch the AI control the remote computer

Free trial limits:

  • 30 minutes per session
  • Daily usage limits may apply
  • Queue during high demand

Remote Browser Operator

  1. Open UI-TARS Desktop
  2. Select Remote Browser from the operator dropdown
  3. Click Start Free Trial
  4. Wait for browser allocation
  5. Enter your browsing task
  6. Watch the AI control the remote browser

Part 8: Using UI-TARS Desktop

First Run Setup

  1. Launch UI-TARS Desktop

  2. Configure VLM Settings:

    • Click the Settings gear icon
    • Choose your provider (HuggingFace, VolcEngine, or Custom)
    • Enter your API credentials
    • Click Verify to test the connection
    • Click Save
  3. Grant Permissions (macOS only):

    • If prompted, grant Accessibility permissions
    • Grant Screen Recording permissions
    • Restart the application

Performing Tasks

  1. Select an Operator:

    • Local Computer (requires VLM setup)
    • Local Browser (requires VLM setup)
    • Remote Computer (free trial)
    • Remote Browser (free trial)
  2. Enter Your Task: Type a natural language description:

    "Open Notepad and type 'Hello World', then save the file as test.txt on the desktop"
  3. Click Send or press Enter

  4. Watch the Execution:

    • UI-TARS takes a screenshot
    • Analyzes the screen content
    • Plans the action sequence
    • Executes mouse/keyboard actions
    • Reports progress in real-time

Example Tasks

| Task Type | Example Prompt |
| --- | --- |
| App Control | “Open Calculator and calculate 15 × 23” |
| File Operations | “Create a new folder called ‘Projects’ on the desktop” |
| Web Browsing | “Open Chrome and search for ‘weather in Tokyo’” |
| Multi-step | “Open Word, create a new document, add a heading ‘Meeting Notes’, and save it” |
| System Settings | “Open Display Settings and change the resolution to 1920x1080” |

Stopping Execution

  • Click the Stop button to halt the current task
  • Press Escape key
  • The agent will safely stop after completing the current action

Part 9: Troubleshooting

Common Issues

Issue: “Screen Recording permission not granted” (macOS)

Symptoms: UI-TARS can’t take screenshots.

Solution:

  1. Open System Settings → Privacy & Security → Screen Recording
  2. Ensure UI-TARS is listed and toggled ON
  3. If already enabled, remove and re-add UI-TARS
  4. Restart the application

Issue: “Accessibility permission not granted” (macOS)

Symptoms: UI-TARS can’t click or type.

Solution:

  1. Open System Settings → Privacy & Security → Accessibility
  2. Ensure UI-TARS is listed and toggled ON
  3. If already enabled, remove and re-add UI-TARS
  4. Restart the application

Issue: “VLM connection failed”

Symptoms: Cannot connect to model endpoint.

Solution:

  1. Verify your API endpoint URL is correct
  2. Check your API key is valid and not expired
  3. Ensure the model server is running (for local setups)
  4. Test with curl:
    curl -X POST http://your-endpoint/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -d '{"model":"UI-TARS-1.5-7B","messages":[{"role":"user","content":"Hello"}]}'

Issue: “Out of VRAM” (Local deployment)

Symptoms: vLLM or Ollama crashes with memory errors.

Solution:

  1. Use a quantized model (Q8_0 or Q4_K)
  2. Reduce --max-model-len to 16384 or 8192
  3. Close other GPU-intensive applications
  4. Check GPU memory with nvidia-smi

Issue: “App won’t open” (Windows)

Symptoms: Application doesn’t start or shows error.

Solution:

  1. Run as Administrator
  2. Check Windows Event Viewer for errors
  3. Reinstall with latest version
  4. Disable antivirus temporarily to test

Issue: Remote operator shows “unavailable”

Symptoms: Free trial resources not available.

Solution:

  1. High demand—try again later
  2. Daily limit reached—wait 24 hours
  3. Check internet connection
  4. Try the other remote operator type

Performance Tips

  1. For local deployment:

    • Use NVMe SSD for model storage
    • Keep GPU drivers updated
    • Close unnecessary applications
    • Use FP16 if you have sufficient VRAM
  2. For cloud APIs:

    • Use low-latency internet connection
    • Choose geographically close endpoints
    • Consider higher-tier plans for better performance
  3. General tips:

    • Lower screen resolution for faster processing
    • Use simpler prompts for faster execution
    • Enable caching if available

Part 10: Advanced Configuration

Custom Model Endpoints

For enterprise deployments or custom models:

{
  "provider": "custom",
  "baseURL": "https://api.your-company.com/v1",
  "apiKey": "your-enterprise-key",
  "model": "ui-tars-enterprise-v2",
  "timeout": 60000,
  "maxTokens": 4096
}

Multiple Operator Profiles

You can configure different profiles for different use cases:

  1. Work Profile: Connect to company’s private VLM
  2. Personal Profile: Use HuggingFace endpoint
  3. Testing Profile: Use free remote operators

Switch profiles from the dropdown in Settings.

Screenshot Quality Settings

For faster processing on slower systems:

  1. Open Settings → Advanced
  2. Reduce Screenshot Quality (default: 90%)
  3. Reduce Screenshot Resolution Scale (default: 100%)

Part 11: Security Considerations

API Key Security

  • Never share your API keys
  • Use environment variables where possible
  • Rotate keys periodically
  • Monitor usage for unexpected activity
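
One way to follow the environment-variable advice with the local vLLM server from Part 5 (the file path and variable name here are illustrative, not a UI-TARS convention):

# Store the key once, readable only by your user
echo 'export UI_TARS_API_KEY="change-me"' > ~/.ui-tars.env
chmod 600 ~/.ui-tars.env

# Load it and pass it to the model server instead of hard-coding the key
source ~/.ui-tars.env
docker run -d --name ui-tars-vllm --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --api-key "$UI_TARS_API_KEY"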

Local Processing Advantages

When running models locally:

  • No data leaves your machine
  • No cloud dependencies
  • Works offline
  • Complete privacy

Remote Operator Considerations

When using remote operators:

  • Sessions are temporary and isolated
  • Do not enter sensitive credentials
  • Use for testing only
  • Data may be processed on ByteDance servers

Part 12: Model Comparison

UI-TARS Model Versions

| Model | Parameters | Best Use Case | VRAM (FP16) |
| --- | --- | --- | --- |
| UI-TARS-1.5-7B | 7 billion | General use, consumer GPUs | ~14-16 GB |
| UI-TARS-1.6-7B | 7 billion | Latest improvements | ~14-16 GB |
| UI-TARS-72B | 72 billion | Complex tasks, enterprise | 80+ GB |
| UI-TARS-2B | 2 billion | Fast inference, limited accuracy | ~6 GB |

Recommended Model by Hardware

| Your Hardware | Recommended Model | Quantization |
| --- | --- | --- |
| RTX 4090 (24GB) | UI-TARS-1.5-7B | FP16 |
| RTX 3080 (10GB) | UI-TARS-1.5-7B | Q4_K_S |
| RTX 4060 (8GB) | UI-TARS-1.5-7B | Q4_K_S or Q2_K |
| Apple M2 Pro (32GB) | UI-TARS-1.5-7B | MLX 6-bit |
| No GPU | Use cloud API | N/A |

Part 13: Development and SDK

UI-TARS provides an SDK for developers to integrate GUI automation into their applications.

Python SDK

# Install Python SDK
pip install ui-tars

Usage example:

from ui_tars import UITARSAgent

# Initialize agent
agent = UITARSAgent(
    model_endpoint="http://localhost:8000/v1",
    api_key="your-api-key"
)

# Execute a task
result = agent.run("Open Notepad and type 'Hello World'")
print(result)

Node.js SDK

npm install @ui-tars/sdk

Usage example:

const { GUIAgent } = require('@ui-tars/sdk');

const agent = new GUIAgent({
  endpoint: 'http://localhost:8000/v1',
  apiKey: 'your-api-key'
});

// Top-level await is not available in CommonJS, so wrap the call
(async () => {
  await agent.run('Open Chrome and navigate to google.com');
})();

Part 14: Integration with Other Systems

MCP Server Mode

UI-TARS Desktop can run as an MCP (Model Context Protocol) server:

# Run UI-TARS Desktop as MCP server
ui-tars-desktop --mcp-server --port 8080

Other MCP clients can then connect and use UI-TARS capabilities.

API Mode

Run UI-TARS as a REST API server for integration:

ui-tars-desktop --api-server --port 3000

Endpoints:

  • POST /execute: Execute a task
  • GET /status: Get current status
  • POST /stop: Stop current execution
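
Assuming API mode works as listed, a script could drive it like this (the request body shape is an assumption; check the current documentation for the real schema):

# Hypothetical calls against the endpoints listed above
curl -X POST http://localhost:3000/execute \
    -H "Content-Type: application/json" \
    -d '{"task":"Open Calculator and compute 15 * 23"}'

curl http://localhost:3000/status
curl -X POST http://localhost:3000/stop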

Part 15: Maintenance and Updates

Updating UI-TARS Desktop

Direct Download:

  1. Download the latest release from GitHub
  2. Install over the existing installation
  3. Configuration is preserved

Homebrew (macOS):

brew upgrade --cask ui-tars

From Source:

cd UI-TARS-desktop
git pull origin main
pnpm install
pnpm build:ui-tars

Updating Local Models

vLLM:

# Stop the current container
docker stop ui-tars-vllm

# Pull latest model
docker run --rm -v ~/.cache/huggingface:/root/.cache/huggingface \
    --entrypoint python3 vllm/vllm-openai:latest \
    -c "from huggingface_hub import snapshot_download; snapshot_download('ByteDance-Seed/UI-TARS-1.5-7B')"

# Restart container
docker start ui-tars-vllm

Ollama:

# Models created from a local Modelfile are updated by re-running ollama create;
# pulled GGUF models can be re-pulled:
ollama pull hf.co/mradermacher/UI-TARS-1.5-7B-GGUF:Q8_0

Conclusion

UI-TARS Desktop represents a significant advancement in AI-powered computer automation. By combining Vision-Language Models with precise GUI control, it enables natural language interaction with your desktop environment.

Key Takeaways

  1. Multiple Options: Use local models for privacy or cloud APIs for convenience
  2. Hardware Matters: 16+ GB VRAM recommended for local FP16 inference
  3. Easy Testing: Free remote operators for quick evaluation
  4. Active Development: Regular updates with new features


Appendix A: Docker Build for UI-TARS Desktop (Advanced)

While not officially supported, you can build UI-TARS Desktop in a Docker container:

FROM node:22

# Clone repository
RUN git clone https://github.com/bytedance/UI-TARS-desktop.git /app

WORKDIR /app

# Install pnpm and dependencies
RUN npm install -g pnpm && pnpm install && pnpm package

# Packaged artifacts land in /app/out; for a quick GUI test, start dev mode
CMD ["pnpm", "dev:ui-tars"]

Build and run:

# Build the Docker image
docker build -t ui-tars-desktop .

# Run with privileged mode for GUI access
docker run -it --privileged \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    ui-tars-desktop

⚠️ Important: Running Docker with GUI requires X11 forwarding on Linux. macOS/Windows need additional tools like XQuartz or VcXsrv.
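
On Linux, you may also need to let local containers talk to your X server before launching (this relaxes X access control, so revoke it afterwards):

# Grant local Docker containers access to the X server
xhost +local:docker

# Revoke access when you are done
xhost -local:docker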


Appendix B: Useful Commands

Docker Management

# View vLLM logs
docker logs -f ui-tars-vllm

# Check GPU usage
nvidia-smi -l 1

# Stop vLLM
docker stop ui-tars-vllm

# Remove container
docker rm ui-tars-vllm

# View running containers
docker ps

Ollama Management

# List models
ollama list

# Model info
ollama show ui-tars

# View logs (Linux with systemd; on macOS see ~/.ollama/logs)
journalctl -u ollama -f

# Remove model
ollama rm ui-tars

Appendix C: Changelog Highlights

  • v0.2.0 (Jun 2025): Remote Computer and Browser operators
  • v0.1.0 (Apr 2025): Redesigned Agent UI, UI-TARS-1.5 support
  • v0.0.x (Early 2025): Initial release with local computer control

Last updated: January 2026
