
UI-TARS Desktop Complete Setup Guide: Native GUI Agent for Computer Control

Published on January 12, 2026


Introduction

UI-TARS Desktop is an open-source native desktop application developed by ByteDance that enables you to control your computer using natural language commands. Built on Electron and powered by the UI-TARS Vision-Language Model, it can see your screen, understand UI elements, and perform mouse and keyboard actions autonomously.

This comprehensive guide covers everything from downloading the application to deploying your own local model, including detailed VRAM requirements and hardware recommendations.

What is UI-TARS Desktop?

UI-TARS Desktop is a GUI (Graphical User Interface) agent that:

  • Sees Your Screen: Takes screenshots and analyzes visual content
  • Understands Context: Uses Vision-Language Models to comprehend UI elements
  • Executes Actions: Performs precise mouse clicks, keyboard input, and navigation
  • Works Locally: Supports fully local operation for privacy-sensitive environments
  • Offers Remote Options: Free trial remote operators for quick testing

Key Features

| Feature | Description |
| --- | --- |
| Natural Language Control | Describe tasks in plain English |
| Screenshot Recognition | AI-powered visual understanding |
| Precise Control | Mouse and keyboard automation |
| Cross-Platform | Windows, macOS, and Linux support |
| Real-time Feedback | Live status and action display |
| Private & Secure | Option for fully local processing |
| Multiple Operators | Local computer, browser, and remote options |

UI-TARS Desktop vs Agent TARS

| Feature | UI-TARS Desktop | Agent TARS |
| --- | --- | --- |
| Interface | Native desktop app (Electron) | CLI + Web UI |
| Primary Use | Local computer GUI control | Browser automation, code execution |
| Model Backend | UI-TARS VLM (local or cloud) | Cloud APIs (OpenAI, Claude, etc.) |
| Best For | Desktop automation, privacy-focused | Web tasks, scripting |
| Installation | Download installer | npm install |

📝 Note: If you need browser-focused automation with MCP tools and code execution, see our separate Agent TARS Complete Guide.


Part 1: System Requirements

Hardware Requirements

UI-TARS Desktop has different requirements depending on whether you run models locally or use cloud/remote services.

For Local Model Deployment (Running UI-TARS model on your machine)

| Component | UI-TARS-7B (Minimum) | UI-TARS-7B (Recommended) | UI-TARS-72B |
| --- | --- | --- | --- |
| GPU VRAM | 16 GB (FP16) | 24 GB | 80+ GB (multi-GPU required) |
| System RAM | 16 GB | 32 GB | 64+ GB |
| CPU | 8 cores | 12+ cores | 16+ cores |
| Storage | 40 GB SSD (~14 GB model) | 100 GB NVMe SSD | 200+ GB NVMe (~144 GB model) |

For Cloud/Remote Model Usage (Using API or remote operators)

| Component | Minimum | Recommended |
| --- | --- | --- |
| System RAM | 4 GB | 8 GB |
| CPU | 2 cores | 4 cores |
| Storage | 500 MB | 1 GB |
| Network | Stable internet | Low-latency connection |

VRAM Requirements by Quantization

The UI-TARS-7B model can run with different quantization levels:

| Quantization | VRAM Required | Model Size | Quality Impact |
| --- | --- | --- | --- |
| FP32 (Full) | ~28 GB | ~33 GB | Best quality |
| FP16 | ~14-16 GB | ~15.2 GB | Near-best quality |
| Q8_0 | ~8 GB | ~8.1 GB | Good quality |
| Q6_K | ~6.5 GB | ~6.25 GB | Good quality |
| Q4_K_S | ~4.5 GB | ~4.46 GB | Moderate quality |
| Q2_K | ~3.5 GB | ~3.02 GB | Lower quality |

💡 Recommendation: For best results, use FP16 with a 24GB GPU (RTX 3090, RTX 4090, A5000). For 8-12GB GPUs (RTX 3080/4070), use Q8_0 or Q4_K quantization.
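
If you are unsure which row applies to your machine, you can query your GPU's name plus total and free VRAM from the command line before downloading anything:

# Print GPU name, total VRAM, and currently free VRAM (requires NVIDIA drivers)
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv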

Compatible GPUs

| GPU | VRAM | Recommended Quantization |
| --- | --- | --- |
| RTX 4090 | 24 GB | FP16 (best experience) |
| RTX 3090 | 24 GB | FP16 (best experience) |
| RTX 4080 | 16 GB | FP16 (tight fit) |
| RTX 3080 | 10-12 GB | Q8_0 or Q4_K |
| RTX 4070 | 12 GB | Q8_0 |
| RTX 3070 | 8 GB | Q4_K_S |
| RTX 4060 | 8 GB | Q4_K_S |

Software Requirements

| Requirement | Windows | macOS | Linux |
| --- | --- | --- | --- |
| Operating System | Windows 10/11 (64-bit) | macOS 12+ (Monterey or later) | 64-bit distro with X11 (AppImage or .deb) |
| Permissions | Administrator access | Accessibility + Screen Recording | X11 display access |

Part 2: Installing UI-TARS Desktop

Method 1: Pre-built Installer (Recommended)

The easiest way to get UI-TARS Desktop is to download the pre-built installer for your platform.

Windows Installation

  1. Download the installer:

    • Visit GitHub Releases
    • Download the latest .exe file (e.g., UI-TARS-Desktop-X.X.X-Setup.exe)
  2. Run the installer:

    • Double-click the downloaded .exe file
    • If Windows Defender shows a warning, click “More info” → “Run anyway”
    • Follow the installation wizard
  3. Launch the application:

    • Find “UI-TARS” in your Start Menu
    • Or launch from the desktop shortcut
  4. Allow through firewall (if prompted):

    • Click “Allow access” for private networks

Linux Installation

UI-TARS Desktop provides AppImage and .deb packages for Linux distributions.

Option A: AppImage (Universal)
# Download the latest AppImage from GitHub releases
wget https://github.com/bytedance/UI-TARS-desktop/releases/latest/download/UI-TARS-Desktop-x.x.x.AppImage

# Make it executable
chmod +x UI-TARS-Desktop-*.AppImage

# Run the application
./UI-TARS-Desktop-*.AppImage

What this does: AppImage is a portable format that runs on most Linux distributions without installation.

Option B: Debian/Ubuntu (.deb)
# Download the .deb package
wget https://github.com/bytedance/UI-TARS-desktop/releases/latest/download/ui-tars-desktop_x.x.x_amd64.deb

# Install with dpkg
sudo dpkg -i ui-tars-desktop_*.deb

# Fix any dependency issues
sudo apt-get install -f

# Launch from applications menu or run:
ui-tars-desktop

What this does: Installs UI-TARS Desktop system-wide via the Debian package manager.


macOS Installation

Option A: Direct Download
  1. Download the installer:

    • Visit GitHub Releases
    • Download the .dmg file for your chip:
      • Apple Silicon (M1/M2/M3/M4): Download the arm64 version
      • Intel Macs: Download the x64 version
  2. Install the application:

    • Open the downloaded .dmg file
    • Drag “UI-TARS” to your Applications folder
  3. Grant required permissions: This is critical for UI-TARS to control your computer.

    Accessibility Permission:

    • Open System Settings → Privacy & Security → Accessibility
    • Click the lock icon to make changes
    • Click + and add UI-TARS
    • Toggle UI-TARS ON

    Screen Recording Permission:

    • Open System Settings → Privacy & Security → Screen Recording
    • Click the lock icon to make changes
    • Click + and add UI-TARS
    • Toggle UI-TARS ON
  4. Launch the application:

    • Double-click UI-TARS in Applications
    • If you see “UI-TARS can’t be opened because Apple cannot check it for malicious software”:
      • Go to System Settings → Privacy & Security
      • Scroll down and click “Open Anyway”
Option B: Homebrew Installation
# Install via Homebrew Cask
brew install --cask ui-tars

# Update to latest version
brew upgrade --cask ui-tars

What this does: Downloads and installs UI-TARS Desktop from the Homebrew Cask repository.

⚠️ Important: After Homebrew installation, you still need to grant Accessibility and Screen Recording permissions manually.


Method 2: Build from Source (For Developers)

If you want the latest development version or plan to contribute:

Prerequisites

Install these tools first:

# Install Node.js 20+ (required)
# Using nvm:
nvm install 20
nvm use 20

# Install pnpm package manager
npm install -g pnpm

Clone and Build

# Clone the repository
git clone https://github.com/bytedance/UI-TARS-desktop.git
cd UI-TARS-desktop

# Install dependencies
pnpm install

# Build the UI-TARS Desktop app
pnpm build:ui-tars

# Run in development mode
pnpm dev:ui-tars

# Create distributable package
pnpm package

Build commands explained:

  • pnpm install: Downloads all required packages
  • pnpm build:ui-tars: Compiles the application
  • pnpm dev:ui-tars: Runs with hot-reload for development
  • pnpm package: Creates installer for your platform
  • pnpm make: Creates platform-specific installers (DMG, EXE, AppImage)

Platform notes:

  • macOS: May need to edit plist for permissions configuration
  • Windows: May require UAC (User Account Control) for certain operations
  • Linux: Ensure required X11 libraries are installed
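
For the Linux note above, the exact packages vary by distribution. On Debian/Ubuntu, something like the following usually covers the X11 automation libraries that nut.js-based builds expect (the package list is an assumption; check your distribution's documentation):

# Debian/Ubuntu: X11 development libraries commonly needed for GUI automation builds
# (package names assumed; adjust for your distribution)
sudo apt-get install -y libx11-dev libxtst-dev libpng-dev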

Part 3: Understanding Operator Modes

UI-TARS Desktop supports four operator modes:

1. Local Computer Operator

Controls your local computer’s desktop:

  • Uses NutJS for mouse/keyboard automation
  • Takes screenshots of your screen
  • Requires local VLM model or cloud API

Best for: Desktop automation, application control, file management

2. Local Browser Operator

Controls a local Chromium browser:

  • Uses Puppeteer for browser automation
  • Isolated browser instance
  • Requires local VLM model or cloud API

Best for: Web automation with local processing

3. Remote Computer Operator (Free Trial)

Control a cloud-hosted virtual computer:

  • No local model needed
  • 30-minute free trial sessions
  • VNC streaming preview

Best for: Testing UI-TARS without local setup

4. Remote Browser Operator (Free Trial)

Control a cloud-hosted browser:

  • No local model needed
  • 30-minute free trial sessions
  • Browser automation in the cloud

Best for: Quick testing and evaluation


Part 4: VLM Provider Configuration

To use Local Computer or Local Browser operators, you need a Vision-Language Model backend.

Option A: Using HuggingFace Inference Endpoints (Cloud)

HuggingFace offers hosted endpoints for UI-TARS models.

Step 1: Create HuggingFace Account

  1. Go to huggingface.co
  2. Create an account or sign in
  3. Navigate to your Settings → Access Tokens
  4. Create a new token with read access

Step 2: Deploy UI-TARS Endpoint

  1. Go to HuggingFace Inference Endpoints
  2. Click “Deploy new model”
  3. Search for ByteDance-Seed/UI-TARS-1.5-7B
  4. Select a GPU instance:
    • A100 (80GB): Best performance
    • A10G (24GB): Good balance
    • T4 (16GB): Budget option with FP16
  5. Deploy and wait for the endpoint to be ready (5-10 minutes)
  6. Copy your endpoint URL (format: https://<your-endpoint>.endpoints.huggingface.cloud/v1/)

Step 3: Configure in UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click the Settings icon (gear icon)
  3. Select HuggingFace as provider
  4. Enter:
    • Base URL: Your HuggingFace endpoint URL
    • API Key: Your HuggingFace access token
    • Model: UI-TARS-1.5-7B
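
Before entering these values, it helps to smoke-test the endpoint directly. A minimal check with curl, substituting your endpoint URL and token (the model field may need to match whatever name your endpoint registers, which varies by deployment):

# Replace the URL and token with your own endpoint values
curl https://<your-endpoint>.endpoints.huggingface.cloud/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_HF_TOKEN" \
    -d '{"model":"UI-TARS-1.5-7B","messages":[{"role":"user","content":"Hello"}],"max_tokens":32}'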

Option B: Using VolcEngine (ByteDance Cloud)

VolcEngine is ByteDance’s cloud platform with optimized UI-TARS support.

Step 1: Create VolcEngine Account

  1. Visit VolcEngine Console
  2. Sign up for an account
  3. Navigate to the AI/ML services section
  4. Subscribe to the Doubao model service

Step 2: Get API Credentials

  1. Go to API Key management
  2. Create a new API key
  3. Note down your:
    • API Key
    • Endpoint URL (usually https://ark.cn-beijing.volces.com/api/v3)

Step 3: Configure in UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click the Settings icon
  3. Select VolcEngine as provider
  4. Enter:
    • Base URL: https://ark.cn-beijing.volces.com/api/v3
    • API Key: Your VolcEngine API key
    • Model: doubao-1.5-ui-tars-250328
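
As with HuggingFace, you can verify the credentials with a direct request before configuring the app (URL and model name taken from the values above):

curl https://ark.cn-beijing.volces.com/api/v3/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_VOLCENGINE_API_KEY" \
    -d '{"model":"doubao-1.5-ui-tars-250328","messages":[{"role":"user","content":"Hello"}]}'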

Option C: Using OpenAI-Compatible Endpoints

Any OpenAI-compatible API can be used (Ollama, vLLM, LocalAI, etc.).

Configuration in UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click the Settings icon
  3. Select Custom or OpenAI-Compatible as provider
  4. Enter:
    • Base URL: Your local endpoint (e.g., http://localhost:8000/v1)
    • API Key: Set any value if not required (some servers need a placeholder)
    • Model: Model name as registered in your backend

Part 5: Local Model Deployment with vLLM (Docker)

For privacy or offline use, deploy UI-TARS locally using vLLM.

Prerequisites

  • NVIDIA GPU with 16+ GB VRAM
  • NVIDIA Docker runtime installed
  • 50+ GB free disk space
  • Docker installed

Option A: Direct Python vLLM Installation (Non-Docker)

For users who prefer not to use Docker:

# Install Python 3.10+ and pip
sudo apt-get install python3 python3-pip

# Install vLLM
pip install vllm

# Download model from HuggingFace (requires huggingface-cli)
pip install huggingface_hub
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B

# Start the OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --port 8000

What this does: Runs vLLM directly on your system without Docker containerization. Uses 90% of available GPU memory for optimal throughput.

Option B: Docker-based vLLM Installation

Step 1: Install NVIDIA Container Toolkit

Linux (Ubuntu/Debian):

# Add the NVIDIA Container Toolkit repository (the older nvidia-docker
# repository is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify: should print your GPU details from inside a container
docker run --rm --gpus all ubuntu nvidia-smi

What this does: Installs the NVIDIA Container Toolkit which allows Docker containers to access your GPU.

Windows (WSL2):

# Ensure latest NVIDIA drivers are installed
# Install WSL2 with Ubuntu
wsl --install -d Ubuntu

# Inside WSL2, the NVIDIA Container Toolkit should already be available
# Verify with:
docker run --rm --gpus all ubuntu nvidia-smi

Step 2: Run vLLM Server with UI-TARS

# Create a directory for model cache
mkdir -p ~/.cache/huggingface

# Run vLLM with UI-TARS-1.5-7B
docker run -d \
    --name ui-tars-vllm \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --max-model-len 32768 \
    --dtype auto \
    --api-key "local-api-key"

Command breakdown:

  • --gpus all: Use all available GPUs
  • -v ~/.cache/huggingface:/root/.cache/huggingface: Cache downloaded models
  • -p 8000:8000: Expose API on port 8000
  • --model: The UI-TARS model from HuggingFace
  • --trust-remote-code: Required for UI-TARS model
  • --max-model-len: Maximum context length
  • --dtype auto: Automatic precision selection
  • --api-key: Set a local API key for security

For 8GB GPUs (Using Quantization):

# Note: --quantization awq expects an AWQ-quantized checkpoint;
# substitute a community AWQ build of the model if one is available
docker run -d \
    --name ui-tars-vllm \
    --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --quantization awq \
    --max-model-len 16384 \
    --api-key "local-api-key"

What this does: Uses AWQ 4-bit quantization to reduce VRAM usage to approximately 4-5 GB while maintaining reasonable quality. Note that the --quantization awq flag requires an AWQ-quantized checkpoint, so point --model at an AWQ build of the model if one is available.


Step 3: Verify vLLM Server

# Check if server is running
curl http://localhost:8000/v1/models

# Expected output:
# {"data":[{"id":"ByteDance-Seed/UI-TARS-1.5-7B","object":"model"...}]}

Step 4: Configure UI-TARS Desktop

  1. Open UI-TARS Desktop
  2. Click Settings
  3. Select Custom/OpenAI-Compatible
  4. Configure:
    • Base URL: http://localhost:8000/v1
    • API Key: local-api-key (or what you set)
    • Model: ByteDance-Seed/UI-TARS-1.5-7B
  5. Click Save

Part 6: Local Model Deployment with Ollama

Ollama provides an easier way to run local models on your machine.

Step 1: Install Ollama

Linux:

curl -fsSL https://ollama.ai/install.sh | sh

macOS:

# Using Homebrew
brew install ollama

# Or download from ollama.ai

Windows:

Download and install from ollama.ai/download.

Step 2: Pull UI-TARS Model

As of early 2026, UI-TARS models may not be directly available in Ollama’s registry. You can use GGUF versions:

# Check available models (UI-TARS may need to be imported)
ollama list

# If a community GGUF build is available, pull it directly from Hugging Face:
ollama pull hf.co/mradermacher/UI-TARS-1.5-7B-GGUF:Q8_0

# Or import a GGUF file manually
ollama create ui-tars -f ./Modelfile

Modelfile example:

FROM ./UI-TARS-1.5-7B-Q8_0.gguf

TEMPLATE """{{ .System }}
{{ .Prompt }}"""

PARAMETER temperature 0.1
PARAMETER num_ctx 32768
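
After saving the Modelfile next to your downloaded GGUF file, build the model and sanity-check that it loads (the name ui-tars matches the create command above):

# Build the Ollama model from the Modelfile, then run a quick prompt to verify it loads
ollama create ui-tars -f ./Modelfile
ollama run ui-tars "Reply with OK if you can read this."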

Step 3: Run Ollama Server

# Start Ollama server (usually runs automatically)
ollama serve

# The API will be available at http://localhost:11434
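
Ollama also exposes an OpenAI-compatible API under /v1, which is what UI-TARS Desktop will use. A quick request confirms the model responds:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"ui-tars","messages":[{"role":"user","content":"Hello"}]}'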

Step 4: Configure UI-TARS Desktop for Ollama

  1. Open UI-TARS Desktop
  2. Click Settings
  3. Select Custom/OpenAI-Compatible
  4. Configure:
    • Base URL: http://localhost:11434/v1
    • API Key: ollama (placeholder)
    • Model: ui-tars (or your model name)
  5. Click Save

Part 7: Using Remote Operators (Free Trial)

The quickest way to try UI-TARS Desktop is using the free remote operators.

Remote Computer Operator

  1. Open UI-TARS Desktop
  2. Select Remote Computer from the operator dropdown
  3. Click Start Free Trial
  4. Wait for resource allocation (1-2 minutes)
  5. A VNC preview window will show the remote desktop
  6. Enter your task in the chat input
  7. Watch the AI control the remote computer

Free trial limits:

  • 30 minutes per session
  • Daily usage limits may apply
  • Queue during high demand

Remote Browser Operator

  1. Open UI-TARS Desktop
  2. Select Remote Browser from the operator dropdown
  3. Click Start Free Trial
  4. Wait for browser allocation
  5. Enter your browsing task
  6. Watch the AI control the remote browser

Part 8: Using UI-TARS Desktop

First Run Setup

  1. Launch UI-TARS Desktop

  2. Configure VLM Settings:

    • Click the Settings gear icon
    • Choose your provider (HuggingFace, VolcEngine, or Custom)
    • Enter your API credentials
    • Click Verify to test the connection
    • Click Save
  3. Grant Permissions (macOS only):

    • If prompted, grant Accessibility permissions
    • Grant Screen Recording permissions
    • Restart the application

Performing Tasks

  1. Select an Operator:

    • Local Computer (requires VLM setup)
    • Local Browser (requires VLM setup)
    • Remote Computer (free trial)
    • Remote Browser (free trial)
  2. Enter Your Task: Type a natural language description:

    "Open Notepad and type 'Hello World', then save the file as test.txt on the desktop"
  3. Click Send or press Enter

  4. Watch the Execution:

    • UI-TARS takes a screenshot
    • Analyzes the screen content
    • Plans the action sequence
    • Executes mouse/keyboard actions
    • Reports progress in real-time

Example Tasks

| Task Type | Example Prompt |
| --- | --- |
| App Control | “Open Calculator and calculate 15 × 23” |
| File Operations | “Create a new folder called ‘Projects’ on the desktop” |
| Web Browsing | “Open Chrome and search for ‘weather in Tokyo’” |
| Multi-step | “Open Word, create a new document, add a heading ‘Meeting Notes’, and save it” |
| System Settings | “Open Display Settings and change the resolution to 1920x1080” |

Stopping Execution

  • Click the Stop button to halt the current task
  • Press Escape key
  • The agent will safely stop after completing the current action

Part 9: Troubleshooting

Common Issues

Issue: “Screen Recording permission not granted” (macOS)

Symptoms: UI-TARS can’t take screenshots.

Solution:

  1. Open System Settings → Privacy & Security → Screen Recording
  2. Ensure UI-TARS is listed and toggled ON
  3. If already enabled, remove and re-add UI-TARS
  4. Restart the application

Issue: “Accessibility permission not granted” (macOS)

Symptoms: UI-TARS can’t click or type.

Solution:

  1. Open System Settings → Privacy & Security → Accessibility
  2. Ensure UI-TARS is listed and toggled ON
  3. If already enabled, remove and re-add UI-TARS
  4. Restart the application

Issue: “VLM connection failed”

Symptoms: Cannot connect to model endpoint.

Solution:

  1. Verify your API endpoint URL is correct
  2. Check your API key is valid and not expired
  3. Ensure the model server is running (for local setups)
  4. Test with curl:
    curl -X POST http://your-endpoint/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -d '{"model":"UI-TARS-1.5-7B","messages":[{"role":"user","content":"Hello"}]}'

Issue: “Out of VRAM” (Local deployment)

Symptoms: vLLM or Ollama crashes with memory errors.

Solution:

  1. Use a quantized model (Q8_0 or Q4_K)
  2. Reduce --max-model-len to 16384 or 8192
  3. Close other GPU-intensive applications
  4. Check GPU memory with nvidia-smi

Issue: “App won’t open” (Windows)

Symptoms: Application doesn’t start or shows error.

Solution:

  1. Run as Administrator
  2. Check Windows Event Viewer for errors
  3. Reinstall with latest version
  4. Disable antivirus temporarily to test

Issue: Remote operator shows “unavailable”

Symptoms: Free trial resources not available.

Solution:

  1. High demand—try again later
  2. Daily limit reached—wait 24 hours
  3. Check internet connection
  4. Try the other remote operator type

Performance Tips

  1. For local deployment:

    • Use NVMe SSD for model storage
    • Keep GPU drivers updated
    • Close unnecessary applications
    • Use FP16 if you have sufficient VRAM
  2. For cloud APIs:

    • Use low-latency internet connection
    • Choose geographically close endpoints
    • Consider higher-tier plans for better performance
  3. General tips:

    • Lower screen resolution for faster processing
    • Use simpler prompts for faster execution
    • Enable caching if available

Part 10: Advanced Configuration

Custom Model Endpoints

For enterprise deployments or custom models:

{
  "provider": "custom",
  "baseURL": "https://api.your-company.com/v1",
  "apiKey": "your-enterprise-key",
  "model": "ui-tars-enterprise-v2",
  "timeout": 60000,
  "maxTokens": 4096
}

Multiple Operator Profiles

You can configure different profiles for different use cases:

  1. Work Profile: Connect to company’s private VLM
  2. Personal Profile: Use HuggingFace endpoint
  3. Testing Profile: Use free remote operators

Switch profiles from the dropdown in Settings.

Screenshot Quality Settings

For faster processing on slower systems:

  1. Open Settings → Advanced
  2. Reduce Screenshot Quality (default: 90%)
  3. Reduce Screenshot Resolution Scale (default: 100%)

Part 11: Security Considerations

API Key Security

  • Never share your API keys
  • Use environment variables where possible
  • Rotate keys periodically
  • Monitor usage for unexpected activity
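
One way to follow the environment-variable advice with the local vLLM server from Part 5 (the file path and variable name here are illustrative, not a UI-TARS convention):

# Store the key once, readable only by your user
echo 'export UI_TARS_API_KEY="change-me"' > ~/.ui-tars.env
chmod 600 ~/.ui-tars.env

# Load it and pass it to the model server instead of hard-coding the key
source ~/.ui-tars.env
docker run -d --name ui-tars-vllm --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model ByteDance-Seed/UI-TARS-1.5-7B \
    --trust-remote-code \
    --api-key "$UI_TARS_API_KEY"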

Local Processing Advantages

When running models locally:

  • No data leaves your machine
  • No cloud dependencies
  • Works offline
  • Complete privacy

Remote Operator Considerations

When using remote operators:

  • Sessions are temporary and isolated
  • Do not enter sensitive credentials
  • Use for testing only
  • Data may be processed on ByteDance servers

Part 12: Model Comparison

UI-TARS Model Versions

| Model | Parameters | Best Use Case | VRAM (FP16) |
| --- | --- | --- | --- |
| UI-TARS-1.5-7B | 7 billion | General use, consumer GPUs | ~14-16 GB |
| UI-TARS-1.6-7B | 7 billion | Latest improvements | ~14-16 GB |
| UI-TARS-72B | 72 billion | Complex tasks, enterprise | 80+ GB |
| UI-TARS-2B | 2 billion | Fast inference, limited accuracy | ~6 GB |

Recommended Model by Hardware

| Your Hardware | Recommended Model | Quantization |
| --- | --- | --- |
| RTX 4090 (24GB) | UI-TARS-1.5-7B | FP16 |
| RTX 3080 (10GB) | UI-TARS-1.5-7B | Q4_K_S |
| RTX 4060 (8GB) | UI-TARS-1.5-7B | Q4_K_S or Q2_K |
| Apple M2 Pro (32GB) | UI-TARS-1.5-7B | MLX 6-bit |
| No GPU | Use cloud API | N/A |

Part 13: Development and SDK

UI-TARS provides an SDK for developers to integrate GUI automation into their applications.

Python SDK

# Install Python SDK
pip install ui-tars

Usage example:

from ui_tars import UITARSAgent

# Initialize agent
agent = UITARSAgent(
    model_endpoint="http://localhost:8000/v1",
    api_key="your-api-key"
)

# Execute a task
result = agent.run("Open Notepad and type 'Hello World'")
print(result)

Node.js SDK

npm install @ui-tars/sdk

Usage example:

const { GUIAgent } = require('@ui-tars/sdk');

const agent = new GUIAgent({
  endpoint: 'http://localhost:8000/v1',
  apiKey: 'your-api-key'
});

// Top-level await is not available in CommonJS, so wrap the call
(async () => {
  await agent.run('Open Chrome and navigate to google.com');
})();

Part 14: Integration with Other Systems

MCP Server Mode

UI-TARS Desktop can run as an MCP (Model Context Protocol) server:

# Run UI-TARS Desktop as MCP server
ui-tars-desktop --mcp-server --port 8080

Other MCP clients can then connect and use UI-TARS capabilities.

API Mode

Run UI-TARS as a REST API server for integration:

ui-tars-desktop --api-server --port 3000

Endpoints:

  • POST /execute: Execute a task
  • GET /status: Get current status
  • POST /stop: Stop current execution
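
Assuming API mode works as listed, a script could drive it like this (the request body shape is an assumption; check the current documentation for the real schema):

# Hypothetical calls against the endpoints listed above
curl -X POST http://localhost:3000/execute \
    -H "Content-Type: application/json" \
    -d '{"task":"Open Calculator and compute 15 * 23"}'

curl http://localhost:3000/status
curl -X POST http://localhost:3000/stop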

Part 15: Maintenance and Updates

Updating UI-TARS Desktop

Direct Download:

  1. Download the latest release from GitHub
  2. Install over the existing installation
  3. Configuration is preserved

Homebrew (macOS):

brew upgrade --cask ui-tars

From Source:

cd UI-TARS-desktop
git pull origin main
pnpm install
pnpm build:ui-tars

Updating Local Models

vLLM:

# Stop the current container
docker stop ui-tars-vllm

# Pull latest model
docker run --rm -v ~/.cache/huggingface:/root/.cache/huggingface \
    --entrypoint python3 vllm/vllm-openai:latest \
    -c "from huggingface_hub import snapshot_download; snapshot_download('ByteDance-Seed/UI-TARS-1.5-7B')"

# Restart container
docker start ui-tars-vllm

Ollama:

# Models created from a local Modelfile are updated by re-running ollama create;
# pulled GGUF models can be re-pulled:
ollama pull hf.co/mradermacher/UI-TARS-1.5-7B-GGUF:Q8_0

Conclusion

UI-TARS Desktop represents a significant advancement in AI-powered computer automation. By combining Vision-Language Models with precise GUI control, it enables natural language interaction with your desktop environment.

Key Takeaways

  1. Multiple Options: Use local models for privacy or cloud APIs for convenience
  2. Hardware Matters: 16+ GB VRAM recommended for local FP16 inference
  3. Easy Testing: Free remote operators for quick evaluation
  4. Active Development: Regular updates with new features


Appendix A: Docker Build for UI-TARS Desktop (Advanced)

While not officially supported, you can build UI-TARS Desktop in a Docker container:

FROM node:22

# Clone repository
RUN git clone https://github.com/bytedance/UI-TARS-desktop.git /app

WORKDIR /app

# Install pnpm and dependencies
RUN npm install -g pnpm && pnpm install && pnpm package

# Packaged artifacts land in /app/out; for a quick GUI test, start dev mode
CMD ["pnpm", "dev:ui-tars"]

Build and run:

# Build the Docker image
docker build -t ui-tars-desktop .

# Run with privileged mode for GUI access
docker run -it --privileged \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    ui-tars-desktop

⚠️ Important: Running Docker with GUI requires X11 forwarding on Linux. macOS/Windows need additional tools like XQuartz or VcXsrv.
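
On Linux, you may also need to let local containers talk to your X server before launching (this relaxes X access control, so revoke it afterwards):

# Grant local Docker containers access to the X server
xhost +local:docker

# Revoke access when you are done
xhost -local:docker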


Appendix B: Useful Commands

Docker Management

# View vLLM logs
docker logs -f ui-tars-vllm

# Check GPU usage
nvidia-smi -l 1

# Stop vLLM
docker stop ui-tars-vllm

# Remove container
docker rm ui-tars-vllm

# View running containers
docker ps

Ollama Management

# List models
ollama list

# Model info
ollama show ui-tars

# View logs (Linux with systemd; on macOS see ~/.ollama/logs)
journalctl -u ollama -f

# Remove model
ollama rm ui-tars

Appendix C: Changelog Highlights

  • v0.2.0 (Jun 2025): Remote Computer and Browser operators
  • v0.1.0 (Apr 2025): Redesigned Agent UI, UI-TARS-1.5 support
  • v0.0.x (Early 2025): Initial release with local computer control

Last updated: January 2026
