UI-TARS Desktop Complete Setup Guide: Native GUI Agent for Computer Control
Published on January 12, 2026
Introduction
UI-TARS Desktop is an open-source native desktop application developed by ByteDance that enables you to control your computer using natural language commands. Built on Electron and powered by the UI-TARS Vision-Language Model, it can see your screen, understand UI elements, and perform mouse and keyboard actions autonomously.
This comprehensive guide covers everything from downloading the application to deploying your own local model, including detailed VRAM requirements and hardware recommendations.
What is UI-TARS Desktop?
UI-TARS Desktop is a GUI (Graphical User Interface) agent that:
- Sees Your Screen: Takes screenshots and analyzes visual content
- Understands Context: Uses Vision-Language Models to comprehend UI elements
- Executes Actions: Performs precise mouse clicks, keyboard input, and navigation
- Works Locally: Supports fully local operation for privacy-sensitive environments
- Offers Remote Options: Free trial remote operators for quick testing
Key Features
| Feature | Description |
|---|---|
| Natural Language Control | Describe tasks in plain English |
| Screenshot Recognition | AI-powered visual understanding |
| Precise Control | Mouse and keyboard automation |
| Cross-Platform | Windows, macOS, and Linux support |
| Real-time Feedback | Live status and action display |
| Private & Secure | Option for fully local processing |
| Multiple Operators | Local computer, browser, and remote options |
UI-TARS Desktop vs Agent TARS
| Feature | UI-TARS Desktop | Agent TARS |
|---|---|---|
| Interface | Native Desktop App (Electron) | CLI + Web UI |
| Primary Use | Local computer GUI control | Browser automation, code execution |
| Model Backend | UI-TARS VLM (local or cloud) | Cloud APIs (OpenAI, Claude, etc.) |
| Best For | Desktop automation, privacy-focused | Web tasks, scripting |
| Installation | Download installer | npm install |
📝 Note: If you need browser-focused automation with MCP tools and code execution, see our separate Agent TARS Complete Guide.
Part 1: System Requirements
Hardware Requirements
UI-TARS Desktop has different requirements depending on whether you run models locally or use cloud/remote services.
For Local Model Deployment (Running UI-TARS model on your machine)
| Component | UI-TARS-7B (Minimum) | UI-TARS-7B (Recommended) | UI-TARS-72B |
|---|---|---|---|
| GPU VRAM | 16 GB (FP16) | 24 GB | 80+ GB (Multi-GPU required) |
| System RAM | 16 GB | 32 GB | 64+ GB |
| CPU | 8 cores | 12+ cores | 16+ cores |
| Storage | 40 GB SSD (~14GB model) | 100 GB NVMe SSD | 200+ GB NVMe (~144GB model) |
For Cloud/Remote Model Usage (Using API or remote operators)
| Component | Minimum | Recommended |
|---|---|---|
| System RAM | 4 GB | 8 GB |
| CPU | 2 cores | 4 cores |
| Storage | 500 MB | 1 GB |
| Network | Stable internet | Low-latency connection |
VRAM Requirements by Quantization
The UI-TARS-7B model can run with different quantization levels:
| Quantization | VRAM Required | Model Size | Quality Impact |
|---|---|---|---|
| FP32 (Full) | ~28 GB | ~33 GB | Best quality |
| FP16 | ~14-16 GB | ~15.2 GB | Near-best quality |
| Q8_0 | ~8 GB | ~8.1 GB | Good quality |
| Q6_K | ~6.5 GB | ~6.25 GB | Good quality |
| Q4_K_S | ~4.5 GB | ~4.46 GB | Moderate quality |
| Q2_K | ~3.5 GB | ~3.02 GB | Lower quality |
💡 Recommendation: For best results, use FP16 with a 24GB GPU (RTX 3090, RTX 4090, A5000). For 8-12GB GPUs (RTX 3080/4070), use Q8_0 or Q4_K quantization.
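Not sure which bucket your card falls into? On NVIDIA systems you can check the installed GPU and its total VRAM from a terminal before picking a quantization level:

# List each GPU's name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv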
Compatible GPUs
| GPU | VRAM | Recommended Quantization |
|---|---|---|
| RTX 4090 | 24 GB | FP16 (best experience) |
| RTX 3090 | 24 GB | FP16 (best experience) |
| RTX 4080 | 16 GB | FP16 (tight fit) |
| RTX 3080 | 10-12 GB | Q8_0 or Q4_K |
| RTX 4070 | 12 GB | Q8_0 |
| RTX 3070 | 8 GB | Q4_K_S |
| RTX 4060 | 8 GB | Q4_K_S |
Software Requirements
| Software | Windows | macOS | Linux |
|---|---|---|---|
| Operating System | Windows 10/11 (64-bit) | macOS 12+ (Monterey or later) | 64-bit distribution with X11 |
| Permissions | Administrator access | Accessibility + Screen Recording | X11 session access |
Part 2: Installing UI-TARS Desktop
Method 1: Download Pre-built Release (Recommended)
The easiest way to get UI-TARS Desktop is downloading the pre-built installer.
Windows Installation
1. Download the installer:
   - Visit GitHub Releases
   - Download the latest `.exe` file (e.g., `UI-TARS-Desktop-X.X.X-Setup.exe`)
2. Run the installer:
   - Double-click the downloaded `.exe` file
   - If Windows Defender shows a warning, click “More info” → “Run anyway”
   - Follow the installation wizard
3. Launch the application:
   - Find “UI-TARS” in your Start Menu
   - Or launch from the desktop shortcut
4. Allow through firewall (if prompted):
   - Click “Allow access” for private networks
Linux Installation
UI-TARS Desktop provides AppImage and .deb packages for Linux distributions.
Option A: AppImage (Universal)
# Download the latest AppImage from GitHub releases
wget https://github.com/bytedance/UI-TARS-desktop/releases/latest/download/UI-TARS-Desktop-x.x.x.AppImage
# Make it executable
chmod +x UI-TARS-Desktop-*.AppImage
# Run the application
./UI-TARS-Desktop-*.AppImage
What this does: AppImage is a portable format that runs on most Linux distributions without installation.
Option B: Debian/Ubuntu (.deb)
# Download the .deb package
wget https://github.com/bytedance/UI-TARS-desktop/releases/latest/download/ui-tars-desktop_x.x.x_amd64.deb
# Install with dpkg
sudo dpkg -i ui-tars-desktop_*.deb
# Fix any dependency issues
sudo apt-get install -f
# Launch from applications menu or run:
ui-tars-desktop
What this does: Installs UI-TARS Desktop system-wide via the Debian package manager.
macOS Installation
Option A: Direct Download
1. Download the installer:
   - Visit GitHub Releases
   - Download the `.dmg` file for your chip:
     - Apple Silicon (M1/M2/M3/M4): Download the `arm64` version
     - Intel Macs: Download the `x64` version
2. Install the application:
   - Open the downloaded `.dmg` file
   - Drag “UI-TARS” to your Applications folder
3. Grant required permissions: This is critical for UI-TARS to control your computer.
Accessibility Permission:
- Open System Settings → Privacy & Security → Accessibility
- Click the lock icon to make changes
- Click + and add UI-TARS
- Toggle UI-TARS ON
Screen Recording Permission:
- Open System Settings → Privacy & Security → Screen Recording
- Click the lock icon to make changes
- Click + and add UI-TARS
- Toggle UI-TARS ON
4. Launch the application:
- Double-click UI-TARS in Applications
- If you see “UI-TARS can’t be opened because Apple cannot check it for malicious software”:
- Go to System Settings → Privacy & Security
- Scroll down and click “Open Anyway”
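If Gatekeeper keeps blocking the app, you can also clear the quarantine flag from a terminal. The path below assumes you installed to the default Applications folder:

# Remove the quarantine attribute macOS sets on downloaded apps
xattr -d com.apple.quarantine /Applications/UI-TARS.app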
Option B: Homebrew Installation
# Install via Homebrew Cask
brew install --cask ui-tars
# Update to latest version
brew upgrade --cask ui-tars
What this does: Downloads and installs UI-TARS Desktop from the Homebrew Cask repository.
⚠️ Important: After Homebrew installation, you still need to grant Accessibility and Screen Recording permissions manually.
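You can jump straight to the relevant privacy panes from a terminal; the URL scheme below works on recent macOS versions:

# Open the Accessibility and Screen Recording privacy panes directly
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"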
Method 2: Build from Source (For Developers)
If you want the latest development version or plan to contribute:
Prerequisites
Install these tools first:
# Install Node.js 20+ (required)
# Using nvm:
nvm install 20
nvm use 20
# Install pnpm package manager
npm install -g pnpm
Clone and Build
# Clone the repository
git clone https://github.com/bytedance/UI-TARS-desktop.git
cd UI-TARS-desktop
# Install dependencies
pnpm install
# Build the UI-TARS Desktop app
pnpm build:ui-tars
# Run in development mode
pnpm dev:ui-tars
# Create distributable package
pnpm package
Build commands explained:
- `pnpm install`: Downloads all required packages
- `pnpm build:ui-tars`: Compiles the application
- `pnpm dev:ui-tars`: Runs with hot-reload for development
- `pnpm package`: Creates installer for your platform
- `pnpm make`: Creates platform-specific installers (DMG, EXE, AppImage)
Platform notes:
- macOS: May need to edit plist for permissions configuration
- Windows: May require UAC (User Account Control) for certain operations
- Linux: Ensure required X11 libraries are installed
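For the Linux note above, a typical package set for NutJS-based builds on Debian/Ubuntu looks like the following (the exact package names are an assumption; check the repository's docs if the build still fails):

# X11 development libraries commonly required by NutJS builds
sudo apt-get install -y build-essential libx11-dev libxtst-dev libpng++-dev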
Part 3: Understanding Operator Modes
UI-TARS Desktop supports four operator modes:
1. Local Computer Operator
Controls your local computer’s desktop:
- Uses NutJS for mouse/keyboard automation
- Takes screenshots of your screen
- Requires local VLM model or cloud API
Best for: Desktop automation, application control, file management
2. Local Browser Operator
Controls a local Chromium browser:
- Uses Puppeteer for browser automation
- Isolated browser instance
- Requires local VLM model or cloud API
Best for: Web automation with local processing
3. Remote Computer Operator (Free Trial)
Control a cloud-hosted virtual computer:
- No local model needed
- 30-minute free trial sessions
- VNC streaming preview
Best for: Testing UI-TARS without local setup
4. Remote Browser Operator (Free Trial)
Control a cloud-hosted browser:
- No local model needed
- 30-minute free trial sessions
- Browser automation in the cloud
Best for: Quick testing and evaluation
Part 4: VLM Provider Configuration
To use Local Computer or Local Browser operators, you need a Vision-Language Model backend.
Option A: Using HuggingFace Inference Endpoints (Cloud)
HuggingFace offers hosted endpoints for UI-TARS models.
Step 1: Create HuggingFace Account
- Go to huggingface.co
- Create an account or sign in
- Navigate to your Settings → Access Tokens
- Create a new token with `read` access
Step 2: Deploy UI-TARS Endpoint
- Go to HuggingFace Inference Endpoints
- Click “Deploy new model”
- Search for `ByteDance-Seed/UI-TARS-1.5-7B`
- Select a GPU instance:
- A100 (80GB): Best performance
- A10G (24GB): Good balance
- T4 (16GB): Budget option with FP16
- Deploy and wait for the endpoint to be ready (5-10 minutes)
- Copy your endpoint URL (format: `https://<your-endpoint>.endpoints.huggingface.cloud/v1/`)
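Before wiring the endpoint into the app, you can sanity-check it from a terminal. This assumes the endpoint exposes OpenAI-style `/v1` routes (as the URL format above suggests); `HF_TOKEN` is a placeholder for your access token:

# List the models served by your endpoint (expects a JSON response)
curl -H "Authorization: Bearer $HF_TOKEN" \
  https://<your-endpoint>.endpoints.huggingface.cloud/v1/models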
Step 3: Configure in UI-TARS Desktop
- Open UI-TARS Desktop
- Click the Settings icon (gear icon)
- Select HuggingFace as provider
- Enter:
  - Base URL: Your HuggingFace endpoint URL
  - API Key: Your HuggingFace access token
  - Model: `UI-TARS-1.5-7B`
Option B: Using VolcEngine (ByteDance Cloud)
VolcEngine is ByteDance’s cloud platform with optimized UI-TARS support.
Step 1: Create VolcEngine Account
- Visit VolcEngine Console
- Sign up for an account
- Navigate to the AI/ML services section
- Subscribe to the Doubao model service
Step 2: Get API Credentials
- Go to API Key management
- Create a new API key
- Note down your:
- API Key
- Endpoint URL (usually `https://ark.cn-beijing.volces.com/api/v3`)
Step 3: Configure in UI-TARS Desktop
- Open UI-TARS Desktop
- Click the Settings icon
- Select VolcEngine as provider
- Enter:
  - Base URL: `https://ark.cn-beijing.volces.com/api/v3`
  - API Key: Your VolcEngine API key
  - Model: `doubao-1.5-ui-tars-250328`
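You can confirm the credentials work with a minimal request from a terminal. This sketch assumes the Ark endpoint follows the OpenAI-style chat completions route used elsewhere in this guide; `ARK_API_KEY` is a placeholder for your key:

# Minimal chat completion against the VolcEngine Ark endpoint
curl https://ark.cn-beijing.volces.com/api/v3/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ARK_API_KEY" \
  -d '{"model": "doubao-1.5-ui-tars-250328", "messages": [{"role": "user", "content": "Hello"}]}'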
Option C: Using OpenAI-Compatible Endpoints
Any OpenAI-compatible API can be used (Ollama, vLLM, LocalAI, etc.).
Configuration in UI-TARS Desktop
- Open UI-TARS Desktop
- Click the Settings icon
- Select Custom or OpenAI-Compatible as provider
- Enter:
  - Base URL: Your local endpoint (e.g., `http://localhost:8000/v1`)
  - API Key: Set any value if not required (some servers need a placeholder)
  - Model: Model name as registered in your backend
Part 5: Local Model Deployment with vLLM (Docker)
For privacy or offline use, deploy UI-TARS locally using vLLM.
Prerequisites
- NVIDIA GPU with 16+ GB VRAM
- NVIDIA Docker runtime installed
- 50+ GB free disk space
- Docker installed
Option A: Direct Python vLLM Installation (Non-Docker)
For users who prefer not to use Docker:
# Install Python 3.10+ and pip
sudo apt-get install python3 python3-pip
# Install vLLM
pip install vllm
# Download model from HuggingFace (requires huggingface-cli)
pip install huggingface_hub
huggingface-cli download ByteDance-Seed/UI-TARS-1.5-7B
# Start the OpenAI-compatible server
python -m vllm.entrypoints.openai.api_server \
  --model ByteDance-Seed/UI-TARS-1.5-7B \
  --trust-remote-code \
  --gpu-memory-utilization 0.9 \
  --port 8000
What this does: Runs vLLM directly on your system without Docker containerization. Uses 90% of available GPU memory for optimal throughput.
Option B: Docker-based vLLM Installation
Step 1: Install NVIDIA Container Toolkit
Linux (Ubuntu/Debian):
# Add NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list |
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
What this does: Installs the NVIDIA Container Toolkit which allows Docker containers to access your GPU.
Windows (WSL2):
# Ensure latest NVIDIA drivers are installed
# Install WSL2 with Ubuntu
wsl --install -d Ubuntu
# Inside WSL2, the NVIDIA Container Toolkit should already be available
# Verify with:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Step 2: Run vLLM Server with UI-TARS
Using Docker (Recommended):
# Create a directory for model cache
mkdir -p ~/.cache/huggingface
# Run vLLM with UI-TARS-1.5-7B
docker run -d \
  --name ui-tars-vllm \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ByteDance-Seed/UI-TARS-1.5-7B \
  --trust-remote-code \
  --max-model-len 32768 \
  --dtype auto \
  --api-key "local-api-key"
Command breakdown:
- `--gpus all`: Use all available GPUs
- `-v ~/.cache/huggingface:/root/.cache/huggingface`: Cache downloaded models
- `-p 8000:8000`: Expose API on port 8000
- `--model`: The UI-TARS model from HuggingFace
- `--trust-remote-code`: Required for UI-TARS model
- `--max-model-len`: Maximum context length
- `--dtype auto`: Automatic precision selection
- `--api-key`: Set a local API key for security
For 8GB GPUs (Using Quantization):
docker run -d \
  --name ui-tars-vllm \
  --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model ByteDance-Seed/UI-TARS-1.5-7B \
  --trust-remote-code \
  --quantization awq \
  --max-model-len 16384 \
  --api-key "local-api-key"
What this does: Uses AWQ 4-bit quantization to reduce VRAM usage to approximately 4-5GB while maintaining reasonable quality. Note that `--quantization awq` expects AWQ-quantized weights; if no official AWQ export exists, point `--model` at a community AWQ checkpoint rather than the FP16 repository.
Step 3: Verify vLLM Server
# Check if server is running
curl http://localhost:8000/v1/models
# Expected output:
# {"data":[{"id":"ByteDance-Seed/UI-TARS-1.5-7B","object":"model"...}]} Step 4: Configure UI-TARS Desktop
- Open UI-TARS Desktop
- Click Settings
- Select Custom/OpenAI-Compatible
- Configure:
  - Base URL: `http://localhost:8000/v1`
  - API Key: `local-api-key` (or whatever you set)
  - Model: `ByteDance-Seed/UI-TARS-1.5-7B`
- Click Save
Part 6: Local Model Deployment with Ollama
Ollama provides an easier way to run local models on your machine.
Step 1: Install Ollama
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
macOS:
# Using Homebrew
brew install ollama
# Or download from ollama.ai
Windows:
Download and install from ollama.ai/download.
Step 2: Pull UI-TARS Model
As of early 2026, UI-TARS models may not be directly available in Ollama’s registry. You can use GGUF versions:
# Check available models (UI-TARS may need to be imported)
ollama list
# If GGUF version is available in the community:
ollama pull hf.co/mradermacher/UI-TARS-1.5-7B-GGUF:Q8_0
# Or import a GGUF file manually
ollama create ui-tars -f ./Modelfile
Modelfile example:
FROM ./UI-TARS-1.5-7B-Q8_0.gguf
TEMPLATE """{{ .System }}
{{ .Prompt }}"""
PARAMETER temperature 0.1
PARAMETER num_ctx 32768
Step 3: Run Ollama Server
# Start Ollama server (usually runs automatically)
ollama serve
# The API will be available at http://localhost:11434
Step 4: Configure UI-TARS Desktop for Ollama
- Open UI-TARS Desktop
- Click Settings
- Select Custom/OpenAI-Compatible
- Configure:
  - Base URL: `http://localhost:11434/v1`
  - API Key: `ollama` (placeholder)
  - Model: `ui-tars` (or your model name)
- Click Save
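Before saving, you can confirm Ollama's OpenAI-compatible endpoint is up and serving your model:

# Should return a JSON list that includes your ui-tars model
curl http://localhost:11434/v1/models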
Part 7: Using Remote Operators (Free Trial)
The quickest way to try UI-TARS Desktop is using the free remote operators.
Remote Computer Operator
- Open UI-TARS Desktop
- Select Remote Computer from the operator dropdown
- Click Start Free Trial
- Wait for resource allocation (1-2 minutes)
- A VNC preview window will show the remote desktop
- Enter your task in the chat input
- Watch the AI control the remote computer
Free trial limits:
- 30 minutes per session
- Daily usage limits may apply
- Queue during high demand
Remote Browser Operator
- Open UI-TARS Desktop
- Select Remote Browser from the operator dropdown
- Click Start Free Trial
- Wait for browser allocation
- Enter your browsing task
- Watch the AI control the remote browser
Part 8: Using UI-TARS Desktop
First Run Setup
Launch UI-TARS Desktop
Configure VLM Settings:
- Click the Settings gear icon
- Choose your provider (HuggingFace, VolcEngine, or Custom)
- Enter your API credentials
- Click Verify to test the connection
- Click Save
Grant Permissions (macOS only):
- If prompted, grant Accessibility permissions
- Grant Screen Recording permissions
- Restart the application
Performing Tasks
Select an Operator:
- Local Computer (requires VLM setup)
- Local Browser (requires VLM setup)
- Remote Computer (free trial)
- Remote Browser (free trial)
Enter Your Task: Type a natural language description:
"Open Notepad and type 'Hello World', then save the file as test.txt on the desktop"Click Send or press Enter
Watch the Execution:
- UI-TARS takes a screenshot
- Analyzes the screen content
- Plans the action sequence
- Executes mouse/keyboard actions
- Reports progress in real-time
Example Tasks
| Task Type | Example Prompt |
|---|---|
| App Control | “Open Calculator and calculate 15 × 23” |
| File Operations | “Create a new folder called ‘Projects’ on the desktop” |
| Web Browsing | “Open Chrome and search for ‘weather in Tokyo’” |
| Multi-step | “Open Word, create a new document, add a heading ‘Meeting Notes’, and save it” |
| System Settings | “Open Display Settings and change the resolution to 1920x1080” |
Stopping Execution
- Click the Stop button to halt the current task
- Press Escape key
- The agent will safely stop after completing the current action
Part 9: Troubleshooting
Common Issues
Issue: “Screen Recording permission not granted” (macOS)
Symptoms: UI-TARS can’t take screenshots.
Solution:
- Open System Settings → Privacy & Security → Screen Recording
- Ensure UI-TARS is listed and toggled ON
- If already enabled, remove and re-add UI-TARS
- Restart the application
Issue: “Accessibility permission not granted” (macOS)
Symptoms: UI-TARS can’t click or type.
Solution:
- Open System Settings → Privacy & Security → Accessibility
- Ensure UI-TARS is listed and toggled ON
- If already enabled, remove and re-add UI-TARS
- Restart the application
Issue: “VLM connection failed”
Symptoms: Cannot connect to model endpoint.
Solution:
- Verify your API endpoint URL is correct
- Check your API key is valid and not expired
- Ensure the model server is running (for local setups)
- Test with curl:
curl -X POST http://your-endpoint/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY" -d '{"model":"UI-TARS-1.5-7B","messages":[{"role":"user","content":"Hello"}]}'
Issue: “Out of VRAM” (Local deployment)
Symptoms: vLLM or Ollama crashes with memory errors.
Solution:
- Use a quantized model (Q8_0 or Q4_K)
- Reduce `--max-model-len` to 16384 or 8192
- Close other GPU-intensive applications
- Check GPU memory with `nvidia-smi`
Issue: “App won’t open” (Windows)
Symptoms: Application doesn’t start or shows error.
Solution:
- Run as Administrator
- Check Windows Event Viewer for errors
- Reinstall with latest version
- Disable antivirus temporarily to test
Issue: Remote operator shows “unavailable”
Symptoms: Free trial resources not available.
Solution:
- High demand—try again later
- Daily limit reached—wait 24 hours
- Check internet connection
- Try the other remote operator type
Performance Tips
For local deployment:
- Use NVMe SSD for model storage
- Keep GPU drivers updated
- Close unnecessary applications
- Use FP16 if you have sufficient VRAM
For cloud APIs:
- Use low-latency internet connection
- Choose geographically close endpoints
- Consider higher-tier plans for better performance
General tips:
- Lower screen resolution for faster processing
- Use simpler prompts for faster execution
- Enable caching if available
Part 10: Advanced Configuration
Custom Model Endpoints
For enterprise deployments or custom models:
{
"provider": "custom",
"baseURL": "https://api.your-company.com/v1",
"apiKey": "your-enterprise-key",
"model": "ui-tars-enterprise-v2",
"timeout": 60000,
"maxTokens": 4096
}
Multiple Operator Profiles
You can configure different profiles for different use cases:
- Work Profile: Connect to company’s private VLM
- Personal Profile: Use HuggingFace endpoint
- Testing Profile: Use free remote operators
Switch profiles from the dropdown in Settings.
Screenshot Quality Settings
For faster processing on slower systems:
- Open Settings → Advanced
- Reduce Screenshot Quality (default: 90%)
- Reduce Screenshot Resolution Scale (default: 100%)
Part 11: Security Considerations
API Key Security
- Never share your API keys
- Use environment variables where possible
- Rotate keys periodically
- Monitor usage for unexpected activity
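For the environment-variable tip above: the curl and docker examples in this guide can read the key from the environment instead of hard-coding it (the variable name here is our own choice, not a UI-TARS convention):

# Export once per shell session (or set it in your shell profile)
export VLLM_API_KEY="local-api-key"
# Reference it in commands instead of pasting the literal key
curl -H "Authorization: Bearer $VLLM_API_KEY" http://localhost:8000/v1/models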
Local Processing Advantages
When running models locally:
- No data leaves your machine
- No cloud dependencies
- Works offline
- Complete privacy
Remote Operator Considerations
When using remote operators:
- Sessions are temporary and isolated
- Do not enter sensitive credentials
- Use for testing only
- Data may be processed on ByteDance servers
Part 12: Model Comparison
UI-TARS Model Versions
| Model | Parameters | Best Use Case | VRAM (FP16) |
|---|---|---|---|
| UI-TARS-1.5-7B | 7 billion | General use, consumer GPUs | ~14-16 GB |
| UI-TARS-1.6-7B | 7 billion | Latest improvements | ~14-16 GB |
| UI-TARS-72B | 72 billion | Complex tasks, enterprise | 80+ GB |
| UI-TARS-2B | 2 billion | Fast inference, limited accuracy | ~6 GB |
Recommended Configurations
| Your Hardware | Recommended Model | Quantization |
|---|---|---|
| RTX 4090 (24GB) | UI-TARS-1.5-7B | FP16 |
| RTX 3080 (10GB) | UI-TARS-1.5-7B | Q4_K_S |
| RTX 4060 (8GB) | UI-TARS-1.5-7B | Q4_K_S or Q2_K |
| Apple M2 Pro (32GB) | UI-TARS-1.5-7B | MLX 6-bit |
| No GPU | Use cloud API | N/A |
Part 13: Development and SDK
UI-TARS provides an SDK for developers to integrate GUI automation into their applications.
Python SDK
# Install Python SDK
pip install ui-tars
Usage example:
from ui_tars import UITARSAgent
# Initialize agent
agent = UITARSAgent(
model_endpoint="http://localhost:8000/v1",
api_key="your-api-key"
)
# Execute a task
result = agent.run("Open Notepad and type 'Hello World'")
print(result)
Node.js SDK
npm install @ui-tars/sdk
Usage example:
const { GUIAgent } = require('@ui-tars/sdk');

async function main() {
  // Initialize agent against your endpoint
  const agent = new GUIAgent({
    endpoint: 'http://localhost:8000/v1',
    apiKey: 'your-api-key'
  });
  await agent.run('Open Chrome and navigate to google.com');
}

main();
Part 14: Integration with Other Systems
MCP Server Mode
UI-TARS Desktop can run as an MCP (Model Context Protocol) server:
# Run UI-TARS Desktop as MCP server
ui-tars-desktop --mcp-server --port 8080
Other MCP clients can then connect and use UI-TARS capabilities.
API Mode
Run UI-TARS as a REST API server for integration:
ui-tars-desktop --api-server --port 3000
Endpoints:
- `POST /execute`: Execute a task
- `GET /status`: Get current status
- `POST /stop`: Stop current execution
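A quick smoke test of API mode might look like the following; the request payload shape is an assumption, so check the API documentation for the exact schema:

# Submit a task (hypothetical JSON body)
curl -X POST http://localhost:3000/execute \
  -H "Content-Type: application/json" \
  -d '{"task": "Open Calculator and calculate 15 x 23"}'
# Poll execution status
curl http://localhost:3000/status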
Part 15: Maintenance and Updates
Updating UI-TARS Desktop
Direct Download:
- Download the latest release from GitHub
- Install over the existing installation
- Configuration is preserved
Homebrew (macOS):
brew upgrade --cask ui-tars
From Source:
cd UI-TARS-desktop
git pull origin main
pnpm install
pnpm build:ui-tars
Updating Local Models
vLLM:
# Stop the current container
docker stop ui-tars-vllm
# Pull latest model
docker run --rm --entrypoint python3 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  -c "from huggingface_hub import snapshot_download; snapshot_download('ByteDance-Seed/UI-TARS-1.5-7B')"
# Restart container
docker start ui-tars-vllm Ollama:
ollama pull ui-tars:latest
Conclusion
UI-TARS Desktop represents a significant advancement in AI-powered computer automation. By combining Vision-Language Models with precise GUI control, it enables natural language interaction with your desktop environment.
Key Takeaways
- Multiple Options: Use local models for privacy or cloud APIs for convenience
- Hardware Matters: 16+ GB VRAM recommended for local FP16 inference
- Easy Testing: Free remote operators for quick evaluation
- Active Development: Regular updates with new features
Next Steps
- Join the Discord community for support
- Read the official documentation
- Explore the SDK documentation
Related Guides
- Agent TARS Complete Guide - For browser automation with CLI
- vLLM Deployment Guide - Advanced model serving
Appendix A: Docker Build for UI-TARS Desktop (Advanced)
While not officially supported, you can build UI-TARS Desktop in a Docker container:
FROM node:22
# Clone repository
RUN git clone https://github.com/bytedance/UI-TARS-desktop.git /app
WORKDIR /app
# Install pnpm and dependencies
RUN npm install -g pnpm && pnpm install && pnpm package
# The built app will be in /app/out
CMD ["node", "dist/main.js"]
Build and run:
# Build the Docker image
docker build -t ui-tars-desktop .
# Run with privileged mode for GUI access
docker run -it --privileged \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  ui-tars-desktop
⚠️ Important: Running Docker with GUI requires X11 forwarding on Linux. macOS/Windows need additional tools like XQuartz or VcXsrv.
Appendix B: Useful Commands
Docker Management
# View vLLM logs
docker logs -f ui-tars-vllm
# Check GPU usage
nvidia-smi -l 1
# Stop vLLM
docker stop ui-tars-vllm
# Remove container
docker rm ui-tars-vllm
# View running containers
docker ps Ollama Management
# List models
ollama list
# Model info
ollama show ui-tars
# View server logs (Linux with systemd; on macOS check ~/.ollama/logs/server.log)
journalctl -u ollama
# Remove model
ollama rm ui-tars
Appendix C: Changelog Highlights
- v0.2.0 (Jun 2025): Remote Computer and Browser operators
- v0.1.0 (Apr 2025): Redesigned Agent UI, UI-TARS-1.5 support
- v0.0.x (Early 2025): Initial release with local computer control
Last updated: January 2026