Grafana & Prometheus Complete Guide 2026: The Ultimate Monitoring Stack for Home Labs and Production
Published on January 12, 2026
Introduction
Grafana and Prometheus form the de facto standard monitoring stack for modern infrastructure. Whether you're running a small home lab with a Raspberry Pi or managing enterprise Kubernetes clusters, this powerful combination provides real-time visibility into your systems with beautiful visualizations and intelligent alerting.
This comprehensive guide covers everything from basic concepts to advanced configurations, including Docker deployments, cross-platform installations, alerting, and integrations with other data sources like InfluxDB, Elasticsearch, and SQL databases.
What You'll Learn
- Complete understanding of Prometheus and Grafana architecture
- Step-by-step installation on Windows, macOS, and Linux
- Docker and Docker Compose deployment
- Configuring exporters (Node Exporter, Windows Exporter, cAdvisor, Blackbox)
- Creating powerful dashboards with PromQL
- Setting up alerts with Alertmanager
- Integrating multiple data sources
- Home lab monitoring scenarios with practical examples
Key Terminology
| Term | Definition |
|---|---|
| Prometheus | Open-source monitoring system that collects metrics via a pull-based model |
| Grafana | Open-source visualization platform for creating interactive dashboards |
| Exporter | Agent that exposes metrics in Prometheus format from applications or systems |
| PromQL | Prometheus Query Language for querying time-series data |
| Alertmanager | Handles alert deduplication, grouping, routing, and notifications |
| Scrape | The process of Prometheus collecting metrics from endpoints |
| Time Series | Data points indexed by time, the core data type for monitoring |
| Label | Key-value pair that identifies a metric dimension (e.g., host="server1") |
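To make the terminology concrete, here is a minimal sketch (with made-up values) of what a scrape target actually serves: plain text in the Prometheus exposition format, where each line carries a metric name, a set of labels, and a sample value.

```shell
# Write a hypothetical scrape payload in the Prometheus exposition format.
cat <<'EOF' > /tmp/sample_metrics.txt
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 890.12
EOF

# Each unique name+label combination is its own time series; here there are two.
grep -c '^node_cpu_seconds_total' /tmp/sample_metrics.txt  # prints 2
```

A real exporter serves exactly this format at its metrics endpoint (for example, Node Exporter at http://localhost:9100/metrics).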
Estimated Time
| Task | Estimated Time |
|---|---|
| Basic Docker Setup | 15-30 minutes |
| Manual Installation (Single OS) | 30-60 minutes |
| Full Stack with Alerting | 1-2 hours |
| Home Lab Complete Setup | 2-4 hours |
Part 1: Understanding the Monitoring Stack
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit, originally developed at SoundCloud in 2012. It's now a CNCF (Cloud Native Computing Foundation) graduated project, widely adopted for its reliability in dynamic environments like Kubernetes. It collects and stores metrics as time-series data: numerical values with timestamps, plus optional key-value labels for multi-dimensional querying.
Current Version: Prometheus v3.9.0 (released January 6, 2026)
Prometheus v3.9.0 Key Features
| Feature | Description |
|---|---|
| Multi-dimensional Data Model | Metrics identified by name and labels (e.g., http_requests_total{method="GET", status="200"}) |
| PromQL Query Language | Flexible for aggregating, filtering, and analyzing data |
| Pull Model | Scrapes metrics over HTTP from targets; supports push via gateways |
| Autonomous Servers | No distributed storage dependency; standalone for high reliability |
| Service Discovery | Integrates with Kubernetes, Consul, or static configs |
| Enhanced UTF-8 Support | Full UTF-8 compatibility for metric and label names |
| Improved UI | Modern interface with PromLens-style tree view |
| Remote Write 2.0 | Enhanced data handling with metadata, exemplars, and native histograms |
| OpenTelemetry Support | Native OTLP metrics ingestion |
| Native Histograms | More efficient histogram implementation |
| Bug Fixes | Improved scraping and alerting stability from v3.8.x |
How Prometheus Works
+-------------------------------------------------------------------+
| PROMETHEUS ARCHITECTURE |
+-------------------------------------------------------------------+
| |
| TARGETS PROMETHEUS SERVER |
| +--------------+ +------------------------------+ |
| | Node Exporter|-----+ | +-------------------------+ | |
| | :9100 | | | | Retrieval | | |
| +--------------+ | | | (Pull Metrics) | | |
| +--------------+ | | +-----------+-------------+ | |
| | cAdvisor |-----+---->| v | |
| | :8080 | | | +-------------------------+ | |
| +--------------+ | | | Time Series DB (TSDB) | | |
| +--------------+ | | | (Local Storage) | | |
| | App Metrics |-----+ | +-----------+-------------+ | |
| | :8000 | | v | |
| +--------------+ | +-------------------------+ | |
| | | PromQL Engine | | |
| PUSHGATEWAY | | (Query Processing) | | |
| +--------------+ | +-----------+-------------+ | |
| | Batch Jobs |---------->| | | |
| | :9091 | | +-----------v-------------+ | |
| +--------------+ | | HTTP Server :9090 | | |
| | | (Web UI / API) | | |
| | +-------------------------+ | |
| +------------------------------+ |
| | |
| ALERTMANAGER | |
| +------------------+ | |
| | Deduplication |<------------------+ |
| | Grouping/Routing | |
| | Notifications |--> Email, Slack, PagerDuty, Discord |
| | :9093 | |
| +------------------+ |
| |
+-------------------------------------------------------------------+
Core Components
| Component | Purpose | Default Port |
|---|---|---|
| Prometheus Server | Scrapes and stores metrics, provides query API | 9090 |
| Alertmanager | Handles alerts, deduplication, routing | 9093 |
| Pushgateway | Accepts metrics from batch jobs | 9091 |
| Node Exporter | Linux system metrics | 9100 |
| Windows Exporter | Windows system metrics | 9182 |
| cAdvisor | Container metrics | 8080 |
| Blackbox Exporter | Probe endpoints (HTTP, DNS, TCP, ICMP) | 9115 |
What is Grafana?
Grafana is an open-source analytics and visualization platform for querying, visualizing, alerting on, and exploring metrics, logs, and traces from diverse sources. It supports plugins for data sources like Prometheus, Loki, Elasticsearch, and SQL databases. Grafana OSS is free; Enterprise adds premium features like advanced authentication. Grafana Cloud is a hosted version with AI/ML enhancements.
Current Version: Grafana v12.3 (latest stable as of January 2026)
Grafana v12.3 Key Features
| Feature | Description |
|---|---|
| 100+ Data Source Plugins | Connect to Prometheus, InfluxDB, SQL databases, and more |
| Dynamic Dashboards | Flexible panel layouts that adapt to screen sizes |
| Dashboard Tabs | Organize complex dashboards with tabbed views |
| SQL Expressions | Join and transform data from multiple sources (JOINs across Loki/BigQuery) |
| Grafana Assistant | AI-powered assistant for queries and dashboard creation |
| Unified Alerting | Alert from any data source with notifications via Slack, email, etc. |
| Enhanced Tables | Faster loading with 40,000+ rows support |
| Explore Mode | Ad-hoc queries for metrics, logs, and traces |
| Variables/Templates | Dynamic filters (e.g., select servers) |
| Observability as Code | Provision dashboards via Terraform/Ansible; Git Sync for IaC |
| Bug Fixes | Improved high-availability setups, minimal breaking changes from v11.x |
Grafana Architecture
+-------------------------------------------------------------------+
| GRAFANA ARCHITECTURE |
+-------------------------------------------------------------------+
| |
| DATA SOURCES GRAFANA SERVER |
| +-----------------+ +-------------------------+ |
| | Prometheus |---------->| | |
| | (Metrics) | | +-----------------+ | |
| +-----------------+ | | Data Source | | |
| +-----------------+ | | Manager | | |
| | Loki |---------->| +--------+--------+ | |
| | (Logs) | | | | |
| +-----------------+ | +--------v--------+ | |
| +-----------------+ | | Query Engine | | |
| | InfluxDB |---------->| | (Transform) | | |
| | (Time Series) | | +--------+--------+ | |
| +-----------------+ | | | |
| +-----------------+ | +--------v--------+ | |
| | Elasticsearch |---------->| | Visualization | | |
| | (Logs/Search) | | | (Panels) | | |
| +-----------------+ | +--------+--------+ | |
| +-----------------+ | | | |
| | PostgreSQL |---------->| +--------v--------+ | |
| | MySQL, MSSQL | | | Dashboard | | |
| +-----------------+ | | Renderer | | |
| | +--------+--------+ | |
| | | | |
| | +--------v--------+ | |
| | | HTTP Server | | |
| | | :3000 | | |
| | +-----------------+ | |
| ALERTING | | | |
| +-----------------+ | | | |
| | Email |<----------+ Alerting | | |
| | Slack | | Engine | | |
| | Discord | | | | |
| | PagerDuty | | | | |
| +-----------------+ +------------+------------+ |
| |
+-------------------------------------------------------------------+
Grafana Ecosystem (LGTM Stack)
| Component | Purpose | Description |
|---|---|---|
| Loki | Log Aggregation | Like Prometheus but for logs |
| Grafana | Visualization | Dashboards and UI |
| Tempo | Distributed Tracing | Trace storage and querying |
| Mimir | Long-term Metrics | Scalable Prometheus storage |
Part 2: Prerequisites and System Requirements
Hardware Requirements
| Scale | RAM | CPU | Storage |
|---|---|---|---|
| Home Lab (5-10 hosts) | 2-4 GB | 2 cores | 20 GB SSD |
| Small Business (50 hosts) | 8-16 GB | 4 cores | 100 GB SSD |
| Enterprise (500+ hosts) | 32+ GB | 8+ cores | 500 GB+ NVMe |
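A rough way to sanity-check the storage column is Prometheus' usual sizing formula: ingested samples per second × bytes per sample × retention window. The numbers below are assumptions for the home-lab row (10 hosts at roughly 1,000 series each, a 15-second scrape interval, about 2 bytes per compressed sample, 30-day retention), not measurements.

```shell
# Back-of-the-envelope TSDB sizing: samples/sec * bytes/sample * retention.
awk 'BEGIN {
  series           = 10 * 1000       # 10 hosts x ~1000 series each (assumed)
  samples_per_sec  = series / 15     # 15s scrape interval
  bytes_per_sample = 2               # typical compressed cost (assumed)
  retention_secs   = 30 * 24 * 3600  # 30-day retention
  printf "~%.1f GB\n", samples_per_sec * bytes_per_sample * retention_secs / 1e9
}'
# prints ~3.5 GB
```

Real usage varies with label churn and histogram use, so treat the 20 GB figure in the table as comfortable headroom rather than a hard minimum.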
Software Prerequisites
| Component | Required Version | Purpose |
|---|---|---|
| Docker | 20.10+ | Container runtime |
| Docker Compose | v2.0+ | Multi-container orchestration |
| Git | Any | Version control (optional) |
Operating System Compatibility
| OS | Support Level | Notes |
|---|---|---|
| Linux (Ubuntu/Debian) | ✅ Best | Recommended for production |
| Linux (RHEL/CentOS/Fedora) | ✅ Good | Full support |
| Windows | ✅ Good | Docker Desktop or native |
| macOS | ✅ Good | Docker Desktop recommended |
| Raspberry Pi | ✅ Good | ARM builds available |
Part 3: Docker Installation (Recommended)
Docker is the recommended installation method as it simplifies setup, updates, and dependency management.
Step 1: Install Docker
Linux (Ubuntu/Debian)
# Remove old Docker versions
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do
sudo apt-get remove -y $pkg 2>/dev/null
done
# Install prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg |
sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
# Add your user to the docker group
sudo usermod -aG docker $USER
newgrp docker
# Verify installation
docker --version
docker compose version
Command Explanation:
- apt-get remove: Removes conflicting old Docker packages
- curl: Downloads Docker's GPG key for package verification
- gpg --dearmor: Converts the key to the correct format
- usermod -aG docker $USER: Allows running Docker without sudo
Linux (Fedora/RHEL/CentOS)
# Remove old versions
sudo dnf remove docker docker-client docker-client-latest \
  docker-common docker-latest docker-latest-logrotate \
  docker-logrotate docker-selinux docker-engine-selinux docker-engine
# Install Docker
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo \
  https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
# Start and enable Docker
sudo systemctl start docker
sudo systemctl enable docker
# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker
# Verify
docker --version
Windows (Docker Desktop)
- Enable Virtualization in BIOS/UEFI
- Install WSL2 (PowerShell as Administrator):
wsl --install  # Restart computer when prompted
- Download Docker Desktop from docker.com
- During installation, ensure "Use WSL 2 instead of Hyper-V" is checked
- Configure Docker Desktop:
- Settings → Resources → WSL Integration
- Enable integration with your default WSL distro
- Verify installation:
docker --version
docker compose version
macOS (Docker Desktop)
# Install via Homebrew (recommended)
brew install --cask docker
# Or download from docker.com:
# - Apple Silicon (M1/M2/M3/M4): "Apple Chip" version
# - Intel Macs: "Intel Chip" version
After installation:
- Launch Docker from Applications
- Configure Resources (Settings → Resources → at least 4GB RAM)
- Verify:
docker --version
docker compose version
Step 2: Create Project Directory
# Linux/macOS
mkdir -p ~/monitoring-stack
cd ~/monitoring-stack
# Windows PowerShell
mkdir C:\Users\$env:USERNAME\Documents\monitoring-stack
cd C:\Users\$env:USERNAME\Documents\monitoring-stack
Step 3: Create Docker Compose Configuration
Create a file named docker-compose.yml:
# docker-compose.yml
# Complete Prometheus + Grafana Monitoring Stack
# Version: 2026-01
services:
# ===========================================
# PROMETHEUS - Metrics Collection & Storage
# ===========================================
prometheus:
image: prom/prometheus:v3.9.0
container_name: prometheus
restart: unless-stopped
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/rules:/etc/prometheus/rules:ro
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--storage.tsdb.retention.time=30d'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
- '--web.enable-lifecycle'
networks:
- monitoring
# ===========================================
# GRAFANA - Visualization & Dashboards
# ===========================================
grafana:
image: grafana/grafana:12.3.0
container_name: grafana
restart: unless-stopped
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=admin123
- GF_USERS_ALLOW_SIGN_UP=false
- GF_SERVER_ROOT_URL=http://localhost:3000
- GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
depends_on:
- prometheus
networks:
- monitoring
# ===========================================
# NODE EXPORTER - Linux System Metrics
# ===========================================
node-exporter:
image: prom/node-exporter:latest
container_name: node-exporter
restart: unless-stopped
ports:
- "9100:9100"
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--path.rootfs=/rootfs'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
networks:
- monitoring
# ===========================================
# cADVISOR - Container Metrics
# ===========================================
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
container_name: cadvisor
restart: unless-stopped
ports:
- "8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
privileged: true
devices:
- /dev/kmsg
networks:
- monitoring
# ===========================================
# ALERTMANAGER - Alert Routing & Notifications
# ===========================================
alertmanager:
image: prom/alertmanager:latest
container_name: alertmanager
restart: unless-stopped
ports:
- "9093:9093"
volumes:
- ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
- alertmanager-data:/alertmanager
command:
- '--config.file=/etc/alertmanager/alertmanager.yml'
- '--storage.path=/alertmanager'
networks:
- monitoring
# ===========================================
# BLACKBOX EXPORTER - Endpoint Probing
# ===========================================
blackbox-exporter:
image: prom/blackbox-exporter:latest
container_name: blackbox-exporter
restart: unless-stopped
ports:
- "9115:9115"
volumes:
- ./blackbox/blackbox.yml:/etc/blackbox_exporter/config.yml:ro
command:
- '--config.file=/etc/blackbox_exporter/config.yml'
networks:
- monitoring
# ===========================================
# NETWORKS
# ===========================================
networks:
monitoring:
driver: bridge
# ===========================================
# VOLUMES (Persistent Data)
# ===========================================
volumes:
prometheus-data:
grafana-data:
alertmanager-data:
Step 4: Create Configuration Files
Create Directory Structure
# Linux/macOS
mkdir -p prometheus/rules grafana/provisioning/datasources grafana/provisioning/dashboards alertmanager blackbox
# Windows PowerShell
New-Item -ItemType Directory -Path prometheus\rules, grafana\provisioning\datasources, grafana\provisioning\dashboards, alertmanager, blackbox -Force
Prometheus Configuration
Create prometheus/prometheus.yml:
# prometheus/prometheus.yml
# Prometheus Configuration File
global:
scrape_interval: 15s # How often to scrape targets
evaluation_interval: 15s # How often to evaluate rules
scrape_timeout: 10s # Timeout for scrape requests
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
# Rule files for alerts
rule_files:
- /etc/prometheus/rules/*.yml
# Scrape configurations
scrape_configs:
# ===========================================
# Prometheus itself
# ===========================================
- job_name: 'prometheus'
static_configs:
- targets: ['prometheus:9090']
labels:
instance: 'prometheus-server'
# ===========================================
# Node Exporter (Linux system metrics)
# ===========================================
- job_name: 'node-exporter'
static_configs:
- targets: ['node-exporter:9100']
labels:
instance: 'docker-host'
# ===========================================
# cAdvisor (Docker container metrics)
# ===========================================
- job_name: 'cadvisor'
static_configs:
- targets: ['cadvisor:8080']
labels:
instance: 'docker-containers'
# ===========================================
# Alertmanager
# ===========================================
- job_name: 'alertmanager'
static_configs:
- targets: ['alertmanager:9093']
# ===========================================
# Blackbox Exporter - Website Monitoring
# ===========================================
- job_name: 'blackbox-http'
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- https://google.com
- https://github.com
# Add your websites here
labels:
probe_type: 'website'
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter:9115
# ===========================================
# Add more targets here
# ===========================================
# Example: Additional Linux servers
# - job_name: 'linux-servers'
# static_configs:
# - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
#
# Example: Windows servers
# - job_name: 'windows-servers'
# static_configs:
# - targets: ['192.168.1.20:9182', '192.168.1.21:9182']
Alertmanager Configuration
Create alertmanager/alertmanager.yml:
# alertmanager/alertmanager.yml
# Alertmanager Configuration
global:
resolve_timeout: 5m
# SMTP settings for email notifications (optional)
# smtp_smarthost: 'smtp.gmail.com:587'
# smtp_from: 'alertmanager@example.com'
# smtp_auth_username: 'your-email@gmail.com'
# smtp_auth_password: 'your-app-password'
route:
# Default receiver
receiver: 'default-receiver'
# Group alerts by these labels
group_by: ['alertname', 'severity']
# Wait before sending first notification
group_wait: 30s
# Wait before sending updates
group_interval: 5m
# Resend interval
repeat_interval: 4h
# Child routes for specific alerts
routes:
- match:
severity: critical
receiver: 'critical-alerts'
continue: true
- match:
severity: warning
receiver: 'warning-alerts'
receivers:
- name: 'default-receiver'
# Webhook receiver (e.g., for Discord, custom endpoints)
# webhook_configs:
# - url: 'http://your-webhook-url'
- name: 'critical-alerts'
# Email for critical alerts
# email_configs:
# - to: 'admin@example.com'
# subject: '🚨 CRITICAL: {{ .GroupLabels.alertname }}'
- name: 'warning-alerts'
# Slack integration
# slack_configs:
# - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
# channel: '#alerts'
# title: '⚠️ {{ .GroupLabels.alertname }}'
# Inhibition rules (prevent duplicate alerts)
inhibit_rules:
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
equal: ['alertname', 'instance']
Prometheus Alert Rules
Create prometheus/rules/alerts.yml:
# prometheus/rules/alerts.yml
# Prometheus Alert Rules
groups:
- name: system-alerts
rules:
# ===========================================
# Instance Down Alert
# ===========================================
- alert: InstanceDown
expr: up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} is down"
description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
# ===========================================
# High CPU Usage
# ===========================================
- alert: HighCpuUsage
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage on {{ $labels.instance }}"
description: 'CPU usage is above 80% (current: {{ $value | printf "%.2f" }}%)'
# ===========================================
# High Memory Usage
# ===========================================
- alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage on {{ $labels.instance }}"
description: 'Memory usage is above 85% (current: {{ $value | printf "%.2f" }}%)'
# ===========================================
# Low Disk Space
# ===========================================
- alert: LowDiskSpace
expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 15
for: 5m
labels:
severity: warning
annotations:
summary: "Low disk space on {{ $labels.instance }}"
description: 'Disk {{ $labels.mountpoint }} has less than 15% free space (current: {{ $value | printf "%.2f" }}%)'
# ===========================================
# Container Restart
# ===========================================
- alert: ContainerRestarted
expr: increase(container_last_seen{name!=""}[5m]) > 1
for: 0m
labels:
severity: info
annotations:
summary: "Container {{ $labels.name }} restarted"
description: "Container {{ $labels.name }} has been restarted."
- name: website-alerts
rules:
# ===========================================
# Website Down
# ===========================================
- alert: WebsiteDown
expr: probe_success == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Website {{ $labels.instance }} is down"
description: "The website {{ $labels.instance }} has been unreachable for more than 1 minute."
# ===========================================
# SSL Certificate Expiring
# ===========================================
- alert: SSLCertificateExpiringSoon
expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
for: 1h
labels:
severity: warning
annotations:
summary: "SSL certificate expiring soon for {{ $labels.instance }}"
description: 'SSL certificate will expire in {{ $value | printf "%.0f" }} days'
# ===========================================
# Slow Website Response
# ===========================================
- alert: SlowWebsiteResponse
expr: probe_duration_seconds > 2
for: 5m
labels:
severity: warning
annotations:
summary: "Slow response from {{ $labels.instance }}"
description: 'Website responding slowly ({{ $value | printf "%.2f" }}s)'
Blackbox Exporter Configuration
Create blackbox/blackbox.yml:
# blackbox/blackbox.yml
# Blackbox Exporter Configuration
modules:
# ===========================================
# HTTP 2xx Check (Standard website check)
# ===========================================
http_2xx:
prober: http
timeout: 10s
http:
valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
valid_status_codes: [200, 201, 202, 204, 301, 302, 303, 307, 308]
method: GET
follow_redirects: true
fail_if_ssl: false
fail_if_not_ssl: false
tls_config:
insecure_skip_verify: false
preferred_ip_protocol: "ip4"
# ===========================================
# HTTP POST Check
# ===========================================
http_post_2xx:
prober: http
timeout: 10s
http:
method: POST
valid_status_codes: [200, 201, 202, 204]
# ===========================================
# HTTPS with SSL Verification
# ===========================================
https_2xx:
prober: http
timeout: 10s
http:
valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
valid_status_codes: [200]
method: GET
fail_if_not_ssl: true
tls_config:
insecure_skip_verify: false
# ===========================================
# TCP Check (Port connectivity)
# ===========================================
tcp_connect:
prober: tcp
timeout: 10s
# ===========================================
# DNS Check
# ===========================================
dns_check:
prober: dns
timeout: 10s
dns:
query_name: "google.com"
query_type: "A"
valid_rcodes:
- NOERROR
# ===========================================
# ICMP Ping Check (requires privileged mode)
# ===========================================
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"
Grafana Data Source Provisioning
Create grafana/provisioning/datasources/datasources.yml:
# grafana/provisioning/datasources/datasources.yml
# Auto-provision Prometheus as data source
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: true
jsonData:
timeInterval: "15s"
httpMethod: "POST"
Step 5: Start the Monitoring Stack
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# Check service status
docker compose ps
Expected output:
NAME STATUS PORTS
alertmanager Up 0.0.0.0:9093->9093/tcp
blackbox-exporter Up 0.0.0.0:9115->9115/tcp
cadvisor Up 0.0.0.0:8080->8080/tcp
grafana Up 0.0.0.0:3000->3000/tcp
node-exporter Up 0.0.0.0:9100->9100/tcp
prometheus Up 0.0.0.0:9090->9090/tcp
Step 6: Access Web Interfaces
| Service | URL | Default Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin123 |
| Prometheus | http://localhost:9090 | No auth |
| Alertmanager | http://localhost:9093 | No auth |
| Node Exporter | http://localhost:9100/metrics | No auth |
| cAdvisor | http://localhost:8080 | No auth |
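Once the stack is up, the same health checks can be scripted instead of clicked through. This sketch assumes the default ports from the compose file above; `/-/healthy` is the standard Prometheus/Alertmanager health path and `/api/health` is Grafana's.

```shell
# Probe each service's health endpoint; report OK/DOWN without aborting.
status=""
for url in http://localhost:9090/-/healthy \
           http://localhost:9093/-/healthy \
           http://localhost:3000/api/health; do
  if curl -fsS --max-time 3 "$url" > /dev/null 2>&1; then
    status="$status OK"
    echo "OK   $url"
  else
    status="$status DOWN"
    echo "DOWN $url"
  fi
done
```

The loop exits cleanly either way, so it is safe to drop into a cron job or CI smoke test.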
Part 4: Native Installation (Without Docker)
Linux Installation (Ubuntu/Debian)
Install Prometheus
# Create Prometheus user
sudo useradd --no-create-home --shell /bin/false prometheus
# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus
# Download Prometheus (check for latest version)
PROM_VERSION="3.9.0"
wget https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz
# Extract and install
tar xvfz prometheus-${PROM_VERSION}.linux-amd64.tar.gz
cd prometheus-${PROM_VERSION}.linux-amd64
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
# Set ownership
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
Create Prometheus Systemd Service
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=30d \
  --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
# Reload systemd and start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Install Grafana
# Add Grafana GPG key and repository
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
# Install Grafana
sudo apt-get update
sudo apt-get install -y grafana
# Start Grafana
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server
Install Node Exporter
# Create node_exporter user
sudo useradd --no-create-home --shell /bin/false node_exporter
# Download Node Exporter
NODE_VERSION="1.8.2"
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
# Extract and install
tar xvfz node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
sudo cp node_exporter-${NODE_VERSION}.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
# Create systemd service
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --collector.filesystem.mount-points-exclude="^/(sys|proc|dev|host|etc)($$|/)"
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
macOS Installation (Homebrew)
# Update Homebrew
brew update
# Install Prometheus
brew install prometheus
# Install Grafana
brew install grafana
# Start services
brew services start prometheus
brew services start grafana
# Verify
brew services list Configuration file locations on macOS:
- Prometheus: /opt/homebrew/etc/prometheus.yml (Apple Silicon) or /usr/local/etc/prometheus.yml (Intel)
- Grafana: /opt/homebrew/etc/grafana/grafana.ini (Apple Silicon)
Access:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (default: admin/admin)
Windows Installation
Install Prometheus on Windows
- Download Prometheus from prometheus.io/download
- Extract to C:\prometheus
- Create configuration file C:\prometheus\prometheus.yml
- Run Prometheus:
cd C:\prometheus
.\prometheus.exe --config.file=prometheus.yml
Create Windows Service (PowerShell as Admin):
# Using NSSM (Non-Sucking Service Manager)
# Download NSSM from nssm.cc
nssm install Prometheus C:\prometheus\prometheus.exe
nssm set Prometheus AppParameters "--config.file=C:\prometheus\prometheus.yml"
nssm set Prometheus AppDirectory C:\prometheus
nssm start Prometheus
Install Windows Exporter
- Download Windows Exporter MSI from GitHub releases
- Install the MSI (creates Windows service automatically)
- Verify at http://localhost:9182/metrics
Custom installation with specific collectors:
# Install with specific collectors
msiexec /i windows_exporter-0.29.0-amd64.msi ENABLED_COLLECTORS="cpu,cs,logical_disk,net,os,service,system"
Install Grafana on Windows
- Download Grafana from grafana.com/grafana/download
- Extract to C:\grafana
- Run:
cd C:\grafana\bin
.\grafana-server.exe
- Install as Service (PowerShell as Admin):
nssm install Grafana C:\grafana\bin\grafana-server.exe
nssm set Grafana AppDirectory C:\grafana
nssm start Grafana
Part 5: Creating Grafana Dashboards
Importing Pre-Built Dashboards
Grafana has thousands of community dashboards available. Here are the most useful ones:
| Dashboard | ID | Description |
|---|---|---|
| Node Exporter Full | 1860 | Comprehensive Linux metrics |
| Docker Container & Host Metrics | 893 | Docker + host monitoring |
| cAdvisor Exporter | 14282 | Container resource usage |
| Prometheus 2.0 Overview | 3662 | Prometheus server stats |
| Blackbox Exporter | 7587 | Website uptime monitoring |
| Home Server / Homelab | 15306 | Home lab overview |
To import a dashboard:
- Go to Grafana → Dashboards → New → Import
- Enter the Dashboard ID
- Select your Prometheus data source
- Click Import
Creating Custom Dashboards
Basic PromQL Queries
| Metric | PromQL Query | Description |
|---|---|---|
| CPU Usage | 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) | Overall CPU usage % |
| Memory Usage | (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 | Memory usage % |
| Disk Usage | 100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100) | Disk usage % |
| Network In | rate(node_network_receive_bytes_total[5m]) | Network receive rate |
| Network Out | rate(node_network_transmit_bytes_total[5m]) | Network transmit rate |
| Container CPU | rate(container_cpu_usage_seconds_total[5m]) * 100 | Container CPU usage |
| Container Memory | container_memory_usage_bytes | Container memory usage |
| Website Up | probe_success | 1 = up, 0 = down |
| Response Time | probe_duration_seconds | Website response time |
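As a sanity check on the memory row above, the same percentage can be computed directly from /proc/meminfo, which is where node_exporter reads MemTotal and MemAvailable from. A small awk sketch (Linux-only, since macOS has no /proc):

```shell
# Same formula as the PromQL table row: (1 - MemAvailable/MemTotal) * 100,
# computed from a /proc/meminfo-style file (values are in kB).
mem_usage_pct() {  # usage: mem_usage_pct <meminfo-file>
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2} END {printf "%.1f\n", (1 - a / t) * 100}' "$1"
}

if [ -r /proc/meminfo ]; then
  mem_usage_pct /proc/meminfo
fi
```

If this number and the Grafana panel disagree noticeably, the usual culprit is the panel averaging over a long time range rather than showing the instant value.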
PromQL Tips
# Average over time
avg_over_time(node_load1[1h])
# Rate of change
rate(node_network_receive_bytes_total[5m])
# Increase (counter increment)
increase(http_requests_total[1h])
# Top 5 by value
topk(5, node_filesystem_size_bytes)
# Filtering by label
node_cpu_seconds_total{mode="idle", instance="server1:9100"}
# Regex matching
node_filesystem_avail_bytes{mountpoint=~"/|/home"}
# Aggregation by label
sum by (instance) (rate(node_network_receive_bytes_total[5m]))

Part 6: Alerting Configuration
Grafana Alerting
Setting Up Email Notifications
Configure SMTP in grafana.ini:

[smtp]
enabled = true
host = smtp.gmail.com:587
user = your-email@gmail.com
password = your-app-password
from_address = grafana@yourdomain.com
from_name = Grafana Alerts

Create Contact Point in Grafana:
- Alerting → Contact points → Add contact point
- Type: Email
- Enter recipient addresses
Setting Up Slack Notifications
- Create Slack App at api.slack.com/apps
- Enable Incoming Webhooks
- Copy Webhook URL
- Create Contact Point in Grafana:
- Type: Slack
- Paste Webhook URL
Setting Up Discord Notifications
- Create Discord Webhook:
- Server Settings → Integrations → Webhooks → Create Webhook
- Copy Webhook URL
- Create Contact Point in Grafana:
- Type: Discord (or Webhook)
- Paste Discord Webhook URL
Part 7: Integrations with Other Data Sources
Connecting InfluxDB
InfluxDB is a time-series database ideal for IoT and high-write scenarios.
# Add to docker-compose.yml
influxdb:
image: influxdb:2.7
container_name: influxdb
ports:
- "8086:8086"
volumes:
- influxdb-data:/var/lib/influxdb2
environment:
- DOCKER_INFLUXDB_INIT_MODE=setup
- DOCKER_INFLUXDB_INIT_USERNAME=admin
- DOCKER_INFLUXDB_INIT_PASSWORD=password123
- DOCKER_INFLUXDB_INIT_ORG=homelab
- DOCKER_INFLUXDB_INIT_BUCKET=metrics
networks:
- monitoring

In Grafana:
- Configuration → Data Sources → Add data source
- Select InfluxDB
- Configure:
- Query Language: Flux
- URL: http://influxdb:8086
- Organization: homelab
- Token: (your InfluxDB token)
- Default Bucket: metrics
Prometheus Remote Write to InfluxDB:
You can configure Prometheus to write metrics directly to InfluxDB for long-term storage:
# Add to prometheus.yml
remote_write:
- url: http://influxdb:8086/api/v1/prom/write?db=prometheus&u=admin&p=password123

Note: the /api/v1/prom/write endpoint shown here is native to InfluxDB 1.x; with an InfluxDB 2.x deployment like the one above, remote_write is typically routed through Telegraf instead.

Use Case: Store long-term metrics in InfluxDB while using Prometheus for real-time monitoring. Ideal for IoT and home sensor data.
Connecting Elasticsearch
Elasticsearch is powerful for log analysis and full-text search.
# Add to docker-compose.yml
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
container_name: elasticsearch
environment:
- discovery.type=single-node
- xpack.security.enabled=false
- ES_JAVA_OPTS=-Xms512m -Xmx512m
ports:
- "9200:9200"
volumes:
- elasticsearch-data:/usr/share/elasticsearch/data
networks:
- monitoring

In Grafana:
- Configuration → Data Sources → Add data source
- Select Elasticsearch
- Configure:
- URL: http://elasticsearch:9200
- Index name: your-index-pattern-*
- Time field name: @timestamp
Elasticsearch Exporter for Prometheus:
Scrape Elasticsearch cluster metrics into Prometheus:
docker run -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter --es.uri=http://elasticsearch:9200

Add to Prometheus scrape config:
- job_name: 'elasticsearch'
static_configs:
- targets: ['elasticsearch-exporter:9114']

Use Case: Combine metrics with logs for correlation (e.g., high CPU + error logs). Log home network events and alert on anomalies via Prometheus, then drill into logs in Grafana.
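Whichever exporter you wire up, it helps to pull a single value out of the /metrics exposition text when debugging a scrape. A small awk helper sketch (the metric name in the example is just an assumption about what the exporter exposes):

```shell
# Print the value of one metric from Prometheus exposition-format text,
# where lines look like: metric_name{labels} 1.23
# Matches on the bare metric name, with or without a label set.
metric_value() {  # usage: metric_value <metric_name> < metrics.txt
  awk -v m="$1" '$1 == m || index($1, m "{") == 1 {print $2}'
}

# Example usage (assumes the exporter above is reachable):
# curl -s http://localhost:9114/metrics | metric_value elasticsearch_cluster_health_status
```

Because it matches on the first whitespace-separated field, HELP/TYPE comment lines are ignored automatically.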
Connecting SQL Databases
PostgreSQL
# Add to docker-compose.yml
postgres:
image: postgres:16
container_name: postgres
environment:
- POSTGRES_DB=monitoring
- POSTGRES_USER=grafana
- POSTGRES_PASSWORD=password123
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
networks:
- monitoring

In Grafana:
- Configuration → Data Sources → Add data source
- Select PostgreSQL
- Configure:
- Host: postgres:5432
- Database: monitoring
- User: grafana
- Password: password123
MySQL
In Grafana:
- Configuration → Data Sources → Add data source
- Select MySQL
- Configure connection details
SQL Query Example:
SELECT
timestamp AS "time",
cpu_usage,
memory_usage
FROM server_stats
WHERE host = '$host'
ORDER BY timestamp

Use Case: Visualize relational data alongside metrics. Track personal finance data, application logs stored in DB, or business analytics dashboards.
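To see that query shape working end-to-end without a live Postgres, here is a throwaway mock using sqlite3 as a stand-in; the server_stats table and column names mirror the example above, and $host is replaced with a literal since Grafana only substitutes dashboard variables at query time:

```shell
# Throwaway mock of the server_stats table; sqlite3 stands in for PostgreSQL.
db=$(mktemp)
sqlite3 "$db" <<'SQL'
CREATE TABLE server_stats (timestamp TEXT, host TEXT, cpu_usage REAL, memory_usage REAL);
INSERT INTO server_stats VALUES ('2026-01-12 10:00:00', 'server1', 42.5, 61.2);
SELECT timestamp AS "time", cpu_usage, memory_usage
FROM server_stats
WHERE host = 'server1'
ORDER BY timestamp;
SQL
```

Against real Postgres the only changes are the connection (psql instead of sqlite3) and proper timestamp/numeric column types.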
Part 8: Home Lab Use Cases
Use Case 1: Raspberry Pi Monitoring
Monitor temperature, CPU, and memory on Raspberry Pi:
# Extra Prometheus scrape config for Pi
- job_name: 'raspberry-pi'
static_configs:
- targets: ['192.168.1.50:9100']
labels:
device: 'raspberry-pi-4'

Temperature monitoring query:
node_hwmon_temp_celsius{chip="cpu_thermal"}

Use Case 2: Smart Home / IoT Monitoring
Monitor MQTT-based sensors with custom exporters:
# Add MQTT exporter
mqtt-exporter:
image: kpetrem/mqtt-exporter:latest
container_name: mqtt-exporter
environment:
- MQTT_ADDRESS=your-mqtt-broker
- MQTT_TOPIC=sensors/#
ports:
- "9344:9344"
networks:
- monitoring

Use Case 3: Network Device Monitoring (SNMP)
Monitor routers, switches, and other SNMP devices:
snmp-exporter:
image: prom/snmp-exporter:latest
container_name: snmp-exporter
ports:
- "9116:9116"
volumes:
- ./snmp/snmp.yml:/etc/snmp_exporter/snmp.yml
networks:
- monitoring

Use Case 4: Docker Container Monitoring
Monitor all your containerized services:
# Container CPU usage
rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100
# Container memory
container_memory_usage_bytes{name!=""} / 1024 / 1024
# Container network I/O
rate(container_network_receive_bytes_total[5m])

Use Case 5: Website Uptime Monitoring
Monitor your personal websites and services:
# Add to prometheus.yml scrape_configs
- job_name: 'my-websites'
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- https://myblog.com
- https://myapp.example.com
- http://192.168.1.100:8080
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter:9115

Part 9: Maintenance and Best Practices
Updating Components
Docker Update
# Pull latest images
docker compose pull
# Recreate containers with new images
docker compose up -d
# Remove old images
docker image prune -f

Native Installation Update
# Linux - Prometheus
sudo systemctl stop prometheus
# Download new version and replace binary
sudo systemctl start prometheus
# Linux - Grafana
sudo apt-get update && sudo apt-get install --only-upgrade grafana
# macOS
brew upgrade prometheus grafana
brew services restart prometheus grafana

Backup Strategies
Docker Volumes Backup
# Create backup directory
mkdir -p ~/monitoring-backups
# Backup Prometheus data
docker run --rm \
  -v monitoring-stack_prometheus-data:/data \
  -v ~/monitoring-backups:/backup \
  alpine tar cvf /backup/prometheus-backup.tar /data
# Backup Grafana data
docker run --rm \
  -v monitoring-stack_grafana-data:/data \
  -v ~/monitoring-backups:/backup \
  alpine tar cvf /backup/grafana-backup.tar /data

Retention and Storage
# Prometheus command flags
command:
- '--storage.tsdb.retention.time=90d' # Keep 90 days of data
- '--storage.tsdb.retention.size=10GB' # Or limit by size

Security Recommendations
- Change default passwords immediately
- Use HTTPS with a reverse proxy (Nginx, Traefik, Caddy)
- Enable authentication on Prometheus and Alertmanager
- Use firewall rules to limit access
- Regular updates for security patches
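On the authentication point: Prometheus (since v2.24) can enforce basic auth itself via a web config file, no reverse proxy required. A minimal sketch — the bcrypt hash below is a placeholder; generate your own with `htpasswd -nBC 10 "" | tr -d ':\n'`:

```yaml
# web.yml — pass to Prometheus with --web.config.file=/etc/prometheus/web.yml
basic_auth_users:
  # username: bcrypt hash of the password (placeholder, replace it)
  admin: $2y$10$REPLACE_WITH_YOUR_OWN_BCRYPT_HASH
```

Once enabled, every endpoint (including /metrics and the API) requires credentials, so remember to add basic_auth to any scrape jobs that target Prometheus itself.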
Part 10: Troubleshooting
Common Issues and Solutions
| Issue | Cause | Solution |
|---|---|---|
| Prometheus not scraping | Target unreachable | Check network, firewall, target status |
| Grafana canโt connect to Prometheus | Wrong URL | Use Docker service name (e.g., http://prometheus:9090) |
| Node Exporter permission denied | Missing volumes | Ensure /proc, /sys, / are mounted |
| cAdvisor not starting | Missing privileges | Add privileged: true and device mounts |
| High memory usage | Too many metrics | Enable metric filtering, reduce retention |
| Alerts not firing | Wrong expression | Test query in Prometheus UI first |
Useful Commands
# Check Prometheus targets
curl http://localhost:9090/api/v1/targets
# Check active alerts
curl http://localhost:9090/api/v1/alerts
# Reload Prometheus config (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
# Check Grafana health
curl http://localhost:3000/api/health
# View container logs
docker compose logs prometheus
docker compose logs grafana

Conclusion
You now have a complete understanding of Grafana and Prometheus for monitoring. This guide covered:
- ✅ Architecture and components of both systems
- ✅ Docker and native installation on all platforms
- ✅ Configuration of exporters for various use cases
- ✅ Creating dashboards and writing PromQL queries
- ✅ Setting up alerting with multiple notification channels
- ✅ Integrating with InfluxDB, Elasticsearch, and SQL databases
- ✅ Practical home lab monitoring scenarios
- ✅ Maintenance, backup, and security best practices
Next Steps
- Explore Grafana Loki for log aggregation (see companion guide)
- Set up Grafana Tempo for distributed tracing
- Consider Grafana Mimir for long-term metrics storage
- Kubernetes Home Lab: Use kube-prometheus-stack Helm chart:
helm install prom prometheus-community/kube-prometheus-stack
- Join communities: r/homelab, r/selfhosted, Grafana Community Forums
Validation Tip: Test scraping by running curl http://localhost:9090/metrics. In Grafana, verify data source connectivity via the "Test" button.
Resources
- Prometheus Documentation
- Grafana Documentation
- Grafana Dashboard Repository
- Awesome Prometheus
- PromQL Cheat Sheet
💡 Tip: For advanced log aggregation and a complete observability stack, see our companion guide: "Grafana Loki & Advanced Observability: Complete Home Lab Guide"