๐ŸŽฏ New! Master certifications with Performance-Based Questions (PBQ) โ€” realistic hands-on practice for CompTIA & Cisco exams!

Grafana & Prometheus Complete Guide 2026: The Ultimate Monitoring Stack for Home Labs and Production

Published on January 12, 2026


Introduction

Grafana and Prometheus form the de facto standard monitoring stack for modern infrastructure. Whether youโ€™re running a small home lab with a Raspberry Pi or managing enterprise Kubernetes clusters, this powerful combination provides real-time visibility into your systems with beautiful visualizations and intelligent alerting.

This comprehensive guide covers everything from basic concepts to advanced configurations, including Docker deployments, cross-platform installations, alerting, and integrations with other data sources like InfluxDB, Elasticsearch, and SQL databases.

What Youโ€™ll Learn

  • Complete understanding of Prometheus and Grafana architecture
  • Step-by-step installation on Windows, macOS, and Linux
  • Docker and Docker Compose deployment
  • Configuring exporters (Node Exporter, Windows Exporter, cAdvisor, Blackbox)
  • Creating powerful dashboards with PromQL
  • Setting up alerts with Alertmanager
  • Integrating multiple data sources
  • Home lab monitoring scenarios with practical examples

Key Terminology

| Term | Definition |
| --- | --- |
| Prometheus | Open-source monitoring system that collects metrics via a pull-based model |
| Grafana | Open-source visualization platform for creating interactive dashboards |
| Exporter | Agent that exposes metrics in Prometheus format from applications or systems |
| PromQL | Prometheus Query Language for querying time-series data |
| Alertmanager | Handles alert deduplication, grouping, routing, and notifications |
| Scrape | The process of Prometheus collecting metrics from endpoints |
| Time Series | Data points indexed by time, the core data type for monitoring |
| Label | Key-value pair that identifies a metric dimension (e.g., host="server1") |
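
Scrapes, labels, and time series are easiest to see in the exposition format itself. The sketch below fakes a tiny `/metrics` payload in shell and pulls one labeled series out of it (the metric names and values are invented for illustration):

```shell
# Fake /metrics payload in the Prometheus exposition format:
sample='# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027
http_requests_total{method="POST",status="500"} 3'

# Each line is one time series: metric name + labels, then the sample value.
get_value() {
  printf '%s\n' "$sample" | awk -v series="$1" '$1 == series { print $2 }'
}

get_value 'http_requests_total{method="GET",status="200"}'
```

This is exactly what a scrape does: fetch the text above over HTTP and record each `name{labels} value` line as a timestamped sample.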

Estimated Time

| Task | Estimated Time |
| --- | --- |
| Basic Docker Setup | 15-30 minutes |
| Manual Installation (Single OS) | 30-60 minutes |
| Full Stack with Alerting | 1-2 hours |
| Home Lab Complete Setup | 2-4 hours |

Part 1: Understanding the Monitoring Stack

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit, originally developed at SoundCloud in 2012. It’s now a CNCF (Cloud Native Computing Foundation) graduated project, widely adopted for its reliability in dynamic environments like Kubernetes. It collects and stores metrics as time-series data: numerical values with timestamps, plus optional key-value labels that enable multi-dimensional querying.

Current Version: Prometheus v3.9.0 (released January 6, 2026)

Prometheus v3.9.0 Key Features

| Feature | Description |
| --- | --- |
| Multi-dimensional Data Model | Metrics identified by name and labels (e.g., http_requests_total{method="GET", status="200"}) |
| PromQL Query Language | Flexible for aggregating, filtering, and analyzing data |
| Pull Model | Scrapes metrics over HTTP from targets; supports push via gateways |
| Autonomous Servers | No distributed storage dependency; standalone for high reliability |
| Service Discovery | Integrates with Kubernetes, Consul, or static configs |
| Enhanced UTF-8 Support | Full UTF-8 compatibility for metric and label names |
| Improved UI | Modern interface with PromLens-style tree view |
| Remote Write 2.0 | Enhanced data handling with metadata, exemplars, and native histograms |
| OpenTelemetry Support | Native OTLP metrics ingestion |
| Native Histograms | More efficient histogram implementation |
| Bug Fixes | Improved scraping and alerting stability from v3.8.x |
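
PromQL’s counter functions are simpler than they look. `rate()` is essentially (last sample minus first sample) divided by the window length, giving events per second. A hand calculation with hypothetical counter values:

```shell
# rate() ~= (last sample - first sample) / window, in events per second.
per_second_rate() {
  awk -v a="$1" -v b="$2" -v w="$3" 'BEGIN { printf "%.1f\n", (b - a) / w }'
}

# Counter went from 1000 to 1150 over a 60s window:
per_second_rate 1000 1150 60
```

This is why counters must only ever increase: `rate()` (and `increase()`) interpret any drop as a counter reset.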

How Prometheus Works

+-------------------------------------------------------------------+
|                    PROMETHEUS ARCHITECTURE                        |
+-------------------------------------------------------------------+
|                                                                   |
|   TARGETS                     PROMETHEUS SERVER                   |
|   +--------------+           +------------------------------+     |
|   | Node Exporter|-----+     | +-------------------------+  |     |
|   |  :9100       |     |     | |     Retrieval           |  |     |
|   +--------------+     |     | |   (Pull Metrics)        |  |     |
|   +--------------+     |     | +-----------+-------------+  |     |
|   | cAdvisor     |-----+---->|             v                |     |
|   |  :8080       |     |     | +-------------------------+  |     |
|   +--------------+     |     | |   Time Series DB (TSDB) |  |     |
|   +--------------+     |     | |   (Local Storage)       |  |     |
|   | App Metrics  |-----+     | +-----------+-------------+  |     |
|   |  :8000       |           |             v                |     |
|   +--------------+           | +-------------------------+  |     |
|                              | |   PromQL Engine         |  |     |
|   PUSHGATEWAY                | |   (Query Processing)    |  |     |
|   +--------------+           | +-----------+-------------+  |     |
|   | Batch Jobs   |---------->|             |                |     |
|   |  :9091       |           | +-----------v-------------+  |     |
|   +--------------+           | |   HTTP Server :9090     |  |     |
|                              | |   (Web UI / API)        |  |     |
|                              | +-------------------------+  |     |
|                              +------------------------------+     |
|                                          |                        |
|   ALERTMANAGER                           |                        |
|   +------------------+                   |                        |
|   | Deduplication    |<------------------+                        |
|   | Grouping/Routing |                                            |
|   | Notifications    |--> Email, Slack, PagerDuty, Discord        |
|   |  :9093           |                                            |
|   +------------------+                                            |
|                                                                   |
+-------------------------------------------------------------------+

Core Components

| Component | Purpose | Default Port |
| --- | --- | --- |
| Prometheus Server | Scrapes and stores metrics, provides query API | 9090 |
| Alertmanager | Handles alerts, deduplication, routing | 9093 |
| Pushgateway | Accepts metrics from batch jobs | 9091 |
| Node Exporter | Linux system metrics | 9100 |
| Windows Exporter | Windows system metrics | 9182 |
| cAdvisor | Container metrics | 8080 |
| Blackbox Exporter | Probe endpoints (HTTP, DNS, TCP, ICMP) | 9115 |
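
Once the stack is running, a quick way to confirm each component is listening is to probe its HTTP endpoint. A small sketch, assuming the default ports above and a local `curl` (Prometheus exposes `/-/healthy` and Grafana `/api/health` as health endpoints; adjust hosts to your network):

```shell
# UP/DOWN sweep of the stack's default endpoints (adjust hosts as needed).
check() {
  if curl -fs --max-time 3 -o /dev/null "$1"; then
    echo "UP   $1"
  else
    echo "DOWN $1"
  fi
}

for url in \
  http://localhost:9090/-/healthy \
  http://localhost:9100/metrics \
  http://localhost:3000/api/health
do
  check "$url"
done
```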

What is Grafana?

Grafana is an open-source analytics and visualization platform for querying, visualizing, alerting on, and exploring metrics, logs, and traces from diverse sources. It supports plugins for data sources like Prometheus, Loki, Elasticsearch, and SQL databases. Grafana OSS is free; Enterprise adds premium features like advanced authentication. Grafana Cloud is a hosted version with AI/ML enhancements.

Current Version: Grafana v12.3 (latest stable as of January 2026)

Grafana v12.3 Key Features

| Feature | Description |
| --- | --- |
| 100+ Data Source Plugins | Connect to Prometheus, InfluxDB, SQL databases, and more |
| Dynamic Dashboards | Flexible panel layouts that adapt to screen sizes |
| Dashboard Tabs | Organize complex dashboards with tabbed views |
| SQL Expressions | Join and transform data from multiple sources (JOINs across Loki/BigQuery) |
| Grafana Assistant | AI-powered assistant for queries and dashboard creation |
| Unified Alerting | Alert from any data source with notifications via Slack, email, etc. |
| Enhanced Tables | Faster loading with 40,000+ rows support |
| Explore Mode | Ad-hoc queries for metrics, logs, and traces |
| Variables/Templates | Dynamic filters (e.g., select servers) |
| Observability as Code | Provision dashboards via Terraform/Ansible; Git Sync for IaC |
| Bug Fixes | Improved high-availability setups, minimal breaking changes from v11.x |

Grafana Architecture

+-------------------------------------------------------------------+
|                      GRAFANA ARCHITECTURE                         |
+-------------------------------------------------------------------+
|                                                                   |
|   DATA SOURCES                   GRAFANA SERVER                   |
|   +-----------------+           +-------------------------+       |
|   |   Prometheus    |---------->|                         |       |
|   |   (Metrics)     |           |   +-----------------+   |       |
|   +-----------------+           |   |   Data Source   |   |       |
|   +-----------------+           |   |   Manager       |   |       |
|   |   Loki          |---------->|   +--------+--------+   |       |
|   |   (Logs)        |           |            |            |       |
|   +-----------------+           |   +--------v--------+   |       |
|   +-----------------+           |   |   Query Engine  |   |       |
|   |   InfluxDB      |---------->|   |   (Transform)   |   |       |
|   |   (Time Series) |           |   +--------+--------+   |       |
|   +-----------------+           |            |            |       |
|   +-----------------+           |   +--------v--------+   |       |
|   |   Elasticsearch |---------->|   |  Visualization  |   |       |
|   |   (Logs/Search) |           |   |  (Panels)       |   |       |
|   +-----------------+           |   +--------+--------+   |       |
|   +-----------------+           |            |            |       |
|   |   PostgreSQL    |---------->|   +--------v--------+   |       |
|   |   MySQL, MSSQL  |           |   |   Dashboard     |   |       |
|   +-----------------+           |   |   Renderer      |   |       |
|                                 |   +--------+--------+   |       |
|                                 |            |            |       |
|                                 |   +--------v--------+   |       |
|                                 |   |  HTTP Server    |   |       |
|                                 |   |  :3000          |   |       |
|                                 |   +-----------------+   |       |
|   ALERTING                      |            |            |       |
|   +-----------------+           |            |            |       |
|   |  Email          |<----------+  Alerting  |            |       |
|   |  Slack          |           |  Engine    |            |       |
|   |  Discord        |           |            |            |       |
|   |  PagerDuty      |           |            |            |       |
|   +-----------------+           +------------+------------+       |
|                                                                   |
+-------------------------------------------------------------------+

Grafana Ecosystem (LGTM Stack)

| Component | Purpose | Description |
| --- | --- | --- |
| Loki | Log Aggregation | Like Prometheus but for logs |
| Grafana | Visualization | Dashboards and UI |
| Tempo | Distributed Tracing | Trace storage and querying |
| Mimir | Long-term Metrics | Scalable Prometheus storage |

Part 2: Prerequisites and System Requirements

Hardware Requirements

| Scale | RAM | CPU | Storage |
| --- | --- | --- | --- |
| Home Lab (5-10 hosts) | 2-4 GB | 2 cores | 20 GB SSD |
| Small Business (50 hosts) | 8-16 GB | 4 cores | 100 GB SSD |
| Enterprise (500+ hosts) | 32+ GB | 8+ cores | 500 GB+ NVMe |
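
The storage column can be sanity-checked with the usual back-of-the-envelope formula: disk is roughly active series divided by scrape interval, times retention in seconds, times bytes per sample (about 1-2 bytes after compression). A sketch with hypothetical home-lab numbers:

```shell
# disk bytes ~= (series / scrape_interval) * retention_seconds * bytes_per_sample
estimate_gb() {
  awk -v s="$1" -v i="$2" -v d="$3" -v b="$4" \
    'BEGIN { printf "%.1f GB\n", s / i * d * 86400 * b / 1e9 }'
}

# ~1,000 series per host x 10 hosts, 15s scrapes, 30d retention, 2 B/sample:
estimate_gb 10000 15 30 2
```

Real usage varies with churn and label cardinality, so treat the result as an order-of-magnitude estimate, not a guarantee.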

Software Prerequisites

| Component | Required Version | Purpose |
| --- | --- | --- |
| Docker | 20.10+ | Container runtime |
| Docker Compose | v2.0+ | Multi-container orchestration |
| Git | Any | Version control (optional) |

Operating System Compatibility

| OS | Support Level | Notes |
| --- | --- | --- |
| Linux (Ubuntu/Debian) | ✅ Best | Recommended for production |
| Linux (RHEL/CentOS/Fedora) | ✅ Good | Full support |
| Windows | ✅ Good | Docker Desktop or native |
| macOS | ✅ Good | Docker Desktop recommended |
| Raspberry Pi | ✅ Good | ARM builds available |

Part 3: Installation with Docker Compose (Recommended)

Docker is the recommended installation method as it simplifies setup, updates, and dependency management.

Step 1: Install Docker

Linux (Ubuntu/Debian)

# Remove old Docker versions
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do
  sudo apt-get remove -y $pkg 2>/dev/null
done

# Install prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | 
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin

# Add your user to the docker group
sudo usermod -aG docker $USER
newgrp docker

# Verify installation
docker --version
docker compose version

Command Explanation:

  • apt-get remove: Removes conflicting old Docker packages
  • curl: Downloads Dockerโ€™s GPG key for package verification
  • gpg --dearmor: Converts the key to the correct format
  • usermod -aG docker $USER: Allows running Docker without sudo

Linux (Fedora/RHEL/CentOS)

# Remove old versions
sudo dnf remove docker docker-client docker-client-latest \
  docker-common docker-latest docker-latest-logrotate \
  docker-logrotate docker-selinux docker-engine-selinux docker-engine

# Install Docker
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo \
  https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin

# Start and enable Docker
sudo systemctl start docker
sudo systemctl enable docker

# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker

# Verify
docker --version

Windows (Docker Desktop)

  1. Enable Virtualization in BIOS/UEFI
  2. Install WSL2 (PowerShell as Administrator):
    wsl --install
    # Restart computer when prompted
  3. Download Docker Desktop from docker.com
  4. During installation, ensure โ€œUse WSL 2 instead of Hyper-Vโ€ is checked
  5. Configure Docker Desktop:
    • Settings โ†’ Resources โ†’ WSL Integration
    • Enable integration with your default WSL distro
  6. Verify installation:
    docker --version
    docker compose version

macOS (Docker Desktop)

# Install via Homebrew (recommended)
brew install --cask docker

# Or download from docker.com:
# - Apple Silicon (M1/M2/M3/M4): "Apple Chip" version
# - Intel Macs: "Intel Chip" version

After installation:

  1. Launch Docker from Applications
  2. Configure Resources (Settings โ†’ Resources โ†’ at least 4GB RAM)
  3. Verify:
    docker --version
    docker compose version

Step 2: Create Project Directory

# Linux/macOS
mkdir -p ~/monitoring-stack
cd ~/monitoring-stack
# Windows PowerShell
mkdir C:\Users\$env:USERNAME\Documents\monitoring-stack
cd C:\Users\$env:USERNAME\Documents\monitoring-stack

Step 3: Create Docker Compose Configuration

Create a file named docker-compose.yml:

# docker-compose.yml
# Complete Prometheus + Grafana Monitoring Stack
# Version: 2026-01

services:
  # ===========================================
  # PROMETHEUS - Metrics Collection & Storage
  # ===========================================
  prometheus:
    image: prom/prometheus:v3.9.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  # ===========================================
  # GRAFANA - Visualization & Dashboards
  # ===========================================
  grafana:
    image: grafana/grafana:12.3.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    depends_on:
      - prometheus
    networks:
      - monitoring

  # ===========================================
  # NODE EXPORTER - Linux System Metrics
  # ===========================================
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  # ===========================================
  # cADVISOR - Container Metrics
  # ===========================================
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg
    networks:
      - monitoring

  # ===========================================
  # ALERTMANAGER - Alert Routing & Notifications
  # ===========================================
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring

  # ===========================================
  # BLACKBOX EXPORTER - Endpoint Probing
  # ===========================================
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    restart: unless-stopped
    ports:
      - "9115:9115"
    volumes:
      - ./blackbox/blackbox.yml:/etc/blackbox_exporter/config.yml:ro
    command:
      - '--config.file=/etc/blackbox_exporter/config.yml'
    networks:
      - monitoring

# ===========================================
# NETWORKS
# ===========================================
networks:
  monitoring:
    driver: bridge

# ===========================================
# VOLUMES (Persistent Data)
# ===========================================
volumes:
  prometheus-data:
  grafana-data:
  alertmanager-data:
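
One caveat on the compose file above: the Grafana admin password is hard-coded, which is fine for a throwaway lab but not for anything shared. A sketch of moving it into an `.env` file that Docker Compose reads automatically for variable substitution (the variable name here is arbitrary):

```shell
# .env sits next to docker-compose.yml; Compose substitutes ${VARS} from it.
cat > .env <<'EOF'
GRAFANA_ADMIN_PASSWORD=change-me
EOF

# In docker-compose.yml, the environment line then becomes:
#   - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}

# Keep the file out of version control:
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```

Running `docker compose config` afterwards shows the fully substituted file, which is a quick way to verify the variable resolved.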

Step 4: Create Configuration Files

Create Directory Structure

# Linux/macOS
mkdir -p prometheus/rules grafana/provisioning/datasources grafana/provisioning/dashboards alertmanager blackbox
# Windows PowerShell
New-Item -ItemType Directory -Path prometheus\rules, grafana\provisioning\datasources, grafana\provisioning\dashboards, alertmanager, blackbox -Force

Prometheus Configuration

Create prometheus/prometheus.yml:

# prometheus/prometheus.yml
# Prometheus Configuration File

global:
  scrape_interval: 15s          # How often to scrape targets
  evaluation_interval: 15s       # How often to evaluate rules
  scrape_timeout: 10s           # Timeout for scrape requests

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Rule files for alerts
rule_files:
  - /etc/prometheus/rules/*.yml

# Scrape configurations
scrape_configs:
  # ===========================================
  # Prometheus itself
  # ===========================================
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
        labels:
          instance: 'prometheus-server'

  # ===========================================
  # Node Exporter (Linux system metrics)
  # ===========================================
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          instance: 'docker-host'

  # ===========================================
  # cAdvisor (Docker container metrics)
  # ===========================================
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
        labels:
          instance: 'docker-containers'

  # ===========================================
  # Alertmanager
  # ===========================================
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']

  # ===========================================
  # Blackbox Exporter - Website Monitoring
  # ===========================================
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://google.com
          - https://github.com
          # Add your websites here
        labels:
          probe_type: 'website'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  # ===========================================
  # Add more targets here
  # ===========================================
  # Example: Additional Linux servers
  # - job_name: 'linux-servers'
  #   static_configs:
  #     - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
  #
  # Example: Windows servers
  # - job_name: 'windows-servers'
  #   static_configs:
  #     - targets: ['192.168.1.20:9182', '192.168.1.21:9182']
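
The relabel_configs block in the blackbox job is the part that trips most people up: it moves each listed URL out of the scrape address and into the probe’s `target` parameter, then substitutes the exporter’s own address. Traced by hand for one target:

```shell
# Simulates the three relabel steps for one static_configs target.
# (Illustration only -- Prometheus performs this internally before scraping.)
probe_url() {
  target="$1"                        # __address__ as listed under static_configs
  param_target="$target"             # step 1: __address__ -> __param_target
  instance="$param_target"           # step 2: __param_target -> instance label
  address="blackbox-exporter:9115"   # step 3: __address__ rewritten to the exporter
  echo "http://${address}/probe?module=http_2xx&target=${param_target}"
}

probe_url "https://google.com"
```

You can issue the same request by hand with curl against a running exporter, which is a handy way to debug a failing probe.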

Alertmanager Configuration

Create alertmanager/alertmanager.yml:

# alertmanager/alertmanager.yml
# Alertmanager Configuration

global:
  resolve_timeout: 5m
  # SMTP settings for email notifications (optional)
  # smtp_smarthost: 'smtp.gmail.com:587'
  # smtp_from: 'alertmanager@example.com'
  # smtp_auth_username: 'your-email@gmail.com'
  # smtp_auth_password: 'your-app-password'

route:
  # Default receiver
  receiver: 'default-receiver'
  
  # Group alerts by these labels
  group_by: ['alertname', 'severity']
  
  # Wait before sending first notification
  group_wait: 30s
  
  # Wait before sending updates
  group_interval: 5m
  
  # Resend interval
  repeat_interval: 4h
  
  # Child routes for specific alerts
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
      continue: true

    - match:
        severity: warning
      receiver: 'warning-alerts'

receivers:
  - name: 'default-receiver'
    # Webhook receiver (e.g., for Discord, custom endpoints)
    # webhook_configs:
    #   - url: 'http://your-webhook-url'

  - name: 'critical-alerts'
    # Email for critical alerts
    # email_configs:
    #   - to: 'admin@example.com'
    #     subject: '๐Ÿšจ CRITICAL: {{ .GroupLabels.alertname }}'

  - name: 'warning-alerts'
    # Slack integration
    # slack_configs:
    #   - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    #     channel: '#alerts'
    #     title: 'โš ๏ธ {{ .GroupLabels.alertname }}'

# Inhibition rules (prevent duplicate alerts)
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
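
The routing tree above reads as a first-match cascade: a matching child route wins, `continue: true` lets later sibling routes also be checked, and alerts matching no child fall back to the root receiver. As a sketch:

```shell
# First matching child route wins; `continue: true` means later sibling
# routes are also evaluated. Alerts matching no child use the root receiver.
route_alert() {
  case "$1" in
    critical) echo "critical-alerts" ;;
    warning)  echo "warning-alerts" ;;
    *)        echo "default-receiver" ;;
  esac
}

route_alert critical
route_alert info
```

With only these two routes `continue: true` has no visible effect, since no other sibling matches `severity: critical`, but it is the standard way to fan one alert out to multiple receivers.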

Prometheus Alert Rules

Create prometheus/rules/alerts.yml:

# prometheus/rules/alerts.yml
# Prometheus Alert Rules

groups:
  - name: system-alerts
    rules:
      # ===========================================
      # Instance Down Alert
      # ===========================================
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      # ===========================================
      # High CPU Usage
      # ===========================================
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: 'CPU usage is above 80% (current: {{ $value | printf "%.2f" }}%)'

      # ===========================================
      # High Memory Usage
      # ===========================================
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: 'Memory usage is above 85% (current: {{ $value | printf "%.2f" }}%)'

      # ===========================================
      # Low Disk Space
      # ===========================================
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: 'Disk {{ $labels.mountpoint }} has less than 15% free space (current: {{ $value | printf "%.2f" }}%)'

      # ===========================================
      # Container Restart
      # ===========================================
      - alert: ContainerRestarted
        expr: time() - container_start_time_seconds{name!=""} < 60
        for: 0m
        labels:
          severity: info
        annotations:
          summary: "Container {{ $labels.name }} restarted"
          description: "Container {{ $labels.name }} was started or restarted within the last minute."

  - name: website-alerts
    rules:
      # ===========================================
      # Website Down
      # ===========================================
      - alert: WebsiteDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Website {{ $labels.instance }} is down"
          description: "The website {{ $labels.instance }} has been unreachable for more than 1 minute."

      # ===========================================
      # SSL Certificate Expiring
      # ===========================================
      - alert: SSLCertificateExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon for {{ $labels.instance }}"
          description: 'SSL certificate will expire in {{ $value | printf "%.0f" }} days'

      # ===========================================
      # Slow Website Response
      # ===========================================
      - alert: SlowWebsiteResponse
        expr: probe_duration_seconds > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response from {{ $labels.instance }}"
          description: 'Website responding slowly ({{ $value | printf "%.2f" }}s)'
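
The SSL expiry rule above converts seconds-until-expiry into days by dividing by 86400. Worked through with hypothetical timestamps:

```shell
# probe_ssl_earliest_cert_expiry and time() are Unix timestamps; 86400 s = 1 day.
days_left() {
  awk -v e="$1" -v n="$2" 'BEGIN { printf "%.0f\n", (e - n) / 86400 }'
}

days_left 1768435200 1767225600   # cert expiry vs. "now", 14 days apart
```

If the result drops below 30, the `SSLCertificateExpiringSoon` rule fires after its `for: 1h` hold.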

Blackbox Exporter Configuration

Create blackbox/blackbox.yml:

# blackbox/blackbox.yml
# Blackbox Exporter Configuration

modules:
  # ===========================================
  # HTTP 2xx Check (Standard website check)
  # ===========================================
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 202, 204, 301, 302, 303, 307, 308]
      method: GET
      follow_redirects: true
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4"

  # ===========================================
  # HTTP POST Check
  # ===========================================
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200, 201, 202, 204]

  # ===========================================
  # HTTPS with SSL Verification
  # ===========================================
  https_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false

  # ===========================================
  # TCP Check (Port connectivity)
  # ===========================================
  tcp_connect:
    prober: tcp
    timeout: 10s

  # ===========================================
  # DNS Check
  # ===========================================
  dns_check:
    prober: dns
    timeout: 10s
    dns:
      query_name: "google.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR

  # ===========================================
  # ICMP Ping Check (requires privileged mode)
  # ===========================================
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

Grafana Data Source Provisioning

Create grafana/provisioning/datasources/datasources.yml:

# grafana/provisioning/datasources/datasources.yml
# Auto-provision Prometheus as data source

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"
      httpMethod: "POST"
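
The compose file also mounts `grafana/provisioning/dashboards`, but no provider file was created for it. A minimal sketch of one (the provider name and folder label are placeholders; the path assumes you drop dashboard JSON files into the mounted provisioning directory):

```yaml
# grafana/provisioning/dashboards/dashboards.yml
# Auto-load dashboard JSON files from disk
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'Monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
```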

Step 5: Start the Monitoring Stack

# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Check service status
docker compose ps

Expected output:

NAME                 STATUS          PORTS
alertmanager         Up              0.0.0.0:9093->9093/tcp
blackbox-exporter    Up              0.0.0.0:9115->9115/tcp
cadvisor             Up              0.0.0.0:8080->8080/tcp
grafana              Up              0.0.0.0:3000->3000/tcp
node-exporter        Up              0.0.0.0:9100->9100/tcp
prometheus           Up              0.0.0.0:9090->9090/tcp

Step 6: Access Web Interfaces

| Service | URL | Default Credentials |
| --- | --- | --- |
| Grafana | http://localhost:3000 | admin / admin123 |
| Prometheus | http://localhost:9090 | No auth |
| Alertmanager | http://localhost:9093 | No auth |
| Node Exporter | http://localhost:9100/metrics | No auth |
| cAdvisor | http://localhost:8080 | No auth |

Part 4: Native Installation (Without Docker)

Linux Installation (Ubuntu/Debian)

Install Prometheus

# Create Prometheus user
sudo useradd --no-create-home --shell /bin/false prometheus

# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Download Prometheus (pin the version you want; check the releases page for the latest)
PROM_VERSION="2.54.0"
wget https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz

# Extract and install
tar xvfz prometheus-${PROM_VERSION}.linux-amd64.tar.gz
cd prometheus-${PROM_VERSION}.linux-amd64

sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/

# Set ownership
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

Create Prometheus Systemd Service

sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.retention.time=30d \
    --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
# Reload systemd and start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
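The unit file above points at /etc/prometheus/prometheus.yml, which the tarball does not create for you. A minimal starting configuration (the targets are placeholders) looks like this:

```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate rules

scrape_configs:
  # Prometheus scrapes its own metrics endpoint
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

Validate it before (re)starting the service with `promtool check config /etc/prometheus/prometheus.yml`.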

Install Grafana

# Add Grafana GPG key and repository
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install Grafana
sudo apt-get update
sudo apt-get install -y grafana

# Start Grafana
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server

Install Node Exporter

# Create node_exporter user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Download Node Exporter
NODE_VERSION="1.8.2"
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz

# Extract and install
tar xvfz node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
sudo cp node_exporter-${NODE_VERSION}.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
# Create systemd service
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude="^/(sys|proc|dev|host|etc)($$|/)"
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
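With the exporter running, tell Prometheus about it by appending a job to /etc/prometheus/prometheus.yml and restarting (or reloading) the Prometheus service:

```yaml
# /etc/prometheus/prometheus.yml (append under scrape_configs:)
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
```

Verify the new target shows as UP at http://localhost:9090/targets.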

macOS Installation (Homebrew)

# Update Homebrew
brew update

# Install Prometheus
brew install prometheus

# Install Grafana
brew install grafana

# Start services
brew services start prometheus
brew services start grafana

# Verify
brew services list

Configuration file locations on macOS:

  • Prometheus: /opt/homebrew/etc/prometheus.yml (Apple Silicon) or /usr/local/etc/prometheus.yml (Intel)
  • Grafana: /opt/homebrew/etc/grafana/grafana.ini (Apple Silicon)

Access:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000 (default: admin/admin)

Windows Installation

Install Prometheus on Windows

  1. Download Prometheus from prometheus.io/download
  2. Extract to C:\prometheus
  3. Create configuration file C:\prometheus\prometheus.yml
  4. Run Prometheus:
    cd C:\prometheus
    .\prometheus.exe --config.file=prometheus.yml

Create Windows Service (PowerShell as Admin):

# Using NSSM (Non-Sucking Service Manager)
# Download NSSM from nssm.cc

nssm install Prometheus C:\prometheus\prometheus.exe
nssm set Prometheus AppParameters "--config.file=C:\prometheus\prometheus.yml"
nssm set Prometheus AppDirectory C:\prometheus
nssm start Prometheus

Install Windows Exporter

  1. Download Windows Exporter MSI from GitHub releases
  2. Install the MSI (creates Windows service automatically)
  3. Verify at http://localhost:9182/metrics
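Then point Prometheus at the Windows host. The IP address below is an assumption; substitute your machine's address:

```yaml
# prometheus.yml on your Prometheus server (append under scrape_configs:)
  - job_name: 'windows'
    static_configs:
      - targets: ['192.168.1.20:9182']  # your Windows host's IP
```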

Custom installation with specific collectors:

# Install with specific collectors
msiexec /i windows_exporter-0.29.0-amd64.msi ENABLED_COLLECTORS="cpu,cs,logical_disk,net,os,service,system"

Install Grafana on Windows

  1. Download Grafana from grafana.com/grafana/download
  2. Extract to C:\grafana
  3. Run:
    cd C:\grafana\bin
    .\grafana-server.exe
  4. Install as Service (PowerShell as Admin):
    nssm install Grafana C:\grafana\bin\grafana-server.exe
    nssm set Grafana AppDirectory C:\grafana
    nssm start Grafana

Part 5: Creating Grafana Dashboards

Importing Pre-Built Dashboards

Grafana has thousands of community dashboards available. Here are the most useful ones:

| Dashboard | ID | Description |
|---|---|---|
| Node Exporter Full | 1860 | Comprehensive Linux metrics |
| Docker Container & Host Metrics | 893 | Docker + host monitoring |
| cAdvisor Exporter | 14282 | Container resource usage |
| Prometheus 2.0 Overview | 3662 | Prometheus server stats |
| Blackbox Exporter | 7587 | Website uptime monitoring |
| Home Server / Homelab | 15306 | Home lab overview |

To import a dashboard:

  1. Go to Grafana โ†’ Dashboards โ†’ New โ†’ Import
  2. Enter the Dashboard ID
  3. Select your Prometheus data source
  4. Click Import

Creating Custom Dashboards

Basic PromQL Queries

| Metric | PromQL Query | Description |
|---|---|---|
| CPU Usage | `100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)` | Overall CPU usage % |
| Memory Usage | `(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100` | Memory usage % |
| Disk Usage | `100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)` | Disk usage % |
| Network In | `rate(node_network_receive_bytes_total[5m])` | Network receive rate |
| Network Out | `rate(node_network_transmit_bytes_total[5m])` | Network transmit rate |
| Container CPU | `rate(container_cpu_usage_seconds_total[5m]) * 100` | Container CPU usage |
| Container Memory | `container_memory_usage_bytes` | Container memory usage |
| Website Up | `probe_success` | 1 = up, 0 = down |
| Response Time | `probe_duration_seconds` | Website response time |

PromQL Tips

# Average over time
avg_over_time(node_load1[1h])

# Rate of change
rate(node_network_receive_bytes_total[5m])

# Increase (counter increment)
increase(http_requests_total[1h])

# Top 5 by value
topk(5, node_filesystem_size_bytes)

# Filtering by label
node_cpu_seconds_total{mode="idle", instance="server1:9100"}

# Regex matching
node_filesystem_avail_bytes{mountpoint=~"/|/home"}

# Aggregation by label
sum by (instance) (rate(node_network_receive_bytes_total[5m]))
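One more pattern worth knowing: predict_linear() extrapolates a gauge's recent trend, which turns "disk is 80% full" panels into "disk will be full in N hours" alerts:

```
# Will the root filesystem run out of space within 4 hours,
# judging by the last hour's trend?
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[1h], 4 * 3600) < 0
```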

Part 6: Alerting Configuration

Grafana Alerting

Setting Up Email Notifications

  1. Configure SMTP in grafana.ini:

    [smtp]
    enabled = true
    host = smtp.gmail.com:587
    user = your-email@gmail.com
    password = your-app-password
    from_address = grafana@yourdomain.com
    from_name = Grafana Alerts
  2. Create Contact Point in Grafana:

    • Alerting โ†’ Contact points โ†’ Add contact point
    • Type: Email
    • Enter recipient addresses

Setting Up Slack Notifications

  1. Create Slack App at api.slack.com/apps
  2. Enable Incoming Webhooks
  3. Copy Webhook URL
  4. Create Contact Point in Grafana:
    • Type: Slack
    • Paste Webhook URL

Setting Up Discord Notifications

  1. Create Discord Webhook:
    • Server Settings โ†’ Integrations โ†’ Webhooks โ†’ Create Webhook
    • Copy Webhook URL
  2. Create Contact Point in Grafana:
    • Type: Discord (or Webhook)
    • Paste Discord Webhook URL

Part 7: Integrations with Other Data Sources

Connecting InfluxDB

InfluxDB is a time-series database ideal for IoT and high-write scenarios.

# Add to docker-compose.yml
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=password123
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=metrics
    networks:
      - monitoring

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select InfluxDB
  3. Configure:
    • Query Language: Flux
    • URL: http://influxdb:8086
    • Organization: homelab
    • Token: (your InfluxDB token)
    • Default Bucket: metrics

Prometheus Remote Write to InfluxDB:

You can configure Prometheus to write metrics directly to InfluxDB for long-term storage:

# Add to prometheus.yml
# Note: this is InfluxDB's 1.x-compatibility write endpoint; on InfluxDB 2.x
# it additionally requires a DBRP mapping and token-based authentication.
remote_write:
  - url: http://influxdb:8086/api/v1/prom/write?db=prometheus&u=admin&p=password123

Use Case: Store long-term metrics in InfluxDB while using Prometheus for real-time monitoring. Ideal for IoT and home sensor data.

Connecting Elasticsearch

Elasticsearch is powerful for log analysis and full-text search.

# Add to docker-compose.yml
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    networks:
      - monitoring

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select Elasticsearch
  3. Configure:
    • URL: http://elasticsearch:9200
    • Index name: your-index-pattern-*
    • Time field name: @timestamp

Elasticsearch Exporter for Prometheus:

Scrape Elasticsearch cluster metrics into Prometheus:

docker run -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter --es.uri=http://elasticsearch:9200

Add to Prometheus scrape config:

- job_name: 'elasticsearch'
  static_configs:
    - targets: ['elasticsearch-exporter:9114']

Use Case: Combine metrics with logs for correlation (e.g., high CPU + error logs). Log home network events and alert on anomalies via Prometheus, then drill into logs in Grafana.

Connecting SQL Databases

PostgreSQL

# Add to docker-compose.yml
  postgres:
    image: postgres:16
    container_name: postgres
    environment:
      - POSTGRES_DB=monitoring
      - POSTGRES_USER=grafana
      - POSTGRES_PASSWORD=password123
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - monitoring

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select PostgreSQL
  3. Configure:
    • Host: postgres:5432
    • Database: monitoring
    • User: grafana
    • Password: password123

MySQL

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select MySQL
  3. Configure connection details

SQL Query Example:

SELECT
  timestamp AS "time",
  cpu_usage,
  memory_usage
FROM server_stats
WHERE host = '$host'
ORDER BY timestamp

Use Case: Visualize relational data alongside metrics. Track personal finance data, application logs stored in DB, or business analytics dashboards.


Part 8: Home Lab Use Cases

Use Case 1: Raspberry Pi Monitoring

Monitor temperature, CPU, and memory on Raspberry Pi:

# Extra Prometheus scrape config for Pi
- job_name: 'raspberry-pi'
  static_configs:
    - targets: ['192.168.1.50:9100']
      labels:
        device: 'raspberry-pi-4'

Temperature monitoring query:

node_hwmon_temp_celsius{chip="cpu_thermal"}
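If the node_hwmon metrics are missing on your Pi, the kernel still exposes the CPU temperature in millidegrees under /sys. A small sketch that converts it into a metric for Node Exporter's textfile collector (the hardcoded sample value stands in for the real read, and the metric name rpi_cpu_temp_celsius is our own):

```shell
# On a real Pi, read the value instead of hardcoding it:
#   TEMP_MILLI=$(cat /sys/class/thermal/thermal_zone0/temp)
TEMP_MILLI=48312   # sample reading, in millidegrees Celsius
awk -v t="$TEMP_MILLI" 'BEGIN { printf "rpi_cpu_temp_celsius %.1f\n", t/1000 }'
# prints: rpi_cpu_temp_celsius 48.3
```

Redirect the output to a .prom file in the directory passed to node_exporter via --collector.textfile.directory (e.g. from a cron job) and Prometheus will pick it up on the next scrape.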

Use Case 2: Smart Home / IoT Monitoring

Monitor MQTT-based sensors with custom exporters:

# Add MQTT exporter
  mqtt-exporter:
    image: kpetrem/mqtt-exporter:latest
    container_name: mqtt-exporter
    environment:
      - MQTT_ADDRESS=your-mqtt-broker
      - MQTT_TOPIC=sensors/#
    ports:
      - "9344:9344"
    networks:
      - monitoring

Use Case 3: Network Device Monitoring (SNMP)

Monitor routers, switches, and other SNMP devices:

  snmp-exporter:
    image: prom/snmp-exporter:latest
    container_name: snmp-exporter
    ports:
      - "9116:9116"
    volumes:
      - ./snmp/snmp.yml:/etc/snmp_exporter/snmp.yml
    networks:
      - monitoring

Use Case 4: Docker Container Monitoring

Monitor all your containerized services:

# Container CPU usage
rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100

# Container memory
container_memory_usage_bytes{name!=""} / 1024 / 1024

# Container network I/O
rate(container_network_receive_bytes_total[5m])
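If you chart these expressions across several dashboards, Prometheus recording rules precompute them once per evaluation interval instead of on every panel refresh. A sketch (the rule and file names are our own choice):

```yaml
# container-rules.yml — reference it under rule_files: in prometheus.yml
groups:
  - name: containers
    rules:
      - record: container:cpu_usage:percent
        expr: rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100
      - record: container:memory_usage:megabytes
        expr: container_memory_usage_bytes{name!=""} / 1024 / 1024
```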

Use Case 5: Website Uptime Monitoring

Monitor your personal websites and services:

# Add to prometheus.yml scrape_configs
- job_name: 'my-websites'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://myblog.com
        - https://myapp.example.com
        - http://192.168.1.100:8080
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter:9115
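To get paged when a probe fails, pair this job with a Prometheus alert rule. Save it as e.g. alerts.yml (the file name is illustrative) and reference it under rule_files: in prometheus.yml:

```yaml
groups:
  - name: uptime
    rules:
      - alert: WebsiteDown
        expr: probe_success == 0
        for: 5m   # require 5 minutes of failures to avoid flapping
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been unreachable for 5 minutes"
```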

Part 9: Maintenance and Best Practices

Updating Components

Docker Update

# Pull latest images
docker compose pull

# Recreate containers with new images
docker compose up -d

# Remove old images
docker image prune -f

Native Installation Update

# Linux - Prometheus
sudo systemctl stop prometheus
# Download new version and replace binary
sudo systemctl start prometheus

# Linux - Grafana
sudo apt-get update && sudo apt-get install --only-upgrade grafana

# macOS
brew upgrade prometheus grafana
brew services restart prometheus grafana

Backup Strategies

Docker Volumes Backup

# Create backup directory
mkdir -p ~/monitoring-backups

# Backup Prometheus data
docker run --rm \
  -v monitoring-stack_prometheus-data:/data \
  -v ~/monitoring-backups:/backup \
  alpine tar cvf /backup/prometheus-backup.tar /data

# Backup Grafana data
docker run --rm \
  -v monitoring-stack_grafana-data:/data \
  -v ~/monitoring-backups:/backup \
  alpine tar cvf /backup/grafana-backup.tar /data

Retention and Storage

# Prometheus command flags
command:
  - '--storage.tsdb.retention.time=90d'  # Keep 90 days of data
  - '--storage.tsdb.retention.size=10GB' # Or limit by size

Security Recommendations

  1. Change default passwords immediately
  2. Use HTTPS with a reverse proxy (Nginx, Traefik, Caddy)
  3. Enable authentication on Prometheus and Alertmanager
  4. Use firewall rules to limit access
  5. Regular updates for security patches
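As an example of points 2 and 3, a minimal Nginx sketch that fronts Prometheus with HTTPS and basic auth. The hostname, certificate paths, and htpasswd file are assumptions you must supply yourself:

```nginx
server {
    listen 443 ssl;
    server_name prometheus.example.com;            # assumed hostname

    ssl_certificate     /etc/ssl/certs/prom.crt;   # your certificate
    ssl_certificate_key /etc/ssl/private/prom.key; # your private key

    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd; # create with htpasswd
        proxy_pass           http://127.0.0.1:9090;
        proxy_set_header     Host $host;
    }
}
```

With this in place, bind Prometheus to 127.0.0.1 (or firewall port 9090) so the proxy is the only way in.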

Part 10: Troubleshooting

Common Issues and Solutions

| Issue | Cause | Solution |
|---|---|---|
| Prometheus not scraping | Target unreachable | Check network, firewall, target status |
| Grafana can't connect to Prometheus | Wrong URL | Use Docker service name (e.g., http://prometheus:9090) |
| Node Exporter permission denied | Missing volumes | Ensure /proc, /sys, / are mounted |
| cAdvisor not starting | Missing privileges | Add privileged: true and device mounts |
| High memory usage | Too many metrics | Enable metric filtering, reduce retention |
| Alerts not firing | Wrong expression | Test query in Prometheus UI first |

Useful Commands

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Check active alerts
curl http://localhost:9090/api/v1/alerts

# Reload Prometheus config (requires the --web.enable-lifecycle flag)
curl -X POST http://localhost:9090/-/reload

# Check Grafana health
curl http://localhost:3000/api/health

# View container logs
docker compose logs prometheus
docker compose logs grafana

Conclusion

You now have a complete understanding of Grafana and Prometheus for monitoring. This guide covered:

  • โœ… Architecture and components of both systems
  • โœ… Docker and native installation on all platforms
  • โœ… Configuration of exporters for various use cases
  • โœ… Creating dashboards and writing PromQL queries
  • โœ… Setting up alerting with multiple notification channels
  • โœ… Integrating with InfluxDB, Elasticsearch, and SQL databases
  • โœ… Practical home lab monitoring scenarios
  • โœ… Maintenance, backup, and security best practices

Next Steps

  1. Explore Grafana Loki for log aggregation (see companion guide)
  2. Set up Grafana Tempo for distributed tracing
  3. Consider Grafana Mimir for long-term metrics storage
  4. Kubernetes Home Lab: Use kube-prometheus-stack Helm chart:
    helm install prom prometheus-community/kube-prometheus-stack
  5. Join communities: r/homelab, r/selfhosted, Grafana Community Forums

Validation Tip: Test scraping by running curl http://localhost:9090/metrics. In Grafana, verify data source connectivity via the โ€œTestโ€ button.

๐Ÿ’ก Tip: For advanced log aggregation and a complete observability stack, see our companion guide: โ€œGrafana Loki & Advanced Observability: Complete Home Lab Guideโ€
