๐ŸŽฏ New! Master certifications with Performance-Based Questions (PBQ) โ€” realistic hands-on practice for CompTIA & Cisco exams!

Grafana & Prometheus Complete Guide 2026: The Ultimate Monitoring Stack for Home Labs and Production

Published on January 12, 2026


Introduction

Grafana and Prometheus form the de facto standard monitoring stack for modern infrastructure. Whether youโ€™re running a small home lab with a Raspberry Pi or managing enterprise Kubernetes clusters, this powerful combination provides real-time visibility into your systems with beautiful visualizations and intelligent alerting.

This comprehensive guide covers everything from basic concepts to advanced configurations, including Docker deployments, cross-platform installations, alerting, and integrations with other data sources like InfluxDB, Elasticsearch, and SQL databases.

What Youโ€™ll Learn

  • Complete understanding of Prometheus and Grafana architecture
  • Step-by-step installation on Windows, macOS, and Linux
  • Docker and Docker Compose deployment
  • Configuring exporters (Node Exporter, Windows Exporter, cAdvisor, Blackbox)
  • Creating powerful dashboards with PromQL
  • Setting up alerts with Alertmanager
  • Integrating multiple data sources
  • Home lab monitoring scenarios with practical examples

Key Terminology

| Term | Definition |
| --- | --- |
| Prometheus | Open-source monitoring system that collects metrics via a pull-based model |
| Grafana | Open-source visualization platform for creating interactive dashboards |
| Exporter | Agent that exposes metrics in Prometheus format from applications or systems |
| PromQL | Prometheus Query Language for querying time-series data |
| Alertmanager | Handles alert deduplication, grouping, routing, and notifications |
| Scrape | The process of Prometheus collecting metrics from endpoints |
| Time Series | Data points indexed by time, the core data type for monitoring |
| Label | Key-value pair that identifies a metric dimension (e.g., host="server1") |
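
Scrapes, labels, and time series are easiest to see in the exposition format itself. The sketch below fakes a tiny `/metrics` payload in shell and pulls one labeled series out of it (the metric names and values are invented for illustration):

```shell
# Fake /metrics payload in the Prometheus exposition format:
sample='# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1027
http_requests_total{method="POST",status="500"} 3'

# Each line is one time series: metric name + labels, then the sample value.
get_value() {
  printf '%s\n' "$sample" | awk -v series="$1" '$1 == series { print $2 }'
}

get_value 'http_requests_total{method="GET",status="200"}'
```

This is exactly what a scrape does: fetch the text above over HTTP and record each `name{labels} value` line as a timestamped sample.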

Estimated Time

| Task | Estimated Time |
| --- | --- |
| Basic Docker Setup | 15-30 minutes |
| Manual Installation (Single OS) | 30-60 minutes |
| Full Stack with Alerting | 1-2 hours |
| Home Lab Complete Setup | 2-4 hours |

Part 1: Understanding the Monitoring Stack

What is Prometheus?

Prometheus is an open-source systems monitoring and alerting toolkit, originally developed at SoundCloud in 2012. It’s now a CNCF (Cloud Native Computing Foundation) graduated project, widely adopted for its reliability in dynamic environments like Kubernetes. It collects and stores metrics as time-series data: numerical values with timestamps, plus optional key-value labels that enable multi-dimensional querying.

Current Version: Prometheus v3.9.0 (released January 6, 2026)

Prometheus v3.9.0 Key Features

| Feature | Description |
| --- | --- |
| Multi-dimensional Data Model | Metrics identified by name and labels (e.g., http_requests_total{method="GET", status="200"}) |
| PromQL Query Language | Flexible for aggregating, filtering, and analyzing data |
| Pull Model | Scrapes metrics over HTTP from targets; supports push via gateways |
| Autonomous Servers | No distributed storage dependency; standalone for high reliability |
| Service Discovery | Integrates with Kubernetes, Consul, or static configs |
| Enhanced UTF-8 Support | Full UTF-8 compatibility for metric and label names |
| Improved UI | Modern interface with PromLens-style tree view |
| Remote Write 2.0 | Enhanced data handling with metadata, exemplars, and native histograms |
| OpenTelemetry Support | Native OTLP metrics ingestion |
| Native Histograms | More efficient histogram implementation |
| Bug Fixes | Improved scraping and alerting stability from v3.8.x |
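
PromQL’s counter functions are simpler than they look. `rate()` is essentially (last sample minus first sample) divided by the window length, giving events per second. A hand calculation with hypothetical counter values:

```shell
# rate() ~= (last sample - first sample) / window, in events per second.
per_second_rate() {
  awk -v a="$1" -v b="$2" -v w="$3" 'BEGIN { printf "%.1f\n", (b - a) / w }'
}

# Counter went from 1000 to 1150 over a 60s window:
per_second_rate 1000 1150 60
```

This is why counters must only ever increase: `rate()` (and `increase()`) interpret any drop as a counter reset.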

How Prometheus Works

+-------------------------------------------------------------------+
|                    PROMETHEUS ARCHITECTURE                        |
+-------------------------------------------------------------------+
|                                                                   |
|   TARGETS                     PROMETHEUS SERVER                   |
|   +--------------+           +------------------------------+     |
|   | Node Exporter|-----+     | +-------------------------+  |     |
|   |  :9100       |     |     | |     Retrieval           |  |     |
|   +--------------+     |     | |   (Pull Metrics)        |  |     |
|   +--------------+     |     | +-----------+-------------+  |     |
|   | cAdvisor     |-----+---->|             v                |     |
|   |  :8080       |     |     | +-------------------------+  |     |
|   +--------------+     |     | |   Time Series DB (TSDB) |  |     |
|   +--------------+     |     | |   (Local Storage)       |  |     |
|   | App Metrics  |-----+     | +-----------+-------------+  |     |
|   |  :8000       |           |             v                |     |
|   +--------------+           | +-------------------------+  |     |
|                              | |   PromQL Engine         |  |     |
|   PUSHGATEWAY                | |   (Query Processing)    |  |     |
|   +--------------+           | +-----------+-------------+  |     |
|   | Batch Jobs   |---------->|             |                |     |
|   |  :9091       |           | +-----------v-------------+  |     |
|   +--------------+           | |   HTTP Server :9090     |  |     |
|                              | |   (Web UI / API)        |  |     |
|                              | +-------------------------+  |     |
|                              +------------------------------+     |
|                                          |                        |
|   ALERTMANAGER                           |                        |
|   +------------------+                   |                        |
|   | Deduplication    |<------------------+                        |
|   | Grouping/Routing |                                            |
|   | Notifications    |--> Email, Slack, PagerDuty, Discord        |
|   |  :9093           |                                            |
|   +------------------+                                            |
|                                                                   |
+-------------------------------------------------------------------+

Core Components

| Component | Purpose | Default Port |
| --- | --- | --- |
| Prometheus Server | Scrapes and stores metrics, provides query API | 9090 |
| Alertmanager | Handles alerts, deduplication, routing | 9093 |
| Pushgateway | Accepts metrics from batch jobs | 9091 |
| Node Exporter | Linux system metrics | 9100 |
| Windows Exporter | Windows system metrics | 9182 |
| cAdvisor | Container metrics | 8080 |
| Blackbox Exporter | Probe endpoints (HTTP, DNS, TCP, ICMP) | 9115 |
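
Once the stack is running, a quick way to confirm each component is listening is to probe its HTTP endpoint. A small sketch, assuming the default ports above and a local `curl` (Prometheus exposes `/-/healthy` and Grafana `/api/health` as health endpoints; adjust hosts to your network):

```shell
# UP/DOWN sweep of the stack's default endpoints (adjust hosts as needed).
check() {
  if curl -fs --max-time 3 -o /dev/null "$1"; then
    echo "UP   $1"
  else
    echo "DOWN $1"
  fi
}

for url in \
  http://localhost:9090/-/healthy \
  http://localhost:9100/metrics \
  http://localhost:3000/api/health
do
  check "$url"
done
```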

What is Grafana?

Grafana is an open-source analytics and visualization platform for querying, visualizing, alerting on, and exploring metrics, logs, and traces from diverse sources. It supports plugins for data sources like Prometheus, Loki, Elasticsearch, and SQL databases. Grafana OSS is free; Enterprise adds premium features like advanced authentication. Grafana Cloud is a hosted version with AI/ML enhancements.

Current Version: Grafana v12.3 (latest stable as of January 2026)

Grafana v12.3 Key Features

| Feature | Description |
| --- | --- |
| 100+ Data Source Plugins | Connect to Prometheus, InfluxDB, SQL databases, and more |
| Dynamic Dashboards | Flexible panel layouts that adapt to screen sizes |
| Dashboard Tabs | Organize complex dashboards with tabbed views |
| SQL Expressions | Join and transform data from multiple sources (JOINs across Loki/BigQuery) |
| Grafana Assistant | AI-powered assistant for queries and dashboard creation |
| Unified Alerting | Alert from any data source with notifications via Slack, email, etc. |
| Enhanced Tables | Faster loading with 40,000+ rows support |
| Explore Mode | Ad-hoc queries for metrics, logs, and traces |
| Variables/Templates | Dynamic filters (e.g., select servers) |
| Observability as Code | Provision dashboards via Terraform/Ansible; Git Sync for IaC |
| Bug Fixes | Improved high-availability setups, minimal breaking changes from v11.x |

Grafana Architecture

+-------------------------------------------------------------------+
|                      GRAFANA ARCHITECTURE                         |
+-------------------------------------------------------------------+
|                                                                   |
|   DATA SOURCES                   GRAFANA SERVER                   |
|   +-----------------+           +-------------------------+       |
|   |   Prometheus    |---------->|                         |       |
|   |   (Metrics)     |           |   +-----------------+   |       |
|   +-----------------+           |   |   Data Source   |   |       |
|   +-----------------+           |   |   Manager       |   |       |
|   |   Loki          |---------->|   +--------+--------+   |       |
|   |   (Logs)        |           |            |            |       |
|   +-----------------+           |   +--------v--------+   |       |
|   +-----------------+           |   |   Query Engine  |   |       |
|   |   InfluxDB      |---------->|   |   (Transform)   |   |       |
|   |   (Time Series) |           |   +--------+--------+   |       |
|   +-----------------+           |            |            |       |
|   +-----------------+           |   +--------v--------+   |       |
|   |   Elasticsearch |---------->|   |  Visualization  |   |       |
|   |   (Logs/Search) |           |   |  (Panels)       |   |       |
|   +-----------------+           |   +--------+--------+   |       |
|   +-----------------+           |            |            |       |
|   |   PostgreSQL    |---------->|   +--------v--------+   |       |
|   |   MySQL, MSSQL  |           |   |   Dashboard     |   |       |
|   +-----------------+           |   |   Renderer      |   |       |
|                                 |   +--------+--------+   |       |
|                                 |            |            |       |
|                                 |   +--------v--------+   |       |
|                                 |   |  HTTP Server    |   |       |
|                                 |   |  :3000          |   |       |
|                                 |   +-----------------+   |       |
|   ALERTING                      |            |            |       |
|   +-----------------+           |            |            |       |
|   |  Email          |<----------+  Alerting  |            |       |
|   |  Slack          |           |  Engine    |            |       |
|   |  Discord        |           |            |            |       |
|   |  PagerDuty      |           |            |            |       |
|   +-----------------+           +------------+------------+       |
|                                                                   |
+-------------------------------------------------------------------+

Grafana Ecosystem (LGTM Stack)

| Component | Purpose | Description |
| --- | --- | --- |
| Loki | Log Aggregation | Like Prometheus but for logs |
| Grafana | Visualization | Dashboards and UI |
| Tempo | Distributed Tracing | Trace storage and querying |
| Mimir | Long-term Metrics | Scalable Prometheus storage |

Part 2: Prerequisites and System Requirements

Hardware Requirements

| Scale | RAM | CPU | Storage |
| --- | --- | --- | --- |
| Home Lab (5-10 hosts) | 2-4 GB | 2 cores | 20 GB SSD |
| Small Business (50 hosts) | 8-16 GB | 4 cores | 100 GB SSD |
| Enterprise (500+ hosts) | 32+ GB | 8+ cores | 500 GB+ NVMe |
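
The storage column can be sanity-checked with the usual back-of-the-envelope formula: disk is roughly active series divided by scrape interval, times retention in seconds, times bytes per sample (about 1-2 bytes after compression). A sketch with hypothetical home-lab numbers:

```shell
# disk bytes ~= (series / scrape_interval) * retention_seconds * bytes_per_sample
estimate_gb() {
  awk -v s="$1" -v i="$2" -v d="$3" -v b="$4" \
    'BEGIN { printf "%.1f GB\n", s / i * d * 86400 * b / 1e9 }'
}

# ~1,000 series per host x 10 hosts, 15s scrapes, 30d retention, 2 B/sample:
estimate_gb 10000 15 30 2
```

Real usage varies with churn and label cardinality, so treat the result as an order-of-magnitude estimate, not a guarantee.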

Software Prerequisites

| Component | Required Version | Purpose |
| --- | --- | --- |
| Docker | 20.10+ | Container runtime |
| Docker Compose | v2.0+ | Multi-container orchestration |
| Git | Any | Version control (optional) |

Operating System Compatibility

| OS | Support Level | Notes |
| --- | --- | --- |
| Linux (Ubuntu/Debian) | ✅ Best | Recommended for production |
| Linux (RHEL/CentOS/Fedora) | ✅ Good | Full support |
| Windows | ✅ Good | Docker Desktop or native |
| macOS | ✅ Good | Docker Desktop recommended |
| Raspberry Pi | ✅ Good | ARM builds available |

Part 3: Installation with Docker Compose (Recommended)

Docker is the recommended installation method as it simplifies setup, updates, and dependency management.

Step 1: Install Docker

Linux (Ubuntu/Debian)

# Remove old Docker versions
for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do
  sudo apt-get remove -y $pkg 2>/dev/null
done

# Install prerequisites
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg

# Add Docker's official GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | 
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker Engine
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin

# Add your user to the docker group
sudo usermod -aG docker $USER
newgrp docker

# Verify installation
docker --version
docker compose version

Command Explanation:

  • apt-get remove: Removes conflicting old Docker packages
  • curl: Downloads Dockerโ€™s GPG key for package verification
  • gpg --dearmor: Converts the key to the correct format
  • usermod -aG docker $USER: Allows running Docker without sudo

Linux (Fedora/RHEL/CentOS)

# Remove old versions
sudo dnf remove docker docker-client docker-client-latest \
  docker-common docker-latest docker-latest-logrotate \
  docker-logrotate docker-selinux docker-engine-selinux docker-engine

# Install Docker
sudo dnf -y install dnf-plugins-core
sudo dnf config-manager --add-repo \
  https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin

# Start and enable Docker
sudo systemctl start docker
sudo systemctl enable docker

# Add user to docker group
sudo usermod -aG docker $USER
newgrp docker

# Verify
docker --version

Windows (Docker Desktop)

  1. Enable Virtualization in BIOS/UEFI
  2. Install WSL2 (PowerShell as Administrator):
    wsl --install
    # Restart computer when prompted
  3. Download Docker Desktop from docker.com
  4. During installation, ensure โ€œUse WSL 2 instead of Hyper-Vโ€ is checked
  5. Configure Docker Desktop:
    • Settings โ†’ Resources โ†’ WSL Integration
    • Enable integration with your default WSL distro
  6. Verify installation:
    docker --version
    docker compose version

macOS (Docker Desktop)

# Install via Homebrew (recommended)
brew install --cask docker

# Or download from docker.com:
# - Apple Silicon (M1/M2/M3/M4): "Apple Chip" version
# - Intel Macs: "Intel Chip" version

After installation:

  1. Launch Docker from Applications
  2. Configure Resources (Settings โ†’ Resources โ†’ at least 4GB RAM)
  3. Verify:
    docker --version
    docker compose version

Step 2: Create Project Directory

# Linux/macOS
mkdir -p ~/monitoring-stack
cd ~/monitoring-stack
# Windows PowerShell
mkdir C:\Users\$env:USERNAME\Documents\monitoring-stack
cd C:\Users\$env:USERNAME\Documents\monitoring-stack

Step 3: Create Docker Compose Configuration

Create a file named docker-compose.yml:

# docker-compose.yml
# Complete Prometheus + Grafana Monitoring Stack
# Version: 2026-01

services:
  # ===========================================
  # PROMETHEUS - Metrics Collection & Storage
  # ===========================================
  prometheus:
    image: prom/prometheus:v3.9.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  # ===========================================
  # GRAFANA - Visualization & Dashboards
  # ===========================================
  grafana:
    image: grafana/grafana:12.3.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    depends_on:
      - prometheus
    networks:
      - monitoring

  # ===========================================
  # NODE EXPORTER - Linux System Metrics
  # ===========================================
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

  # ===========================================
  # cADVISOR - Container Metrics
  # ===========================================
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg
    networks:
      - monitoring

  # ===========================================
  # ALERTMANAGER - Alert Routing & Notifications
  # ===========================================
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - monitoring

  # ===========================================
  # BLACKBOX EXPORTER - Endpoint Probing
  # ===========================================
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    restart: unless-stopped
    ports:
      - "9115:9115"
    volumes:
      - ./blackbox/blackbox.yml:/etc/blackbox_exporter/config.yml:ro
    command:
      - '--config.file=/etc/blackbox_exporter/config.yml'
    networks:
      - monitoring

# ===========================================
# NETWORKS
# ===========================================
networks:
  monitoring:
    driver: bridge

# ===========================================
# VOLUMES (Persistent Data)
# ===========================================
volumes:
  prometheus-data:
  grafana-data:
  alertmanager-data:
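
One caveat on the compose file above: the Grafana admin password is hard-coded, which is fine for a throwaway lab but not for anything shared. A sketch of moving it into an `.env` file that Docker Compose reads automatically for variable substitution (the variable name here is arbitrary):

```shell
# .env sits next to docker-compose.yml; Compose substitutes ${VARS} from it.
cat > .env <<'EOF'
GRAFANA_ADMIN_PASSWORD=change-me
EOF

# In docker-compose.yml, the environment line then becomes:
#   - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}

# Keep the file out of version control:
grep -qxF '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```

Running `docker compose config` afterwards shows the fully substituted file, which is a quick way to verify the variable resolved.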

Step 4: Create Configuration Files

Create Directory Structure

# Linux/macOS
mkdir -p prometheus/rules grafana/provisioning/datasources grafana/provisioning/dashboards alertmanager blackbox
# Windows PowerShell
New-Item -ItemType Directory -Path prometheus\rules, grafana\provisioning\datasources, grafana\provisioning\dashboards, alertmanager, blackbox -Force

Prometheus Configuration

Create prometheus/prometheus.yml:

# prometheus/prometheus.yml
# Prometheus Configuration File

global:
  scrape_interval: 15s          # How often to scrape targets
  evaluation_interval: 15s       # How often to evaluate rules
  scrape_timeout: 10s           # Timeout for scrape requests

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Rule files for alerts
rule_files:
  - /etc/prometheus/rules/*.yml

# Scrape configurations
scrape_configs:
  # ===========================================
  # Prometheus itself
  # ===========================================
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
        labels:
          instance: 'prometheus-server'

  # ===========================================
  # Node Exporter (Linux system metrics)
  # ===========================================
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          instance: 'docker-host'

  # ===========================================
  # cAdvisor (Docker container metrics)
  # ===========================================
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
        labels:
          instance: 'docker-containers'

  # ===========================================
  # Alertmanager
  # ===========================================
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']

  # ===========================================
  # Blackbox Exporter - Website Monitoring
  # ===========================================
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://google.com
          - https://github.com
          # Add your websites here
        labels:
          probe_type: 'website'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  # ===========================================
  # Add more targets here
  # ===========================================
  # Example: Additional Linux servers
  # - job_name: 'linux-servers'
  #   static_configs:
  #     - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
  #
  # Example: Windows servers
  # - job_name: 'windows-servers'
  #   static_configs:
  #     - targets: ['192.168.1.20:9182', '192.168.1.21:9182']
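
The relabel_configs block in the blackbox job is the part that trips most people up: it moves each listed URL out of the scrape address and into the probe’s `target` parameter, then substitutes the exporter’s own address. Traced by hand for one target:

```shell
# Simulates the three relabel steps for one static_configs target.
# (Illustration only -- Prometheus performs this internally before scraping.)
probe_url() {
  target="$1"                        # __address__ as listed under static_configs
  param_target="$target"             # step 1: __address__ -> __param_target
  instance="$param_target"           # step 2: __param_target -> instance label
  address="blackbox-exporter:9115"   # step 3: __address__ rewritten to the exporter
  echo "http://${address}/probe?module=http_2xx&target=${param_target}"
}

probe_url "https://google.com"
```

You can issue the same request by hand with curl against a running exporter, which is a handy way to debug a failing probe.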

Alertmanager Configuration

Create alertmanager/alertmanager.yml:

# alertmanager/alertmanager.yml
# Alertmanager Configuration

global:
  resolve_timeout: 5m
  # SMTP settings for email notifications (optional)
  # smtp_smarthost: 'smtp.gmail.com:587'
  # smtp_from: 'alertmanager@example.com'
  # smtp_auth_username: 'your-email@gmail.com'
  # smtp_auth_password: 'your-app-password'

route:
  # Default receiver
  receiver: 'default-receiver'
  
  # Group alerts by these labels
  group_by: ['alertname', 'severity']
  
  # Wait before sending first notification
  group_wait: 30s
  
  # Wait before sending updates
  group_interval: 5m
  
  # Resend interval
  repeat_interval: 4h
  
  # Child routes for specific alerts
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
      continue: true

    - match:
        severity: warning
      receiver: 'warning-alerts'

receivers:
  - name: 'default-receiver'
    # Webhook receiver (e.g., for Discord, custom endpoints)
    # webhook_configs:
    #   - url: 'http://your-webhook-url'

  - name: 'critical-alerts'
    # Email for critical alerts
    # email_configs:
    #   - to: 'admin@example.com'
    #     subject: '๐Ÿšจ CRITICAL: {{ .GroupLabels.alertname }}'

  - name: 'warning-alerts'
    # Slack integration
    # slack_configs:
    #   - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    #     channel: '#alerts'
    #     title: 'โš ๏ธ {{ .GroupLabels.alertname }}'

# Inhibition rules (prevent duplicate alerts)
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
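
The routing tree above reads as a first-match cascade: a matching child route wins, `continue: true` lets later sibling routes also be checked, and alerts matching no child fall back to the root receiver. As a sketch:

```shell
# First matching child route wins; `continue: true` means later sibling
# routes are also evaluated. Alerts matching no child use the root receiver.
route_alert() {
  case "$1" in
    critical) echo "critical-alerts" ;;
    warning)  echo "warning-alerts" ;;
    *)        echo "default-receiver" ;;
  esac
}

route_alert critical
route_alert info
```

With only these two routes `continue: true` has no visible effect, since no other sibling matches `severity: critical`, but it is the standard way to fan one alert out to multiple receivers.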

Prometheus Alert Rules

Create prometheus/rules/alerts.yml:

# prometheus/rules/alerts.yml
# Prometheus Alert Rules

groups:
  - name: system-alerts
    rules:
      # ===========================================
      # Instance Down Alert
      # ===========================================
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      # ===========================================
      # High CPU Usage
      # ===========================================
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: 'CPU usage is above 80% (current: {{ $value | printf "%.2f" }}%)'

      # ===========================================
      # High Memory Usage
      # ===========================================
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: 'Memory usage is above 85% (current: {{ $value | printf "%.2f" }}%)'

      # ===========================================
      # Low Disk Space
      # ===========================================
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: 'Disk {{ $labels.mountpoint }} has less than 15% free space (current: {{ $value | printf "%.2f" }}%)'

      # ===========================================
      # Container Restart
      # ===========================================
      - alert: ContainerRestarted
        expr: time() - container_start_time_seconds{name!=""} < 60
        for: 0m
        labels:
          severity: info
        annotations:
          summary: "Container {{ $labels.name }} restarted"
          description: "Container {{ $labels.name }} was started or restarted within the last minute."

  - name: website-alerts
    rules:
      # ===========================================
      # Website Down
      # ===========================================
      - alert: WebsiteDown
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Website {{ $labels.instance }} is down"
          description: "The website {{ $labels.instance }} has been unreachable for more than 1 minute."

      # ===========================================
      # SSL Certificate Expiring
      # ===========================================
      - alert: SSLCertificateExpiringSoon
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon for {{ $labels.instance }}"
          description: 'SSL certificate will expire in {{ $value | printf "%.0f" }} days'

      # ===========================================
      # Slow Website Response
      # ===========================================
      - alert: SlowWebsiteResponse
        expr: probe_duration_seconds > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Slow response from {{ $labels.instance }}"
          description: 'Website responding slowly ({{ $value | printf "%.2f" }}s)'
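
The SSL expiry rule above converts seconds-until-expiry into days by dividing by 86400. Worked through with hypothetical timestamps:

```shell
# probe_ssl_earliest_cert_expiry and time() are Unix timestamps; 86400 s = 1 day.
days_left() {
  awk -v e="$1" -v n="$2" 'BEGIN { printf "%.0f\n", (e - n) / 86400 }'
}

days_left 1768435200 1767225600   # cert expiry vs. "now", 14 days apart
```

If the result drops below 30, the `SSLCertificateExpiringSoon` rule fires after its `for: 1h` hold.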

Blackbox Exporter Configuration

Create blackbox/blackbox.yml:

# blackbox/blackbox.yml
# Blackbox Exporter Configuration

modules:
  # ===========================================
  # HTTP 2xx Check (Standard website check)
  # ===========================================
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 202, 204, 301, 302, 303, 307, 308]
      method: GET
      follow_redirects: true
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4"

  # ===========================================
  # HTTP POST Check
  # ===========================================
  http_post_2xx:
    prober: http
    timeout: 10s
    http:
      method: POST
      valid_status_codes: [200, 201, 202, 204]

  # ===========================================
  # HTTPS with SSL Verification
  # ===========================================
  https_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: false

  # ===========================================
  # TCP Check (Port connectivity)
  # ===========================================
  tcp_connect:
    prober: tcp
    timeout: 10s

  # ===========================================
  # DNS Check
  # ===========================================
  dns_check:
    prober: dns
    timeout: 10s
    dns:
      query_name: "google.com"
      query_type: "A"
      valid_rcodes:
        - NOERROR

  # ===========================================
  # ICMP Ping Check (requires privileged mode)
  # ===========================================
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

Grafana Data Source Provisioning

Create grafana/provisioning/datasources/datasources.yml:

# grafana/provisioning/datasources/datasources.yml
# Auto-provision Prometheus as data source

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"
      httpMethod: "POST"
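
The compose file also mounts `grafana/provisioning/dashboards`, but no provider file was created for it. A minimal sketch of one (the provider name and folder label are placeholders; the path assumes you drop dashboard JSON files into the mounted provisioning directory):

```yaml
# grafana/provisioning/dashboards/dashboards.yml
# Auto-load dashboard JSON files from disk
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: 'Monitoring'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
```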

Step 5: Start the Monitoring Stack

# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Check service status
docker compose ps

Expected output:

NAME                 STATUS          PORTS
alertmanager         Up              0.0.0.0:9093->9093/tcp
blackbox-exporter    Up              0.0.0.0:9115->9115/tcp
cadvisor             Up              0.0.0.0:8080->8080/tcp
grafana              Up              0.0.0.0:3000->3000/tcp
node-exporter        Up              0.0.0.0:9100->9100/tcp
prometheus           Up              0.0.0.0:9090->9090/tcp

Step 6: Access Web Interfaces

| Service | URL | Default Credentials |
| --- | --- | --- |
| Grafana | http://localhost:3000 | admin / admin123 |
| Prometheus | http://localhost:9090 | No auth |
| Alertmanager | http://localhost:9093 | No auth |
| Node Exporter | http://localhost:9100/metrics | No auth |
| cAdvisor | http://localhost:8080 | No auth |

Part 4: Native Installation (Without Docker)

Linux Installation (Ubuntu/Debian)

Install Prometheus

# Create Prometheus user
sudo useradd --no-create-home --shell /bin/false prometheus

# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Download Prometheus (pin the version you want; check the releases page for the latest)
PROM_VERSION="2.54.0"
wget https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/prometheus-${PROM_VERSION}.linux-amd64.tar.gz

# Extract and install
tar xvfz prometheus-${PROM_VERSION}.linux-amd64.tar.gz
cd prometheus-${PROM_VERSION}.linux-amd64

sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/

# Set ownership
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

Create Prometheus Systemd Service

sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.retention.time=30d \
    --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
# Reload systemd and start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
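The unit file above points at /etc/prometheus/prometheus.yml, which the tarball does not create for you. A minimal starting configuration (the targets are placeholders) looks like this:

```yaml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s      # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate rules

scrape_configs:
  # Prometheus scrapes its own metrics endpoint
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
```

Validate it before (re)starting the service with `promtool check config /etc/prometheus/prometheus.yml`.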

Install Grafana

# Add Grafana GPG key and repository
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Install Grafana
sudo apt-get update
sudo apt-get install -y grafana

# Start Grafana
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
sudo systemctl status grafana-server

Install Node Exporter

# Create node_exporter user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Download Node Exporter
NODE_VERSION="1.8.2"
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_VERSION}/node_exporter-${NODE_VERSION}.linux-amd64.tar.gz

# Extract and install
tar xvfz node_exporter-${NODE_VERSION}.linux-amd64.tar.gz
sudo cp node_exporter-${NODE_VERSION}.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
# Create systemd service
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude="^/(sys|proc|dev|host|etc)($$|/)"
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
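With the exporter running, tell Prometheus about it by appending a job to /etc/prometheus/prometheus.yml and restarting (or reloading) the Prometheus service:

```yaml
# /etc/prometheus/prometheus.yml (append under scrape_configs:)
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
```

Verify the new target shows as UP at http://localhost:9090/targets.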

macOS Installation (Homebrew)

# Update Homebrew
brew update

# Install Prometheus
brew install prometheus

# Install Grafana
brew install grafana

# Start services
brew services start prometheus
brew services start grafana

# Verify
brew services list

Configuration file locations on macOS:

  • Prometheus: /opt/homebrew/etc/prometheus.yml (Apple Silicon) or /usr/local/etc/prometheus.yml (Intel)
  • Grafana: /opt/homebrew/etc/grafana/grafana.ini (Apple Silicon)

Access:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3000 (default: admin/admin)

Windows Installation

Install Prometheus on Windows

  1. Download Prometheus from prometheus.io/download
  2. Extract to C:\prometheus
  3. Create configuration file C:\prometheus\prometheus.yml
  4. Run Prometheus:
    cd C:\prometheus
    .\prometheus.exe --config.file=prometheus.yml

Create Windows Service (PowerShell as Admin):

# Using NSSM (Non-Sucking Service Manager)
# Download NSSM from nssm.cc

nssm install Prometheus C:\prometheus\prometheus.exe
nssm set Prometheus AppParameters "--config.file=C:\prometheus\prometheus.yml"
nssm set Prometheus AppDirectory C:\prometheus
nssm start Prometheus

Install Windows Exporter

  1. Download Windows Exporter MSI from GitHub releases
  2. Install the MSI (creates Windows service automatically)
  3. Verify at http://localhost:9182/metrics
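Then point Prometheus at the Windows host. The IP address below is an assumption; substitute your machine's address:

```yaml
# prometheus.yml on your Prometheus server (append under scrape_configs:)
  - job_name: 'windows'
    static_configs:
      - targets: ['192.168.1.20:9182']  # your Windows host's IP
```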

Custom installation with specific collectors:

# Install with specific collectors
msiexec /i windows_exporter-0.29.0-amd64.msi ENABLED_COLLECTORS="cpu,cs,logical_disk,net,os,service,system"

Install Grafana on Windows

  1. Download Grafana from grafana.com/grafana/download
  2. Extract to C:\grafana
  3. Run:
    cd C:\grafana\bin
    .\grafana-server.exe
  4. Install as Service (PowerShell as Admin):
    nssm install Grafana C:\grafana\bin\grafana-server.exe
    nssm set Grafana AppDirectory C:\grafana
    nssm start Grafana

Part 5: Creating Grafana Dashboards

Importing Pre-Built Dashboards

Grafana has thousands of community dashboards available. Here are the most useful ones:

| Dashboard | ID | Description |
|---|---|---|
| Node Exporter Full | 1860 | Comprehensive Linux metrics |
| Docker Container & Host Metrics | 893 | Docker + host monitoring |
| cAdvisor Exporter | 14282 | Container resource usage |
| Prometheus 2.0 Overview | 3662 | Prometheus server stats |
| Blackbox Exporter | 7587 | Website uptime monitoring |
| Home Server / Homelab | 15306 | Home lab overview |

To import a dashboard:

  1. Go to Grafana โ†’ Dashboards โ†’ New โ†’ Import
  2. Enter the Dashboard ID
  3. Select your Prometheus data source
  4. Click Import

Creating Custom Dashboards

Basic PromQL Queries

| Metric | PromQL Query | Description |
|---|---|---|
| CPU Usage | `100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)` | Overall CPU usage % |
| Memory Usage | `(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100` | Memory usage % |
| Disk Usage | `100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)` | Disk usage % |
| Network In | `rate(node_network_receive_bytes_total[5m])` | Network receive rate |
| Network Out | `rate(node_network_transmit_bytes_total[5m])` | Network transmit rate |
| Container CPU | `rate(container_cpu_usage_seconds_total[5m]) * 100` | Container CPU usage |
| Container Memory | `container_memory_usage_bytes` | Container memory usage |
| Website Up | `probe_success` | 1 = up, 0 = down |
| Response Time | `probe_duration_seconds` | Website response time |

PromQL Tips

# Average over time
avg_over_time(node_load1[1h])

# Rate of change
rate(node_network_receive_bytes_total[5m])

# Increase (counter increment)
increase(http_requests_total[1h])

# Top 5 by value
topk(5, node_filesystem_size_bytes)

# Filtering by label
node_cpu_seconds_total{mode="idle", instance="server1:9100"}

# Regex matching
node_filesystem_avail_bytes{mountpoint=~"/|/home"}

# Aggregation by label
sum by (instance) (rate(node_network_receive_bytes_total[5m]))
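One more pattern worth knowing: predict_linear() extrapolates a gauge's recent trend, which turns "disk is 80% full" panels into "disk will be full in N hours" alerts:

```
# Will the root filesystem run out of space within 4 hours,
# judging by the last hour's trend?
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[1h], 4 * 3600) < 0
```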

Part 6: Alerting Configuration

Grafana Alerting

Setting Up Email Notifications

  1. Configure SMTP in grafana.ini:

    [smtp]
    enabled = true
    host = smtp.gmail.com:587
    user = your-email@gmail.com
    password = your-app-password
    from_address = grafana@yourdomain.com
    from_name = Grafana Alerts
  2. Create Contact Point in Grafana:

    • Alerting โ†’ Contact points โ†’ Add contact point
    • Type: Email
    • Enter recipient addresses

Setting Up Slack Notifications

  1. Create Slack App at api.slack.com/apps
  2. Enable Incoming Webhooks
  3. Copy Webhook URL
  4. Create Contact Point in Grafana:
    • Type: Slack
    • Paste Webhook URL

Setting Up Discord Notifications

  1. Create Discord Webhook:
    • Server Settings โ†’ Integrations โ†’ Webhooks โ†’ Create Webhook
    • Copy Webhook URL
  2. Create Contact Point in Grafana:
    • Type: Discord (or Webhook)
    • Paste Discord Webhook URL

Part 7: Integrations with Other Data Sources

Connecting InfluxDB

InfluxDB is a time-series database ideal for IoT and high-write scenarios.

# Add to docker-compose.yml
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=password123
      - DOCKER_INFLUXDB_INIT_ORG=homelab
      - DOCKER_INFLUXDB_INIT_BUCKET=metrics
    networks:
      - monitoring

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select InfluxDB
  3. Configure:
    • Query Language: Flux
    • URL: http://influxdb:8086
    • Organization: homelab
    • Token: (your InfluxDB token)
    • Default Bucket: metrics

Prometheus Remote Write to InfluxDB:

You can configure Prometheus to write metrics directly to InfluxDB for long-term storage:

# Add to prometheus.yml
# Note: this is InfluxDB's 1.x-compatibility write endpoint; on InfluxDB 2.x
# it additionally requires a DBRP mapping and token-based authentication.
remote_write:
  - url: http://influxdb:8086/api/v1/prom/write?db=prometheus&u=admin&p=password123

Use Case: Store long-term metrics in InfluxDB while using Prometheus for real-time monitoring. Ideal for IoT and home sensor data.

Connecting Elasticsearch

Elasticsearch is powerful for log analysis and full-text search.

# Add to docker-compose.yml
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    networks:
      - monitoring

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select Elasticsearch
  3. Configure:
    • URL: http://elasticsearch:9200
    • Index name: your-index-pattern-*
    • Time field name: @timestamp

Elasticsearch Exporter for Prometheus:

Scrape Elasticsearch cluster metrics into Prometheus:

docker run -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter --es.uri=http://elasticsearch:9200

Add to Prometheus scrape config:

- job_name: 'elasticsearch'
  static_configs:
    - targets: ['elasticsearch-exporter:9114']

Use Case: Combine metrics with logs for correlation (e.g., high CPU + error logs). Log home network events and alert on anomalies via Prometheus, then drill into logs in Grafana.

Connecting SQL Databases

PostgreSQL

# Add to docker-compose.yml
  postgres:
    image: postgres:16
    container_name: postgres
    environment:
      - POSTGRES_DB=monitoring
      - POSTGRES_USER=grafana
      - POSTGRES_PASSWORD=password123
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - monitoring

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select PostgreSQL
  3. Configure:
    • Host: postgres:5432
    • Database: monitoring
    • User: grafana
    • Password: password123

MySQL

In Grafana:

  1. Configuration โ†’ Data Sources โ†’ Add data source
  2. Select MySQL
  3. Configure connection details

SQL Query Example:

SELECT
  timestamp AS "time",
  cpu_usage,
  memory_usage
FROM server_stats
WHERE host = '$host'
ORDER BY timestamp

Use Case: Visualize relational data alongside metrics. Track personal finance data, application logs stored in DB, or business analytics dashboards.


Part 8: Home Lab Use Cases

Use Case 1: Raspberry Pi Monitoring

Monitor temperature, CPU, and memory on Raspberry Pi:

# Extra Prometheus scrape config for Pi
- job_name: 'raspberry-pi'
  static_configs:
    - targets: ['192.168.1.50:9100']
      labels:
        device: 'raspberry-pi-4'

Temperature monitoring query:

node_hwmon_temp_celsius{chip="cpu_thermal"}
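If the node_hwmon metrics are missing on your Pi, the kernel still exposes the CPU temperature in millidegrees under /sys. A small sketch that converts it into a metric for Node Exporter's textfile collector (the hardcoded sample value stands in for the real read, and the metric name rpi_cpu_temp_celsius is our own):

```shell
# On a real Pi, read the value instead of hardcoding it:
#   TEMP_MILLI=$(cat /sys/class/thermal/thermal_zone0/temp)
TEMP_MILLI=48312   # sample reading, in millidegrees Celsius
awk -v t="$TEMP_MILLI" 'BEGIN { printf "rpi_cpu_temp_celsius %.1f\n", t/1000 }'
# prints: rpi_cpu_temp_celsius 48.3
```

Redirect the output to a .prom file in the directory passed to node_exporter via --collector.textfile.directory (e.g. from a cron job) and Prometheus will pick it up on the next scrape.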

Use Case 2: Smart Home / IoT Monitoring

Monitor MQTT-based sensors with custom exporters:

# Add MQTT exporter
  mqtt-exporter:
    image: kpetrem/mqtt-exporter:latest
    container_name: mqtt-exporter
    environment:
      - MQTT_ADDRESS=your-mqtt-broker
      - MQTT_TOPIC=sensors/#
    ports:
      - "9344:9344"
    networks:
      - monitoring

Use Case 3: Network Device Monitoring (SNMP)

Monitor routers, switches, and other SNMP devices:

  snmp-exporter:
    image: prom/snmp-exporter:latest
    container_name: snmp-exporter
    ports:
      - "9116:9116"
    volumes:
      - ./snmp/snmp.yml:/etc/snmp_exporter/snmp.yml
    networks:
      - monitoring

Use Case 4: Docker Container Monitoring

Monitor all your containerized services:

# Container CPU usage
rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100

# Container memory
container_memory_usage_bytes{name!=""} / 1024 / 1024

# Container network I/O
rate(container_network_receive_bytes_total[5m])
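If you chart these expressions across several dashboards, Prometheus recording rules precompute them once per evaluation interval instead of on every panel refresh. A sketch (the rule and file names are our own choice):

```yaml
# container-rules.yml — reference it under rule_files: in prometheus.yml
groups:
  - name: containers
    rules:
      - record: container:cpu_usage:percent
        expr: rate(container_cpu_usage_seconds_total{name!=""}[5m]) * 100
      - record: container:memory_usage:megabytes
        expr: container_memory_usage_bytes{name!=""} / 1024 / 1024
```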

Use Case 5: Website Uptime Monitoring

Monitor your personal websites and services:

# Add to prometheus.yml scrape_configs
- job_name: 'my-websites'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://myblog.com
        - https://myapp.example.com
        - http://192.168.1.100:8080
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter:9115
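To get paged when a probe fails, pair this job with a Prometheus alert rule. Save it as e.g. alerts.yml (the file name is illustrative) and reference it under rule_files: in prometheus.yml:

```yaml
groups:
  - name: uptime
    rules:
      - alert: WebsiteDown
        expr: probe_success == 0
        for: 5m   # require 5 minutes of failures to avoid flapping
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been unreachable for 5 minutes"
```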

Part 9: Maintenance and Best Practices

Updating Components

Docker Update

# Pull latest images
docker compose pull

# Recreate containers with new images
docker compose up -d

# Remove old images
docker image prune -f

Native Installation Update

# Linux - Prometheus
sudo systemctl stop prometheus
# Download new version and replace binary
sudo systemctl start prometheus

# Linux - Grafana
sudo apt-get update && sudo apt-get install --only-upgrade grafana

# macOS
brew upgrade prometheus grafana
brew services restart prometheus grafana

Backup Strategies

Docker Volumes Backup

# Create backup directory
mkdir -p ~/monitoring-backups

# Backup Prometheus data
docker run --rm \
  -v monitoring-stack_prometheus-data:/data \
  -v ~/monitoring-backups:/backup \
  alpine tar cvf /backup/prometheus-backup.tar /data

# Backup Grafana data
docker run --rm \
  -v monitoring-stack_grafana-data:/data \
  -v ~/monitoring-backups:/backup \
  alpine tar cvf /backup/grafana-backup.tar /data

Retention and Storage

# Prometheus command flags
command:
  - '--storage.tsdb.retention.time=90d'  # Keep 90 days of data
  - '--storage.tsdb.retention.size=10GB' # Or limit by size

Security Recommendations

  1. Change default passwords immediately
  2. Use HTTPS with a reverse proxy (Nginx, Traefik, Caddy)
  3. Enable authentication on Prometheus and Alertmanager
  4. Use firewall rules to limit access
  5. Regular updates for security patches
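As an example of points 2 and 3, a minimal Nginx sketch that fronts Prometheus with HTTPS and basic auth. The hostname, certificate paths, and htpasswd file are assumptions you must supply yourself:

```nginx
server {
    listen 443 ssl;
    server_name prometheus.example.com;            # assumed hostname

    ssl_certificate     /etc/ssl/certs/prom.crt;   # your certificate
    ssl_certificate_key /etc/ssl/private/prom.key; # your private key

    location / {
        auth_basic           "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd; # create with htpasswd
        proxy_pass           http://127.0.0.1:9090;
        proxy_set_header     Host $host;
    }
}
```

With this in place, bind Prometheus to 127.0.0.1 (or firewall port 9090) so the proxy is the only way in.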

Part 10: Troubleshooting

Common Issues and Solutions

| Issue | Cause | Solution |
|---|---|---|
| Prometheus not scraping | Target unreachable | Check network, firewall, target status |
| Grafana can't connect to Prometheus | Wrong URL | Use Docker service name (e.g., http://prometheus:9090) |
| Node Exporter permission denied | Missing volumes | Ensure /proc, /sys, / are mounted |
| cAdvisor not starting | Missing privileges | Add privileged: true and device mounts |
| High memory usage | Too many metrics | Enable metric filtering, reduce retention |
| Alerts not firing | Wrong expression | Test query in Prometheus UI first |

Useful Commands

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets

# Check active alerts
curl http://localhost:9090/api/v1/alerts

# Reload Prometheus config (requires the --web.enable-lifecycle flag)
curl -X POST http://localhost:9090/-/reload

# Check Grafana health
curl http://localhost:3000/api/health

# View container logs
docker compose logs prometheus
docker compose logs grafana

Conclusion

You now have a complete understanding of Grafana and Prometheus for monitoring. This guide covered:

  • โœ… Architecture and components of both systems
  • โœ… Docker and native installation on all platforms
  • โœ… Configuration of exporters for various use cases
  • โœ… Creating dashboards and writing PromQL queries
  • โœ… Setting up alerting with multiple notification channels
  • โœ… Integrating with InfluxDB, Elasticsearch, and SQL databases
  • โœ… Practical home lab monitoring scenarios
  • โœ… Maintenance, backup, and security best practices

Next Steps

  1. Explore Grafana Loki for log aggregation (see companion guide)
  2. Set up Grafana Tempo for distributed tracing
  3. Consider Grafana Mimir for long-term metrics storage
  4. Kubernetes Home Lab: Use kube-prometheus-stack Helm chart:
    helm install prom prometheus-community/kube-prometheus-stack
  5. Join communities: r/homelab, r/selfhosted, Grafana Community Forums

Validation Tip: Test scraping by running curl http://localhost:9090/metrics. In Grafana, verify data source connectivity via the โ€œTestโ€ button.

๐Ÿ’ก Tip: For advanced log aggregation and a complete observability stack, see our companion guide: โ€œGrafana Loki & Advanced Observability: Complete Home Lab Guideโ€
