
Grafana Loki & Advanced Observability 2026: Log Aggregation, Tracing & Complete Home Lab Stack

Published on January 12, 2026


Introduction

While Prometheus and Grafana provide excellent metrics monitoring, comprehensive observability requires three pillars: Metrics, Logs, and Traces. This guide covers the advanced observability components that complete your monitoring stack:

  • Grafana Loki – Log aggregation (like Prometheus, but for logs)
  • Grafana Tempo – Distributed tracing
  • Grafana Mimir – Long-term metrics storage
  • Grafana Alloy – Unified telemetry collector

This is the companion guide to "Grafana & Prometheus Complete Guide" – ensure you have the basic stack running before proceeding.

The LGTM Stack

| Component | Purpose | Analogy |
|-----------|---------|---------|
| Loki | Log aggregation | "Prometheus for logs" |
| Grafana | Visualization | Dashboard and UI |
| Tempo | Distributed tracing | Request flow tracking |
| Mimir | Long-term metrics | Scalable Prometheus storage |

Key Terminology

| Term | Definition |
|------|------------|
| Log Aggregation | Collecting logs from multiple sources into a centralized system |
| Promtail | Official log collector agent for Loki |
| Alloy | Modern, unified telemetry collector (successor to Promtail and Grafana Agent) |
| LogQL | Loki's query language for searching and filtering logs |
| Trace | A record of a request's journey through distributed systems |
| Span | A single unit of work within a trace |
| OTLP | OpenTelemetry Protocol for telemetry data |

Prerequisites

Before starting this guide, ensure you have:

  • ✅ Basic Grafana + Prometheus stack running (see companion guide)
  • ✅ Docker and Docker Compose installed
  • ✅ At least 4GB RAM (8GB recommended for full stack)
  • ✅ Basic understanding of containers and networking
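
A quick way to sanity-check these prerequisites from a shell before you begin (output will vary by system):

# Verify Docker, Compose, and available memory
docker --version
docker compose version
free -h

# Confirm the basic stack from the companion guide is running
docker ps --format '{{.Names}}' | grep -E 'grafana|prometheus'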

Part 1: Understanding Grafana Loki

What is Loki?

Grafana Loki is an open-source, horizontally-scalable, multi-tenant log aggregation system inspired by Prometheus. Unlike traditional log management systems (like Elasticsearch/ELK), Loki takes a unique approach:

  • Indexes only labels (metadata), not full log content
  • Highly efficient storage and resource usage
  • Similar query model to Prometheus (labels-based)
  • Seamless Grafana integration with log exploration

Loki vs Traditional Log Systems

| Feature | Loki | Elasticsearch (ELK) |
|---------|------|---------------------|
| Indexing | Labels only | Full-text content |
| Storage | Very efficient | Resource-heavy |
| Query Language | LogQL | Lucene/KQL |
| Resource Usage | Low | High |
| Learning Curve | Easy (Prometheus-like) | Moderate |
| Best For | Infrastructure logs | Full-text search |

Loki Architecture

+-------------------------------------------------------------------+
|                     GRAFANA LOKI STACK                            |
+-------------------------------------------------------------------+
|                                                                   |
|   LOG SOURCES                   COLLECTORS                        |
|   +--------------+             +--------------+                   |
|   | Docker       |------------>|  Promtail /  |                   |
|   | Containers   |             |  Alloy       |                   |
|   +--------------+             +------+-------+                   |
|   +--------------+                    |                           |
|   | System Logs  |--------------------|                           |
|   | (/var/log)   |                    |                           |
|   +--------------+                    |                           |
|   +--------------+                    v                           |
|   | Application  |             +--------------+                   |
|   | Logs         |------------>|    LOKI      |                   |
|   +--------------+             |  +--------+  |                   |
|                                |  |Ingester|  |                   |
|   +--------------+             |  +--------+  |                   |
|   | Syslog       |-------------|  +--------+  |                   |
|   | (UDP/TCP)    |             |  |Querier |  |                   |
|   +--------------+             |  +--------+  |                   |
|                                |  +--------+  |                   |
|                                |  |Storage |  |                   |
|                                |  +--------+  |                   |
|                                +-------+------+                   |
|                                        |                          |
|                                        v                          |
|                                +--------------+                   |
|                                |   GRAFANA    |                   |
|                                |  (Explore +  |                   |
|                                |  Dashboards) |                   |
|                                +--------------+                   |
|                                                                   |
+-------------------------------------------------------------------+

Part 2: Complete LGTM Stack Deployment

Full Docker Compose Configuration

This configuration includes all observability components for a complete home lab setup:

Create a new directory and docker-compose.yml:

mkdir -p ~/lgtm-stack
cd ~/lgtm-stack

# docker-compose.yml
# Complete LGTM Observability Stack
# Includes: Loki, Grafana, Tempo, Mimir, Prometheus, and all exporters

services:
  # ===========================================
  # GRAFANA LOKI - Log Aggregation
  # ===========================================
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - observability

  # ===========================================
  # PROMTAIL - Log Collector
  # ===========================================
  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
    networks:
      - observability

  # ===========================================
  # GRAFANA ALLOY - Modern Telemetry Collector
  # (Alternative to Promtail + Grafana Agent)
  # ===========================================
  # alloy:
  #   image: grafana/alloy:latest
  #   container_name: alloy
  #   restart: unless-stopped
  #   ports:
  #     - "12345:12345"
  #     - "4317:4317"   # OTLP gRPC
  #     - "4318:4318"   # OTLP HTTP
  #   volumes:
  #     - ./alloy/config.alloy:/etc/alloy/config.alloy:ro
  #     - /var/log:/var/log:ro
  #     - /var/lib/docker/containers:/var/lib/docker/containers:ro
  #   command:
  #     - run
  #     - /etc/alloy/config.alloy
  #     - --server.http.listen-addr=0.0.0.0:12345
  #   networks:
  #     - observability

  # ===========================================
  # GRAFANA TEMPO - Distributed Tracing
  # ===========================================
  tempo:
    image: grafana/tempo:latest
    container_name: tempo
    restart: unless-stopped
    ports:
      - "3200:3200"     # Tempo HTTP
      - "4317:4317"     # OTLP gRPC
      - "4318:4318"     # OTLP HTTP
      - "9411:9411"     # Zipkin
      - "14268:14268"   # Jaeger ingest
    volumes:
      - ./tempo/tempo-config.yml:/etc/tempo/tempo.yaml:ro
      - tempo-data:/tmp/tempo
    command: -config.file=/etc/tempo/tempo.yaml
    networks:
      - observability

  # ===========================================
  # PROMETHEUS - Metrics Collection
  # ===========================================
  prometheus:
    image: prom/prometheus:v3.9.0
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'
      - '--web.enable-remote-write-receiver'
    networks:
      - observability

  # ===========================================
  # GRAFANA - Visualization
  # ===========================================
  grafana:
    image: grafana/grafana:12.3.0
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
      - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    depends_on:
      - prometheus
      - loki
      - tempo
    networks:
      - observability

  # ===========================================
  # NODE EXPORTER - Linux System Metrics
  # ===========================================
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - observability

  # ===========================================
  # cADVISOR - Container Metrics
  # ===========================================
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    devices:
      - /dev/kmsg
    networks:
      - observability

  # ===========================================
  # ALERTMANAGER - Alert Routing
  # ===========================================
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - observability

  # ===========================================
  # BLACKBOX EXPORTER - Endpoint Probing
  # ===========================================
  blackbox-exporter:
    image: prom/blackbox-exporter:latest
    container_name: blackbox-exporter
    restart: unless-stopped
    ports:
      - "9115:9115"
    volumes:
      - ./blackbox/blackbox.yml:/etc/blackbox_exporter/config.yml:ro
    command:
      - '--config.file=/etc/blackbox_exporter/config.yml'
    networks:
      - observability

# ===========================================
# NETWORKS
# ===========================================
networks:
  observability:
    driver: bridge

# ===========================================
# VOLUMES
# ===========================================
volumes:
  prometheus-data:
  grafana-data:
  loki-data:
  tempo-data:
  alertmanager-data:

Configuration Files

Create Directory Structure

mkdir -p loki promtail tempo prometheus/rules grafana/provisioning/datasources grafana/provisioning/dashboards alertmanager blackbox

Loki Configuration

Create loki/loki-config.yml:

# loki/loki-config.yml
# Grafana Loki Configuration for Home Lab

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://alertmanager:9093

# Frontend settings
frontend:
  max_outstanding_per_tenant: 4096

# Ingester settings
ingester:
  chunk_encoding: snappy
  
# Limits for home lab use
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 10000
  max_line_size: 256kb

# Compactor settings
compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
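
Before wiring everything together, you can have Loki parse this file on its own; recent Loki releases support a -verify-config flag (older images will simply fail fast at startup with a config error instead):

# Parse-check the Loki config using the same image the stack runs
docker run --rm \
  -v "$PWD/loki/loki-config.yml:/etc/loki/local-config.yaml:ro" \
  grafana/loki:latest -config.file=/etc/loki/local-config.yaml -verify-config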

Promtail Configuration

Create promtail/promtail-config.yml:

# promtail/promtail-config.yml
# Promtail Log Collector Configuration

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # ===========================================
  # Docker Container Logs
  # ===========================================
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Keep container name as label
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      # Add container ID
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
      # Add image name
      - source_labels: ['__meta_docker_container_image']
        target_label: 'image'
      # Add compose project if available
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: 'compose_project'
      # Add compose service if available
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'compose_service'

  # ===========================================
  # System Logs (Linux)
  # ===========================================
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          __path__: /var/log/syslog

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          __path__: /var/log/auth.log

  - job_name: kernel
    static_configs:
      - targets:
          - localhost
        labels:
          job: kernel
          __path__: /var/log/kern.log

  # ===========================================
  # Journal Logs (systemd)
  # ===========================================
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
      - source_labels: ['__journal__hostname']
        target_label: 'hostname'
      - source_labels: ['__journal_priority_keyword']
        target_label: 'level'

  # ===========================================
  # Custom Application Logs
  # ===========================================
  # Uncomment and modify for your applications
  # - job_name: myapp
  #   static_configs:
  #     - targets:
  #         - localhost
  #       labels:
  #         job: myapp
  #         __path__: /var/log/myapp/*.log
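
Promtail also has a --dry-run mode that tails the configured targets and prints entries to stdout instead of pushing them to Loki, which is a convenient way to confirm the scrape_configs above pick up the files you expect:

# Dry-run Promtail against the config (nothing is sent to Loki)
docker run --rm \
  -v "$PWD/promtail/promtail-config.yml:/etc/promtail/config.yml:ro" \
  -v /var/log:/var/log:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  grafana/promtail:latest -config.file=/etc/promtail/config.yml --dry-run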

Tempo Configuration

Create tempo/tempo-config.yml:

# tempo/tempo-config.yml
# Grafana Tempo Distributed Tracing Configuration

server:
  http_listen_port: 3200

distributor:
  receivers:
    jaeger:
      protocols:
        thrift_http:
        grpc:
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  max_block_duration: 5m

compactor:
  compaction:
    block_retention: 48h

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: home-lab
  storage:
    path: /tmp/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

storage:
  trace:
    backend: local
    wal:
      path: /tmp/tempo/wal
    local:
      path: /tmp/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
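
If you don't have an instrumented application yet, the OpenTelemetry telemetrygen utility can push a few synthetic spans so you can confirm Tempo is ingesting. The image path and the Compose network name below are assumptions based on the ~/lgtm-stack project directory; adjust them to your setup:

# Send 5 test traces to Tempo over OTLP gRPC
docker run --rm --network lgtm-stack_observability \
  ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
  traces --otlp-endpoint tempo:4317 --otlp-insecure --traces 5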

Prometheus Configuration (Updated)

Create prometheus/prometheus.yml:

# prometheus/prometheus.yml
# Prometheus Configuration with LGTM Stack Integration

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: home-lab
    replica: 1

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  # Node Exporter
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
        labels:
          instance: 'docker-host'

  # cAdvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # Alertmanager
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['alertmanager:9093']

  # Loki
  - job_name: 'loki'
    static_configs:
      - targets: ['loki:3100']

  # Tempo
  - job_name: 'tempo'
    static_configs:
      - targets: ['tempo:3200']

  # Blackbox - Website Monitoring
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://google.com
          - https://github.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
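
promtool ships inside the Prometheus image, so you can lint this file (and any rule files under prometheus/rules/) without installing anything extra:

# Validate prometheus.yml and referenced rule files
docker run --rm --entrypoint promtool \
  -v "$PWD/prometheus:/etc/prometheus:ro" \
  prom/prometheus:v3.9.0 check config /etc/prometheus/prometheus.yml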

Grafana Data Source Provisioning

Create grafana/provisioning/datasources/datasources.yml:

# grafana/provisioning/datasources/datasources.yml
# Auto-provision all LGTM data sources

apiVersion: 1

datasources:
  # Prometheus - Metrics
  - name: Prometheus
    type: prometheus
    uid: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"
      httpMethod: "POST"

  # Loki - Logs
  - name: Loki
    type: loki
    uid: loki
    access: proxy
    url: http://loki:3100
    editable: true
    jsonData:
      maxLines: 1000
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: 'traceID=(\w+)'
          name: TraceID
          url: "$${__value.raw}"

  # Tempo - Traces
  - name: Tempo
    type: tempo
    uid: tempo
    access: proxy
    url: http://tempo:3200
    editable: true
    jsonData:
      httpMethod: GET
      tracesToLogs:
        datasourceUid: loki
        tags: ['job', 'instance', 'pod', 'namespace']
        mappedTags: [{ key: 'service.name', value: 'service' }]
        mapTagNamesEnabled: true
        spanStartTimeShift: '1h'
        spanEndTimeShift: '1h'
        filterByTraceID: true
        filterBySpanID: false
      tracesToMetrics:
        datasourceUid: prometheus
        tags: [{ key: 'service.name', value: 'service' }]
        queries:
          - name: 'Request rate'
            query: 'sum(rate(tempo_spanmetrics_latency_count{$$__tags}[5m]))'
      serviceMap:
        datasourceUid: prometheus
      nodeGraph:
        enabled: true
      lokiSearch:
        datasourceUid: loki
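
Once the stack is up, you can confirm the provisioning worked through Grafana's HTTP API (the credentials are the ones set in docker-compose.yml):

# List provisioned data sources
curl -s -u admin:admin123 http://localhost:3000/api/datasources

# Fetch a single data source by name
curl -s -u admin:admin123 http://localhost:3000/api/datasources/name/Loki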

Alertmanager Configuration

Create alertmanager/alertmanager.yml:

# alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  receiver: 'default-receiver'
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'default-receiver'

Blackbox Exporter Configuration

Create blackbox/blackbox.yml:

# blackbox/blackbox.yml
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 202, 204, 301, 302, 303, 307, 308]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"
  
  tcp_connect:
    prober: tcp
    timeout: 10s

  icmp:
    prober: icmp
    timeout: 5s

Deploy the Stack

# Start all services
docker compose up -d

# View logs
docker compose logs -f

# Check status
docker compose ps

Access Web Interfaces

| Service | URL | Credentials |
|---------|-----|-------------|
| Grafana | http://localhost:3000 | admin / admin123 |
| Prometheus | http://localhost:9090 | None |
| Loki | http://localhost:3100/ready | None |
| Tempo | http://localhost:3200/ready | None |
| Alertmanager | http://localhost:9093 | None |

Part 3: Working with Loki and LogQL

LogQL Basics

LogQL is Loki's query language, similar to PromQL but for logs.

Log Stream Selectors

# Select logs from a specific container
{container="grafana"}

# Select logs from a job
{job="syslog"}

# Multiple label matchers
{container="prometheus", compose_project="lgtm-stack"}

# Regex matching
{container=~"grafana|loki|tempo"}

# Exclude specific containers
{container!="cadvisor"}

Log Pipeline

# Filter logs containing "error"
{container="myapp"} |= "error"

# Case-insensitive search
{container="myapp"} |~ "(?i)error"

# Exclude lines with "debug"
{container="myapp"} != "debug"

# Parse JSON logs
{container="myapp"} | json

# Parse with pattern
{job="nginx"} | pattern `<ip> - - [<_>] "<method> <uri> <_>" <status> <size>`

# Extract and filter fields
{container="myapp"} | json | level="error"

# Count errors per minute
sum(rate({container="myapp"} |= "error" [1m])) by (container)
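
The same LogQL can be run outside Grafana against Loki's HTTP API, which is handy for scripting or for checking a query before you build a panel; by default query_range covers roughly the last hour:

# Last hour of Grafana container logs containing "error"
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={container="grafana"} |= "error"' \
  --data-urlencode 'limit=20'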

Common LogQL Queries

| Use Case | Query |
|----------|-------|
| View all container logs | {job="docker"} |
| Search for errors | {job="docker"} \|= "error" |
| Auth failures | {job="auth"} \|= "Failed" |
| Rate of errors | sum(rate({job="docker"} \|= "error" [5m])) by (container) |
| Top sources | topk(10, sum(rate({job="docker"} [1h])) by (container)) |
| JSON error logs | {container="myapp"} \| json \| level="error" |

Creating Loki Dashboards

Log Volume Panel

sum(rate({job="docker"}[5m])) by (container)

Error Rate Panel

sum(rate({job="docker"} |= "error" [5m])) by (container)
/ 
sum(rate({job="docker"} [5m])) by (container)
* 100

Log Table Panel

Simply use a log stream selector:

{container=~".+"} | json

Part 4: Home Lab Monitoring Scenarios

Scenario 1: Proxmox VE Monitoring

Monitor your Proxmox hypervisor with metrics and logs.

Prometheus Configuration for Proxmox

# Add to prometheus.yml
scrape_configs:
  - job_name: 'proxmox'
    static_configs:
      - targets: ['192.168.1.100:9221']  # PVE Exporter
    metrics_path: /pve
    params:
      module: [default]

Install PVE Exporter on Proxmox

# On Proxmox host
apt update && apt install -y python3-pip
pip3 install prometheus-pve-exporter

# Create config
cat > /etc/prometheus/pve.yml << EOF
default:
    user: root@pam
    token_name: "prometheus"
    token_value: "your-token-here"
    verify_ssl: false
EOF

# Create systemd service
cat > /etc/systemd/system/pve-exporter.service << EOF
[Unit]
Description=Prometheus PVE Exporter
After=network.target

[Service]
ExecStart=/usr/local/bin/pve_exporter /etc/prometheus/pve.yml
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl enable --now pve-exporter
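
Before adding the scrape job, check that the exporter answers on its default port 9221; depending on your prometheus-pve-exporter version you may need to pass the node explicitly via a target parameter:

# Should return Prometheus-format metrics from the PVE node (IP from the example above)
curl -s "http://192.168.1.100:9221/pve?target=localhost" | head -n 20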

Collect Proxmox Logs with Promtail

Add to promtail-config.yml:

scrape_configs:
  - job_name: proxmox
    static_configs:
      - targets:
          - localhost
        labels:
          job: proxmox
          host: pve-node1
          __path__: /var/log/pve-firewall.log
      - targets:
          - localhost
        labels:
          job: proxmox
          host: pve-node1
          __path__: /var/log/pvedaemon.log

Scenario 2: Synology NAS Monitoring

SNMP Exporter for Synology

# Add to docker-compose.yml
  snmp-exporter:
    image: prom/snmp-exporter:latest
    container_name: snmp-exporter
    ports:
      - "9116:9116"
    volumes:
      - ./snmp/snmp.yml:/etc/snmp_exporter/snmp.yml:ro
    networks:
      - observability

Prometheus Config for Synology

# Add to prometheus.yml
- job_name: 'synology'
  static_configs:
    - targets:
        - 192.168.1.50  # Your Synology IP
  metrics_path: /snmp
  params:
    module: [synology]
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: snmp-exporter:9116

Scenario 3: TrueNAS Monitoring

Enable SNMP on TrueNAS

  1. Go to Services → SNMP
  2. Enable SNMP service
  3. Set community string (e.g., "public")
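
A quick snmpwalk from any machine with the net-snmp tools installed confirms the service is reachable before you point the SNMP exporter at it (the IP matches the example scrape config below; the community string is the one you set above):

# Walk the system MIB on the TrueNAS host
snmpwalk -v 2c -c public 192.168.1.60 system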

Configure Prometheus

- job_name: 'truenas'
  static_configs:
    - targets:
        - 192.168.1.60  # Your TrueNAS IP
  metrics_path: /snmp
  params:
    module: [if_mib]
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: snmp-exporter:9116

Scenario 4: Unraid Monitoring

Install Netdata on Unraid

  1. Go to Apps in Unraid
  2. Search for "Netdata"
  3. Install the container

Add Netdata to Prometheus

- job_name: 'unraid-netdata'
  metrics_path: /api/v1/allmetrics
  params:
    format: [prometheus]
  static_configs:
    - targets: ['192.168.1.70:19999']
      labels:
        instance: 'unraid-server'
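
Netdata's Prometheus endpoint can be checked directly before Prometheus ever scrapes it:

# First few lines of Netdata's Prometheus-format export
curl -s 'http://192.168.1.70:19999/api/v1/allmetrics?format=prometheus' | head -n 20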

Scenario 5: Smart Home / Home Assistant

Home Assistant Integration

Add to configuration.yaml in Home Assistant:

prometheus:
  namespace: homeassistant
  filter:
    include_domains:
      - sensor
      - binary_sensor
      - switch
      - climate
    exclude_entities:
      - sensor.time

Prometheus Config

- job_name: 'homeassistant'
  scrape_interval: 60s
  metrics_path: /api/prometheus
  bearer_token: 'YOUR_LONG_LIVED_ACCESS_TOKEN'
  static_configs:
    - targets: ['192.168.1.80:8123']
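
To verify the integration and token before Prometheus scrapes it, hit the endpoint manually (replace the placeholder with your long-lived access token):

# Returns homeassistant_* metrics if the prometheus: integration is enabled
curl -s -H "Authorization: Bearer YOUR_LONG_LIVED_ACCESS_TOKEN" \
  http://192.168.1.80:8123/api/prometheus | head -n 20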

Scenario 6: Pi-hole DNS Monitoring

Enable Pi-hole Exporter

# Add to docker-compose.yml
  pihole-exporter:
    image: ekofr/pihole-exporter:latest
    container_name: pihole-exporter
    environment:
      - PIHOLE_HOSTNAME=192.168.1.1
      - PIHOLE_API_TOKEN=your-api-token
    ports:
      - "9617:9617"
    networks:
      - observability

# Add to prometheus.yml
- job_name: 'pihole'
  static_configs:
    - targets: ['pihole-exporter:9617']

Part 5: Distributed Tracing with Tempo

Understanding Traces

Distributed tracing helps you understand how requests flow through your systems:

+----------------------------------------------------------------+
|                     REQUEST TRACE EXAMPLE                      |
+----------------------------------------------------------------+
|                                                                |
| [Frontend] --------------------------------------------------  |
|     |                                                          |
|     +-> [API Gateway] ---------------------------------------  |
|     |       |                                                  |
|     |       +-> [Auth Service] --------------------------      |
|     |       |       |                                          |
|     |       |       +-> [Database] --------                    |
|     |       |                                                  |
|     |       +-> [User Service] --------------------------      |
|     |               |                                          |
|     |               +-> [Cache] --------                       |
|     |                                                          |
| ------------------------------------------------------------   |
|     0ms        50ms       100ms      150ms      200ms          |
|                                                                |
+----------------------------------------------------------------+

Instrumenting Applications

Python Application Example

# Install: pip install opentelemetry-sdk opentelemetry-exporter-otlp

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

# Configure tracing
resource = Resource(attributes={"service.name": "my-python-app"})
provider = TracerProvider(resource=resource)

otlp_exporter = OTLPSpanExporter(
    endpoint="http://tempo:4317",
    insecure=True
)

provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Use in your code
with tracer.start_as_current_span("my-operation") as span:
    span.set_attribute("user.id", "12345")
    # Your code here

Node.js Application Example

// Install: npm install @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-grpc

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://tempo:4317',
  }),
  serviceName: 'my-node-app',
});

sdk.start();

Querying Traces in Grafana

  1. Go to Explore in Grafana
  2. Select Tempo data source
  3. Use TraceQL queries:
# Find traces by service
{resource.service.name="my-app"}

# Find slow traces
{duration > 500ms}

# Find error traces
{status=error}

# Complex query
{resource.service.name="api" && span.http.status_code >= 500}

Part 6: Long-Term Storage with Mimir

When to Use Mimir

Use Grafana Mimir when:

  • You need to store metrics for months or years
  • You have multiple Prometheus servers
  • You need horizontal scaling
  • You want multi-tenancy

Basic Mimir Setup

# Add to docker-compose.yml
  mimir:
    image: grafana/mimir:latest
    container_name: mimir
    ports:
      - "9009:9009"
    volumes:
      - ./mimir/mimir.yml:/etc/mimir/mimir.yaml:ro
      - mimir-data:/data
    command:
      - --config.file=/etc/mimir/mimir.yaml
    networks:
      - observability

Create mimir/mimir.yml:

# mimir/mimir.yml
# Grafana Mimir Configuration (Single-Node)

multitenancy_enabled: false

blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/blocks

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  ring:
    kvstore:
      store: memberlist
    replication_factor: 1

ruler_storage:
  backend: filesystem
  filesystem:
    dir: /data/rules

server:
  http_listen_port: 9009
  log_level: info

store_gateway:
  sharding_ring:
    replication_factor: 1

Configure Prometheus Remote Write

Add to prometheus.yml:

remote_write:
  - url: http://mimir:9009/api/v1/push

Add Mimir as Grafana Data Source

# Add to datasources.yml
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    editable: true
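
Once remote_write is flowing, Mimir answers standard PromQL under its /prometheus prefix, so a quick query confirms data is landing:

# Should list one "up" series per target forwarded by Prometheus
curl -s 'http://localhost:9009/prometheus/api/v1/query?query=up'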

Part 7: Using Grafana Alloy

What is Alloy?

Grafana Alloy is the modern, unified telemetry collector that replaces:

  • Promtail (for logs)
  • Grafana Agent (for metrics/traces)

It uses a configuration language similar to HCL and supports:

  • Metrics collection (Prometheus-compatible)
  • Log collection (Loki-compatible)
  • Trace collection (Tempo-compatible)
  • OpenTelemetry Protocol (OTLP)

Alloy Configuration Example

Create alloy/config.alloy:

// alloy/config.alloy
// Grafana Alloy Configuration

// ==========================================
// Logging Configuration
// ==========================================

logging {
  level  = "info"
  format = "logfmt"
}

// ==========================================
// Loki - Log Forwarding
// ==========================================

loki.write "local" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

// ==========================================
// Docker Log Discovery
// ==========================================

discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "docker" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.write.local.receiver]
  
  relabel_rules = loki.relabel.docker.rules
}

loki.relabel "docker" {
  forward_to = []
  
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }
  
  rule {
    source_labels = ["__meta_docker_container_image"]
    target_label  = "image"
  }
}

// ==========================================
// Prometheus - Metrics Scraping
// ==========================================

prometheus.scrape "node" {
  targets = [{
    __address__ = "node-exporter:9100",
  }]
  
  forward_to = [prometheus.remote_write.local.receiver]
}

prometheus.remote_write "local" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// ==========================================
// OpenTelemetry - Trace Collection
// ==========================================

otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}
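
If you switch to Alloy, its fmt subcommand parses the configuration and fails on syntax errors, so it doubles as a lightweight config check (the bind-mount path matches the commented-out service in docker-compose.yml):

# Parse/format-check the Alloy configuration
docker run --rm \
  -v "$PWD/alloy/config.alloy:/etc/alloy/config.alloy:ro" \
  grafana/alloy:latest fmt /etc/alloy/config.alloy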

Part 8: Advanced Topics

Log-Based Alerting

Create alerts based on log patterns:

# Add to loki-config.yml under ruler:
ruler:
  alertmanager_url: http://alertmanager:9093
  storage:
    type: local
    local:
      directory: /loki/rules
  rule_path: /loki/rules-temp
  ring:
    kvstore:
      store: inmemory
  enable_api: true

Create alert rules loki/rules/alerts.yml:

groups:
  - name: log-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="docker"} |= "error" [5m])) by (container)
          / 
          sum(rate({job="docker"} [5m])) by (container)
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High error rate in {{ $labels.container }}
          
      - alert: AuthenticationFailures
        expr: |
          sum(count_over_time({job="auth"} |= "Failed password" [5m])) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: Multiple authentication failures detected
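
After restarting Loki, the ruler API (enabled above with enable_api: true) shows whether the rule group was loaded:

# List rule groups currently loaded by the Loki ruler
curl -s http://localhost:3100/loki/api/v1/rules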

Correlation: Logs to Traces

Link logs to traces using trace IDs:

  1. Ensure your application logs include trace IDs:

    {"timestamp": "2026-01-12T10:00:00Z", "level": "error", "message": "Request failed", "traceID": "abc123def456"}
  2. Configure derived fields in Grafana Loki data source (already done in our config)

  3. Click trace IDs in Grafana Explore to jump to the trace

Performance Tuning

Loki Performance

# Increase ingestion capacity
limits_config:
  ingestion_rate_mb: 20
  ingestion_burst_size_mb: 40
  max_streams_per_user: 50000

# Enable caching
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 500

Prometheus Performance

# Command flags for better performance
command:
  - '--storage.tsdb.min-block-duration=2h'
  - '--storage.tsdb.max-block-duration=2h'
  - '--query.max-concurrency=20'

Part 9: Backup and Disaster Recovery

Backup Script

#!/bin/bash
# backup-observability.sh

BACKUP_DIR="/backup/observability/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Stop containers for consistent backup
docker compose stop

# Backup volumes
for volume in prometheus-data grafana-data loki-data tempo-data alertmanager-data; do
  docker run --rm \
    -v lgtm-stack_${volume}:/data \
    -v "$BACKUP_DIR":/backup \
    alpine tar czf /backup/${volume}.tar.gz /data
done

# Backup configs
cp -r ./prometheus ./loki ./tempo ./grafana ./alertmanager ./blackbox "$BACKUP_DIR/"

# Restart containers
docker compose up -d

echo "Backup completed: $BACKUP_DIR"

Restore Script

#!/bin/bash
# restore-observability.sh

BACKUP_DIR="$1"

if [ -z "$BACKUP_DIR" ]; then
  echo "Usage: $0 /path/to/backup/directory"
  exit 1
fi

docker compose down

# Restore volumes
for volume in prometheus-data grafana-data loki-data tempo-data alertmanager-data; do
  docker volume rm lgtm-stack_${volume} 2>/dev/null
  docker volume create lgtm-stack_${volume}
  docker run --rm \
    -v lgtm-stack_${volume}:/data \
    -v "$BACKUP_DIR":/backup \
    alpine sh -c "cd /data && tar xzf /backup/${volume}.tar.gz --strip-components=1"
done

# Restore configs
cp -r "$BACKUP_DIR"/{prometheus,loki,tempo,grafana,alertmanager,blackbox} ./

docker compose up -d

echo "Restore completed from: $BACKUP_DIR"

Part 10: Troubleshooting

Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Loki not receiving logs | Promtail not running | Check docker compose logs promtail |
| High memory usage | Too many log streams | Reduce cardinality, add filters |
| Tempo traces not appearing | OTLP not configured | Check application instrumentation |
| Grafana can't query Loki | Wrong data source URL | Use http://loki:3100 |
| Logs delayed | High ingestion rate | Increase ingestion_rate_mb |

Diagnostic Commands

# Check Loki health
curl http://localhost:3100/ready
curl http://localhost:3100/metrics

# Check Tempo health
curl http://localhost:3200/ready

# View Promtail targets
curl http://localhost:9080/targets

# Check Loki ingestion rate
curl "http://localhost:3100/loki/api/v1/query?query=sum(rate(loki_distributor_bytes_received_total[5m]))"

# Test log ingestion
curl -X POST -H "Content-Type: application/json" \
  http://localhost:3100/loki/api/v1/push \
  -d '{"streams":[{"stream":{"job":"test"},"values":[["'$(date +%s%N)'","test log entry"]]}]}'

Log Query in Grafana

  1. Go to Explore
  2. Select Loki data source
  3. Try: {job="docker"} to see all Docker logs

Conclusion

You now have a complete understanding of the Grafana observability ecosystem:

  • ✅ Loki for centralized log aggregation
  • ✅ Promtail/Alloy for log collection
  • ✅ Tempo for distributed tracing
  • ✅ Mimir for long-term metrics storage
  • ✅ LogQL for powerful log queries
  • ✅ Home lab scenarios (Proxmox, Synology, TrueNAS, Unraid, Home Assistant)
  • ✅ Correlation between logs, metrics, and traces
  • ✅ Backup and disaster recovery

Complete Stack Overview

+--------------------------------------------------------------+
|                 COMPLETE LGTM OBSERVABILITY                  |
+--------------------------------------------------------------+
|                                                              |
|  +-------------+   +-------------+   +-------------+         |
|  |   METRICS   |   |    LOGS     |   |   TRACES    |         |
|  | (Prometheus)|   |   (Loki)    |   |   (Tempo)   |         |
|  +------+------+   +------+------+   +------+------+         |
|         |                 |                 |                |
|         +-----------------+-----------------+                |
|                           |                                  |
|                           v                                  |
|                    +-------------+                           |
|                    |   GRAFANA   |                           |
|                    | (Visualize) |                           |
|                    +-------------+                           |
|                                                              |
+--------------------------------------------------------------+

💡 Tip: For the basic Prometheus and Grafana setup, see our companion guide: "Grafana & Prometheus Complete Guide: The Ultimate Monitoring Stack"
