Grafana Loki & Advanced Observability 2026: Log Aggregation, Tracing & Complete Home Lab Stack
Published on January 12, 2026
Introduction
While Prometheus and Grafana provide excellent metrics monitoring, comprehensive observability requires three pillars: Metrics, Logs, and Traces. This guide covers the advanced observability components that complete your monitoring stack:
- Grafana Loki: Log aggregation (like Prometheus, but for logs)
- Grafana Tempo: Distributed tracing
- Grafana Mimir: Long-term metrics storage
- Grafana Alloy: Unified telemetry collector
This is the companion guide to "Grafana & Prometheus Complete Guide"; make sure you have the basic stack running before proceeding.
The LGTM Stack
| Component | Purpose | Analogy |
|---|---|---|
| Loki | Log aggregation | "Prometheus for logs" |
| Grafana | Visualization | Dashboard and UI |
| Tempo | Distributed tracing | Request flow tracking |
| Mimir | Long-term metrics | Scalable Prometheus storage |
Key Terminology
| Term | Definition |
|---|---|
| Log Aggregation | Collecting logs from multiple sources into a centralized system |
| Promtail | Loki's original log collector agent (now deprecated in favor of Alloy) |
| Alloy | Modern, unified telemetry collector (successor to Promtail and Grafana Agent) |
| LogQL | Loki's query language for searching and filtering logs |
| Trace | A record of a requestโs journey through distributed systems |
| Span | A single unit of work within a trace |
| OTLP | OpenTelemetry Protocol for telemetry data |
Prerequisites
Before starting this guide, ensure you have:
- ✅ Basic Grafana + Prometheus stack running (see companion guide)
- ✅ Docker and Docker Compose installed
- ✅ At least 4GB RAM (8GB recommended for full stack)
- ✅ Basic understanding of containers and networking
Part 1: Understanding Grafana Loki
What is Loki?
Grafana Loki is an open-source, horizontally-scalable, multi-tenant log aggregation system inspired by Prometheus. Unlike traditional log management systems (like Elasticsearch/ELK), Loki takes a unique approach:
- Indexes only labels (metadata), not full log content (see the example below)
- Highly efficient storage and resource usage
- Similar query model to Prometheus (labels-based)
- Seamless Grafana integration with log exploration
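For example, a stream is identified solely by its label set; only the labels are indexed, while the log line itself is stored in compressed chunks and scanned at query time (the line shown is illustrative):
{job="nginx", host="web01", level="error"}    <-- indexed: the stream's labels
2026-01-12T10:00:00Z upstream timed out ...   <-- not indexed: stored as a compressed chunk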
Loki vs Traditional Log Systems
| Feature | Loki | Elasticsearch (ELK) |
|---|---|---|
| Indexing | Labels only | Full-text content |
| Storage | Very efficient | Resource-heavy |
| Query Language | LogQL | Lucene/KQL |
| Resource Usage | Low | High |
| Learning Curve | Easy (Prometheus-like) | Moderate |
| Best For | Infrastructure logs | Full-text search |
Loki Architecture
+-------------------------------------------------------------------+
| GRAFANA LOKI STACK |
+-------------------------------------------------------------------+
| |
| LOG SOURCES COLLECTORS |
| +--------------+ +--------------+ |
| | Docker |------------>| Promtail / | |
| | Containers | | Alloy | |
| +--------------+ +------+-------+ |
| +--------------+ | |
| | System Logs |--------------------| |
| | (/var/log) | | |
| +--------------+ | |
| +--------------+ v |
| | Application | +--------------+ |
| | Logs |------------>| LOKI | |
| +--------------+ | +--------+ | |
| | |Ingester| | |
| +--------------+ | +--------+ | |
| | Syslog |-------------| +--------+ | |
| | (UDP/TCP) | | |Querier | | |
| +--------------+ | +--------+ | |
| | +--------+ | |
| | |Storage | | |
| | +--------+ | |
| +-------+------+ |
| | |
| v |
| +--------------+ |
| | GRAFANA | |
| | (Explore + | |
| | Dashboards) | |
| +--------------+ |
| |
+-------------------------------------------------------------------+
Part 2: Complete LGTM Stack Deployment
Full Docker Compose Configuration
This configuration includes all observability components for a complete home lab setup:
Create a new directory and docker-compose.yml:
mkdir -p ~/lgtm-stack
cd ~/lgtm-stack
# docker-compose.yml
# Complete LGTM Observability Stack
# Includes: Loki, Grafana, Tempo, Mimir, Prometheus, and all exporters
services:
# ===========================================
# GRAFANA LOKI - Log Aggregation
# ===========================================
loki:
image: grafana/loki:latest
container_name: loki
restart: unless-stopped
ports:
- "3100:3100"
volumes:
- ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
- loki-data:/loki
command: -config.file=/etc/loki/local-config.yaml
networks:
- observability
# ===========================================
# PROMTAIL - Log Collector
# ===========================================
promtail:
image: grafana/promtail:latest
container_name: promtail
restart: unless-stopped
volumes:
- ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
- /var/log:/var/log:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
command: -config.file=/etc/promtail/config.yml
depends_on:
- loki
networks:
- observability
# ===========================================
# GRAFANA ALLOY - Modern Telemetry Collector
# (Alternative to Promtail + Grafana Agent)
# ===========================================
# alloy:
# image: grafana/alloy:latest
# container_name: alloy
# restart: unless-stopped
# ports:
# - "12345:12345"
# - "4317:4317" # OTLP gRPC
# - "4318:4318" # OTLP HTTP
# volumes:
# - ./alloy/config.alloy:/etc/alloy/config.alloy:ro
# - /var/log:/var/log:ro
# - /var/lib/docker/containers:/var/lib/docker/containers:ro
# command:
# - run
# - /etc/alloy/config.alloy
# - --server.http.listen-addr=0.0.0.0:12345
# networks:
# - observability
# ===========================================
# GRAFANA TEMPO - Distributed Tracing
# ===========================================
tempo:
image: grafana/tempo:latest
container_name: tempo
restart: unless-stopped
ports:
- "3200:3200" # Tempo HTTP
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "9411:9411" # Zipkin
- "14268:14268" # Jaeger ingest
volumes:
- ./tempo/tempo-config.yml:/etc/tempo/tempo.yaml:ro
- tempo-data:/tmp/tempo
command: -config.file=/etc/tempo/tempo.yaml
networks:
- observability
# ===========================================
# PROMETHEUS - Metrics Collection
# ===========================================
prometheus:
image: prom/prometheus:v3.9.0
container_name: prometheus
restart: unless-stopped
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/rules:/etc/prometheus/rules:ro
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--storage.tsdb.retention.time=15d'
- '--web.enable-lifecycle'
- '--web.enable-remote-write-receiver'
networks:
- observability
# ===========================================
# GRAFANA - Visualization
# ===========================================
grafana:
image: grafana/grafana:12.3.0
container_name: grafana
restart: unless-stopped
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=admin123
- GF_USERS_ALLOW_SIGN_UP=false
- GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
- GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-piechart-panel
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
depends_on:
- prometheus
- loki
- tempo
networks:
- observability
# ===========================================
# NODE EXPORTER - Linux System Metrics
# ===========================================
node-exporter:
image: prom/node-exporter:latest
container_name: node-exporter
restart: unless-stopped
ports:
- "9100:9100"
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--path.rootfs=/rootfs'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
networks:
- observability
# ===========================================
# cADVISOR - Container Metrics
# ===========================================
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
container_name: cadvisor
restart: unless-stopped
ports:
- "8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
privileged: true
devices:
- /dev/kmsg
networks:
- observability
# ===========================================
# ALERTMANAGER - Alert Routing
# ===========================================
alertmanager:
image: prom/alertmanager:latest
container_name: alertmanager
restart: unless-stopped
ports:
- "9093:9093"
volumes:
- ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
- alertmanager-data:/alertmanager
command:
- '--config.file=/etc/alertmanager/alertmanager.yml'
- '--storage.path=/alertmanager'
networks:
- observability
# ===========================================
# BLACKBOX EXPORTER - Endpoint Probing
# ===========================================
blackbox-exporter:
image: prom/blackbox-exporter:latest
container_name: blackbox-exporter
restart: unless-stopped
ports:
- "9115:9115"
volumes:
- ./blackbox/blackbox.yml:/etc/blackbox_exporter/config.yml:ro
command:
- '--config.file=/etc/blackbox_exporter/config.yml'
networks:
- observability
# ===========================================
# NETWORKS
# ===========================================
networks:
observability:
driver: bridge
# ===========================================
# VOLUMES
# ===========================================
volumes:
prometheus-data:
grafana-data:
loki-data:
tempo-data:
alertmanager-data:
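With the file saved, you can catch YAML mistakes early by validating it before creating the config files (Docker Compose v2):
# Validate the compose file; prints nothing on success
docker compose config --quiet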
Configuration Files
Create Directory Structure
mkdir -p loki promtail tempo prometheus/rules grafana/provisioning/datasources grafana/provisioning/dashboards alertmanager blackbox
Loki Configuration
Create loki/loki-config.yml:
# loki/loki-config.yml
# Grafana Loki Configuration for Home Lab
auth_enabled: false
server:
http_listen_port: 3100
grpc_listen_port: 9096
common:
instance_addr: 127.0.0.1
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
query_range:
results_cache:
cache:
embedded_cache:
enabled: true
max_size_mb: 100
schema_config:
configs:
- from: 2020-10-24
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
ruler:
alertmanager_url: http://alertmanager:9093
# Frontend settings
frontend:
max_outstanding_per_tenant: 4096
# Ingester settings
ingester:
chunk_encoding: snappy
# Limits for home lab use
limits_config:
reject_old_samples: true
reject_old_samples_max_age: 168h
ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
max_streams_per_user: 10000
max_line_size: 256kb
# Compactor settings
compactor:
working_directory: /loki/compactor
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
Promtail Configuration
Create promtail/promtail-config.yml:
# promtail/promtail-config.yml
# Promtail Log Collector Configuration
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
# ===========================================
# Docker Container Logs
# ===========================================
- job_name: docker
docker_sd_configs:
- host: unix:///var/run/docker.sock
refresh_interval: 5s
relabel_configs:
# Keep container name as label
- source_labels: ['__meta_docker_container_name']
regex: '/(.*)'
target_label: 'container'
# Add container ID
- source_labels: ['__meta_docker_container_id']
target_label: 'container_id'
# Add image name
- source_labels: ['__meta_docker_container_image']
target_label: 'image'
# Add compose project if available
- source_labels: ['__meta_docker_container_label_com_docker_compose_project']
target_label: 'compose_project'
# Add compose service if available
- source_labels: ['__meta_docker_container_label_com_docker_compose_service']
target_label: 'compose_service'
# ===========================================
# System Logs (Linux)
# ===========================================
- job_name: system
static_configs:
- targets:
- localhost
labels:
job: syslog
__path__: /var/log/syslog
- job_name: auth
static_configs:
- targets:
- localhost
labels:
job: auth
__path__: /var/log/auth.log
- job_name: kernel
static_configs:
- targets:
- localhost
labels:
job: kernel
__path__: /var/log/kern.log
# ===========================================
# Journal Logs (systemd)
# ===========================================
- job_name: journal
journal:
max_age: 12h
labels:
job: journal
relabel_configs:
- source_labels: ['__journal__systemd_unit']
target_label: 'unit'
- source_labels: ['__journal__hostname']
target_label: 'hostname'
- source_labels: ['__journal_priority_keyword']
target_label: 'level'
# ===========================================
# Custom Application Logs
# ===========================================
# Uncomment and modify for your applications
# - job_name: myapp
# static_configs:
# - targets:
# - localhost
# labels:
# job: myapp
# __path__: /var/log/myapp/*.log
Tempo Configuration
Create tempo/tempo-config.yml:
# tempo/tempo-config.yml
# Grafana Tempo Distributed Tracing Configuration
server:
http_listen_port: 3200
distributor:
receivers:
jaeger:
protocols:
thrift_http:
grpc:
thrift_binary:
thrift_compact:
zipkin:
otlp:
protocols:
http:
grpc:
opencensus:
ingester:
max_block_duration: 5m
compactor:
compaction:
block_retention: 48h
metrics_generator:
registry:
external_labels:
source: tempo
cluster: home-lab
storage:
path: /tmp/tempo/generator/wal
remote_write:
- url: http://prometheus:9090/api/v1/write
send_exemplars: true
storage:
trace:
backend: local
wal:
path: /tmp/tempo/wal
local:
path: /tmp/tempo/blocks
overrides:
defaults:
metrics_generator:
processors: [service-graphs, span-metrics]
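Once traces are flowing, the metrics generator remote-writes span metrics into Prometheus (which is why the Compose file enables --web.enable-remote-write-receiver). A quick way to check for them, assuming the traces_spanmetrics_ name prefix used by recent Tempo releases:
# List span-metrics series names known to Prometheus
curl -s "http://localhost:9090/api/v1/label/__name__/values" | grep -o '"traces_spanmetrics[^"]*"'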
Prometheus Configuration (Updated)
Create prometheus/prometheus.yml:
# prometheus/prometheus.yml
# Prometheus Configuration with LGTM Stack Integration
global:
scrape_interval: 15s
evaluation_interval: 15s
external_labels:
cluster: home-lab
replica: 1
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
rule_files:
- /etc/prometheus/rules/*.yml
scrape_configs:
# Prometheus self-monitoring
- job_name: 'prometheus'
static_configs:
- targets: ['prometheus:9090']
# Node Exporter
- job_name: 'node-exporter'
static_configs:
- targets: ['node-exporter:9100']
labels:
instance: 'docker-host'
# cAdvisor
- job_name: 'cadvisor'
static_configs:
- targets: ['cadvisor:8080']
# Alertmanager
- job_name: 'alertmanager'
static_configs:
- targets: ['alertmanager:9093']
# Loki
- job_name: 'loki'
static_configs:
- targets: ['loki:3100']
# Tempo
- job_name: 'tempo'
static_configs:
- targets: ['tempo:3200']
# Blackbox - Website Monitoring
- job_name: 'blackbox-http'
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- https://google.com
- https://github.com
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter:9115
Grafana Data Source Provisioning
Create grafana/provisioning/datasources/datasources.yml:
# grafana/provisioning/datasources/datasources.yml
# Auto-provision all LGTM data sources
apiVersion: 1
datasources:
# Prometheus - Metrics
- name: Prometheus
uid: prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: true
jsonData:
timeInterval: "15s"
httpMethod: "POST"
# Loki - Logs
- name: Loki
uid: loki
type: loki
access: proxy
url: http://loki:3100
editable: true
jsonData:
maxLines: 1000
derivedFields:
- datasourceUid: tempo
matcherRegex: "traceID=(\\w+)"
name: TraceID
url: "$${__value.raw}"
# Tempo - Traces
- name: Tempo
uid: tempo
type: tempo
access: proxy
url: http://tempo:3200
editable: true
jsonData:
httpMethod: GET
tracesToLogs:
datasourceUid: loki
tags: ['job', 'instance', 'pod', 'namespace']
mappedTags: [{ key: 'service.name', value: 'service' }]
mapTagNamesEnabled: true
spanStartTimeShift: '1h'
spanEndTimeShift: '1h'
filterByTraceID: true
filterBySpanID: false
tracesToMetrics:
datasourceUid: prometheus
tags: [{ key: 'service.name', value: 'service' }]
queries:
- name: 'Request rate'
query: 'sum(rate(tempo_spanmetrics_latency_count{$$__tags}[5m]))'
serviceMap:
datasourceUid: prometheus
nodeGraph:
enabled: true
lokiSearch:
datasourceUid: loki
Alertmanager Configuration
Create alertmanager/alertmanager.yml:
# alertmanager/alertmanager.yml
global:
resolve_timeout: 5m
route:
receiver: 'default-receiver'
group_by: ['alertname', 'severity']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receivers:
- name: 'default-receiver'
Blackbox Exporter Configuration
Create blackbox/blackbox.yml:
# blackbox/blackbox.yml
modules:
http_2xx:
prober: http
timeout: 10s
http:
valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
valid_status_codes: [200, 201, 202, 204, 301, 302, 303, 307, 308]
method: GET
follow_redirects: true
preferred_ip_protocol: "ip4"
tcp_connect:
prober: tcp
timeout: 10s
icmp:
prober: icmp
timeout: 5s
Deploy the Stack
# Start all services
docker compose up -d
# View logs
docker compose logs -f
# Check status
docker compose ps
Access Web Interfaces
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin123 |
| Prometheus | http://localhost:9090 | None |
| Loki | http://localhost:3100/ready | None |
| Tempo | http://localhost:3200/ready | None |
| Alertmanager | http://localhost:9093 | None |
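A quick loop to confirm the HTTP endpoints respond (readiness paths as listed above; Prometheus uses /-/ready):
for url in http://localhost:3100/ready http://localhost:3200/ready http://localhost:9090/-/ready; do
  echo -n "$url -> "; curl -s -o /dev/null -w "%{http_code}\n" "$url"
done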
Part 3: Working with Loki and LogQL
LogQL Basics
LogQL is Loki's query language, similar to PromQL but for logs.
Log Stream Selectors
# Select logs from a specific container
{container="grafana"}
# Select logs from a job
{job="syslog"}
# Multiple label matchers
{container="prometheus", compose_project="lgtm-stack"}
# Regex matching
{container=~"grafana|loki|tempo"}
# Exclude specific containers
{container!="cadvisor"} Log Pipeline
# Filter logs containing "error"
{container="myapp"} |= "error"
# Case-insensitive search
{container="myapp"} |~ "(?i)error"
# Exclude lines with "debug"
{container="myapp"} != "debug"
# Parse JSON logs
{container="myapp"} | json
# Parse with pattern
{job="nginx"} | pattern `<ip> - - [<_>] "<method> <uri> <_>" <status> <size>`
# Extract and filter fields
{container="myapp"} | json | level="error"
# Count errors per minute
sum(rate({container="myapp"} |= "error" [1m])) by (container) Common LogQL Queries
| Use Case | Query |
|---|---|
| View all container logs | {job="docker"} |
| Search for errors | {job="docker"} \|= "error" |
| Auth failures | {job="auth"} \|= "Failed" |
| Rate of errors | sum(rate({job="docker"} \|= "error" [5m])) by (container) |
| Top sources | topk(10, sum(rate({job="docker"} [1h])) by (container)) |
| JSON error logs | {container="myapp"} \| json \| level="error" |
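Beyond counting lines, LogQL can turn parsed fields into metrics with unwrap; a sketch that assumes your JSON logs carry a duration_ms field:
# 95th-percentile request duration derived from logs, per container
quantile_over_time(0.95, {container="myapp"} | json | unwrap duration_ms [5m]) by (container)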
Creating Loki Dashboards
Log Volume Panel
sum(rate({job="docker"}[5m])) by (container) Error Rate Panel
sum(rate({job="docker"} |= "error" [5m])) by (container)
/
sum(rate({job="docker"} [5m])) by (container)
* 100
Log Table Panel
Simply use a log stream selector:
{container=~".+"} | json Part 4: Home Lab Monitoring Scenarios
Scenario 1: Proxmox VE Monitoring
Monitor your Proxmox hypervisor with metrics and logs.
Prometheus Configuration for Proxmox
# Add to prometheus.yml
scrape_configs:
- job_name: 'proxmox'
static_configs:
- targets: ['192.168.1.100:9221'] # PVE Exporter
metrics_path: /pve
params:
module: [default]
Install PVE Exporter on Proxmox
# On Proxmox host
apt update && apt install -y python3-pip
pip3 install prometheus-pve-exporter
# Create config
cat > /etc/prometheus/pve.yml << EOF
default:
user: root@pam
token_name: "prometheus"
token_value: "your-token-here"
verify_ssl: false
EOF
# Create systemd service
cat > /etc/systemd/system/pve-exporter.service << EOF
[Unit]
Description=Prometheus PVE Exporter
After=network.target
[Service]
ExecStart=/usr/local/bin/pve_exporter /etc/prometheus/pve.yml
Restart=always
[Install]
WantedBy=multi-user.target
EOF
systemctl enable --now pve-exporter
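A quick sanity check from the Proxmox host; the /pve endpoint and target parameter follow prometheus-pve-exporter defaults, so adjust if your setup differs:
# Expect pve_* metric lines in the output
curl -s "http://localhost:9221/pve?target=localhost" | head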
Collect Proxmox Logs with Promtail
Promtail must run on the Proxmox node itself (or otherwise have access to its /var/log) to read these files. Add to promtail-config.yml:
scrape_configs:
- job_name: proxmox
static_configs:
- targets:
- localhost
labels:
job: proxmox
host: pve-node1
__path__: /var/log/pve-firewall.log
- targets:
- localhost
labels:
job: proxmox
host: pve-node1
__path__: /var/log/pvedaemon.log
Scenario 2: Synology NAS Monitoring
SNMP Exporter for Synology
# Add to docker-compose.yml
snmp-exporter:
image: prom/snmp-exporter:latest
container_name: snmp-exporter
ports:
- "9116:9116"
volumes:
- ./snmp/snmp.yml:/etc/snmp_exporter/snmp.yml:ro
networks:
- observability
Prometheus Config for Synology
# Add to prometheus.yml
- job_name: 'synology'
static_configs:
- targets:
- 192.168.1.50 # Your Synology IP
metrics_path: /snmp
params:
module: [synology]
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: snmp-exporter:9116
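You can probe the NAS through the exporter by hand before building dashboards; this assumes a synology module exists in your generated snmp.yml and uses the example IP from above:
# Manual probe via the SNMP exporter
curl "http://localhost:9116/snmp?module=synology&target=192.168.1.50"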
Scenario 3: TrueNAS Monitoring
Enable SNMP on TrueNAS
- Go to Services → SNMP
- Enable SNMP service
- Set community string (e.g., "public")
Configure Prometheus
- job_name: 'truenas'
static_configs:
- targets:
- 192.168.1.60 # Your TrueNAS IP
metrics_path: /snmp
params:
module: [if_mib]
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: snmp-exporter:9116
Scenario 4: Unraid Monitoring
Install Netdata on Unraid
- Go to Apps in Unraid
- Search for "Netdata"
- Install the container
Add Netdata to Prometheus
- job_name: 'unraid-netdata'
metrics_path: /api/v1/allmetrics
params:
format: [prometheus]
static_configs:
- targets: ['192.168.1.70:19999']
labels:
instance: 'unraid-server' Scenario 5: Smart Home / Home Assistant
Home Assistant Integration
Add to configuration.yaml in Home Assistant:
prometheus:
namespace: homeassistant
filter:
include_domains:
- sensor
- binary_sensor
- switch
- climate
exclude_entities:
- sensor.time
Prometheus Config
- job_name: 'homeassistant'
scrape_interval: 60s
metrics_path: /api/prometheus
authorization:
credentials: 'YOUR_LONG_LIVED_ACCESS_TOKEN'
static_configs:
- targets: ['192.168.1.80:8123']
Scenario 6: Pi-hole DNS Monitoring
Enable Pi-hole Exporter
# Add to docker-compose.yml
pihole-exporter:
image: ekofr/pihole-exporter:latest
container_name: pihole-exporter
environment:
- PIHOLE_HOSTNAME=192.168.1.1
- PIHOLE_API_TOKEN=your-api-token
ports:
- "9617:9617"
networks:
- observability
# Add to prometheus.yml
- job_name: 'pihole'
static_configs:
- targets: ['pihole-exporter:9617']
Part 5: Distributed Tracing with Tempo
Understanding Traces
Distributed tracing helps you understand how requests flow through your systems:
+---------------------------------------------------------------+
|                     REQUEST TRACE EXAMPLE                     |
+---------------------------------------------------------------+
|                                                               |
| [Frontend] ------------------------------------------------- |
|    |                                                          |
|    +--> [API Gateway] -------------------------------------- |
|           |                                                   |
|           +--> [Auth Service] ----------------                |
|           |      |                                            |
|           |      +--> [Database] ------                       |
|           |                                                   |
|           +--> [User Service] ----------------                |
|                  |                                            |
|                  +--> [Cache] --------                        |
|                                                               |
| 0ms        50ms        100ms        150ms        200ms        |
|                                                               |
+---------------------------------------------------------------+
Instrumenting Applications
Python Application Example
# Install: pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
# Configure tracing
resource = Resource(attributes={"service.name": "my-python-app"})
provider = TracerProvider(resource=resource)
otlp_exporter = OTLPSpanExporter(
endpoint="http://tempo:4317",
insecure=True
)
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
# Use in your code
with tracer.start_as_current_span("my-operation") as span:
span.set_attribute("user.id", "12345")
# Your code here
Node.js Application Example
// Install: npm install @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-grpc
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({
url: 'http://tempo:4317',
}),
serviceName: 'my-node-app',
});
sdk.start();
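Load the tracing setup before any application code so auto-instrumentation can patch modules; the filenames here are examples:
# Assuming the snippet above is saved as tracing.js
node --require ./tracing.js app.js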
Querying Traces in Grafana
- Go to Explore in Grafana
- Select Tempo data source
- Use TraceQL queries:
# Find traces by service
{resource.service.name="my-app"}
# Find slow traces
{duration > 500ms}
# Find error traces
{status=error}
# Complex query
{resource.service.name="api" && span.http.status_code >= 500} Part 6: Long-Term Storage with Mimir
When to Use Mimir
Use Grafana Mimir when:
- You need to store metrics for months or years
- You have multiple Prometheus servers
- You need horizontal scaling
- You want multi-tenancy
Basic Mimir Setup
# Add to docker-compose.yml
mimir:
image: grafana/mimir:latest
container_name: mimir
ports:
- "9009:9009"
volumes:
- ./mimir/mimir.yml:/etc/mimir/mimir.yaml:ro
- mimir-data:/data
command:
- --config.file=/etc/mimir/mimir.yaml
networks:
- observability
# Note: also declare mimir-data under the top-level volumes: key
Create mimir/mimir.yml:
# mimir/mimir.yml
# Grafana Mimir Configuration (Single-Node)
multitenancy_enabled: false
blocks_storage:
backend: filesystem
filesystem:
dir: /data/blocks
compactor:
data_dir: /data/compactor
sharding_ring:
kvstore:
store: memberlist
distributor:
ring:
kvstore:
store: memberlist
ingester:
ring:
kvstore:
store: memberlist
replication_factor: 1
ruler_storage:
backend: filesystem
filesystem:
dir: /data/rules
server:
http_listen_port: 9009
log_level: info
store_gateway:
sharding_ring:
replication_factor: 1
Configure Prometheus Remote Write
Add to prometheus.yml:
remote_write:
- url: http://mimir:9009/api/v1/push
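After a scrape interval or two, confirm samples are landing in Mimir through its Prometheus-compatible API (served under the default /prometheus prefix):
# Instant query against Mimir; should return the "up" series forwarded by Prometheus
curl "http://localhost:9009/prometheus/api/v1/query?query=up"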
Add Mimir as Grafana Data Source
# Add to datasources.yml
- name: Mimir
type: prometheus
access: proxy
url: http://mimir:9009/prometheus
editable: true
Part 7: Using Grafana Alloy
What is Alloy?
Grafana Alloy is the modern, unified telemetry collector that replaces:
- Promtail (for logs)
- Grafana Agent (for metrics/traces)
It uses a configuration language similar to HCL and supports:
- Metrics collection (Prometheus-compatible)
- Log collection (Loki-compatible)
- Trace collection (Tempo-compatible)
- OpenTelemetry Protocol (OTLP)
Alloy Configuration Example
Create alloy/config.alloy:
// alloy/config.alloy
// Grafana Alloy Configuration
// ==========================================
// Logging Configuration
// ==========================================
logging {
level = "info"
format = "logfmt"
}
// ==========================================
// Loki - Log Forwarding
// ==========================================
loki.write "local" {
endpoint {
url = "http://loki:3100/loki/api/v1/push"
}
}
// ==========================================
// Docker Log Discovery
// ==========================================
discovery.docker "containers" {
host = "unix:///var/run/docker.sock"
}
loki.source.docker "docker" {
host = "unix:///var/run/docker.sock"
targets = discovery.docker.containers.targets
forward_to = [loki.write.local.receiver]
relabel_rules = loki.relabel.docker.rules
}
loki.relabel "docker" {
forward_to = []
rule {
source_labels = ["__meta_docker_container_name"]
regex = "/(.*)"
target_label = "container"
}
rule {
source_labels = ["__meta_docker_container_image"]
target_label = "image"
}
}
// ==========================================
// Prometheus - Metrics Scraping
// ==========================================
prometheus.scrape "node" {
targets = [{
__address__ = "node-exporter:9100",
}]
forward_to = [prometheus.remote_write.local.receiver]
}
prometheus.remote_write "local" {
endpoint {
url = "http://prometheus:9090/api/v1/write"
}
}
// ==========================================
// OpenTelemetry - Trace Collection
// ==========================================
otelcol.receiver.otlp "default" {
grpc {
endpoint = "0.0.0.0:4317"
}
http {
endpoint = "0.0.0.0:4318"
}
output {
traces = [otelcol.exporter.otlp.tempo.input]
}
}
otelcol.exporter.otlp "tempo" {
client {
endpoint = "tempo:4317"
tls {
insecure = true
}
}
}
Part 8: Advanced Topics
Log-Based Alerting
Create alerts based on log patterns:
# Add to loki-config.yml under ruler:
ruler:
alertmanager_url: http://alertmanager:9093
storage:
type: local
local:
directory: /loki/rules
rule_path: /loki/rules-temp
ring:
kvstore:
store: inmemory
enable_api: true
Create alert rules loki/rules/alerts.yml:
groups:
- name: log-alerts
rules:
- alert: HighErrorRate
expr: |
sum(rate({job="docker"} |= "error" [5m])) by (container)
/
sum(rate({job="docker"} [5m])) by (container)
> 0.05
for: 5m
labels:
severity: warning
annotations:
summary: High error rate in {{ $labels.container }}
- alert: AuthenticationFailures
expr: |
sum(count_over_time({job="auth"} |= "Failed password" [5m])) > 10
for: 1m
labels:
severity: critical
annotations:
summary: Multiple authentication failures detected
Correlation: Logs to Traces
Link logs to traces using trace IDs:
- Ensure your application logs include trace IDs:
{"timestamp": "2026-01-12T10:00:00Z", "level": "error", "message": "Request failed", "traceID": "abc123def456"}
- Configure derived fields in the Grafana Loki data source (already done in our provisioning config)
- Click trace IDs in Grafana Explore to jump to the trace
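A minimal sketch of the application side in Python, reusing the OpenTelemetry setup from Part 5; the traceID field name matches the derived-field regex in our Loki data source:
import json
import logging
from opentelemetry import trace

logger = logging.getLogger("myapp")

def log_with_trace(message: str, level: str = "error") -> None:
    # Grab the active span's context and render its 128-bit trace ID as hex
    ctx = trace.get_current_span().get_span_context()
    logger.error(json.dumps({
        "level": level,
        "message": message,
        "traceID": trace.format_trace_id(ctx.trace_id),
    }))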
Performance Tuning
Loki Performance
# Increase ingestion capacity
limits_config:
ingestion_rate_mb: 20
ingestion_burst_size_mb: 40
max_streams_per_user: 50000
# Enable caching
query_range:
results_cache:
cache:
embedded_cache:
enabled: true
max_size_mb: 500
Prometheus Performance
# Command flags for better performance
command:
- '--storage.tsdb.min-block-duration=2h'
- '--storage.tsdb.max-block-duration=2h'
- '--query.max-concurrency=20'
Part 9: Backup and Disaster Recovery
Backup Script
#!/bin/bash
# backup-observability.sh
BACKUP_DIR="/backup/observability/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
# Stop containers for consistent backup
docker compose stop
# Backup volumes
for volume in prometheus-data grafana-data loki-data tempo-data alertmanager-data; do
docker run --rm \
  -v lgtm-stack_${volume}:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar czf /backup/${volume}.tar.gz /data
done
# Backup configs
cp -r ./prometheus ./loki ./tempo ./grafana ./alertmanager ./blackbox "$BACKUP_DIR/"
# Restart containers
docker compose up -d
echo "Backup completed: $BACKUP_DIR" Restore Script
Restore Script
#!/bin/bash
# restore-observability.sh
BACKUP_DIR="$1"
if [ -z "$BACKUP_DIR" ]; then
echo "Usage: $0 /path/to/backup/directory"
exit 1
fi
docker compose down
# Restore volumes
for volume in prometheus-data grafana-data loki-data tempo-data alertmanager-data; do
docker volume rm lgtm-stack_${volume} 2>/dev/null
docker volume create lgtm-stack_${volume}
docker run --rm \
  -v lgtm-stack_${volume}:/data \
  -v "$BACKUP_DIR":/backup \
  alpine sh -c "cd /data && tar xzf /backup/${volume}.tar.gz --strip-components=1"
done
# Restore configs
cp -r "$BACKUP_DIR"/{prometheus,loki,tempo,grafana,alertmanager,blackbox} ./
docker compose up -d
echo "Restore completed from: $BACKUP_DIR" Part 10: Troubleshooting
Common Issues
| Issue | Cause | Solution |
|---|---|---|
| Loki not receiving logs | Promtail not running | Check docker compose logs promtail |
| High memory usage | Too many log streams | Reduce cardinality, add filters |
| Tempo traces not appearing | OTLP not configured | Check application instrumentation |
| Grafana can't query Loki | Wrong data source URL | Use http://loki:3100 |
| Logs delayed | High ingestion rate | Increase ingestion_rate_mb |
Diagnostic Commands
# Check Loki health
curl http://localhost:3100/ready
curl http://localhost:3100/metrics
# Check Tempo health
curl http://localhost:3200/ready
# View Promtail targets (add a "9080:9080" port mapping to the promtail service first; the compose file does not publish it)
curl http://localhost:9080/targets
# Check Loki ingestion rate
curl "http://localhost:3100/loki/api/v1/query?query=sum(rate(loki_distributor_bytes_received_total[5m]))"
# Test log ingestion
curl -X POST -H "Content-Type: application/json" \
  http://localhost:3100/loki/api/v1/push \
  -d '{"streams":[{"stream":{"job":"test"},"values":[["'$(date +%s%N)'","test log entry"]]}]}'
Log Query in Grafana
- Go to Explore
- Select Loki data source
- Try {job="docker"} to see all Docker logs
Conclusion
You now have a complete understanding of the Grafana observability ecosystem:
- ✅ Loki for centralized log aggregation
- ✅ Promtail/Alloy for log collection
- ✅ Tempo for distributed tracing
- ✅ Mimir for long-term metrics storage
- ✅ LogQL for powerful log queries
- ✅ Home lab scenarios (Proxmox, Synology, TrueNAS, Unraid, Home Assistant)
- ✅ Correlation between logs, metrics, and traces
- ✅ Backup and disaster recovery
Complete Stack Overview
+-------------------------------------------------------------+
|                  COMPLETE LGTM OBSERVABILITY                |
+-------------------------------------------------------------+
|                                                             |
|   +-------------+    +-------------+    +-------------+     |
|   |   METRICS   |    |    LOGS     |    |   TRACES    |     |
|   | (Prometheus)|    |   (Loki)    |    |   (Tempo)   |     |
|   +------+------+    +------+------+    +------+------+     |
|          |                  |                  |            |
|          +------------------+------------------+            |
|                             |                               |
|                      +------v------+                        |
|                      |   GRAFANA   |                        |
|                      | (Visualize) |                        |
|                      +-------------+                        |
|                                                             |
+-------------------------------------------------------------+
Resources
- Grafana Loki Documentation
- Grafana Tempo Documentation
- Grafana Mimir Documentation
- Grafana Alloy Documentation
- LogQL Documentation
- OpenTelemetry
💡 Tip: For the basic Prometheus and Grafana setup, see our companion guide: "Grafana & Prometheus Complete Guide: The Ultimate Monitoring Stack"