Merge pull request #4 from efigueroa/claude/centralized-logging-011CUqEzDETA2BqAzYUcXtjt

feat: Add centralized logging stack with Loki, Promtail, and Grafana
Commit 25aea7dc34 by Eduardo Figueroa, 2025-11-08 17:17:53 -08:00, committed via GitHub (GPG key ID: B5690EEEBB952194; no known key found for this signature in database).
10 changed files with 1305 additions and 0 deletions


@@ -31,6 +31,11 @@ compose/
│   ├── radarr/       # Movie management
│   ├── sabnzbd/      # Usenet downloader
│   └── qbittorrent/  # Torrent client
├── monitoring/       # Monitoring & logging
│   └── logging/      # Centralized logging stack
│       ├── loki/     # Log aggregation (loki.fig.systems)
│       ├── promtail/ # Log collection agent
│       └── grafana/  # Log visualization (logs.fig.systems)
└── services/         # Utility services
    ├── homarr/       # Dashboard (home.fig.systems)
    ├── backrest/     # Backup manager (backup.fig.systems)
@@ -58,6 +63,10 @@ All services are accessible via:
| Traefik Dashboard | traefik.fig.systems | ✅ |
| LLDAP | lldap.fig.systems | ✅ |
| Tinyauth | auth.fig.systems | ❌ |
| **Monitoring** | | |
| Grafana (Logs) | logs.fig.systems | ❌* |
| Loki (API) | loki.fig.systems | ✅ |
| **Dashboard & Management** | | |
| Homarr | home.fig.systems | ✅ |
| Backrest | backup.fig.systems | ✅ |
| Jellyfin | flix.fig.systems | ❌* |
@@ -149,6 +158,9 @@ cd compose/services/linkwarden && docker compose up -d
cd compose/services/vikunja && docker compose up -d
cd compose/services/homarr && docker compose up -d
cd compose/services/backrest && docker compose up -d
# Monitoring (optional but recommended)
cd compose/monitoring/logging && docker compose up -d
cd compose/services/lubelogger && docker compose up -d
cd compose/services/calibre-web && docker compose up -d
cd compose/services/booklore && docker compose up -d


@@ -0,0 +1,28 @@
# Centralized Logging Configuration
# Timezone
TZ=America/Los_Angeles
# Grafana Admin Credentials
# Default username: admin
# Change this password immediately after first login!
# Example format: MyGr@f@n@P@ssw0rd!2024
GF_SECURITY_ADMIN_PASSWORD=changeme_please_set_secure_grafana_password
# Grafana Configuration
GF_SERVER_ROOT_URL=https://logs.fig.systems
GF_SERVER_DOMAIN=logs.fig.systems
# Disable Grafana analytics (optional)
GF_ANALYTICS_REPORTING_ENABLED=false
GF_ANALYTICS_CHECK_FOR_UPDATES=false
# Allow embedding (for Homarr dashboard integration)
GF_SECURITY_ALLOW_EMBEDDING=true
# Loki Configuration
# Retention period in days (default: 30 days)
LOKI_RETENTION_PERIOD=30d
# Promtail Configuration
# No additional configuration needed - configured via promtail-config.yaml

compose/monitoring/logging/.gitignore

@@ -0,0 +1,13 @@
# Loki data
loki-data/
# Grafana data
grafana-data/
# Keep provisioning and config files
!grafana-provisioning/
!loki-config.yaml
!promtail-config.yaml
# Keep .env.example if created
!.env.example


@@ -0,0 +1,527 @@
# Centralized Logging Stack
Grafana Loki + Promtail + Grafana for centralized Docker container log aggregation and visualization.
## Overview
This stack provides centralized logging for all Docker containers in your homelab:
- **Loki**: Log aggregation backend (like Prometheus but for logs)
- **Promtail**: Agent that collects logs from Docker containers
- **Grafana**: Web UI for querying and visualizing logs
### Why This Stack?
- ✅ **Lightweight**: Minimal resource usage compared to ELK stack
- ✅ **Docker-native**: Automatically discovers and collects logs from all containers
- ✅ **Powerful search**: LogQL query language for filtering and searching
- ✅ **Retention**: Configurable log retention (default: 30 days)
- ✅ **Labels**: Automatic labeling by container, image, compose project
- ✅ **Integrated**: Works seamlessly with existing homelab services
## Quick Start
### 1. Configure Environment
```bash
cd ~/homelab/compose/monitoring/logging
nano .env
```
**Update:**
```env
# Change this!
GF_SECURITY_ADMIN_PASSWORD=<your-strong-password>
```
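If you prefer to script this step, one option (a sketch, assuming GNU `sed` and the `GF_SECURITY_ADMIN_PASSWORD=` line from this stack's `.env`) is:

```shell
# Generate a random password and splice it into .env (GNU sed assumed).
pass=$(openssl rand -base64 20)
if [ -f .env ]; then
  sed -i "s|^GF_SECURITY_ADMIN_PASSWORD=.*|GF_SECURITY_ADMIN_PASSWORD=${pass}|" .env
fi
echo "Grafana admin password set to: ${pass}"
```

Keep a copy of the generated password in your password manager before deploying.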
### 2. Deploy the Stack
```bash
docker compose up -d
```
### 3. Access Grafana
Go to: **https://logs.fig.systems**
**Default credentials:**
- Username: `admin`
- Password: `<your GF_SECURITY_ADMIN_PASSWORD>`
**⚠️ Change the password immediately after first login!**
### 4. View Logs
1. Click "Explore" (compass icon) in left sidebar
2. Select "Loki" datasource (should be selected by default)
3. Start querying logs!
## Usage
### Basic Log Queries
**View all logs from a container:**
```logql
{container="jellyfin"}
```
**View logs from a compose project:**
```logql
{compose_project="media"}
```
**View logs from specific service:**
```logql
{compose_service="lldap"}
```
**Filter by log level:**
```logql
{container="immich_server"} |= "error"
```
**Exclude lines:**
```logql
{container="traefik"} != "404"
```
**Multiple filters:**
```logql
{container="jellyfin"} |= "error" != "404"
```
### Advanced Queries
**Count errors per minute:**
```logql
sum(count_over_time({container="jellyfin"} |= "error" [1m])) by (container)
```
**Rate of logs:**
```logql
rate({container="traefik"}[5m])
```
**Logs from last hour:**
Set Grafana's time picker to "Last 1 hour". LogQL does not embed relative time ranges; the range comes from the time picker (or from the `start`/`end` parameters when calling Loki's API directly).
**Filter by multiple containers:**
```logql
{container=~"jellyfin|immich.*|sonarr"}
```
**Extract and filter JSON:**
```logql
{container="linkwarden"} | json | level="error"
```
## Configuration
### Log Retention
Default: **30 days**
To change retention period:
**Edit `.env`:**
```env
LOKI_RETENTION_PERIOD=60d # Keep logs for 60 days
```
**Edit `loki-config.yaml`:**
```yaml
limits_config:
retention_period: 60d # Must match .env
table_manager:
retention_period: 60d # Must match above
```
**Restart:**
```bash
docker compose restart loki
```
### Adjust Resource Limits
**Edit `loki-config.yaml`:**
```yaml
limits_config:
ingestion_rate_mb: 10 # MB/sec per stream
ingestion_burst_size_mb: 20 # Burst size
```
### Add Custom Labels
**Edit `promtail-config.yaml`:**
```yaml
scrape_configs:
- job_name: docker
docker_sd_configs:
- host: unix:///var/run/docker.sock
relabel_configs:
# Add custom label
- source_labels: ['__meta_docker_container_label_environment']
target_label: 'environment'
```
## How It Works
### Architecture
```
Docker Containers
↓ (logs via Docker socket)
Promtail (scrapes and ships)
↓ (HTTP push)
Loki (stores and indexes)
↓ (LogQL queries)
Grafana (visualization)
```
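The "HTTP push" hop is plain JSON over HTTP. A sketch of the payload shape Promtail sends (the commented `curl` assumes Loki reachable at `localhost:3100`; the `container=demo` label and log line are made up for illustration):

```shell
# Build a minimal Loki push payload: one stream, one log line.
ts="$(date +%s)000000000"   # Loki expects nanosecond-precision timestamps
payload=$(printf '{"streams":[{"stream":{"container":"demo"},"values":[["%s","hello from the push API"]]}]}' "$ts")
echo "$payload"
# To actually send it (requires a running Loki):
# curl -s -H "Content-Type: application/json" -X POST \
#   --data "$payload" http://localhost:3100/loki/api/v1/push
```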
### Log Collection
Promtail automatically collects logs from:
1. **All Docker containers** via Docker socket
2. **System logs** from `/var/log`
Logs are labeled with:
- `container`: Container name
- `image`: Docker image
- `compose_project`: Docker Compose project name
- `compose_service`: Service name from compose.yaml
- `stream`: stdout or stderr
### Storage
Logs are stored in:
- **Location**: `./loki-data/`
- **Format**: Compressed chunks
- **Index**: BoltDB
- **Retention**: Automatic cleanup after retention period
## Integration with Services
### Option 1: Automatic (Default)
Promtail automatically discovers all containers. No changes needed!
### Option 2: Explicit Labels (Recommended)
Add labels to services for better organization:
**Edit any service's `compose.yaml`:**
```yaml
services:
servicename:
# ... existing config ...
labels:
# ... existing labels ...
# Add logging labels
logging: "promtail"
log_level: "info"
environment: "production"
```
These labels will be available in Loki for filtering.
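With those labels in place, the extra labels can be used directly in queries, for example:

```logql
{logging="promtail", environment="production"}
```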
### Option 3: Send Logs Directly to Loki
Instead of Promtail scraping, send logs directly:
**Edit service `compose.yaml`:**
```yaml
services:
servicename:
# ... existing config ...
logging:
driver: loki
options:
loki-url: "http://loki:3100/loki/api/v1/push"
loki-external-labels: "container={{.Name}},compose_project={{index .Config.Labels \"com.docker.compose.project\"}}"
```
**Note**: This requires installing the Loki Docker logging driver plugin on the host; the Promtail approach above is simpler and needs no plugin.
## Grafana Dashboards
### Built-in Explore
Best way to start - use Grafana's Explore view:
1. Click "Explore" icon (compass)
2. Select "Loki" datasource
3. Use builder to create queries
4. Save interesting queries
### Pre-built Dashboards
You can import community dashboards:
1. Go to Dashboards → Import
2. Use dashboard ID: `13639` (Docker logs dashboard)
3. Select "Loki" as datasource
4. Import
### Create Custom Dashboard
1. Click "+" → "Dashboard"
2. Add panel
3. Select Loki datasource
4. Build query using LogQL
5. Save dashboard
**Example panels:**
- Error count by container
- Log volume over time
- Top 10 logging containers
- Recent errors table
## Alerting
### Create Log-Based Alerts
1. Go to Alerting → Alert rules
2. Create new alert rule
3. Query: `sum(count_over_time({container="jellyfin"} |= "error" [5m])) > 10`
4. Set thresholds and notification channels
5. Save
**Example alerts:**
- Too many errors in container
- Container restarted
- Disk space warnings
- Failed authentication attempts
## Troubleshooting
### Promtail Not Collecting Logs
**Check Promtail is running:**
```bash
docker logs promtail
```
**Verify Docker socket access:**
```bash
docker exec promtail ls -la /var/run/docker.sock
```
**Test Promtail config:**
```bash
docker exec promtail promtail -config.file=/etc/promtail/config.yaml -dry-run
```
### Loki Not Receiving Logs
**Check Loki health:**
```bash
curl http://localhost:3100/ready
```
**View Loki logs:**
```bash
docker logs loki
```
**Check Promtail is pushing:**
```bash
docker logs promtail | grep -i push
```
### Grafana Can't Connect to Loki
**Test Loki from Grafana container:**
```bash
docker exec grafana wget -O- http://loki:3100/ready
```
**Check datasource configuration:**
- Grafana → Configuration → Data sources → Loki
- URL should be: `http://loki:3100`
### No Logs Appearing
**Wait a few minutes** - logs take time to appear
**Check retention:**
```bash
# Logs older than retention period are deleted
grep retention_period loki-config.yaml
```
**Verify time range in Grafana:**
- Make sure selected time range includes recent logs
- Try "Last 5 minutes"
### High Disk Usage
**Check Loki data size:**
```bash
du -sh ./loki-data
```
**Reduce retention:**
```env
LOKI_RETENTION_PERIOD=7d # Shorter retention
```
**Manual cleanup:**
```bash
# Stop Loki
docker compose stop loki
# Remove old data (CAREFUL!)
rm -rf ./loki-data/chunks/*
# Restart
docker compose start loki
```
## Performance Tuning
### For Low Resources (< 8GB RAM)
**Edit `loki-config.yaml`:**
```yaml
limits_config:
retention_period: 7d # Shorter retention
ingestion_rate_mb: 5 # Lower rate
ingestion_burst_size_mb: 10 # Lower burst
query_range:
results_cache:
cache:
embedded_cache:
max_size_mb: 50 # Smaller cache
```
### For High Volume
**Edit `loki-config.yaml`:**
```yaml
limits_config:
ingestion_rate_mb: 20 # Higher rate
ingestion_burst_size_mb: 40 # Higher burst
query_range:
results_cache:
cache:
embedded_cache:
max_size_mb: 200 # Larger cache
```
## Best Practices
### Log Levels
Configure services to log appropriately:
- **Production**: `info` or `warning`
- **Development**: `debug`
- **Troubleshooting**: `trace`
Too much logging = higher resource usage!
### Retention Strategy
- **Critical services**: 60+ days
- **Normal services**: 30 days
- **High volume services**: 7-14 days
### Query Optimization
- **Use specific labels**: `{container="name"}` not `{container=~".*"}`
- **Limit time range**: Query hours not days when possible
- **Use filters early**: `|= "error"` before parsing
- **Avoid regex when possible**: `|= "string"` faster than `|~ "reg.*ex"`
### Storage Management
Monitor disk usage:
```bash
# Check regularly
du -sh compose/monitoring/logging/loki-data
# Set up alerts when > 80% disk usage
```
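A tiny helper along those lines (a sketch; the path and the 5 GB budget are assumptions to adjust for your setup):

```shell
# Warn when the Loki data directory exceeds a size budget (in MB).
check_loki_disk() {
  dir="$1"; limit_mb="$2"
  used_mb=$(du -sm "$dir" 2>/dev/null | cut -f1)
  used_mb=${used_mb:-0}   # missing directory counts as 0 MB
  if [ "$used_mb" -gt "$limit_mb" ]; then
    echo "WARN: ${dir} at ${used_mb}MB exceeds ${limit_mb}MB budget"
  else
    echo "OK: ${dir} at ${used_mb}MB (budget ${limit_mb}MB)"
  fi
}

# Example: a 5 GB budget for this stack's data directory.
check_loki_disk compose/monitoring/logging/loki-data 5120
```

Run it from cron and alert on `WARN` lines, or fold the same check into an existing monitoring job.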
## Integration with Homarr
Grafana will automatically appear in Homarr dashboard. You can also:
### Add Grafana Widget to Homarr
1. Edit Homarr dashboard
2. Add "iFrame" widget
3. URL: `https://logs.fig.systems/d/<dashboard-id>`
4. This embeds Grafana dashboards in Homarr
## Backup and Restore
### Backup
```bash
# Backup Loki data
tar czf loki-backup-$(date +%Y%m%d).tar.gz ./loki-data
# Backup Grafana dashboards and datasources
tar czf grafana-backup-$(date +%Y%m%d).tar.gz ./grafana-data ./grafana-provisioning
```
### Restore
```bash
# Restore Loki
docker compose down
tar xzf loki-backup-YYYYMMDD.tar.gz
docker compose up -d
# Restore Grafana
docker compose down
tar xzf grafana-backup-YYYYMMDD.tar.gz
docker compose up -d
```
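Before restoring over live data, it can be worth listing the archive to confirm it contains the expected top-level directory (a sketch; `loki-backup-YYYYMMDD.tar.gz` is the placeholder name from the backup step):

```shell
# Sanity-check a backup archive before restoring.
backup="loki-backup-YYYYMMDD.tar.gz"   # substitute the real date
if [ -f "$backup" ]; then
  tar tzf "$backup" | head
  tar tzf "$backup" | grep -q 'loki-data/' && echo "archive contains loki-data/"
fi
```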
## Updating
```bash
cd ~/homelab/compose/monitoring/logging
# Pull latest images
docker compose pull
# Restart with new images
docker compose up -d
```
## Resource Usage
**Typical usage:**
- **Loki**: 200-500MB RAM
- **Promtail**: 50-100MB RAM
- **Grafana**: 100-200MB RAM
- **Disk**: ~1-5GB per week (depends on log volume)
## Next Steps
1. ✅ Deploy the stack
2. ✅ Login to Grafana and explore logs
3. ✅ Create useful dashboards
4. ✅ Set up alerts for errors
5. ✅ Configure retention based on needs
6. ⬜ Add Prometheus for metrics (future)
7. ⬜ Add Tempo for distributed tracing (future)
## Resources
- [Loki Documentation](https://grafana.com/docs/loki/latest/)
- [LogQL Query Language](https://grafana.com/docs/loki/latest/logql/)
- [Promtail Configuration](https://grafana.com/docs/loki/latest/clients/promtail/configuration/)
- [Grafana Tutorials](https://grafana.com/tutorials/)
---
**Now you can see logs from all containers in one place!** 🎉


@@ -0,0 +1,123 @@
# Centralized Logging Stack - Loki + Promtail + Grafana
# Docs: https://grafana.com/docs/loki/latest/
services:
loki:
container_name: loki
image: grafana/loki:2.9.3
restart: unless-stopped
env_file:
- .env
volumes:
- ./loki-config.yaml:/etc/loki/local-config.yaml:ro
- ./loki-data:/loki
command: -config.file=/etc/loki/local-config.yaml
networks:
- homelab
- logging_internal
labels:
# Traefik (for API access)
traefik.enable: true
traefik.docker.network: homelab
# Loki API
traefik.http.routers.loki.rule: Host(`loki.fig.systems`) || Host(`loki.edfig.dev`)
traefik.http.routers.loki.entrypoints: websecure
traefik.http.routers.loki.tls.certresolver: letsencrypt
traefik.http.services.loki.loadbalancer.server.port: 3100
# SSO Protection
traefik.http.routers.loki.middlewares: tinyauth
# Homarr Discovery
homarr.name: Loki (Logs)
homarr.group: Monitoring
homarr.icon: mdi:math-log
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
promtail:
container_name: promtail
image: grafana/promtail:2.9.3
restart: unless-stopped
env_file:
- .env
volumes:
- ./promtail-config.yaml:/etc/promtail/config.yaml:ro
- /var/log:/var/log:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
command: -config.file=/etc/promtail/config.yaml
networks:
- logging_internal
depends_on:
loki:
condition: service_healthy
grafana:
container_name: grafana
image: grafana/grafana:10.2.3
restart: unless-stopped
env_file:
- .env
volumes:
- ./grafana-data:/var/lib/grafana
- ./grafana-provisioning:/etc/grafana/provisioning
networks:
- homelab
- logging_internal
depends_on:
loki:
condition: service_healthy
labels:
# Traefik
traefik.enable: true
traefik.docker.network: homelab
# Grafana Web UI
traefik.http.routers.grafana.rule: Host(`logs.fig.systems`) || Host(`logs.edfig.dev`)
traefik.http.routers.grafana.entrypoints: websecure
traefik.http.routers.grafana.tls.certresolver: letsencrypt
traefik.http.services.grafana.loadbalancer.server.port: 3000
# SSO Protection (optional - Grafana has its own auth)
# traefik.http.routers.grafana.middlewares: tinyauth
# Homarr Discovery
homarr.name: Grafana (Logs Dashboard)
homarr.group: Monitoring
homarr.icon: mdi:chart-line
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/health || exit 1"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
networks:
homelab:
external: true
logging_internal:
name: logging_internal
driver: bridge


@@ -0,0 +1,13 @@
apiVersion: 1
providers:
- name: 'Loki Dashboards'
orgId: 1
folder: 'Loki'
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: /etc/grafana/provisioning/dashboards
foldersFromFilesStructure: true


@@ -0,0 +1,17 @@
apiVersion: 1
datasources:
- name: Loki
type: loki
access: proxy
url: http://loki:3100
isDefault: true
editable: true
jsonData:
maxLines: 1000
derivedFields:
# Extract traceID from logs for distributed tracing (optional)
- datasourceUid: tempo
matcherRegex: "traceID=(\\w+)"
name: TraceID
url: "$${__value.raw}"


@@ -0,0 +1,57 @@
auth_enabled: false
server:
http_listen_port: 3100
grpc_listen_port: 9096
common:
instance_addr: 127.0.0.1
path_prefix: /loki
storage:
filesystem:
chunks_directory: /loki/chunks
rules_directory: /loki/rules
replication_factor: 1
ring:
kvstore:
store: inmemory
query_range:
results_cache:
cache:
embedded_cache:
enabled: true
max_size_mb: 100
schema_config:
configs:
- from: 2020-10-24
store: boltdb-shipper
object_store: filesystem
schema: v11
index:
prefix: index_
period: 24h
ruler:
alertmanager_url: http://localhost:9093
# Retention - keeps logs for 30 days
limits_config:
retention_period: 30d
ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
# Cleanup old logs
compactor:
working_directory: /loki/compactor
shared_store: filesystem
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
# Table manager for retention
table_manager:
retention_deletes_enabled: true
retention_period: 30d


@@ -0,0 +1,70 @@
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: http://loki:3100/loki/api/v1/push
scrape_configs:
# Docker containers logs
- job_name: docker
docker_sd_configs:
- host: unix:///var/run/docker.sock
refresh_interval: 5s
filters:
- name: label
values: ["logging=promtail"]
relabel_configs:
# Use container name as job
- source_labels: ['__meta_docker_container_name']
regex: '/(.*)'
target_label: 'container'
# Use image name
- source_labels: ['__meta_docker_container_image']
target_label: 'image'
# Use container ID
- source_labels: ['__meta_docker_container_id']
target_label: 'container_id'
# Add all docker labels as labels
- action: labelmap
regex: __meta_docker_container_label_(.+)
# All Docker containers (fallback)
- job_name: docker_all
docker_sd_configs:
- host: unix:///var/run/docker.sock
refresh_interval: 5s
relabel_configs:
- source_labels: ['__meta_docker_container_name']
regex: '/(.*)'
target_label: 'container'
- source_labels: ['__meta_docker_container_image']
target_label: 'image'
- source_labels: ['__meta_docker_container_log_stream']
target_label: 'stream'
# Extract compose project and service
- source_labels: ['__meta_docker_container_label_com_docker_compose_project']
target_label: 'compose_project'
- source_labels: ['__meta_docker_container_label_com_docker_compose_service']
target_label: 'compose_service'
# System logs
- job_name: system
static_configs:
- targets:
- localhost
labels:
job: varlogs
__path__: /var/log/*log


@@ -0,0 +1,445 @@
# Centralized Logging with Loki
Guide for setting up and using the centralized logging stack (Loki + Promtail + Grafana).
## Overview
The logging stack provides centralized log aggregation and visualization for all Docker containers:
- **Loki**: Log aggregation backend (stores and indexes logs)
- **Promtail**: Agent that collects logs from Docker containers
- **Grafana**: Web UI for querying and visualizing logs
### Why Centralized Logging?
**Problems without it:**
- Logs scattered across many containers
- Hard to correlate events across services
- Logs lost when containers restart
- No easy way to search historical logs
**Benefits:**
- ✅ Single place to view all logs
- ✅ Powerful search and filtering (LogQL)
- ✅ Persist logs even after container restarts
- ✅ Correlate events across services
- ✅ Create dashboards and alerts
- ✅ Configurable retention (30 days default)
## Quick Setup
### 1. Configure Grafana Password
```bash
cd ~/homelab/compose/monitoring/logging
nano .env
```
**Update:**
```env
GF_SECURITY_ADMIN_PASSWORD=<your-strong-password>
```
**Generate password:**
```bash
openssl rand -base64 20
```
### 2. Deploy
```bash
cd ~/homelab/compose/monitoring/logging
docker compose up -d
```
### 3. Access Grafana
Go to: **https://logs.fig.systems**
**Login:**
- Username: `admin`
- Password: `<your GF_SECURITY_ADMIN_PASSWORD>`
### 4. Start Exploring Logs
1. Click **Explore** (compass icon) in left sidebar
2. Loki datasource should be selected
3. Start querying!
## Basic Usage
### View Logs from a Container
```logql
{container="jellyfin"}
```
### View Last Hour's Logs
Set Grafana's time picker to "Last 1 hour" and use a plain stream selector; LogQL queries do not embed relative time ranges themselves:
```logql
{container="immich_server"}
```
### Filter for Errors
```logql
{container="traefik"} |= "error"
```
### Exclude Lines
```logql
{container="traefik"} != "404"
```
### Multiple Containers
```logql
{container=~"jellyfin|immich.*"}
```
### By Compose Project
```logql
{compose_project="media"}
```
## Advanced Queries
### Count Errors
```logql
sum(count_over_time({container="jellyfin"} |= "error" [5m]))
```
### Error Rate
```logql
rate({container="traefik"} |= "error" [5m])
```
### Parse JSON Logs
```logql
{container="linkwarden"} | json | level="error"
```
### Top 10 Containers by Error Count
```logql
topk(10,
  sum by (container) (
    count_over_time({job="docker"} |= "error" [24h])
  )
)
```
## Creating Dashboards
### Import Pre-built Dashboard
1. Go to **Dashboards** → **Import**
2. Dashboard ID: **13639** (Docker logs)
3. Select **Loki** as datasource
4. Click **Import**
### Create Custom Dashboard
1. Click **+** → **Dashboard**
2. **Add panel**
3. Select **Loki** datasource
4. Build query
5. Choose visualization (logs, graph, table, etc.)
6. **Save**
**Example panels:**
- Error count by container
- Log volume over time
- Recent errors (table)
- Top logging containers
## Setting Up Alerts
### Create Alert Rule
1. **Alerting** → **Alert rules** → **New alert rule**
2. **Query:**
```logql
sum(count_over_time({container="jellyfin"} |= "error" [5m])) > 10
```
3. **Condition**: Alert when > 10 errors in 5 minutes
4. **Configure** notification channel (email, webhook, etc.)
5. **Save**
**Example alerts:**
- Too many errors in service
- Service stopped logging (might have crashed)
- Authentication failures
- Disk space warnings
## Configuration
### Change Log Retention
**Default: 30 days**
Edit `.env`:
```env
LOKI_RETENTION_PERIOD=60d # 60 days
```
Edit `loki-config.yaml`:
```yaml
limits_config:
retention_period: 60d
table_manager:
retention_period: 60d
```
Restart:
```bash
docker compose restart loki
```
### Adjust Resource Limits
For low-resource systems, edit `loki-config.yaml`:
```yaml
limits_config:
retention_period: 7d # Shorter retention
ingestion_rate_mb: 5 # Lower rate
query_range:
results_cache:
cache:
embedded_cache:
max_size_mb: 50 # Smaller cache
```
### Add Labels to Services
Make services easier to find by adding labels:
**Edit service `compose.yaml`:**
```yaml
services:
myservice:
labels:
logging: "promtail"
environment: "production"
tier: "frontend"
```
Query with these labels:
```logql
{environment="production", tier="frontend"}
```
## Troubleshooting
### No Logs Appearing
**Wait a few minutes** - initial log collection takes time
**Check Promtail:**
```bash
docker logs promtail
```
**Check Loki:**
```bash
docker logs loki
```
**Verify Promtail can reach Loki:**
```bash
docker exec promtail wget -O- http://loki:3100/ready
```
### Grafana Can't Connect to Loki
**Test from Grafana:**
```bash
docker exec grafana wget -O- http://loki:3100/ready
```
**Check datasource:** Grafana → Configuration → Data sources → Loki
- URL should be: `http://loki:3100`
### High Disk Usage
**Check size:**
```bash
du -sh compose/monitoring/logging/loki-data
```
**Reduce retention:**
```env
LOKI_RETENTION_PERIOD=7d
```
**Manual cleanup (CAREFUL):**
```bash
docker compose stop loki
rm -rf loki-data/chunks/*
docker compose start loki
```
### Slow Queries
**Optimize queries:**
- Use specific labels: `{container="name"}` not `{container=~".*"}`
- Limit time range: Hours not days
- Filter early: `|= "error"` before parsing
- Avoid complex regex
## Best Practices
### Log Verbosity
Configure appropriate log levels per environment:
- **Production**: `info` or `warning`
- **Debugging**: `debug` or `trace`
Too verbose = wasted resources!
### Retention Strategy
Match retention to importance:
- **Critical services**: 60-90 days
- **Normal services**: 30 days
- **High-volume services**: 7-14 days
### Useful Queries to Save
Create saved queries for common tasks:
**Recent errors** (with the time picker set to "Last 15 minutes"):
```logql
{job="docker"} |= "error"
```
**Service health check:**
```logql
{container="traefik"} |= "request"
```
**Failed logins:**
```logql
{container="lldap"} |= "failed" |= "login"
```
## Integration Tips
### Embed in Homarr
Add Grafana dashboards to Homarr:
1. Edit Homarr dashboard
2. Add **iFrame widget**
3. URL: `https://logs.fig.systems/d/<dashboard-id>`
### Use with Backups
Include logging data in backups:
```bash
cd ~/homelab/compose/monitoring/logging
tar czf logging-backup-$(date +%Y%m%d).tar.gz loki-data/ grafana-data/
```
### Combine with Metrics
Later you can add Prometheus for metrics:
- Loki for logs
- Prometheus for metrics (CPU, RAM, disk)
- Both in Grafana dashboards
## Common LogQL Patterns
### Filter by Time
Time ranges are not part of the LogQL expression. Set them with Grafana's time picker (for example "Last 5 minutes"), or with the `start` and `end` query parameters when calling Loki's HTTP API directly.
### Pattern Matching
```logql
# Contains
{container="name"} |= "error"
# Does not contain
{container="name"} != "404"
# Regex match
{container="name"} |~ "error|fail|critical"
# Regex does not match
{container="name"} !~ "debug|trace"
```
### Aggregations
```logql
# Count
count_over_time({container="name"}[5m])
# Rate
rate({container="name"}[5m])
# Sum
sum(count_over_time({job="docker"}[1h])) by (container)
# Average
avg_over_time({container="name"} | unwrap bytes [5m])
```
### JSON Parsing
```logql
# Parse JSON and filter
{container="name"} | json | level="error"
# Extract field
{container="name"} | json | line_format "{{.message}}"
# Filter on a JSON field
{container="name"} | json | status_code="500"
```
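The same LogQL works outside Grafana, against Loki's HTTP API (a sketch; the commented `curl` assumes Loki reachable at `localhost:3100`):

```shell
# A LogQL query to run against Loki's query_range endpoint.
query='{container="traefik"} |= "error"'
# curl -G URL-encodes the query for us (requires a running Loki):
# curl -s -G http://localhost:3100/loki/api/v1/query_range \
#   --data-urlencode "query=${query}" --data-urlencode "limit=50"
echo "would query: ${query}"
```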
## Resource Usage
**Typical usage:**
- **Loki**: 200-500MB RAM, 1-5GB disk/week
- **Promtail**: 50-100MB RAM
- **Grafana**: 100-200MB RAM, ~100MB disk
- **Total**: ~400-700MB RAM
Estimates assume roughly 20 containers with moderate log volume.
## Next Steps
1. ✅ Explore your logs in Grafana
2. ✅ Create useful dashboards
3. ✅ Set up alerts for critical errors
4. ⬜ Add Prometheus for metrics (future)
5. ⬜ Add Tempo for distributed tracing (future)
6. ⬜ Create log-based SLA tracking
## Resources
- [Loki Documentation](https://grafana.com/docs/loki/latest/)
- [LogQL Reference](https://grafana.com/docs/loki/latest/logql/)
- [Grafana Dashboards](https://grafana.com/grafana/dashboards/)
- [Community Dashboards](https://grafana.com/grafana/dashboards/?search=loki)
---
**Now debug issues 10x faster with centralized logs!** 🔍