diff --git a/README.md b/README.md index 7b1d002..a3c5de5 100644 --- a/README.md +++ b/README.md @@ -31,6 +31,11 @@ compose/ │ ├── radarr/ # Movie management │ ├── sabnzbd/ # Usenet downloader │ └── qbittorrent/# Torrent client +├── monitoring/ # Monitoring & logging +│ └── logging/ # Centralized logging stack +│ ├── loki/ # Log aggregation (loki.fig.systems) +│ ├── promtail/ # Log collection agent +│ └── grafana/ # Log visualization (logs.fig.systems) └── services/ # Utility services ├── homarr/ # Dashboard (home.fig.systems) ├── backrest/ # Backup manager (backup.fig.systems) @@ -58,6 +63,10 @@ All services are accessible via: | Traefik Dashboard | traefik.fig.systems | ✅ | | LLDAP | lldap.fig.systems | ✅ | | Tinyauth | auth.fig.systems | ❌ | +| **Monitoring** | | | +| Grafana (Logs) | logs.fig.systems | ❌* | +| Loki (API) | loki.fig.systems | ✅ | +| **Dashboard & Management** | | | | Homarr | home.fig.systems | ✅ | | Backrest | backup.fig.systems | ✅ | | Jellyfin | flix.fig.systems | ❌* | @@ -149,6 +158,9 @@ cd compose/services/linkwarden && docker compose up -d cd compose/services/vikunja && docker compose up -d cd compose/services/homarr && docker compose up -d cd compose/services/backrest && docker compose up -d + +# Monitoring (optional but recommended) +cd compose/monitoring/logging && docker compose up -d cd compose/services/lubelogger && docker compose up -d cd compose/services/calibre-web && docker compose up -d cd compose/services/booklore && docker compose up -d diff --git a/compose/monitoring/logging/.env b/compose/monitoring/logging/.env new file mode 100644 index 0000000..ae11f01 --- /dev/null +++ b/compose/monitoring/logging/.env @@ -0,0 +1,28 @@ +# Centralized Logging Configuration + +# Timezone +TZ=America/Los_Angeles + +# Grafana Admin Credentials +# Default username: admin +# Change this password immediately after first login! 
+# Example format: MyGr@f@n@P@ssw0rd!2024 +GF_SECURITY_ADMIN_PASSWORD=changeme_please_set_secure_grafana_password + +# Grafana Configuration +GF_SERVER_ROOT_URL=https://logs.fig.systems +GF_SERVER_DOMAIN=logs.fig.systems + +# Disable Grafana analytics (optional) +GF_ANALYTICS_REPORTING_ENABLED=false +GF_ANALYTICS_CHECK_FOR_UPDATES=false + +# Allow embedding (for Homarr dashboard integration) +GF_SECURITY_ALLOW_EMBEDDING=true + +# Loki Configuration +# Retention period in days (default: 30 days) +LOKI_RETENTION_PERIOD=30d + +# Promtail Configuration +# No additional configuration needed - configured via promtail-config.yaml diff --git a/compose/monitoring/logging/.gitignore b/compose/monitoring/logging/.gitignore new file mode 100644 index 0000000..38bf7ec --- /dev/null +++ b/compose/monitoring/logging/.gitignore @@ -0,0 +1,13 @@ +# Loki data +loki-data/ + +# Grafana data +grafana-data/ + +# Keep provisioning and config files +!grafana-provisioning/ +!loki-config.yaml +!promtail-config.yaml + +# Keep .env.example if created +!.env.example diff --git a/compose/monitoring/logging/README.md b/compose/monitoring/logging/README.md new file mode 100644 index 0000000..2c29d61 --- /dev/null +++ b/compose/monitoring/logging/README.md @@ -0,0 +1,527 @@ +# Centralized Logging Stack + +Grafana Loki + Promtail + Grafana for centralized Docker container log aggregation and visualization. + +## Overview + +This stack provides centralized logging for all Docker containers in your homelab: + +- **Loki**: Log aggregation backend (like Prometheus but for logs) +- **Promtail**: Agent that collects logs from Docker containers +- **Grafana**: Web UI for querying and visualizing logs + +### Why This Stack? 
+ +- ✅ **Lightweight**: Minimal resource usage compared to ELK stack +- ✅ **Docker-native**: Automatically discovers and collects logs from all containers +- ✅ **Powerful search**: LogQL query language for filtering and searching +- ✅ **Retention**: Configurable log retention (default: 30 days) +- ✅ **Labels**: Automatic labeling by container, image, compose project +- ✅ **Integrated**: Works seamlessly with existing homelab services + +## Quick Start + +### 1. Configure Environment + +```bash +cd ~/homelab/compose/monitoring/logging +nano .env +``` + +**Update:** +```env +# Change this! +GF_SECURITY_ADMIN_PASSWORD= +``` + +### 2. Deploy the Stack + +```bash +docker compose up -d +``` + +### 3. Access Grafana + +Go to: **https://logs.fig.systems** + +**Default credentials:** +- Username: `admin` +- Password: `` + +**⚠️ Change the password immediately after first login!** + +### 4. View Logs + +1. Click "Explore" (compass icon) in left sidebar +2. Select "Loki" datasource (should be selected by default) +3. Start querying logs! 
+
+## Usage
+
+### Basic Log Queries
+
+**View all logs from a container:**
+```logql
+{container="jellyfin"}
+```
+
+**View logs from a compose project:**
+```logql
+{compose_project="media"}
+```
+
+**View logs from a specific service:**
+```logql
+{compose_service="lldap"}
+```
+
+**Filter for lines containing "error"** (a substring match, not a parsed log level):
+```logql
+{container="immich_server"} |= "error"
+```
+
+**Exclude lines containing "404":**
+```logql
+{container="traefik"} != "404"
+```
+
+**Multiple filters:**
+```logql
+{container="jellyfin"} |= "error" != "404"
+```
+
+### Advanced Queries
+
+**Count errors per minute:**
+```logql
+sum(count_over_time({container="jellyfin"} |= "error" [1m])) by (container)
+```
+
+**Rate of logs:**
+```logql
+rate({container="traefik"}[5m])
+```
+
+**Logs from the last hour** (set Grafana's time picker to "Last 1 hour"; LogQL has no time filter):
+```logql
+{container="immich_server"}
+```
+
+**Filter by multiple containers:**
+```logql
+{container=~"jellyfin|immich.*|sonarr"}
+```
+
+**Extract and filter JSON:**
+```logql
+{container="linkwarden"} | json | level="error"
+```
+
+## Configuration
+
+### Log Retention
+
+Default: **30 days**
+
+To change retention period:
+
+**Edit `.env`:**
+```env
+LOKI_RETENTION_PERIOD=60d # Keep logs for 60 days
+```
+
+**Edit `loki-config.yaml`:**
+```yaml
+limits_config:
+  retention_period: 60d # Loki reads this value; keep .env in sync
+
+table_manager:
+  retention_period: 60d # Must match above
+```
+
+**Restart:**
+```bash
+docker compose restart loki
+```
+
+### Adjust Resource Limits
+
+**Edit `loki-config.yaml`:**
+```yaml
+limits_config:
+  ingestion_rate_mb: 10 # MB/sec, per tenant
+  ingestion_burst_size_mb: 20 # Burst size
+```
+
+### Add Custom Labels
+
+**Edit `promtail-config.yaml`:**
+```yaml
+scrape_configs:
+  - job_name: docker
+    docker_sd_configs:
+      - host: unix:///var/run/docker.sock
+
+    relabel_configs:
+      # Add custom label
+      - source_labels: ['__meta_docker_container_label_environment']
+        target_label: 'environment'
+```
+
+## How It Works
+
+### Architecture
+
+```
+Docker Containers
+ ↓ (logs via
Docker socket) +Promtail (scrapes and ships) + ↓ (HTTP push) +Loki (stores and indexes) + ↓ (LogQL queries) +Grafana (visualization) +``` + +### Log Collection + +Promtail automatically collects logs from: +1. **All Docker containers** via Docker socket +2. **System logs** from `/var/log` + +Logs are labeled with: +- `container`: Container name +- `image`: Docker image +- `compose_project`: Docker Compose project name +- `compose_service`: Service name from compose.yaml +- `stream`: stdout or stderr + +### Storage + +Logs are stored in: +- **Location**: `./loki-data/` +- **Format**: Compressed chunks +- **Index**: BoltDB +- **Retention**: Automatic cleanup after retention period + +## Integration with Services + +### Option 1: Automatic (Default) + +Promtail automatically discovers all containers. No changes needed! + +### Option 2: Explicit Labels (Recommended) + +Add labels to services for better organization: + +**Edit any service's `compose.yaml`:** +```yaml +services: + servicename: + # ... existing config ... + labels: + # ... existing labels ... + + # Add logging labels + logging: "promtail" + log_level: "info" + environment: "production" +``` + +These labels will be available in Loki for filtering. + +### Option 3: Send Logs Directly to Loki + +Instead of Promtail scraping, send logs directly: + +**Edit service `compose.yaml`:** +```yaml +services: + servicename: + # ... existing config ... + logging: + driver: loki + options: + loki-url: "http://loki:3100/loki/api/v1/push" + loki-external-labels: "container={{.Name}},compose_project={{.Config.Labels[\"com.docker.compose.project\"]}}" +``` + +**Note**: This requires the Loki Docker driver plugin (not recommended for simplicity). + +## Grafana Dashboards + +### Built-in Explore + +Best way to start - use Grafana's Explore view: +1. Click "Explore" icon (compass) +2. Select "Loki" datasource +3. Use builder to create queries +4. 
Save interesting queries
+
+### Pre-built Dashboards
+
+You can import community dashboards:
+
+1. Go to Dashboards → Import
+2. Use dashboard ID: `13639` (Docker logs dashboard)
+3. Select "Loki" as datasource
+4. Import
+
+### Create Custom Dashboard
+
+1. Click "+" → "Dashboard"
+2. Add panel
+3. Select Loki datasource
+4. Build query using LogQL
+5. Save dashboard
+
+**Example panels:**
+- Error count by container
+- Log volume over time
+- Top 10 logging containers
+- Recent errors table
+
+## Alerting
+
+### Create Log-Based Alerts
+
+1. Go to Alerting → Alert rules
+2. Create new alert rule
+3. Query: `sum(count_over_time({container="jellyfin"} |= "error" [5m])) > 10`
+4. Set thresholds and notification channels
+5. Save
+
+**Example alerts:**
+- Too many errors in container
+- Container restarted
+- Disk space warnings
+- Failed authentication attempts
+
+## Troubleshooting
+
+### Promtail Not Collecting Logs
+
+**Check Promtail is running:**
+```bash
+docker logs promtail
+```
+
+**Verify Docker socket access:**
+```bash
+docker exec promtail ls -la /var/run/docker.sock
+```
+
+**Test Promtail config:**
+```bash
+docker exec promtail promtail -config.file=/etc/promtail/config.yaml -dry-run
+```
+
+### Loki Not Receiving Logs
+
+**Check Loki health** (`compose.yaml` does not publish port 3100 to the host, so run the check inside the container):
+```bash
+docker exec loki wget -qO- http://localhost:3100/ready
+```
+
+**View Loki logs:**
+```bash
+docker logs loki
+```
+
+**Check Promtail is pushing** (Promtail logs to stderr, so redirect it for `grep`):
+```bash
+docker logs promtail 2>&1 | grep -i push
+```
+
+### Grafana Can't Connect to Loki
+
+**Test Loki from Grafana container:**
+```bash
+docker exec grafana wget -O- http://loki:3100/ready
+```
+
+**Check datasource configuration:**
+- Grafana → Connections → Data sources → Loki
+- URL should be: `http://loki:3100`
+
+### No Logs Appearing
+
+**Wait a few minutes** - logs take time to appear
+
+**Check retention:**
+```bash
+# Logs older than retention period are deleted
+grep retention_period loki-config.yaml
+```
+
+**Verify time range in Grafana:**
+- Make sure
selected time range includes recent logs +- Try "Last 5 minutes" + +### High Disk Usage + +**Check Loki data size:** +```bash +du -sh ./loki-data +``` + +**Reduce retention:** +```env +LOKI_RETENTION_PERIOD=7d # Shorter retention +``` + +**Manual cleanup:** +```bash +# Stop Loki +docker compose stop loki + +# Remove old data (CAREFUL!) +rm -rf ./loki-data/chunks/* + +# Restart +docker compose start loki +``` + +## Performance Tuning + +### For Low Resources (< 8GB RAM) + +**Edit `loki-config.yaml`:** +```yaml +limits_config: + retention_period: 7d # Shorter retention + ingestion_rate_mb: 5 # Lower rate + ingestion_burst_size_mb: 10 # Lower burst + +query_range: + results_cache: + cache: + embedded_cache: + max_size_mb: 50 # Smaller cache +``` + +### For High Volume + +**Edit `loki-config.yaml`:** +```yaml +limits_config: + ingestion_rate_mb: 20 # Higher rate + ingestion_burst_size_mb: 40 # Higher burst + +query_range: + results_cache: + cache: + embedded_cache: + max_size_mb: 200 # Larger cache +``` + +## Best Practices + +### Log Levels + +Configure services to log appropriately: +- **Production**: `info` or `warning` +- **Development**: `debug` +- **Troubleshooting**: `trace` + +Too much logging = higher resource usage! + +### Retention Strategy + +- **Critical services**: 60+ days +- **Normal services**: 30 days +- **High volume services**: 7-14 days + +### Query Optimization + +- **Use specific labels**: `{container="name"}` not `{container=~".*"}` +- **Limit time range**: Query hours not days when possible +- **Use filters early**: `|= "error"` before parsing +- **Avoid regex when possible**: `|= "string"` faster than `|~ "reg.*ex"` + +### Storage Management + +Monitor disk usage: +```bash +# Check regularly +du -sh compose/monitoring/logging/loki-data + +# Set up alerts when > 80% disk usage +``` + +## Integration with Homarr + +Grafana will automatically appear in Homarr dashboard. You can also: + +### Add Grafana Widget to Homarr + +1. 
Edit Homarr dashboard +2. Add "iFrame" widget +3. URL: `https://logs.fig.systems/d/` +4. This embeds Grafana dashboards in Homarr + +## Backup and Restore + +### Backup + +```bash +# Backup Loki data +tar czf loki-backup-$(date +%Y%m%d).tar.gz ./loki-data + +# Backup Grafana dashboards and datasources +tar czf grafana-backup-$(date +%Y%m%d).tar.gz ./grafana-data ./grafana-provisioning +``` + +### Restore + +```bash +# Restore Loki +docker compose down +tar xzf loki-backup-YYYYMMDD.tar.gz +docker compose up -d + +# Restore Grafana +docker compose down +tar xzf grafana-backup-YYYYMMDD.tar.gz +docker compose up -d +``` + +## Updating + +```bash +cd ~/homelab/compose/monitoring/logging + +# Pull latest images +docker compose pull + +# Restart with new images +docker compose up -d +``` + +## Resource Usage + +**Typical usage:** +- **Loki**: 200-500MB RAM +- **Promtail**: 50-100MB RAM +- **Grafana**: 100-200MB RAM +- **Disk**: ~1-5GB per week (depends on log volume) + +## Next Steps + +1. ✅ Deploy the stack +2. ✅ Login to Grafana and explore logs +3. ✅ Create useful dashboards +4. ✅ Set up alerts for errors +5. ✅ Configure retention based on needs +6. ⬜ Add Prometheus for metrics (future) +7. 
⬜ Add Tempo for distributed tracing (future) + +## Resources + +- [Loki Documentation](https://grafana.com/docs/loki/latest/) +- [LogQL Query Language](https://grafana.com/docs/loki/latest/logql/) +- [Promtail Configuration](https://grafana.com/docs/loki/latest/clients/promtail/configuration/) +- [Grafana Tutorials](https://grafana.com/tutorials/) + +--- + +**Now you can see logs from all containers in one place!** 🎉 diff --git a/compose/monitoring/logging/compose.yaml b/compose/monitoring/logging/compose.yaml new file mode 100644 index 0000000..7ec0ff8 --- /dev/null +++ b/compose/monitoring/logging/compose.yaml @@ -0,0 +1,123 @@ +# Centralized Logging Stack - Loki + Promtail + Grafana +# Docs: https://grafana.com/docs/loki/latest/ + +services: + loki: + container_name: loki + image: grafana/loki:2.9.3 + restart: unless-stopped + + env_file: + - .env + + volumes: + - ./loki-config.yaml:/etc/loki/local-config.yaml:ro + - ./loki-data:/loki + + command: -config.file=/etc/loki/local-config.yaml + + networks: + - homelab + - logging_internal + + labels: + # Traefik (for API access) + traefik.enable: true + traefik.docker.network: homelab + + # Loki API + traefik.http.routers.loki.rule: Host(`loki.fig.systems`) || Host(`loki.edfig.dev`) + traefik.http.routers.loki.entrypoints: websecure + traefik.http.routers.loki.tls.certresolver: letsencrypt + traefik.http.services.loki.loadbalancer.server.port: 3100 + + # SSO Protection + traefik.http.routers.loki.middlewares: tinyauth + + # Homarr Discovery + homarr.name: Loki (Logs) + homarr.group: Monitoring + homarr.icon: mdi:math-log + + healthcheck: + test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 40s + + promtail: + container_name: promtail + image: grafana/promtail:2.9.3 + restart: unless-stopped + + env_file: + - .env + + volumes: + - ./promtail-config.yaml:/etc/promtail/config.yaml:ro + - /var/log:/var/log:ro + - 
/var/lib/docker/containers:/var/lib/docker/containers:ro + - /var/run/docker.sock:/var/run/docker.sock:ro + + command: -config.file=/etc/promtail/config.yaml + + networks: + - logging_internal + + depends_on: + loki: + condition: service_healthy + + grafana: + container_name: grafana + image: grafana/grafana:10.2.3 + restart: unless-stopped + + env_file: + - .env + + volumes: + - ./grafana-data:/var/lib/grafana + - ./grafana-provisioning:/etc/grafana/provisioning + + networks: + - homelab + - logging_internal + + depends_on: + loki: + condition: service_healthy + + labels: + # Traefik + traefik.enable: true + traefik.docker.network: homelab + + # Grafana Web UI + traefik.http.routers.grafana.rule: Host(`logs.fig.systems`) || Host(`logs.edfig.dev`) + traefik.http.routers.grafana.entrypoints: websecure + traefik.http.routers.grafana.tls.certresolver: letsencrypt + traefik.http.services.grafana.loadbalancer.server.port: 3000 + + # SSO Protection (optional - Grafana has its own auth) + # traefik.http.routers.grafana.middlewares: tinyauth + + # Homarr Discovery + homarr.name: Grafana (Logs Dashboard) + homarr.group: Monitoring + homarr.icon: mdi:chart-line + + healthcheck: + test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/health || exit 1"] + interval: 30s + timeout: 10s + retries: 3 + start_period: 40s + +networks: + homelab: + external: true + logging_internal: + name: logging_internal + driver: bridge diff --git a/compose/monitoring/logging/grafana-provisioning/dashboards/dashboards.yaml b/compose/monitoring/logging/grafana-provisioning/dashboards/dashboards.yaml new file mode 100644 index 0000000..0ef43ed --- /dev/null +++ b/compose/monitoring/logging/grafana-provisioning/dashboards/dashboards.yaml @@ -0,0 +1,13 @@ +apiVersion: 1 + +providers: + - name: 'Loki Dashboards' + orgId: 1 + folder: 'Loki' + type: file + disableDeletion: false + updateIntervalSeconds: 10 + allowUiUpdates: true + options: + path: 
/etc/grafana/provisioning/dashboards + foldersFromFilesStructure: true diff --git a/compose/monitoring/logging/grafana-provisioning/datasources/loki.yaml b/compose/monitoring/logging/grafana-provisioning/datasources/loki.yaml new file mode 100644 index 0000000..1fdd70f --- /dev/null +++ b/compose/monitoring/logging/grafana-provisioning/datasources/loki.yaml @@ -0,0 +1,17 @@ +apiVersion: 1 + +datasources: + - name: Loki + type: loki + access: proxy + url: http://loki:3100 + isDefault: true + editable: true + jsonData: + maxLines: 1000 + derivedFields: + # Extract traceID from logs for distributed tracing (optional) + - datasourceUid: tempo + matcherRegex: "traceID=(\\w+)" + name: TraceID + url: "$${__value.raw}" diff --git a/compose/monitoring/logging/loki-config.yaml b/compose/monitoring/logging/loki-config.yaml new file mode 100644 index 0000000..9139189 --- /dev/null +++ b/compose/monitoring/logging/loki-config.yaml @@ -0,0 +1,57 @@ +auth_enabled: false + +server: + http_listen_port: 3100 + grpc_listen_port: 9096 + +common: + instance_addr: 127.0.0.1 + path_prefix: /loki + storage: + filesystem: + chunks_directory: /loki/chunks + rules_directory: /loki/rules + replication_factor: 1 + ring: + kvstore: + store: inmemory + +query_range: + results_cache: + cache: + embedded_cache: + enabled: true + max_size_mb: 100 + +schema_config: + configs: + - from: 2020-10-24 + store: boltdb-shipper + object_store: filesystem + schema: v11 + index: + prefix: index_ + period: 24h + +ruler: + alertmanager_url: http://localhost:9093 + +# Retention - keeps logs for 30 days +limits_config: + retention_period: 30d + ingestion_rate_mb: 10 + ingestion_burst_size_mb: 20 + +# Cleanup old logs +compactor: + working_directory: /loki/compactor + shared_store: filesystem + compaction_interval: 10m + retention_enabled: true + retention_delete_delay: 2h + retention_delete_worker_count: 150 + +# Table manager for retention +table_manager: + retention_deletes_enabled: true + retention_period: 30d 
diff --git a/compose/monitoring/logging/promtail-config.yaml b/compose/monitoring/logging/promtail-config.yaml new file mode 100644 index 0000000..e1a4f3f --- /dev/null +++ b/compose/monitoring/logging/promtail-config.yaml @@ -0,0 +1,70 @@ +server: + http_listen_port: 9080 + grpc_listen_port: 0 + +positions: + filename: /tmp/positions.yaml + +clients: + - url: http://loki:3100/loki/api/v1/push + +scrape_configs: + # Docker containers logs + - job_name: docker + docker_sd_configs: + - host: unix:///var/run/docker.sock + refresh_interval: 5s + filters: + - name: label + values: ["logging=promtail"] + + relabel_configs: + # Use container name as job + - source_labels: ['__meta_docker_container_name'] + regex: '/(.*)' + target_label: 'container' + + # Use image name + - source_labels: ['__meta_docker_container_image'] + target_label: 'image' + + # Use container ID + - source_labels: ['__meta_docker_container_id'] + target_label: 'container_id' + + # Add all docker labels as labels + - action: labelmap + regex: __meta_docker_container_label_(.+) + + # All Docker containers (fallback) + - job_name: docker_all + docker_sd_configs: + - host: unix:///var/run/docker.sock + refresh_interval: 5s + + relabel_configs: + - source_labels: ['__meta_docker_container_name'] + regex: '/(.*)' + target_label: 'container' + + - source_labels: ['__meta_docker_container_image'] + target_label: 'image' + + - source_labels: ['__meta_docker_container_log_stream'] + target_label: 'stream' + + # Extract compose project and service + - source_labels: ['__meta_docker_container_label_com_docker_compose_project'] + target_label: 'compose_project' + + - source_labels: ['__meta_docker_container_label_com_docker_compose_service'] + target_label: 'compose_service' + + # System logs + - job_name: system + static_configs: + - targets: + - localhost + labels: + job: varlogs + __path__: /var/log/*log diff --git a/docs/guides/centralized-logging.md b/docs/guides/centralized-logging.md new file mode 100644 
index 0000000..d850ce6 --- /dev/null +++ b/docs/guides/centralized-logging.md @@ -0,0 +1,445 @@ +# Centralized Logging with Loki + +Guide for setting up and using the centralized logging stack (Loki + Promtail + Grafana). + +## Overview + +The logging stack provides centralized log aggregation and visualization for all Docker containers: + +- **Loki**: Log aggregation backend (stores and indexes logs) +- **Promtail**: Agent that collects logs from Docker containers +- **Grafana**: Web UI for querying and visualizing logs + +### Why Centralized Logging? + +**Problems without it:** +- Logs scattered across many containers +- Hard to correlate events across services +- Logs lost when containers restart +- No easy way to search historical logs + +**Benefits:** +- ✅ Single place to view all logs +- ✅ Powerful search and filtering (LogQL) +- ✅ Persist logs even after container restarts +- ✅ Correlate events across services +- ✅ Create dashboards and alerts +- ✅ Configurable retention (30 days default) + +## Quick Setup + +### 1. Configure Grafana Password + +```bash +cd ~/homelab/compose/monitoring/logging +nano .env +``` + +**Update:** +```env +GF_SECURITY_ADMIN_PASSWORD= +``` + +**Generate password:** +```bash +openssl rand -base64 20 +``` + +### 2. Deploy + +```bash +cd ~/homelab/compose/monitoring/logging +docker compose up -d +``` + +### 3. Access Grafana + +Go to: **https://logs.fig.systems** + +**Login:** +- Username: `admin` +- Password: `` + +### 4. Start Exploring Logs + +1. Click **Explore** (compass icon) in left sidebar +2. Loki datasource should be selected +3. Start querying! 
+
+## Basic Usage
+
+### View Logs from a Container
+
+```logql
+{container="jellyfin"}
+```
+
+### View Last Hour's Logs
+
+```logql
+{container="immich_server"} # set Grafana's time picker to "Last 1 hour"; LogQL has no time filter
+```
+
+### Filter for Errors
+
+```logql
+{container="traefik"} |= "error"
+```
+
+### Exclude Lines
+
+```logql
+{container="traefik"} != "404"
+```
+
+### Multiple Containers
+
+```logql
+{container=~"jellyfin|immich.*"}
+```
+
+### By Compose Project
+
+```logql
+{compose_project="media"}
+```
+
+## Advanced Queries
+
+### Count Errors
+
+```logql
+sum(count_over_time({container="jellyfin"} |= "error" [5m]))
+```
+
+### Error Rate
+
+```logql
+rate({container="traefik"} |= "error" [5m])
+```
+
+### Parse JSON Logs
+
+```logql
+{container="linkwarden"} | json | level="error"
+```
+
+### Top 10 Containers by Error Count
+
+```logql
+topk(10,
+  sum by (container) (
+    count_over_time({job="docker"} |= "error" [24h])
+  )
+)
+```
+
+## Creating Dashboards
+
+### Import Pre-built Dashboard
+
+1. Go to **Dashboards** → **Import**
+2. Dashboard ID: **13639** (Docker logs)
+3. Select **Loki** as datasource
+4. Click **Import**
+
+### Create Custom Dashboard
+
+1. Click **+** → **Dashboard**
+2. **Add panel**
+3. Select **Loki** datasource
+4. Build query
+5. Choose visualization (logs, graph, table, etc.)
+6. **Save**
+
+**Example panels:**
+- Error count by container
+- Log volume over time
+- Recent errors (table)
+- Top logging containers
+
+## Setting Up Alerts
+
+### Create Alert Rule
+
+1. **Alerting** → **Alert rules** → **New alert rule**
+2. **Query:**
+   ```logql
+   sum(count_over_time({container="jellyfin"} |= "error" [5m])) > 10
+   ```
+3. **Condition**: Alert when > 10 errors in 5 minutes
+4. **Configure** notification channel (email, webhook, etc.)
+5.
**Save** + +**Example alerts:** +- Too many errors in service +- Service stopped logging (might have crashed) +- Authentication failures +- Disk space warnings + +## Configuration + +### Change Log Retention + +**Default: 30 days** + +Edit `.env`: +```env +LOKI_RETENTION_PERIOD=60d # 60 days +``` + +Edit `loki-config.yaml`: +```yaml +limits_config: + retention_period: 60d + +table_manager: + retention_period: 60d +``` + +Restart: +```bash +docker compose restart loki +``` + +### Adjust Resource Limits + +For low-resource systems, edit `loki-config.yaml`: + +```yaml +limits_config: + retention_period: 7d # Shorter retention + ingestion_rate_mb: 5 # Lower rate + +query_range: + results_cache: + cache: + embedded_cache: + max_size_mb: 50 # Smaller cache +``` + +### Add Labels to Services + +Make services easier to find by adding labels: + +**Edit service `compose.yaml`:** +```yaml +services: + myservice: + labels: + logging: "promtail" + environment: "production" + tier: "frontend" +``` + +Query with these labels: +```logql +{environment="production", tier="frontend"} +``` + +## Troubleshooting + +### No Logs Appearing + +**Wait a few minutes** - initial log collection takes time + +**Check Promtail:** +```bash +docker logs promtail +``` + +**Check Loki:** +```bash +docker logs loki +``` + +**Verify Promtail can reach Loki:** +```bash +docker exec promtail wget -O- http://loki:3100/ready +``` + +### Grafana Can't Connect to Loki + +**Test from Grafana:** +```bash +docker exec grafana wget -O- http://loki:3100/ready +``` + +**Check datasource:** Grafana → Configuration → Data sources → Loki +- URL should be: `http://loki:3100` + +### High Disk Usage + +**Check size:** +```bash +du -sh compose/monitoring/logging/loki-data +``` + +**Reduce retention:** +```env +LOKI_RETENTION_PERIOD=7d +``` + +**Manual cleanup (CAREFUL):** +```bash +docker compose stop loki +rm -rf loki-data/chunks/* +docker compose start loki +``` + +### Slow Queries + +**Optimize queries:** +- Use 
specific labels: `{container="name"}` not `{container=~".*"}`
+- Limit time range: Hours not days
+- Filter early: `|= "error"` before parsing
+- Avoid complex regex
+
+## Best Practices
+
+### Log Verbosity
+
+Configure appropriate log levels per environment:
+- **Production**: `info` or `warning`
+- **Debugging**: `debug` or `trace`
+
+Too verbose = wasted resources!
+
+### Retention Strategy
+
+Match retention to importance:
+- **Critical services**: 60-90 days
+- **Normal services**: 30 days
+- **High-volume services**: 7-14 days
+
+### Useful Queries to Save
+
+Create saved queries for common tasks:
+
+**Recent errors** (set the time picker to "Last 15 minutes"; LogQL has no time filter):
+```logql
+{job="docker"} |= "error"
+```
+
+**Service health check:**
+```logql
+{container="traefik"} |= "request"
+```
+
+**Failed logins:**
+```logql
+{container="lldap"} |= "failed" |= "login"
+```
+
+## Integration Tips
+
+### Embed in Homarr
+
+Add Grafana dashboards to Homarr:
+
+1. Edit Homarr dashboard
+2. Add **iFrame widget**
+3.
URL: `https://logs.fig.systems/d/`
+
+### Use with Backups
+
+Include logging data in backups:
+
+```bash
+cd ~/homelab/compose/monitoring/logging
+tar czf logging-backup-$(date +%Y%m%d).tar.gz loki-data/ grafana-data/
+```
+
+### Combine with Metrics
+
+Later you can add Prometheus for metrics:
+- Loki for logs
+- Prometheus for metrics (CPU, RAM, disk)
+- Both in Grafana dashboards
+
+## Common LogQL Patterns
+
+### Filter by Time
+
+```logql
+# LogQL has no time filter; the time range is a parameter of the query request
+{container="name"}
+
+# In Grafana, set the range with the time picker (e.g. "Last 5 minutes")
+# Range aggregations take their window in brackets: count_over_time({container="name"}[5m])
+```
+
+### Pattern Matching
+
+```logql
+# Contains
+{container="name"} |= "error"
+
+# Does not contain
+{container="name"} != "404"
+
+# Regex match
+{container="name"} |~ "error|fail|critical"
+
+# Regex does not match
+{container="name"} !~ "debug|trace"
+```
+
+### Aggregations
+
+```logql
+# Count
+count_over_time({container="name"}[5m])
+
+# Rate
+rate({container="name"}[5m])
+
+# Sum
+sum(count_over_time({job="docker"}[1h])) by (container)
+
+# Average (assumes JSON logs with a numeric "bytes" field)
+avg_over_time({container="name"} | json | unwrap bytes [5m])
+```
+
+### JSON Parsing
+
+```logql
+# Parse JSON and filter
+{container="name"} | json | level="error"
+
+# Extract field
+{container="name"} | json | line_format "{{.message}}"
+
+# Filter on JSON field (parse first, then compare the extracted label)
+{container="name"} | json | status_code="500"
+```
+
+## Resource Usage
+
+**Typical usage:**
+- **Loki**: 200-500MB RAM, 1-5GB disk/week
+- **Promtail**: 50-100MB RAM
+- **Grafana**: 100-200MB RAM, ~100MB disk
+- **Total**: ~350-800MB RAM
+
+**For 20 containers with moderate logging**
+
+## Next Steps
+
+1. ✅ Explore your logs in Grafana
+2. ✅ Create useful dashboards
+3. ✅ Set up alerts for critical errors
+4. ⬜ Add Prometheus for metrics (future)
+5. ⬜ Add Tempo for distributed tracing (future)
+6.
⬜ Create log-based SLA tracking + +## Resources + +- [Loki Documentation](https://grafana.com/docs/loki/latest/) +- [LogQL Reference](https://grafana.com/docs/loki/latest/logql/) +- [Grafana Dashboards](https://grafana.com/grafana/dashboards/) +- [Community Dashboards](https://grafana.com/grafana/dashboards/?search=loki) + +--- + +**Now debug issues 10x faster with centralized logs!** 🔍