feat: Add centralized logging stack with Loki, Promtail, and Grafana

Add complete centralized logging solution for all Docker containers.

New services:
- Loki: Log aggregation backend (loki.fig.systems)
- Promtail: Log collection agent
- Grafana: Log visualization (logs.fig.systems)

Features:
- Automatic Docker container discovery
- 30-day log retention (configurable)
- Powerful LogQL querying
- Pre-configured Grafana datasource
- Comprehensive documentation

Resources:
- ~400-700MB RAM for 20 containers
- Automatic labeling by container/project/service
- SSO protection for Loki API

Documentation:
- Complete setup guide
- Query examples and patterns
- Troubleshooting steps
- Best practices

parent 4adaa8e8be
commit 7797f89fcb
10 changed files with 1305 additions and 0 deletions

README.md (12 changes)

@@ -31,6 +31,11 @@ compose/
 │   ├── radarr/     # Movie management
 │   ├── sabnzbd/    # Usenet downloader
 │   └── qbittorrent/# Torrent client
+├── monitoring/     # Monitoring & logging
+│   └── logging/    # Centralized logging stack
+│       ├── loki/       # Log aggregation (loki.fig.systems)
+│       ├── promtail/   # Log collection agent
+│       └── grafana/    # Log visualization (logs.fig.systems)
 └── services/       # Utility services
     ├── homarr/     # Dashboard (home.fig.systems)
     ├── backrest/   # Backup manager (backup.fig.systems)
@@ -58,6 +63,10 @@ All services are accessible via:
 | Traefik Dashboard | traefik.fig.systems | ✅ |
 | LLDAP | lldap.fig.systems | ✅ |
 | Tinyauth | auth.fig.systems | ❌ |
+| **Monitoring** | | |
+| Grafana (Logs) | logs.fig.systems | ❌* |
+| Loki (API) | loki.fig.systems | ✅ |
+| **Dashboard & Management** | | |
 | Homarr | home.fig.systems | ✅ |
 | Backrest | backup.fig.systems | ✅ |
 | Jellyfin | flix.fig.systems | ❌* |
@@ -149,6 +158,9 @@ cd compose/services/linkwarden && docker compose up -d
 cd compose/services/vikunja && docker compose up -d
 cd compose/services/homarr && docker compose up -d
 cd compose/services/backrest && docker compose up -d
+
+# Monitoring (optional but recommended)
+cd compose/monitoring/logging && docker compose up -d
 cd compose/services/lubelogger && docker compose up -d
 cd compose/services/calibre-web && docker compose up -d
 cd compose/services/booklore && docker compose up -d

compose/monitoring/logging/.env (Normal file, 28 changes)

@@ -0,0 +1,28 @@
# Centralized Logging Configuration

# Timezone
TZ=America/Los_Angeles

# Grafana Admin Credentials
# Default username: admin
# Change this password immediately after first login!
# Example format: MyGr@f@n@P@ssw0rd!2024
GF_SECURITY_ADMIN_PASSWORD=changeme_please_set_secure_grafana_password

# Grafana Configuration
GF_SERVER_ROOT_URL=https://logs.fig.systems
GF_SERVER_DOMAIN=logs.fig.systems

# Disable Grafana analytics (optional)
GF_ANALYTICS_REPORTING_ENABLED=false
GF_ANALYTICS_CHECK_FOR_UPDATES=false

# Allow embedding (for Homarr dashboard integration)
GF_SECURITY_ALLOW_EMBEDDING=true

# Loki Configuration
# Retention period in days (default: 30 days)
LOKI_RETENTION_PERIOD=30d

# Promtail Configuration
# No additional configuration needed - configured via promtail-config.yaml

compose/monitoring/logging/.gitignore (vendored, Normal file, 13 changes)

@@ -0,0 +1,13 @@
# Loki data
loki-data/

# Grafana data
grafana-data/

# Keep provisioning and config files
!grafana-provisioning/
!loki-config.yaml
!promtail-config.yaml

# Keep .env.example if created
!.env.example

compose/monitoring/logging/README.md (Normal file, 527 changes)

@@ -0,0 +1,527 @@
# Centralized Logging Stack

Grafana Loki + Promtail + Grafana for centralized Docker container log aggregation and visualization.

## Overview

This stack provides centralized logging for all Docker containers in your homelab:

- **Loki**: Log aggregation backend (like Prometheus but for logs)
- **Promtail**: Agent that collects logs from Docker containers
- **Grafana**: Web UI for querying and visualizing logs

### Why This Stack?

- ✅ **Lightweight**: Minimal resource usage compared to ELK stack
- ✅ **Docker-native**: Automatically discovers and collects logs from all containers
- ✅ **Powerful search**: LogQL query language for filtering and searching
- ✅ **Retention**: Configurable log retention (default: 30 days)
- ✅ **Labels**: Automatic labeling by container, image, compose project
- ✅ **Integrated**: Works seamlessly with existing homelab services

## Quick Start

### 1. Configure Environment

```bash
cd ~/homelab/compose/monitoring/logging
nano .env
```

**Update:**
```env
# Change this!
GF_SECURITY_ADMIN_PASSWORD=<your-strong-password>
```

### 2. Deploy the Stack

```bash
docker compose up -d
```

### 3. Access Grafana

Go to: **https://logs.fig.systems**

**Default credentials:**
- Username: `admin`
- Password: `<your GF_SECURITY_ADMIN_PASSWORD>`

**⚠️ Change the password immediately after first login!**

### 4. View Logs

1. Click "Explore" (compass icon) in left sidebar
2. Select "Loki" datasource (should be selected by default)
3. Start querying logs!

## Usage

### Basic Log Queries

**View all logs from a container:**
```logql
{container="jellyfin"}
```

**View logs from a compose project:**
```logql
{compose_project="media"}
```

**View logs from a specific service:**
```logql
{compose_service="lldap"}
```

**Filter for lines containing "error":**
```logql
{container="immich_server"} |= "error"
```

**Exclude lines:**
```logql
{container="traefik"} != "404"
```

**Multiple filters:**
```logql
{container="jellyfin"} |= "error" != "404"
```

### Advanced Queries

**Count errors per minute:**
```logql
sum(count_over_time({container="jellyfin"} |= "error" [1m])) by (container)
```

**Rate of logs:**
```logql
rate({container="traefik"}[5m])
```

**Logs from last hour:**

LogQL has no timestamp filter. Set the Grafana time picker to "Last 1 hour" (or pass `start`/`end` to the Loki API) and run the plain selector:
```logql
{container="immich_server"}
```

**Filter by multiple containers:**
```logql
{container=~"jellyfin|immich.*|sonarr"}
```

**Extract and filter JSON:**
```logql
{container="linkwarden"} | json | level="error"
```

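Because the time window is not part of a LogQL expression, a script has to pass it separately. The following is a minimal sketch, assuming Loki's HTTP API is reachable at `LOKI_URL` (for example from a container on the `logging_internal` network, since this stack publishes no host port for Loki). The helper names `time_window` and `loki_last_hour` are hypothetical.

```bash
# Assumption: base URL where Loki's HTTP API is reachable.
LOKI_URL="${LOKI_URL:-http://loki:3100}"

# Print "START END" as Unix epoch nanoseconds for a window of
# WINDOW seconds ending at NOW (both arguments are in seconds).
time_window() {
  now="$1"
  window="$2"
  printf '%s000000000 %s000000000\n' "$((now - window))" "$now"
}

# Fetch up to 100 log lines for a LogQL selector over the last hour
# through the query_range endpoint.
loki_last_hour() {
  query="$1"
  # Split the "START END" pair into $1 and $2.
  set -- $(time_window "$(date +%s)" 3600)
  curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "start=$1" \
    --data-urlencode "end=$2" \
    --data-urlencode "limit=100"
}
```

Calling `loki_last_hour '{container="immich_server"}'` prints the JSON response; the selector itself stays free of any time expression.
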
## Configuration

### Log Retention

Default: **30 days**

To change retention period:

**Edit `.env`:**
```env
LOKI_RETENTION_PERIOD=60d  # Keep logs for 60 days
```

**Edit `loki-config.yaml`:**
```yaml
limits_config:
  retention_period: 60d  # Must match .env

table_manager:
  retention_period: 60d  # Must match above
```

**Restart:**
```bash
docker compose restart loki
```

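Since the retention value has to be kept in step in more than one place, a quick check can catch drift before restarting Loki. This is a sketch only; the function names are hypothetical, and it assumes every `retention_period:` key in the file is meant to carry the same value.

```bash
# Print each distinct retention_period value found in a Loki config file.
retention_values() {
  grep -E '^[[:space:]]*retention_period:' "$1" | awk '{print $2}' | sort -u
}

# Succeed only when the file sets exactly one distinct retention value.
retention_consistent() {
  count=$(retention_values "$1" | wc -l | tr -d ' ')
  [ "$count" -eq 1 ]
}
```

For example, `retention_consistent loki-config.yaml || echo "retention_period values differ"` warns when `limits_config` and `table_manager` disagree.
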
### Adjust Resource Limits

**Edit `loki-config.yaml`:**
```yaml
limits_config:
  ingestion_rate_mb: 10  # MB/sec per tenant
  ingestion_burst_size_mb: 20  # Burst size
```

### Add Custom Labels

**Edit `promtail-config.yaml`:**
```yaml
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock

    relabel_configs:
      # Add custom label
      - source_labels: ['__meta_docker_container_label_environment']
        target_label: 'environment'
```

## How It Works

### Architecture

```
Docker Containers
    ↓ (logs via Docker socket)
Promtail (scrapes and ships)
    ↓ (HTTP push)
Loki (stores and indexes)
    ↓ (LogQL queries)
Grafana (visualization)
```

### Log Collection

Promtail automatically collects logs from:
1. **All Docker containers** via Docker socket
2. **System logs** from `/var/log`

Logs are labeled with:
- `container`: Container name
- `image`: Docker image
- `compose_project`: Docker Compose project name
- `compose_service`: Service name from compose.yaml
- `stream`: stdout or stderr

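To see which values these labels actually carry, Loki's label endpoints can be queried directly. The sketch below assumes the API is reachable at `LOKI_URL` (an assumption; this stack only exposes Loki on the Docker networks and through Traefik), and the helper names are hypothetical.

```bash
# Assumption: base URL where Loki's HTTP API is reachable.
LOKI_URL="${LOKI_URL:-http://loki:3100}"

# Build the URL that lists the values of one label.
label_values_url() {
  printf '%s/loki/api/v1/label/%s/values\n' "$LOKI_URL" "$1"
}

# Print the JSON list of values Loki has seen for a label.
label_values() {
  curl -s "$(label_values_url "$1")"
}
```

Running `label_values container` or `label_values compose_project` is a quick way to confirm that Promtail's relabeling produced the labels listed above.
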
### Storage

Logs are stored in:
- **Location**: `./loki-data/`
- **Format**: Compressed chunks
- **Index**: BoltDB
- **Retention**: Automatic cleanup after retention period

## Integration with Services

### Option 1: Automatic (Default)

Promtail automatically discovers all containers. No changes needed!

### Option 2: Explicit Labels (Recommended)

Add labels to services for better organization:

**Edit any service's `compose.yaml`:**
```yaml
services:
  servicename:
    # ... existing config ...
    labels:
      # ... existing labels ...

      # Add logging labels
      logging: "promtail"
      log_level: "info"
      environment: "production"
```

These labels will be available in Loki for filtering.

### Option 3: Send Logs Directly to Loki

Instead of Promtail scraping, send logs directly:

**Edit service `compose.yaml`:**
```yaml
services:
  servicename:
    # ... existing config ...
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        loki-external-labels: "container={{.Name}},compose_project={{.Config.Labels[\"com.docker.compose.project\"]}}"
```

**Note**: This requires the Loki Docker driver plugin (not recommended for simplicity).

## Grafana Dashboards

### Built-in Explore

Best way to start - use Grafana's Explore view:
1. Click "Explore" icon (compass)
2. Select "Loki" datasource
3. Use builder to create queries
4. Save interesting queries

### Pre-built Dashboards

You can import community dashboards:

1. Go to Dashboards → Import
2. Use dashboard ID: `13639` (Docker logs dashboard)
3. Select "Loki" as datasource
4. Import

### Create Custom Dashboard

1. Click "+" → "Dashboard"
2. Add panel
3. Select Loki datasource
4. Build query using LogQL
5. Save dashboard

**Example panels:**
- Error count by container
- Log volume over time
- Top 10 logging containers
- Recent errors table

## Alerting

### Create Log-Based Alerts

1. Go to Alerting → Alert rules
2. Create new alert rule
3. Query: `sum(count_over_time({container="jellyfin"} |= "error" [5m])) > 10`
4. Set thresholds and notification channels
5. Save

**Example alerts:**
- Too many errors in container
- Container restarted
- Disk space warnings
- Failed authentication attempts

## Troubleshooting

### Promtail Not Collecting Logs

**Check Promtail is running:**
```bash
docker logs promtail
```

**Verify Docker socket access:**
```bash
docker exec promtail ls -la /var/run/docker.sock
```

**Test Promtail config:**
```bash
docker exec promtail promtail -config.file=/etc/promtail/config.yaml -dry-run
```

### Loki Not Receiving Logs

**Check Loki health:**
```bash
# compose.yaml publishes no host port for Loki, so check from inside the container
docker exec loki wget -qO- http://localhost:3100/ready
```

**View Loki logs:**
```bash
docker logs loki
```

**Check Promtail is pushing:**
```bash
docker logs promtail | grep -i push
```

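Loki can take a little while to report ready after a restart, so a single health check may fail spuriously. A small retry loop makes the check more forgiving; this is a sketch, `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command...>
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the remaining arguments as the command, discarding its output.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For instance, `wait_until 10 3 docker exec loki wget -q --spider http://localhost:3100/ready && docker logs --tail 20 promtail` waits up to roughly 30 seconds for Loki before showing Promtail's latest output.
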
### Grafana Can't Connect to Loki

**Test Loki from Grafana container:**
```bash
docker exec grafana wget -O- http://loki:3100/ready
```

**Check datasource configuration:**
- Grafana → Connections → Data sources → Loki
- URL should be: `http://loki:3100`

### No Logs Appearing

**Wait a few minutes** - logs take time to appear

**Check retention:**
```bash
# Logs older than retention period are deleted
grep retention_period loki-config.yaml
```

**Verify time range in Grafana:**
- Make sure selected time range includes recent logs
- Try "Last 5 minutes"

### High Disk Usage

**Check Loki data size:**
```bash
du -sh ./loki-data
```

**Reduce retention:**
```env
LOKI_RETENTION_PERIOD=7d  # Shorter retention
```

**Manual cleanup:**
```bash
# Stop Loki
docker compose stop loki

# Remove old data (CAREFUL!)
rm -rf ./loki-data/chunks/*

# Restart
docker compose start loki
```

## Performance Tuning

### For Low Resources (< 8GB RAM)

**Edit `loki-config.yaml`:**
```yaml
limits_config:
  retention_period: 7d  # Shorter retention
  ingestion_rate_mb: 5  # Lower rate
  ingestion_burst_size_mb: 10  # Lower burst

query_range:
  results_cache:
    cache:
      embedded_cache:
        max_size_mb: 50  # Smaller cache
```

### For High Volume

**Edit `loki-config.yaml`:**
```yaml
limits_config:
  ingestion_rate_mb: 20  # Higher rate
  ingestion_burst_size_mb: 40  # Higher burst

query_range:
  results_cache:
    cache:
      embedded_cache:
        max_size_mb: 200  # Larger cache
```

## Best Practices

### Log Levels

Configure services to log appropriately:
- **Production**: `info` or `warning`
- **Development**: `debug`
- **Troubleshooting**: `trace`

Too much logging = higher resource usage!

### Retention Strategy

- **Critical services**: 60+ days
- **Normal services**: 30 days
- **High volume services**: 7-14 days

### Query Optimization

- **Use specific labels**: `{container="name"}` not `{container=~".*"}`
- **Limit time range**: Query hours not days when possible
- **Use filters early**: `|= "error"` before parsing
- **Avoid regex when possible**: `|= "string"` faster than `|~ "reg.*ex"`

### Storage Management

Monitor disk usage:
```bash
# Check regularly
du -sh compose/monitoring/logging/loki-data

# Set up alerts when > 80% disk usage
```

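The 80% threshold mentioned above can be checked from a cron job or a shell prompt. The following is a rough sketch with hypothetical helper names; it assumes POSIX `df -P` output, where the fifth column of the second line is the used-space percentage.

```bash
# Print the used-space percentage (without "%") of the filesystem holding a path.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the first number is strictly greater than the second.
usage_exceeds() {
  [ "$1" -gt "$2" ]
}
```

A check such as `usage_exceeds "$(disk_usage_percent ./loki-data)" 80 && echo "Loki disk above 80%"` could then feed whatever notification mechanism is already in place.
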
## Integration with Homarr

Grafana will automatically appear in Homarr dashboard. You can also:

### Add Grafana Widget to Homarr

1. Edit Homarr dashboard
2. Add "iFrame" widget
3. URL: `https://logs.fig.systems/d/<dashboard-id>`
4. This embeds Grafana dashboards in Homarr

## Backup and Restore

### Backup

```bash
# Backup Loki data
tar czf loki-backup-$(date +%Y%m%d).tar.gz ./loki-data

# Backup Grafana dashboards and datasources
tar czf grafana-backup-$(date +%Y%m%d).tar.gz ./grafana-data ./grafana-provisioning
```

### Restore

```bash
# Restore Loki
docker compose down
tar xzf loki-backup-YYYYMMDD.tar.gz
docker compose up -d

# Restore Grafana
docker compose down
tar xzf grafana-backup-YYYYMMDD.tar.gz
docker compose up -d
```

## Updating

```bash
cd ~/homelab/compose/monitoring/logging

# Pull latest images
docker compose pull

# Restart with new images
docker compose up -d
```

## Resource Usage

**Typical usage:**
- **Loki**: 200-500MB RAM
- **Promtail**: 50-100MB RAM
- **Grafana**: 100-200MB RAM
- **Disk**: ~1-5GB per week (depends on log volume)

## Next Steps

1. ✅ Deploy the stack
2. ✅ Login to Grafana and explore logs
3. ✅ Create useful dashboards
4. ✅ Set up alerts for errors
5. ✅ Configure retention based on needs
6. ⬜ Add Prometheus for metrics (future)
7. ⬜ Add Tempo for distributed tracing (future)

## Resources

- [Loki Documentation](https://grafana.com/docs/loki/latest/)
- [LogQL Query Language](https://grafana.com/docs/loki/latest/logql/)
- [Promtail Configuration](https://grafana.com/docs/loki/latest/clients/promtail/configuration/)
- [Grafana Tutorials](https://grafana.com/tutorials/)

---

**Now you can see logs from all containers in one place!** 🎉

compose/monitoring/logging/compose.yaml (Normal file, 123 changes)

@@ -0,0 +1,123 @@
# Centralized Logging Stack - Loki + Promtail + Grafana
# Docs: https://grafana.com/docs/loki/latest/

services:
  loki:
    container_name: loki
    image: grafana/loki:2.9.3
    restart: unless-stopped

    env_file:
      - .env

    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml:ro
      - ./loki-data:/loki

    command: -config.file=/etc/loki/local-config.yaml

    networks:
      - homelab
      - logging_internal

    labels:
      # Traefik (for API access)
      traefik.enable: true
      traefik.docker.network: homelab

      # Loki API
      traefik.http.routers.loki.rule: Host(`loki.fig.systems`) || Host(`loki.edfig.dev`)
      traefik.http.routers.loki.entrypoints: websecure
      traefik.http.routers.loki.tls.certresolver: letsencrypt
      traefik.http.services.loki.loadbalancer.server.port: 3100

      # SSO Protection
      traefik.http.routers.loki.middlewares: tinyauth

      # Homarr Discovery
      homarr.name: Loki (Logs)
      homarr.group: Monitoring
      homarr.icon: mdi:math-log

    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3100/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  promtail:
    container_name: promtail
    image: grafana/promtail:2.9.3
    restart: unless-stopped

    env_file:
      - .env

    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yaml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro

    command: -config.file=/etc/promtail/config.yaml

    networks:
      - logging_internal

    depends_on:
      loki:
        condition: service_healthy

  grafana:
    container_name: grafana
    image: grafana/grafana:10.2.3
    restart: unless-stopped

    env_file:
      - .env

    volumes:
      - ./grafana-data:/var/lib/grafana
      - ./grafana-provisioning:/etc/grafana/provisioning

    networks:
      - homelab
      - logging_internal

    depends_on:
      loki:
        condition: service_healthy

    labels:
      # Traefik
      traefik.enable: true
      traefik.docker.network: homelab

      # Grafana Web UI
      traefik.http.routers.grafana.rule: Host(`logs.fig.systems`) || Host(`logs.edfig.dev`)
      traefik.http.routers.grafana.entrypoints: websecure
      traefik.http.routers.grafana.tls.certresolver: letsencrypt
      traefik.http.services.grafana.loadbalancer.server.port: 3000

      # SSO Protection (optional - Grafana has its own auth)
      # traefik.http.routers.grafana.middlewares: tinyauth

      # Homarr Discovery
      homarr.name: Grafana (Logs Dashboard)
      homarr.group: Monitoring
      homarr.icon: mdi:chart-line

    healthcheck:
      test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:3000/api/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

networks:
  homelab:
    external: true
  logging_internal:
    name: logging_internal
    driver: bridge

@@ -0,0 +1,13 @@
apiVersion: 1

providers:
  - name: 'Loki Dashboards'
    orgId: 1
    folder: 'Loki'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: true

@@ -0,0 +1,17 @@
apiVersion: 1

datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
    editable: true
    jsonData:
      maxLines: 1000
      derivedFields:
        # Extract traceID from logs for distributed tracing (optional)
        - datasourceUid: tempo
          matcherRegex: "traceID=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"

compose/monitoring/logging/loki-config.yaml (Normal file, 57 changes)

@@ -0,0 +1,57 @@
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# Retention - keeps logs for 30 days
limits_config:
  retention_period: 30d
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20

# Cleanup old logs
compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150

# Table manager for retention
table_manager:
  retention_deletes_enabled: true
  retention_period: 30d

compose/monitoring/logging/promtail-config.yaml (Normal file, 70 changes)

@@ -0,0 +1,70 @@
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Docker containers logs
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        filters:
          - name: label
            values: ["logging=promtail"]

    relabel_configs:
      # Use container name (leading slash stripped) as the `container` label
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'

      # Use image name
      - source_labels: ['__meta_docker_container_image']
        target_label: 'image'

      # Use container ID
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'

      # Add all docker labels as labels
      - action: labelmap
        regex: __meta_docker_container_label_(.+)

  # All Docker containers (fallback)
  - job_name: docker_all
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s

    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'

      - source_labels: ['__meta_docker_container_image']
        target_label: 'image'

      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'

      # Extract compose project and service
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: 'compose_project'

      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'compose_service'

  # System logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log

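The `regex: '/(.*)'` rules in the Promtail config exist because Docker reports container names with a leading slash (for example `/jellyfin`). Relabel regexes are anchored at both ends, and the default replacement is the first capture group, so the rule stores `jellyfin` in the `container` label. The effect can be imitated with `sed`; this is an illustration only, and `container_label` is a hypothetical name.

```bash
# Imitate the relabel rule `regex: '/(.*)'` with its default replacement `$1`.
# Note: unlike a relabel rule, sed passes non-matching input through unchanged,
# whereas Promtail would simply skip the rule and leave the label unset.
container_label() {
  printf '%s\n' "$1" | sed -E 's#^/(.*)$#\1#'
}
```

Running `container_label /immich_server` prints the name as it would appear in a `{container="..."}` selector.
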
docs/guides/centralized-logging.md (Normal file, 445 changes)

@@ -0,0 +1,445 @@
# Centralized Logging with Loki

Guide for setting up and using the centralized logging stack (Loki + Promtail + Grafana).

## Overview

The logging stack provides centralized log aggregation and visualization for all Docker containers:

- **Loki**: Log aggregation backend (stores and indexes logs)
- **Promtail**: Agent that collects logs from Docker containers
- **Grafana**: Web UI for querying and visualizing logs

### Why Centralized Logging?

**Problems without it:**
- Logs scattered across many containers
- Hard to correlate events across services
- Logs lost when containers restart
- No easy way to search historical logs

**Benefits:**
- ✅ Single place to view all logs
- ✅ Powerful search and filtering (LogQL)
- ✅ Persist logs even after container restarts
- ✅ Correlate events across services
- ✅ Create dashboards and alerts
- ✅ Configurable retention (30 days default)

## Quick Setup

### 1. Configure Grafana Password

```bash
cd ~/homelab/compose/monitoring/logging
nano .env
```

**Update:**
```env
GF_SECURITY_ADMIN_PASSWORD=<your-strong-password>
```

**Generate password:**
```bash
openssl rand -base64 20
```

### 2. Deploy

```bash
cd ~/homelab/compose/monitoring/logging
docker compose up -d
```

### 3. Access Grafana

Go to: **https://logs.fig.systems**

**Login:**
- Username: `admin`
- Password: `<your GF_SECURITY_ADMIN_PASSWORD>`

### 4. Start Exploring Logs

1. Click **Explore** (compass icon) in left sidebar
2. Loki datasource should be selected
3. Start querying!

## Basic Usage

### View Logs from a Container

```logql
{container="jellyfin"}
```

### View Last Hour's Logs

LogQL has no timestamp filter. Run the query and set the Grafana time picker to **Last 1 hour**:

```logql
{container="immich_server"}
```
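
Outside Grafana, a time window can be given explicitly: Loki's `query_range` HTTP endpoint accepts `start` and `end` parameters as nanosecond Unix timestamps. The following is a minimal sketch under assumptions: `LOKI_URL` (defaulting to `http://localhost:3100`) and the helper names `seconds_ago_ns` and `loki_last_hour` are hypothetical, and Loki must be reachable from wherever the script runs.

```bash
#!/usr/bin/env bash
# Assumption: Loki's HTTP API is reachable at this address (adjust as needed).
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Print the Unix time in nanoseconds for "N seconds before now".
# The optional second argument overrides "now" (epoch seconds).
seconds_ago_ns() {
  local secs="$1" now="${2:-$(date +%s)}"
  echo $(( (now - secs) * 1000000000 ))
}

# Fetch up to 100 log lines from the last hour for one container
# (hypothetical helper; the `container` label is the one used in this guide).
loki_last_hour() {
  local container="$1"
  curl -sG "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query={container=\"${container}\"}" \
    --data-urlencode "start=$(seconds_ago_ns 3600)" \
    --data-urlencode "end=$(seconds_ago_ns 0)" \
    --data-urlencode "limit=100"
}
```

After sourcing the script, `loki_last_hour immich_server` prints the JSON response, which can be piped to a JSON formatter for reading.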

### Filter for Errors

Line filters are case-sensitive; use `|~ "(?i)error"` to also match `Error` and `ERROR`.

```logql
{container="traefik"} |= "error"
```

### Exclude Lines

```logql
{container="traefik"} != "404"
```

### Multiple Containers

```logql
{container=~"jellyfin|immich.*"}
```

### By Compose Project

```logql
{compose_project="media"}
```

## Advanced Queries

### Count Errors

```logql
sum(count_over_time({container="jellyfin"} |= "error" [5m]))
```

### Error Rate

```logql
rate({container="traefik"} |= "error" [5m])
```

### Parse JSON Logs

```logql
{container="linkwarden"} | json | level="error"
```

### Top 10 Containers by Error Count

```logql
topk(10,
  sum by (container) (
    count_over_time({job="docker"} |= "error" [24h])
  )
)
```

## Creating Dashboards

### Import Pre-built Dashboard

1. Go to **Dashboards** → **Import**
2. Dashboard ID: **13639** (Docker logs)
3. Select **Loki** as datasource
4. Click **Import**

### Create Custom Dashboard

1. Click **+** → **Dashboard**
2. **Add panel**
3. Select **Loki** datasource
4. Build query
5. Choose visualization (logs, graph, table, etc.)
6. **Save**

**Example panels:**
- Error count by container
- Log volume over time
- Recent errors (table)
- Top logging containers

## Setting Up Alerts

### Create Alert Rule

1. **Alerting** → **Alert rules** → **New alert rule**
2. **Query:**
   ```logql
   sum(count_over_time({container="jellyfin"} |= "error" [5m])) > 10
   ```
3. **Condition**: Alert when > 10 errors in 5 minutes
4. **Configure** notification channel (email, webhook, etc.)
5. **Save**

**Example alerts:**
- Too many errors in service
- Service stopped logging (might have crashed)
- Authentication failures
- Disk space warnings

## Configuration

### Change Log Retention

**Default: 30 days**

Edit `.env`:
```env
LOKI_RETENTION_PERIOD=60d  # 60 days
```

Also edit `loki-config.yaml` so the values match:
```yaml
limits_config:
  retention_period: 60d

table_manager:
  retention_period: 60d
```

Which block takes effect depends on the index store: with the compactor (boltdb-shipper or TSDB), `limits_config.retention_period` applies and also requires `compactor.retention_enabled: true`; `table_manager` is used only by legacy index stores.

Restart:
```bash
docker compose restart loki
```

A restart does not re-read `.env`. If you changed it, run `docker compose up -d loki` so the container is recreated with the new value.

### Adjust Resource Limits

For low-resource systems, edit `loki-config.yaml`:

```yaml
limits_config:
  retention_period: 7d     # Shorter retention
  ingestion_rate_mb: 5     # Lower rate

query_range:
  results_cache:
    cache:
      embedded_cache:
        max_size_mb: 50    # Smaller cache
```

### Add Labels to Services

Make services easier to find by adding labels:

**Edit service `compose.yaml`:**
```yaml
services:
  myservice:
    labels:
      logging: "promtail"
      environment: "production"
      tier: "frontend"
```

Query with these labels. This works only if Promtail maps the Docker labels to Loki labels with a relabel rule (for example source label `__meta_docker_container_label_environment`, in the same way as the `compose_project` rule in the Promtail config):
```logql
{environment="production", tier="frontend"}
```

## Troubleshooting

### No Logs Appearing

**Wait a few minutes** - initial log collection takes time

**Check Promtail:**
```bash
docker logs promtail
```

**Check Loki:**
```bash
docker logs loki
```

**Verify Promtail can reach Loki:**
```bash
docker exec promtail wget -O- http://loki:3100/ready
```
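
If Promtail and Loki both look healthy, it can help to ask Loki which label values it has actually received. The following is a minimal sketch using Loki's label-values endpoint; `LOKI_URL` and both helper names are assumptions, and the request has to be made from somewhere that can reach Loki.

```bash
#!/usr/bin/env bash
# Assumption: Loki's HTTP API is reachable at this address (adjust as needed).
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Build the URL that lists all known values of one label.
loki_label_values_url() {
  echo "${LOKI_URL}/loki/api/v1/label/$1/values"
}

# Ask Loki for the values it has seen for a label (hypothetical helper).
# No values for `container` suggests that no Docker logs have arrived yet.
loki_label_values() {
  curl -s "$(loki_label_values_url "$1")"
}
```

After sourcing the script, `loki_label_values container` should list the container names used in the queries in this guide.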

### Grafana Can't Connect to Loki

**Test from Grafana:**
```bash
docker exec grafana wget -O- http://loki:3100/ready
```

**Check datasource:** Grafana → Configuration → Data sources → Loki
- URL should be: `http://loki:3100`

### High Disk Usage

**Check size:**
```bash
du -sh ~/homelab/compose/monitoring/logging/loki-data
```

**Reduce retention:**
```env
LOKI_RETENTION_PERIOD=7d
```

**Manual cleanup (CAREFUL, this permanently deletes all stored log chunks):**
```bash
cd ~/homelab/compose/monitoring/logging
docker compose stop loki
rm -rf loki-data/chunks/*
docker compose start loki
```

### Slow Queries

**Optimize queries:**
- Use specific labels: `{container="name"}` not `{container=~".*"}`
- Limit the time range: query hours, not days
- Filter early: put line filters such as `|= "error"` before parsers such as `| json`
- Avoid complex regex

## Best Practices

### Log Verbosity

Configure appropriate log levels per environment:
- **Production**: `info` or `warning`
- **Debugging**: `debug` or `trace`

Overly verbose logging wastes disk space and memory.

### Retention Strategy

Match retention to importance:
- **Critical services**: 60-90 days
- **Normal services**: 30 days
- **High-volume services**: 7-14 days

The `retention_period` setting is global. Different retention per service requires per-stream overrides (`retention_stream` in `limits_config`), which depend on compactor-based retention.

### Useful Queries to Save

Create saved queries for common tasks:

**Recent errors** (set the time picker to **Last 15 minutes**):
```logql
{job="docker"} |= "error"
```

**Service health check:**
```logql
{container="traefik"} |= "request"
```

**Failed logins:**
```logql
{container="lldap"} |= "failed" |= "login"
```

## Integration Tips

### Embed in Homarr

Add Grafana dashboards to Homarr:

1. Edit Homarr dashboard
2. Add **iFrame widget**
3. URL: `https://logs.fig.systems/d/<dashboard-id>`

Grafana blocks iframe embedding by default; enable it with `GF_SECURITY_ALLOW_EMBEDDING=true`.

### Use with Backups

Include logging data in backups:

```bash
cd ~/homelab/compose/monitoring/logging
tar czf logging-backup-$(date +%Y%m%d).tar.gz loki-data/ grafana-data/
```
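
Archiving data directories while the services are still writing to them can produce an inconsistent backup. One option is to stop the writers for the duration of the archive. The following is a sketch under assumptions: the Compose service names `loki` and `grafana`, the `LOGGING_DIR` variable, and both helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: the stack lives in this directory, as in the steps above.
LOGGING_DIR="${LOGGING_DIR:-$HOME/homelab/compose/monitoring/logging}"

# Build the archive name; the optional argument overrides the date stamp.
backup_name() {
  echo "logging-backup-${1:-$(date +%Y%m%d)}.tar.gz"
}

# Stop the writers, archive both data directories, then start them again.
# Runs in a subshell so the caller's working directory is left unchanged.
backup_logging() (
  cd "$LOGGING_DIR" || exit 1
  docker compose stop loki grafana
  tar czf "$(backup_name)" loki-data/ grafana-data/
  docker compose start loki grafana
)
```

Logs produced while Loki is stopped are not lost as long as Promtail keeps running, since it resumes shipping from its saved positions once Loki is back.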

### Combine with Metrics

Later you can add Prometheus for metrics:
- Loki for logs
- Prometheus for metrics (CPU, RAM, disk)
- Both in Grafana dashboards

## Common LogQL Patterns

### Filter by Time

LogQL has no timestamp filter. The time range comes from the Grafana time picker, not from the query:

```logql
# Last 5 minutes: set the time picker to "Last 5 minutes"
{container="name"}

# Specific time range: choose an absolute range in the Grafana UI time picker
```

### Pattern Matching

```logql
# Contains
{container="name"} |= "error"

# Does not contain
{container="name"} != "404"

# Regex match
{container="name"} |~ "error|fail|critical"

# Regex does not match
{container="name"} !~ "debug|trace"
```

### Aggregations

```logql
# Count
count_over_time({container="name"}[5m])

# Rate
rate({container="name"}[5m])

# Sum by container
sum(count_over_time({job="docker"}[1h])) by (container)

# Average of a numeric field (assumes JSON logs with a `bytes` field)
avg_over_time({container="name"} | json | unwrap bytes [5m])
```

### JSON Parsing

```logql
# Parse JSON and filter
{container="name"} | json | level="error"

# Extract field
{container="name"} | json | line_format "{{.message}}"

# Filter on JSON field
{container="name"} | json | status_code="500"
```

## Resource Usage

**Typical usage** (for 20 containers with moderate logging):
- **Loki**: 200-500MB RAM, 1-5GB disk/week
- **Promtail**: 50-100MB RAM
- **Grafana**: 100-200MB RAM, ~100MB disk
- **Total**: ~400-700MB RAM
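
The weekly disk figure above gives a rough way to size the Loki volume for a given retention period. The following is a back-of-the-envelope sketch; the helper name `estimate_loki_disk_gb` is hypothetical, and the result is only as good as the GB/week estimate fed into it.

```bash
#!/usr/bin/env bash
# Estimate Loki disk usage in GB for a retention period.
#   $1 = log growth in GB per week (this guide quotes 1-5)
#   $2 = retention in days
# Integer arithmetic, rounded up to the next whole GB.
estimate_loki_disk_gb() {
  local gb_per_week="$1" days="$2"
  echo $(( (gb_per_week * days + 6) / 7 ))
}

estimate_loki_disk_gb 1 30   # low end for the default 30-day retention
estimate_loki_disk_gb 5 30   # high end for the default 30-day retention
```

With the default 30-day retention this comes to roughly 5 to 22 GB, before any filesystem overhead.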

## Next Steps

1. ✅ Explore your logs in Grafana
2. ✅ Create useful dashboards
3. ✅ Set up alerts for critical errors
4. ⬜ Add Prometheus for metrics (future)
5. ⬜ Add Tempo for distributed tracing (future)
6. ⬜ Create log-based SLA tracking

## Resources

- [Loki Documentation](https://grafana.com/docs/loki/latest/)
- [LogQL Reference](https://grafana.com/docs/loki/latest/logql/)
- [Grafana Dashboards](https://grafana.com/grafana/dashboards/)
- [Community Dashboards](https://grafana.com/grafana/dashboards/?search=loki)

---

**Now debug issues 10x faster with centralized logs!** 🔍