feat: Add Uptime Kuma for service uptime and status monitoring

Claude 2025-11-09 01:21:14 +00:00
parent 7797f89fcb
commit 07ce29affe
5 changed files with 652 additions and 4 deletions


@@ -32,10 +32,12 @@ compose/
 │   ├── sabnzbd/     # Usenet downloader
 │   └── qbittorrent/ # Torrent client
 ├── monitoring/      # Monitoring & logging
-│   └── logging/     # Centralized logging stack
-│       ├── loki/    # Log aggregation (loki.fig.systems)
-│       ├── promtail/# Log collection agent
-│       └── grafana/ # Log visualization (logs.fig.systems)
+│   ├── logging/     # Centralized logging stack
+│   │   ├── loki/    # Log aggregation (loki.fig.systems)
+│   │   ├── promtail/# Log collection agent
+│   │   └── grafana/ # Log visualization (logs.fig.systems)
+│   └── uptime/      # Uptime monitoring
+│       └── uptime-kuma/ # Status & uptime monitoring (status.fig.systems)
 └── services/        # Utility services
     ├── homarr/      # Dashboard (home.fig.systems)
     ├── backrest/    # Backup manager (backup.fig.systems)
@@ -66,6 +68,7 @@ All services are accessible via:
 | **Monitoring** | | |
 | Grafana (Logs) | logs.fig.systems | ❌* |
 | Loki (API) | loki.fig.systems | ✅ |
+| Uptime Kuma (Status) | status.fig.systems | ❌* |
 | **Dashboard & Management** | | |
 | Homarr | home.fig.systems | ✅ |
 | Backrest | backup.fig.systems | ✅ |
@@ -161,6 +164,7 @@ cd compose/services/backrest && docker compose up -d
 # Monitoring (optional but recommended)
 cd compose/monitoring/logging && docker compose up -d
+cd compose/monitoring/uptime && docker compose up -d
 cd compose/services/lubelogger && docker compose up -d
 cd compose/services/calibre-web && docker compose up -d
 cd compose/services/booklore && docker compose up -d


@@ -0,0 +1,10 @@
# Uptime Kuma Configuration
# Timezone
TZ=America/Los_Angeles
# Port (default: 3001, but we use Traefik so this doesn't matter)
# UPTIME_KUMA_PORT=3001
# Allow embedding the UI in iframes (optional; set to true to drop the SAMEORIGIN frame header)
# UPTIME_KUMA_DISABLE_FRAME_SAMEORIGIN=false

compose/monitoring/uptime/.gitignore

@@ -0,0 +1,5 @@
# Uptime Kuma data
data/
# Keep .env.example if created
!.env.example


@@ -0,0 +1,581 @@
# Uptime Kuma - Status & Uptime Monitoring
Beautiful uptime monitoring and alerting for all your homelab services.
## Overview
**Uptime Kuma** monitors the health and uptime of your services:
- ✅ **HTTP(s) Monitoring**: Check if web services are responding
- ✅ **TCP Port Monitoring**: Check if services are listening on ports
- ✅ **Docker Container Monitoring**: Check container status
- ✅ **Response Time**: Measure how fast services respond
- ✅ **SSL Certificate Monitoring**: Alert before certificates expire
- ✅ **Status Pages**: Public or private status pages
- ✅ **Notifications**: Email, Discord, Slack, Pushover, and 90+ more
- ✅ **Beautiful UI**: Clean, modern interface
## Quick Start
### 1. Deploy
```bash
cd ~/homelab/compose/monitoring/uptime
docker compose up -d
```
### 2. Access Web UI
Go to: **https://status.fig.systems**
### 3. Create Admin Account
On first visit, you'll be prompted to create an admin account:
- Username: `admin` (or your choice)
- Password: Strong password
- Click "Create"
### 4. Add Your First Monitor
Click **"Add New Monitor"**
**Example: Monitor Jellyfin**
- Monitor Type: `HTTP(s)`
- Friendly Name: `Jellyfin`
- URL: `https://flix.fig.systems`
- Heartbeat Interval: `60` seconds
- Retries: `3`
- Click **Save**
Uptime Kuma will now check Jellyfin every 60 seconds!
## Monitoring Your Services
### Quick Setup All Services
Here's a template for all your homelab services:
**Core Services:**
```
Name: Traefik Dashboard
Type: HTTP(s)
URL: https://traefik.fig.systems
Interval: 60s
Name: LLDAP
Type: HTTP(s)
URL: https://lldap.fig.systems
Interval: 60s
Name: Grafana Logs
Type: HTTP(s)
URL: https://logs.fig.systems
Interval: 60s
```
**Media Services:**
```
Name: Jellyfin
Type: HTTP(s)
URL: https://flix.fig.systems
Interval: 60s
Name: Immich
Type: HTTP(s)
URL: https://photos.fig.systems
Interval: 60s
Name: Jellyseerr
Type: HTTP(s)
URL: https://requests.fig.systems
Interval: 60s
Name: Sonarr
Type: HTTP(s)
URL: https://sonarr.fig.systems
Interval: 60s
Name: Radarr
Type: HTTP(s)
URL: https://radarr.fig.systems
Interval: 60s
```
**Utility Services:**
```
Name: Homarr Dashboard
Type: HTTP(s)
URL: https://home.fig.systems
Interval: 60s
Name: Backrest
Type: HTTP(s)
URL: https://backup.fig.systems
Interval: 60s
Name: Linkwarden
Type: HTTP(s)
URL: https://links.fig.systems
Interval: 60s
Name: Vikunja
Type: HTTP(s)
URL: https://tasks.fig.systems
Interval: 60s
```
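The per-service templates above get repetitive. As a convenience sketch (not an Uptime Kuma feature), a small shell helper can keep the name/URL table in one place and print the full checklist in the template format; the services listed are the ones from this README:

```shell
# Print an HTTP(s) monitor checklist from one name=url table.
# Extend the list with the rest of your services as needed.
print_monitors() {
  local entry
  for entry in \
    "Jellyfin=https://flix.fig.systems" \
    "Immich=https://photos.fig.systems" \
    "Homarr Dashboard=https://home.fig.systems" \
    "Backrest=https://backup.fig.systems"
  do
    printf 'Name: %s\nType: HTTP(s)\nURL: %s\nInterval: 60s\n\n' \
      "${entry%%=*}" "${entry#*=}"
  done
}
print_monitors
```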
### Advanced Monitoring Options
#### Monitor Docker Containers Directly
**Setup:**
1. Add New Monitor
2. Type: **Docker Container**
3. Docker Daemon: `unix:///var/run/docker.sock`
4. Container Name: `jellyfin`
5. Click Save
**Benefits:**
- Checks if container is running
- Monitors container restarts
- No network requests needed
**Note**: Requires mounting Docker socket (already configured).
#### Monitor TCP Ports
**Example: Monitor PostgreSQL**
```
Type: TCP Port
Hostname: linkwarden-postgres
Port: 5432
Interval: 60s
```
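A TCP monitor does nothing more than open a connection; you can reproduce the same check by hand with bash's `/dev/tcp`. The hostname and port are the PostgreSQL example values above, and the check assumes you run it somewhere that can resolve the container name (e.g. inside a container on the `homelab` network):

```shell
# Open a TCP connection the same way a TCP Port monitor would.
tcp_check() {
  # $1 = host, $2 = port; exit 0 if something accepts the connection
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
tcp_check linkwarden-postgres 5432 && echo "port open" || echo "port closed/unreachable"
```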
#### Check SSL Certificates
**Automatic**: When using HTTP(s) monitors, Uptime Kuma automatically:
- Checks SSL certificate validity
- Alerts as a certificate nears expiry (by default at 21, 14, and 7 days out)
- Shows certificate expiry date
#### Keyword Monitoring
Check if a page contains specific text:
```
Type: HTTP(s) - Keyword
URL: https://home.fig.systems
Keyword: "Homarr" # Check page contains "Homarr"
```
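A keyword monitor is equivalent to fetching the page and grepping the body, which is handy for debugging a monitor that unexpectedly reports down. A minimal sketch, using the example URL and keyword from above:

```shell
# Reproduce an "HTTP(s) - Keyword" check from the shell.
keyword_check() {
  # $1 = URL, $2 = keyword; exit 0 when the page body contains the keyword
  curl -fsS --max-time 10 "$1" | grep -q "$2"
}
keyword_check https://home.fig.systems "Homarr" && echo "UP" || echo "DOWN (keyword missing)"
```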
## Notifications
### Setup Alerts
1. Click **Settings** (gear icon)
2. Click **Notifications**
3. Click **Setup Notification**
### Popular Options
#### Email
```
Type: Email (SMTP)
Host: smtp.gmail.com
Port: 587
Security: TLS
Username: your-email@gmail.com
Password: your-app-password
From: alerts@yourdomain.com
To: you@email.com
```
#### Discord
```
Type: Discord
Webhook URL: https://discord.com/api/webhooks/...
(Get from Discord Server Settings → Integrations → Webhooks)
```
#### Slack
```
Type: Slack
Webhook URL: https://hooks.slack.com/services/...
(Get from Slack App → Incoming Webhooks)
```
#### Pushover (Mobile)
```
Type: Pushover
User Key: (from Pushover account)
App Token: (create app in Pushover)
Priority: Normal
```
#### Gotify (Self-hosted)
```
Type: Gotify
Server URL: https://gotify.yourdomain.com
App Token: (from Gotify)
Priority: 5
```
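Before wiring a webhook into Uptime Kuma, it can save debugging time to test it independently. This sketch posts a plain message to a Discord webhook (Discord accepts a JSON body with a `content` field); the URL is a placeholder you must replace with your own:

```shell
# Send a test message straight to the webhook, bypassing Uptime Kuma.
WEBHOOK_URL="https://discord.com/api/webhooks/REPLACE-ME"
PAYLOAD='{"content":"Test alert from Uptime Kuma setup"}'
case "$WEBHOOK_URL" in
  *REPLACE-ME*) echo "Edit WEBHOOK_URL first" ;;
  *) curl -fsS -H "Content-Type: application/json" -d "$PAYLOAD" "$WEBHOOK_URL" ;;
esac
```

If the test message arrives, any failure afterwards is on the Uptime Kuma side (notification config or monitor assignment), not the webhook.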
### Apply to Monitors
After setting up notification:
1. Edit a monitor
2. Scroll to **Notifications**
3. Select your notification method
4. Click **Save**
Or apply to all monitors:
1. Settings → Notifications
2. Click **Apply on all existing monitors**
## Status Pages
### Create Public Status Page
Perfect for showing service status to family/friends!
**Setup:**
1. Click **Status Pages**
2. Click **Add New Status Page**
3. **Slug**: `homelab` (creates /status/homelab)
4. **Title**: `Homelab Status`
5. **Description**: `Status of all homelab services`
6. Click **Next**
**Add Services:**
1. Drag monitors into "Public" or "Groups"
2. Organize by category (Core, Media, Utilities)
3. Click **Save**
**Access:**
- https://status.fig.systems/status/homelab
- Status pages are viewable without logging in, so anyone with the link can see it
**Share with family:**
```
https://status.fig.systems/status/homelab
```
### Customize Status Page
**Options:**
- Show/hide uptime percentage
- Show/hide response time
- Custom domain
- Theme (light/dark/auto)
- Custom CSS
- Password protection
## Tags and Groups
### Organize Monitors with Tags
**Create Tags:**
1. Click **Manage Tags**
2. Add tags like:
- `core`
- `media`
- `critical`
- `production`
**Apply to Monitors:**
1. Edit monitor
2. Scroll to **Tags**
3. Select tags
4. Save
**Filter by Tag:**
- Click tag name to show only those monitors
### Create Monitor Groups
**Group by service type:**
1. Settings → Groups
2. Create groups:
- Core Infrastructure
- Media Services
- Productivity
- Monitoring
Drag monitors into groups for organization.
## Maintenance Windows
### Schedule Maintenance
Pause notifications during planned downtime:
1. Edit monitor
2. Click **Maintenance**
3. **Add Maintenance**
4. Set start/end time
5. Select monitors
6. Save
During maintenance:
- Monitor still checks but doesn't alert
- Status page shows "In Maintenance"
## Best Practices
### Monitor Configuration
**Heartbeat Interval:**
- Critical services: 30-60 seconds
- Normal services: 60-120 seconds
- Background jobs: 300-600 seconds
**Retries:**
- Set to 2-3 to avoid false positives
- Service must fail 2-3 times before alerting
**Timeout:**
- Web services: 10-30 seconds
- APIs: 5-10 seconds
- Slow services: 30-60 seconds
### What to Monitor
**Critical (Monitor these!):**
- ✅ Traefik (if this is down, everything is down)
- ✅ LLDAP (SSO depends on this)
- ✅ Core services users depend on
**Important:**
- ✅ Jellyfin, Immich (main media services)
- ✅ Sonarr, Radarr (automation)
- ✅ Backrest (backups)
**Nice to have:**
- ⬜ Utility services
- ⬜ Less critical services
**Don't over-monitor:**
- Internal components (databases, Redis, etc.)
- Their failures surface through the main service's monitor anyway
### Notification Strategy
**Alert fatigue is real!**
**Good approach:**
- Critical services → Immediate push notification
- Important services → Email
- Nice-to-have → Email digest
**Don't:**
- Alert on every blip
- Send all alerts to mobile push
- Alert on expected downtime
## Integration with Loki
Uptime Kuma and Loki complement each other:
**Uptime Kuma:**
- ✅ Is the service UP or DOWN?
- ✅ How long was it down?
- ✅ Response time trends
**Loki:**
- ✅ WHY did it go down?
- ✅ What errors happened?
- ✅ Historical log analysis
**Workflow:**
1. Uptime Kuma alerts you: "Jellyfin is down!"
2. Go to Grafana/Loki
3. Query: `{container="jellyfin"}` with the time range set to the last 15 minutes
4. See what went wrong
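The same check can be scripted against Loki's HTTP API (this is what Grafana Explore calls under the hood). `loki.fig.systems` is the Loki endpoint from the services table; `since` is Loki's query parameter for a relative time range:

```shell
# Pull recent logs for a container straight from the Loki API.
loki_logs() {
  # $1 = container name, $2 = lookback window (e.g. 15m)
  curl -fsS -G "https://loki.fig.systems/loki/api/v1/query_range" \
    --data-urlencode "query={container=\"$1\"}" \
    --data-urlencode "since=$2"
}
# Example (run from a machine that can reach Loki):
# loki_logs jellyfin 15m
```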
## Metrics and Graphs
### Built-in Metrics
Uptime Kuma tracks:
- **Uptime %**: 99.9%, 99.5%, etc.
- **Response Time**: Average, min, max
- **Ping**: Latency to service
- **Certificate Expiry**: Days until SSL expires
### Response Time Graph
Click any monitor to see:
- 24-hour response time graph
- Uptime/downtime periods
- Recent incidents
### Export Data
Export uptime data:
1. Settings → Backup
2. Export JSON (includes all monitors and data)
3. Store backup safely
## Troubleshooting
### Monitor Shows Down But Service Works
**Check:**
1. **SSL Certificate**: Is it valid?
2. **SSO**: Does monitor need to login first?
3. **Timeout**: Is timeout too short?
4. **Network**: Can Uptime Kuma reach the service?
**Solutions:**
- Increase timeout
- Check accepted status codes (200-299)
- Verify URL is correct
- Check Uptime Kuma logs: `docker logs uptime-kuma`
### Docker Container Monitor Not Working
**Requirements:**
- Docker socket must be mounted (✅ already configured)
- Container name must be exact
**Test:**
```bash
docker exec uptime-kuma ls /var/run/docker.sock
# Should show the socket file
```
### Notifications Not Sending
**Check:**
1. Test notification in Settings → Notifications
2. Check Uptime Kuma logs
3. Verify notification service credentials
4. Check if notification is enabled on monitor
### Can't Access Web UI
**Check:**
```bash
# Container running?
docker ps | grep uptime-kuma
# Logs
docker logs uptime-kuma
# Traefik routing
docker logs traefik | grep uptime
```
## Advanced Features
### API Access
Uptime Kuma has a WebSocket API:
**Get API Key:**
1. Settings → API Keys
2. Generate new key
3. Use with monitoring tools
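Besides the WebSocket API, API keys also unlock the Prometheus-style `/metrics` endpoint, authenticated via HTTP basic auth with the key as the password and an empty username. A sketch, with a placeholder key from Settings → API Keys:

```shell
# Scrape Uptime Kuma's /metrics endpoint with an API key.
API_KEY="REPLACE-ME"
case "$API_KEY" in
  REPLACE-ME) echo "Edit API_KEY first" ;;
  *) curl -fsS -u ":$API_KEY" https://status.fig.systems/metrics | head -n 20 ;;
esac
```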
### Docker Socket Monitoring
Already configured! You can monitor:
- Container status (running/stopped)
- Container restarts
- Resource usage (via Docker stats)
### Multiple Status Pages
Create different status pages:
- `/status/public` - For family/friends
- `/status/critical` - Only critical services
- `/status/media` - Media services only
### Custom CSS
Brand your status page:
1. Status Page → Edit
2. Custom CSS
3. Add styling
**Example:**
```css
body {
  background: #1a1a1a;
}
.title {
  color: #00ff00;
}
```
## Resource Usage
**Typical usage:**
- **RAM**: 50-150MB
- **CPU**: Very low (only during checks)
- **Disk**: <100MB
- **Network**: Minimal (only during checks)
**Very lightweight!**
## Backup and Restore
### Backup
**Automatic backup:**
1. Settings → Backup
2. Export
**Manual backup:**
```bash
cd ~/homelab/compose/monitoring/uptime
tar czf uptime-backup-$(date +%Y%m%d).tar.gz ./data
```
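Manual archives pile up quickly; a small rotation helper (a sketch assuming GNU `ls`/`xargs`, using the archive naming from the manual backup above) keeps only the newest few:

```shell
# Delete all but the newest N uptime-backup archives in a directory.
rotate_backups() {
  # $1 = directory, $2 = how many archives to keep
  ls -1t "$1"/uptime-backup-*.tar.gz 2>/dev/null \
    | tail -n +"$(( $2 + 1 ))" \
    | xargs -r rm --
}
# Example: after taking a backup, keep the newest 7
# rotate_backups ~/homelab/compose/monitoring/uptime 7
```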
### Restore
```bash
docker compose down
tar xzf uptime-backup-YYYYMMDD.tar.gz
docker compose up -d
```
## Comparison: Uptime Kuma vs Loki
| Feature | Uptime Kuma | Loki |
|---------|-------------|------|
| **Purpose** | Uptime monitoring | Log aggregation |
| **Checks** | HTTP, TCP, Ping, Docker | Logs only |
| **Alerts** | Service down, slow | Log patterns |
| **Response Time** | ✅ Yes | ❌ No |
| **Uptime %** | ✅ Yes | ❌ No |
| **SSL Monitoring** | ✅ Yes | ❌ No |
| **Why Service Down** | ❌ No | ✅ Yes (via logs) |
| **Historical Logs** | ❌ No | ✅ Yes |
| **Status Pages** | ✅ Yes | ❌ No |
**Use both together!**
- Uptime Kuma tells you WHAT is down
- Loki tells you WHY it went down
## Next Steps
1. ✅ Deploy Uptime Kuma
2. ✅ Add monitors for all services
3. ✅ Set up notifications (Email, Discord, etc.)
4. ✅ Create status page
5. ✅ Test alerts by stopping a service
6. ⬜ Share status page with family
7. ⬜ Set up maintenance windows
8. ⬜ Review and tune check intervals
## Resources
- [Uptime Kuma GitHub](https://github.com/louislam/uptime-kuma)
- [Uptime Kuma Wiki](https://github.com/louislam/uptime-kuma/wiki)
- [Notification Services List](https://github.com/louislam/uptime-kuma/wiki/Notification-Services)
---
**Know instantly when something goes down!** 🚨


@@ -0,0 +1,48 @@
# Uptime Kuma - Status and Uptime Monitoring
# Docs: https://github.com/louislam/uptime-kuma
services:
  uptime-kuma:
    container_name: uptime-kuma
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    env_file:
      - .env
    volumes:
      - ./data:/app/data
    networks:
      - homelab
    labels:
      # Traefik
      traefik.enable: true
      traefik.docker.network: homelab
      # Web UI
      traefik.http.routers.uptime-kuma.rule: Host(`status.fig.systems`) || Host(`status.edfig.dev`)
      traefik.http.routers.uptime-kuma.entrypoints: websecure
      traefik.http.routers.uptime-kuma.tls.certresolver: letsencrypt
      traefik.http.services.uptime-kuma.loadbalancer.server.port: 3001
      # SSO Protection (optional - Uptime Kuma has its own auth)
      # Uncomment to require SSO:
      # traefik.http.routers.uptime-kuma.middlewares: tinyauth
      # Homarr Discovery
      homarr.name: Uptime Kuma (Status)
      homarr.group: Monitoring
      homarr.icon: mdi:heart-pulse
    healthcheck:
      test: ["CMD-SHELL", "node extra/healthcheck.js"]
      interval: 60s
      timeout: 10s
      retries: 3
      start_period: 60s

networks:
  homelab:
    external: true