feat: Add Uptime Kuma for service uptime and status monitoring

parent 7797f89fcb
commit 07ce29affe

5 changed files with 652 additions and 4 deletions
README.md (12 changed lines)

```diff
@@ -32,10 +32,12 @@ compose/
 │   ├── sabnzbd/      # Usenet downloader
 │   └── qbittorrent/  # Torrent client
 ├── monitoring/       # Monitoring & logging
-│   └── logging/      # Centralized logging stack
-│       ├── loki/     # Log aggregation (loki.fig.systems)
-│       ├── promtail/ # Log collection agent
-│       └── grafana/  # Log visualization (logs.fig.systems)
+│   ├── logging/      # Centralized logging stack
+│   │   ├── loki/     # Log aggregation (loki.fig.systems)
+│   │   ├── promtail/ # Log collection agent
+│   │   └── grafana/  # Log visualization (logs.fig.systems)
+│   └── uptime/       # Uptime monitoring
+│       └── uptime-kuma/ # Status & uptime monitoring (status.fig.systems)
 └── services/         # Utility services
     ├── homarr/       # Dashboard (home.fig.systems)
     ├── backrest/     # Backup manager (backup.fig.systems)
```
```diff
@@ -66,6 +68,7 @@ All services are accessible via:
 | **Monitoring** | | |
 | Grafana (Logs) | logs.fig.systems | ❌* |
 | Loki (API) | loki.fig.systems | ✅ |
+| Uptime Kuma (Status) | status.fig.systems | ❌* |
 | **Dashboard & Management** | | |
 | Homarr | home.fig.systems | ✅ |
 | Backrest | backup.fig.systems | ✅ |
```
```diff
@@ -161,6 +164,7 @@ cd compose/services/backrest && docker compose up -d
 
 # Monitoring (optional but recommended)
 cd compose/monitoring/logging && docker compose up -d
+cd compose/monitoring/uptime && docker compose up -d
 cd compose/services/lubelogger && docker compose up -d
 cd compose/services/calibre-web && docker compose up -d
 cd compose/services/booklore && docker compose up -d
```
compose/monitoring/uptime/.env (new file, 10 lines)

```
# Uptime Kuma Configuration

# Timezone
TZ=America/Los_Angeles

# Port (default: 3001, but we use Traefik so this doesn't matter)
# UPTIME_KUMA_PORT=3001

# Allow embedding in an iframe, e.g. in a dashboard panel (optional)
# UPTIME_KUMA_DISABLE_FRAME_SAMEORIGIN=false
```
compose/monitoring/uptime/.gitignore (new file, 5 lines)

```
# Uptime Kuma data
data/

# Keep .env.example if created
!.env.example
```
compose/monitoring/uptime/README.md (new file, 581 lines)

# Uptime Kuma - Status & Uptime Monitoring

Beautiful uptime monitoring and alerting for all your homelab services.

## Overview

**Uptime Kuma** monitors the health and uptime of your services:

- ✅ **HTTP(s) Monitoring**: Check if web services are responding
- ✅ **TCP Port Monitoring**: Check if services are listening on ports
- ✅ **Docker Container Monitoring**: Check container status
- ✅ **Response Time**: Measure how fast services respond
- ✅ **SSL Certificate Monitoring**: Alert before certificates expire
- ✅ **Status Pages**: Public or private status pages
- ✅ **Notifications**: Email, Discord, Slack, Pushover, and 90+ more
- ✅ **Beautiful UI**: Clean, modern interface

## Quick Start

### 1. Deploy

```bash
cd ~/homelab/compose/monitoring/uptime
docker compose up -d
```

### 2. Access Web UI

Go to: **https://status.fig.systems**

### 3. Create Admin Account

On first visit, you'll be prompted to create an admin account:
- Username: `admin` (or your choice)
- Password: a strong password
- Click "Create"

### 4. Add Your First Monitor

Click **"Add New Monitor"**.

**Example: Monitor Jellyfin**
- Monitor Type: `HTTP(s)`
- Friendly Name: `Jellyfin`
- URL: `https://flix.fig.systems`
- Heartbeat Interval: `60` seconds
- Retries: `3`
- Click **Save**

Uptime Kuma will now check Jellyfin every 60 seconds!
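To sanity-check what the monitor will see, you can run the same kind of probe by hand with curl (the URL is the Jellyfin example above; `-w` uses curl's standard output variables):

```shell
# One-off probe of the endpoint the monitor checks:
# prints the HTTP status code and total response time
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" https://flix.fig.systems
```

If this returns something other than a 2xx code, the monitor will report the service down for the same reason.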
## Monitoring Your Services

### Quick Setup: All Services

Here's a template for all your homelab services:

**Core Services:**
```
Name: Traefik Dashboard
Type: HTTP(s)
URL: https://traefik.fig.systems
Interval: 60s

Name: LLDAP
Type: HTTP(s)
URL: https://lldap.fig.systems
Interval: 60s

Name: Grafana Logs
Type: HTTP(s)
URL: https://logs.fig.systems
Interval: 60s
```

**Media Services:**
```
Name: Jellyfin
Type: HTTP(s)
URL: https://flix.fig.systems
Interval: 60s

Name: Immich
Type: HTTP(s)
URL: https://photos.fig.systems
Interval: 60s

Name: Jellyseerr
Type: HTTP(s)
URL: https://requests.fig.systems
Interval: 60s

Name: Sonarr
Type: HTTP(s)
URL: https://sonarr.fig.systems
Interval: 60s

Name: Radarr
Type: HTTP(s)
URL: https://radarr.fig.systems
Interval: 60s
```

**Utility Services:**
```
Name: Homarr Dashboard
Type: HTTP(s)
URL: https://home.fig.systems
Interval: 60s

Name: Backrest
Type: HTTP(s)
URL: https://backup.fig.systems
Interval: 60s

Name: Linkwarden
Type: HTTP(s)
URL: https://links.fig.systems
Interval: 60s

Name: Vikunja
Type: HTTP(s)
URL: https://tasks.fig.systems
Interval: 60s
```

### Advanced Monitoring Options

#### Monitor Docker Containers Directly

**Setup:**
1. Add New Monitor
2. Type: **Docker Container**
3. Docker Daemon: `unix:///var/run/docker.sock`
4. Container Name: `jellyfin`
5. Click Save

**Benefits:**
- Checks whether the container is running
- Monitors container restarts
- No network requests needed

**Note**: Requires mounting the Docker socket (already configured).

#### Monitor TCP Ports

**Example: Monitor PostgreSQL**
```
Type: TCP Port
Hostname: linkwarden-postgres
Port: 5432
Interval: 60s
```
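Before adding the monitor, it can help to confirm the port is actually reachable from inside the Docker network. A throwaway container on the same `homelab` network works for this (the container and network names are from this setup; `pg_isready` ships in the official postgres image):

```shell
# One-off reachability check from inside the homelab network
docker run --rm --network homelab postgres:16-alpine \
  pg_isready -h linkwarden-postgres -p 5432
```

If this fails, the TCP monitor will fail too, regardless of how it is configured in Uptime Kuma.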
#### Check SSL Certificates

**Automatic**: When using HTTP(s) monitors, Uptime Kuma automatically:
- Checks SSL certificate validity
- Alerts when a certificate expires soon (7 days by default)
- Shows the certificate expiry date

#### Keyword Monitoring

Check if a page contains specific text:

```
Type: HTTP(s) - Keyword
URL: https://home.fig.systems
Keyword: "Homarr"   # Check the page contains "Homarr"
```
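The same check can be reproduced from a terminal, which is handy when a keyword monitor reports down and you want to see what the page actually returns (URL and keyword from the example above):

```shell
# Fetch the page and test for the keyword, mirroring the monitor's logic
if curl -fsS https://home.fig.systems | grep -q "Homarr"; then
  echo "keyword found: UP"
else
  echo "keyword missing: DOWN"
fi
```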
## Notifications

### Set Up Alerts

1. Click **Settings** (gear icon)
2. Click **Notifications**
3. Click **Setup Notification**

### Popular Options

#### Email
```
Type: Email (SMTP)
Host: smtp.gmail.com
Port: 587
Security: TLS
Username: your-email@gmail.com
Password: your-app-password
From: alerts@yourdomain.com
To: you@email.com
```

#### Discord
```
Type: Discord
Webhook URL: https://discord.com/api/webhooks/...
(Get from Discord Server Settings → Integrations → Webhooks)
```
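A webhook can be verified independently of Uptime Kuma with a plain POST (substitute your real webhook URL; this is the standard Discord webhook payload shape):

```shell
# Post a test message directly to the Discord webhook
curl -sS -H "Content-Type: application/json" \
  -d '{"content": "Uptime Kuma test alert"}' \
  "https://discord.com/api/webhooks/..."
```

If the message shows up in the channel, any delivery problem is on the Uptime Kuma side, not Discord's.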
#### Slack
```
Type: Slack
Webhook URL: https://hooks.slack.com/services/...
(Get from Slack App → Incoming Webhooks)
```

#### Pushover (Mobile)
```
Type: Pushover
User Key: (from Pushover account)
App Token: (create app in Pushover)
Priority: Normal
```

#### Gotify (Self-hosted)
```
Type: Gotify
Server URL: https://gotify.yourdomain.com
App Token: (from Gotify)
Priority: 5
```

### Apply to Monitors

After setting up a notification:
1. Edit a monitor
2. Scroll to **Notifications**
3. Select your notification method
4. Click **Save**

Or apply to all monitors:
1. Settings → Notifications
2. Click **Apply on all existing monitors**
## Status Pages

### Create a Public Status Page

Perfect for showing service status to family and friends!

**Setup:**
1. Click **Status Pages**
2. Click **Add New Status Page**
3. **Slug**: `homelab` (creates /status/homelab)
4. **Title**: `Homelab Status`
5. **Description**: `Status of all homelab services`
6. Click **Next**

**Add Services:**
1. Drag monitors into "Public" or "Groups"
2. Organize by category (Core, Media, Utilities)
3. Click **Save**

**Access:**
- Private: https://status.fig.systems/status/homelab
- Or make it public (no login required)

**Share with family:**
```
https://status.fig.systems/status/homelab
```

### Customize the Status Page

**Options:**
- Show/hide uptime percentage
- Show/hide response time
- Custom domain
- Theme (light/dark/auto)
- Custom CSS
- Password protection

## Tags and Groups

### Organize Monitors with Tags

**Create Tags:**
1. Click **Manage Tags**
2. Add tags like:
   - `core`
   - `media`
   - `critical`
   - `production`

**Apply to Monitors:**
1. Edit a monitor
2. Scroll to **Tags**
3. Select tags
4. Save

**Filter by Tag:**
- Click a tag name to show only those monitors

### Create Monitor Groups

**Group by service type:**
1. Settings → Groups
2. Create groups:
   - Core Infrastructure
   - Media Services
   - Productivity
   - Monitoring

Drag monitors into groups for organization.
## Maintenance Windows

### Schedule Maintenance

Pause notifications during planned downtime:

1. Edit a monitor
2. Click **Maintenance**
3. **Add Maintenance**
4. Set start/end time
5. Select monitors
6. Save

During maintenance:
- The monitor still checks but doesn't alert
- The status page shows "In Maintenance"

## Best Practices

### Monitor Configuration

**Heartbeat Interval:**
- Critical services: 30-60 seconds
- Normal services: 60-120 seconds
- Background jobs: 300-600 seconds

**Retries:**
- Set to 2-3 to avoid false positives
- A service must fail 2-3 times before alerting

**Timeout:**
- Web services: 10-30 seconds
- APIs: 5-10 seconds
- Slow services: 30-60 seconds

### What to Monitor

**Critical (monitor these!):**
- ✅ Traefik (if this is down, everything is down)
- ✅ LLDAP (SSO depends on this)
- ✅ Core services users depend on

**Important:**
- ✅ Jellyfin, Immich (main media services)
- ✅ Sonarr, Radarr (automation)
- ✅ Backrest (backups)

**Nice to have:**
- ⬜ Utility services
- ⬜ Less critical services

**Don't over-monitor:**
- Internal components (databases, Redis, etc.)
- These should be covered via the health of the main service

### Notification Strategy

**Alert fatigue is real!**

**Good approach:**
- Critical services → immediate push notification
- Important services → email
- Nice-to-have → email digest

**Don't:**
- Alert on every blip
- Send all alerts to mobile push
- Alert on expected downtime

## Integration with Loki

Uptime Kuma and Loki complement each other:

**Uptime Kuma:**
- ✅ Is the service UP or DOWN?
- ✅ How long was it down?
- ✅ Response time trends

**Loki:**
- ✅ WHY did it go down?
- ✅ What errors happened?
- ✅ Historical log analysis

**Workflow:**
1. Uptime Kuma alerts you: "Jellyfin is down!"
2. Go to Grafana/Loki
3. Query `{container="jellyfin"}` with the time range set to the last 15 minutes
4. See what went wrong
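The same lookup works from a terminal with Loki's `logcli` tool, skipping Grafana entirely (the address assumes the `loki.fig.systems` endpoint from this setup):

```shell
# Pull the last 15 minutes of Jellyfin logs straight from Loki
logcli query --addr=https://loki.fig.systems --since=15m '{container="jellyfin"}'
```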
## Metrics and Graphs

### Built-in Metrics

Uptime Kuma tracks:
- **Uptime %**: 99.9%, 99.5%, etc.
- **Response Time**: average, min, max
- **Ping**: latency to the service
- **Certificate Expiry**: days until SSL expires

### Response Time Graph

Click any monitor to see:
- A 24-hour response time graph
- Uptime/downtime periods
- Recent incidents

### Export Data

Export uptime data:
1. Settings → Backup
2. Export JSON (includes all monitors and data)
3. Store the backup safely

## Troubleshooting

### Monitor Shows Down but the Service Works

**Check:**
1. **SSL Certificate**: Is it valid?
2. **SSO**: Does the monitor need to log in first?
3. **Timeout**: Is the timeout too short?
4. **Network**: Can Uptime Kuma reach the service?

**Solutions:**
- Increase the timeout
- Check accepted status codes (200-299)
- Verify the URL is correct
- Check Uptime Kuma logs: `docker logs uptime-kuma`

### Docker Container Monitor Not Working

**Requirements:**
- Docker socket must be mounted (✅ already configured)
- Container name must be exact

**Test:**
```bash
docker exec uptime-kuma ls /var/run/docker.sock
# Should show the socket file
```

### Notifications Not Sending

**Check:**
1. Test the notification in Settings → Notifications
2. Check Uptime Kuma logs
3. Verify the notification service credentials
4. Check that the notification is enabled on the monitor

### Can't Access the Web UI

**Check:**
```bash
# Container running?
docker ps | grep uptime-kuma

# Logs
docker logs uptime-kuma

# Traefik routing
docker logs traefik | grep uptime
```

## Advanced Features

### API Access

Uptime Kuma has a WebSocket API:

**Get an API Key:**
1. Settings → API Keys
2. Generate a new key
3. Use it with monitoring tools
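As a concrete use, API keys can authenticate scrapes of Uptime Kuma's Prometheus-style `/metrics` endpoint (hostname from this setup; the key goes in as the basic-auth password with an empty username, and the `uk1_...` prefix is just a placeholder for your generated key):

```shell
# Scrape monitor metrics with an API key (key passed as the basic-auth password)
curl -s -u ":uk1_your-api-key" https://status.fig.systems/metrics
```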
### Docker Socket Monitoring

Already configured! You can monitor:
- Container status (running/stopped)
- Container restarts
- Resource usage (via Docker stats)

### Multiple Status Pages

Create different status pages:
- `/status/public` - for family/friends
- `/status/critical` - only critical services
- `/status/media` - media services only

### Custom CSS

Brand your status page:
1. Status Page → Edit
2. Custom CSS
3. Add styling

**Example:**
```css
body {
  background: #1a1a1a;
}
.title {
  color: #00ff00;
}
```

## Resource Usage

**Typical usage:**
- **RAM**: 50-150 MB
- **CPU**: very low (only during checks)
- **Disk**: <100 MB
- **Network**: minimal (only during checks)

**Very lightweight!**

## Backup and Restore

### Backup

**From the UI:**
1. Settings → Backup
2. Export

**Manual backup:**
```bash
cd ~/homelab/compose/monitoring/uptime
tar czf uptime-backup-$(date +%Y%m%d).tar.gz ./data
```
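To make that backup actually automatic, a cron entry can run the same command nightly (path as above; note the `\%` escaping, since cron treats a bare `%` as a newline):

```shell
# crontab -e — run the tar backup every night at 03:00
0 3 * * * cd ~/homelab/compose/monitoring/uptime && tar czf uptime-backup-$(date +\%Y\%m\%d).tar.gz ./data
```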
### Restore

```bash
docker compose down
tar xzf uptime-backup-YYYYMMDD.tar.gz
docker compose up -d
```

## Comparison: Uptime Kuma vs Loki

| Feature | Uptime Kuma | Loki |
|---------|-------------|------|
| **Purpose** | Uptime monitoring | Log aggregation |
| **Checks** | HTTP, TCP, Ping, Docker | Logs only |
| **Alerts** | Service down, slow | Log patterns |
| **Response Time** | ✅ Yes | ❌ No |
| **Uptime %** | ✅ Yes | ❌ No |
| **SSL Monitoring** | ✅ Yes | ❌ No |
| **Why a Service Went Down** | ❌ No | ✅ Yes (via logs) |
| **Historical Logs** | ❌ No | ✅ Yes |
| **Status Pages** | ✅ Yes | ❌ No |

**Use both together!**
- Uptime Kuma tells you WHAT is down
- Loki tells you WHY it went down

## Next Steps

1. ✅ Deploy Uptime Kuma
2. ✅ Add monitors for all services
3. ✅ Set up notifications (Email, Discord, etc.)
4. ✅ Create a status page
5. ✅ Test alerts by stopping a service
6. ⬜ Share the status page with family
7. ⬜ Set up maintenance windows
8. ⬜ Review and tune check intervals
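Step 5 can be done safely with a quick stop/start; wait longer than interval × retries (60 s × 3 retries in the Jellyfin example) so the alert actually fires before you restore the service:

```shell
# Trigger a real alert, then recover
docker stop jellyfin
sleep 240   # > interval (60s) × retries (3)
docker start jellyfin
```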
## Resources

- [Uptime Kuma GitHub](https://github.com/louislam/uptime-kuma)
- [Uptime Kuma Wiki](https://github.com/louislam/uptime-kuma/wiki)
- [Notification Services List](https://github.com/louislam/uptime-kuma/wiki/Notification-Services)

---

**Know instantly when something goes down!** 🚨
compose/monitoring/uptime/compose.yaml (new file)

```yaml
# Uptime Kuma - Status and Uptime Monitoring
# Docs: https://github.com/louislam/uptime-kuma

services:
  uptime-kuma:
    container_name: uptime-kuma
    image: louislam/uptime-kuma:1
    restart: unless-stopped

    env_file:
      - .env

    volumes:
      - ./data:/app/data
      # Docker socket (read-only) so the Docker Container monitor type works
      - /var/run/docker.sock:/var/run/docker.sock:ro

    networks:
      - homelab

    labels:
      # Traefik
      traefik.enable: true
      traefik.docker.network: homelab

      # Web UI
      traefik.http.routers.uptime-kuma.rule: Host(`status.fig.systems`) || Host(`status.edfig.dev`)
      traefik.http.routers.uptime-kuma.entrypoints: websecure
      traefik.http.routers.uptime-kuma.tls.certresolver: letsencrypt
      traefik.http.services.uptime-kuma.loadbalancer.server.port: 3001

      # SSO Protection (optional - Uptime Kuma has its own auth)
      # Uncomment to require SSO:
      # traefik.http.routers.uptime-kuma.middlewares: tinyauth

      # Homarr Discovery
      homarr.name: Uptime Kuma (Status)
      homarr.group: Monitoring
      homarr.icon: mdi:heart-pulse

    healthcheck:
      test: ["CMD-SHELL", "node extra/healthcheck.js"]
      interval: 60s
      timeout: 10s
      retries: 3
      start_period: 60s

networks:
  homelab:
    external: true
```