homelab/compose/monitoring/uptime/README.md

# Uptime Kuma - Status & Uptime Monitoring

Beautiful uptime monitoring and alerting for all your homelab services.

## Overview

**Uptime Kuma** monitors the health and uptime of your services:

- ✅ **HTTP(s) Monitoring**: Check if web services are responding
- ✅ **TCP Port Monitoring**: Check if services are listening on ports
- ✅ **Docker Container Monitoring**: Check container status
- ✅ **Response Time**: Measure how fast services respond
- ✅ **SSL Certificate Monitoring**: Alert before certificates expire
- ✅ **Status Pages**: Public or private status pages
- ✅ **Notifications**: Email, Discord, Slack, Pushover, and 90+ more
- ✅ **Beautiful UI**: Clean, modern interface

## Quick Start

### 1. Deploy

```bash
cd ~/homelab/compose/monitoring/uptime
docker compose up -d
```

### 2. Access Web UI

Go to: **https://status.fig.systems**

### 3. Create Admin Account

On first visit, you'll be prompted to create an admin account:
- Username: `admin` (or your choice)
- Password: Strong password
- Click "Create"

### 4. Add Your First Monitor

Click **"Add New Monitor"**

**Example: Monitor Jellyfin**
- Monitor Type: `HTTP(s)`
- Friendly Name: `Jellyfin`
- URL: `https://flix.fig.systems`
- Heartbeat Interval: `60` seconds
- Retries: `3`
- Click **Save**

Uptime Kuma will now check Jellyfin every 60 seconds!

## Monitoring Your Services

### Quick Setup All Services

Here's a template for all your homelab services:

**Core Services:**
```
Name: Traefik Dashboard
Type: HTTP(s)
URL: https://traefik.fig.systems
Interval: 60s

Name: LLDAP
Type: HTTP(s)
URL: https://lldap.fig.systems
Interval: 60s

Name: Grafana Logs
Type: HTTP(s)
URL: https://logs.fig.systems
Interval: 60s
```

**Media Services:**
```
Name: Jellyfin
Type: HTTP(s)
URL: https://flix.fig.systems
Interval: 60s

Name: Immich
Type: HTTP(s)
URL: https://photos.fig.systems
Interval: 60s

Name: Jellyseerr
Type: HTTP(s)
URL: https://requests.fig.systems
Interval: 60s

Name: Sonarr
Type: HTTP(s)
URL: https://sonarr.fig.systems
Interval: 60s

Name: Radarr
Type: HTTP(s)
URL: https://radarr.fig.systems
Interval: 60s
```

**Utility Services:**
```
Name: Homarr Dashboard
Type: HTTP(s)
URL: https://home.fig.systems
Interval: 60s

Name: Backrest
Type: HTTP(s)
URL: https://backup.fig.systems
Interval: 60s

Name: Linkwarden
Type: HTTP(s)
URL: https://links.fig.systems
Interval: 60s

Name: Vikunja
Type: HTTP(s)
URL: https://tasks.fig.systems
Interval: 60s
```

### Advanced Monitoring Options

#### Monitor Docker Containers Directly

**Setup:**
1. Add New Monitor
2. Type: **Docker Container**
3. Docker Daemon: `unix:///var/run/docker.sock`
4. Container Name: `jellyfin`
5. Click Save

**Benefits:**
- Checks if container is running
- Monitors container restarts
- No network requests needed

**Note**: Requires mounting Docker socket (already configured).

#### Monitor TCP Ports

**Example: Monitor PostgreSQL**
```
Type: TCP Port
Hostname: linkwarden-postgres
Port: 5432
Interval: 60s
```

#### Check SSL Certificates

**Automatic**: When using HTTP(s) monitors, Uptime Kuma automatically:
- Checks SSL certificate validity
- Alerts when certificate expires soon (7 days default)
- Shows certificate expiry date

#### Keyword Monitoring

Check if a page contains specific text:

```
Type: HTTP(s) - Keyword
URL: https://home.fig.systems
Keyword: "Homarr"  # Check page contains "Homarr"
```

## Notifications

### Setup Alerts

1. Click **Settings** (gear icon)
2. Click **Notifications**
3. Click **Setup Notification**

### Popular Options

#### Email
```
Type: Email (SMTP)
Host: smtp.gmail.com
Port: 587
Security: TLS
Username: your-email@gmail.com
Password: your-app-password
From: alerts@yourdomain.com
To: you@email.com
```

#### Discord
```
Type: Discord
Webhook URL: https://discord.com/api/webhooks/...
(Get from Discord Server Settings → Integrations → Webhooks)
```

#### Slack
```
Type: Slack
Webhook URL: https://hooks.slack.com/services/...
(Get from Slack App → Incoming Webhooks)
```

#### Pushover (Mobile)
```
Type: Pushover
User Key: (from Pushover account)
App Token: (create app in Pushover)
Priority: Normal
```

#### Gotify (Self-hosted)
```
Type: Gotify
Server URL: https://gotify.yourdomain.com
App Token: (from Gotify)
Priority: 5
```

### Apply to Monitors

After setting up notification:
1. Edit a monitor
2. Scroll to **Notifications**
3. Select your notification method
4. Click **Save**

Or apply to all monitors:
1. Settings → Notifications
2. Click **Apply on all existing monitors**

## Status Pages

### Create Public Status Page

Perfect for showing service status to family/friends!

**Setup:**
1. Click **Status Pages**
2. Click **Add New Status Page**
3. **Slug**: `homelab` (creates /status/homelab)
4. **Title**: `Homelab Status`
5. **Description**: `Status of all homelab services`
6. Click **Next**

**Add Services:**
1. Drag monitors into "Public" or "Groups"
2. Organize by category (Core, Media, Utilities)
3. Click **Save**

**Access:**
- Private: https://status.fig.systems/status/homelab
- Or make public (no login required)

**Share with family:**
```
https://status.fig.systems/status/homelab
```

### Customize Status Page

**Options:**
- Show/hide uptime percentage
- Show/hide response time
- Custom domain
- Theme (light/dark/auto)
- Custom CSS
- Password protection

## Tags and Groups

### Organize Monitors with Tags

**Create Tags:**
1. Click **Manage Tags**
2. Add tags like:
   - `core`
   - `media`
   - `critical`
   - `production`

**Apply to Monitors:**
1. Edit monitor
2. Scroll to **Tags**
3. Select tags
4. Save

**Filter by Tag:**
- Click tag name to show only those monitors

### Create Monitor Groups

**Group by service type:**
1. Settings → Groups
2. Create groups:
   - Core Infrastructure
   - Media Services
   - Productivity
   - Monitoring

Drag monitors into groups for organization.

## Maintenance Windows

### Schedule Maintenance

Pause notifications during planned downtime:

1. Edit monitor
2. Click **Maintenance**
3. **Add Maintenance**
4. Set start/end time
5. Select monitors
6. Save

During maintenance:
- Monitor still checks but doesn't alert
- Status page shows "In Maintenance"

## Best Practices

### Monitor Configuration

**Heartbeat Interval:**
- Critical services: 30-60 seconds
- Normal services: 60-120 seconds
- Background jobs: 300-600 seconds

**Retries:**
- Set to 2-3 to avoid false positives
- Service must fail 2-3 times before alerting

**Timeout:**
- Web services: 10-30 seconds
- APIs: 5-10 seconds
- Slow services: 30-60 seconds

### What to Monitor

**Critical (Monitor these!):**
- ✅ Traefik (if this is down, everything is down)
- ✅ LLDAP (SSO depends on this)
- ✅ Core services users depend on

**Important:**
- ✅ Jellyfin, Immich (main media services)
- ✅ Sonarr, Radarr (automation)
- ✅ Backrest (backups)

**Nice to have:**
- ⬜ Utility services
- ⬜ Less critical services

**Don't over-monitor:**
- Internal components (databases, redis, etc.)
- These should be monitored via main service health

### Notification Strategy

**Alert fatigue is real!**

**Good approach:**
- Critical services → Immediate push notification
- Important services → Email
- Nice-to-have → Email digest

**Don't:**
- Alert on every blip
- Send all alerts to mobile push
- Alert on expected downtime

## Integration with Loki

Uptime Kuma and Loki complement each other:

**Uptime Kuma:**
- ✅ Is the service UP or DOWN?
- ✅ How long was it down?
- ✅ Response time trends

**Loki:**
- ✅ WHY did it go down?
- ✅ What errors happened?
- ✅ Historical log analysis

**Workflow:**
1. Uptime Kuma alerts you: "Jellyfin is down!"
2. Go to Grafana/Loki
3. Query: `{container="jellyfin"} | __timestamp__ >= now() - 15m`
4. See what went wrong

## Metrics and Graphs

### Built-in Metrics

Uptime Kuma tracks:
- **Uptime %**: 99.9%, 99.5%, etc.
- **Response Time**: Average, min, max
- **Ping**: Latency to service
- **Certificate Expiry**: Days until SSL expires

### Response Time Graph

Click any monitor to see:
- 24-hour response time graph
- Uptime/downtime periods
- Recent incidents

### Export Data

Export uptime data:
1. Settings → Backup
2. Export JSON (includes all monitors and data)
3. Store backup safely

## Troubleshooting

### Monitor Shows Down But Service Works

**Check:**
1. **SSL Certificate**: Is it valid?
2. **SSO**: Does monitor need to login first?
3. **Timeout**: Is timeout too short?
4. **Network**: Can Uptime Kuma reach the service?

**Solutions:**
- Increase timeout
- Check accepted status codes (200-299)
- Verify URL is correct
- Check Uptime Kuma logs: `docker logs uptime-kuma`

### Docker Container Monitor Not Working

**Requirements:**
- Docker socket must be mounted (✅ already configured)
- Container name must be exact

**Test:**
```bash
docker exec uptime-kuma ls /var/run/docker.sock
# Should show the socket file
```

### Notifications Not Sending

**Check:**
1. Test notification in Settings → Notifications
2. Check Uptime Kuma logs
3. Verify notification service credentials
4. Check if notification is enabled on monitor

### Can't Access Web UI

**Check:**
```bash
# Container running?
docker ps | grep uptime-kuma

# Logs
docker logs uptime-kuma

# Traefik routing
docker logs traefik | grep uptime
```

## Advanced Features

### API Access

Uptime Kuma has a WebSocket API:

**Get API Key:**
1. Settings → API Keys
2. Generate new key
3. Use with monitoring tools

### Docker Socket Monitoring

Already configured! You can monitor:
- Container status (running/stopped)
- Container restarts
- Resource usage (via Docker stats)

### Multiple Status Pages

Create different status pages:
- `/status/public` - For family/friends
- `/status/critical` - Only critical services
- `/status/media` - Media services only

### Custom CSS

Brand your status page:
1. Status Page → Edit
2. Custom CSS
3. Add styling

**Example:**
```css
body {
  background: #1a1a1a;
}
.title {
  color: #00ff00;
}
```

## Resource Usage

**Typical usage:**
- **RAM**: 50-150MB
- **CPU**: Very low (only during checks)
- **Disk**: <100MB
- **Network**: Minimal (only during checks)

**Very lightweight!**

## Backup and Restore

### Backup

**Automatic backup:**
1. Settings → Backup
2. Export

**Manual backup:**
```bash
cd ~/homelab/compose/monitoring/uptime
tar czf uptime-backup-$(date +%Y%m%d).tar.gz ./data
```

### Restore

```bash
docker compose down
tar xzf uptime-backup-YYYYMMDD.tar.gz
docker compose up -d
```

## Comparison: Uptime Kuma vs Loki

| Feature | Uptime Kuma | Loki |
|---------|-------------|------|
| **Purpose** | Uptime monitoring | Log aggregation |
| **Checks** | HTTP, TCP, Ping, Docker | Logs only |
| **Alerts** | Service down, slow | Log patterns |
| **Response Time** | ✅ Yes | ❌ No |
| **Uptime %** | ✅ Yes | ❌ No |
| **SSL Monitoring** | ✅ Yes | ❌ No |
| **Why Service Down** | ❌ No | ✅ Yes (via logs) |
| **Historical Logs** | ❌ No | ✅ Yes |
| **Status Pages** | ✅ Yes | ❌ No |

**Use both together!**
- Uptime Kuma tells you WHAT is down
- Loki tells you WHY it went down

## Next Steps

1. ✅ Deploy Uptime Kuma
2. ✅ Add monitors for all services
3. ✅ Set up notifications (Email, Discord, etc.)
4. ✅ Create status page
5. ✅ Test alerts by stopping a service
6. ⬜ Share status page with family
7. ⬜ Set up maintenance windows
8. ⬜ Review and tune check intervals

## Resources

- [Uptime Kuma GitHub](https://github.com/louislam/uptime-kuma)
- [Uptime Kuma Wiki](https://github.com/louislam/uptime-kuma/wiki)
- [Notification Services List](https://github.com/louislam/uptime-kuma/wiki/Notification-Services)

---

**Know instantly when something goes down!** 🚨