diff --git a/README.md b/README.md index 67ceac6..964a9c1 100644 --- a/README.md +++ b/README.md @@ -2,6 +2,23 @@ This repository contains Docker Compose configurations for self-hosted home services. +## 💻 Hardware Specifications + +- **Host**: Proxmox VE 9 (Debian 13) + - CPU: AMD Ryzen 5 7600X (6 cores, 12 threads, up to 5.3 GHz) + - GPU: NVIDIA GeForce GTX 1070 (8GB VRAM) + - RAM: 32GB DDR5 + +- **VM**: AlmaLinux 9.6 (RHEL 9 compatible) + - CPU: 8 vCPUs + - RAM: 24GB + - Storage: 500GB+ (expandable) + - GPU: GTX 1070 (PCIe passthrough) + +**Documentation:** +- [Complete Architecture Guide](docs/architecture.md) - Integration, networking, logging, GPU setup +- [AlmaLinux VM Setup](docs/setup/almalinux-vm.md) - Full installation and configuration guide + ## 🏗️ Infrastructure ### Core Services (Port 80/443) @@ -199,9 +216,21 @@ Each service has its own `.env` file where applicable. Key files to review: - `core/lldap/.env` - LDAP configuration and admin credentials - `core/tinyauth/.env` - LDAP connection and session settings - `media/frontend/immich/.env` - Photo management configuration -- `services/linkwarden/.env` - Bookmark manager settings +- `services/karakeep/.env` - AI-powered bookmark manager +- `services/ollama/.env` - Local LLM configuration - `services/microbin/.env` - Pastebin configuration +**Example Configuration Files:** +Several services include `.example` config files for reference: +- `media/automation/sonarr/config.xml.example` +- `media/automation/radarr/config.xml.example` +- `media/automation/sabnzbd/sabnzbd.ini.example` +- `media/automation/qbittorrent/qBittorrent.conf.example` +- `services/vikunja/config.yml.example` +- `services/FreshRSS/config.php.example` + +Copy these to the appropriate location (usually `./config/`) and customize as needed. + ## 🔧 Maintenance ### Viewing Logs @@ -241,6 +270,83 @@ Important data locations: 2. Check LLDAP connection in tinyauth logs 3. 
Verify LDAP bind credentials match in both services +### GPU not detected +1. Check GPU passthrough: `lspci | grep -i nvidia` +2. Verify drivers: `nvidia-smi` +3. Test in container: `docker exec ollama nvidia-smi` +4. See [AlmaLinux VM Setup](docs/setup/almalinux-vm.md) for GPU configuration + +## 📊 Monitoring & Logging + +### Centralized Logging (Loki + Promtail + Grafana) + +All container logs are automatically collected and stored in Loki: + +**Access Grafana**: https://logs.fig.systems + +**Query examples:** +```logql +# View logs for specific service +{container="sonarr"} + +# Filter by log level +{container="radarr"} |= "ERROR" + +# Multiple services +{container=~"sonarr|radarr"} + +# Search with JSON parsing +{container="karakeep"} |= "ollama" | json +``` + +**Retention**: 30 days (configurable in `compose/monitoring/logging/loki-config.yaml`) + +### Uptime Monitoring (Uptime Kuma) + +Monitor service availability and performance: + +**Access Uptime Kuma**: https://status.fig.systems + +**Features:** +- HTTP(s) monitoring for all web services +- Docker container health checks +- SSL certificate expiration alerts +- Public/private status pages +- 90+ notification integrations (Discord, Slack, Email, etc.) + +### Service Integration + +**How services integrate:** + +``` +Traefik (Reverse Proxy) + ├─→ All services (SSL + routing) + └─→ Let's Encrypt (certificates) + +Tinyauth (SSO) + ├─→ LLDAP (user authentication) + └─→ Protected services (authorization) + +Promtail (Log Collection) + ├─→ Docker socket (all containers) + └─→ Loki (log storage) + +Loki (Log Storage) + └─→ Grafana (visualization) + +Karakeep (Bookmarks) + ├─→ Ollama (AI tagging) + ├─→ Meilisearch (search) + └─→ Chrome (web archiving) + +Sonarr/Radarr (Media Automation) + ├─→ SABnzbd/qBittorrent (downloads) + ├─→ Jellyfin (media library) + └─→ Recyclarr/Profilarr (quality management) +``` + +See [Architecture Guide](docs/architecture.md) for complete integration details. 
+ ## 📄 License This is a personal homelab configuration. Use at your own risk. diff --git a/compose/services/booklore/compose.yaml b/compose/services/booklore/compose.yaml index e065fbf..2f94e79 100644 --- a/compose/services/booklore/compose.yaml +++ b/compose/services/booklore/compose.yaml @@ -5,7 +5,36 @@ services: booklore: container_name: booklore image: ghcr.io/lorebooks/booklore:latest + restart: unless-stopped env_file: - - .env + + volumes: + - ./data:/app/data + + networks: + - homelab + + labels: + # Traefik + traefik.enable: true + traefik.docker.network: homelab + + # Web UI + traefik.http.routers.booklore.rule: Host(`booklore.fig.systems`) || Host(`booklore.edfig.dev`) + traefik.http.routers.booklore.entrypoints: websecure + traefik.http.routers.booklore.tls.certresolver: letsencrypt + traefik.http.services.booklore.loadbalancer.server.port: 3000 + + # SSO Protection + traefik.http.routers.booklore.middlewares: tinyauth + + # Homarr Discovery + homarr.name: Booklore + homarr.group: Services + homarr.icon: mdi:book-open-variant + +networks: + homelab: + external: true diff --git a/compose/services/microbin/compose.yaml b/compose/services/microbin/compose.yaml index 395a08d..694c679 100644 --- a/compose/services/microbin/compose.yaml +++ b/compose/services/microbin/compose.yaml @@ -5,17 +5,36 @@ services: microbin: container_name: microbin image: danielszabo99/microbin:latest - env_file: .env + restart: unless-stopped + + env_file: + - .env + + volumes: + - ./data:/app/data + + networks: - homelab + labels: + # Traefik traefik.enable: true + traefik.docker.network: homelab + + # Web UI traefik.http.routers.microbin.rule: Host(`paste.fig.systems`) || Host(`paste.edfig.dev`) traefik.http.routers.microbin.entrypoints: websecure traefik.http.routers.microbin.tls.certresolver: letsencrypt traefik.http.services.microbin.loadbalancer.server.port: 8080 + # Note: MicroBin has its own auth, SSO disabled by default # traefik.http.routers.microbin.middlewares: 
tinyauth + # Homarr Discovery + homarr.name: MicroBin + homarr.group: Services + homarr.icon: mdi:content-paste + networks: homelab: external: true diff --git a/compose/services/rsshub/compose.yaml b/compose/services/rsshub/compose.yaml index 429fe85..6876873 100644 --- a/compose/services/rsshub/compose.yaml +++ b/compose/services/rsshub/compose.yaml @@ -6,7 +6,36 @@ services: container_name: rsshub # Using chromium-bundled image for full puppeteer support image: diygod/rsshub:chromium-bundled + restart: unless-stopped env_file: - - .env + + volumes: + - ./data:/app/data + + networks: + - homelab + + labels: + # Traefik + traefik.enable: true + traefik.docker.network: homelab + + # Web UI + traefik.http.routers.rsshub.rule: Host(`rsshub.fig.systems`) || Host(`rsshub.edfig.dev`) + traefik.http.routers.rsshub.entrypoints: websecure + traefik.http.routers.rsshub.tls.certresolver: letsencrypt + traefik.http.services.rsshub.loadbalancer.server.port: 1200 + + # Note: RSSHub is public by design, SSO disabled + # traefik.http.routers.rsshub.middlewares: tinyauth + + # Homarr Discovery + homarr.name: RSSHub + homarr.group: Services + homarr.icon: mdi:rss-box + +networks: + homelab: + external: true diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..853dc15 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,648 @@ +# Homelab Architecture & Integration + +Complete integration guide for the homelab setup on AlmaLinux 9.6. 
+ +## 🖥️ Hardware Specifications + +### Host System +- **Hypervisor**: Proxmox VE 9 (Debian 13 based) +- **CPU**: AMD Ryzen 5 7600X (6 cores, 12 threads, up to 5.3 GHz) +- **GPU**: NVIDIA GeForce GTX 1070 (8GB VRAM, 1920 CUDA cores) +- **RAM**: 32GB DDR5 + +### VM Configuration +- **OS**: AlmaLinux 9.6 (RHEL 9 compatible) +- **CPU**: 8 vCPUs (allocated from host) +- **RAM**: 24GB (leaving 8GB for host) +- **Storage**: 500GB+ (adjust based on media library size) +- **GPU**: GTX 1070 (PCIe passthrough from Proxmox) + +## 🏗️ Architecture Overview + +### Network Architecture + +``` +Internet + ↓ +[Router/Firewall] + ↓ (Port 80/443) +[Traefik Reverse Proxy] + ↓ +┌──────────────────────────────────────┐ +│ homelab network │ +│ (Docker bridge - 172.18.0.0/16) │ +│ │ +│ ┌─────────────┐ ┌──────────────┐ │ +│ │ Core │ │ Media │ │ +│ │ - Traefik │ │ - Jellyfin │ │ +│ │ - LLDAP │ │ - Sonarr │ │ +│ │ - Tinyauth │ │ - Radarr │ │ +│ └─────────────┘ └──────────────┘ │ +│ │ +│ ┌─────────────┐ ┌──────────────┐ │ +│ │ Services │ │ Monitoring │ │ +│ │ - Karakeep │ │ - Loki │ │ +│ │ - Ollama │ │ - Promtail │ │ +│ │ - Vikunja │ │ - Grafana │ │ +│ └─────────────┘ └──────────────┘ │ +└──────────────────────────────────────┘ + ↓ + [Promtail Agent] + ↓ + [Loki Storage] +``` + +### Service Internal Networks + +Services with databases use isolated internal networks: + +``` +karakeep +├── homelab (external traffic) +└── karakeep_internal + ├── karakeep (app) + ├── karakeep-chrome (browser) + └── karakeep-meilisearch (search) + +vikunja +├── homelab (external traffic) +└── vikunja_internal + ├── vikunja (app) + └── vikunja-db (postgres) + +monitoring/logging +├── homelab (external traffic) +└── logging_internal + ├── loki (storage) + ├── promtail (collector) + └── grafana (UI) +``` + +## 🔐 Security Architecture + +### Authentication Flow + +``` +User Request + ↓ +[Traefik] → Check route rules + ↓ +[Tinyauth Middleware] → Forward Auth + ↓ +[LLDAP] → Verify credentials + ↓ +[Backend Service] → 
Authorized access +``` + +### SSL/TLS + +- **Certificate Provider**: Let's Encrypt +- **Challenge Type**: HTTP-01 (validated over port 80) +- **Automatic Renewal**: Via Traefik +- **Domains**: + - Primary: `*.fig.systems` + - Fallback: `*.edfig.dev` + +### SSO Protection + +**Protected Services** (require authentication): +- Traefik Dashboard +- LLDAP +- Sonarr, Radarr, SABnzbd, qBittorrent +- Profilarr, Recyclarr (monitoring) +- Homarr, Backrest +- Karakeep, Vikunja, LubeLogger +- Calibre-web, Booklore, FreshRSS, File Browser +- Loki API, Ollama API + +**Unprotected Services** (own authentication or public by design): +- Tinyauth (SSO provider itself) +- Jellyfin (own user system) +- Jellyseerr (linked to Jellyfin) +- Immich (own user system) +- RSSHub (public feed generator) +- MicroBin (public pastebin) +- Grafana (own authentication) +- Uptime Kuma (own authentication) + +## 📊 Logging Architecture + +### Centralized Logging with Loki + +Promtail collects the logs of all containers and ships them to Loki: + +``` +[Docker Container] → stdout/stderr + ↓ +[Docker Socket] → /var/run/docker.sock + ↓ +[Promtail] → Scrapes logs via Docker API + ↓ +[Loki] → Stores and indexes logs + ↓ +[Grafana] → Query and visualize +``` + +### Log Labels + +Promtail automatically adds labels to all logs: +- `container`: Container name +- `compose_project`: Docker Compose project +- `compose_service`: Service name from compose +- `image`: Docker image name +- `stream`: stdout or stderr + +### Log Retention + +- **Default**: 30 days +- **Storage**: `compose/monitoring/logging/loki-data/` +- **Automatic cleanup**: Enabled via Loki compactor + +### Querying Logs + +**View all logs for a service:** +```logql +{container="sonarr"} +``` + +**Filter by log level (lines containing "ERROR"):** +```logql +{container="radarr"} |= "ERROR" +``` + +**Multiple services:** +```logql +{container=~"sonarr|radarr"} +``` + +**Line filter with JSON parsing:** +```logql +{container="karakeep"} |= "ollama" | json +``` + +## 🌐 Network Configuration + +### Docker Networks + +**homelab** 
(external bridge): +- Type: External bridge network +- Subnet: Auto-assigned by Docker +- Purpose: Inter-service communication + Traefik routing +- Create: `docker network create homelab` + +**Service-specific internal networks**: +- `karakeep_internal`: Karakeep + Chrome + Meilisearch +- `vikunja_internal`: Vikunja + PostgreSQL +- `logging_internal`: Loki + Promtail + Grafana +- etc. + +### Port Mappings + +**External Ports** (exposed to host): +- `80/tcp`: HTTP (Traefik) - redirects to HTTPS +- `443/tcp`: HTTPS (Traefik) +- `6881/tcp+udp`: BitTorrent (qBittorrent) + +**No other ports exposed** - all access via Traefik reverse proxy. + +## 🔧 Traefik Integration + +### Standard Traefik Labels + +All services use consistent Traefik labels: + +```yaml +labels: + # Enable Traefik + traefik.enable: true + traefik.docker.network: homelab + + # Router configuration + traefik.http.routers..rule: Host(`.fig.systems`) || Host(`.edfig.dev`) + traefik.http.routers..entrypoints: websecure + traefik.http.routers..tls.certresolver: letsencrypt + + # Service configuration (backend port) + traefik.http.services..loadbalancer.server.port: + + # SSO middleware (if protected) + traefik.http.routers..middlewares: tinyauth + + # Homarr auto-discovery + homarr.name: + homarr.group: + homarr.icon: mdi: +``` + +### Middleware + +**tinyauth** - Forward authentication: +```yaml +# Defined in traefik/compose.yaml +middlewares: + tinyauth: + forwardAuth: + address: http://tinyauth:8080 + trustForwardHeader: true +``` + +## 💾 Volume Management + +### Volume Types + +**Bind Mounts** (host directories): +```yaml +volumes: + - ./data:/data # Service data + - ./config:/config # Configuration files + - /media:/media # Media library (shared) +``` + +**Named Volumes** (Docker-managed): +```yaml +volumes: + - loki-data:/loki # Loki storage + - postgres-data:/var/lib/postgresql/data +``` + +### Media Directory Structure + +``` +/media/ +├── tv/ # TV shows (Sonarr → Jellyfin) +├── movies/ # Movies 
(Radarr → Jellyfin) +├── music/ # Music +├── photos/ # Photos (Immich) +├── books/ # Ebooks (Calibre-web) +├── audiobooks/ # Audiobooks +├── comics/ # Comics +├── homemovies/ # Home videos +├── downloads/ # Active downloads (SABnzbd/qBittorrent) +├── complete/ # Completed downloads +└── incomplete/ # In-progress downloads +``` + +### Backup Strategy + +**Important directories to back up:** +``` +compose/core/lldap/data/ # User directory +compose/core/traefik/letsencrypt/ # SSL certificates +compose/services/*/config/ # Service configurations +compose/services/*/data/ # Service data +compose/monitoring/logging/loki-data/ # Logs (optional) +/media/ # Media library +``` + +**Can be excluded from backups:** +``` +compose/services/*/db/ # Databases (back up via dump) +compose/monitoring/logging/loki-data/ # Logs (optional, see above) +/media/downloads/ # Temporary downloads +/media/incomplete/ # Incomplete downloads +``` + +## 🎮 GPU Acceleration + +### NVIDIA GTX 1070 Configuration + +**GPU Passthrough (Proxmox → VM):** + +1. **Proxmox host** (`/etc/pve/nodes//qemu-server/.conf`): +``` +hostpci0: 0000:01:00,pcie=1,x-vga=1 +``` + +2. **VM (AlmaLinux)** - Install NVIDIA drivers: +```bash +# Add NVIDIA repository +sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo + +# Install drivers +sudo dnf install nvidia-driver nvidia-settings + +# Verify +nvidia-smi +``` + +3. 
**Docker** - Install NVIDIA Container Toolkit: +```bash +# Add NVIDIA Container Toolkit repo +sudo dnf config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo + +# Install toolkit +sudo dnf install nvidia-container-toolkit + +# Configure Docker +sudo nvidia-ctk runtime configure --runtime=docker +sudo systemctl restart docker + +# Verify +docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi +``` + +### Services Using GPU + +**Jellyfin** (Hardware transcoding): +```yaml +# Uncomment in compose.yaml +devices: + - /dev/dri:/dev/dri # DRI render nodes (VAAPI/QSV); NVENC/NVDEC uses the NVIDIA runtime settings below +environment: + - NVIDIA_VISIBLE_DEVICES=all + - NVIDIA_DRIVER_CAPABILITIES=all +``` + +**Immich** (AI features): +```yaml +# Already configured +deploy: + resources: + reservations: + devices: + - driver: nvidia + count: 1 + capabilities: [gpu] +``` + +**Ollama** (LLM inference): +```yaml +# Uncomment in compose.yaml +deploy: + resources: + reservations: + devices: + - driver: nvidia + count: 1 + capabilities: [gpu] +``` + +### GPU Performance Tuning + +**Approximate throughput for Ryzen 5 7600X + GTX 1070:** + +- **Jellyfin**: Roughly 4-6 simultaneous 4K → 1080p transcodes +- **Ollama**: + - 3B models: 40-60 tokens/sec + - 7B models: 20-35 tokens/sec + - 13B models: 10-15 tokens/sec (quantized) +- **Immich**: AI tagging ~5-10 images/sec + +## 🚀 Resource Allocation + +### CPU Allocation (Ryzen 5 7600X - 6C/12T) + +**High Priority** (4-6 cores): +- Jellyfin (transcoding) +- Sonarr/Radarr (media processing) +- Ollama (when running) + +**Medium Priority** (2-4 cores): +- Immich (AI processing) +- Karakeep (bookmark processing) +- SABnzbd/qBittorrent (downloads) + +**Low Priority** (1-2 cores): +- Traefik, LLDAP, Tinyauth +- Monitoring services +- Other utilities + +### RAM Allocation (32GB Total, 24GB VM) + +**Recommended allocation:** + +``` +Host (Proxmox): 8GB +VM Total: 24GB breakdown: + ├── System: 4GB (AlmaLinux base) + ├── Docker: 2GB (daemon overhead) + ├── 
Jellyfin: 2-4GB (transcoding buffers) + ├── Immich: 2-3GB (ML models + database) + ├── Sonarr/Radarr: 1GB each + ├── Ollama: 4-6GB (when running models) + ├── Databases: 2-3GB total + ├── Monitoring: 2GB (Loki + Grafana) + └── Other services: 4-5GB +``` + +### Disk Space Planning + +**System:** 100GB +**Docker:** 50GB (images + containers) +**Service Data:** 50GB (configs, databases, logs) +**Media Library:** Remaining space (expandable) + +**Recommended VM disk:** +- Minimum: 500GB (200GB system + 300GB media) +- Recommended: 1TB+ (allows room for growth) + +## 🔄 Service Dependencies + +### Startup Order + +**Critical order for initial deployment:** + +1. **Networks**: `docker network create homelab` +2. **Core** (must start first): + - Traefik (reverse proxy) + - LLDAP (user directory) + - Tinyauth (SSO provider) +3. **Monitoring** (optional but recommended): + - Loki + Promtail + Grafana + - Uptime Kuma +4. **Media Automation**: + - Sonarr, Radarr + - SABnzbd, qBittorrent + - Recyclarr, Profilarr +5. **Media Frontend**: + - Jellyfin + - Jellyseerr + - Immich +6. 
**Services**: + - Karakeep, Ollama (AI features) + - Vikunja, Homarr + - All other services + +### Service Integration Map + +``` +Traefik + ├─→ All services (reverse proxy) + └─→ Let's Encrypt (SSL) + +Tinyauth + ├─→ LLDAP (authentication backend) + └─→ All SSO-protected services + +LLDAP + └─→ User database for SSO + +Promtail + ├─→ Docker socket (log collection) + └─→ Loki (log forwarding) + +Loki + └─→ Grafana (log visualization) + +Karakeep + ├─→ Ollama (AI tagging) + ├─→ Meilisearch (search) + └─→ Chrome (web archiving) + +Jellyseerr + ├─→ Jellyfin (media info) + ├─→ Sonarr (TV requests) + └─→ Radarr (movie requests) + +Sonarr/Radarr + ├─→ SABnzbd/qBittorrent (downloads) + ├─→ Jellyfin (media library) + └─→ Recyclarr/Profilarr (quality profiles) + +Homarr + └─→ All services (dashboard auto-discovery) +``` + +## 🐛 Troubleshooting + +### Check Service Health + +```bash +# All services status +cd ~/homelab +docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" + +# Logs for specific service +docker logs --tail 100 -f + +# Logs via Loki/Grafana +# Go to https://logs.fig.systems +# Query: {container=""} +``` + +### Network Issues + +```bash +# Check homelab network exists +docker network ls | grep homelab + +# Inspect network +docker network inspect homelab + +# Test service connectivity +docker exec ping +docker exec karakeep curl http://ollama:11434 +``` + +### GPU Not Detected + +```bash +# Check GPU in VM +nvidia-smi + +# Check Docker can access GPU +docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi + +# Check service GPU allocation +docker exec jellyfin nvidia-smi +docker exec ollama nvidia-smi +``` + +### SSL Certificate Issues + +```bash +# Check Traefik logs (stdout and stderr) +docker logs traefik 2>&1 | grep -i certificate + +# Force certificate re-issue (deletes ALL stored certificates; mind Let's Encrypt rate limits) +docker exec traefik rm -rf /letsencrypt/acme.json +docker restart traefik + +# Verify DNS +dig +short sonarr.fig.systems +``` + +### SSO Not Working + +```bash +# Check Tinyauth status +docker 
logs tinyauth + +# Check LLDAP connection +docker exec tinyauth nc -zv lldap 3890 +docker exec tinyauth nc -zv lldap 17170 + +# Verify credentials match +grep LDAP_BIND_PASSWORD compose/core/tinyauth/.env +grep LLDAP_LDAP_USER_PASS compose/core/lldap/.env +``` + +## 📈 Monitoring Best Practices + +### Key Metrics to Monitor + +**System Level:** +- CPU usage per container +- Memory usage per container +- Disk I/O +- Network throughput +- GPU utilization (for Jellyfin/Ollama/Immich) + +**Application Level:** +- Traefik request rate +- Failed authentication attempts +- Jellyfin concurrent streams +- Download speeds (SABnzbd/qBittorrent) +- Sonarr/Radarr queue size + +### Uptime Kuma Monitoring + +Configure monitors for: +- **HTTP(s)**: All web services (200 status check) +- **TCP**: Database ports (PostgreSQL, etc.) +- **Docker**: Container health (via Docker socket) +- **SSL**: Certificate expiration (30-day warning) + +### Log Monitoring + +Set up Loki alerts for: +- ERROR level logs +- Authentication failures +- Service crashes +- Disk space warnings + +## 🔧 Maintenance Tasks + +### Daily +- Check Uptime Kuma dashboard +- Review any critical alerts + +### Weekly +- Check disk space: `df -h` +- Review failed downloads in Sonarr/Radarr +- Check Loki logs for errors + +### Monthly +- Update all containers: `docker compose pull && docker compose up -d` +- Review and clean old Docker images: `docker image prune -a` +- Backup configurations +- Check SSL certificate renewal + +### Quarterly +- Review and update documentation +- Clean up old media (if needed) +- Review and adjust quality profiles +- Update Recyclarr configurations + +## 📚 Additional Resources + +- [Traefik Documentation](https://doc.traefik.io/traefik/) +- [Docker Compose Best Practices](https://docs.docker.com/compose/production/) +- [Loki LogQL Guide](https://grafana.com/docs/loki/latest/logql/) +- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/) +- [Proxmox 
GPU Passthrough](https://pve.proxmox.com/wiki/PCI_Passthrough) +- [AlmaLinux Documentation](https://wiki.almalinux.org/) + +--- + +**System Ready!** 🚀 diff --git a/docs/setup/almalinux-vm.md b/docs/setup/almalinux-vm.md new file mode 100644 index 0000000..dc03e3f --- /dev/null +++ b/docs/setup/almalinux-vm.md @@ -0,0 +1,775 @@ +# AlmaLinux 9.6 VM Setup Guide + +Complete setup guide for the homelab VM on AlmaLinux 9.6 running on Proxmox VE 9. + +## Hardware Context + +- **Host**: Proxmox VE 9 (Debian 13 based) + - CPU: AMD Ryzen 5 7600X (6C/12T, 5.3 GHz boost) + - GPU: NVIDIA GTX 1070 (8GB VRAM) + - RAM: 32GB DDR5 + +- **VM Allocation**: + - OS: AlmaLinux 9.6 (RHEL 9 compatible) + - CPU: 8 vCPUs + - RAM: 24GB + - Disk: 500GB+ (expandable) + - GPU: GTX 1070 (PCIe passthrough) + +## Proxmox VM Creation + +### 1. Create VM + +```bash +# On Proxmox host +qm create 100 \ + --name homelab \ + --memory 24576 \ + --cores 8 \ + --cpu host \ + --sockets 1 \ + --net0 virtio,bridge=vmbr0 \ + --scsi0 local-lvm:500 \ + --ostype l26 \ + --boot order=scsi0 + +# Attach AlmaLinux ISO +qm set 100 --ide2 local:iso/AlmaLinux-9.6-x86_64-dvd.iso,media=cdrom + +# Enable UEFI and the q35 machine type (pcie=1 passthrough requires q35) +qm set 100 --bios ovmf --machine q35 --efidisk0 local-lvm:1 +``` + +### 2. 
GPU Passthrough + +**Find GPU PCI address:** +```bash +lspci | grep -i nvidia +# Example output: 01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070] +``` + +**Enable IOMMU in Proxmox:** + +Edit `/etc/default/grub`: +```bash +# For AMD CPU (Ryzen 5 7600X) +GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt" +``` + +Update GRUB and reboot: +```bash +update-grub +reboot +``` + +**Verify IOMMU:** +```bash +dmesg | grep -e DMAR -e IOMMU +# Should show IOMMU enabled +``` + +**Add GPU to VM:** + +Edit `/etc/pve/qemu-server/100.conf`: +``` +hostpci0: 0000:01:00,pcie=1,x-vga=1 +``` + +Or via command: +```bash +qm set 100 --hostpci0 0000:01:00,pcie=1,x-vga=1 +``` + +**Blacklist GPU on host:** + +Edit `/etc/modprobe.d/blacklist-nvidia.conf`: +``` +blacklist nouveau +blacklist nvidia +blacklist nvidia_drm +blacklist nvidia_modeset +blacklist nvidia_uvm +``` + +Update initramfs: +```bash +update-initramfs -u +reboot +``` + +## AlmaLinux Installation + +### 1. Install AlmaLinux 9.6 + +Start VM and follow installer: +1. **Language**: English (US) +2. **Installation Destination**: Use all space, automatic partitioning +3. **Network**: Enable and set hostname to `homelab.fig.systems` +4. **Software Selection**: Minimal Install +5. **Root Password**: Set strong password +6. **User Creation**: Create admin user (e.g., `homelab`) + +### 2. Post-Installation Configuration + +```bash +# SSH into VM +ssh homelab@ + +# Update system +sudo dnf update -y + +# Install essential tools +sudo dnf install -y \ + vim \ + git \ + curl \ + wget \ + htop \ + ncdu \ + tree \ + tmux \ + bind-utils \ + net-tools \ + firewalld + +# Enable and configure firewall +sudo systemctl enable --now firewalld +sudo firewall-cmd --permanent --add-service=http +sudo firewall-cmd --permanent --add-service=https +sudo firewall-cmd --reload +``` + +### 3. 
Configure Static IP (Optional) + +```bash +# Find connection name +nmcli connection show + +# Set static IP (example: 192.168.1.100) +sudo nmcli connection modify "System eth0" \ + ipv4.addresses 192.168.1.100/24 \ + ipv4.gateway 192.168.1.1 \ + ipv4.dns "1.1.1.1,8.8.8.8" \ + ipv4.method manual + +# Restart network +sudo nmcli connection down "System eth0" +sudo nmcli connection up "System eth0" +``` + +## Docker Installation + +### 1. Install Docker Engine + +```bash +# Remove old versions +sudo dnf remove docker \ + docker-client \ + docker-client-latest \ + docker-common \ + docker-latest \ + docker-latest-logrotate \ + docker-logrotate \ + docker-engine + +# Add Docker repository +sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo + +# Install Docker +sudo dnf install -y \ + docker-ce \ + docker-ce-cli \ + containerd.io \ + docker-buildx-plugin \ + docker-compose-plugin + +# Start Docker +sudo systemctl enable --now docker + +# Verify +sudo docker run hello-world +``` + +### 2. Configure Docker + +**Add user to docker group:** +```bash +sudo usermod -aG docker $USER +newgrp docker + +# Verify (no sudo needed) +docker ps +``` + +**Configure Docker daemon:** + +Create `/etc/docker/daemon.json`: +```json +{ + "log-driver": "json-file", + "log-opts": { + "max-size": "10m", + "max-file": "3" + }, + "storage-driver": "overlay2", + "features": { + "buildkit": true + } +} +``` + +Restart Docker: +```bash +sudo systemctl restart docker +``` + +## NVIDIA GPU Setup + +### 1. Install NVIDIA Drivers + +```bash +# Add EPEL repository +sudo dnf install -y epel-release + +# Add NVIDIA repository +sudo dnf config-manager --add-repo \ + https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo + +# Install drivers +sudo dnf install -y \ + nvidia-driver \ + nvidia-driver-cuda \ + nvidia-settings \ + nvidia-persistenced + +# Reboot to load drivers +sudo reboot +``` + +### 2. 
Verify GPU + +```bash +# Check driver version +nvidia-smi + +# Expected output: +# +-----------------------------------------------------------------------------+ +# | NVIDIA-SMI 535.xx.xx Driver Version: 535.xx.xx CUDA Version: 12.2 | +# |-------------------------------+----------------------+----------------------+ +# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | +# | 0 GeForce GTX 1070 Off | 00000000:01:00.0 Off | N/A | +# +-------------------------------+----------------------+----------------------+ +``` + +### 3. Install NVIDIA Container Toolkit + +```bash +# Add NVIDIA Container Toolkit repository +sudo dnf config-manager --add-repo \ + https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo + +# Install toolkit +sudo dnf install -y nvidia-container-toolkit + +# Configure Docker to use nvidia runtime +sudo nvidia-ctk runtime configure --runtime=docker + +# Restart Docker +sudo systemctl restart docker + +# Test GPU in container +docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi +``` + +## Storage Setup + +### 1. Create Media Directory + +```bash +# Create media directory structure +sudo mkdir -p /media/{tv,movies,music,photos,books,audiobooks,comics,homemovies} +sudo mkdir -p /media/{downloads,complete,incomplete} + +# Set ownership +sudo chown -R $USER:$USER /media + +# Set permissions +chmod -R 755 /media +``` + +### 2. Mount Additional Storage (Optional) + +If using separate disk for media: + +```bash +# Find disk +lsblk + +# Format disk (example: /dev/sdb) +sudo mkfs.ext4 /dev/sdb + +# Get UUID +sudo blkid /dev/sdb + +# Add to /etc/fstab +echo "UUID= /media ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab + +# Mount +sudo mount -a +``` + +## Homelab Repository Setup + +### 1. Clone Repository + +```bash +# Create workspace +mkdir -p ~/homelab +cd ~/homelab + +# Clone repository +git clone https://github.com/efigueroa/homelab.git . 
+ +# Or if using SSH +git clone git@github.com:efigueroa/homelab.git . +``` + +### 2. Create Docker Network + +```bash +# Create homelab network +docker network create homelab + +# Verify +docker network ls | grep homelab +``` + +### 3. Configure Environment Variables + +```bash +# Generate secrets for all services +cd ~/homelab + +# LLDAP +cd compose/core/lldap +openssl rand -hex 32 > /tmp/lldap_jwt_secret +openssl rand -base64 32 | tr -d /=+ | cut -c1-32 > /tmp/lldap_pass +# Update .env with generated secrets + +# Tinyauth +cd ../tinyauth +openssl rand -hex 32 > /tmp/tinyauth_session +# Update .env (LDAP_BIND_PASSWORD must match LLDAP) + +# Continue for all services... +``` + +See [`docs/guides/secrets-management.md`](../guides/secrets-management.md) for the complete guide. + +## SELinux Configuration + +AlmaLinux runs SELinux in enforcing mode by default. Configure it for Docker: + +```bash +# Check SELinux status +getenforce +# Should show: Enforcing + +# Allow containers to manage cgroups (bind-mount access is handled by the labels below) +sudo setsebool -P container_manage_cgroup on + +# If you encounter permission issues: +# Option 1: Add SELinux context to directories +sudo chcon -R -t container_file_t ~/homelab/compose +sudo chcon -R -t container_file_t /media + +# Option 2: Use :Z flag in docker volumes (auto-relabels; use :z for paths shared by several containers) +# Example: ./data:/data:Z + +# Option 3: Set SELinux to permissive (not recommended) +# sudo setenforce 0 +``` + +## System Tuning + +### 1. Increase File Limits + +```bash +# Add to /etc/security/limits.conf +echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf +echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf + +# Add to /etc/sysctl.conf (run `sysctl fs.file-max` first; skip that line if the current value is already higher) +echo "fs.file-max = 65536" | sudo tee -a /etc/sysctl.conf +echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf + +# Apply +sudo sysctl -p +``` + +### 2. 
Optimize for Media Server + +```bash +# Network tuning +echo "net.core.rmem_max = 134217728" | sudo tee -a /etc/sysctl.conf +echo "net.core.wmem_max = 134217728" | sudo tee -a /etc/sysctl.conf +echo "net.ipv4.tcp_rmem = 4096 87380 67108864" | sudo tee -a /etc/sysctl.conf +echo "net.ipv4.tcp_wmem = 4096 65536 67108864" | sudo tee -a /etc/sysctl.conf + +# Apply +sudo sysctl -p +``` + +### 3. CPU Governor (Ryzen 5 7600X) + +```bash +# Install cpupower +sudo dnf install -y kernel-tools + +# Set to performance mode +sudo cpupower frequency-set -g performance + +# Same setting written directly via sysfs (does NOT persist across reboots; reapply at boot, e.g. from a systemd unit) +echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor +``` + +## Deployment + +### 1. Deploy Core Services + +```bash +cd ~/homelab + +# Create network (skip if it already exists; the command fails on a duplicate name) +docker network create homelab + +# Deploy Traefik +cd compose/core/traefik +docker compose up -d + +# Deploy LLDAP +cd ../lldap +docker compose up -d + +# Wait for LLDAP to be ready (30 seconds) +sleep 30 + +# Deploy Tinyauth +cd ../tinyauth +docker compose up -d +``` + +### 2. Configure LLDAP + +```bash +# Access LLDAP web UI +# https://lldap.fig.systems + +# 1. Log in with admin credentials from .env +# 2. Create observer user for tinyauth +# 3. Create regular users +``` + +### 3. Deploy Monitoring + +```bash +cd ~/homelab + +# Deploy logging stack +cd compose/monitoring/logging +docker compose up -d + +# Deploy uptime monitoring +cd ../uptime +docker compose up -d +``` + +### 4. Deploy Services + +See [`README.md`](../../README.md) for the complete deployment order. + +## Verification + +### 1. Check All Services + +```bash +# List all running containers +docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" + +# Check networks +docker network ls + +# Check volumes +docker volume ls +``` + +### 2. Test GPU Access + +```bash +# Test in Jellyfin +docker exec jellyfin nvidia-smi + +# Test in Ollama +docker exec ollama nvidia-smi + +# Test in Immich +docker exec immich-machine-learning nvidia-smi +``` + +### 3. 
Test Logging + +```bash +# Check Promtail is collecting logs +docker logs promtail | grep "clients configured" + +# Access Grafana +# https://logs.fig.systems + +# Query logs +# {container="traefik"} +``` + +### 4. Test SSL + +```bash +# Check certificate +curl -vI https://sonarr.fig.systems 2>&1 | grep -i "subject:" + +# Should show valid Let's Encrypt certificate +``` + +## Backup Strategy + +### 1. VM Snapshots (Proxmox) + +```bash +# On Proxmox host +# Create snapshot before major changes +qm snapshot 100 pre-update-$(date +%Y%m%d) + +# List snapshots +qm listsnapshot 100 + +# Restore snapshot +qm rollback 100 +``` + +### 2. Configuration Backup + +```bash +# On VM +cd ~/homelab + +# Backup all configs (excludes data directories) +tar czf homelab-config-$(date +%Y%m%d).tar.gz \ + --exclude='*/data' \ + --exclude='*/db' \ + --exclude='*/pgdata' \ + --exclude='*/config' \ + --exclude='*/models' \ + --exclude='*_data' \ + compose/ + +# Backup to external storage +scp homelab-config-*.tar.gz user@backup-server:/backups/ +``` + +### 3. Automated Backups with Backrest + +Backrest service is included and configured. 
See: +- `compose/services/backrest/` +- Access: https://backup.fig.systems + +## Maintenance + +### Weekly + +```bash +# Update containers +cd ~/homelab +find compose -name "compose.yaml" -type f | while read compose; do + dir=$(dirname "$compose") + echo "Updating $dir" + cd "$dir" + docker compose pull + docker compose up -d + cd ~/homelab +done + +# Clean up old images +docker image prune -a -f + +# Check disk space +df -h +ncdu /media +``` + +### Monthly + +```bash +# Update AlmaLinux +sudo dnf update -y + +# Update NVIDIA drivers (if available) +sudo dnf update nvidia-driver* -y + +# Reboot if kernel updated +sudo reboot +``` + +## Troubleshooting + +### Services Won't Start + +```bash +# Check SELinux denials +sudo ausearch -m avc -ts recent + +# If SELinux is blocking: +sudo setsebool -P container_manage_cgroup on + +# Or relabel directories +sudo restorecon -Rv ~/homelab/compose +``` + +### GPU Not Detected + +```bash +# Check GPU is passed through +lspci | grep -i nvidia + +# Check drivers loaded +lsmod | grep nvidia + +# Reinstall drivers +sudo dnf reinstall nvidia-driver* -y +sudo reboot +``` + +### Network Issues + +```bash +# Check firewall +sudo firewall-cmd --list-all + +# Add ports if needed +sudo firewall-cmd --permanent --add-port=80/tcp +sudo firewall-cmd --permanent --add-port=443/tcp +sudo firewall-cmd --reload + +# Check Docker network +docker network inspect homelab +``` + +### Permission Denied Errors + +```bash +# Check ownership +ls -la ~/homelab/compose/*/ + +# Fix ownership +sudo chown -R $USER:$USER ~/homelab + +# Check SELinux context +ls -Z ~/homelab/compose + +# Fix SELinux labels +sudo chcon -R -t container_file_t ~/homelab/compose +``` + +## Performance Monitoring + +### System Stats + +```bash +# CPU usage +htop + +# GPU usage +watch -n 1 nvidia-smi + +# Disk I/O +iostat -x 1 + +# Network +iftop + +# Per-container stats +docker stats +``` + +### Resource Limits + +Example container resource limits: + +```yaml +# In compose.yaml 
+deploy: + resources: + limits: + cpus: '2.0' + memory: 4G + reservations: + cpus: '1.0' + memory: 2G +``` + +## Security Hardening + +### 1. Disable Root SSH + +```bash +# Edit /etc/ssh/sshd_config +sudo sed -i 's/#PermitRootLogin yes/PermitRootLogin no/' /etc/ssh/sshd_config + +# Restart SSH +sudo systemctl restart sshd +``` + +### 2. Configure Fail2Ban + +```bash +# Install +sudo dnf install -y fail2ban + +# Configure +sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local + +# Edit /etc/fail2ban/jail.local +# [sshd] +# enabled = true +# maxretry = 3 +# bantime = 3600 + +# Start +sudo systemctl enable --now fail2ban +``` + +### 3. Automatic Updates + +```bash +# Install dnf-automatic +sudo dnf install -y dnf-automatic + +# Configure /etc/dnf/automatic.conf +# apply_updates = yes + +# Enable +sudo systemctl enable --now dnf-automatic.timer +``` + +## Next Steps + +1. ✅ VM created and AlmaLinux installed +2. ✅ Docker and NVIDIA drivers configured +3. ✅ Homelab repository cloned +4. ✅ Network and storage configured +5. ⬜ Deploy core services +6. ⬜ Configure SSO +7. ⬜ Deploy all services +8. ⬜ Configure backups +9. ⬜ Set up monitoring + +--- + +**System ready for deployment!** 🚀 \ No newline at end of file