docs: Add architecture docs and fix compose files for integration

Claude 2025-11-10 11:32:13 +00:00
parent 9fbd003798
commit 07a8154fea
6 changed files with 1610 additions and 4 deletions

README.md

@ -2,6 +2,23 @@
This repository contains Docker Compose configurations for self-hosted home services.
## 💻 Hardware Specifications
- **Host**: Proxmox VE 9 (Debian 13)
  - CPU: AMD Ryzen 5 7600X (6 cores, 12 threads, up to 5.3 GHz)
  - GPU: NVIDIA GeForce GTX 1070 (8GB VRAM)
  - RAM: 32GB DDR5
- **VM**: AlmaLinux 9.6 (RHEL 9 compatible)
  - CPU: 8 vCPUs
  - RAM: 24GB
  - Storage: 500GB+ (expandable)
  - GPU: GTX 1070 (PCIe passthrough)

**Documentation:**
- [Complete Architecture Guide](docs/architecture.md) - Integration, networking, logging, GPU setup
- [AlmaLinux VM Setup](docs/setup/almalinux-vm.md) - Full installation and configuration guide
## 🏗️ Infrastructure
### Core Services (Port 80/443)
@ -199,9 +216,21 @@ Each service has its own `.env` file where applicable. Key files to review:
- `core/lldap/.env` - LDAP configuration and admin credentials
- `core/tinyauth/.env` - LDAP connection and session settings
- `media/frontend/immich/.env` - Photo management configuration
- `services/karakeep/.env` - AI-powered bookmark manager
- `services/ollama/.env` - Local LLM configuration
- `services/microbin/.env` - Pastebin configuration
**Example Configuration Files:**
Several services include `.example` config files for reference:
- `media/automation/sonarr/config.xml.example`
- `media/automation/radarr/config.xml.example`
- `media/automation/sabnzbd/sabnzbd.ini.example`
- `media/automation/qbittorrent/qBittorrent.conf.example`
- `services/vikunja/config.yml.example`
- `services/FreshRSS/config.php.example`
Copy these to the appropriate location (usually `./config/`) and customize as needed.
## 🔧 Maintenance
### Viewing Logs
@ -241,6 +270,83 @@ Important data locations:
2. Check LLDAP connection in tinyauth logs
3. Verify LDAP bind credentials match in both services
### GPU not detected
1. Check GPU passthrough: `lspci | grep -i nvidia`
2. Verify drivers: `nvidia-smi`
3. Test in container: `docker exec ollama nvidia-smi`
4. See [AlmaLinux VM Setup](docs/setup/almalinux-vm.md) for GPU configuration
## 📊 Monitoring & Logging
### Centralized Logging (Loki + Promtail + Grafana)
All container logs are automatically collected and stored in Loki:
**Access Grafana**: https://logs.fig.systems
**Query examples:**
```logql
# View logs for specific service
{container="sonarr"}
# Lines containing "ERROR" (a text match, not a parsed log level)
{container="radarr"} |= "ERROR"
# Multiple services
{container=~"sonarr|radarr"}
# Search with JSON parsing
{container="karakeep"} |= "ollama" | json
```
**Retention**: 30 days (configurable in `compose/monitoring/logging/loki-config.yaml`)
### Uptime Monitoring (Uptime Kuma)
Monitor service availability and performance:
**Access Uptime Kuma**: https://status.fig.systems
**Features:**
- HTTP(s) monitoring for all web services
- Docker container health checks
- SSL certificate expiration alerts
- Public/private status pages
- 90+ notification integrations (Discord, Slack, Email, etc.)
### Service Integration
**How services integrate:**
```
Traefik (Reverse Proxy)
├─→ All services (SSL + routing)
└─→ Let's Encrypt (certificates)
Tinyauth (SSO)
├─→ LLDAP (user authentication)
└─→ Protected services (authorization)
Promtail (Log Collection)
├─→ Docker socket (all containers)
└─→ Loki (log storage)
Loki (Log Storage)
└─→ Grafana (visualization)
Karakeep (Bookmarks)
├─→ Ollama (AI tagging)
├─→ Meilisearch (search)
└─→ Chrome (web archiving)
Sonarr/Radarr (Media Automation)
├─→ SABnzbd/qBittorrent (downloads)
├─→ Jellyfin (media library)
└─→ Recyclarr/Profilarr (quality management)
```
See [Architecture Guide](docs/architecture.md) for complete integration details.
## 📄 License
This is a personal homelab configuration. Use at your own risk.

@ -5,7 +5,36 @@ services:
```yaml
  booklore:
    container_name: booklore
    image: ghcr.io/lorebooks/booklore:latest
    restart: unless-stopped
    env_file:
      - .env
    volumes:
      - ./data:/app/data
    networks:
      - homelab
    labels:
      # Traefik
      traefik.enable: true
      traefik.docker.network: homelab
      # Web UI
      traefik.http.routers.booklore.rule: Host(`booklore.fig.systems`) || Host(`booklore.edfig.dev`)
      traefik.http.routers.booklore.entrypoints: websecure
      traefik.http.routers.booklore.tls.certresolver: letsencrypt
      traefik.http.services.booklore.loadbalancer.server.port: 3000
      # SSO Protection
      traefik.http.routers.booklore.middlewares: tinyauth
      # Homarr Discovery
      homarr.name: Booklore
      homarr.group: Services
      homarr.icon: mdi:book-open-variant
networks:
  homelab:
    external: true
```

@ -5,17 +5,36 @@ services:
```yaml
  microbin:
    container_name: microbin
    image: danielszabo99/microbin:latest
    restart: unless-stopped
    env_file:
      - .env
    volumes:
      - ./data:/app/data
    networks:
      - homelab
    labels:
      # Traefik
      traefik.enable: true
      traefik.docker.network: homelab
      # Web UI
      traefik.http.routers.microbin.rule: Host(`paste.fig.systems`) || Host(`paste.edfig.dev`)
      traefik.http.routers.microbin.entrypoints: websecure
      traefik.http.routers.microbin.tls.certresolver: letsencrypt
      traefik.http.services.microbin.loadbalancer.server.port: 8080
      # Note: MicroBin has its own auth, SSO disabled by default
      # traefik.http.routers.microbin.middlewares: tinyauth
      # Homarr Discovery
      homarr.name: MicroBin
      homarr.group: Services
      homarr.icon: mdi:content-paste
networks:
  homelab:
    external: true
```

@ -6,7 +6,36 @@ services:
```yaml
    container_name: rsshub
    # Using chromium-bundled image for full puppeteer support
    image: diygod/rsshub:chromium-bundled
    restart: unless-stopped
    env_file:
      - .env
    volumes:
      - ./data:/app/data
    networks:
      - homelab
    labels:
      # Traefik
      traefik.enable: true
      traefik.docker.network: homelab
      # Web UI
      traefik.http.routers.rsshub.rule: Host(`rsshub.fig.systems`) || Host(`rsshub.edfig.dev`)
      traefik.http.routers.rsshub.entrypoints: websecure
      traefik.http.routers.rsshub.tls.certresolver: letsencrypt
      traefik.http.services.rsshub.loadbalancer.server.port: 1200
      # Note: RSSHub is public by design, SSO disabled
      # traefik.http.routers.rsshub.middlewares: tinyauth
      # Homarr Discovery
      homarr.name: RSSHub
      homarr.group: Services
      homarr.icon: mdi:rss-box
networks:
  homelab:
    external: true
```

docs/architecture.md

@ -0,0 +1,648 @@
# Homelab Architecture & Integration
Complete integration guide for the homelab setup on AlmaLinux 9.6.
## 🖥️ Hardware Specifications
### Host System
- **Hypervisor**: Proxmox VE 9 (Debian 13 based)
- **CPU**: AMD Ryzen 5 7600X (6 cores, 12 threads, up to 5.3 GHz)
- **GPU**: NVIDIA GeForce GTX 1070 (8GB VRAM, 1920 CUDA cores)
- **RAM**: 32GB DDR5
### VM Configuration
- **OS**: AlmaLinux 9.6 (RHEL 9 compatible)
- **CPU**: 8 vCPUs (allocated from host)
- **RAM**: 24GB (leaving 8GB for host)
- **Storage**: 500GB+ (adjust based on media library size)
- **GPU**: GTX 1070 (PCIe passthrough from Proxmox)
## 🏗️ Architecture Overview
### Network Architecture
```
Internet
    ↓
[Router/Firewall]
    ↓ (Port 80/443)
[Traefik Reverse Proxy]
    ↓
┌──────────────────────────────────────┐
│           homelab network            │
│ (Docker bridge, e.g. 172.18.0.0/16)  │
│                                      │
│  ┌─────────────┐  ┌──────────────┐   │
│  │ Core        │  │ Media        │   │
│  │ - Traefik   │  │ - Jellyfin   │   │
│  │ - LLDAP     │  │ - Sonarr     │   │
│  │ - Tinyauth  │  │ - Radarr     │   │
│  └─────────────┘  └──────────────┘   │
│                                      │
│  ┌─────────────┐  ┌──────────────┐   │
│  │ Services    │  │ Monitoring   │   │
│  │ - Karakeep  │  │ - Loki       │   │
│  │ - Ollama    │  │ - Promtail   │   │
│  │ - Vikunja   │  │ - Grafana    │   │
│  └─────────────┘  └──────────────┘   │
└──────────────────────────────────────┘
    ↓
[Promtail Agent]
    ↓
[Loki Storage]
```
### Service Internal Networks
Services with databases use isolated internal networks:
```
karakeep
├── homelab (external traffic)
└── karakeep_internal
    ├── karakeep (app)
    ├── karakeep-chrome (browser)
    └── karakeep-meilisearch (search)

vikunja
├── homelab (external traffic)
└── vikunja_internal
    ├── vikunja (app)
    └── vikunja-db (postgres)

monitoring/logging
├── homelab (external traffic)
└── logging_internal
    ├── loki (storage)
    ├── promtail (collector)
    └── grafana (UI)
```
## 🔐 Security Architecture
### Authentication Flow
```
User Request
    ↓
[Traefik] → Check route rules
    ↓
[Tinyauth Middleware] → Forward Auth
    ↓
[LLDAP] → Verify credentials
    ↓
[Backend Service] → Authorized access
```
### SSL/TLS
- **Certificate Provider**: Let's Encrypt
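- **Challenge Type**: HTTP-01 (Let's Encrypt validates over port 80, so it must stay reachable)
- **Automatic Renewal**: Via Traefik
- **Domains** (one certificate per router, covering the hostnames in its rule; HTTP-01 cannot issue wildcard certificates):
  - Primary: `<service>.fig.systems`
  - Fallback: `<service>.edfig.dev`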
### SSO Protection
**Protected Services** (require authentication):
- Traefik Dashboard
- LLDAP
- Sonarr, Radarr, SABnzbd, qBittorrent
- Profilarr, Recyclarr (monitoring)
- Homarr, Backrest
- Karakeep, Vikunja, LubeLogger
- Calibre-web, Booklore, FreshRSS, File Browser
- Loki API, Ollama API
**Unprotected Services** (own authentication):
- Tinyauth (SSO provider itself)
- Jellyfin (own user system)
- Jellyseerr (linked to Jellyfin)
- Immich (own user system)
- RSSHub (public feed generator)
- MicroBin (public pastebin)
- Grafana (own authentication)
- Uptime Kuma (own authentication)
## 📊 Logging Architecture
### Centralized Logging with Loki
All services forward logs to Loki via Promtail:
```
[Docker Container] → stdout/stderr
        ↓
[Docker Socket] → /var/run/docker.sock
        ↓
[Promtail] → Scrapes logs via Docker API
        ↓
[Loki] → Stores and indexes logs
        ↓
[Grafana] → Query and visualize
```
### Log Labels
Promtail's scrape config attaches these labels to every log line:
- `container`: Container name
- `compose_project`: Docker Compose project
- `compose_service`: Service name from compose
- `image`: Docker image name
- `stream`: stdout or stderr
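
Docker service discovery exposes these values only as `__meta_docker_*` meta labels, which Promtail drops unless a relabel rule copies them into real labels. A minimal sketch of a scrape config that would yield most of the labels above, assuming Promtail's `docker_sd_configs` and the default socket path; `write_promtail_scrape_config` is a hypothetical helper, and `image` is left out of the sketch. Compare the output with the repository's actual Promtail config rather than replacing that file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Promtail scrape_configs fragment that copies
# Docker service-discovery meta labels into the query labels listed above.
write_promtail_scrape_config() {
  local out=$1
  cat >"$out" <<'EOF'
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Docker reports names with a leading slash ("/sonarr"); strip it
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container
      # Compose records the project and service as container labels
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: compose_project
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: compose_service
      # stdout or stderr
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: stream
EOF
}

# When run as a script, write to the path given as the first argument
if [ $# -gt 0 ]; then
  write_promtail_scrape_config "$1"
fi
```

Called as `write_promtail_scrape_config /tmp/promtail-scrape.yaml`, it writes the fragment to that path for review.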
### Log Retention
- **Default**: 30 days
- **Storage**: `compose/monitoring/logging/loki-data/`
- **Automatic cleanup**: Enabled via Loki compactor
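
Loki enforces retention through its compactor, and `retention_period` takes a duration, so 30 days is written as `720h`. A minimal sketch that prints the relevant fragment for a given number of days, assuming Loki 3.x option names (`retention_enabled` and `delete_request_store` under `compactor`, `retention_period` under `limits_config`) and a `/loki` data path inside the container; `loki_retention_fragment` is a hypothetical helper, and its output is meant to be merged into `loki-config.yaml` by hand after checking it against the deployed Loki version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Loki config fragment that turns on
# compactor-based retention for the given number of days (default 30).
loki_retention_fragment() {
  local days=${1:-30}
  local hours=$((days * 24)) # retention_period is a duration: 30 days -> 720h
  cat <<EOF
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
limits_config:
  retention_period: ${hours}h
EOF
}

loki_retention_fragment 30
```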
### Querying Logs
**View all logs for a service:**
```logql
{container="sonarr"}
```
**Filter lines containing "ERROR":**
```logql
{container="radarr"} |= "ERROR"
```
**Multiple services:**
```logql
{container=~"sonarr|radarr"}
```
**Line filter plus JSON parsing:**
```logql
{container="karakeep"} |= "ollama" | json
```
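
The same LogQL works against Loki's HTTP API, which helps when scripting checks. A minimal sketch, assuming Loki is reachable at `http://loki:3100` (its default port; override with `LOKI_URL`) and ASCII-only queries; `urlencode` and `logql_url` are hypothetical helpers. Encoding matters because LogQL is full of braces, quotes, and pipes.

```bash
#!/usr/bin/env bash
# Percent-encode a string for a URL query parameter (ASCII input assumed).
urlencode() {
  local s=$1 out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;; # "'c" makes printf use the char code
    esac
  done
  printf '%s\n' "$out"
}

# Build a query_range URL for a LogQL expression (limit defaults to 100).
logql_url() {
  local base=${LOKI_URL:-http://loki:3100} query=$1 limit=${2:-100}
  printf '%s/loki/api/v1/query_range?query=%s&limit=%s\n' \
    "$base" "$(urlencode "$query")" "$limit"
}

logql_url '{container="sonarr"} |= "ERROR"'
```

From a container attached to the logging network (or with `LOKI_URL` pointing at Loki), `curl -s "$(logql_url '{container="sonarr"} |= "ERROR"' 20)"` fetches up to 20 matching lines; without `start`/`end`, `query_range` covers the last hour by default.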
## 🌐 Network Configuration
### Docker Networks
**homelab** (external bridge):
- Type: External bridge network
- Subnet: Auto-assigned by Docker
- Purpose: Inter-service communication + Traefik routing
- Create: `docker network create homelab`
**Service-specific internal networks**:
- `karakeep_internal`: Karakeep + Chrome + Meilisearch
- `vikunja_internal`: Vikunja + PostgreSQL
- `logging_internal`: Loki + Promtail + Grafana
- etc.
### Port Mappings
**External Ports** (exposed to host):
- `80/tcp`: HTTP (Traefik) - redirects to HTTPS
- `443/tcp`: HTTPS (Traefik)
- `6881/tcp+udp`: BitTorrent (qBittorrent)
**No other ports exposed** - all access via Traefik reverse proxy.
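
A quick way to confirm that nothing else publishes host ports is to search the compose files for a `ports:` key. A rough sketch, assuming stacks are defined in files named `compose.yaml`; it is a text search rather than a YAML parse, so commented-out `# ports:` lines are ignored but unusual formatting could slip through. `find_published_ports` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# List compose.yaml files under ROOT that publish host ports via a ports: key.
# Commented-out "# ports:" lines do not match because only whitespace may
# precede the key. Expected hits: Traefik (80/443) and qBittorrent (6881).
find_published_ports() {
  local root=${1:-compose}
  grep -rlE --include=compose.yaml '^[[:space:]]*ports:' "$root" | sort
}
```

Running `find_published_ports ~/homelab/compose` should list only the Traefik and qBittorrent stacks.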
## 🔧 Traefik Integration
### Standard Traefik Labels
All services use consistent Traefik labels:
```yaml
labels:
  # Enable Traefik
  traefik.enable: true
  traefik.docker.network: homelab
  # Router configuration
  traefik.http.routers.<service>.rule: Host(`<service>.fig.systems`) || Host(`<service>.edfig.dev`)
  traefik.http.routers.<service>.entrypoints: websecure
  traefik.http.routers.<service>.tls.certresolver: letsencrypt
  # Service configuration (backend port)
  traefik.http.services.<service>.loadbalancer.server.port: <port>
  # SSO middleware (if protected)
  traefik.http.routers.<service>.middlewares: tinyauth
  # Homarr auto-discovery
  homarr.name: <Service Name>
  homarr.group: <Category>
  homarr.icon: mdi:<icon-name>
```
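
Because every stack repeats this label set, a small lint can catch a missing key before `docker compose up`. A minimal sketch, assuming router and service names match the `<service>` placeholder and labels use the `key: value` form shown above; `check_traefik_labels` is a hypothetical helper that only checks the required keys (the `middlewares` and Homarr labels are optional), and the dots in each key are matched loosely as regex wildcards.

```bash
#!/usr/bin/env bash
# Print "missing: <key>" for each required Traefik label absent from FILE for
# SERVICE; returns non-zero if any are missing. Commented-out labels such as
# "# traefik.http.routers.x.middlewares" do not count as present.
check_traefik_labels() {
  local file=$1 svc=$2 missing=0 key
  for key in \
    traefik.enable \
    traefik.docker.network \
    "traefik.http.routers.${svc}.rule" \
    "traefik.http.routers.${svc}.entrypoints" \
    "traefik.http.routers.${svc}.tls.certresolver" \
    "traefik.http.services.${svc}.loadbalancer.server.port"; do
    if ! grep -q "^[[:space:]]*${key}:" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, assuming the MicroBin stack lives at `compose/services/microbin/compose.yaml`, `check_traefik_labels compose/services/microbin/compose.yaml microbin` prints one `missing:` line per absent label.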
### Middleware
**tinyauth** - Forward authentication:
```yaml
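# Dynamic-configuration form; as Docker labels in traefik/compose.yaml the
# same settings are traefik.http.middlewares.tinyauth.forwardauth.* keys
middlewares:
  tinyauth:
    forwardAuth:
      address: http://tinyauth:8080
      trustForwardHeader: true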
```
## 💾 Volume Management
### Volume Types
**Bind Mounts** (host directories):
```yaml
volumes:
  - ./data:/data          # Service data
  - ./config:/config      # Configuration files
  - /media:/media         # Media library (shared)
```
**Named Volumes** (Docker-managed):
```yaml
volumes:
  - loki-data:/loki       # Loki storage
  - postgres-data:/var/lib/postgresql/data
```
### Media Directory Structure
```
/media/
├── tv/ # TV shows (Sonarr → Jellyfin)
├── movies/ # Movies (Radarr → Jellyfin)
├── music/ # Music
├── photos/ # Photos (Immich)
├── books/ # Ebooks (Calibre-web)
├── audiobooks/ # Audiobooks
├── comics/ # Comics
├── homemovies/ # Home videos
├── downloads/ # Active downloads (SABnzbd/qBittorrent)
├── complete/ # Completed downloads
└── incomplete/ # In-progress downloads
```
### Backup Strategy
**Important directories to backup:**
```
compose/core/lldap/data/ # User directory
compose/core/traefik/letsencrypt/ # SSL certificates
compose/services/*/config/ # Service configurations
compose/services/*/data/ # Service data
/media/ # Media library
```
**Excluded from backups:**
```
compose/services/*/db/ # Databases (backup via dump)
compose/monitoring/logging/loki-data/ # Logs (disposable; back up only if you need log history)
/media/downloads/ # Temporary downloads
/media/incomplete/ # Incomplete downloads
```
## 🎮 GPU Acceleration
### NVIDIA GTX 1070 Configuration
**GPU Passthrough (Proxmox → VM):**
1. **Proxmox host** (`/etc/pve/nodes/<node>/qemu-server/<vmid>.conf`):
   ```
   hostpci0: 0000:01:00,pcie=1,x-vga=1
   ```
   `pcie=1` requires the VM to use the `q35` machine type.
2. **VM (AlmaLinux)** - Install NVIDIA drivers:
   ```bash
   # Add NVIDIA repository
   sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
   # Install drivers (the GTX 1070 is Pascal, which needs the proprietary
   # kernel module; the open "nvidia-open" modules support Turing and newer only)
   sudo dnf install nvidia-driver nvidia-settings
   # Verify (after a reboot)
   nvidia-smi
   ```
3. **Docker** - Install NVIDIA Container Toolkit:
   ```bash
   # Add NVIDIA Container Toolkit repo
   sudo dnf config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
   # Install toolkit
   sudo dnf install nvidia-container-toolkit
   # Configure Docker
   sudo nvidia-ctk runtime configure --runtime=docker
   sudo systemctl restart docker
   # Verify
   docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
   ```
### Services Using GPU
**Jellyfin** (Hardware transcoding):
```yaml
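# Uncomment in compose.yaml
# NVENC/NVDEC need the NVIDIA runtime; /dev/dri is for Intel/AMD VA-API/QSV
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
environment:
  - NVIDIA_VISIBLE_DEVICES=all
  - NVIDIA_DRIVER_CAPABILITIES=all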
```
**Immich** (AI features):
```yaml
# Already configured
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```
**Ollama** (LLM inference):
```yaml
# Uncomment in compose.yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```
### GPU Performance Tuning
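**Rough estimates for Ryzen 5 7600X + GTX 1070** (actual throughput varies with codecs, models, quantization, and driver version):
- **Jellyfin**: about 4-6 simultaneous 4K → 1080p transcodes (GeForce drivers also cap the number of concurrent NVENC sessions)
- **Ollama**:
  - 3B models: 40-60 tokens/sec
  - 7B models: 20-35 tokens/sec
  - 13B models: 10-15 tokens/sec (quantized)
- **Immich**: AI tagging ~5-10 images/sec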
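
The model sizes above follow from simple arithmetic: weight memory is roughly billions of parameters × bits per weight ÷ 8, in GB. A minimal sketch of that estimate, assuming 4-bit weights by default; it ignores the KV cache and runtime overhead, and common 4-bit formats store a little more than 4 bits per weight, so treat the results as lower bounds. `model_vram_gb` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Approximate weight memory in GB: billions of parameters * bits per weight / 8.
# Excludes KV cache and runtime overhead, so real usage is higher.
model_vram_gb() {
  local params_b=$1 bits=${2:-4}
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

for size in 3 7 13; do
  echo "${size}B @ 4-bit: ~$(model_vram_gb "$size") GB of weights (GTX 1070 has 8 GB)"
done
```

A 13B model at 4 bits needs about 6.5 GB for weights alone, leaving little of the card's 8 GB for context, while at 8 bits (about 13 GB) it would not fit at all.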
## 🚀 Resource Allocation
### CPU Allocation (Ryzen 5 7600X - 6C/12T)
**High Priority** (4-6 cores):
- Jellyfin (transcoding)
- Sonarr/Radarr (media processing)
- Ollama (when running)
**Medium Priority** (2-4 cores):
- Immich (AI processing)
- Karakeep (bookmark processing)
- SABnzbd/qBittorrent (downloads)
**Low Priority** (1-2 cores):
- Traefik, LLDAP, Tinyauth
- Monitoring services
- Other utilities
### RAM Allocation (32GB Total, 24GB VM)
**Recommended allocation:**
```
Host (Proxmox): 8GB
VM Total: 24GB breakdown:
├── System: 4GB (AlmaLinux base)
├── Docker: 2GB (daemon overhead)
├── Jellyfin: 2-4GB (transcoding buffers)
├── Immich: 2-3GB (ML models + database)
├── Sonarr/Radarr: 1GB each
├── Ollama: 4-6GB (when running models)
├── Databases: 2-3GB total
├── Monitoring: 2GB (Loki + Grafana)
└── Other services: 4-5GB
```
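
Adding up the table shows how tight it is: the low ends sum to exactly the 24GB given to the VM, while the high ends reach 31GB. A minimal sketch of that arithmetic, with Sonarr and Radarr listed separately at 1GB each; `ram_budget` is a hypothetical helper and the figures are the table's own estimates.

```bash
#!/usr/bin/env bash
# Sum "name:low:high" entries (GB) and print "low high".
ram_budget() {
  local low=0 high=0 entry name lo hi
  for entry in "$@"; do
    IFS=: read -r name lo hi <<<"$entry"
    low=$((low + lo))
    high=$((high + hi))
  done
  echo "$low $high"
}

# Figures from the allocation table above (Sonarr and Radarr at 1GB each)
read -r low high <<<"$(ram_budget \
  system:4:4 docker:2:2 jellyfin:2:4 immich:2:3 sonarr:1:1 radarr:1:1 \
  ollama:4:6 databases:2:3 monitoring:2:2 other:4:5)"
vm_ram=24
echo "table total: ${low}-${high}GB, VM RAM: ${vm_ram}GB"
if ((high > vm_ram)); then
  echo "worst case exceeds the VM by $((high - vm_ram))GB"
fi
```

The 7GB gap means Ollama, Jellyfin, and Immich peaking at the same time would push the VM into swap or trigger the OOM killer.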
### Disk Space Planning
**System:** 100GB
**Docker:** 50GB (images + containers)
**Service Data:** 50GB (configs, databases, logs)
**Media Library:** Remaining space (expandable)
**Recommended VM disk:**
- Minimum: 500GB (200GB system + 300GB media)
- Recommended: 1TB+ (allows room for growth)
## 🔄 Service Dependencies
### Startup Order
**Critical order for initial deployment:**
1. **Networks**: `docker network create homelab`
2. **Core** (must start first):
   - Traefik (reverse proxy)
   - LLDAP (user directory)
   - Tinyauth (SSO provider)
3. **Monitoring** (optional but recommended):
   - Loki + Promtail + Grafana
   - Uptime Kuma
4. **Media Automation**:
   - Sonarr, Radarr
   - SABnzbd, qBittorrent
   - Recyclarr, Profilarr
5. **Media Frontend**:
   - Jellyfin
   - Jellyseerr
   - Immich
6. **Services**:
   - Karakeep, Ollama (AI features)
   - Vikunja, Homarr
   - All other services
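
The startup order can be scripted so that a failure stops everything downstream. A minimal sketch, assuming each stack has its own directory containing a `compose.yaml`; `deploy_stacks` is a hypothetical helper, and setting `COMPOSE=echo` makes it a dry run that prints the stacks in order without starting them.

```bash
#!/usr/bin/env bash
# Bring stacks up one directory at a time, in the order given, and stop at
# the first failure so later stacks never start without their dependencies.
deploy_stacks() {
  local root=$1 dir
  shift
  for dir in "$@"; do
    if [[ ! -f "$root/$dir/compose.yaml" ]]; then
      echo "skip: $dir (no compose.yaml)" >&2
      continue
    fi
    echo "deploying: $dir"
    # COMPOSE is left unquoted on purpose so "docker compose" splits into
    # a command and its subcommand; the subshell keeps the caller's cwd.
    (cd "$root/$dir" && ${COMPOSE:-docker compose} up -d) || return 1
  done
}
```

With the directory names used in the AlmaLinux setup guide, the core and monitoring steps would be `deploy_stacks ~/homelab/compose core/traefik core/lldap core/tinyauth monitoring/logging monitoring/uptime`. Unlike the manual steps, this does not wait for LLDAP to become ready before starting Tinyauth.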
### Service Integration Map
```
Traefik
├─→ All services (reverse proxy)
└─→ Let's Encrypt (SSL)
Tinyauth
├─→ LLDAP (authentication backend)
└─→ All SSO-protected services
LLDAP
└─→ User database for SSO
Promtail
├─→ Docker socket (log collection)
└─→ Loki (log forwarding)
Loki
└─→ Grafana (log visualization)
Karakeep
├─→ Ollama (AI tagging)
├─→ Meilisearch (search)
└─→ Chrome (web archiving)
Jellyseerr
├─→ Jellyfin (media info)
├─→ Sonarr (TV requests)
└─→ Radarr (movie requests)
Sonarr/Radarr
├─→ SABnzbd/qBittorrent (downloads)
├─→ Jellyfin (media library)
└─→ Recyclarr/Profilarr (quality profiles)
Homarr
└─→ All services (dashboard auto-discovery)
```
## 🐛 Troubleshooting
### Check Service Health
```bash
# All services status
cd ~/homelab
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Logs for specific service
docker logs <service-name> --tail 100 -f
# Logs via Loki/Grafana
# Go to https://logs.fig.systems
# Query: {container="<service-name>"}
```
### Network Issues
```bash
# Check homelab network exists
docker network ls | grep homelab
# Inspect network
docker network inspect homelab
# Test service connectivity
docker exec <service-a> ping <service-b>
docker exec karakeep curl http://ollama:11434
```
### GPU Not Detected
```bash
# Check GPU in VM
nvidia-smi
# Check Docker can access GPU
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# Check service GPU allocation
docker exec jellyfin nvidia-smi
docker exec ollama nvidia-smi
```
### SSL Certificate Issues
```bash
# Check Traefik logs
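docker logs traefik 2>&1 | grep -i certificate
# Force certificate renewal (last resort: deletes every stored certificate,
# and reissuing them all counts against Let's Encrypt rate limits)
docker exec traefik rm -f /letsencrypt/acme.json
docker restart traefik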
# Verify DNS
dig +short sonarr.fig.systems
```
### SSO Not Working
```bash
# Check Tinyauth status
docker logs tinyauth
# Check LLDAP connection
docker exec tinyauth nc -zv lldap 3890
docker exec tinyauth nc -zv lldap 17170
# Verify credentials match
grep LDAP_BIND_PASSWORD compose/core/tinyauth/.env
grep LLDAP_LDAP_USER_PASS compose/core/lldap/.env
```
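
Eyeballing the two `grep` results prints both secrets to the terminal. A minimal sketch that compares the values without echoing them, assuming plain `KEY=value` lines (optionally quoted, no `export` prefix); `env_value` and `same_env_value` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Print the value of KEY from a KEY=value file (last occurrence wins),
# stripping one layer of surrounding double or single quotes.
env_value() {
  local file=$1 key=$2 line val=""
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line == "$key="* ]] && val=${line#*=}
  done <"$file"
  val=${val%\"}; val=${val#\"}
  val=${val%\'}; val=${val#\'}
  printf '%s\n' "$val"
}

# Succeed if KEY_A in FILE_A equals KEY_B in FILE_B; never prints the values.
same_env_value() {
  local a b
  a=$(env_value "$1" "$2")
  b=$(env_value "$3" "$4")
  [[ -n $a && $a == "$b" ]]
}
```

Usage, with the same pair of keys as the commands above: `same_env_value compose/core/tinyauth/.env LDAP_BIND_PASSWORD compose/core/lldap/.env LLDAP_LDAP_USER_PASS && echo match || echo MISMATCH`.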
## 📈 Monitoring Best Practices
### Key Metrics to Monitor
**System Level:**
- CPU usage per container
- Memory usage per container
- Disk I/O
- Network throughput
- GPU utilization (for Jellyfin/Ollama/Immich)
**Application Level:**
- Traefik request rate
- Failed authentication attempts
- Jellyfin concurrent streams
- Download speeds (SABnzbd/qBittorrent)
- Sonarr/Radarr queue size
### Uptime Kuma Monitoring
Configure monitors for:
- **HTTP(s)**: All web services (200 status check)
- **TCP**: Database ports (PostgreSQL, etc.)
- **Docker**: Container health (via Docker socket)
- **SSL**: Certificate expiration (30-day warning)
### Log Monitoring
Set up Loki alerts for:
- ERROR level logs
- Authentication failures
- Service crashes
- Disk space warnings
## 🔧 Maintenance Tasks
### Daily
- Check Uptime Kuma dashboard
- Review any critical alerts
### Weekly
- Check disk space: `df -h`
- Review failed downloads in Sonarr/Radarr
- Check Loki logs for errors
### Monthly
- Update all containers: run `docker compose pull && docker compose up -d` in each stack directory
- Review and clean old Docker images: `docker image prune -a`
- Backup configurations
- Check SSL certificate renewal
### Quarterly
- Review and update documentation
- Clean up old media (if needed)
- Review and adjust quality profiles
- Update Recyclarr configurations
## 📚 Additional Resources
- [Traefik Documentation](https://doc.traefik.io/traefik/)
- [Docker Compose Best Practices](https://docs.docker.com/compose/production/)
- [Loki LogQL Guide](https://grafana.com/docs/loki/latest/logql/)
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/)
- [Proxmox GPU Passthrough](https://pve.proxmox.com/wiki/PCI_Passthrough)
- [AlmaLinux Documentation](https://wiki.almalinux.org/)
---
**System Ready!** 🚀

docs/setup/almalinux-vm.md

@ -0,0 +1,775 @@
# AlmaLinux 9.6 VM Setup Guide
Complete setup guide for the homelab VM on AlmaLinux 9.6 running on Proxmox VE 9.
## Hardware Context
- **Host**: Proxmox VE 9 (Debian 13 based)
  - CPU: AMD Ryzen 5 7600X (6C/12T, 5.3 GHz boost)
  - GPU: NVIDIA GTX 1070 (8GB VRAM)
  - RAM: 32GB DDR5
- **VM Allocation**:
  - OS: AlmaLinux 9.6 (RHEL 9 compatible)
  - CPU: 8 vCPUs
  - RAM: 24GB
  - Disk: 500GB+ (expandable)
  - GPU: GTX 1070 (PCIe passthrough)
## Proxmox VM Creation
### 1. Create VM
```bash
# On Proxmox host (q35 machine type is needed for pcie=1 GPU passthrough)
qm create 100 \
  --name homelab \
  --memory 24576 \
  --cores 8 \
  --cpu host \
  --sockets 1 \
  --machine q35 \
  --net0 virtio,bridge=vmbr0 \
  --scsi0 local-lvm:500 \
  --ostype l26 \
  --boot 'order=scsi0;ide2'
# Attach AlmaLinux ISO (used for booting while scsi0 is still empty)
qm set 100 --ide2 local:iso/AlmaLinux-9.6-x86_64-dvd.iso,media=cdrom
# Enable UEFI
qm set 100 --bios ovmf --efidisk0 local-lvm:1
```
### 2. GPU Passthrough
**Find GPU PCI address:**
```bash
lspci | grep -i nvidia
# Example output: 01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070]
```
**Enable IOMMU in Proxmox:**
Edit `/etc/default/grub`:
```bash
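# AMD CPU (Ryzen 5 7600X): the IOMMU is on by default once enabled in the
# UEFI setup ("amd_iommu=on" is not a recognized value); iommu=pt is optional
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"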
```
Update GRUB and reboot:
```bash
# Hosts booting via systemd-boot: edit /etc/kernel/cmdline, then run proxmox-boot-tool refresh
update-grub
reboot
```
**Verify IOMMU:**
```bash
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
# Should show IOMMU enabled
```
**Add GPU to VM:**
Edit `/etc/pve/qemu-server/100.conf`:
```
hostpci0: 0000:01:00,pcie=1,x-vga=1
```
Or via command:
```bash
qm set 100 --hostpci0 0000:01:00,pcie=1,x-vga=1
```
**Blacklist GPU on host:**
Edit `/etc/modprobe.d/blacklist-nvidia.conf`:
```
blacklist nouveau
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
```
Update initramfs:
```bash
update-initramfs -u
reboot
```
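
Blacklisting keeps the host drivers away, and binding the card to `vfio-pci` makes sure it is claimed for passthrough at boot. A minimal sketch, assuming the GTX 1070's usual vendor:device IDs (`10de:1b81` for the GPU and `10de:10f0` for its HDMI audio function); confirm them with `lspci -nn -s 01:00` first. `write_vfio_conf` is a hypothetical helper that writes into the directory you pass, so the output can be checked before it goes into `/etc/modprobe.d`.

```bash
#!/usr/bin/env bash
# Write DIR/vfio.conf binding the given vendor:device IDs to vfio-pci, and
# load vfio-pci before nouveau/nvidia so neither can claim the card first.
write_vfio_conf() {
  local dir=$1
  shift
  local ids
  ids=$(IFS=,; echo "$*") # join IDs with commas
  {
    printf 'options vfio-pci ids=%s\n' "$ids"
    printf 'softdep nouveau pre: vfio-pci\n'
    printf 'softdep nvidia pre: vfio-pci\n'
  } >"$dir/vfio.conf"
}
```

On the host, running `write_vfio_conf /etc/modprobe.d 10de:1b81 10de:10f0` as root, listing `vfio`, `vfio_iommu_type1`, and `vfio_pci` in `/etc/modules`, then `update-initramfs -u` and a reboot completes the binding; `lspci -nnk -s 01:00` should then report `Kernel driver in use: vfio-pci`.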
## AlmaLinux Installation
### 1. Install AlmaLinux 9.6
Start VM and follow installer:
1. **Language**: English (US)
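2. **Installation Destination**: Use all space. Automatic partitioning caps `/` at about 70GB and gives the rest to `/home`, so either enlarge `/` (which holds `/media` and `/var/lib/docker`) or put the media library on a separate disk (see Storage Setup)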
3. **Network**: Enable and set hostname to `homelab.fig.systems`
4. **Software Selection**: Minimal Install
5. **Root Password**: Set strong password
6. **User Creation**: Create admin user (e.g., `homelab`)
### 2. Post-Installation Configuration
```bash
# SSH into VM
ssh homelab@<vm-ip>
# Update system
sudo dnf update -y
# Install essential tools
sudo dnf install -y \
vim \
git \
curl \
wget \
htop \
ncdu \
tree \
tmux \
bind-utils \
net-tools \
firewalld
# Enable and configure firewall
sudo systemctl enable --now firewalld
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
```
### 3. Configure Static IP (Optional)
```bash
# Find connection name
nmcli connection show
CON="ens18"  # example; use the NAME shown by the command above
# Set static IP (example: 192.168.1.100)
sudo nmcli connection modify "$CON" \
  ipv4.addresses 192.168.1.100/24 \
  ipv4.gateway 192.168.1.1 \
  ipv4.dns "1.1.1.1,8.8.8.8" \
  ipv4.method manual
# Re-activate to apply; over SSH, "down" would drop the session before "up"
# ran. Reconnect on the new address afterwards.
sudo nmcli connection up "$CON"
```
## Docker Installation
### 1. Install Docker Engine
```bash
# Remove old versions
sudo dnf remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine
# Add Docker repository (dnf config-manager comes from dnf-plugins-core)
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker
sudo dnf install -y \
docker-ce \
docker-ce-cli \
containerd.io \
docker-buildx-plugin \
docker-compose-plugin
# Start Docker
sudo systemctl enable --now docker
# Verify
sudo docker run hello-world
```
### 2. Configure Docker
**Add user to docker group:**
```bash
sudo usermod -aG docker $USER
newgrp docker
# Verify (no sudo needed)
docker ps
```
**Configure Docker daemon:**
Create `/etc/docker/daemon.json`:
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "storage-driver": "overlay2",
  "features": {
    "buildkit": true
  }
}
```
Restart Docker:
```bash
sudo systemctl restart docker
```
## NVIDIA GPU Setup
### 1. Install NVIDIA Drivers
```bash
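# Add EPEL repository (provides dkms for building the driver module)
sudo dnf install -y epel-release
# Kernel headers for the running kernel
sudo dnf install -y kernel-devel-$(uname -r) kernel-headers
# Add NVIDIA repository
sudo dnf config-manager --add-repo \
  https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
# Install drivers. The GTX 1070 (Pascal) needs the proprietary kernel module;
# the open "nvidia-open" modules only support Turing and newer GPUs.
sudo dnf install -y \
  nvidia-driver \
  nvidia-driver-cuda \
  nvidia-settings \
  nvidia-persistenced
# Reboot to load drivers
sudo reboot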
```
### 2. Verify GPU
```bash
# Check driver version
nvidia-smi
# Expected output:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 535.xx.xx Driver Version: 535.xx.xx CUDA Version: 12.2 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | 0 GeForce GTX 1070 Off | 00000000:01:00.0 Off | N/A |
# +-------------------------------+----------------------+----------------------+
```
### 3. Install NVIDIA Container Toolkit
```bash
# Add NVIDIA Container Toolkit repository
sudo dnf config-manager --add-repo \
https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
# Install toolkit
sudo dnf install -y nvidia-container-toolkit
# Configure Docker to use nvidia runtime
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker
sudo systemctl restart docker
# Test GPU in container
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```
## Storage Setup
### 1. Create Media Directory
```bash
# Create media directory structure
sudo mkdir -p /media/{tv,movies,music,photos,books,audiobooks,comics,homemovies}
sudo mkdir -p /media/{downloads,complete,incomplete}
# Set ownership
sudo chown -R $USER:$USER /media
# Set permissions
chmod -R 755 /media
```
### 2. Mount Additional Storage (Optional)
If using separate disk for media:
```bash
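# Find disk
lsblk
# Format disk (example: /dev/sdb; this erases it)
sudo mkfs.ext4 /dev/sdb
# Get UUID
UUID=$(sudo blkid -s UUID -o value /dev/sdb)
# Add to /etc/fstab
echo "UUID=$UUID /media ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab
# Mount (before creating the directories in step 1: mounting over /media
# hides anything already created there)
sudo systemctl daemon-reload
sudo mount -a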
```
## Homelab Repository Setup
### 1. Clone Repository
```bash
# Create workspace
mkdir -p ~/homelab
cd ~/homelab
# Clone repository
git clone https://github.com/efigueroa/homelab.git .
# Or if using SSH
git clone git@github.com:efigueroa/homelab.git .
```
### 2. Create Docker Network
```bash
# Create homelab network
docker network create homelab
# Verify
docker network ls | grep homelab
```
### 3. Configure Environment Variables
```bash
# Generate secrets for all services
cd ~/homelab
# LLDAP
cd compose/core/lldap
openssl rand -hex 32 > /tmp/lldap_jwt_secret
openssl rand -base64 32 | tr -d /=+ | cut -c1-32 > /tmp/lldap_pass
# Update .env with generated secrets
# Tinyauth
cd ../tinyauth
openssl rand -hex 32 > /tmp/tinyauth_session
# Update .env (LDAP_BIND_PASSWORD must match LLDAP)
# Continue for all services...
```
See [`docs/guides/secrets-management.md`](../guides/secrets-management.md) for complete guide.
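
The "update .env" steps can also be scripted without staging secrets in `/tmp`. A minimal sketch, assuming plain `KEY=value` lines; `set_env_var` is a hypothetical helper that replaces a key in place or appends it, and leaves the file readable only by its owner.

```bash
#!/usr/bin/env bash
# Set KEY=VALUE in an env file: replace the existing line if present,
# otherwise append. Other lines and their order are kept intact.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp line found=0
  tmp=$(mktemp) || return 1
  if [[ -f $file ]]; then
    while IFS= read -r line || [[ -n $line ]]; do
      if [[ $line == "$key="* ]]; then
        printf '%s=%s\n' "$key" "$value"
        found=1
      else
        printf '%s\n' "$line"
      fi
    done <"$file" >"$tmp"
  fi
  if ((!found)); then
    printf '%s=%s\n' "$key" "$value" >>"$tmp"
  fi
  # Secrets: keep the file readable by the owner only
  chmod 600 "$tmp" && mv "$tmp" "$file"
}
```

For example, from `~/homelab` with the key names this guide already uses: `pass=$(openssl rand -base64 32 | tr -d /=+ | cut -c1-32)`, then `set_env_var compose/core/lldap/.env LLDAP_LDAP_USER_PASS "$pass"` and `set_env_var compose/core/tinyauth/.env LDAP_BIND_PASSWORD "$pass"`; LLDAP's `LLDAP_JWT_SECRET` can be filled the same way from `openssl rand -hex 32`.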
## SELinux Configuration
AlmaLinux uses SELinux by default. Configure for Docker:
```bash
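# Check SELinux status
getenforce
# Should show: Enforcing
# Lets containers manage cgroups (only needed for systemd-based images);
# it does not affect access to bind mounts
sudo setsebool -P container_manage_cgroup on
# If you encounter permission denied errors on bind mounts:
# Option 1: Label directories for container access. chcon works but is undone
# by a relabel; semanage fcontext + restorecon keeps the label persistent.
sudo dnf install -y policycoreutils-python-utils
sudo semanage fcontext -a -t container_file_t "$HOME/homelab/compose(/.*)?"
sudo semanage fcontext -a -t container_file_t "/media(/.*)?"
sudo restorecon -R ~/homelab/compose /media
# Option 2: Use the :Z (private, one container) or :z (shared) volume flag
# Example: ./data:/data:Z  (use :z for /media, which several containers share)
# Option 3: Set SELinux to permissive (not recommended)
# sudo setenforce 0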
```
## System Tuning
### 1. Increase File Limits
```bash
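# Raise open-file limits for login sessions (containers inherit their
# limits from the docker service, not from this file)
sudo tee /etc/security/limits.d/90-homelab.conf >/dev/null <<'EOF'
* soft nofile 65536
* hard nofile 65536
EOF
# More inotify watches for library scanners (Jellyfin, Sonarr, Radarr).
# fs.file-max is left alone: the default is already far above 65536, and
# lowering it would cap open files for the whole system.
echo "fs.inotify.max_user_watches = 524288" | sudo tee /etc/sysctl.d/90-homelab-inotify.conf
# Apply
sudo sysctl --system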
```
### 2. Optimize for Media Server
```bash
# Network tuning (a drop-in file, so re-running does not duplicate lines)
sudo tee /etc/sysctl.d/90-homelab-network.conf >/dev/null <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
EOF
# Apply
sudo sysctl --system
```
### 3. CPU Governor (Ryzen 5 7600X)
```bash
# Run on the Proxmox host: KVM guests normally expose no cpufreq driver,
# so these commands have no effect inside the AlmaLinux VM
apt install -y linux-cpupower
# Set to performance mode
cpupower frequency-set -g performance
# Not persistent: the governor resets at boot, so re-apply it from a boot-time job
```
## Deployment
### 1. Deploy Core Services
```bash
cd ~/homelab
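# Create network (no-op if it already exists from the repository setup step)
docker network inspect homelab >/dev/null 2>&1 || docker network create homelab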
# Deploy Traefik
cd compose/core/traefik
docker compose up -d
# Deploy LLDAP
cd ../lldap
docker compose up -d
# Wait for LLDAP to be ready (30 seconds)
sleep 30
# Deploy Tinyauth
cd ../tinyauth
docker compose up -d
```
### 2. Configure LLDAP
```bash
# Access LLDAP web UI
# https://lldap.fig.systems
# 1. Login with admin credentials from .env
# 2. Create observer user for tinyauth
# 3. Create regular users
```
### 3. Deploy Monitoring
```bash
cd ~/homelab
# Deploy logging stack
cd compose/monitoring/logging
docker compose up -d
# Deploy uptime monitoring
cd ../uptime
docker compose up -d
```
### 4. Deploy Services
See [`README.md`](../../README.md) for complete deployment order.
## Verification
### 1. Check All Services
```bash
# List all running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Check networks
docker network ls
# Check volumes
docker volume ls
```
### 2. Test GPU Access
```bash
# Test in Jellyfin
docker exec jellyfin nvidia-smi
# Test in Ollama
docker exec ollama nvidia-smi
# Test in Immich
docker exec immich-machine-learning nvidia-smi
```
### 3. Test Logging
```bash
# Check Promtail is collecting logs
docker logs promtail 2>&1 | grep -i "clients configured"
# Access Grafana
# https://logs.fig.systems
# Query logs
# {container="traefik"}
```
### 4. Test SSL
```bash
# Check certificate
curl -vI https://sonarr.fig.systems 2>&1 | grep -iE "subject:|issuer:"
# The issuer line should name Let's Encrypt
```
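
To see how close a certificate is to expiry, its `notAfter` date can be turned into a day count. A minimal sketch, assuming GNU `date` (standard on AlmaLinux); `cert_days_left` is a hypothetical helper that accepts the line printed by `openssl x509 -noout -enddate`.

```bash
#!/usr/bin/env bash
# Days until a certificate's notAfter date, relative to NOW (epoch seconds,
# default: current time). Accepts "notAfter=Jan 31 00:00:00 2030 GMT" or
# just the date part. Relies on GNU date's -d parsing.
cert_days_left() {
  local not_after=${1#notAfter=} now=${2:-$(date +%s)} end
  end=$(date -u -d "$not_after" +%s) || return 1
  echo $(((end - now) / 86400))
}
```

For a live check: `cert_days_left "$(echo | openssl s_client -connect sonarr.fig.systems:443 -servername sonarr.fig.systems 2>/dev/null | openssl x509 -noout -enddate)"`. Traefik renews Let's Encrypt certificates about 30 days before expiry by default, so a value that keeps falling below 30 points to a renewal problem.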
## Backup Strategy
### 1. VM Snapshots (Proxmox)
```bash
# On Proxmox host
# Create snapshot before major changes
qm snapshot 100 pre-update-$(date +%Y%m%d)
# List snapshots
qm listsnapshot 100
# Restore snapshot
qm rollback 100 <snapshot-name>
```
### 2. Configuration Backup
```bash
# On VM
cd ~/homelab
# Back up compose and .env files (the archive contains secrets); data,
# databases, ./config directories, and models are excluded
tar czf homelab-config-$(date +%Y%m%d).tar.gz \
--exclude='*/data' \
--exclude='*/db' \
--exclude='*/pgdata' \
--exclude='*/config' \
--exclude='*/models' \
--exclude='*_data' \
compose/
# Backup to external storage
scp homelab-config-*.tar.gz user@backup-server:/backups/
```
### 3. Automated Backups with Backrest
Backrest service is included and configured. See:
- `compose/services/backrest/`
- Access: https://backup.fig.systems
## Maintenance
### Weekly
```bash
# Update containers
cd ~/homelab
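find compose -name "compose.yaml" -type f | while read -r compose; do
  dir=$(dirname "$compose")
  echo "Updating $dir"
  # Subshell keeps the loop in ~/homelab; </dev/null stops docker from
  # consuming the file list the loop is reading
  (cd "$dir" && docker compose pull && docker compose up -d) </dev/null
done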
# Clean up old images
docker image prune -a -f
# Check disk space
df -h
ncdu /media
```
### Monthly
```bash
# Update AlmaLinux
sudo dnf update -y
# Update NVIDIA drivers (if available)
sudo dnf update -y 'nvidia-driver*'
# Reboot if kernel updated
sudo reboot
```
## Troubleshooting
### Services Won't Start
```bash
# Check SELinux denials
sudo ausearch -m avc -ts recent
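# If SELinux is blocking:
# container_manage_cgroup only matters for containers that run systemd
sudo setsebool -P container_manage_cgroup on
# Or relabel directories (restorecon resets labels to policy defaults, which
# undoes chcon labels unless a matching semanage fcontext rule exists)
sudo restorecon -Rv ~/homelab/compose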
```
### GPU Not Detected
```bash
# Check GPU is passed through
lspci | grep -i nvidia
# Check drivers loaded
lsmod | grep nvidia
# Reinstall drivers
sudo dnf reinstall -y 'nvidia-driver*'
sudo reboot
```
### Network Issues
```bash
# Check firewall
sudo firewall-cmd --list-all
# Add ports if needed
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --reload
# Check Docker network
docker network inspect homelab
```
### Permission Denied Errors
```bash
# Check ownership
ls -la ~/homelab/compose/*/
# Fix ownership
sudo chown -R $USER:$USER ~/homelab
# Check SELinux context
ls -Z ~/homelab/compose
# Fix SELinux labels
sudo chcon -R -t container_file_t ~/homelab/compose
```
## Performance Monitoring
### System Stats
```bash
# CPU usage
htop
# GPU usage
watch -n 1 nvidia-smi
# Disk I/O (iostat is in the sysstat package)
iostat -x 1
# Network (iftop is available from EPEL)
iftop
# Per-container stats
docker stats
```
### Resource Limits
Example container resource limits:
```yaml
# In compose.yaml
deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 4G
    reservations:
      cpus: '1.0'
      memory: 2G
```
## Security Hardening
### 1. Disable Root SSH
```bash
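# AlmaLinux 9 ships "#PermitRootLogin prohibit-password", so a sed on a "yes"
# line matches nothing; use a drop-in (00- sorts first; sshd keeps the first value)
echo "PermitRootLogin no" | sudo tee /etc/ssh/sshd_config.d/00-no-root-login.conf
# Validate, then restart SSH
sudo sshd -t && sudo systemctl restart sshd
# Confirm the effective value
sudo sshd -T | grep -i permitrootlogin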
```
### 2. Configure Fail2Ban
```bash
# Install
sudo dnf install -y fail2ban
# Configure
sudo cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
# Edit /etc/fail2ban/jail.local
# [sshd]
# enabled = true
# maxretry = 3
# bantime = 3600
# Start
sudo systemctl enable --now fail2ban
```
### 3. Automatic Updates
```bash
# Install dnf-automatic
sudo dnf install -y dnf-automatic
# Configure /etc/dnf/automatic.conf
# apply_updates = yes
# Enable
sudo systemctl enable --now dnf-automatic.timer
```
## Next Steps
1. ✅ VM created and AlmaLinux installed
2. ✅ Docker and NVIDIA drivers configured
3. ✅ Homelab repository cloned
4. ✅ Network and storage configured
5. ⬜ Deploy core services
6. ⬜ Configure SSO
7. ⬜ Deploy all services
8. ⬜ Configure backups
9. ⬜ Set up monitoring
---
**System ready for deployment!** 🚀