homelab/docs/guides/gpu-setup.md
Claude 4adaa8e8be
docs: Add comprehensive documentation for homelab setup and operations
This commit adds extensive documentation covering all aspects of homelab setup,
configuration, and troubleshooting.

## Documentation Structure

### Main Documentation
- **docs/README.md**: Documentation hub with table of contents
- **docs/getting-started.md**: Complete setup guide from scratch
- **docs/quick-reference.md**: Fast reference for common tasks and commands

### Configuration Guides (docs/guides/)
- **secrets-management.md**: Environment variables and secrets configuration
  - How to generate secure secrets
  - Service-specific configuration
  - Automated secret generation scripts
  - Security best practices
  - Common mistakes to avoid

- **gpu-setup.md**: NVIDIA GTX 1070 GPU acceleration setup
  - Specific to Proxmox 9 on Debian 13
  - Complete passthrough configuration
  - Jellyfin hardware transcoding setup
  - Immich ML inference acceleration
  - Performance tuning and benchmarks
  - Troubleshooting GPU issues

### Troubleshooting (docs/troubleshooting/)
- **faq.md**: Frequently asked questions (60+ Q&A)
  - General questions about the homelab
  - Setup and configuration questions
  - SSL/TLS and SSO questions
  - Service-specific questions
  - Security and backup questions
  - Performance optimization

- **common-issues.md**: Common problems and solutions
  - Service startup failures
  - SSL certificate errors
  - SSO authentication issues
  - Access problems
  - Performance issues
  - Database errors
  - Network issues
  - GPU problems

### Services (docs/services/)
- **README.md**: Complete service overview
  - All 20 services with descriptions
  - Use cases for each service
  - Resource requirements
  - Deployment checklists
  - Service dependencies
  - Minimum viable setups

## Key Features

### Environment-Specific
All GPU documentation is specific to:
- **Platform**: Proxmox 9 (PVE)
- **OS**: Debian 13
- **GPU**: NVIDIA GTX 1070 (Pascal)
- Includes Proxmox-specific GPU passthrough
- VM guest setup on Debian 13
- NVIDIA Container Toolkit configuration

### Comprehensive Coverage
- 60+ FAQs answered
- 50+ common issues documented
- 100+ command examples
- Step-by-step procedures
- Troubleshooting decision trees
- Quick reference tables

### Practical Examples
- Actual command outputs
- Real-world scenarios
- Copy-paste ready commands
- Configuration file examples
- Debugging procedures

## Documentation Highlights

### Getting Started Guide
- Prerequisites checklist
- Docker installation
- Media directory setup
- DNS configuration
- Environment variable setup
- Service deployment order
- Initial service configuration
- Verification procedures

### Secrets Management
- Secret type identification
- Generation commands for each type
- Service-specific requirements
- Automated generation script
- Password manager integration
- Backup procedures
- Security best practices
- Common mistakes

### GPU Setup (Proxmox/Debian/GTX 1070)
- IOMMU enablement
- VFIO configuration
- PCI passthrough to VM
- NVIDIA driver installation on Debian 13
- Container toolkit setup
- Jellyfin NVENC configuration
- Immich CUDA acceleration
- Performance benchmarks
- NVENC stream limit unlock
- Monitoring and tuning

### Quick Reference
- All service URLs
- Common Docker Compose commands
- System check commands
- Secret generation commands
- Troubleshooting steps
- File locations
- Port reference
- Emergency procedures

### FAQ
Covers questions about:
- Hardware requirements
- Domain requirements
- Cost estimates
- Setup procedures
- Configuration details
- SSL certificates
- SSO authentication
- Service-specific issues
- Backup strategies
- Performance optimization
- Security considerations

### Common Issues
Solutions for:
- Container startup failures
- Environment variable errors
- Port conflicts
- Permission issues
- SSL certificate problems
- DNS issues
- SSO login failures
- Database connections
- Network connectivity
- GPU detection
- Resource constraints

### Services Overview
- Detailed description of all 20 services
- Use cases and features
- Required vs optional services
- Resource requirements by tier
- Service dependencies diagram
- Deployment checklists
- "When to use" guidance

## File Structure

```
docs/
├── README.md                           # Documentation hub
├── getting-started.md                  # Setup walkthrough
├── quick-reference.md                  # Command reference
├── guides/
│   ├── secrets-management.md           # Secrets configuration
│   └── gpu-setup.md                    # GPU acceleration (GTX 1070)
├── troubleshooting/
│   ├── faq.md                          # 60+ FAQs
│   └── common-issues.md                # Problem solving
└── services/
    └── README.md                       # Service overview
```

## Benefits

### For New Users
- Clear setup path from zero to running services
- Explains "why" not just "how"
- Common pitfalls documented and avoided
- Example configurations provided

### For Experienced Users
- Quick reference for commands
- Troubleshooting decision trees
- Performance tuning guides
- Advanced configurations

### For Maintenance
- Update procedures
- Backup and restore
- Monitoring guidelines
- Security hardening

## Documentation Standards

- Clear, concise writing
- Code blocks with syntax highlighting
- Examples with expected output
- Warning and tip callouts
- Cross-references between docs
- Tested commands and procedures

## Next Steps

Users should:
1. Start with getting-started.md
2. Configure secrets using secrets-management.md
3. Enable GPU if available (gpu-setup.md)
4. Use quick-reference.md for daily operations
5. Refer to faq.md and common-issues.md when stuck

---

**This documentation makes the homelab accessible to users of all skill levels!**
2025-11-06 19:32:10 +00:00

16 KiB

NVIDIA GPU Acceleration Setup (GTX 1070)

This guide covers setting up NVIDIA GPU acceleration for your homelab running on Proxmox 9 (Debian 13) with an NVIDIA GTX 1070.

Overview

GPU acceleration provides significant benefits:

  • Jellyfin: Hardware video transcoding (H.264, HEVC)
  • Immich: Faster ML inference (face recognition, object detection)
  • Performance: 10-20x faster transcoding vs CPU
  • Efficiency: Lower power consumption, CPU freed for other tasks

Your Hardware:

  • GPU: NVIDIA GTX 1070 (Pascal architecture)
  • Capabilities: NVENC (encoding), NVDEC (decoding), CUDA
  • Max Concurrent Streams: 2 (can be unlocked)
  • Supported Codecs: H.264, HEVC (H.265)

Architecture Overview

Proxmox Host (Debian 13)
  │
  ├─ NVIDIA Drivers (host)
  ├─ NVIDIA Container Toolkit
  │
  └─ Docker VM/LXC
       │
       ├─ GPU passthrough
       │
       └─ Jellyfin/Immich containers
            └─ Hardware transcoding

Part 1: Proxmox Host Setup

Step 1.1: Enable IOMMU (for GPU Passthrough)

Edit GRUB configuration:

# SSH into Proxmox host
ssh root@proxmox-host

# Edit GRUB config
nano /etc/default/grub

Find this line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet"

Replace with (Intel CPU):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

Or (AMD CPU):

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

Update GRUB and reboot:

update-grub
reboot

Verify IOMMU is enabled:

dmesg | grep -e DMAR -e IOMMU

# Should see: "IOMMU enabled"

Step 1.2: Load VFIO Modules

Edit modules:

nano /etc/modules

Add these lines:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Update initramfs:

update-initramfs -u -k all
reboot

Step 1.3: Find GPU PCI ID

lspci -nn | grep -i nvidia

# Example output:
# 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
# 01:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)

Note the IDs: 10de:1b81 and 10de:10f0 (your values may differ)

Step 1.4: Configure VFIO

Create VFIO config:

nano /etc/modprobe.d/vfio.conf

Add (replace with your IDs from above):

options vfio-pci ids=10de:1b81,10de:10f0
softdep nvidia pre: vfio-pci

Blacklist nouveau (open-source NVIDIA driver):

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Update and reboot:

update-initramfs -u -k all
reboot

Verify GPU is bound to VFIO:

lspci -nnk -d 10de:1b81

# Should show:
# Kernel driver in use: vfio-pci

Part 2: VM/LXC Setup

Create Ubuntu 24.04 VM with GPU passthrough:

  1. Create VM in Proxmox UI:

    • OS: Ubuntu 24.04 Server
    • CPU: 4+ cores
    • RAM: 16GB+
    • Disk: 100GB+
  2. Add PCI Device (GPU):

    • Hardware → Add → PCI Device
    • Device: Select your GTX 1070 (01:00.0)
    • All Functions
    • Primary GPU (if no other GPU)
    • PCI-Express
  3. Add PCI Device (GPU Audio):

    • Hardware → Add → PCI Device
    • Device: NVIDIA Audio (01:00.1)
    • All Functions
  4. Machine Settings:

    • Machine: q35
    • BIOS: OVMF (UEFI)
    • Add EFI Disk
  5. Start VM and install Ubuntu

Option B: Using LXC (Advanced, Less Stable)

Note: LXC with GPU is less reliable. VM recommended.

If you insist on LXC:

# Edit LXC config
nano /etc/pve/lxc/VMID.conf

# Add:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file

For this guide, we'll use VM (Option A).

Part 3: VM Guest Setup (Debian 13)

Now we're inside the Ubuntu/Debian VM where Docker runs.

Step 3.1: Install NVIDIA Drivers

SSH into your Docker VM:

ssh user@docker-vm

Update system:

sudo apt update
sudo apt upgrade -y

Debian 13 - Install NVIDIA drivers:

# Add non-free repositories
sudo nano /etc/apt/sources.list

# Add 'non-free non-free-firmware' to each line, example:
deb http://deb.debian.org/debian bookworm main non-free non-free-firmware
deb http://deb.debian.org/debian bookworm-updates main non-free non-free-firmware

# Update and install
sudo apt update
sudo apt install -y linux-headers-$(uname -r)
sudo apt install -y nvidia-driver nvidia-smi

# Reboot
sudo reboot

Verify driver installation:

nvidia-smi

# Should show:
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 535.xx.xx    Driver Version: 535.xx.xx    CUDA Version: 12.2     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
# | 30%   35C    P8    10W / 150W |      0MiB /  8192MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+

Success! Your GTX 1070 is now accessible in the VM.

Step 3.2: Install NVIDIA Container Toolkit

Add NVIDIA Container Toolkit repository:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Install toolkit:

sudo apt update
sudo apt install -y nvidia-container-toolkit

Configure Docker to use NVIDIA runtime:

sudo nvidia-ctk runtime configure --runtime=docker

Restart Docker:

sudo systemctl restart docker

Verify Docker can access GPU:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Should show nvidia-smi output from inside container

Success! Docker can now use your GPU.

Part 4: Configure Jellyfin for GPU Transcoding

Step 4.1: Update Jellyfin Compose File

Edit compose file:

cd ~/homelab/compose/media/frontend/jellyfin
nano compose.yaml

Uncomment the GPU sections:

services:
  jellyfin:
    container_name: jellyfin
    image: lscr.io/linuxserver/jellyfin:latest
    env_file:
      - .env
    volumes:
      - ./config:/config
      - ./cache:/cache
      - /media/movies:/media/movies:ro
      - /media/tv:/media/tv:ro
      - /media/music:/media/music:ro
      - /media/photos:/media/photos:ro
      - /media/homemovies:/media/homemovies:ro
    ports:
      - "8096:8096"
      - "7359:7359/udp"
    restart: unless-stopped
    networks:
      - homelab
    labels:
      traefik.enable: true
      traefik.http.routers.jellyfin.rule: Host(`flix.fig.systems`) || Host(`flix.edfig.dev`)
      traefik.http.routers.jellyfin.entrypoints: websecure
      traefik.http.routers.jellyfin.tls.certresolver: letsencrypt
      traefik.http.services.jellyfin.loadbalancer.server.port: 8096

    # UNCOMMENT THESE LINES FOR GTX 1070:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

networks:
  homelab:
    external: true

Restart Jellyfin:

docker compose down
docker compose up -d

Check logs:

docker compose logs -f

# Should see lines about NVENC/CUDA being detected

Step 4.2: Enable in Jellyfin UI

  1. Go to https://flix.fig.systems
  2. Dashboard → Playback → Transcoding
  3. Hardware acceleration: NVIDIA NVENC
  4. Enable hardware decoding for:
    • H264
    • HEVC
    • VC1
    • VP8
    • MPEG2
  5. Enable hardware encoding
  6. Enable encoding in HEVC format
  7. Save

Step 4.3: Test Transcoding

  1. Play a video in Jellyfin web UI
  2. Click Settings (gear icon) → Quality
  3. Select a lower bitrate to force transcoding
  4. In another terminal:
    nvidia-smi
    
    # While video is transcoding, should see:
    # GPU utilization: 20-40%
    # Memory usage: 500-1000MB
    

Success! Jellyfin is using your GTX 1070!

Part 5: Configure Immich for GPU Acceleration

Immich can use GPU for two purposes:

  1. ML Inference (face recognition, object detection)
  2. Video Transcoding

Step 5.1: ML Inference (CUDA)

Edit Immich compose file:

cd ~/homelab/compose/media/frontend/immich
nano compose.yaml

Change ML image to CUDA version:

Find this line:

image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}

Change to:

image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda

Add GPU support:

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    networks:
      - immich_internal

    # ADD THESE LINES:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Step 5.2: Video Transcoding (NVENC)

For video transcoding, add to immich-server:

  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # ... existing config ...

    # ADD THESE LINES:
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Restart Immich:

docker compose down
docker compose up -d

Step 5.3: Enable in Immich UI

  1. Go to https://photos.fig.systems

  2. Administration → Settings → Video Transcoding

  3. Transcoding: h264 (NVENC)

  4. Hardware Acceleration: NVIDIA

  5. Save

  6. Administration → Settings → Machine Learning

  7. Facial Recognition: Enabled

  8. Object Detection: Enabled

  9. Should automatically use CUDA

Step 5.4: Test ML Inference

  1. Upload photos with faces
  2. In terminal:
    nvidia-smi
    
    # While processing, should see:
    # GPU utilization: 50-80%
    # Memory usage: 2-4GB
    

Success! Immich is using GPU for ML inference!

Part 6: Performance Tuning

GTX 1070 Specific Settings

Jellyfin optimal settings:

  • Hardware acceleration: NVIDIA NVENC
  • Target transcode bandwidth: Let clients decide
  • Enable hardware encoding: Yes
  • Prefer OS native DXVA or VA-API hardware decoders: No
  • Allow encoding in HEVC format: Yes (GTX 1070 supports HEVC)

Immich optimal settings:

  • Transcoding: h264 or hevc
  • Target resolution: 1080p (for GTX 1070)
  • CRF: 23 (good balance)
  • Preset: fast

Unlock NVENC Stream Limit

GTX 1070 is limited to 2 concurrent transcoding streams. You can unlock unlimited streams:

Install patch:

# Inside Docker VM
git clone https://github.com/keylase/nvidia-patch.git
cd nvidia-patch
sudo bash ./patch.sh

# Reboot
sudo reboot

Verify:

nvidia-smi

# Now supports unlimited concurrent streams

⚠️ Note: This is a hack that modifies NVIDIA driver. Use at your own risk.

Monitor GPU Usage

Real-time monitoring:

watch -n 1 nvidia-smi

Check GPU usage from Docker:

docker stats $(docker ps --format '{{.Names}}' | grep -E 'jellyfin|immich')

Troubleshooting

GPU Not Detected in VM

Check from Proxmox host:

lspci | grep -i nvidia

Check from VM:

lspci | grep -i nvidia
nvidia-smi

If not visible in VM:

  1. Verify IOMMU is enabled (dmesg | grep IOMMU)
  2. Check PCI passthrough is configured correctly
  3. Ensure VM is using q35 machine type
  4. Verify BIOS is OVMF (UEFI)

Docker Can't Access GPU

Error: could not select device driver "" with capabilities: [[gpu]]

Fix:

# Reconfigure NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Test again
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Jellyfin Shows "No Hardware Acceleration Available"

Check:

# Verify container has GPU access
docker exec jellyfin nvidia-smi

# Check Jellyfin logs
docker logs jellyfin | grep -i nvenc

Fix:

  1. Ensure runtime: nvidia is uncommented
  2. Verify deploy.resources.reservations.devices is configured
  3. Restart container: docker compose up -d

Transcoding Fails with "Failed to Open GPU"

Check:

# GPU might be busy
nvidia-smi

# Kill processes using GPU
sudo fuser -v /dev/nvidia*

Low GPU Utilization During Transcoding

Normal: GTX 1070 is powerful. 20-40% utilization is expected for single stream.

To max out GPU:

  • Transcode multiple streams simultaneously
  • Use higher resolution source (4K)
  • Enable HEVC encoding

Performance Benchmarks (GTX 1070)

Typical Performance:

  • 4K HEVC → 1080p H.264: ~120-150 FPS (real-time)
  • 1080p H.264 → 720p H.264: ~300-400 FPS
  • Concurrent streams: 4-6 (after unlocking limit)
  • Power draw: 80-120W during transcoding
  • Temperature: 55-65°C

Compare to CPU (typical 4-core):

  • 4K HEVC → 1080p H.264: ~10-15 FPS
  • CPU would be at 100% utilization
  • GPU: 10-15x faster!

Monitoring and Maintenance

Create GPU Monitoring Dashboard

Install nvtop (nvidia-top):

sudo apt install nvtop

Run:

nvtop

Shows real-time GPU usage, memory, temperature, processes.

Check GPU Health

# Temperature
nvidia-smi --query-gpu=temperature.gpu --format=csv

# Memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv

# Fan speed
nvidia-smi --query-gpu=fan.speed --format=csv

# Power draw
nvidia-smi --query-gpu=power.draw,power.limit --format=csv

Automated Monitoring

Add to cron:

crontab -e

# Add:
*/5 * * * * nvidia-smi --query-gpu=utilization.gpu,memory.used,temperature.gpu --format=csv,noheader >> /var/log/gpu-stats.log

Next Steps

GPU is now configured for Jellyfin and Immich!

Recommended:

  1. Test transcoding with various file formats
  2. Upload photos to Immich and verify ML inference works
  3. Monitor GPU temperature and utilization
  4. Consider unlocking NVENC stream limit
  5. Set up automated monitoring

Optional:

  • Configure Tdarr for batch transcoding using GPU
  • Set up Plex (also supports NVENC)
  • Use GPU for other workloads (AI, rendering)

Reference

Quick Command Reference

# Check GPU from host (Proxmox)
lspci | grep -i nvidia

# Check GPU from VM
nvidia-smi

# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Monitor GPU real-time
watch -n 1 nvidia-smi

# Check Jellyfin GPU usage
docker exec jellyfin nvidia-smi

# Restart Jellyfin with GPU
cd ~/homelab/compose/media/frontend/jellyfin
docker compose down && docker compose up -d

# View GPU processes
nvidia-smi pmon

# GPU temperature
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader

GTX 1070 Specifications

  • Architecture: Pascal (GP104)
  • CUDA Cores: 1920
  • Memory: 8GB GDDR5
  • Memory Bandwidth: 256 GB/s
  • TDP: 150W
  • NVENC: 6th generation (H.264, HEVC)
  • NVDEC: 2nd generation
  • Concurrent Streams: 2 (unlockable to unlimited)

Your GTX 1070 is now accelerating your homelab! 🚀