homelab/terraform/proxmox-examples/docker-host/README.md
Claude 9109712b59
feat: Add SOPS state management and fix Terraform deployment issues
SOPS State Management:
- Implemented Git + SOPS + age encryption for Terraform state files
- Added .gitignore files to prevent committing unencrypted secrets
- Created .sops.yaml.example template for age encryption configuration
- Created helper scripts for automated encryption/decryption workflow:
  - scripts/tf: Wrapper script with auto-encrypt/decrypt
  - scripts/tf-encrypt: Manual encryption of state files
  - scripts/tf-decrypt: Manual decryption of state files
- Added comprehensive STATE_MANAGEMENT.md documentation covering:
  - Installation of age and SOPS
  - Initial setup and key generation
  - Daily workflow examples
  - Security best practices
  - Troubleshooting common issues
  - Multi-user key management
  - Backup strategies

Terraform Deployment Fixes:
- Added snippets_storage variable for cloud-init snippet storage
- Fixed datastore error: "local" does not support snippets
- Updated README with solutions for datastore and SSH issues
- Added troubleshooting for:
  - Enabling snippets on existing storage (pvesm set)
  - Creating dedicated directory storage for snippets
  - SSH authentication setup with ssh-agent
  - Manual cloud-init snippet creation workaround

Files modified:
- terraform/proxmox-examples/docker-host/main.tf
- terraform/proxmox-examples/docker-host/variables.tf
- terraform/proxmox-examples/docker-host/terraform.tfvars.example
- terraform/proxmox-examples/docker-host/README.md

Files added:
- .gitignore (root level)
- terraform/proxmox-examples/docker-host/.gitignore
- terraform/proxmox-examples/docker-host/.sops.yaml.example
- terraform/proxmox-examples/docker-host/STATE_MANAGEMENT.md
- terraform/proxmox-examples/docker-host/scripts/tf
- terraform/proxmox-examples/docker-host/scripts/tf-encrypt
- terraform/proxmox-examples/docker-host/scripts/tf-decrypt
2025-11-11 05:55:07 +00:00


# Docker Host VM with OpenTofu
This configuration creates a VM optimized for running Docker containers in your homelab with support for GPU passthrough and NFS media mounts.
## What This Creates
- ✅ Ubuntu or AlmaLinux VM (from cloud template)
- ✅ Docker & Docker Compose installed
- ✅ Homelab network created
- ✅ /media directories structure
- ✅ SSH key authentication
- ✅ Automatic updates enabled
- ✅ Optional GPU passthrough (NVIDIA GTX 1070)
- ✅ Optional NFS mounts from Proxmox host
## Prerequisites
### 1. Create Ubuntu Cloud Template
First, create a cloud-init enabled template in Proxmox:
```bash
# SSH to Proxmox server
ssh root@proxmox.local
# Download Ubuntu cloud image
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
# Create VM
qm create 9000 --name ubuntu-cloud-template --memory 2048 --net0 virtio,bridge=vmbr0
# Import disk
qm importdisk 9000 jammy-server-cloudimg-amd64.img local-lvm
# Attach disk
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
# Add cloud-init drive
qm set 9000 --ide2 local-lvm:cloudinit
# Set boot disk
qm set 9000 --boot c --bootdisk scsi0
# Add serial console
qm set 9000 --serial0 socket --vga serial0
# Convert to template
qm template 9000
# Cleanup
rm jammy-server-cloudimg-amd64.img
```
**Or create AlmaLinux 9.6 Cloud Template:**
```bash
# SSH to Proxmox server
ssh root@proxmox.local
# Download AlmaLinux cloud image
wget https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2
# Create VM
qm create 9001 --name almalinux-cloud-template --memory 2048 --net0 virtio,bridge=vmbr0
# Import disk
qm importdisk 9001 AlmaLinux-9-GenericCloud-latest.x86_64.qcow2 local-lvm
# Attach disk
qm set 9001 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9001-disk-0
# Add cloud-init drive
qm set 9001 --ide2 local-lvm:cloudinit
# Set boot disk
qm set 9001 --boot c --bootdisk scsi0
# Add serial console
qm set 9001 --serial0 socket --vga serial0
# Convert to template
qm template 9001
# Cleanup
rm AlmaLinux-9-GenericCloud-latest.x86_64.qcow2
```
### 2. (Optional) Enable GPU Passthrough
**For NVIDIA GTX 1070 on AMD Ryzen CPU:**
```bash
# On Proxmox host, edit GRUB config
nano /etc/default/grub
# Set GRUB_CMDLINE_LINUX_DEFAULT to:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
# Update GRUB
update-grub
# Load required kernel modules
nano /etc/modules
# Add these lines:
#   vfio
#   vfio_iommu_type1
#   vfio_pci
#   vfio_virqfd   (omit on kernel 6.2 or newer, where it is built into vfio)
# Blacklist NVIDIA drivers on host
nano /etc/modprobe.d/blacklist.conf
# Add:
#   blacklist nouveau
#   blacklist nvidia
#   blacklist nvidiafb
#   blacklist nvidia_drm
# Update initramfs
update-initramfs -u -k all
# Reboot Proxmox host
reboot
# After reboot, verify IOMMU is enabled (AMD hosts log "AMD-Vi" lines):
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
# Find GPU PCI ID:
lspci | grep -i nvidia
# Output example: 01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1070]
# Use: 0000:01:00 (domain 0000 prepended, function suffix ".0" dropped)
```
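The `lspci` address and the `gpu_pci_id` value differ only in the domain prefix and the function suffix. The conversion can be sketched as below, assuming the GPU sits in PCI domain `0000` whenever `lspci` omits the domain. `to_gpu_pci_id` is a hypothetical helper name, not something shipped in this repo.

```bash
#!/usr/bin/env bash
# Convert an lspci address such as "01:00.0" into the "0000:01:00" form
# used for gpu_pci_id. Assumes PCI domain 0000 when lspci omits the domain.
to_gpu_pci_id() {
  local addr="$1"
  addr="${addr%.*}"              # drop the function suffix (".0")
  case "$addr" in
    *:*:*) ;;                    # domain already present (e.g. from lspci -D)
    *) addr="0000:${addr}" ;;    # prepend the assumed default domain
  esac
  printf '%s\n' "$addr"
}

# Example: feed it the first NVIDIA device reported by lspci
# to_gpu_pci_id "$(lspci | grep -i nvidia | head -n1 | cut -d' ' -f1)"
```

Dropping the function suffix means the value addresses the whole device, which normally covers the GPU's audio function (`01:00.1`) as well.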
### 3. (Optional) Configure NFS Server on Proxmox
**Export media directories from Proxmox host:**
```bash
# On Proxmox host
# Install NFS server
apt update
apt install nfs-kernel-server -y
# Create /etc/exports entry
nano /etc/exports
# Add (replace 192.168.1.0/24 with your network):
/data/media/audiobooks 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/books 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/comics 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/complete 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/downloads 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/homemovies 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/incomplete 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/movies 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/music 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/photos 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
/data/media/tv 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
# Export NFS shares
exportfs -ra
# Enable and start NFS server
systemctl enable nfs-server
systemctl start nfs-server
# Verify exports
showmount -e localhost
```
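Typing eleven near-identical export lines by hand is error-prone, so a small generator may help. This is a sketch that reuses the directory names, subnet, and options from the example above. `generate_exports` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print one /etc/exports line per media directory.
# Defaults are the example base path and subnet used in this README.
generate_exports() {
  local base="${1:-/data/media}" subnet="${2:-192.168.1.0/24}"
  local opts="rw,sync,no_subtree_check,no_root_squash"
  local dir
  for dir in audiobooks books comics complete downloads homemovies \
             incomplete movies music photos tv; do
    printf '%s/%s %s(%s)\n' "$base" "$dir" "$subnet" "$opts"
  done
}

# Review the output first, then append it on the Proxmox host:
# generate_exports /data/media 192.168.1.0/24 >> /etc/exports && exportfs -ra
```

Printing to stdout instead of writing the file directly keeps the helper safe to run anywhere and lets you inspect the lines before they reach `/etc/exports`.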
### 4. Create API Token
In Proxmox UI:
1. Datacenter → Permissions → API Tokens
2. Add → User: `root@pam`, Token ID: `terraform`
3. Uncheck "Privilege Separation"
4. Save the secret! Proxmox displays it only once.
### 5. Install OpenTofu
```bash
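# Linux/macOS: download the installer, then run it with an explicit install method
curl --proto '=https' --tlsv1.2 -fsSL https://get.opentofu.org/install-opentofu.sh -o install-opentofu.sh
chmod +x install-opentofu.sh
./install-opentofu.sh --install-method standalone
rm -f install-opentofu.sh
# Verify
tofu version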
```
## Quick Start
### 1. Configure Variables
```bash
cd terraform/proxmox-examples/docker-host
# Copy example config
cp terraform.tfvars.example terraform.tfvars
# Edit with your values
nano terraform.tfvars
```
**Required changes:**
- `pm_api_token_secret` - Your Proxmox API secret
- `vm_ssh_keys` - Your SSH public key
- `vm_password` - Set a secure password
**Optional changes:**
- `vm_name` - Change VM name
- `vm_cores` / `vm_memory` - Adjust resources
- `vm_ip_address` - Set static IP (or keep DHCP)
- `vm_os_type` - Choose "ubuntu", "almalinux", or "debian"
- `template_vm_id` - Use 9001 for AlmaLinux template
- `enable_gpu_passthrough` - Set to true for GPU support
- `gpu_pci_id` - Your GPU PCI ID (find with `lspci`)
- `mount_media_directories` - Set to true for NFS mounts
- `proxmox_host_ip` - IP for NFS server (Proxmox host)
- `media_source_path` - Path on Proxmox host (default: /data/media)
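Before running `tofu plan`, it can be useful to confirm that the three required variables are present. The sketch below only checks that an uncommented assignment exists. It cannot tell whether a placeholder value was actually changed, and `missing_required_vars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# List required variables that have no assignment in a tfvars file.
# Prints nothing and returns 0 when all three are present.
missing_required_vars() {
  local file="$1" var missing=0
  for var in pm_api_token_secret vm_ssh_keys vm_password; do
    # Match an uncommented "name = ..." assignment at the start of a line
    if ! grep -Eq "^[[:space:]]*${var}[[:space:]]*=" "$file"; then
      printf '%s\n' "$var"
      missing=1
    fi
  done
  return "$missing"
}

# Example: missing_required_vars terraform.tfvars && tofu plan
```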
### 2. Initialize
```bash
tofu init
```
### 3. Plan
```bash
tofu plan
```
Review what will be created.
### 4. Apply
```bash
tofu apply
```
Type `yes` to confirm.
### 5. Connect
```bash
# Get SSH command from output
tofu output ssh_command
# Or manually
ssh ubuntu@<VM-IP>
# Verify Docker
docker --version
docker ps
docker network ls | grep homelab
```
## Configuration Options
### Resource Sizing
**Light workload (1-5 containers):**
```hcl
vm_cores = 2
vm_memory = 4096
disk_size = "30"
```
**Medium workload (5-15 containers):**
```hcl
vm_cores = 4
vm_memory = 8192
disk_size = "50"
```
**Heavy workload (15+ containers):**
```hcl
vm_cores = 8
vm_memory = 16384
disk_size = "100"
```
### Network Configuration
**DHCP (easiest):**
```hcl
vm_ip_address = "dhcp"
```
**Static IP:**
```hcl
vm_ip_address = "192.168.1.100"
vm_ip_netmask = 24
vm_gateway = "192.168.1.1"
```
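A static address only works if the gateway lies inside the network defined by the address and netmask. A minimal check is sketched below, assuming plain dotted IPv4 values without leading zeros. `ip_to_int` and `same_subnet` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Convert a dotted IPv4 address to a 32-bit integer.
# Assumes four decimal octets without leading zeros.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed when the VM address and the gateway share the same network.
# Arguments mirror vm_ip_address, vm_gateway, vm_ip_netmask.
same_subnet() {
  local ip gw mask
  ip="$(ip_to_int "$1")"
  gw="$(ip_to_int "$2")"
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( gw & mask )) ]
}

# Example: same_subnet 192.168.1.100 192.168.1.1 24 || echo "gateway is outside the VM subnet"
```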
### Multiple SSH Keys
```hcl
vm_ssh_keys = [
  "ssh-rsa AAAAB3... user1@laptop",
  "ssh-rsa AAAAB3... user2@desktop"
]
```
### GPU Passthrough Configuration
**Enable NVIDIA GTX 1070 for Jellyfin, Ollama, Immich:**
```hcl
# Must complete Proxmox host GPU passthrough setup first
enable_gpu_passthrough = true
gpu_pci_id = "0000:01:00" # Find with: lspci | grep -i nvidia
# Use AlmaLinux for better GPU support
vm_os_type = "almalinux"
template_vm_id = 9001
# Allocate sufficient resources
vm_cores = 8
vm_memory = 24576 # 24GB
```
**Verify GPU in VM after deployment:**
```bash
# Log in as the user set in vm_username ("almalinux" for the AlmaLinux template)
ssh almalinux@<VM-IP>
# Install NVIDIA drivers (AlmaLinux)
sudo dnf install -y epel-release
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf install -y nvidia-driver nvidia-container-toolkit
# Verify
nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
```
### NFS Media Mounts Configuration
**Mount Proxmox host media directories to VM:**
```hcl
# Enable NFS mounts from Proxmox host
mount_media_directories = true
# Proxmox host IP (not API URL)
proxmox_host_ip = "192.168.1.100"
# Source path on Proxmox host
media_source_path = "/data/media"
# Mount point in VM
media_mount_path = "/media"
```
**After deployment, verify mounts:**
```bash
ssh ubuntu@<VM-IP>
# Check mounts
df -h | grep /media
ls -la /media
# Expected directories:
# /media/audiobooks, /media/books, /media/comics,
# /media/complete, /media/downloads, /media/homemovies,
# /media/incomplete, /media/movies, /media/music,
# /media/photos, /media/tv
```
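Scanning `df` output for eleven paths is easy to get wrong, so the check can be scripted. The sketch below assumes each media directory is its own NFS mount, matching the per-directory exports, and reads `/proc/mounts` by default. `unmounted_media_dirs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print every expected media directory that is NOT currently an NFS mount.
# Arguments: mount base (default /media), mounts table (default /proc/mounts).
unmounted_media_dirs() {
  local base="${1:-/media}" mounts="${2:-/proc/mounts}" dir
  for dir in audiobooks books comics complete downloads homemovies \
             incomplete movies music photos tv; do
    # /proc/mounts fields: device, mount point, filesystem type, options, ...
    if ! awk -v mp="$base/$dir" \
         '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts"; then
      printf '%s\n' "$base/$dir"
    fi
  done
}

# Example: an empty result means all eleven mounts are in place
# unmounted_media_dirs /media
```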
### Operating System Selection
**AlmaLinux 9.6 (Recommended for GPU):**
```hcl
vm_os_type = "almalinux"
template_vm_id = 9001
vm_username = "almalinux" # Default AlmaLinux user
```
**Ubuntu 22.04 LTS:**
```hcl
vm_os_type = "ubuntu"
template_vm_id = 9000
vm_username = "ubuntu"
```
**Key differences:**
- AlmaLinux: RHEL-compatible, SELinux enabled by default, dnf package manager
- Ubuntu: Wider community support, apt package manager
- Both support Docker, GPU passthrough, and NFS mounts
## Post-Deployment
### Deploy Homelab Services
```bash
# SSH to VM
ssh ubuntu@<VM-IP>
# Clone homelab repo (if not auto-cloned)
git clone https://github.com/efigueroa/homelab.git
cd homelab
# Deploy services
cd compose/core/traefik
docker compose up -d
cd ../lldap
docker compose up -d
# Continue with other services...
```
### Verify Setup
```bash
# Check Docker
docker --version
docker compose version
# Check network
docker network ls | grep homelab
# Check media directories and NFS mounts
ls -la /media
df -h | grep /media
# If GPU passthrough is enabled
nvidia-smi
lspci | grep -i nvidia
# Check system resources
htop
df -h
```
## Managing the VM
### View State
```bash
tofu show
tofu state list
```
### Update VM
1. Edit `terraform.tfvars`:
```hcl
vm_cores = 8 # Increase from 4
vm_memory = 16384 # Increase from 8192
```
2. Apply changes:
```bash
tofu plan
tofu apply
```
**Note:** Some changes require a VM restart before they take effect.
### Destroy VM
```bash
# Backup any data first!
tofu destroy
```
Type `yes` to confirm deletion.
## Troubleshooting
### Datastore Does Not Support Snippets
Error: `the datastore "local" does not support content type "snippets"`
**Cause:** The storage you specified doesn't have snippets enabled
**Solution 1 - Enable snippets on existing storage:**
```bash
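# On Proxmox host, list the storages that already accept snippets
pvesm status --content snippets
# Enable snippets on local storage (--content replaces the whole list,
# so keep the content types that are already enabled)
pvesm set local --content backup,iso,vztmpl,snippets
# Verify: "local" should now appear in the list
pvesm status --content snippets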
```
**Solution 2 - Create dedicated directory storage:**
```bash
# On Proxmox host
# Create directory for snippets
mkdir -p /var/lib/vz/snippets
# Add directory storage via Proxmox UI:
# Datacenter → Storage → Add → Directory
# ID: local-snippets
# Directory: /var/lib/vz/snippets
# Content: Snippets
# Or via CLI:
pvesm add dir local-snippets --path /var/lib/vz/snippets --content snippets
# Update terraform.tfvars:
# snippets_storage = "local-snippets"
```
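Whichever solution you pick, the result ends up in `/etc/pve/storage.cfg`, so the storage named in `snippets_storage` can be checked there before rerunning `tofu apply`. This is a sketch that assumes the usual `type: id` section headers followed by indented `content` lines. `storage_supports_snippets` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Succeed when the given storage ID lists "snippets" among its content types.
# Arguments: storage ID, config file (default /etc/pve/storage.cfg).
storage_supports_snippets() {
  local id="$1" cfg="${2:-/etc/pve/storage.cfg}"
  awk -v id="$id" '
    /^[a-z]+: / { current = $2 }            # section header, e.g. "dir: local"
    current == id && $1 == "content" {      # content line of the wanted storage
      n = split($2, types, ",")
      for (i = 1; i <= n; i++) if (types[i] == "snippets") found = 1
    }
    END { exit !found }
  ' "$cfg"
}

# Example: storage_supports_snippets local-snippets || echo "snippets not enabled"
```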
### SSH Authentication Failed
Error: `failed to open SSH client: unable to authenticate`
**Cause:** The Proxmox provider needs SSH access to upload cloud-init files
**Solution 1 - Add SSH key to Proxmox (Recommended):**
```bash
# On your workstation, generate SSH key if you don't have one
ssh-keygen -t ed25519 -C "terraform@homelab"
# Copy to Proxmox host
ssh-copy-id root@proxmox.local
# Add key to ssh-agent
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
# Verify
ssh-add -L
ssh root@proxmox.local "echo 'SSH works!'"
```
**Solution 2 - Use API token only (workaround):**
If SSH is problematic, you can create the cloud-init snippet manually:
```bash
# On Proxmox host, create the snippet
nano /var/lib/vz/snippets/cloud-init-docker-host.yaml
# Paste the cloud-init content from main.tf
# Then remove the proxmox_virtual_environment_file resource from main.tf
# and reference the file directly in the VM resource:
# user_data_file_id = "local:snippets/cloud-init-docker-host.yaml"
```
### Template Not Found
Error: `template with ID 9000 not found`
**Solution:** Create cloud template (see Prerequisites)
### API Permission Error
Error: `permission denied`
**Solution:** Check API token permissions:
```bash
pveum acl modify / -token 'root@pam!terraform' -role Administrator
```
### Cloud-Init Not Working
**Check cloud-init status:**
```bash
ssh ubuntu@<VM-IP>
sudo cloud-init status
sudo cat /var/log/cloud-init-output.log
```
### Docker Not Installed
**Manual installation:**
```bash
ssh ubuntu@<VM-IP>
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker ubuntu
```
### VM Won't Start
**Check Proxmox logs:**
```bash
# On Proxmox server
qm status <VM-ID>
tail -f /var/log/pve/tasks/active
```
### GPU Not Detected in VM
**Verify IOMMU is enabled:**
```bash
# On Proxmox host
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
# Intel hosts print "DMAR: IOMMU enabled"; AMD hosts print "AMD-Vi" lines instead
```
**Check GPU is available:**
```bash
# On Proxmox host
lspci | grep -i nvidia
lspci -n -s 01:00
# Verify it's not being used by host
lsmod | grep nvidia
# Should be empty (blacklisted)
```
**In VM, install drivers:**
```bash
# AlmaLinux
sudo dnf install -y epel-release
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
sudo dnf install -y nvidia-driver
# Ubuntu
sudo apt install -y nvidia-driver-535
# Reboot (either distribution) to load the driver
sudo reboot
# After the VM is back up, verify
nvidia-smi
```
### NFS Mounts Not Working
**On Proxmox host, verify NFS server:**
```bash
systemctl status nfs-server
showmount -e localhost
# Should list all /data/media/* exports
```
**In VM, test manual mount:**
```bash
# Install NFS client if missing
sudo apt install nfs-common # Ubuntu
sudo dnf install nfs-utils # AlmaLinux
# Test mount
sudo mount -t nfs 192.168.1.100:/data/media/movies /mnt
ls /mnt
sudo umount /mnt
```
**Check /etc/fstab in VM:**
```bash
grep nfs /etc/fstab
# Should show all media directory mounts
```
**Firewall issues:**
```bash
# On Proxmox host, allow NFS
ufw allow from 192.168.1.0/24 to any port nfs
# Or disable firewall temporarily to test:
systemctl stop ufw
```
## Advanced Usage
### Multiple VMs
Create `docker-host-02.tfvars`:
```hcl
vm_name = "docker-host-02"
vm_ip_address = "192.168.1.101"
```
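Deploy it in its own workspace so it gets a separate state. Applying the new var file against the existing state would modify the first VM instead of creating a second one:
```bash
tofu workspace new docker-host-02
tofu apply -var-file="docker-host-02.tfvars"
```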
### Custom Cloud-Init
Edit `main.tf` to add custom cloud-init sections:
```yaml
users:
  - name: myuser
    groups: sudo, docker
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
packages:
  - zsh
  - tmux
  - neovim
runcmd:
  - sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"
```
### Attach Additional Disk
Add to `main.tf`:
```hcl
disk {
  datastore_id = var.storage
  size         = 200
  interface    = "scsi1"
}
```
Mount in cloud-init:
```yaml
mounts:
  - ["/dev/sdb", "/mnt/data", "ext4", "defaults", "0", "0"]
```
## Cost Analysis
**Resource Usage:**
- 4 cores, 8GB RAM, 50GB disk
- Running 24/7
**Homelab Cost:** $0 (uses existing hardware)
**If in cloud (comparison):**
- AWS: ~$50-100/month
- DigitalOcean: ~$40/month
- Linode: ~$40/month
**Homelab ROI:** Pays for itself in ~2-3 months!
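The payback estimate is plain division: hardware cost over the monthly cloud bill it replaces, rounded up. The sketch below uses whole-dollar amounts and ignores electricity. The hardware prices in the example are hypothetical, not measurements from this setup, and `payback_months` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Months until a one-time hardware cost is offset by avoided cloud fees.
# Arguments: hardware cost and monthly cloud cost, both whole dollars.
payback_months() {
  local hardware_cost="$1" monthly_cloud_cost="$2"
  # Integer ceiling division: round partial months up
  echo $(( (hardware_cost + monthly_cloud_cost - 1) / monthly_cloud_cost ))
}

# Example with a hypothetical $120 of hardware against the $40/month tier:
# payback_months 120 40
```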
## Security Hardening
### Enable Firewall
Add to cloud-init:
```yaml
runcmd:
  - ufw default deny incoming
  - ufw default allow outgoing
  - ufw allow ssh
  - ufw allow 80/tcp
  - ufw allow 443/tcp
  - ufw --force enable
```
### Disable Password Authentication
After SSH key setup:
```yaml
ssh_pwauth: false
```
### Automatic Updates
Already enabled in cloud-init. Verify on Ubuntu (`unattended-upgrades` is the Debian/Ubuntu mechanism):
```bash
sudo systemctl status unattended-upgrades
```
## Next Steps
1. ✅ Deploy core services (Traefik, LLDAP, Tinyauth)
2. ✅ Configure SSL certificates
3. ✅ Deploy media services
4. ✅ Set up backups (Restic)
5. ✅ Add monitoring (Prometheus/Grafana)
## Resources
- [OpenTofu Docs](https://opentofu.org/docs/)
- [Proxmox Provider](https://registry.terraform.io/providers/bpg/proxmox/latest/docs)
- [Cloud-Init Docs](https://cloudinit.readthedocs.io/)
- [Docker Docs](https://docs.docker.com/)