homelab/docs/troubleshooting/common-issues.md
Claude 4adaa8e8be
docs: Add comprehensive documentation for homelab setup and operations
This commit adds extensive documentation covering all aspects of homelab setup,
configuration, and troubleshooting.

## Documentation Structure

### Main Documentation
- **docs/README.md**: Documentation hub with table of contents
- **docs/getting-started.md**: Complete setup guide from scratch
- **docs/quick-reference.md**: Fast reference for common tasks and commands

### Configuration Guides (docs/guides/)
- **secrets-management.md**: Environment variables and secrets configuration
  - How to generate secure secrets
  - Service-specific configuration
  - Automated secret generation scripts
  - Security best practices
  - Common mistakes to avoid

- **gpu-setup.md**: NVIDIA GTX 1070 GPU acceleration setup
  - Specific to Proxmox 9 on Debian 13
  - Complete passthrough configuration
  - Jellyfin hardware transcoding setup
  - Immich ML inference acceleration
  - Performance tuning and benchmarks
  - Troubleshooting GPU issues

### Troubleshooting (docs/troubleshooting/)
- **faq.md**: Frequently asked questions (60+ Q&A)
  - General questions about the homelab
  - Setup and configuration questions
  - SSL/TLS and SSO questions
  - Service-specific questions
  - Security and backup questions
  - Performance optimization

- **common-issues.md**: Common problems and solutions
  - Service startup failures
  - SSL certificate errors
  - SSO authentication issues
  - Access problems
  - Performance issues
  - Database errors
  - Network issues
  - GPU problems

### Services (docs/services/)
- **README.md**: Complete service overview
  - All 20 services with descriptions
  - Use cases for each service
  - Resource requirements
  - Deployment checklists
  - Service dependencies
  - Minimum viable setups

## Key Features

### Environment-Specific
All GPU documentation is specific to:
- **Platform**: Proxmox 9 (PVE)
- **OS**: Debian 13
- **GPU**: NVIDIA GTX 1070 (Pascal)
- Includes Proxmox-specific GPU passthrough
- VM guest setup on Debian 13
- NVIDIA Container Toolkit configuration

### Comprehensive Coverage
- 60+ FAQs answered
- 50+ common issues documented
- 100+ command examples
- Step-by-step procedures
- Troubleshooting decision trees
- Quick reference tables

### Practical Examples
- Actual command outputs
- Real-world scenarios
- Copy-paste ready commands
- Configuration file examples
- Debugging procedures

## Documentation Highlights

### Getting Started Guide
- Prerequisites checklist
- Docker installation
- Media directory setup
- DNS configuration
- Environment variable setup
- Service deployment order
- Initial service configuration
- Verification procedures

### Secrets Management
- Secret type identification
- Generation commands for each type
- Service-specific requirements
- Automated generation script
- Password manager integration
- Backup procedures
- Security best practices
- Common mistakes

### GPU Setup (Proxmox/Debian/GTX 1070)
- IOMMU enablement
- VFIO configuration
- PCI passthrough to VM
- NVIDIA driver installation on Debian 13
- Container toolkit setup
- Jellyfin NVENC configuration
- Immich CUDA acceleration
- Performance benchmarks
- NVENC stream limit unlock
- Monitoring and tuning

### Quick Reference
- All service URLs
- Common Docker Compose commands
- System check commands
- Secret generation commands
- Troubleshooting steps
- File locations
- Port reference
- Emergency procedures

### FAQ
Covers questions about:
- Hardware requirements
- Domain requirements
- Cost estimates
- Setup procedures
- Configuration details
- SSL certificates
- SSO authentication
- Service-specific issues
- Backup strategies
- Performance optimization
- Security considerations

### Common Issues
Solutions for:
- Container startup failures
- Environment variable errors
- Port conflicts
- Permission issues
- SSL certificate problems
- DNS issues
- SSO login failures
- Database connections
- Network connectivity
- GPU detection
- Resource constraints

### Services Overview
- Detailed description of all 20 services
- Use cases and features
- Required vs optional services
- Resource requirements by tier
- Service dependencies diagram
- Deployment checklists
- "When to use" guidance

## File Structure

```
docs/
├── README.md                           # Documentation hub
├── getting-started.md                  # Setup walkthrough
├── quick-reference.md                  # Command reference
├── guides/
│   ├── secrets-management.md           # Secrets configuration
│   └── gpu-setup.md                    # GPU acceleration (GTX 1070)
├── troubleshooting/
│   ├── faq.md                          # 60+ FAQs
│   └── common-issues.md                # Problem solving
└── services/
    └── README.md                       # Service overview
```

## Benefits

### For New Users
- Clear setup path from zero to running services
- Explains "why" not just "how"
- Common pitfalls documented and avoided
- Example configurations provided

### For Experienced Users
- Quick reference for commands
- Troubleshooting decision trees
- Performance tuning guides
- Advanced configurations

### For Maintenance
- Update procedures
- Backup and restore
- Monitoring guidelines
- Security hardening

## Documentation Standards

- Clear, concise writing
- Code blocks with syntax highlighting
- Examples with expected output
- Warning and tip callouts
- Cross-references between docs
- Tested commands and procedures

## Next Steps

Users should:
1. Start with getting-started.md
2. Configure secrets using secrets-management.md
3. Enable GPU if available (gpu-setup.md)
4. Use quick-reference.md for daily operations
5. Refer to faq.md and common-issues.md when stuck

---

**This documentation makes the homelab accessible to users of all skill levels!**
2025-11-06 19:32:10 +00:00


Common Issues and Solutions

This guide covers the most common problems you might encounter and how to fix them.

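Table of Contents

  • Service Won't Start
  • SSL/TLS Certificate Errors
  • SSO Authentication Issues
  • Access Issues
  • Performance Problems
  • Database Errors
  • Network Issues
  • GPU Problems
  • Getting More Help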

Service Won't Start

Symptom

Container exits immediately or shows "Exited (1)" status.

Diagnosis

cd ~/homelab/compose/path/to/service

# Check container status
docker compose ps

# View logs
docker compose logs

# Check for specific errors
docker compose logs | grep -i error

Common Causes and Fixes

1. Environment Variables Not Set

Error in logs:

Error: POSTGRES_PASSWORD is not set
Error: required environment variable 'XXX' is missing

Fix:

# Check .env file exists
ls -la .env

# Check for changeme_ values
grep "changeme_" .env

# Update with proper secrets (see secrets guide)
nano .env

# Restart
docker compose up -d
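
To check a .env file for leftover placeholders and empty values in one pass, here is a minimal bash sketch. It assumes simple KEY=VALUE lines and the changeme_ prefix used above. check_env_file is a hypothetical helper, not part of the homelab repo.

```bash
# Sketch: flag template placeholders and empty values in a .env file.
# Assumes plain KEY=VALUE lines; returns non-zero if anything looks wrong.
check_env_file() {
  local file=${1:-.env} status=0
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # Template placeholders that were never replaced (commented-out lines are ignored)
  if grep -nE '^[^#]*changeme_' "$file"; then status=1; fi
  # Keys with nothing after the = sign, e.g. POSTGRES_PASSWORD=
  if grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*=[[:space:]]*$' "$file"; then status=1; fi
  return "$status"
}
```

Run check_env_file .env in the service directory. It prints each offending line with its line number.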

2. Port Already in Use

Error in logs:

Error: bind: address already in use
Error: failed to bind to port 80: address already in use

Fix:

# Find what's using the port
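# ss is installed by default on Debian; netstat needs the extra net-tools package
sudo ss -tulpn | grep -E ':(80|443)\s'

# If another container holds the port, this lists it
docker ps --filter publish=80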

# Stop conflicting service
sudo systemctl stop apache2  # Example
sudo systemctl stop nginx    # Example

# Or change port in compose.yaml

3. Network Not Created

Error in logs:

network homelab declared as external, but could not be found

Fix:

# Create network
docker network create homelab

# Verify
docker network ls | grep homelab

# Restart service
docker compose up -d

4. Volume Permission Issues

Error in logs:

Permission denied: '/config'
mkdir: cannot create directory '/data': Permission denied

Fix:

# Check directory ownership
ls -la ./config ./data

# Fix ownership (replace 1000:1000 with your UID:GID, which id -u and id -g print)
sudo chown -R 1000:1000 ./config ./data

# Restart
docker compose up -d
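
Before running a recursive chown, it can help to see which files actually belong to someone else. This small sketch assumes the container should write as your own UID, as in the chown example above. Images that run as a different fixed UID need that UID instead. not_owned_by_me is a hypothetical helper.

```bash
# Sketch: list files and directories under the given paths whose owner is not you.
# Permission errors from find are left visible on purpose: they are a symptom too.
not_owned_by_me() {
  find "$@" ! -user "$(id -u)" -print
}
```

For example, run not_owned_by_me ./config ./data | head -n 20. No output means ownership is not the problem, so check which UID the container runs as.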

5. Dependency Not Running

Error in logs:

Failed to connect to database
Connection refused: postgres:5432

Fix:

# Start dependency first
cd ~/homelab/compose/path/to/dependency
docker compose up -d

# Wait for it to be healthy
docker compose logs -f

# Then start the service
cd ~/homelab/compose/path/to/service
docker compose up -d
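
Instead of watching logs by eye, you can poll Docker's own health status. This bash sketch only works if the dependency's image or compose file defines a healthcheck. wait_healthy and the container name in the example are hypothetical.

```bash
# Sketch: wait until a container's Docker healthcheck reports "healthy".
# Returns 0 when healthy, 2 if the container has no healthcheck, 1 on timeout.
wait_healthy() {
  local name=$1 timeout=${2:-120} waited=0 status=""
  while [ "$waited" -lt "$timeout" ]; do
    # Empty if the container does not exist (yet)
    status=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$name" 2>/dev/null) || status=""
    case $status in
      healthy) echo "$name is healthy"; return 0 ;;
      none) echo "$name has no healthcheck; follow its logs instead" >&2; return 2 ;;
    esac
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for $name (last status: ${status:-unknown})" >&2
  return 1
}
```

Usage: wait_healthy lldap 60 && docker compose up -d, run from the dependent service's directory. docker ps lists the actual container names.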

SSL/TLS Certificate Errors

Symptom

Browser shows "Your connection is not private" or "NET::ERR_CERT_AUTHORITY_INVALID"

Diagnosis

# Check Traefik logs
docker logs traefik | grep -i certificate
docker logs traefik | grep -i letsencrypt
docker logs traefik | grep -i error

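# Test certificate (an issuer of "TRAEFIK DEFAULT CERT" means Traefik is serving
# its self-signed fallback because no valid certificate exists for this host yet)
echo | openssl s_client -servername home.fig.systems -connect home.fig.systems:443 2>/dev/null | openssl x509 -noout -issuer -dates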

Common Causes and Fixes

1. DNS Not Configured

Fix:

# Test DNS resolution
dig home.fig.systems +short

# Should return your server's IP
# If not, configure DNS A records:
# *.fig.systems -> YOUR_SERVER_IP
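
For scripted checks, this small sketch compares the resolved address against the one you expect. dns_points_to is a hypothetical helper. YOUR_SERVER_IP stands for your public IP, or your LAN IP if you use split DNS.

```bash
# Sketch: succeed if HOST resolves (A record) to EXPECTED_IP.
# dig +short may print CNAME targets before the address, so every line is checked.
dns_points_to() {
  local host=$1 expected=$2
  dig +short "$host" A | grep -qxF "$expected"
}
```

Usage: dns_points_to home.fig.systems YOUR_SERVER_IP || echo "DNS does not point here yet"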

2. Port 80 Not Accessible

Let's Encrypt must be able to reach port 80 on your server from the internet for the HTTP-01 challenge.

Fix:

# Test from external network
curl -I http://home.fig.systems

# Check firewall
sudo ufw status
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Check port forwarding on router
# Ensure ports 80 and 443 are forwarded to server

3. Rate Limiting

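Let's Encrypt enforces rate limits. These include 50 certificates per registered domain per week and 5 certificates per exact set of hostnames per week. Repeatedly wiping acme.json while testing is a common way to hit the second one.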

Fix:

# Check Traefik logs for rate limit errors
docker logs traefik | grep -i "rate limit"

# Wait for rate limit to reset (1 week)
# Or use Let's Encrypt staging environment for testing

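# Enable staging in traefik/compose.yaml (staging certificates are not trusted by
# browsers; remove this line and clear acme.json once issuance works):
# - --certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory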

4. First Startup - Certificates Not Yet Generated

Fix:

# Wait 2-5 minutes for certificate generation
docker logs traefik -f

# Look for:
# "Certificate obtained for domain"

5. Certificate Expired

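Traefik renews certificates automatically. If renewal has stalled (check the Traefik logs for ACME errors first), you can force it to request new certificates: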

Fix:

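# Remove old certificates (every certificate is requested again, which counts
# toward the Let's Encrypt rate limits)
cd ~/homelab/compose/core/traefik
rm -rf ./acme.json

# Recreate it empty with mode 600: Traefik refuses an acme.json that other users
# can read, and if it is bind-mounted as a single file, Docker creates a
# directory in its place when the file is missing
touch ./acme.json
chmod 600 ./acme.json

# Restart Traefik
docker compose restart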

# Wait for new certificates
docker logs traefik -f
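
Deleting acme.json forces Traefik to request every certificate again, so it is worth confirming first that a certificate really is expiring. Below is a bash sketch using openssl x509 -checkend. The helper names and the 7-day default are assumptions, not Traefik settings.

```bash
# Sketch: does a certificate expire within DAYS days?
# Returns 0 if it does, 1 if it does not, 2 if no certificate could be read.
pem_expires_within() {
  local days=${1:-7} pem
  pem=$(cat)  # s_client output is fine: openssl skips text before the PEM block
  printf '%s\n' "$pem" | openssl x509 -noout >/dev/null 2>&1 || return 2
  # -checkend N exits 0 when the certificate will NOT expire within N seconds
  if printf '%s\n' "$pem" | openssl x509 -noout -checkend "$((days * 86400))" >/dev/null; then
    return 1
  fi
  return 0
}

# Fetch the certificate Traefik serves for HOST and check it
cert_expires_within() {
  local host=$1 days=${2:-7}
  echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
    | pem_expires_within "$days"
}
```

Usage: cert_expires_within home.fig.systems 7 && echo "expires within a week". A return code of 2 usually means the TLS connection itself failed.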

SSO Authentication Issues

Symptom

  • Can't login to SSO-protected services
  • Redirected to auth page but login fails
  • "Invalid credentials" error

Diagnosis

# Check LLDAP is running
docker ps | grep lldap

# Check Tinyauth is running
docker ps | grep tinyauth

# View logs
docker logs lldap
docker logs tinyauth

Common Causes and Fixes

1. Password Mismatch

LDAP_BIND_PASSWORD must match LLDAP_LDAP_USER_PASS.

Fix:

# Check both passwords
grep LLDAP_LDAP_USER_PASS ~/homelab/compose/core/lldap/.env
grep LDAP_BIND_PASSWORD ~/homelab/compose/core/tinyauth/.env

# They must be EXACTLY the same!

# If different, update tinyauth/.env
cd ~/homelab/compose/core/tinyauth
nano .env
# Set LDAP_BIND_PASSWORD to match LLDAP_LDAP_USER_PASS

# Recreate Tinyauth so it picks up the new .env (docker compose restart
# keeps the old environment)
docker compose up -d
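
Comparing the two values by eye means printing secrets to the terminal. This sketch compares them without echoing either one. It assumes plain KEY=VALUE lines; quotes are not stripped, so "abc" and abc count as different. env_value and same_secret are hypothetical helpers.

```bash
# Sketch: print the value of the last KEY=... line in FILE (everything after the first =)
env_value() {
  grep -E "^$2=" "$1" | tail -n 1 | cut -d= -f2-
}

# Compare KEY_A in FILE_A with KEY_B in FILE_B without printing either value
same_secret() {
  local a b
  a=$(env_value "$1" "$2")
  b=$(env_value "$3" "$4")
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "match"
  else
    echo "MISMATCH (or missing): $2 in $1 vs $4 in $3" >&2
    return 1
  fi
}
```

Usage:
same_secret ~/homelab/compose/core/lldap/.env LLDAP_LDAP_USER_PASS ~/homelab/compose/core/tinyauth/.env LDAP_BIND_PASSWORD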

2. User Doesn't Exist in LLDAP

Fix:

# Access LLDAP web UI
# Go to: https://lldap.fig.systems

# Login with admin credentials
# Username: admin
# Password: <your LLDAP_LDAP_USER_PASS>

# Create user:
# - Click "Create user"
# - Set username, email, password
# - Add to a group only if your services require one (members of "lldap_admin" get full LLDAP admin rights)

# Try logging in again

3. LLDAP or Tinyauth Not Running

Fix:

# Start LLDAP
cd ~/homelab/compose/core/lldap
docker compose up -d

# Wait for it to be ready
docker compose logs -f

# Start Tinyauth
cd ~/homelab/compose/core/tinyauth
docker compose up -d
docker compose logs -f

4. Network Issue Between Tinyauth and LLDAP

Fix:

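# Test connection (needs nc inside the tinyauth image)
docker exec tinyauth nc -zv lldap 3890

# Should report success, e.g.: Connection to lldap 3890 port [tcp/*] succeeded!
# (the exact wording depends on the nc implementation)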

# If not, check both are on homelab network
docker network inspect homelab
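
To check membership directly instead of reading the full inspect output, this sketch lists the containers attached to a network and checks the names you pass. Only running containers appear. on_network is a hypothetical helper.

```bash
# Sketch: confirm each named container is attached to NETWORK.
# Returns 1 if any is missing, 2 if the network cannot be inspected.
on_network() {
  local net=$1 members c missing=0
  shift
  members=$(docker network inspect --format '{{range .Containers}}{{.Name}}{{"\n"}}{{end}}' "$net") || return 2
  for c in "$@"; do
    if printf '%s\n' "$members" | grep -qxF "$c"; then
      echo "ok: $c is on $net"
    else
      echo "missing: $c is not on $net" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: on_network homelab tinyauth lldap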

Access Issues

Symptom

  • Can't access service from browser
  • Connection timeout
  • "This site can't be reached"

Diagnosis

# Test from server
curl -I https://home.fig.systems

# Test DNS
dig home.fig.systems +short

# Check container is running
docker ps | grep servicename

# Check Traefik routing
docker logs traefik | grep servicename
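
To separate DNS problems from routing problems, you can send a request to Traefik on the server itself while keeping the right hostname, using curl --resolve. This sketch assumes Traefik publishes port 443 on this host. The status-code interpretations are rules of thumb, and both helper names are hypothetical.

```bash
# Sketch: rough meaning of an HTTP status returned through Traefik
explain_http_code() {
  case $1 in
    000) echo "no connection: Traefik is not listening on 443 or a firewall blocks it" ;;
    2??) echo "reachable: if browsers still fail, suspect DNS or port forwarding" ;;
    3??) echo "reachable, redirecting (often to the SSO login page)" ;;
    401|403) echo "reachable but access denied: check SSO or auth middleware" ;;
    404) echo "Traefik answered but no router matched: check labels and the Host rule" ;;
    502|504) echo "router matched but the backend is unreachable: check the container and network" ;;
    *) echo "unexpected status: check the Traefik and service logs" ;;
  esac
}

# Ask Traefik on this machine for HOST, bypassing public DNS
probe_via_traefik() {
  local host=$1 code
  # --resolve pins HOST:443 to 127.0.0.1; -k ignores certificate errors so they
  # do not mask routing problems; curl prints 000 if it cannot connect
  code=$(curl -sk --max-time 10 -o /dev/null -w '%{http_code}' \
    --resolve "$host:443:127.0.0.1" "https://$host/") || true
  echo "$code $(explain_http_code "$code")"
}
```

Usage: probe_via_traefik home.fig.systems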

Common Causes and Fixes

1. Service Not Running

Fix:

cd ~/homelab/compose/path/to/service
docker compose up -d
docker compose logs -f

2. Traefik Not Running

Fix:

cd ~/homelab/compose/core/traefik
docker compose up -d
docker compose logs -f

3. DNS Not Resolving

Fix:

# Check DNS
dig service.fig.systems +short

# Should return your server IP
# If not, add/update DNS A record

4. Firewall Blocking

Fix:

# Check firewall
sudo ufw status

# Allow if needed
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

5. Wrong Traefik Labels

Fix:

# Check compose.yaml has correct labels
cd ~/homelab/compose/path/to/service
grep -A 10 "labels:" compose.yaml

# Should have:
# traefik.enable: true
# traefik.http.routers.servicename.rule: Host(`service.fig.systems`)
# etc.
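
compose.yaml shows what you wrote, while the running container shows what Docker actually applied. This sketch prints a container's Traefik labels. traefik_labels is a hypothetical helper. Label edits only take effect after docker compose up -d recreates the container.

```bash
# Sketch: print the traefik.* labels Docker applied to a running container
traefik_labels() {
  docker inspect --format '{{range $k, $v := .Config.Labels}}{{$k}}={{$v}}{{"\n"}}{{end}}' "$1" \
    | grep '^traefik\.'
}
```

Usage: traefik_labels servicename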

Performance Problems

Symptom

  • Services running slowly
  • High CPU/RAM usage
  • System unresponsive

Diagnosis

# Overall system
htop

# Docker resources
docker stats

# Disk usage
df -h
docker system df

Common Causes and Fixes

1. Insufficient RAM

Fix:

# Check RAM usage
free -h

# If low, either:
# 1. Add more RAM
# 2. Stop unused services
# 3. Add resource limits to compose files

# Example resource limit (under the service's entry in compose.yaml):
deploy:
  resources:
    limits:
      memory: 2G
    reservations:
      memory: 1G

2. Disk Full

Fix:

# Check disk usage
df -h

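# Clean Docker: removes stopped containers, unused networks, build cache and every
# image not used by a container, so start the services you want to keep first
docker system prune -a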

# Remove old logs
sudo journalctl --vacuum-time=7d

# Check media folder
du -sh /media/*
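
Docker's default json-file logging driver stores each container's log at /var/lib/docker/containers/<id>/<id>-json.log. These files grow without limit unless log rotation is configured, and docker system prune does not touch the logs of containers that still exist. This sketch finds the largest ones. largest_files is a hypothetical helper; run it as root, since regular users cannot read that directory.

```bash
# Sketch: list the largest files named PATTERN under DIR, biggest last
largest_files() {
  local dir=$1 pattern=$2 top=${3:-5}
  find "$dir" -type f -name "$pattern" -exec du -h --apparent-size {} + | sort -h | tail -n "$top"
}
```

Usage: sudo bash -c "$(declare -f largest_files); largest_files /var/lib/docker/containers '*-json.log'"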

3. Too Many Services Running

Fix:

# Stop unused services
cd ~/homelab/compose/services/unused-service
docker compose down

# Or deploy only what you need

4. Database Not Optimized

Fix:

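# For postgres services, optionally add to .env (page checksums detect on-disk
# corruption; they do not speed anything up and only take effect when the
# database is first initialized):
POSTGRES_INITDB_ARGS=--data-checksums

# Increase shared buffers (if enough RAM), and raise max_connections only if
# the app reports "too many clients":
# Edit compose.yaml, add to postgres:
command: postgres -c shared_buffers=256MB -c max_connections=200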

Database Errors

Symptom

  • "Connection refused" to database
  • "Authentication failed for user"
  • "Database does not exist"

Diagnosis

# Check database container
docker ps | grep postgres

# View database logs
docker logs <postgres_container_name>

# Test connection from app
docker exec <app_container> nc -zv <db_container> 5432

Common Causes and Fixes

1. Password Mismatch

Fix:

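# Check passwords match in .env (note: this prints them to the terminal)
grep PASSWORD .env

# For example, in Vikunja:
# VIKUNJA_DATABASE_PASSWORD and POSTGRES_PASSWORD must match!

# The postgres image only applies POSTGRES_PASSWORD when it initializes an empty
# data directory. Changing it afterwards does not change the existing database
# password, so set the app's value to match the original one.
nano .env
docker compose down
docker compose up -d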
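
The postgres image only reads POSTGRES_PASSWORD when it initializes an empty data directory. If you do want a new database password, you can therefore set it from inside the database container. This sketch builds the SQL with safe quoting. It assumes standard_conforming_strings is on, which is the PostgreSQL default. The function, container and role names in the example are hypothetical.

```bash
# Sketch: print an ALTER USER statement with the role name and password quoted safely.
# Quoted identifiers are case-sensitive, so pass the role name exactly as created.
pg_alter_password_sql() {
  local user=$1 pass=$2 dq='"' sq="'"
  # Double any embedded quotes: "" inside identifiers, '' inside string literals
  printf "ALTER USER \"%s\" WITH PASSWORD '%s';\n" "${user//$dq/$dq$dq}" "${pass//$sq/$sq$sq}"
}
```

Usage: pg_alter_password_sql vikunja "$NEW_PASSWORD" | docker exec -i vikunja-db psql -U vikunja -d postgres. Then put the same value in both .env entries and run docker compose up -d. Piping the SQL over stdin keeps the password off the psql command line.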

2. Database Not Initialized

Fix:

# Remove database and reinitialize
docker compose down
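# CAREFUL: This deletes all data! Adjust ./db/ to the data path in your compose.yaml.
# sudo is needed because the files are owned by the container's postgres user.
sudo rm -rf ./db/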
docker compose up -d

3. Database Still Starting

Fix:

# Wait for database to be ready
docker logs <postgres_container> -f

# Look for "database system is ready to accept connections"

# Then restart app
docker compose restart <app_service>
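
Rather than watching for the log line, you can poll pg_isready. It ships with the official postgres image and exits 0 once the server accepts connections. wait_pg_ready is a hypothetical helper, shown here as a sketch.

```bash
# Sketch: poll pg_isready inside a postgres container, 2 seconds between tries
wait_pg_ready() {
  local container=$1 tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # Exit code 0 from pg_isready means the server is accepting connections
    if docker exec "$container" pg_isready -q; then
      echo "$container is accepting connections"
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then sleep 2; fi
  done
  echo "$container did not become ready" >&2
  return 1
}
```

Usage: wait_pg_ready <postgres_container> && docker compose restart <app_service>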

Network Issues

Symptom

  • Containers can't communicate
  • "Connection refused" between services

Diagnosis

# Inspect network
docker network inspect homelab

# Test connectivity
docker exec container1 ping container2
docker exec container1 nc -zv container2 PORT

Common Causes and Fixes

1. Containers Not on Same Network

Fix:

# Check compose.yaml has networks section
networks:
  homelab:
    external: true

# Ensure service is using the network
services:
  servicename:
    networks:
      - homelab

2. Network Doesn't Exist

Fix:

docker network create homelab
docker compose up -d

3. DNS Resolution Between Containers

Fix:

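# Use the container or service name, not localhost: inside a container,
# localhost is the container itself
# Wrong: postgres://localhost:5432
# Right: postgres://postgres:5432

# Or use the service name from compose.yaml. If several stacks on the homelab
# network each have a service named postgres, that name is ambiguous there;
# use the unique container_name instead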
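
A quick way to spot this mistake in a service's .env is to search for localhost and 127.0.0.1. find_localhost_refs is a hypothetical helper. Review each hit, since some settings, such as a listen address on 127.0.0.1, can be legitimate.

```bash
# Sketch: show non-comment .env lines whose value mentions localhost or 127.0.0.1
find_localhost_refs() {
  grep -nE '^[^#]*=.*(localhost|127\.0\.0\.1)' "${1:-.env}"
}
```

Usage: find_localhost_refs .env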

GPU Problems

Symptom

  • "No hardware acceleration available"
  • GPU not detected in container
  • "Failed to open GPU"

Diagnosis

# Check GPU on host
nvidia-smi

# Check GPU in container
docker exec jellyfin nvidia-smi

# Check Docker GPU runtime
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

Common Causes and Fixes

1. NVIDIA Container Toolkit Not Installed

Fix:

# Install toolkit (NVIDIA's apt repository must be added first; see the GPU guide)
sudo apt install nvidia-container-toolkit

# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker
sudo systemctl restart docker
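
To confirm the toolkit step took effect, check whether Docker now lists an nvidia runtime. nvidia-ctk runtime configure adds it to /etc/docker/daemon.json. has_nvidia_runtime is a hypothetical helper.

```bash
# Sketch: succeed if Docker reports a runtime named "nvidia"
has_nvidia_runtime() {
  docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'
}
```

Usage: has_nvidia_runtime || echo "nvidia runtime missing: re-run nvidia-ctk runtime configure and restart Docker"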

2. Runtime Not Specified in Compose

Fix:

# Edit compose.yaml
nano compose.yaml

# Uncomment (or add) under the service in compose.yaml:
runtime: nvidia
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]

# Restart
docker compose up -d

3. GPU Already in Use

Fix:

# Check processes using GPU
nvidia-smi

# Kill process if needed
sudo kill <PID>

# Restart service
docker compose restart

4. GPU Not Passed Through to VM (Proxmox)

Fix:

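# From Proxmox host: the GPU should be bound to vfio-pci
lspci -nnk | grep -iA3 nvidia
# Look for: Kernel driver in use: vfio-pci

# From VM: the GPU should be visible, and bound to nvidia once the driver is installed
lspci -nnk | grep -iA3 nvidia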

# If not visible, reconfigure passthrough (see GPU guide)
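
lspci output is long, so this sketch reads lspci -nnk on stdin and prints only the kernel driver bound to each NVIDIA function. On the Proxmox host you would expect vfio-pci; inside the VM, nvidia once the driver is installed. nvidia_drivers_in_use is a hypothetical helper.

```bash
# Sketch: print "<pci address> <driver>" for every NVIDIA device in lspci -nnk output
nvidia_drivers_in_use() {
  awk '
    # Device header lines start in column 1; remember whether this one is NVIDIA
    /^[^[:space:]]/ { addr = $1; nv = ($0 ~ /NVIDIA/) }
    # Indented detail lines belong to the last header seen
    nv && /Kernel driver in use:/ { sub(/.*Kernel driver in use: */, ""); print addr, $0 }
  '
}
```

Usage: lspci -nnk | nvidia_drivers_in_use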

Getting More Help

If your issue isn't listed here:

  1. Check service-specific logs:

    cd ~/homelab/compose/path/to/service
    docker compose logs --tail=200
    
  2. Search container logs for errors:

    docker compose logs | grep -i error
    docker compose logs | grep -i fail
    
  3. Check the FAQ: see faq.md

  4. Debugging Guide: See Debugging Guide

  5. Service Documentation: check the service's official documentation


Most issues can be solved by checking logs and environment variables!