Troubleshooting Guide

Complete troubleshooting guide for XaresAICoder platform issues.

Quick Diagnostics

Health Check Commands

# 1. Check system health
curl http://localhost/api/health

# 2. Check Docker status
docker info
docker compose ps

# 3. Check service logs
docker compose logs --tail=50

# 4. Check container resources
docker stats --no-stream

# 5. Check network connectivity
docker network inspect xares-aicoder-network

Expected Healthy Output

// Health check response
{
  "status": "ok",
  "timestamp": "2024-01-15T14:30:00.000Z",
  "version": "4.2.0",
  "uptime": "2d 14h 22m"
}

# Docker compose status
NAME                    COMMAND                  SERVICE   STATUS    PORTS
xaresaicoder-nginx-1    "/docker-entrypoint.…"   nginx     running   0.0.0.0:80->80/tcp
xaresaicoder-server-1   "docker-entrypoint.s…"   server    running   3000/tcp
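
The health response can also be checked in a script rather than by eye. The following is a minimal sketch, assuming the endpoint returns JSON with a top-level "status" field as in the sample above. `health_is_ok` is a hypothetical helper name, and the match is deliberately loose so that it needs nothing beyond grep.

```bash
#!/usr/bin/env bash
# health_is_ok (hypothetical helper): read a health-check JSON document on
# stdin and succeed only if it contains "status": "ok".
# Whitespace around the colon is optional.
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage (assumes the platform answers on localhost, as in the commands above):
#   curl -fsS http://localhost/api/health | health_is_ok \
#     && echo "healthy" || echo "unhealthy"
```

Because the function only reads stdin, it can be exercised against a saved response without a running deployment.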

Installation Issues

Docker Not Found

Symptoms:

$ ./deploy.sh
deploy.sh: line 55: docker: command not found

Solutions:

  1. Install Docker:

    # Ubuntu/Debian
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh
    
    # Add user to docker group
    sudo usermod -aG docker $USER
    newgrp docker
  2. Verify Installation:

    docker --version
    docker info

Docker Compose Not Found

Symptoms:

[ERROR] Docker Compose is not available

Solutions:

  1. Docker Compose v2 (Recommended):

    # Usually included with Docker Desktop
    # Or install Docker Engine with Compose plugin
    sudo apt-get install docker-compose-plugin
  2. Docker Compose v1 (Legacy):

    sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose

Permission Denied Errors

Symptoms:

Got permission denied while trying to connect to the Docker daemon socket

Solutions:

# Add user to docker group
sudo usermod -aG docker $USER

# Apply group membership
newgrp docker

# Or run with sudo (not recommended for production)
sudo ./deploy.sh

Network Overlap Errors

Symptoms:

Error response from daemon: Pool overlaps with other one on this address space

Solutions:

# 1. Check existing networks
docker network ls

# 2. Find conflicting subnet
docker network inspect bridge

# 3. Edit network configuration
nano setup-network.sh

# Change subnet from 172.19.0.0/16 to available range:
NETWORK_SUBNET="172.21.0.0/16"  # or 172.18.0.0/16

# 4. Remove old network and recreate
docker network rm xares-aicoder-network
./setup-network.sh
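
Before picking a replacement subnet it can help to test candidates against the networks Docker already knows about. The sketch below is illustrative only: `ip_to_int`, `cidr_overlaps`, and `list_conflicts` are hypothetical helper names, and only IPv4 is handled. Two CIDR blocks overlap exactly when they agree on the shorter of the two prefixes, which is what `cidr_overlaps` checks.

```bash
# ip_to_int: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  # Intentional word splitting on "." into four octets.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_overlaps: succeed if two IPv4 CIDR blocks share any address.
cidr_overlaps() {
  local a_ip="${1%/*}" a_len="${1#*/}" b_ip="${2%/*}" b_len="${2#*/}"
  local len="$a_len" mask=0
  # Compare on the shorter prefix length.
  if [ "$b_len" -lt "$len" ]; then len="$b_len"; fi
  if [ "$len" -gt 0 ]; then
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# list_conflicts: print every existing Docker network subnet that overlaps $1.
# Not called automatically; it needs a reachable Docker daemon.
list_conflicts() {
  docker network ls -q \
    | xargs docker network inspect \
        --format '{{range .IPAM.Config}}{{println .Subnet}}{{end}}' \
    | while read -r subnet; do
        case "$subnet" in ''|*:*) continue ;; esac  # skip blanks and IPv6
        if cidr_overlaps "$1" "$subnet"; then echo "$subnet"; fi
      done
}

# Usage:
#   list_conflicts 172.21.0.0/16   # no output means the range looks free
```

If xares-aicoder-network still exists with the candidate range, it will be listed too, so run the check after removing the old network.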

Workspace Creation Problems

Code-Server Image Missing

Symptoms:

Error: No such image: xares-aicoder-codeserver:latest

Solutions:

# Build the code-server image
cd code-server
docker build -t xares-aicoder-codeserver:latest .
cd ..

# Or use deploy script
./deploy.sh --build-only

Workspace Creation Timeout

Symptoms:

  • Workspace stuck in "creating" status
  • No workspace URL returned after 60+ seconds

Diagnosis:

# Check for failed containers
docker ps -a | grep workspace-

# Check container logs
docker logs workspace-PROJECT_ID

# Check system resources
docker stats
df -h

Solutions:

  1. Resource Issues:

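    # Free up disk space (removes all unused images, containers, and networks)
    docker system prune -a
    
    # Free up memory by stopping running workspaces only
    # (a bare "docker ps -q" would also stop nginx and the API server)
    docker stop $(docker ps -q --filter "name=workspace-")
    docker system prune -f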
  2. Network Issues:

    # Restart Docker daemon
    sudo systemctl restart docker
    
    # Recreate network
    docker network rm xares-aicoder-network
    ./setup-network.sh
  3. Template Issues:

    # Check template script logs
    docker logs workspace-PROJECT_ID | grep setup
    
    # Test template manually
    docker run -it xares-aicoder-codeserver:latest bash

Java Spring Boot Slow Creation

Symptoms:

  • Java projects take 30-60 seconds to create
  • Other templates work fine

Explanation: This is normal behavior. The Java Spring Boot template:

  • Downloads Maven dependencies
  • Compiles initial project
  • Sets up comprehensive project structure

Solutions:

  • Wait patiently (up to 2 minutes)
  • Check logs: docker logs workspace-PROJECT_ID
  • Monitor progress in VS Code terminal when it opens

Container Management Issues

Cannot Start Workspace

Symptoms:

{
  "success": false,
  "error": "Failed to start workspace container"
}

Diagnosis:

# Check if container exists
docker ps -a | grep workspace-PROJECT_ID

# Check container status
docker inspect workspace-PROJECT_ID

# Try manual start
docker start workspace-PROJECT_ID

Solutions:

  1. Container Exited:

    # Check exit reason
    docker logs workspace-PROJECT_ID
    
    # Remove and recreate
    docker rm workspace-PROJECT_ID
    # Then create new workspace through UI
  2. Container crash-loops with "resource temporarily unavailable":

    This happens when multiple workspace containers share UID 1000 and the combined thread count exceeds the nproc ulimit. RLIMIT_NPROC counts threads per-UID system-wide — not per container. With 14+ running workspaces this easily exceeds a limit of 2048.

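    # Check total thread count for UID 1000 (match the numeric UID, since the
    # username that maps to it can differ between host and container)
    ps -eL -o uid= | awk '$1 == 1000' | wc -l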
    
    # Verify the container has no nproc ulimit (should only show nofile)
    docker inspect workspace-PROJECT_ID --format '{{json .HostConfig.Ulimits}}'

    If the container was created with an nproc ulimit, remove it and recreate via UI:

    docker rm -f workspace-PROJECT_ID
    # Then recreate via UI
  3. Resource Constraints:

    # Check available resources
    docker system df
    free -h
    
    # Clean up unused resources
    docker system prune -a
  4. Network Issues:

    # Check network connectivity
    docker network inspect xares-aicoder-network
    
    # Reconnect to network
    docker network disconnect xares-aicoder-network workspace-PROJECT_ID
    docker network connect xares-aicoder-network workspace-PROJECT_ID
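
For the crash-loop case in item 2, it is useful to see how threads are distributed across all UIDs, not only UID 1000. The sketch below assumes procps `ps` on the Docker host. `threads_per_uid` is a hypothetical helper that tallies one input line per thread.

```bash
# threads_per_uid: read one UID per line on stdin (one line per thread)
# and print "UID COUNT" pairs, highest thread count first.
threads_per_uid() {
  awk 'NF { count[$1]++ } END { for (uid in count) print uid, count[uid] }' \
    | sort -k2,2nr
}

# Usage on the Docker host (-L lists every thread, "uid=" suppresses the header):
#   ps -eL -o uid= | threads_per_uid | head -5
```

A count for UID 1000 that approaches the nproc limit quoted above (2048) points at the shared-UID cause rather than at a single misbehaving workspace.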

Cannot Stop Workspace

Symptoms:

  • Stop button doesn't work
  • API returns timeout error
  • Container still running after stop command

Solutions:

  1. Force Stop:

    # Stop the container, allowing 10 seconds before Docker sends SIGKILL
    docker stop --time 10 workspace-PROJECT_ID
    
    # If still running, force kill
    docker kill workspace-PROJECT_ID
  2. Check for Stuck Processes:

    # Check processes in container
    docker exec workspace-PROJECT_ID ps aux
    
    # Kill specific processes if needed
    docker exec workspace-PROJECT_ID pkill -f node
    docker exec workspace-PROJECT_ID pkill -f python

Workspace Status Stuck

Symptoms:

  • UI shows "creating" but workspace is running
  • Status doesn't update after operations

Solutions:

# Refresh page and check API directly
curl http://localhost/api/projects/PROJECT_ID

# Check actual container status
docker ps | grep workspace-PROJECT_ID

# Restart API server if needed
docker compose restart server

Network and Connectivity

502 Bad Gateway

Symptoms:

  • nginx returns 502 error
  • Cannot access main application or workspaces

Diagnosis:

# Check nginx logs
docker compose logs nginx

# Check if backend services are running
docker compose ps

# Check service connectivity
docker exec xaresaicoder-nginx-1 curl http://server:3000/api/health

Solutions:

  1. Backend Service Down:

    # Restart server
    docker compose restart server
    
    # Check server logs
    docker compose logs server
  2. Network Issues:

    # Restart all services
    docker compose down
    docker compose up -d
    
    # Check network connectivity
    docker network inspect xares-aicoder-network
  3. Configuration Issues:

    # Check nginx configuration
    docker exec xaresaicoder-nginx-1 nginx -t
    
    # Reload nginx
    docker compose exec nginx nginx -s reload

Service Unavailable (503)

Symptoms:

  • HTTP 503 errors
  • Services appear to be running

Solutions:

# Check service health
curl http://localhost/api/health

# Check resource usage
docker stats

# Check for memory/disk exhaustion
free -h
df -h

# Restart services if needed
docker compose restart

Port Forwarding Problems

VS Code Port Detection Not Working

Symptoms:

  • Development server running but VS Code doesn't show port
  • No "Open in Browser" notification

Solutions:

  1. Check VS Code Settings:

    // In workspace .vscode/settings.json
    {
      "remote.autoForwardPorts": true,
      "remote.portsAttributes": {
        "5000": {
          "label": "Flask App",
          "onAutoForward": "openBrowserOnce"
        }
      }
    }
  2. Manual Port Forwarding:

    # In VS Code terminal
    # Start your application bound to all interfaces
    python app.py  # Flask
    npm run dev    # React/Vite
    mvn spring-boot:run  # Spring Boot
  3. Check Application Binding:

    # Flask - bind to all interfaces
    app.run(host='0.0.0.0', port=5000)
    
    # Not just localhost
    # app.run(host='127.0.0.1', port=5000)  # Wrong!

Subdomain URLs Not Working

Symptoms:

  • VS Code shows port forwarded but subdomain returns 404
  • URLs like projectid-5000.localhost don't work

Diagnosis:

# Check nginx configuration
docker exec xaresaicoder-nginx-1 cat /etc/nginx/nginx.conf

# Check if nginx sees the subdomain request
docker compose logs nginx | grep projectid-5000

# Test container connectivity
docker exec xaresaicoder-nginx-1 curl http://workspace-projectid:5000

Solutions:

  1. DNS Resolution:

    # Test local DNS resolution
    nslookup projectid-5000.localhost
    
    # Add to /etc/hosts if needed (usually not required).
    # tee runs as root here; a plain ">>" redirect would not.
    echo "127.0.0.1 projectid-5000.localhost" | sudo tee -a /etc/hosts
  2. Check Container Network:

    # Verify container is on correct network
    docker inspect workspace-projectid | grep NetworkMode
    
    # Reconnect to network if needed
    docker network connect xares-aicoder-network workspace-projectid
  3. Application Configuration:

    # Ensure app binds to all interfaces, not just localhost
    # Check in VS Code terminal:
    netstat -tuln | grep :5000
    
    # Should show 0.0.0.0:5000, not 127.0.0.1:5000

Application Returns 404

Symptoms:

  • Subdomain URL loads but application returns 404
  • Application works in container terminal but not via browser

Solutions:

  1. Check Application Routes:

    # Flask example - ensure route handles all paths
    @app.route('/')
    @app.route('/<path:path>')
    def catch_all(path=''):
        return render_template('index.html')
  2. Check Base Path Configuration:

    // React/Vite - check base URL configuration
    // vite.config.js
    export default {
      base: '/',  // Ensure base is root
      server: {
        host: '0.0.0.0',
        port: 3000
      }
    }
  3. Check Proxy Headers:

    # Flask - handle proxy headers if needed
    from werkzeug.middleware.proxy_fix import ProxyFix
    app.wsgi_app = ProxyFix(app.wsgi_app)

"Unknown subdomain" 404 (Catch-all)

Symptoms:

  • Visiting <something>.<BASE_DOMAIN> returns HTTP 404 with the body "Unknown subdomain. The alias may have been deleted or was never created."
  • The same URL previously returned the platform frontend; it now returns 404.

Cause: nginx has a default_server catch-all server block. Any Host header that doesn't match the frontend (<BASE_DOMAIN>), a workspace UUID (<uuid>.<BASE_DOMAIN>), a workspace port subdomain (<uuid>-<port>.<BASE_DOMAIN>), or a registered alias falls through to this 404. This is intentional — it prevents random subdomain scans from hitting the auth-protected frontend.

Diagnosis:

# Is the alias actually registered? (replace PROJECT_ID with the workspace ID)
curl http://localhost/api/projects/PROJECT_ID/aliases

# Is it in nginx's dynamic config?
docker exec xaresaicoder-nginx cat /etc/nginx/dynamic/aliases.conf | grep server_name

# Did nginx reload after the last alias change?
docker logs xaresaicoder-nginx 2>&1 | grep -E "reloading|signal" | tail -5

Solutions:

  1. Create / re-create the alias via the Aliases UI on the workspace card.
  2. Force nginx reload if you suspect a stale state:
    docker exec xaresaicoder-nginx nginx -t && docker exec xaresaicoder-nginx nginx -s reload
  3. Confirm the subdomain is the one you expect — aliases are case-sensitive (lowercase only) and limited to [a-z][a-z0-9-]+[a-z0-9].
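
The naming rule in item 3 can be checked locally before creating an alias. A minimal sketch, assuming the pattern quoted above is the whole rule. The server may enforce more (a maximum length, for example), and `is_valid_alias` is a hypothetical helper.

```bash
# is_valid_alias NAME: succeed if NAME matches [a-z][a-z0-9-]+[a-z0-9],
# i.e. starts with a lowercase letter, ends with a lowercase letter or digit,
# and is at least three characters long.
# LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
is_valid_alias() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z][a-z0-9-]+[a-z0-9]'
}

# Usage:
#   is_valid_alias my-app-2 && echo "ok" || echo "rejected"
```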

Custom Alias Routes But Returns 502/401 Even With Right Credentials

Symptoms:

  • The alias myapp.<BASE_DOMAIN> returns 502 Bad Gateway instead of the expected app.
  • Or: returns 401 even though Basic Auth is disabled.

Causes & Fixes:

| Code | Likely cause | Fix |
|------|--------------|-----|
| 502 | Workspace stopped, or no listener on configured port | Start workspace; verify `netstat -tln \| grep <port>` shows `0.0.0.0:<port>` |
| 502 (intermittent) | nginx reloaded mid-request | Retry; permanent 502 indicates app crash |
| 401 with no auth configured | Stale config / nginx didn't reload | `docker exec xaresaicoder-nginx nginx -s reload`; check `aliases.conf` actually omits `auth_basic` |
| 401 with right credentials | htpasswd contains a hash format nginx-alpine can't parse | Re-create the alias; the server now writes `{SHA}<base64>`, which is always supported |

Password Protection Issues

Cannot Access Protected Workspace

Symptoms:

  • VS Code shows authentication prompt
  • Correct password doesn't work
  • Gets rejected repeatedly

Solutions:

  1. Password Case Sensitivity:

    # Passwords are case-sensitive
    # Check exact password from creation response
    curl http://localhost/api/projects/PROJECT_ID
  2. Browser Cache Issues:

    # Clear browser cache
    # Or use incognito/private mode
    # Or try different browser
  3. Container Restart Required:

    # Restart workspace container
    docker restart workspace-PROJECT_ID
    
    # Wait for container to be ready
    curl http://PROJECT_ID.localhost/

Invalid Password for Stop/Delete

Symptoms:

{
  "success": false,
  "error": "Invalid password for password-protected workspace"
}

Solutions:

  1. Check Password in Request:

    # Ensure password is in request body
    curl -X POST http://localhost/api/projects/PROJECT_ID/stop \
      -H "Content-Type: application/json" \
      -d '{"password": "correct-password-here"}'
  2. Check Workspace Protection Status:

    # Verify workspace is actually password protected
    curl http://localhost/api/projects/PROJECT_ID
    # Look for "passwordProtected": true
  3. Server Memory Issues:

    # Password might be lost if server restarted
    # Check server uptime
    curl http://localhost/api/health
    
    # If server restarted, workspace passwords are lost
    # Delete workspace and recreate if needed

Performance Problems

High Memory Usage

Symptoms:

  • System becomes slow
  • Out of memory errors
  • Containers being killed

Diagnosis:

# Check memory usage
free -h
docker stats --no-stream

# Check which containers use most memory
docker stats --format "table {{.Container}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Check for memory leaks
docker system df

Solutions:

  1. Resource Cleanup:

    # Clean unused containers and images
    docker system prune -a
    
    # Stop unused workspaces
    docker stop $(docker ps -q --filter "name=workspace-")
  2. Adjust Resource Limits:

    # Edit docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 2G  # Reduce from 4G
  3. System Optimization:

    # Increase swap if needed
    sudo fallocate -l 2G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

High CPU Usage

Symptoms:

  • System becomes unresponsive
  • High load averages
  • Containers using excessive CPU

Solutions:

  1. Identify CPU-Heavy Containers:

    # Find high CPU containers
    docker stats --format "table {{.Container}}\t{{.CPUPerc}}"
    
    # Check processes in container
    docker exec CONTAINER_ID top
  2. Reduce CPU Limits:

    # Adjust in docker-compose.yml
    deploy:
      resources:
        limits:
          cpus: '1.0'  # Reduce from 2.0
  3. Check for Runaway Processes:

    # Check for infinite loops or stuck processes
    docker exec workspace-PROJECT_ID ps aux
    
    # Kill problematic processes
    docker exec workspace-PROJECT_ID pkill -f problematic-process

Slow Workspace Creation

Symptoms:

  • Workspaces take very long to create
  • Timeout errors during creation

Solutions:

  1. Pre-pull Images:

    # Pre-pull base images to speed up creation
    docker pull node:18-alpine
    docker pull python:3.11-alpine
    docker pull openjdk:17-jdk-alpine
  2. Optimize Docker Build Cache:

    # Rebuild with build cache
    docker build --cache-from xares-aicoder-codeserver:latest \
      -t xares-aicoder-codeserver:latest .
  3. Check Network Speed:

    # Test internet connectivity for package downloads
    curl -o /dev/null -s -w "%{time_total}\n" https://registry.npmjs.org/

Git Server Issues

Forgejo Not Starting

Symptoms:

  • Git server URLs return 502 error
  • Forgejo container not running

Diagnosis:

# Check if Forgejo container is running
docker ps | grep forgejo

# Check Forgejo logs
docker compose logs forgejo

# Check if profile is enabled
grep ENABLE_GIT_SERVER .env

Solutions:

  1. Enable Git Server Profile:

    # Ensure Git server is enabled
    echo "ENABLE_GIT_SERVER=true" >> .env
    
    # Deploy with Git server
    ./deploy.sh --git-server
  2. Check Volume Permissions:

    # Check Forgejo data volume
    docker volume inspect xaresaicoder_forgejo_data
    
    # Fix permissions if needed
    docker run --rm -v xaresaicoder_forgejo_data:/data alpine \
      chown -R 1000:1000 /data

Git Repository Creation Fails

Symptoms:

  • Workspace creation succeeds but Git repo creation fails
  • "createGitRepo": true but no repository created

Solutions:

  1. Check Forgejo API:

    # Test Forgejo API connectivity
    curl -u developer:admin123! \
      http://localhost/git/api/v1/user
  2. Check Admin User:

    # Verify admin user exists
    docker exec xaresaicoder-forgejo-1 \
      forgejo admin user list
  3. Manual Repository Creation:

    # Create repository manually
    curl -u developer:admin123! -X POST \
      -H "Content-Type: application/json" \
      -d '{"name":"test-repo","private":false}' \
      http://localhost/git/api/v1/user/repos

AI Tools Problems

API Key Issues

Symptoms:

  • AI tools return authentication errors
  • "Invalid API key" messages

Solutions:

  1. Check Environment Variables:

    # In workspace terminal
    echo $OPENAI_API_KEY
    echo $ANTHROPIC_API_KEY
    echo $GEMINI_API_KEY
  2. Set API Keys:

    # Set in workspace
    export OPENAI_API_KEY=your_key_here
    export ANTHROPIC_API_KEY=your_key_here
    
    # Make permanent
    echo 'export OPENAI_API_KEY=your_key' >> ~/.bashrc
  3. Test API Connectivity:

    # Test OpenAI API
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY"

Tool Installation Issues

Symptoms:

  • command not found errors for AI tools
  • Setup scripts fail

Solutions:

  1. Reinstall Tools (sudo required):

    sudo update_aider
    update_opencode
    sudo update_gemini
    # See `info` for the full list of update commands
  2. Check PATH:

    # Verify tools are in PATH
    which aider
    which opencode
    
    # Add to PATH if needed
    export PATH="$HOME/.local/bin:$PATH"
  3. Manual Installation:

    # Install aider manually
    pip install aider-chat
    
    # Install OpenCode SST manually
    curl -sSL https://install.opencodesst.com | bash

System Recovery

Complete System Reset

When to Use:

  • Multiple persistent issues
  • Corrupted system state
  • Major configuration problems

Steps:

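# 1. Stop all services
#    WARNING: -v also removes the named volumes declared in the compose file
docker compose down -v

# 2. Remove all containers and images
docker system prune -a

# 3. Remove networks
docker network prune

# 4. Remove volumes (WARNING: This deletes all workspace data)
#    On Docker 23.0 and later, add --all to include named volumes
docker volume prune

# 5. Clean repository (WARNING: discards uncommitted changes and untracked files)
git clean -fd
git reset --hard HEAD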

# 6. Fresh deployment
./deploy.sh

Backup and Restore

Before Major Changes:

# Backup configuration
cp .env .env.backup
cp docker-compose.yml docker-compose.yml.backup

# Backup workspace data
docker run --rm -v xaresaicoder_workspace_data:/data \
  -v $(pwd):/backup alpine \
  tar czf /backup/workspace-backup.tar.gz /data

# Backup Git server data
docker run --rm -v xaresaicoder_forgejo_data:/data \
  -v $(pwd):/backup alpine \
  tar czf /backup/forgejo-backup.tar.gz /data

Restore from Backup:

# Restore configuration
cp .env.backup .env

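# Restore workspace data
docker run --rm -v xaresaicoder_workspace_data:/data \
  -v $(pwd):/backup alpine \
  tar xzf /backup/workspace-backup.tar.gz -C /

# Restore Git server data (if it was backed up)
docker run --rm -v xaresaicoder_forgejo_data:/data \
  -v $(pwd):/backup alpine \
  tar xzf /backup/forgejo-backup.tar.gz -C /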

# Restart services
docker compose up -d
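
A backup is only useful if it can be read back, so it is worth checking each archive right after creating it. This is a minimal sketch: `verify_backup` is a hypothetical helper, and it assumes the archive was produced by the commands above, which store the volume contents under `data/`.

```bash
# verify_backup FILE: succeed if FILE is a readable gzip tar archive that
# contains at least one entry under data/.
verify_backup() {
  tar tzf "$1" 2>/dev/null | grep -q '^data/'
}

# Usage:
#   verify_backup workspace-backup.tar.gz && echo "backup looks readable"
```

This confirms the archive structure only; it does not compare the contents against the live volume.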

Getting Additional Help

  1. Check Logs First:

    docker compose logs --tail=100
  2. System Information:

    # Gather system info for support
    docker version
    docker compose version
    uname -a
    free -h
    df -h
  3. Create Support Bundle:

    # Create comprehensive log bundle
    mkdir support-bundle
    docker compose logs > support-bundle/compose-logs.txt
    docker system info > support-bundle/docker-info.txt
    docker network ls > support-bundle/networks.txt
    docker volume ls > support-bundle/volumes.txt
    # .env may contain API keys and passwords; review or redact it before sharing
    cp .env support-bundle/config.env
    tar czf support-bundle.tar.gz support-bundle/
  4. Community Support:

    • GitHub Issues: Report platform-specific problems
    • Docker Documentation: For Docker-related issues
    • Tool-specific support: Check individual AI tool documentation
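
Since a support bundle is usually shared outside the team, the copy of `.env` in it should not carry live secrets. The sketch below masks values by key name. `redact_env` is a hypothetical helper, and the name patterns (KEY, SECRET, TOKEN, PASSWORD) are an assumption about how secrets are named, so review the result before sending it.

```bash
# redact_env: copy KEY=VALUE lines from stdin to stdout, masking the value of
# any key whose name contains KEY, SECRET, TOKEN, or PASSWORD
# (case-insensitive). Comments and lines without "=" pass through unchanged.
redact_env() {
  awk -F= '
    /^[ \t]*#/ || !/=/ { print; next }
    toupper($1) ~ /KEY|SECRET|TOKEN|PASSWORD/ { print $1 "=<redacted>"; next }
    { print }'
}

# Usage, in place of the plain cp in the bundle steps:
#   redact_env < .env > support-bundle/config.env
```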
