From c662a5237b57d2d0d4ee1b192ea7bce7a4dac023 Mon Sep 17 00:00:00 2001
From: Patrick Plate <pplate@fedora.fritz.box>
Date: Mon, 6 Apr 2026 10:43:36 +0200
Subject: [PATCH] feat(mcp-image-gen): add ComfyUI auto-start health check +
 systemd service

Option A: Add lifespan context manager to server.py
- _ping_comfyui(): async health check against /system_stats
- check_and_start_comfyui(): ping on startup; if down, launches ComfyUI
  via subprocess.Popen from COMFYUI_DIR (.venv/bin/python main.py)
  with HSA_OVERRIDE_GFX_VERSION=11.0.0 defaulted (env.setdefault) for AMD ROCm
- Polls up to 30s for readiness after auto-start
- New env var: COMFYUI_DIR (default ~/ComfyUI)
- FastMCP lifespan= wired in; 34/34 tests still passing

Option B: Add comfyui.service systemd user service file
- Install: cp mcp/mcp-image-gen/comfyui.service ~/.config/systemd/user/
- Enable: systemctl --user enable --now comfyui
- Sets HSA_OVERRIDE_GFX_VERSION=11.0.0, WorkingDirectory=%h/ComfyUI
- Restart=on-failure, logs via journald

docs: Update mcp-image-gen-ComfyUI-Setup.md
- New Step 4: systemd service install + linger instructions
- Step 5: manual start (moved from old Step 4)
- Step 6/7 renumbered; COMFYUI_DIR env var documented
- Architecture diagram added; troubleshooting rows updated
---
 .../wiki/pages/mcp-image-gen-ComfyUI-Setup.md |  91 +++++++++--
 mcp/mcp-image-gen/comfyui.service             |  21 +++
 mcp/mcp-image-gen/src/server.py               | 153 ++++++++++++++----
 3 files changed, 222 insertions(+), 43 deletions(-)
 create mode 100644 mcp/mcp-image-gen/comfyui.service
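
Note on the readiness wait: the loop in check_and_start_comfyui() counts
attempts (15 polls, 2 s apart), so slow pings can stretch the total past 30 s.
A deadline-based variant is sketched below as one possible alternative; it is
not part of this patch. `wait_until_ready` and its parameters are hypothetical
names, and `ping` stands in for a zero-argument wrapper around
`_ping_comfyui(COMFYUI_URL)`.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_ready(
    ping: Callable[[], Awaitable[bool]],  # hypothetical: wraps _ping_comfyui
    limit_s: float = 30.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll ping() until it returns True or limit_s seconds have elapsed."""
    # Use a monotonic clock so wall-clock adjustments cannot skew the deadline.
    deadline = time.monotonic() + limit_s
    while time.monotonic() < deadline:
        if await ping():
            return True
        await asyncio.sleep(interval_s)
    return False
```

In the lifespan this might be called as
`await wait_until_ready(lambda: _ping_comfyui(COMFYUI_URL))`. The total wait is
then bounded by roughly `limit_s` plus one ping timeout, however long the
individual pings take.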

diff --git a/docs/wiki/pages/mcp-image-gen-ComfyUI-Setup.md b/docs/wiki/pages/mcp-image-gen-ComfyUI-Setup.md
index 40999cb..5d3bb0f 100644
--- a/docs/wiki/pages/mcp-image-gen-ComfyUI-Setup.md
+++ b/docs/wiki/pages/mcp-image-gen-ComfyUI-Setup.md
@@ -62,7 +62,52 @@ huggingface-cli download comfyanonymous/flux_text_encoders \
   --local-dir ~/ComfyUI/models/clip
 ```
 
-## Step 4: Start ComfyUI
+## Step 4: Install the systemd User Service (Recommended)
+
+Installing ComfyUI as a systemd user service ensures it starts automatically on login and restarts on failure.
+
+```bash
+# Copy the bundled service file to the systemd user directory
+mkdir -p ~/.config/systemd/user
+cp ~/pi_mcps/mcp/mcp-image-gen/comfyui.service ~/.config/systemd/user/comfyui.service
+
+# Reload systemd, enable + start the service
+systemctl --user daemon-reload
+systemctl --user enable --now comfyui
+
+# Verify it is running
+systemctl --user status comfyui
+```
+
+> ⚠️ `HSA_OVERRIDE_GFX_VERSION=11.0.0` is already set in the service file — it is mandatory for RX 7900 XTX on ROCm. Without it, model loading fails silently.
+
+### Enable lingering (start ComfyUI even without a login session)
+
+```bash
+loginctl enable-linger $USER
+```
+
+This ensures the service starts at boot even before you log in — recommended for headless / homelab setups.
+
+### Managing the service
+
+```bash
+# Follow live logs
+journalctl --user -u comfyui -f
+
+# Restart after model changes
+systemctl --user restart comfyui
+
+# Stop temporarily
+systemctl --user stop comfyui
+
+# Disable autostart
+systemctl --user disable comfyui
+```
+
+## Step 5: Manual Start (without systemd)
+
+If you prefer to start ComfyUI manually (e.g. for debugging):
 
 ```bash
 cd ~/ComfyUI
@@ -74,26 +119,36 @@ HSA_OVERRIDE_GFX_VERSION=11.0.0 \
 echo "ComfyUI PID: $!"
 ```
 
-> ⚠️ `HSA_OVERRIDE_GFX_VERSION=11.0.0` is mandatory for RX 7900 XTX on ROCm. Without it, model loading fails silently.
-
-## Step 5: Verify ComfyUI is Running
+## Step 6: Verify ComfyUI is Running
 
 ```bash
 curl http://localhost:8188/system_stats
 # Should return JSON with GPU info
 ```
 
-## Step 6: Configure mcp-image-gen
+## Step 7: Configure mcp-image-gen
 
 ```bash
 cd /home/pplate/pi_mcps/mcp/mcp-image-gen
 
 # Environment variables (set in .roo/mcp.json or shell):
-# COMFYUI_URL=http://localhost:8188
-# IMAGE_OUTPUT_DIR=~/Pictures/mcp-generated
-# COMFYUI_TIMEOUT=120
+# COMFYUI_URL=http://localhost:8188      — ComfyUI API endpoint
+# IMAGE_OUTPUT_DIR=~/Pictures/mcp-generated — where generated images are saved
+# COMFYUI_TIMEOUT=120                    — max wait time (seconds) per image
+# COMFYUI_DIR=~/ComfyUI                  — path to ComfyUI install (used by auto-start)
 ```
 
+### Auto-start behaviour
+
+`mcp-image-gen` includes a **startup health check** in its lifespan. Every time the MCP server starts it:
+
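+1. Pings `/system_stats` on `COMFYUI_URL` (default `http://localhost:8188`).
+2. **If reachable:** logs `ComfyUI is already running at <COMFYUI_URL> ✓` and proceeds normally.
+3. **If not reachable:** attempts to launch ComfyUI as a detached background subprocess from `COMFYUI_DIR` using `.venv/bin/python main.py --listen --port 8188`, setting `HSA_OVERRIDE_GFX_VERSION=11.0.0` unless the variable is already defined. The port is fixed at 8188, so auto-start only helps when `COMFYUI_URL` points at that port.
+4. Polls every 2 s, for up to about 30 s, until ComfyUI responds. If it is still not ready, the MCP server starts anyway and generation calls fail until ComfyUI is up.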
+
+With the systemd service enabled, step 3 should rarely trigger; the check is a safety net for when the service is stopped or not installed.
+
 ## Performance
 
 | GPU | Model | Resolution | Steps | Time |
@@ -101,12 +156,28 @@ cd /home/pplate/pi_mcps/mcp/mcp-image-gen
 | AMD RX 7900 XTX | FLUX.1-schnell | 1024×1024 | 4 | ~8s |
 | AMD RX 7900 XTX | FLUX.1-schnell | 1280×512 | 4 | ~7s |
 
+## Architecture Overview
+
+```
+Boot
+ └─ systemd --user (comfyui.service)
+       └─ ComfyUI at localhost:8188
+
+VS Code / Roo Code
+ └─ mcp-image-gen MCP server (stdio)
+       ├─ lifespan startup: ping localhost:8188
+       │    └─ if down: subprocess.Popen ComfyUI, wait ≤30s
+       └─ tools: generate_image, list_available_models, …
+```
+
 ## Troubleshooting
 
 | Problem | Solution |
 |---|---|
 | `HTTP 401` downloading model | Accept FLUX license on HuggingFace first |
 | GPU not detected | Ensure `HSA_OVERRIDE_GFX_VERSION=11.0.0` is set |
-| `Connection refused` from mcp-image-gen | Start ComfyUI first, check port 8188 |
-| Slow generation (>60s) | ComfyUI may be running on CPU — check ROCm install |
+| `Connection refused` from mcp-image-gen | Check `systemctl --user status comfyui`; or set `COMFYUI_DIR` so auto-start can locate the install |
+| Slow generation (>60s) | ComfyUI may be running on CPU — check ROCm install and `HSA_OVERRIDE_GFX_VERSION` |
 | Ollama image gen | As of April 2026: macOS-only, not available on Linux |
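+| Finding logs | Systemd service: `journalctl --user -u comfyui -f`. Auto-started ComfyUI: its output is discarded, so check the mcp-image-gen server logs for the launch and readiness messages |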
+| Service not starting at boot | Run `loginctl enable-linger $USER` to enable session-less startup |
diff --git a/mcp/mcp-image-gen/comfyui.service b/mcp/mcp-image-gen/comfyui.service
new file mode 100644
index 0000000..cda47a1
--- /dev/null
+++ b/mcp/mcp-image-gen/comfyui.service
@@ -0,0 +1,21 @@
+[Unit]
+Description=ComfyUI — Local AI Image Generation (AMD ROCm / FLUX.1-schnell)
+Documentation=https://github.com/comfyanonymous/ComfyUI
+After=network.target
+
+[Service]
+Type=simple
+WorkingDirectory=%h/ComfyUI
+ExecStart=%h/ComfyUI/.venv/bin/python main.py --listen --port 8188
+Restart=on-failure
+RestartSec=10
+
+# AMD RX 7900 XTX ROCm GFX override — required for correct GPU detection
+Environment=HSA_OVERRIDE_GFX_VERSION=11.0.0
+
+# Redirect output — follow with: journalctl --user -u comfyui -f
+StandardOutput=journal
+StandardError=journal
+
+[Install]
+WantedBy=default.target
diff --git a/mcp/mcp-image-gen/src/server.py b/mcp/mcp-image-gen/src/server.py
index b55ed65..ab8dfb1 100644
--- a/mcp/mcp-image-gen/src/server.py
+++ b/mcp/mcp-image-gen/src/server.py
@@ -4,16 +4,23 @@ import asyncio
 import base64
 import copy
 import json
+import logging
 import os
 import random
 import re
+import subprocess
 import time
+from contextlib import asynccontextmanager
 from datetime import datetime
 from pathlib import Path
+from typing import Annotated
 
 import httpx
 from fastmcp import FastMCP
 from mcp.types import ImageContent, TextContent
+from pydantic import Field
+
+logger = logging.getLogger("mcp-image-gen")
 
 # ---------------------------------------------------------------------------
 # Configuration
@@ -23,13 +30,112 @@ COMFYUI_URL = os.environ.get("COMFYUI_URL", "http://localhost:8188").rstrip("/")
 IMAGE_OUTPUT_DIR = os.environ.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")
 COMFYUI_TIMEOUT = int(os.environ.get("COMFYUI_TIMEOUT", "120"))
 
+# Directory where ComfyUI is installed (used for auto-start only)
+# Override via COMFYUI_DIR env var; the default matches the systemd unit's WorkingDirectory.
+COMFYUI_DIR = Path(
+    os.environ.get("COMFYUI_DIR", "~/ComfyUI")
+).expanduser().resolve()
+
 # Maximum number of images allowed in a single batch call
 MAX_COUNT = 10
 
 # Path to the bundled FLUX.1-schnell workflow template
 _WORKFLOW_PATH = Path(__file__).parent / "workflows" / "flux_schnell.json"
 
-mcp = FastMCP("mcp-image-gen")
+
+# ---------------------------------------------------------------------------
+# ComfyUI health check + auto-start
+# ---------------------------------------------------------------------------
+
+async def _ping_comfyui(url: str, timeout: float = 5.0) -> bool:
+    """Return True if ComfyUI is reachable at *url*/system_stats."""
+    try:
+        async with httpx.AsyncClient(timeout=timeout) as client:
+            resp = await client.get(f"{url}/system_stats")
+            return resp.status_code == 200
+    except (httpx.HTTPError, OSError):  # any transport failure counts as "down"
+        return False
+
+
+async def check_and_start_comfyui() -> None:
+    """Ping ComfyUI; if not reachable, attempt to launch it as a subprocess.
+
+    Called once at server startup from the lifespan context manager.
+    Uses COMFYUI_DIR to locate the installation and its venv Python.
+    HSA_OVERRIDE_GFX_VERSION=11.0.0 is added to the child environment unless
+    the variable is already set (AMD ROCm / RX 7900 XTX compatibility).
+    """
+    if await _ping_comfyui(COMFYUI_URL):
+        logger.info("ComfyUI is already running at %s ✓", COMFYUI_URL)
+        return
+
+    logger.warning(
+        "ComfyUI not reachable at %s — attempting to start from %s",
+        COMFYUI_URL, COMFYUI_DIR,
+    )
+
+    python = COMFYUI_DIR / ".venv" / "bin" / "python"
+    main_py = COMFYUI_DIR / "main.py"
+
+    if not python.exists():
+        logger.error(
+            "ComfyUI venv Python not found at %s. "
+            "Install ComfyUI first (see docs/wiki/pages/mcp-image-gen-ComfyUI-Setup.md).",
+            python,
+        )
+        return
+    if not main_py.exists():
+        logger.error(
+            "ComfyUI main.py not found at %s — is COMFYUI_DIR correct?",
+            main_py,
+        )
+        return
+
+    # Build environment: inherit current env, set ROCm override for AMD RX 7900 XTX
+    env = os.environ.copy()
+    env.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")
+
+    try:
+        proc = subprocess.Popen(
+            [str(python), str(main_py), "--listen", "--port", "8188"],
+            cwd=str(COMFYUI_DIR),
+            env=env,
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.DEVNULL,
+            start_new_session=True,  # detach from MCP server process group
+        )
+        logger.info("ComfyUI launched (PID %d) — waiting for readiness…", proc.pid)
+    except OSError as exc:
+        logger.error("Failed to start ComfyUI subprocess: %s", exc)
+        return
+
+    # Wait ~30 s for readiness: 15 polls, 2 s apart (each ping may add up to 5 s)
+    wait_limit = 30
+    for attempt in range(wait_limit // 2):
+        await asyncio.sleep(2)
+        if await _ping_comfyui(COMFYUI_URL):
+            logger.info(
+                "ComfyUI ready at %s after ~%ds ✓", COMFYUI_URL, (attempt + 1) * 2
+            )
+            return
+
+    logger.warning(
+        "ComfyUI did not respond within %ds. "
+        "Generation calls will fail until it is ready. "
+        "Auto-started output is discarded; start ComfyUI manually or via the systemd unit to see its logs.",
+        wait_limit,
+    )
+
+
+@asynccontextmanager
+async def lifespan(app):
+    """FastMCP lifespan: run ComfyUI health check at server startup."""
+    await check_and_start_comfyui()
+    yield  # server is live here
+    # Nothing to tear down: an auto-started ComfyUI is detached and left running
+
+
+mcp = FastMCP("mcp-image-gen", lifespan=lifespan)
 
 
 # ---------------------------------------------------------------------------
@@ -332,40 +438,22 @@ async def _generate_single(
 
 @mcp.tool()
 async def generate_image(
-    prompt: str,
-    width: int = 1024,
-    height: int = 1024,
-    steps: int = 4,
-    model: str = "flux1-schnell.safetensors",
-    seed: int = -1,
-    negative_prompt: str = "",
-    output_dir: str = "",
-    name: str = "",
-    count: int = 1,
+    prompt: Annotated[str, Field(description="Text description of the image to generate.")],
+    width: Annotated[int, Field(description="Image width in pixels (default: 1024).")] = 1024,
+    height: Annotated[int, Field(description="Image height in pixels (default: 1024).")] = 1024,
+    steps: Annotated[int, Field(description="Number of inference steps. FLUX.1-schnell works well at 4.")] = 4,
+    model: Annotated[str, Field(description="ComfyUI model filename (default: flux1-schnell.safetensors).")] = "flux1-schnell.safetensors",
+    seed: Annotated[int, Field(description="Random seed for reproducibility. -1 = random. When count > 1 and seed != -1, seeds are incremented per image (seed, seed+1, seed+2, ...) to produce deterministic variation.")] = -1,
+    negative_prompt: Annotated[str, Field(description="Things to exclude from the image (optional).")] = "",
+    output_dir: Annotated[str, Field(description="Override output directory. Defaults to IMAGE_OUTPUT_DIR env var or ~/Pictures/mcp-generated.")] = "",
+    name: Annotated[str, Field(description="Optional filename prefix. Saved as {name}_{timestamp}_{seed}.png. Useful to avoid confusion with auto-generated timestamp filenames.")] = "",
+    count: Annotated[int, Field(description="Number of images to generate (1–10). Each image is generated sequentially. Partial failures are returned inline — the batch continues even if one image fails.")] = 1,
 ) -> list:
     """Generate an image from a text prompt using ComfyUI.
 
     Returns both a file path (for persistence) and an inline base64 image
     (for display in Claude / Roo Code chat).
 
-    Args:
-        prompt:          Text description of the image to generate.
-        width:           Image width in pixels (default: 1024).
-        height:          Image height in pixels (default: 1024).
-        steps:           Number of inference steps. FLUX.1-schnell works well at 4.
-        model:           ComfyUI model filename (default: flux1-schnell.safetensors).
-        seed:            Random seed for reproducibility. -1 = random.
-                         When count > 1 and seed != -1, seeds are incremented per image
-                         (seed, seed+1, seed+2, ...) to produce deterministic variation.
-        negative_prompt: Things to exclude from the image (optional).
-        output_dir:      Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
-                         or ~/Pictures/mcp-generated.
-        name:            Optional filename prefix. Saved as {name}_{timestamp}_{seed}.png.
-                         Useful to avoid confusion with auto-generated timestamp filenames.
-        count:           Number of images to generate (1–10). Each image is generated
-                         sequentially. Partial failures are returned inline — the batch
-                         continues even if one image fails.
-
     Returns:
         Flat interleaved list: [TextContent1, ImageContent1, TextContent2, ImageContent2, ...]
         On error for any single image, that slot contains only [TextContent(error)].
@@ -442,12 +530,11 @@ async def list_available_models() -> list[str]:
 
 
 @mcp.tool()
-async def get_generation_status(prompt_id: str) -> dict:
+async def get_generation_status(
+    prompt_id: Annotated[str, Field(description="The prompt ID returned by a previous generate_image call.")],
+) -> dict:
     """Check the status of a queued or running generation job.
 
-    Args:
-        prompt_id: The prompt ID returned by a previous generate_image call.
-
     Returns:
         Dict with 'status' key: "pending", "running", "completed", or "not_found".
     """