feat(mcp-image-gen): scaffold ComfyUI-backed image generation MCP server

- FastMCP server with 4 tools: generate_image, list_available_models, get_generation_status, get_output_directory
- ComfyUI REST API client (httpx) polling lifecycle
- FLUX.1-schnell workflow JSON template
- Dual output: TextContent (path + seed) + ImageContent (base64 PNG)
- 14 passing pytest tests with respx HTTP mocking
- ROCm/AMD RX 7900 XTX optimized setup in README
- Ollama Linux migration path documented (future)
2026-04-04 11:49:31 +02:00
parent ba7d4bc248
commit 8112ff2f12
11 changed files with 1748 additions and 0 deletions
@@ -0,0 +1,496 @@
+# mcp-image-gen — Implementation Plan
+
+**Date:** 2026-04-04
+**Author:** Lumen (for Patrick / pplate)
+**Status:** Ready for implementation
+**Assessment:** [ASSESSMENT.md](./ASSESSMENT.md)
+**Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7`
+
+---
+
+## 1. Directory Structure
+
+```
+mcp/mcp-image-gen/
+├── ASSESSMENT.md               ← Architecture assessment (this session)
+├── PLAN.md                     ← This file
+├── README.md                   ← Usage docs, tool table, env vars
+├── pyproject.toml              ← uv project + deps
+├── run.sh                      ← Launch script (used by .roo/mcp.json)
+├── src/
+│   ├── __init__.py
+│   ├── server.py               ← FastMCP server + all tools
+│   └── workflows/
+│       └── flux_schnell.json   ← Minimal ComfyUI API-format workflow
+└── tests/
+    ├── __init__.py
+    ├── conftest.py             ← sys.path + shared fixtures
+    └── test_server.py          ← All tool tests (mocked ComfyUI)
+```
+
+---
+
+## 2. Tool Definitions
+
+### 2.1 `generate_image`
+
+```python
+@mcp.tool()
+async def generate_image(
+    prompt: str,
+    width: int = 1024,
+    height: int = 1024,
+    steps: int = 4,
+    model: str = "flux1-schnell.safetensors",
+    seed: int = -1,
+    negative_prompt: str = "",
+    output_dir: str = "",
+) -> list:
+    """
+    Generate an image from a text prompt using ComfyUI.
+
+    Returns both a file path (for persistence) and an inline base64 image
+    (for display in Claude / Roo Code chat).
+
+    Args:
+        prompt:          Text description of the image to generate.
+        width:           Image width in pixels (default: 1024).
+        height:          Image height in pixels (default: 1024).
+        steps:           Number of inference steps. FLUX.1-schnell works well at 4.
+        model:           ComfyUI model filename (default: flux1-schnell.safetensors).
+        seed:            Random seed for reproducibility. -1 = random.
+        negative_prompt: Things to exclude from the image (optional).
+        output_dir:      Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
+                         or ~/Pictures/mcp-generated.
+
+    Returns:
+        [TextContent(path + metadata), ImageContent(base64 PNG)]
+    """
+```
+
+**Return type:** `list` containing:
+1. `mcp.types.TextContent` — human-readable summary with file path, seed, elapsed time
+2. `mcp.types.ImageContent` — `type="image"`, `data=base64_encoded_png`, `mimeType="image/png"`
+
+> ⚠️ **FastMCP 3.x rule:** Never annotate the return type as `-> Image` (the fastmcp utility type).
+> Doing so triggers `output_schema` generation, which breaks the early-return path. Return
+> `mcp.types.ImageContent` directly as part of a `list`; it is a `ContentBlock` and passes through cleanly.
+
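The two-element result described above can be sketched as follows. This is a minimal illustration, not the server code: it uses plain dicts that mirror the fields named above (`type`, `data`, `mimeType`) so that it runs without `mcp` installed, and the helper name `build_result` and the summary wording are hypothetical. The real tool would build `mcp.types.TextContent` and `mcp.types.ImageContent` with the same field values.

```python
import base64


def build_result(png_bytes: bytes, path: str, seed: int, elapsed_s: float) -> list[dict]:
    """Return [text block, image block] shaped like the two content blocks."""
    # Human-readable summary: file path, seed, elapsed time.
    summary = f"Saved to {path} (seed={seed}, {elapsed_s:.1f}s)"
    return [
        {"type": "text", "text": summary},
        {
            "type": "image",
            # The image payload is base64 text, not raw bytes.
            "data": base64.b64encode(png_bytes).decode("ascii"),
            "mimeType": "image/png",
        },
    ]


# Stand-in bytes (just the PNG signature), enough to show the encoding step.
result = build_result(b"\x89PNG\r\n\x1a\n", "/tmp/out.png", seed=42, elapsed_s=3.14159)
```

Keeping the text block first means a client that ignores images still receives the saved path and seed.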
+---
+
+### 2.2 `list_available_models`
+
+```python
+@mcp.tool()
+async def list_available_models() -> str:
+    """
+    List all checkpoint models available in ComfyUI.
+
+    Returns a newline-separated list of model filenames.
+    Requires ComfyUI to be running at COMFYUI_URL.
+    """
+```
+
+**Implementation:** `GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple` → parse
+`input.required.ckpt_name[0]` list → join with newlines.
+
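A minimal sketch of the parsing step, assuming the response body is JSON that has already been decoded into a dict. One assumption beyond the plan text: the body may be wrapped under a top-level `CheckpointLoaderSimple` key, so the hypothetical helper `parse_checkpoint_names` unwraps it when present and otherwise follows the `input.required.ckpt_name[0]` path as written.

```python
def parse_checkpoint_names(payload: dict) -> str:
    """Extract checkpoint filenames and join them with newlines."""
    # Assumption: unwrap {"CheckpointLoaderSimple": {...}} if the server nests it.
    node = payload.get("CheckpointLoaderSimple", payload)
    names = node["input"]["required"]["ckpt_name"][0]
    return "\n".join(names)


# Illustrative payload; the second filename is made up for the example.
sample = {
    "CheckpointLoaderSimple": {
        "input": {
            "required": {
                "ckpt_name": [["flux1-schnell.safetensors", "other-model.safetensors"]]
            }
        }
    }
}
listing = parse_checkpoint_names(sample)
```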
+---
+
+### 2.3 `get_generation_status`
+
+```python
+@mcp.tool()
+async def get_generation_status(prompt_id: str) -> str:
+    """
+    Check the status of a queued or running generation job.
+
+    Args:
+        prompt_id: The prompt ID returned by a previous generate_image call.
+
+    Returns:
+        Status string: "pending", "running", "completed", or "not_found".
+    """
+```
+
+**Implementation:** `GET {COMFYUI_URL}/api/queue` → check the `queue_running` and `queue_pending`
+lists for a matching `prompt_id`. If it is in neither, check `GET {COMFYUI_URL}/api/history/{prompt_id}`:
+an entry there means `"completed"`, no entry means `"not_found"`.
+
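The lookup order can be sketched as a pure function over already-fetched JSON, which also makes it easy to unit test. The function names are hypothetical, and the queue entry shape is an assumption: the sketch accepts either a dict carrying `prompt_id` or a list with the ID at index 1, since the plan does not pin down which form the server returns.

```python
def _entry_id(entry):
    """Return the prompt ID of one queue entry, or None if the shape is unknown."""
    if isinstance(entry, dict):
        return entry.get("prompt_id")
    # Assumption: list-style entries carry the prompt ID at index 1.
    if isinstance(entry, (list, tuple)) and len(entry) > 1:
        return entry[1]
    return None


def classify_status(prompt_id: str, queue: dict, history: dict) -> str:
    """Map queue + history payloads to one of the four status strings."""
    if any(_entry_id(e) == prompt_id for e in queue.get("queue_running", [])):
        return "running"
    if any(_entry_id(e) == prompt_id for e in queue.get("queue_pending", [])):
        return "pending"
    # Not queued: a history entry means the job has finished.
    if prompt_id in history:
        return "completed"
    return "not_found"


queue = {"queue_running": [[0, "run-1", {}]], "queue_pending": [{"prompt_id": "pend-1"}]}
history = {"done-1": {"outputs": {}}}
statuses = [classify_status(p, queue, history) for p in ("run-1", "pend-1", "done-1", "nope")]
```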
+---
+
+### 2.4 `get_output_directory`
+
+```python
+@mcp.tool()
+def get_output_directory() -> str:
+    """
+    Return the directory where generated images are saved.
+
+    Returns:
+        Absolute path to the output directory.
+    """
+```
+
+**Implementation:** Resolve `IMAGE_OUTPUT_DIR` env var or default `~/Pictures/mcp-generated`,
+expand `~`, return as string.
+
+---
+
+## 3. ComfyUI Integration
+
+### 3.1 Workflow: Submit → Poll → Retrieve
+
+```
+generate_image()
+    │
+    ├── 1. Load flux_schnell.json workflow template
+    ├── 2. Parameterize: inject prompt, width, height, steps, seed, model
+    ├── 3. POST {COMFYUI_URL}/api/prompt  →  {"prompt_id": "uuid"}
+    │
+    ├── 4. POLL loop (max 120s, sleep 2s between)
+    │       GET {COMFYUI_URL}/api/queue
+    │       → check queue_running[].prompt_id == our id
+    │       → check queue_pending[].prompt_id == our id
+    │       → if neither: job is done
+    │
+    ├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
+    │       → find output image filename + subfolder
+    │
+    ├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
+    │       → raw PNG bytes
+    │
+    ├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
+    └── 8. Return [TextContent(path + meta), ImageContent(base64)]
+```
+
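Step 4 of the flow above can be sketched as a deadline-bounded loop. This is an illustration under stated assumptions, not the planned implementation: `wait_until_done` and `fetch_queued_ids` are hypothetical names, and the queue fetch is injected as a coroutine returning the set of queued prompt IDs so the sketch runs without ComfyUI or `httpx`.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_done(
    prompt_id: str,
    fetch_queued_ids: Callable[[], Awaitable[set[str]]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll until prompt_id is no longer queued. Returns False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Neither running nor pending: the job is treated as done.
        if prompt_id not in await fetch_queued_ids():
            return True
        await asyncio.sleep(interval_s)
    return False


async def _always_busy() -> set[str]:
    return {"abc"}


async def _demo() -> tuple[bool, int]:
    responses = [{"abc"}, {"abc"}, set()]  # busy twice, then gone
    calls = 0

    async def fake_fetch() -> set[str]:
        nonlocal calls
        calls += 1
        return responses.pop(0)

    done = await wait_until_done("abc", fake_fetch, timeout_s=5.0, interval_s=0.01)
    return done, calls


demo_result = asyncio.run(_demo())
timed_out = not asyncio.run(wait_until_done("abc", _always_busy, timeout_s=0.05, interval_s=0.01))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout stable if the system clock is adjusted during a long generation.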
+### 3.2 API Endpoints Used
+
+| Endpoint | Method | Purpose |
+|----------|--------|---------|
+| `/api/prompt` | POST | Submit workflow for generation |
+| `/api/queue` | GET | Poll queue status (pending + running) |
+| `/api/history/{prompt_id}` | GET | Get completed job output filenames |
+| `/api/view` | GET | Download image bytes by filename |
+| `/object_info/CheckpointLoaderSimple` | GET | List available checkpoint models |
+
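To show how the history and view endpoints connect, here is a small sketch that pulls the first image record out of a history payload and builds the matching `/api/view` URL. The function names are hypothetical; the payload shape is the one used in the test fixtures of this plan.

```python
from urllib.parse import urlencode


def first_output_image(history: dict, prompt_id: str) -> dict:
    """Return the first image record for a finished job."""
    outputs = history[prompt_id]["outputs"]
    for node_output in outputs.values():
        images = node_output.get("images", [])
        if images:
            return images[0]
    raise LookupError(f"No output image recorded for prompt_id={prompt_id}")


def view_url(base_url: str, image: dict) -> str:
    """Build the download URL for one image record."""
    query = urlencode({
        "filename": image["filename"],
        "subfolder": image.get("subfolder", ""),
        "type": image.get("type", "output"),
    })
    return f"{base_url.rstrip('/')}/api/view?{query}"


history = {
    "test-uuid-1234": {
        "outputs": {
            "9": {
                "images": [
                    {"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}
                ]
            }
        }
    }
}
url = view_url("http://localhost:8188", first_output_image(history, "test-uuid-1234"))
```

Building the query with `urlencode` avoids hand-escaping filenames or subfolders that contain spaces or other reserved characters.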
+### 3.3 Error Handling
+
+| Condition | Response |
+|-----------|----------|
+| ComfyUI unreachable | `"ComfyUI not reachable at {url}. Start it with: python main.py --listen"` |
+| Timeout (>120s) | `"Generation timed out after 120s. prompt_id={id} — use get_generation_status to check"` |
+| ComfyUI returns error in history | Extract and return the error message from history response |
+| Invalid model name | ComfyUI returns error in history; surface it clearly |
+| Output dir not writable | `"Cannot write to output directory: {path}"` |
+
+---
+
+## 4. Configuration
+
+All configuration via environment variables. No hardcoded paths.
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `COMFYUI_URL` | `http://localhost:8188` | Base URL of running ComfyUI instance |
+| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Where to save generated PNG files |
+| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation (int) |
+
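One way to read the three variables in a single place is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical names, and the host `gpu-box` is only an example value; the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    output_dir: Path
    timeout_s: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, falling back to the documented defaults."""
    return Settings(
        # Strip a trailing slash so endpoint paths can be appended safely.
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/"),
        # Expand "~" so callers always get an absolute path.
        output_dir=Path(env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")).expanduser(),
        timeout_s=int(env.get("COMFYUI_TIMEOUT", "120")),
    )


custom = load_settings({
    "COMFYUI_URL": "http://gpu-box:8188/",
    "IMAGE_OUTPUT_DIR": "/tmp/imgs",
    "COMFYUI_TIMEOUT": "30",
})
```

Passing the environment as a mapping keeps the function testable without patching `os.environ`.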
+### `.roo/mcp.json` entry (to be added during implementation):
+
+```json
+"mcp-image-gen": {
+  "command": "uv",
+  "args": [
+    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
+    "run", "src/server.py"
+  ],
+  "env": {
+    "COMFYUI_URL": "http://localhost:8188",
+    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
+  }
+}
+```
+
+---
+
+## 5. `pyproject.toml`
+
+```toml
+[project]
+name = "mcp-image-gen"
+version = "0.1.0"
+requires-python = ">=3.11"
+description = "MCP server for local AI image generation via ComfyUI"
+dependencies = [
+    "fastmcp>=0.1.0",
+    "httpx>=0.27.0",
+    "pillow>=10.0.0",
+]
+
+[project.optional-dependencies]
+test = [
+    "pytest>=7.0",
+    "pytest-mock>=3.0",
+    "pytest-cov>=4.0",
+    "pytest-asyncio>=0.23",
+]
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+```
+
+**Dependency rationale:**
+- `fastmcp` — MCP framework
+- `httpx` — async HTTP client for ComfyUI REST API
+- `pillow` — validate PNG output, potential future thumbnail generation
+- `pytest-asyncio` — needed for async tool tests
+
+---
+
+## 6. FLUX.1-schnell Workflow JSON
+
+The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image.
+This is the "API format" (node-graph JSON), not the UI export format.
+
+File: `src/workflows/flux_schnell.json`
+
+```json
+{
+  "6": {
+    "class_type": "CLIPTextEncode",
+    "inputs": {
+      "clip": ["30", 1],
+      "text": "PROMPT_PLACEHOLDER"
+    }
+  },
+  "8": {
+    "class_type": "VAEDecode",
+    "inputs": {
+      "samples": ["13", 0],
+      "vae": ["30", 2]
+    }
+  },
+  "9": {
+    "class_type": "SaveImage",
+    "inputs": {
+      "filename_prefix": "mcp-image-gen",
+      "images": ["8", 0]
+    }
+  },
+  "13": {
+    "class_type": "KSampler",
+    "inputs": {
+      "cfg": 1.0,
+      "denoise": 1.0,
+      "latent_image": ["27", 0],
+      "model": ["30", 0],
+      "negative": ["33", 0],
+      "positive": ["6", 0],
+      "sampler_name": "euler",
+      "scheduler": "simple",
+      "seed": 42,
+      "steps": 4
+    }
+  },
+  "27": {
+    "class_type": "EmptySD3LatentImage",
+    "inputs": {
+      "batch_size": 1,
+      "height": 1024,
+      "width": 1024
+    }
+  },
+  "30": {
+    "class_type": "CheckpointLoaderSimple",
+    "inputs": {
+      "ckpt_name": "flux1-schnell.safetensors"
+    }
+  },
+  "33": {
+    "class_type": "CLIPTextEncode",
+    "inputs": {
+      "clip": ["30", 1],
+      "text": "NEGATIVE_PLACEHOLDER"
+    }
+  }
+}
+```
+
+**Parameterization at runtime** (in `server.py`):
+
+```python
+import json
+import random
+from pathlib import Path
+
+def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
+    # The template is re-read on every call, so the dict is already a fresh copy.
+    with open(Path(__file__).parent / "workflows/flux_schnell.json") as f:
+        wf = json.load(f)
+    wf["6"]["inputs"]["text"] = prompt
+    wf["33"]["inputs"]["text"] = negative_prompt
+    wf["27"]["inputs"]["width"] = width
+    wf["27"]["inputs"]["height"] = height
+    wf["13"]["inputs"]["steps"] = steps
+    wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
+    wf["30"]["inputs"]["ckpt_name"] = model
+    return wf
+```
+
+---
+
+## 7. Testing Strategy
+
+### 7.1 Test Structure (`tests/test_server.py`)
+
+All tests mock `httpx.AsyncClient` — no real ComfyUI needed.
+
+| Test | Description |
+|------|-------------|
+| `test_generate_image_happy_path` | Mock submit → poll done → history → view → returns TextContent + ImageContent |
+| `test_generate_image_comfyui_offline` | httpx.ConnectError → returns clear error string |
+| `test_generate_image_timeout` | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
+| `test_generate_image_saves_file` | Verify PNG written to output_dir with correct filename pattern |
+| `test_generate_image_random_seed` | seed=-1 → seed in output filename is a valid integer |
+| `test_generate_image_custom_params` | Non-default width/height/steps/model passed through to workflow |
+| `test_generate_image_returns_image_content` | Second item in result list is `mcp.types.ImageContent` with valid base64 |
+| `test_list_available_models_happy_path` | Mock object_info response → returns model name list |
+| `test_list_available_models_offline` | ConnectError → returns error string |
+| `test_get_generation_status_pending` | prompt_id found in queue_pending → "pending" |
+| `test_get_generation_status_running` | prompt_id found in queue_running → "running" |
+| `test_get_generation_status_not_found` | prompt_id not in queue, not in history → "not_found" |
+| `test_get_output_directory_default` | No env var → returns expanded ~/Pictures/mcp-generated |
+| `test_get_output_directory_custom` | IMAGE_OUTPUT_DIR set → returns that path |
+| `test_build_workflow_parameterization` | _build_workflow() injects all params correctly into JSON |
+
+### 7.2 conftest.py fixtures
+
+```python
+import sys
+from pathlib import Path
+import pytest
+
+sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
+
+@pytest.fixture
+def mock_comfyui_submit_response():
+    return {"prompt_id": "test-uuid-1234"}
+
+@pytest.fixture
+def mock_comfyui_queue_empty():
+    return {"queue_running": [], "queue_pending": []}
+
+@pytest.fixture
+def mock_comfyui_history():
+    return {
+        "test-uuid-1234": {
+            "outputs": {
+                "9": {
+                    "images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]
+                }
+            }
+        }
+    }
+
+@pytest.fixture
+def sample_png_bytes():
+    """Minimal valid 1x1 PNG in bytes."""
+    import base64
+    # 1x1 red pixel PNG
+    data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
+    return base64.b64decode(data)
+```
+
+### 7.3 Run command
+
+```bash
+cd mcp/mcp-image-gen && uv run pytest tests/ -v --cov=src --cov-report=term-missing
+```
+
+---
+
+## 8. `run.sh`
+
+```bash
+#!/usr/bin/env bash
+BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+export PATH="$HOME/.local/bin:$PATH"
+
+# Create output dir if it doesn't exist
+OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
+mkdir -p "$OUTPUT_DIR"
+
+cd "$BASEDIR"
+exec uv run src/server.py
+```
+
+---
+
+## 9. Future: Ollama Migration Path
+
+When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026):
+
+### Adapter pattern (no breaking changes to MCP tool signatures)
+
+```python
+import os
+
+BACKEND = os.getenv("IMAGE_BACKEND", "comfyui")  # or "ollama"
+
+async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
+    # current ComfyUI implementation
+    ...
+
+async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
+    # POST http://localhost:11434/api/generate
+    # with model=Z-Image-Turbo or FLUX.2-Klein
+    # width, height, steps in request body
+    # save returned image path
+    ...
+
+@mcp.tool()
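+async def generate_image(
+    prompt: str, width: int = 1024, height: int = 1024, steps: int = 4,
+    model: str = "flux1-schnell.safetensors", seed: int = -1,
+    negative_prompt: str = "", output_dir: str = "",
+) -> list:
+    # Both backend helpers take the same arguments in the same order.
+    args = (prompt, width, height, steps, model, seed, negative_prompt, output_dir)
+    if BACKEND == "ollama":
+        return await _generate_ollama(*args)
+    return await _generate_comfyui(*args)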
+```
+
+**No changes to:** tool signatures, return types, existing env vars (only `IMAGE_BACKEND` is added), or test structure.
+
+---
+
+## 10. Implementation Order (for Code mode)
+
+1. `src/workflows/flux_schnell.json` — write and validate JSON structure
+2. `pyproject.toml` — set up project + deps
+3. `src/__init__.py` — empty
+4. `src/server.py` — implement all 4 tools + `_build_workflow` + polling helpers
+5. `tests/conftest.py` — fixtures + sys.path
+6. `tests/test_server.py` — all 15 tests
+7. `run.sh` — launch script
+8. `README.md` — usage docs
+9. `.roo/mcp.json` — wire server in (requires switching to Code or Homelab mode for that file)
+10. `uv sync && uv run pytest tests/ -v` — confirm all tests pass
+
+---
+
+## 11. ComfyUI Setup Notes (for README)
+
+These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:
+
+```bash
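+# Install ComfyUI from source (ROCm/AMD); `python main.py` below runs from this checkout.
+# Install a ROCm build of PyTorch first (see pytorch.org for the current command).
+git clone https://github.com/comfyanonymous/ComfyUI.git
+cd ComfyUI
+pip install -r requirements.txt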
+
+# Download FLUX.1-schnell model (~8GB)
+# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors
+# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell
+
+# Start ComfyUI with AMD ROCm
+HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen
+
+# Verify API is running
+curl http://localhost:8188/system_stats
+```
+
+> The `HSA_OVERRIDE_GFX_VERSION=11.0.0` env var may be needed so that the ROCm libraries
+> correctly identify the RX 7900 XTX (gfx1100).