docs(mcp-image-gen): add USAGE.md and expand tests to 19

2026-04-04 12:16:03 +02:00
parent b0ce5c55ed
commit 8cbeb6571b
2 changed files with 838 additions and 5 deletions
@@ -0,0 +1,588 @@
+# mcp-image-gen — Usage Guide
+
+> **Comprehensive reference for using the ComfyUI-backed image generation MCP server**
+
+---
+
+## Table of Contents
+
+1. [Prerequisites — ComfyUI Setup](#1-prerequisites--comfyui-setup)
+2. [Quick Start — Running the MCP Server](#2-quick-start--running-the-mcp-server)
+3. [How to Ask Lumen to Generate Images](#3-how-to-ask-lumen-to-generate-images)
+4. [Available Tools](#4-available-tools)
+5. [Parameters Reference](#5-parameters-reference)
+6. [Output Format](#6-output-format)
+7. [Environment Variables](#7-environment-variables)
+8. [Test Status](#8-test-status)
+9. [Prompt Tips for FLUX.1-schnell](#9-prompt-tips-for-flux1-schnell)
+10. [Known Limitations](#10-known-limitations)
+
+---
+
+## 1. Prerequisites — ComfyUI Setup
+
+### ComfyUI must be running before any image generation tool call succeeds.
+
+The MCP server connects to ComfyUI's REST API at `http://localhost:8188`. If ComfyUI is not running, `generate_image` and `list_available_models` will return a graceful error message — no crash.
+
+### Install ComfyUI
+
+```bash
+# Option A — pip install (simplest)
+pip install comfyui
+
+# Option B — git clone (more control)
+git clone https://github.com/comfyanonymous/ComfyUI.git
+cd ComfyUI
+pip install -r requirements.txt
+```
+
+### Install PyTorch with ROCm (AMD RX 7900 XTX)
+
+Patrick's RX 7900 XTX (gfx1100, 24GB VRAM) uses the ROCm backend. Standard CUDA builds **will not work** on AMD hardware.
+
+```bash
+# PyTorch with ROCm 6.1 support
+pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1
+```
+
+> **ROCm version note:** ROCm 7.2.1 is the current production release as of April 2026.
+> Check `rocm-smi` to confirm your ROCm version before installing torch.
+
+### Download FLUX.1-schnell (Primary Model)
+
+FLUX.1-schnell is the recommended model — fast (4 steps), Apache 2.0 licensed, excellent quality.
+
+```bash
+# Download (~24GB for the full-precision weights) and place in ComfyUI/models/checkpoints/
+wget https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors \
+     -O ~/ComfyUI/models/checkpoints/flux1-schnell.safetensors
+
+# Or use huggingface_hub:
+huggingface-cli download black-forest-labs/FLUX.1-schnell \
+    flux1-schnell.safetensors \
+    --local-dir ~/ComfyUI/models/checkpoints/
+```
+
+You'll also need the CLIP and VAE models; see the [ComfyUI FLUX guide](https://github.com/comfyanonymous/ComfyUI/blob/master/README.md) for the full model list.
+
+### Start ComfyUI (AMD ROCm)
+
+```bash
+# Standard start — listens on all interfaces at port 8188
+HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen
+
+# Or with explicit port
+HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen --port 8188
+```
+
+> **`HSA_OVERRIDE_GFX_VERSION=11.0.0`** — Required for RX 7900 XTX (gfx1100).
+> Without this, ROCm may fail to detect the GPU correctly. This tells the HIP runtime
+> to treat the GPU as gfx1100 architecture.
+
+### Verify ComfyUI is Running
+
+```bash
+curl -s http://localhost:8188/system_stats | python3 -m json.tool | head -20
+```
+
+The expected response includes a `system` object with `python_version`, `pytorch_version`, `embedded_python`, and `comfyui_version`.
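
The same check can be scripted. The following is a minimal sketch, assuming the `/system_stats` body has the `system` fields listed above; the helper names `summarize_system_stats` and `check_comfyui` are hypothetical and not part of the server.

```python
import json
from urllib.request import urlopen


def summarize_system_stats(payload: dict) -> str:
    """Build a one-line summary from a /system_stats response body."""
    system = payload.get("system", {})
    return (
        f"ComfyUI {system.get('comfyui_version', '?')} | "
        f"Python {system.get('python_version', '?')} | "
        f"PyTorch {system.get('pytorch_version', '?')}"
    )


def check_comfyui(base_url: str = "http://localhost:8188", timeout: float = 5.0) -> str:
    """Return a summary line, or a short error string if ComfyUI is unreachable."""
    try:
        with urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
            return summarize_system_stats(json.load(resp))
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"ComfyUI not reachable at {base_url}: {exc}"
```

Calling `print(check_comfyui())` before starting a session gives a quick yes/no answer without involving the MCP server.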
+
+---
+
+## 2. Quick Start — Running the MCP Server
+
+### Via `run.sh` (recommended)
+
+```bash
+cd /home/pplate/pi_mcps/mcp/mcp-image-gen
+./run.sh
+```
+
+[`run.sh`](run.sh) automatically:
+- Sets `PATH` to include `~/.local/bin` for `uv`
+- Creates `IMAGE_OUTPUT_DIR` (`~/Pictures/mcp-generated`) if it doesn't exist
+- Launches the FastMCP server via `uv run src/server.py` (stdio transport)
+
+### Via uv directly
+
+```bash
+cd /home/pplate/pi_mcps/mcp/mcp-image-gen
+uv run src/server.py
+```
+
+### Wired into `.roo/mcp.json`
+
+The server is already configured in [`.roo/mcp.json`](../../.roo/mcp.json):
+
+```json
+"mcp-image-gen": {
+  "command": "uv",
+  "args": [
+    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
+    "run", "src/server.py"
+  ],
+  "env": {
+    "COMFYUI_URL": "http://localhost:8188",
+    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
+  }
+}
+```
+
+Roo Code / Claude Desktop launch the server automatically from this configuration, so no manual start is needed. The MCP server itself starts in ~1 second, but ComfyUI must already be running separately.
+
+### Install dependencies (first time)
+
+```bash
+cd /home/pplate/pi_mcps/mcp/mcp-image-gen
+uv sync
+```
+
+---
+
+## 3. How to Ask Lumen to Generate Images
+
+Just speak naturally. Lumen will call the appropriate MCP tool automatically.
+
+### Basic generation
+
+> *"Generate an image of a futuristic city at sunset"*
+
+```
+→ generate_image(prompt="futuristic city at sunset", width=1024, height=1024, steps=4)
+```
+
+### Specific style and size
+
+> *"Create a portrait of a red fox in watercolor style, 1024x1024"*
+
+```
+→ generate_image(
+    prompt="portrait of a red fox, watercolor style, detailed fur, soft brushstrokes",
+    width=1024, height=1024
+  )
+```
+
+### Reproducible with a fixed seed
+
+> *"Make an image with seed 42 so I can reproduce it"*
+
+```
+→ generate_image(prompt="...", seed=42)
+```
+
+The seed is reported in the text output so you can use the same seed again.
+
+### Landscape format
+
+> *"Generate a wide cinematic landscape of a Norwegian fjord, 1920x1080"*
+
+```
+→ generate_image(prompt="Norwegian fjord, cinematic, golden hour", width=1920, height=1080)
+```
+
+### Excluding unwanted elements
+
+> *"Generate a clean product photo of a coffee mug, no background clutter, no text"*
+
+```
+→ generate_image(
+    prompt="product photo of a ceramic coffee mug, studio lighting, white background",
+    negative_prompt="clutter, text, watermark, blurry, shadows"
+  )
+```
+
+### More inference steps for higher quality
+
+> *"Generate a highly detailed oil painting of a medieval castle, use 20 steps"*
+
+```
+→ generate_image(
+    prompt="oil painting of a medieval castle, highly detailed, dramatic lighting",
+    steps=20,
+    model="flux1-dev.safetensors"   # FLUX.1-dev supports higher step counts better
+  )
+```
+
+### Check what models are available
+
+> *"List what models are available in ComfyUI"*
+
+```
+→ list_available_models()
+```
+
+### Check status of a long-running job
+
+> *"What's the status of prompt ID abc-123?"*
+
+```
+→ get_generation_status(prompt_id="abc-123")
+```
+
+### Find out where images are saved
+
+> *"Where are my generated images being saved?"*
+
+```
+→ get_output_directory()
+```
+
+---
+
+## 4. Available Tools
+
+### `generate_image`
+
+Generate an image from a text prompt using ComfyUI's FLUX.1-schnell workflow.
+
+**Full signature:**
+```python
+async def generate_image(
+    prompt: str,
+    width: int = 1024,
+    height: int = 1024,
+    steps: int = 4,
+    model: str = "flux1-schnell.safetensors",
+    seed: int = -1,
+    negative_prompt: str = "",
+    output_dir: str = "",
+) -> list[TextContent | ImageContent]
+```
+
+**What it does:**
+1. Loads the bundled `flux_schnell.json` ComfyUI API workflow template
+2. Injects your prompt, dimensions, seed, model into the correct workflow nodes
+3. Submits the workflow to ComfyUI via `POST /api/prompt`
+4. Polls `/api/queue` every 2 seconds until the job leaves the queue
+5. Fetches history via `/api/history/{prompt_id}` to find the output filename
+6. Downloads the PNG from `/api/view`
+7. Saves the PNG to disk as `YYYYMMDD_HHMMSS_{seed}.png`
+8. Returns `[TextContent(path + metadata), ImageContent(base64 PNG)]`
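
Steps 1 to 3 above can be sketched as follows. This is an illustration only: the miniature `TEMPLATE` and every node id in it except `33` (the negative prompt node named in the test table) are assumptions, and `build_workflow` / `to_request_body` are hypothetical names rather than the server's actual functions.

```python
import copy
import json

# Hypothetical miniature template. The real flux_schnell.json has more nodes,
# and every node id here except "33" (negative prompt) is an assumption.
TEMPLATE = {
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ""}},
    "5": {"class_type": "EmptyLatentImage", "inputs": {"width": 0, "height": 0, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "33": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "31": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 0}},
}


def build_workflow(template, prompt, width=1024, height=1024, steps=4,
                   model="flux1-schnell.safetensors", seed=0, negative_prompt=""):
    """Return a copy of the template with the caller's parameters injected."""
    wf = copy.deepcopy(template)  # never mutate the shared template
    wf["4"]["inputs"]["ckpt_name"] = model
    wf["5"]["inputs"].update(width=width, height=height)
    wf["6"]["inputs"]["text"] = prompt
    wf["33"]["inputs"]["text"] = negative_prompt
    wf["31"]["inputs"].update(seed=seed, steps=steps)
    return wf


def to_request_body(workflow) -> bytes:
    """Wrap the workflow in the {"prompt": ...} envelope used when queueing a job."""
    return json.dumps({"prompt": workflow}).encode("utf-8")
```

Copying the template before injecting values keeps repeated calls independent, which matters when the same template object is reused for every request.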
+
+---
+
+### `list_available_models`
+
+List all checkpoint models currently available in ComfyUI.
+
+```python
+async def list_available_models() -> list[str]
+```
+
+Calls `/object_info/CheckpointLoaderSimple` and extracts the checkpoint name list. Use this to discover what models are installed before passing a `model` name to `generate_image`.
+
+**Example return:**
+```json
+["flux1-schnell.safetensors", "flux1-dev.safetensors", "sd_xl_base_1.0.safetensors"]
+```
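
Extracting the name list might look like the sketch below. It assumes the `/object_info/CheckpointLoaderSimple` body nests the choices at `input.required.ckpt_name[0]`, which is the commonly seen layout but may differ between ComfyUI versions; `extract_checkpoint_names` is a hypothetical helper.

```python
def extract_checkpoint_names(object_info: dict) -> list[str]:
    """Pull checkpoint filenames out of an /object_info/CheckpointLoaderSimple body."""
    try:
        spec = object_info["CheckpointLoaderSimple"]["input"]["required"]["ckpt_name"]
    except (KeyError, TypeError):
        return []  # unexpected layout: report no models rather than raise
    # Assumed layout: the first element of the spec is the list of choices.
    choices = spec[0] if isinstance(spec, list) and spec else []
    if not isinstance(choices, list):
        return []
    return [name for name in choices if isinstance(name, str)]
```

Returning an empty list on an unexpected layout keeps the tool from raising, in the same spirit as the graceful offline behaviour described earlier.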
+
+---
+
+### `get_generation_status`
+
+Check the status of a queued or running generation job.
+
+```python
+async def get_generation_status(prompt_id: str) -> dict
+```
+
+**Return values:**
+
+| `status` | Meaning |
+|---|---|
+| `"pending"` | Job is in the queue, not yet started |
+| `"running"` | Job is currently being processed |
+| `"completed"` | Job finished — image is in ComfyUI's history |
+| `"not_found"` | Unknown prompt_id — may have expired from history |
+| `"error"` | ComfyUI was unreachable |
+
+Useful when `generate_image` times out (default 120s) — the job may still be running in ComfyUI.
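
The status table can be read as a small decision function. The sketch below assumes the queue response carries `queue_running` and `queue_pending` lists whose entries hold the prompt id at index 1, and that history is a mapping keyed by prompt id; `classify_status` is a hypothetical name. The `"error"` case (ComfyUI unreachable) would be handled around the HTTP calls and is not shown.

```python
def classify_status(prompt_id: str, queue: dict, history: dict) -> str:
    """Map queue and history payloads to one of the status strings in the table."""

    def ids(entries) -> set:
        # Assumed entry layout: [number, prompt_id, ...]
        return {entry[1] for entry in entries if len(entry) > 1}

    if prompt_id in ids(queue.get("queue_running", [])):
        return "running"
    if prompt_id in ids(queue.get("queue_pending", [])):
        return "pending"
    if prompt_id in history:
        return "completed"
    return "not_found"
```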
+
+---
+
+### `get_output_directory`
+
+Return the absolute path where generated images will be saved.
+
+```python
+def get_output_directory() -> str
+```
+
+Returns the expanded, absolute path derived from `IMAGE_OUTPUT_DIR` env var (or `~/Pictures/mcp-generated` default). The directory may not exist yet — `generate_image` creates it on first use.
+
+---
+
+## 5. Parameters Reference
+
+Full parameter table for `generate_image`:
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `prompt` | `str` | *(required)* | Text description of the image. Goes into the positive CLIP text encoder node. |
+| `width` | `int` | `1024` | Image width in pixels. FLUX.1-schnell: 512–2048 recommended. |
+| `height` | `int` | `1024` | Image height in pixels. FLUX.1-schnell: 512–2048 recommended. |
+| `steps` | `int` | `4` | Number of KSampler inference steps. FLUX.1-schnell is designed for 1–8 steps. |
+| `model` | `str` | `"flux1-schnell.safetensors"` | Checkpoint model filename as listed by `list_available_models`. |
+| `seed` | `int` | `-1` | RNG seed for reproducibility. `-1` = new random seed each call (0 to 2³²−1). |
+| `negative_prompt` | `str` | `""` | Text description of things to exclude. Goes into negative CLIP encoder node. |
+| `output_dir` | `str` | `""` | Override save directory. Empty = uses `IMAGE_OUTPUT_DIR` env var or default. |
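
A caller may want to check values against the recommended ranges in the table before submitting a job. This is a hypothetical client-side helper; nothing in the source says the server itself validates or rejects out-of-range values.

```python
def check_params(width: int, height: int, steps: int) -> list[str]:
    """Return human-readable warnings for values outside the recommended ranges."""
    warnings = []
    for name, value in (("width", width), ("height", height)):
        if not 512 <= value <= 2048:
            warnings.append(f"{name}={value} is outside the recommended 512-2048 range")
    if not 1 <= steps <= 8:
        warnings.append(f"steps={steps} is outside the 1-8 range FLUX.1-schnell is designed for")
    return warnings
```

An empty list means all three values fall inside the documented ranges; warnings are advisory, since higher step counts are still useful with other models such as FLUX.1-dev.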
+
+### Recommended dimensions
+
+| Use case | Width | Height |
+|---|---|---|
+| Square (default) | 1024 | 1024 |
+| Portrait | 768 | 1024 |
+| Landscape | 1024 | 768 |
+| Widescreen | 1280 | 720 |
+| HD widescreen | 1920 | 1080 |
+| Tall portrait | 512 | 768 |
+
+> **VRAM note:** Patrick's RX 7900 XTX has 24GB VRAM. FLUX.1-schnell requires ~8GB,
+> so you can comfortably run 1920×1080 and even larger. FLUX.1-dev requires ~12GB.
+
+---
+
+## 6. Output Format
+
+`generate_image` returns a list with **two items** when successful:
+
+### Item 1 — `TextContent` (file path + metadata)
+
+```
+Generated: /home/pplate/Pictures/mcp-generated/20260404_121500_3847291045.png
+Seed: 3847291045
+Elapsed: 8.3s
+Size: 1024x1024, Steps: 4, Model: flux1-schnell.safetensors
+```
+
+The filename format is `YYYYMMDD_HHMMSS_{seed}.png` — the seed is embedded so you can reproduce the exact image by passing it back as the `seed` parameter.
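
The seed handling and filename format can be sketched as below. `resolve_seed` and `output_filename` are hypothetical names, and the use of `random.randint` is an assumption about how a random seed might be drawn, not a description of the server's code.

```python
import random
from datetime import datetime

MAX_SEED = 2**32 - 1  # upper bound stated in the parameters table


def resolve_seed(seed: int = -1) -> int:
    """Return the given seed, or a fresh random one when seed is -1."""
    return random.randint(0, MAX_SEED) if seed == -1 else seed


def output_filename(seed: int, now: datetime) -> str:
    """Format the timestamp and seed as YYYYMMDD_HHMMSS_{seed}.png."""
    return f"{now.strftime('%Y%m%d_%H%M%S')}_{seed}.png"
```

For example, `output_filename(3847291045, datetime(2026, 4, 4, 12, 15, 0))` yields `20260404_121500_3847291045.png`, matching the path shown in the sample output.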
+
+### Item 2 — `ImageContent` (inline base64 PNG)
+
+The image displays **directly in Roo Code / Claude Desktop chat** as an inline image — no need to open a file browser. The same PNG is also saved to disk at the path shown in the TextContent.
+
+```json
+{
+  "type": "image",
+  "mimeType": "image/png",
+  "data": "<base64-encoded PNG bytes>"
+}
+```
+
+### Error responses
+
+When ComfyUI is unreachable or an error occurs, only **one** `TextContent` is returned (no ImageContent):
+
+```
+ComfyUI not reachable at http://localhost:8188. Start it with: python main.py --listen
+```
+
+```
+Generation timed out after 120s. prompt_id=abc-123 — use get_generation_status to check
+```
+
+---
+
+## 7. Environment Variables
+
+Configure via environment variables in [`.roo/mcp.json`](../../.roo/mcp.json) or shell:
+
+| Variable | Default | Description |
+|---|---|---|
+| `COMFYUI_URL` | `http://localhost:8188` | Base URL of the running ComfyUI REST API. Change this if ComfyUI runs on a different host or port. |
+| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Directory where generated PNG files are saved. Supports `~` expansion. Created automatically on first generation. |
+| `COMFYUI_TIMEOUT` | `120` | Maximum seconds to wait for a generation job before returning a timeout error. Increase for very large images or slow hardware. |
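
Reading these three variables with their defaults could look like the following sketch. `Settings` and `load_settings` are hypothetical names used for illustration; the server's own configuration code may be organised differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    output_dir: str
    timeout_s: int


def load_settings(env=None) -> Settings:
    """Read the three variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    raw_dir = env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")
    return Settings(
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188"),
        # Expand "~" and make the path absolute, as described in the table.
        output_dir=os.path.abspath(os.path.expanduser(raw_dir)),
        timeout_s=int(env.get("COMFYUI_TIMEOUT", "120")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to exercise in tests without touching the real environment.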
+
+### Setting via shell
+
+```bash
+export COMFYUI_URL="http://localhost:8188"
+export IMAGE_OUTPUT_DIR="/home/pplate/Pictures/ai-art"
+export COMFYUI_TIMEOUT="300"
+./run.sh
+```
+
+### Setting via mcp.json env block
+
+```json
+"mcp-image-gen": {
+  "command": "uv",
+  "args": ["--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", "run", "src/server.py"],
+  "env": {
+    "COMFYUI_URL": "http://localhost:8188",
+    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated",
+    "COMFYUI_TIMEOUT": "120"
+  }
+}
+```
+
+---
+
+## 8. Test Status
+
+**19 pytest tests — all passing.** Tests mock all ComfyUI HTTP calls using [respx](https://lundberg.github.io/respx/). No running ComfyUI instance is needed to run the tests.
+
+```bash
+cd /home/pplate/pi_mcps/mcp/mcp-image-gen
+uv run pytest tests/ -v
+```
+
+### Test coverage breakdown
+
+| Test file | Tests | Coverage area |
+|---|---|---|
+| [`tests/test_server.py`](tests/test_server.py) | 19 | All 4 tools + workflow builder |
+
+| Test name | What it verifies |
+|---|---|
+| `test_build_flux_workflow_structure` | Workflow has correct node class_types |
+| `test_build_flux_workflow_params_injected` | All params injected into correct nodes |
+| `test_negative_prompt_included` | Negative prompt goes to node 33 |
+| `test_random_seed_generated` | `seed=-1` produces a valid integer in `_meta` |
+| `test_list_available_models` | Returns model list from mocked `/object_info` |
+| `test_list_available_models_comfyui_offline` | ConnectError → graceful error string |
+| `test_get_generation_status_pending` | `prompt_id` in queue_pending → `"pending"` |
+| `test_get_generation_status_running` | `prompt_id` in queue_running → `"running"` |
+| `test_get_generation_status_complete` | Not in queue + in history → `"completed"` |
+| `test_get_output_directory_default` | No env var → `~/Pictures/mcp-generated` expanded |
+| `test_get_output_directory_custom` | Custom env var → that path returned |
+| `test_generate_image_success` | Full lifecycle: queue→poll→history→view→save |
+| `test_generate_image_comfyui_unavailable` | ConnectError → single TextContent error |
+| `test_generate_image_timeout` | COMFYUI_TIMEOUT=0 → timeout TextContent |
+| `test_generate_image_empty_prompt` | Empty string prompt → still succeeds |
+| `test_generate_image_long_prompt` | 500-char prompt → not truncated, succeeds |
+| `test_generate_image_invalid_model` | 404 from /prompt → error TextContent, no file saved |
+| `test_generate_image_custom_output_dir` | Custom `output_dir` param → saved there, dir created |
+| `test_generate_image_random_seed_variance` | `seed=-1` × 2 → different seeds, different filenames |
+
+### Test mock stack
+
+- **[respx](https://lundberg.github.io/respx/)** — HTTP-level mocking for all ComfyUI API endpoints
+- **[Pillow](https://pillow.readthedocs.io/)** (in conftest) — generates real PNG bytes for image response fixtures
+- **monkeypatch** — env vars (`IMAGE_OUTPUT_DIR`, `COMFYUI_URL`, `COMFYUI_TIMEOUT`) and server module attributes
+
+Real image generation requires ComfyUI to be running. The tests verify the tool logic at the protocol level only, against mocked HTTP responses.
+
+---
+
+## 9. Prompt Tips for FLUX.1-schnell
+
+FLUX.1-schnell is a timestep-distilled model designed for speed at 1–8 steps. It responds differently from SDXL or SD1.5.
+
+### Prompt structure that works well
+
+```
+[subject], [style/medium], [lighting], [camera/composition], [mood/atmosphere], [quality modifiers]
+```
+
+**Example:**
+```
+ancient library at night, oil painting, warm candlelight, wide angle, mysterious atmosphere, highly detailed, sharp focus
+```
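
When prompts are assembled programmatically, the structure above maps onto a small helper. `build_prompt` is a hypothetical function written for this illustration; it simply joins the non-empty components in the order given by the template.

```python
def build_prompt(subject, style="", lighting="", composition="", mood="", quality=()):
    """Join the non-empty prompt components with ", " in template order."""
    parts = [subject, style, lighting, composition, mood, *quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

Called with the components of the example above, it reproduces that prompt string exactly; omitted components are skipped rather than leaving stray commas.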
+
+### Style keywords
+
+| Style | Prompt keywords |
+|---|---|
+| Photography | `cinematic photograph, DSLR, 85mm lens, shallow depth of field, bokeh` |
+| Oil painting | `oil painting, thick brushstrokes, textured canvas, impressionist` |
+| Watercolor | `watercolor painting, soft washes, paper texture, flowing colors` |
+| Digital art | `digital art, concept art, artstation, octane render` |
+| Anime/illustration | `anime style, cel shading, vibrant colors, clean linework` |
+| Sketch | `pencil sketch, hand drawn, crosshatching, charcoal` |
+
+### Lighting keywords
+
+- `golden hour`, `blue hour`, `dramatic lighting`, `rim lighting`
+- `studio lighting`, `soft diffused light`, `volumetric light`
+- `neon glow`, `bioluminescent`, `moonlit`, `candlelight`
+
+### What works well with FLUX.1-schnell
+
+- **Clear subject + style** — "red panda in a cozy library, watercolor style"
+- **Landscape scenes** — fjords, forests, cities, abstract environments
+- **Portrait shots** — animals and characters with descriptive appearance
+- **Concept art** — futuristic cities, sci-fi environments, fantasy scenes
+- **Low step counts** — 4 steps is designed to be near-optimal for this model
+
+### What to avoid
+
+- **Booru-style tag dumps** (FLUX handles natural language better than SD1.5)
+- **Contradictory instructions** — "dark AND bright", "realistic AND cartoon"
+- **Overly complex scenes** at very small resolutions
+
+### Using the negative prompt
+
+FLUX.1-schnell is typically sampled at a CFG scale of 1.0, where the negative prompt has little or no effect, so expect far less impact than in SDXL.
+If you use one, keep it to broad exclusions:
+
+```
+negative_prompt="blurry, out of focus, watermark, text, signature, low quality, artifacts"
+```
+
+### Reproducibility
+
+Always save the seed from the TextContent output if you want to reproduce a result:
+
+```
+Seed: 3847291045
+```
+
+Then pass it back: `seed=3847291045`
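
A client that wants to capture the seed automatically can parse it from the text item. This sketch assumes the `Seed: <digits>` line format shown in the sample output; `extract_seed` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches a line consisting of "Seed:" followed by digits.
_SEED_RE = re.compile(r"^Seed:\s*(\d+)\s*$", re.MULTILINE)


def extract_seed(text: str) -> Optional[int]:
    """Return the seed reported in the TextContent text, or None if absent."""
    match = _SEED_RE.search(text)
    return int(match.group(1)) if match else None
```

A result of `None` signals that the text was an error message rather than a successful generation report.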
+
+---
+
+## 10. Known Limitations
+
+### ComfyUI must run locally
+
+The MCP server connects to `COMFYUI_URL` (default: `http://localhost:8188`). ComfyUI is self-hosted, so this server has no cloud API to fall back on: an instance you control must be running and reachable at that URL before you request image generation. The server returns a clear error message if ComfyUI is not reachable.
+
+### Model must be installed; first load is slow
+
+ComfyUI loads checkpoint models into VRAM on first use. The first generation with a model takes longer as VRAM is allocated (FLUX.1-schnell: ~8GB). Subsequent generations with the same model are faster.
+
+```bash
+# Verify model is installed before generation
+# → ask Lumen: "list available models in ComfyUI"
+```
+
+### AMD ROCm setup complexity
+
+AMD GPU support requires:
+1. ROCm drivers installed (`rocm-smi` working)
+2. PyTorch built with ROCm support (not the default CUDA build)
+3. `HSA_OVERRIDE_GFX_VERSION=11.0.0` for RX 7900 XTX (gfx1100)
+
+Without these, ComfyUI will fall back to CPU — very slow (minutes per image vs. ~8 seconds on RX 7900 XTX).
+
+Check GPU is being used:
+```bash
+# In another terminal while generating:
+watch -n 1 rocm-smi
+# VRAM usage should spike to ~8GB during generation
+```
+
+### Timeout on large images
+
+The default `COMFYUI_TIMEOUT=120` (2 minutes) may not be enough for:
+- Very large resolutions (2048×2048+)
+- High step counts (20+)
+- First generation loading a new model
+
+Increase via env var:
+```bash
+export COMFYUI_TIMEOUT=300  # 5 minutes
+```
+
+If `generate_image` returns a timeout error, the job may still be running in ComfyUI. Use `get_generation_status(prompt_id)` to check.
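
The poll-then-time-out behaviour can be sketched as a loop with a deadline. This is a synchronous illustration under stated assumptions: the real tool is `async`, and `wait_until_done` together with its injectable `clock` and `sleep` parameters are hypothetical, included so the logic can be exercised without waiting in real time.

```python
import time
from typing import Callable


def wait_until_done(
    is_queued: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once is_queued() reports False, or False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while is_queued():
        if clock() >= deadline:
            return False  # caller reports the timeout and the prompt_id
        sleep(interval_s)
    return True
```

A `False` result only means the wait gave up; the job itself is not cancelled, which is why checking `get_generation_status` afterwards is worthwhile.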
+
+### Ollama image gen is macOS-only (April 2026)
+
+Ollama launched experimental image generation in January 2026, but it is **macOS-only** as of April 2026. Linux support is announced as "coming soon." When Linux support arrives, the server can switch backends via `IMAGE_BACKEND=ollama` without changing any tool signatures.
+
+### ComfyUI history is ephemeral
+
+ComfyUI keeps generation history in memory — it is lost on restart. The `get_generation_status` tool will return `"not_found"` for old prompt IDs after a ComfyUI restart. The saved PNG file on disk persists regardless.