Patrick Plate ea0c5d39c4 fix(mcp-image-gen): fix Heretic/FLUX2 integration bugs
- Fix syntax error in server.py (dangling docstring lines)
- Correct model filename: flux-2-klein-4b.safetensors (without -fp8)
- Fix _WORKFLOW_REGISTRY key to match actual downloaded filename
- Update get_models() to always include registry models as fallback
- Fix test expectations to match corrected model names
- All 37 tests passing
2026-04-10 19:21:51 +02:00

# mcp-image-gen — Usage Guide
> **Comprehensive reference for using the ComfyUI-backed image generation MCP server**
---
## Table of Contents
1. [Prerequisites — ComfyUI Setup](#1-prerequisites--comfyui-setup)
2. [Quick Start — Running the MCP Server](#2-quick-start--running-the-mcp-server)
3. [How to Ask Lumen to Generate Images](#3-how-to-ask-lumen-to-generate-images)
4. [Available Tools](#4-available-tools)
5. [Parameters Reference](#5-parameters-reference)
6. [Output Format](#6-output-format)
7. [Environment Variables](#7-environment-variables)
8. [Test Status](#8-test-status)
9. [Prompt Tips for FLUX.1-schnell](#9-prompt-tips-for-flux1-schnell)
10. [FLUX.2 Klein 4B with Heretic Abliteration](#10-flux2-klein-4b-with-heretic-abliteration-new)
11. [Known Limitations](#11-known-limitations)
---
## 1. Prerequisites — ComfyUI Setup
### ComfyUI must be running before any image generation tool call succeeds.
The MCP server connects to ComfyUI's REST API at `http://localhost:8188`. If ComfyUI is not running, `generate_image` and `list_available_models` will return a graceful error message — no crash.
### Install ComfyUI
> ⚠️ **ComfyUI is NOT on PyPI** — `pip install comfyui` will fail with "No matching distribution found".
> It must be installed from source via `git clone`.
```bash
# Clone from source (the only correct installation method)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
```
### Install PyTorch with ROCm (AMD RX 7900 XTX)
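Patrick's RX 7900 XTX (gfx1100, 24GB VRAM) uses the ROCm backend. Standard CUDA builds **will not work** on AMD hardware. Install the ROCm build of PyTorch *before* running `pip install -r requirements.txt`: if the default CUDA build is already installed, pip treats `torch` as satisfied and skips the ROCm wheel unless you add `--force-reinstall`.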
```bash
# PyTorch with ROCm 6.1 support
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1
```
> **ROCm version note:** ROCm 7.2.1 is the current production release as of April 2026.
> Check `rocm-smi` to confirm your ROCm version before installing torch.
### Download FLUX.1-schnell (Primary Model)
FLUX.1-schnell is the recommended model — fast (4 steps), Apache 2.0 licensed, excellent quality.
> ⚠️ **FLUX.1-schnell is a gated model on HuggingFace.**
> A bare `wget` on the URL returns HTTP 401. You must:
> 1. Accept the license at https://huggingface.co/black-forest-labs/FLUX.1-schnell (click **"Agree and access repository"** — one-time)
> 2. Create a HuggingFace access token with **Read** permissions at https://huggingface.co/settings/tokens
#### Option A — `huggingface-cli` (recommended)
```bash
# Install the HuggingFace Hub CLI
pip install huggingface_hub
# Log in — paste your Read token when prompted
huggingface-cli login
# Download (~24GB: 12B parameters stored in bf16) directly into ComfyUI checkpoints
huggingface-cli download black-forest-labs/FLUX.1-schnell \
  flux1-schnell.safetensors \
  --local-dir ~/ComfyUI/models/checkpoints/
```
#### Option B — `wget` with Authorization header
```bash
wget --header="Authorization: Bearer hf_YOUR_TOKEN_HERE" \
https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors \
-O ~/ComfyUI/models/checkpoints/flux1-schnell.safetensors
```
> Replace `hf_YOUR_TOKEN_HERE` with your actual HuggingFace token from https://huggingface.co/settings/tokens
#### Alternative: fp8 quantized variant (smaller download, lower VRAM)
If you want a smaller download and lower VRAM use with near-identical quality, an fp8 quantized version is also available:
```bash
huggingface-cli download black-forest-labs/FLUX.1-schnell-fp8 \
  flux1-schnell-fp8.safetensors \
  --local-dir ~/ComfyUI/models/checkpoints/
```
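> **Download note:** The bf16 file is ~24GB and an fp8 copy of the same weights is roughly half that (one byte per parameter instead of two); expect 10–30 minutes or more depending on connection speed.
For the black-forest-labs file you'll also need the text encoders (CLIP-L and T5-XXL) and the VAE, which ship as separate files from the diffusion weights; see the [ComfyUI FLUX examples](https://comfyanonymous.github.io/ComfyUI_examples/flux/) for the full file list and the folder each file goes in.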
### Start ComfyUI (AMD ROCm)
```bash
# Standard start — listens on all interfaces at port 8188
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen
# Or with explicit port
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen --port 8188
```
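> **`HSA_OVERRIDE_GFX_VERSION=11.0.0`** tells the HIP runtime to treat the GPU as a gfx1100 (RDNA3) device.
> The RX 7900 XTX already reports gfx1100, which ROCm supports officially, so on this card the override is a harmless safeguard rather than a strict requirement.
> It matters on RDNA3 cards that ROCm does not officially support, such as the RX 7600 (gfx1102).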
### Verify ComfyUI is Running
```bash
curl -s http://localhost:8188/system_stats | python3 -m json.tool | head -20
```
Expected response includes `system` object with `python_version`, `pytorch_version`, `embedded_python`, and `comfyui_version`.
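The same check can be scripted, for example in a startup hook. This is a minimal sketch using only the Python standard library; `comfyui_system_stats` is a hypothetical helper name, and the only payload detail it relies on is the `system` object described above.

```python
import json
import urllib.request
from typing import Any, Optional


def comfyui_system_stats(base_url: str = "http://localhost:8188",
                         timeout: float = 5.0) -> Optional[dict[str, Any]]:
    """Return the parsed /system_stats payload, or None if ComfyUI is unreachable."""
    url = base_url.rstrip("/") + "/system_stats"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        # URLError (connection refused, DNS failure, HTTP errors) and socket
        # timeouts are all OSError subclasses: treat them as "not running".
        return None


if __name__ == "__main__":
    stats = comfyui_system_stats()
    if stats is None:
        print("ComfyUI not reachable")
    else:
        system = stats.get("system", {})
        print("ComfyUI", system.get("comfyui_version"), "/ PyTorch", system.get("pytorch_version"))
```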
---
## 2. Quick Start — Running the MCP Server
### Via `run.sh` (recommended)
```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
./run.sh
```
[`run.sh`](run.sh) automatically:
- Sets `PATH` to include `~/.local/bin` for `uv`
- Creates `IMAGE_OUTPUT_DIR` (`~/Pictures/mcp-generated`) if it doesn't exist
- Launches the FastMCP server via `uv run src/server.py` (stdio transport)
### Via uv directly
```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv run src/server.py
```
### Wired into `.roo/mcp.json`
The server is already configured in [`.roo/mcp.json`](../../.roo/mcp.json):
```json
"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run", "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}
```
Roo Code launches the server as a stdio subprocess when it loads this configuration (Claude Desktop does the same from an equivalent entry in its own config file). The MCP server itself starts in about 1 second; ComfyUI must already be running separately.
### Install dependencies (first time)
```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv sync
```
---
## 3. How to Ask Lumen to Generate Images
Just speak naturally. Lumen will call the appropriate MCP tool automatically.
### Basic generation
> *"Generate an image of a futuristic city at sunset"*
```
→ generate_image(prompt="futuristic city at sunset", width=1024, height=1024, steps=4)
```
### Specific style and size
> *"Create a portrait of a red fox in watercolor style, 1024x1024"*
```
→ generate_image(
    prompt="portrait of a red fox, watercolor style, detailed fur, soft brushstrokes",
    width=1024, height=1024
)
```
### Reproducible with a fixed seed
> *"Make an image with seed 42 so I can reproduce it"*
```
→ generate_image(prompt="...", seed=42)
```
The seed is reported in the text output so you can use the same seed again.
### Landscape format
> *"Generate a wide cinematic landscape of a Norwegian fjord, 1920x1080"*
```
→ generate_image(prompt="Norwegian fjord, cinematic, golden hour", width=1920, height=1080)
```
### Excluding unwanted elements
> *"Generate a clean product photo of a coffee mug, no background clutter, no text"*
```
→ generate_image(
    prompt="product photo of a ceramic coffee mug, studio lighting, white background",
    negative_prompt="clutter, text, watermark, blurry, shadows"
)
```
### More inference steps for higher quality
> *"Generate a highly detailed oil painting of a medieval castle, use 20 steps"*
```
→ generate_image(
    prompt="oil painting of a medieval castle, highly detailed, dramatic lighting",
    steps=20,
    model="flux1-dev.safetensors"  # FLUX.1-dev (if installed) is built for higher step counts than schnell
)
```
### Check what models are available
> *"List what models are available in ComfyUI"*
```
→ list_available_models()
```
### Check status of a long-running job
> *"What's the status of prompt ID abc-123?"*
```
→ get_generation_status(prompt_id="abc-123")
```
### Find out where images are saved
> *"Where are my generated images being saved?"*
```
→ get_output_directory()
```
---
## 4. Available Tools
### `generate_image`
Generate an image from a text prompt by running a bundled ComfyUI workflow (FLUX.1-schnell by default).
**Full signature:**
```python
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list[TextContent | ImageContent]
```
**What it does:**
1. Loads the bundled ComfyUI API workflow template for the requested model (`flux_schnell.json` for the default model)
2. Injects your prompt, dimensions, seed, model into the correct workflow nodes
3. Submits the workflow to ComfyUI via `POST /api/prompt`
4. Polls `/api/queue` every 2 seconds until the job leaves the queue
5. Fetches history via `/api/history/{prompt_id}` to find the output filename
6. Downloads the PNG from `/api/view`
7. Saves the PNG to disk as `YYYYMMDD_HHMMSS_{seed}.png`
8. Returns `[TextContent(path + metadata), ImageContent(base64 PNG)]`
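Steps 3 through 6 can be sketched with the Python standard library as follows. This is an illustration, not the server's source (the respx-based tests indicate the real server uses httpx), and saving plus the MCP return value (steps 7 and 8) are left out. The HTTP transport is injected so the flow can be exercised without ComfyUI; `run_workflow`, `http_transport`, and `Transport` are hypothetical names. It assumes ComfyUI's usual payload shapes: `/api/prompt` returns a `prompt_id`, each queue entry carries the prompt ID as its second element, and the history entry lists images under `outputs[node_id]["images"]`. With a live server, `run_workflow(workflow, http_transport("http://localhost:8188"))` would return the PNG bytes.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# A transport takes (method, path, body) and returns the raw response body.
Transport = Callable[[str, str, Optional[bytes]], bytes]


def http_transport(base_url: str, timeout: float = 30.0) -> Transport:
    """Transport that talks to a real ComfyUI server over HTTP."""
    def call(method: str, path: str, body: Optional[bytes] = None) -> bytes:
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=body,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.read()
    return call


def run_workflow(workflow: dict, call: Transport,
                 timeout: float = 120.0, poll_interval: float = 2.0) -> bytes:
    """Submit an API-format workflow and return the bytes of its first output image."""
    # Step 3: submit the workflow.
    payload = json.dumps({"prompt": workflow}).encode()
    prompt_id = json.loads(call("POST", "/api/prompt", payload))["prompt_id"]

    # Step 4: poll until the job is neither running nor pending.
    deadline = time.monotonic() + timeout
    while True:
        queue = json.loads(call("GET", "/api/queue", None))
        entries = queue.get("queue_running", []) + queue.get("queue_pending", [])
        if all(entry[1] != prompt_id for entry in entries):
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Generation timed out after {timeout}s. prompt_id={prompt_id}")
        time.sleep(poll_interval)

    # Step 5: find the output filename in the job's history entry.
    history = json.loads(call("GET", f"/api/history/{prompt_id}", None))
    for node_output in history[prompt_id]["outputs"].values():
        for image in node_output.get("images", []):
            # Step 6: download the image through /api/view.
            query = urllib.parse.urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            return call("GET", f"/api/view?{query}", None)
    raise RuntimeError(f"No image found in history for prompt_id={prompt_id}")
```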
---
### `list_available_models`
List all checkpoint models currently available in ComfyUI.
```python
async def list_available_models() -> list[str]
```
Calls `/object_info/CheckpointLoaderSimple` and extracts the checkpoint name list. Use this to discover what models are installed before passing a `model` name to `generate_image`.
**Example return:**
```json
["flux1-schnell.safetensors", "flux1-dev.safetensors", "sd_xl_base_1.0.safetensors"]
```
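Extracting the names is a small JSON walk. A minimal sketch, assuming the response nests the list under `CheckpointLoaderSimple.input.required.ckpt_name`; the classic combo format puts the options list first, and the sketch also accepts a `["COMBO", {"options": [...]}]` shape, which is an assumption about newer ComfyUI releases. `extract_checkpoint_names` is a hypothetical helper.

```python
from typing import Any


def extract_checkpoint_names(object_info: dict[str, Any]) -> list[str]:
    """Pull checkpoint filenames out of an /object_info/CheckpointLoaderSimple response."""
    spec = object_info["CheckpointLoaderSimple"]["input"]["required"]["ckpt_name"]
    options = spec[0]
    if isinstance(options, list):
        # Classic combo format: [[name, name, ...], {optional metadata}]
        return list(options)
    if options == "COMBO" and len(spec) > 1:
        # Assumed newer combo format: ["COMBO", {"options": [...]}]
        return list(spec[1].get("options", []))
    return []
```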
---
### `get_generation_status`
Check the status of a queued or running generation job.
```python
async def get_generation_status(prompt_id: str) -> dict
```
**Return values:**
| `status` | Meaning |
|---|---|
| `"pending"` | Job is in the queue, not yet started |
| `"running"` | Job is currently being processed |
| `"completed"` | Job finished — image is in ComfyUI's history |
| `"not_found"` | Unknown prompt_id — may have expired from history |
| `"error"` | ComfyUI was unreachable |
Useful when `generate_image` times out (default 120s) — the job may still be running in ComfyUI.
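The status values in the table above line up with ComfyUI's queue and history payloads. A sketch of that mapping, not the server's code: `classify_status` is hypothetical, it takes already-fetched `/api/queue` and `/api/history/{prompt_id}` JSON, it assumes each queue entry carries the prompt ID as its second element, and the `"error"` case (ComfyUI unreachable) is left to the caller.

```python
from typing import Any


def classify_status(prompt_id: str, queue: dict[str, Any], history: dict[str, Any]) -> str:
    """Map /api/queue and /api/history payloads onto the documented status values."""
    if any(entry[1] == prompt_id for entry in queue.get("queue_running", [])):
        return "running"
    if any(entry[1] == prompt_id for entry in queue.get("queue_pending", [])):
        return "pending"
    if prompt_id in history:
        return "completed"
    # Unknown ID, or history was cleared (for example by a ComfyUI restart).
    return "not_found"
```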
---
### `get_output_directory`
Return the absolute path where generated images will be saved.
```python
def get_output_directory() -> str
```
Returns the expanded, absolute path derived from `IMAGE_OUTPUT_DIR` env var (or `~/Pictures/mcp-generated` default). The directory may not exist yet — `generate_image` creates it on first use.
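The lookup order for the save directory (explicit `output_dir` argument, then `IMAGE_OUTPUT_DIR`, then the default) can be sketched as below. `resolve_output_dir` is a hypothetical helper, not the server's function; the environment is passed in as a mapping so it can be tested without touching `os.environ`. Creating the directory would be a separate `mkdir(parents=True, exist_ok=True)` call at save time.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_OUTPUT_DIR = "~/Pictures/mcp-generated"


def resolve_output_dir(output_dir: str = "", env: Mapping[str, str] = os.environ) -> Path:
    """Explicit output_dir wins, then IMAGE_OUTPUT_DIR, then the documented default."""
    raw = output_dir or env.get("IMAGE_OUTPUT_DIR") or DEFAULT_OUTPUT_DIR
    # expanduser handles "~"; abspath makes relative paths absolute and normalizes them.
    return Path(os.path.abspath(os.path.expanduser(raw)))
```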
---
## 5. Parameters Reference
Full parameter table for `generate_image`:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | *(required)* | Text description of the image. Goes into the positive CLIP text encoder node. |
| `width` | `int` | `1024` | Image width in pixels. FLUX.1-schnell: 512–2048 recommended. |
| `height` | `int` | `1024` | Image height in pixels. FLUX.1-schnell: 512–2048 recommended. |
| `steps` | `int` | `4` | Number of KSampler inference steps. FLUX.1-schnell is designed for 1–4 steps. |
| `model` | `str` | `"flux1-schnell.safetensors"` | Checkpoint model filename as listed by `list_available_models`. |
| `seed` | `int` | `-1` | RNG seed for reproducibility. `-1` = new random seed each call (0 to 2³²−1). |
| `negative_prompt` | `str` | `""` | Text description of things to exclude. Goes into negative CLIP encoder node. |
| `output_dir` | `str` | `""` | Override save directory. Empty = uses `IMAGE_OUTPUT_DIR` env var or default. |
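Seed handling for `seed=-1` can be sketched like this. `resolve_seed` is a hypothetical helper; it uses the 0 to 2³²−1 range from the table and accepts an optional `random.Random` instance so tests can make the draw deterministic.

```python
import random
from typing import Optional

MAX_SEED = 2**32 - 1


def resolve_seed(seed: int = -1, rng: Optional[random.Random] = None) -> int:
    """Return seed unchanged, or draw a fresh one from [0, 2**32 - 1] when seed == -1."""
    if seed == -1:
        # random.Random() with no argument seeds itself from OS entropy when available.
        return (rng or random.Random()).randint(0, MAX_SEED)
    return seed
```

Reporting the resolved value (rather than `-1`) is what makes the "Seed:" line in the output reusable.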
### Recommended dimensions
| Use case | Width | Height |
|---|---|---|
| Square (default) | 1024 | 1024 |
| Portrait | 768 | 1024 |
| Landscape | 1024 | 768 |
| Widescreen | 1280 | 720 |
| HD widescreen | 1920 | 1080 |
| Tall portrait | 512 | 768 |
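> **VRAM note:** Patrick's RX 7900 XTX has 24GB VRAM. FLUX.1-schnell and FLUX.1-dev are both 12B-parameter models, so their weights need about the same VRAM: roughly 12GB in fp8 or 24GB in bf16.
> With fp8 weights, 1920×1080 runs comfortably; with bf16 weights, ComfyUI may have to offload part of the model to system RAM, which slows generation.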
---
## 6. Output Format
`generate_image` returns a list with **two items** when successful:
### Item 1 — `TextContent` (file path + metadata)
```
Generated: /home/pplate/Pictures/mcp-generated/20260404_121500_3847291045.png
Seed: 3847291045
Elapsed: 8.3s
Size: 1024x1024, Steps: 4, Model: flux1-schnell.safetensors
```
The filename format is `YYYYMMDD_HHMMSS_{seed}.png` — the seed is embedded so you can reproduce the exact image by passing it back as the `seed` parameter.
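Building and parsing these names takes one small function each way. A minimal sketch; `output_filename` and `seed_from_filename` are hypothetical helpers, and stamping with local time is an assumption about how the server names files.

```python
import re
from datetime import datetime
from typing import Optional

_NAME_RE = re.compile(r"^(\d{8})_(\d{6})_(\d+)\.png$")


def output_filename(seed: int, when: Optional[datetime] = None) -> str:
    """Build a YYYYMMDD_HHMMSS_{seed}.png name (local time by default)."""
    stamp = (when or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{stamp}_{seed}.png"


def seed_from_filename(name: str) -> Optional[int]:
    """Recover the seed from a generated filename, or None if the name doesn't match."""
    match = _NAME_RE.match(name)
    return int(match.group(3)) if match else None
```

For a full path, pass `Path(p).name` to `seed_from_filename`.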
### Item 2 — `ImageContent` (inline base64 PNG)
The image displays **directly in Roo Code / Claude Desktop chat** as an inline image — no need to open a file browser. The same PNG is also saved to disk at the path shown in the TextContent.
```json
{
"type": "image",
"mimeType": "image/png",
"data": "<base64-encoded PNG bytes>"
}
```
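Producing that item from raw PNG bytes only needs base64 encoding. The server returns the MCP SDK's `ImageContent` type; this sketch builds the equivalent plain JSON object instead. `image_content` is a hypothetical helper, and the PNG signature check is an added safeguard, not documented server behavior.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_content(png_bytes: bytes) -> dict[str, str]:
    """Build an MCP image content item (type, mimeType, data) from raw PNG bytes."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("expected PNG data")
    return {
        "type": "image",
        "mimeType": "image/png",
        "data": base64.b64encode(png_bytes).decode("ascii"),
    }
```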
### Error responses
When ComfyUI is unreachable or an error occurs, only **one** `TextContent` is returned (no ImageContent):
```
ComfyUI not reachable at http://localhost:8188. Start it with: python main.py --listen
```
```
Generation timed out after 120s. prompt_id=abc-123 — use get_generation_status to check
```
---
## 7. Environment Variables
Configure via environment variables in [`.roo/mcp.json`](../../.roo/mcp.json) or shell:
| Variable | Default | Description |
|---|---|---|
| `COMFYUI_URL` | `http://localhost:8188` | Base URL of the running ComfyUI REST API. Change this if ComfyUI runs on a different host or port. |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Directory where generated PNG files are saved. Supports `~` expansion. Created automatically on first generation. |
| `COMFYUI_TIMEOUT` | `120` | Maximum seconds to wait for a generation job before returning a timeout error. Increase for very large images or slow hardware. |
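MCP clients pass `env` values as strings (note `"COMFYUI_TIMEOUT": "120"` in the mcp.json example), so numeric settings need parsing. A sketch of reading the two connection settings with their documented defaults; `ComfySettings` and `load_settings` are hypothetical names, and stripping a trailing slash from the URL and rejecting negative timeouts are assumptions, not documented behavior. `IMAGE_OUTPUT_DIR` is left out because it can be overridden per call.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ComfySettings:
    comfyui_url: str
    timeout_s: float


def load_settings(env: Mapping[str, str] = os.environ) -> ComfySettings:
    """Read COMFYUI_URL and COMFYUI_TIMEOUT, falling back to the documented defaults."""
    url = env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/")
    raw_timeout = env.get("COMFYUI_TIMEOUT", "120")
    try:
        timeout = float(raw_timeout)
    except ValueError:
        raise ValueError(f"COMFYUI_TIMEOUT must be a number of seconds, got {raw_timeout!r}") from None
    if timeout < 0:
        raise ValueError("COMFYUI_TIMEOUT must be >= 0")
    return ComfySettings(comfyui_url=url, timeout_s=timeout)
```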
### Setting via shell
```bash
export COMFYUI_URL="http://localhost:8188"
export IMAGE_OUTPUT_DIR="/home/pplate/Pictures/ai-art"
export COMFYUI_TIMEOUT="300"
./run.sh
```
### Setting via mcp.json env block
```json
"mcp-image-gen": {
  "command": "uv",
  "args": ["--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", "run", "src/server.py"],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated",
    "COMFYUI_TIMEOUT": "120"
  }
}
```
---
## 8. Test Status
**All tests passing** (37 as of commit `ea0c5d39c4`; the breakdown below lists the 19 tests documented for `tests/test_server.py`). Tests mock all ComfyUI HTTP calls using [respx](https://lundberg.github.io/respx/), so no running ComfyUI instance is needed to run them.
```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv run pytest tests/ -v
```
### Test coverage breakdown
| Test file | Tests | Coverage area |
|---|---|---|
| [`tests/test_server.py`](tests/test_server.py) | 19 | All 4 tools + workflow builder |

| Test name | What it verifies |
|---|---|
| `test_build_flux_workflow_structure` | Workflow has correct node class_types |
| `test_build_flux_workflow_params_injected` | All params injected into correct nodes |
| `test_negative_prompt_included` | Negative prompt goes to node 33 |
| `test_random_seed_generated` | `seed=-1` produces a valid integer in `_meta` |
| `test_list_available_models` | Returns model list from mocked `/object_info` |
| `test_list_available_models_comfyui_offline` | ConnectError → graceful error string |
| `test_get_generation_status_pending` | `prompt_id` in queue_pending → `"pending"` |
| `test_get_generation_status_running` | `prompt_id` in queue_running → `"running"` |
| `test_get_generation_status_complete` | Not in queue + in history → `"completed"` |
| `test_get_output_directory_default` | No env var → `~/Pictures/mcp-generated` expanded |
| `test_get_output_directory_custom` | Custom env var → that path returned |
| `test_generate_image_success` | Full lifecycle: queue→poll→history→view→save |
| `test_generate_image_comfyui_unavailable` | ConnectError → single TextContent error |
| `test_generate_image_timeout` | COMFYUI_TIMEOUT=0 → timeout TextContent |
| `test_generate_image_empty_prompt` | Empty string prompt → still succeeds |
| `test_generate_image_long_prompt` | 500-char prompt → not truncated, succeeds |
| `test_generate_image_invalid_model` | 404 from /prompt → error TextContent, no file saved |
| `test_generate_image_custom_output_dir` | Custom `output_dir` param → saved there, dir created |
| `test_generate_image_random_seed_variance` | `seed=-1` × 2 → different seeds, different filenames |
### Test mock stack
- **[respx](https://lundberg.github.io/respx/)** — HTTP-level mocking for all ComfyUI API endpoints
- **[Pillow](https://pillow.readthedocs.io/)** (in conftest) — generates real PNG bytes for image response fixtures
- **monkeypatch** — env vars (`IMAGE_OUTPUT_DIR`, `COMFYUI_URL`, `COMFYUI_TIMEOUT`) and server module attributes
Real image generation requires a running ComfyUI instance; the tests verify that the tool logic is correct at the protocol level.
---
## 9. Prompt Tips for FLUX.1-schnell
FLUX.1-schnell is a timestep-distilled model (trained with latent adversarial diffusion distillation) that produces images in 1–4 steps. It responds differently from SDXL or SD1.5.
### Prompt structure that works well
```
[subject], [style/medium], [lighting], [camera/composition], [mood/atmosphere], [quality modifiers]
```
**Example:**
```
ancient library at night, oil painting, warm candlelight, wide angle, mysterious atmosphere, highly detailed, sharp focus
```
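The template can also be filled programmatically, for example when scripting batches of prompts. A minimal sketch; `build_prompt` is a hypothetical helper that joins the non-empty parts in the template's order.

```python
def build_prompt(subject: str, style: str = "", lighting: str = "", composition: str = "",
                 mood: str = "", quality: str = "") -> str:
    """Join prompt parts in template order, skipping empty ones."""
    parts = [subject, style, lighting, composition, mood, quality]
    return ", ".join(part.strip() for part in parts if part.strip())
```

Keeping the subject first mirrors the template; the remaining slots are optional.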
### Style keywords
| Style | Prompt keywords |
|---|---|
| Photography | `cinematic photograph, DSLR, 85mm lens, shallow depth of field, bokeh` |
| Oil painting | `oil painting, thick brushstrokes, textured canvas, impressionist` |
| Watercolor | `watercolor painting, soft washes, paper texture, flowing colors` |
| Digital art | `digital art, concept art, artstation, octane render` |
| Anime/illustration | `anime style, cel shading, vibrant colors, clean linework` |
| Sketch | `pencil sketch, hand drawn, crosshatching, charcoal` |
### Lighting keywords
- `golden hour`, `blue hour`, `dramatic lighting`, `rim lighting`
- `studio lighting`, `soft diffused light`, `volumetric light`
- `neon glow`, `bioluminescent`, `moonlit`, `candlelight`
### What works well with FLUX.1-schnell
- **Clear subject + style** — "red panda in a cozy library, watercolor style"
- **Landscape scenes** — fjords, forests, cities, abstract environments
- **Portrait shots** — animals and characters with descriptive appearance
- **Concept art** — futuristic cities, sci-fi environments, fantasy scenes
- **Low step counts** — 4 steps is designed to be near-optimal for this model
### What to avoid
- **Booru-style tag dumps** (FLUX handles natural language better than SD1.5)
- **Contradictory instructions** — "dark AND bright", "realistic AND cartoon"
- **Overly complex scenes** at very small resolutions
### Using the negative prompt
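FLUX.1-schnell is normally run without classifier-free guidance (CFG 1.0), and at CFG 1.0 ComfyUI skips the negative conditioning entirely, so negative prompts have little or no effect unless the workflow raises CFG. Describing what you want ("white background") in the main prompt is usually more reliable than listing what to avoid.
If you do use a negative prompt, keep it to broad exclusions: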
```
negative_prompt="blurry, out of focus, watermark, text, signature, low quality, artifacts"
```
### Reproducibility
Always save the seed from the TextContent output if you want to reproduce a result:
```
Seed: 3847291045
```
Then pass it back: `seed=3847291045`
---
## 10. FLUX.2 Klein 4B with Heretic Abliteration (New)
**New in this release:** Support for **FLUX.2 Klein 4B** using an **abliterated Qwen3-4B text encoder** via Heretic.
### Why Heretic?
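FLUX.2 Klein uses a full LLM (Qwen3-4B) as its text encoder instead of CLIP+T5, and that LLM keeps the safety alignment of its chat training, which can make it refuse certain prompts. Heretic removes this alignment while leaving the model otherwise intact: the abliterated encoder reports **zero measurable KL divergence** from the original (0.0000) and only 3 refusals out of 100 test prompts.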
### How to use it
```python
generate_image(
    prompt="a beautiful cyberpunk fox in neon tokyo, highly detailed",
    model="flux-2-klein-4b.safetensors",
    width=1024,
    height=1024,
    steps=4
)
```
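The `model=` value also decides which workflow template is used. The server maps model filenames to templates through a `_WORKFLOW_REGISTRY` whose exact structure is not documented here, so the following is a hypothetical sketch: `WORKFLOW_REGISTRY`, `workflow_for`, and the `flux2_klein.json` template name are illustrative, and unregistered checkpoints fall back to the default FLUX.1-schnell template. It also shows why a registry key must match the downloaded filename exactly: in this sketch, a key of `flux-2-klein-4b-fp8.safetensors` would never match the installed file, and the Klein model would be routed through the schnell template.

```python
# Hypothetical registry: model filename -> bundled workflow template.
WORKFLOW_REGISTRY = {
    "flux1-schnell.safetensors": "flux_schnell.json",
    "flux-2-klein-4b.safetensors": "flux2_klein.json",  # hypothetical template name
}
DEFAULT_MODEL = "flux1-schnell.safetensors"


def workflow_for(model: str) -> str:
    """Return the template for a model; unregistered checkpoints use the default one."""
    return WORKFLOW_REGISTRY.get(model, WORKFLOW_REGISTRY[DEFAULT_MODEL])
```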
### Models to download
```bash
# 1. FLUX.2 Klein 4B (distilled)
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  flux-2-klein-4b.safetensors \
  --local-dir ~/ComfyUI/models/diffusion_models/
# 2. FLUX.2 VAE
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  flux2-vae.safetensors \
  --local-dir ~/ComfyUI/models/vae/
# 3. Heretic-abliterated Qwen3-4B (from DreamFast)
huggingface-cli download DreamFast/qwen3-4b-heretic \
  --local-dir /tmp/qwen3-heretic/
cp /tmp/qwen3-heretic/model.safetensors \
  ~/ComfyUI/models/text_encoders/qwen_3_4b_heretic.safetensors
```
### Supported models (via `model=` parameter)
| Model | Description | VRAM | Speed | Censorship |
|-------|-------------|------|-------|------------|
| `flux1-schnell.safetensors` | Original (default) | ~12GB (fp8) / ~24GB (bf16) | Very fast | None |
| `flux-2-klein-4b.safetensors` | **New**, with Heretic Qwen3-4B | ~12GB | Fast | **Removed** |
---
## 11. Known Limitations
### ComfyUI must be running and reachable
The MCP server connects to `COMFYUI_URL` (default: `http://localhost:8188`). ComfyUI is self-hosted, so it must already be running, on this machine or on another host reachable at `COMFYUI_URL`, before you request image generation. The server returns a clear error message if ComfyUI is not reachable.
### Model must be installed (first load is slow)
The model you request must already be installed in ComfyUI. ComfyUI loads model weights into VRAM on first use, so the first generation with a model takes noticeably longer; later generations with the same model reuse the loaded weights and are faster.
```bash
# Verify the model is installed before generation (or ask Lumen: "list available models in ComfyUI")
curl -s http://localhost:8188/object_info/CheckpointLoaderSimple | python3 -m json.tool
```
### AMD ROCm setup complexity
AMD GPU support requires:
1. ROCm drivers installed (`rocm-smi` working)
2. PyTorch built with ROCm support (not the default CUDA build)
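3. `HSA_OVERRIDE_GFX_VERSION=11.0.0` where needed (harmless but optional on the natively supported RX 7900 XTX / gfx1100; needed on RDNA3 cards that ROCm does not officially support)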
Without these, ComfyUI will fall back to CPU — very slow (minutes per image vs. ~8 seconds on RX 7900 XTX).
Check GPU is being used:
```bash
# In another terminal while generating:
watch -n 1 rocm-smi
# VRAM usage should jump by several GB (roughly the size of the model weights) during generation
```
### Timeout on large images
The default `COMFYUI_TIMEOUT=120` (2 minutes) may not be enough for:
- Very large resolutions (2048×2048+)
- High step counts (20+)
- First generation loading a new model
Increase via env var:
```bash
export COMFYUI_TIMEOUT=300 # 5 minutes
```
If `generate_image` returns a timeout error, the job may still be running in ComfyUI. Use `get_generation_status(prompt_id)` to check.
### Ollama image gen is macOS-only (April 2026)
Ollama launched experimental image generation in January 2026, but it is **macOS-only** as of April 2026; Linux support is announced as "coming soon." Once Linux support arrives, the intent is to switch backends via an `IMAGE_BACKEND=ollama` setting (not among the current environment variables in section 7) without changing any tool signatures.
### ComfyUI history is ephemeral
ComfyUI keeps generation history in memory — it is lost on restart. The `get_generation_status` tool will return `"not_found"` for old prompt IDs after a ComfyUI restart. The saved PNG file on disk persists regardless.