# mcp-image-gen — Usage Guide

> **Comprehensive reference for using the ComfyUI-backed image generation MCP server**

---

## Table of Contents

1. [Prerequisites — ComfyUI Setup](#1-prerequisites--comfyui-setup)
2. [Quick Start — Running the MCP Server](#2-quick-start--running-the-mcp-server)
3. [How to Ask Lumen to Generate Images](#3-how-to-ask-lumen-to-generate-images)
4. [Available Tools](#4-available-tools)
5. [Parameters Reference](#5-parameters-reference)
6. [Output Format](#6-output-format)
7. [Environment Variables](#7-environment-variables)
8. [Test Status](#8-test-status)
9. [Prompt Tips for FLUX.1-schnell](#9-prompt-tips-for-flux1-schnell)
10. [FLUX.2 Klein 4B with Heretic Abliteration (New)](#10-flux2-klein-4b-with-heretic-abliteration-new)
11. [Known Limitations](#11-known-limitations)

---

## 1. Prerequisites — ComfyUI Setup

### ComfyUI must be running

Start ComfyUI before making any image generation tool call. The MCP server connects to ComfyUI's REST API at `http://localhost:8188`. If ComfyUI is not running, `generate_image` and `list_available_models` return a graceful error message instead of crashing.

### Install ComfyUI

> ⚠️ **ComfyUI is NOT on PyPI** — `pip install comfyui` will fail with "No matching distribution found".
> It must be installed from source via `git clone`.

```bash
# Clone from source (the only correct installation method)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
```

### Install PyTorch with ROCm (AMD RX 7900 XTX)

Patrick's RX 7900 XTX (gfx1100, 24GB VRAM) uses the ROCm backend. Standard CUDA builds **will not work** on AMD hardware.

```bash
# PyTorch with ROCm 6.1 support
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1
```

> **ROCm version note:** ROCm 7.2.1 is the current production release as of April 2026.
> Check `rocm-smi` to confirm your ROCm version before installing torch.

### Download FLUX.1-schnell (Primary Model)

FLUX.1-schnell is the recommended model — fast (4 steps), Apache 2.0 licensed, excellent quality.

> ⚠️ **FLUX.1-schnell is a gated model on HuggingFace.**
> A bare `wget` on the URL returns HTTP 401. You must:
> 1. Accept the license at https://huggingface.co/black-forest-labs/FLUX.1-schnell (click **"Agree and access repository"** — one-time)
> 2. Create a HuggingFace access token with **Read** permissions at https://huggingface.co/settings/tokens

#### Option A — `huggingface-cli` (recommended)

```bash
# Install the HuggingFace Hub CLI
pip install huggingface_hub

# Log in — paste your Read token when prompted
huggingface-cli login

# Download (~8GB) directly into ComfyUI checkpoints
huggingface-cli download black-forest-labs/FLUX.1-schnell \
  flux1-schnell.safetensors \
  --local-dir ~/ComfyUI/models/checkpoints/
```

#### Option B — `wget` with Authorization header

```bash
wget --header="Authorization: Bearer hf_YOUR_TOKEN_HERE" \
  https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors \
  -O ~/ComfyUI/models/checkpoints/flux1-schnell.safetensors
```

> Replace `hf_YOUR_TOKEN_HERE` with your actual HuggingFace token from https://huggingface.co/settings/tokens

#### Alternative: fp8 quantized variant (~8.1GB, faster inference)

If you want slightly faster inference with near-identical quality, the fp8 quantized version is also available:

```bash
huggingface-cli download black-forest-labs/FLUX.1-schnell-fp8 \
  flux1-schnell-fp8.safetensors \
  --local-dir ~/ComfyUI/models/checkpoints/
```

> **Download note:** Both variants are ~8GB — expect 10–30 minutes depending on connection speed.

You'll also need the CLIP and VAE models — see the [ComfyUI FLUX guide](https://github.com/comfyanonymous/ComfyUI/blob/master/README.md) for full model list.
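
The reason Option B works where a bare `wget` gets HTTP 401 is the `Authorization: Bearer` header. The sketch below builds the same request in Python without sending it, so the header and URL can be inspected first. The helper name `build_model_request` is hypothetical and not part of this project; the URL layout is copied from the `wget` example above.

```python
import urllib.request

HF_BASE = "https://huggingface.co"


def build_model_request(repo_id: str, filename: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated download request.

    Hypothetical helper: mirrors the wget command in Option B.
    """
    url = f"{HF_BASE}/{repo_id}/resolve/main/{filename}"
    # The bearer token is what turns the 401 into a successful download.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_model_request(
    "black-forest-labs/FLUX.1-schnell",
    "flux1-schnell.safetensors",
    "hf_YOUR_TOKEN_HERE",  # placeholder, replace with a real Read token
)
print(req.full_url)
print(req.get_header("Authorization"))
```

For the actual download, `huggingface-cli` (Option A) is likely the simpler route, since it handles authentication and large files for you.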

### Start ComfyUI (AMD ROCm)

```bash
# Standard start — listens on all interfaces at port 8188
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen

# Or with explicit port
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen --port 8188
```

> **`HSA_OVERRIDE_GFX_VERSION=11.0.0`** — Required for RX 7900 XTX (gfx1100).
> Without this, ROCm may fail to detect the GPU correctly. This tells the HIP runtime
> to treat the GPU as gfx1100 architecture.

### Verify ComfyUI is Running

```bash
curl -s http://localhost:8188/system_stats | python3 -m json.tool | head -20
```

Expected response includes `system` object with `python_version`, `pytorch_version`, `embedded_python`, and `comfyui_version`.

---

## 2. Quick Start — Running the MCP Server

### Via `run.sh` (recommended)

```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
./run.sh
```

[`run.sh`](run.sh) automatically:

- Sets `PATH` to include `~/.local/bin` for `uv`
- Creates `IMAGE_OUTPUT_DIR` (`~/Pictures/mcp-generated`) if it doesn't exist
- Launches the FastMCP server via `uv run src/server.py` (stdio transport)

### Via uv directly

```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv run src/server.py
```

### Wired into `.roo/mcp.json`

The server is already configured in [`.roo/mcp.json`](../../.roo/mcp.json):

```json
"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run", "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}
```

Roo Code / Claude Desktop will auto-start the server when any image generation tool is invoked. The MCP server itself starts in ~1 second — ComfyUI must already be running separately.

### Install dependencies (first time)

```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv sync
```

---

## 3. How to Ask Lumen to Generate Images

Just speak naturally. Lumen will call the appropriate MCP tool automatically.

### Basic generation

> *"Generate an image of a futuristic city at sunset"*

```
→ generate_image(prompt="futuristic city at sunset", width=1024, height=1024, steps=4)
```

### Specific style and size

> *"Create a portrait of a red fox in watercolor style, 1024x1024"*

```
→ generate_image(
    prompt="portrait of a red fox, watercolor style, detailed fur, soft brushstrokes",
    width=1024,
    height=1024
  )
```

### Reproducible with a fixed seed

> *"Make an image with seed 42 so I can reproduce it"*

```
→ generate_image(prompt="...", seed=42)
```

The seed is reported in the text output so you can use the same seed again.

### Landscape format

> *"Generate a wide cinematic landscape of a Norwegian fjord, 1920x1080"*

```
→ generate_image(prompt="Norwegian fjord, cinematic, golden hour", width=1920, height=1080)
```

### Excluding unwanted elements

> *"Generate a clean product photo of a coffee mug, no background clutter, no text"*

```
→ generate_image(
    prompt="product photo of a ceramic coffee mug, studio lighting, white background",
    negative_prompt="clutter, text, watermark, blurry, shadows"
  )
```

### More inference steps for higher quality

> *"Generate a highly detailed oil painting of a medieval castle, use 20 steps"*

```
→ generate_image(
    prompt="oil painting of a medieval castle, highly detailed, dramatic lighting",
    steps=20,
    model="flux1-dev.safetensors"  # FLUX.1-dev supports higher step counts better
  )
```

### Check what models are available

> *"List what models are available in ComfyUI"*

```
→ list_available_models()
```

### Check status of a long-running job

> *"What's the status of prompt ID abc-123?"*

```
→ get_generation_status(prompt_id="abc-123")
```

### Find out where images are saved

> *"Where are my generated images being saved?"*

```
→ get_output_directory()
```

---

## 4. Available Tools

### `generate_image`

Generate an image from a text prompt using ComfyUI's FLUX.1-schnell workflow.

**Full signature:**

```python
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list[TextContent | ImageContent]
```

**What it does:**

1. Loads the bundled `flux_schnell.json` ComfyUI API workflow template
2. Injects your prompt, dimensions, seed, model into the correct workflow nodes
3. Submits the workflow to ComfyUI via `POST /api/prompt`
4. Polls `/api/queue` every 2 seconds until the job leaves the queue
5. Fetches history via `/api/history/{prompt_id}` to find the output filename
6. Downloads the PNG from `/api/view`
7. Saves the PNG to disk as `YYYYMMDD_HHMMSS_{seed}.png`
8. Returns `[TextContent(path + metadata), ImageContent(base64 PNG)]`

---

### `list_available_models`

List all checkpoint models currently available in ComfyUI.

```python
async def list_available_models() -> list[str]
```

Calls `/object_info/CheckpointLoaderSimple` and extracts the checkpoint name list. Use this to discover what models are installed before passing a `model` name to `generate_image`.

**Example return:**

```json
["flux1-schnell.safetensors", "flux1-dev.safetensors", "sd_xl_base_1.0.safetensors"]
```

---

### `get_generation_status`

Check the status of a queued or running generation job.

```python
async def get_generation_status(prompt_id: str) -> dict
```

**Return values:**

| `status` | Meaning |
|---|---|
| `"pending"` | Job is in the queue, not yet started |
| `"running"` | Job is currently being processed |
| `"completed"` | Job finished — image is in ComfyUI's history |
| `"not_found"` | Unknown prompt_id — may have expired from history |
| `"error"` | ComfyUI was unreachable |

Useful when `generate_image` times out (default 120s) — the job may still be running in ComfyUI.

---

### `get_output_directory`

Return the absolute path where generated images will be saved.

```python
def get_output_directory() -> str
```

Returns the expanded, absolute path derived from `IMAGE_OUTPUT_DIR` env var (or `~/Pictures/mcp-generated` default). The directory may not exist yet — `generate_image` creates it on first use.

---

## 5. Parameters Reference

Full parameter table for `generate_image`:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | *(required)* | Text description of the image. Goes into the positive CLIP text encoder node. |
| `width` | `int` | `1024` | Image width in pixels. FLUX.1-schnell: 512–2048 recommended. |
| `height` | `int` | `1024` | Image height in pixels. FLUX.1-schnell: 512–2048 recommended. |
| `steps` | `int` | `4` | Number of KSampler inference steps. FLUX.1-schnell is designed for 1–8 steps. |
| `model` | `str` | `"flux1-schnell.safetensors"` | Checkpoint model filename as listed by `list_available_models`. |
| `seed` | `int` | `-1` | RNG seed for reproducibility. `-1` = new random seed each call (0 to 2³²−1). |
| `negative_prompt` | `str` | `""` | Text description of things to exclude. Goes into negative CLIP encoder node. |
| `output_dir` | `str` | `""` | Override save directory. Empty = uses `IMAGE_OUTPUT_DIR` env var or default. |

### Recommended dimensions

| Use case | Width | Height |
|---|---|---|
| Square (default) | 1024 | 1024 |
| Portrait | 768 | 1024 |
| Landscape | 1024 | 768 |
| Widescreen | 1280 | 720 |
| HD widescreen | 1920 | 1080 |
| Tall portrait | 512 | 768 |

> **VRAM note:** Patrick's RX 7900 XTX has 24GB VRAM. FLUX.1-schnell requires ~8GB,
> so you can comfortably run 1920×1080 and even larger. FLUX.1-dev requires ~12GB.

---

## 6. Output Format

`generate_image` returns a list with **two items** when successful:

### Item 1 — `TextContent` (file path + metadata)

```
Generated: /home/pplate/Pictures/mcp-generated/20260404_121500_3847291045.png
Seed: 3847291045
Elapsed: 8.3s
Size: 1024x1024, Steps: 4, Model: flux1-schnell.safetensors
```

The filename format is `YYYYMMDD_HHMMSS_{seed}.png` — the seed is embedded so you can reproduce the exact image by passing it back as the `seed` parameter.

### Item 2 — `ImageContent` (inline base64 PNG)

The image displays **directly in Roo Code / Claude Desktop chat** as an inline image — no need to open a file browser. The same PNG is also saved to disk at the path shown in the TextContent.

```json
{
  "type": "image",
  "mimeType": "image/png",
  "data": ""
}
```

### Error responses

When ComfyUI is unreachable or an error occurs, only **one** `TextContent` is returned (no ImageContent):

```
ComfyUI not reachable at http://localhost:8188. Start it with: python main.py --listen
```

```
Generation timed out after 120s. prompt_id=abc-123 — use get_generation_status to check
```

---

## 7. Environment Variables

Configure via environment variables in [`.roo/mcp.json`](../../.roo/mcp.json) or shell:

| Variable | Default | Description |
|---|---|---|
| `COMFYUI_URL` | `http://localhost:8188` | Base URL of the running ComfyUI REST API. Change this if ComfyUI runs on a different host or port. |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Directory where generated PNG files are saved. Supports `~` expansion. Created automatically on first generation. |
| `COMFYUI_TIMEOUT` | `120` | Maximum seconds to wait for a generation job before returning a timeout error. Increase for very large images or slow hardware. |

### Setting via shell

```bash
export COMFYUI_URL="http://localhost:8188"
export IMAGE_OUTPUT_DIR="/home/pplate/Pictures/ai-art"
export COMFYUI_TIMEOUT="300"
./run.sh
```

### Setting via mcp.json env block

```json
"mcp-image-gen": {
  "command": "uv",
  "args": ["--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", "run", "src/server.py"],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated",
    "COMFYUI_TIMEOUT": "120"
  }
}
```

---

## 8. Test Status

**19 pytest tests — all passing.**

Tests mock all ComfyUI HTTP calls using [respx](https://lundberg.github.io/respx/). No running ComfyUI instance is needed to run the tests.

```bash
cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv run pytest tests/ -v
```

### Test coverage breakdown

| Test file | Tests | Coverage area |
|---|---|---|
| [`tests/test_server.py`](tests/test_server.py) | 19 | All 4 tools + workflow builder |

| Test name | What it verifies |
|---|---|
| `test_build_flux_workflow_structure` | Workflow has correct node class_types |
| `test_build_flux_workflow_params_injected` | All params injected into correct nodes |
| `test_negative_prompt_included` | Negative prompt goes to node 33 |
| `test_random_seed_generated` | `seed=-1` produces a valid integer in `_meta` |
| `test_list_available_models` | Returns model list from mocked `/object_info` |
| `test_list_available_models_comfyui_offline` | ConnectError → graceful error string |
| `test_get_generation_status_pending` | `prompt_id` in queue_pending → `"pending"` |
| `test_get_generation_status_running` | `prompt_id` in queue_running → `"running"` |
| `test_get_generation_status_complete` | Not in queue + in history → `"completed"` |
| `test_get_output_directory_default` | No env var → `~/Pictures/mcp-generated` expanded |
| `test_get_output_directory_custom` | Custom env var → that path returned |
| `test_generate_image_success` | Full lifecycle: queue→poll→history→view→save |
| `test_generate_image_comfyui_unavailable` | ConnectError → single TextContent error |
| `test_generate_image_timeout` | COMFYUI_TIMEOUT=0 → timeout TextContent |
| `test_generate_image_empty_prompt` | Empty string prompt → still succeeds |
| `test_generate_image_long_prompt` | 500-char prompt → not truncated, succeeds |
| `test_generate_image_invalid_model` | 404 from /prompt → error TextContent, no file saved |
| `test_generate_image_custom_output_dir` | Custom `output_dir` param → saved there, dir created |
| `test_generate_image_random_seed_variance` | `seed=-1` × 2 → different seeds, different filenames |

### Test mock stack

- **[respx](https://lundberg.github.io/respx/)** — HTTP-level mocking for all ComfyUI API endpoints
- **[Pillow](https://pillow.readthedocs.io/)** (in conftest) — generates real PNG bytes for image response fixtures
- **monkeypatch** — env vars (`IMAGE_OUTPUT_DIR`, `COMFYUI_URL`, `COMFYUI_TIMEOUT`) and server module attributes

Real image generation requires ComfyUI to be running. Tests prove the tool logic is correct at the protocol level.

---

## 9. Prompt Tips for FLUX.1-schnell

FLUX.1-schnell is a guidance-distilled model designed for speed at 1–8 steps. It responds differently from SDXL or SD1.5.
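
The tips in this section amount to writing a prompt as a short run of comma-separated natural-language phrases. One way to keep prompts consistent is to assemble them from separate parts, as in the sketch below. `build_prompt` is a hypothetical convenience helper, not something the server provides; the resulting string is what you would pass as `prompt` to `generate_image`.

```python
def build_prompt(subject: str, *details: str) -> str:
    """Join a subject and optional descriptive phrases into one prompt string.

    Hypothetical helper for illustration only.
    """
    parts = [subject, *details]
    # Drop empty or whitespace-only parts so optional slots can be left blank.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "ancient library at night",
    "oil painting",
    "warm candlelight",
    "",  # a slot left blank on purpose
    "mysterious atmosphere",
)
print(prompt)
```

Keeping the parts separate makes it easy to swap one element, such as the lighting, while holding the seed and the rest of the prompt fixed.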

### Prompt structure that works well

```
[subject], [style/medium], [lighting], [camera/composition], [mood/atmosphere], [quality modifiers]
```

**Example:**

```
ancient library at night, oil painting, warm candlelight, wide angle, mysterious atmosphere, highly detailed, sharp focus
```

### Style keywords

| Style | Prompt keywords |
|---|---|
| Photography | `cinematic photograph, DSLR, 85mm lens, shallow depth of field, bokeh` |
| Oil painting | `oil painting, thick brushstrokes, textured canvas, impressionist` |
| Watercolor | `watercolor painting, soft washes, paper texture, flowing colors` |
| Digital art | `digital art, concept art, artstation, octane render` |
| Anime/illustration | `anime style, cel shading, vibrant colors, clean linework` |
| Sketch | `pencil sketch, hand drawn, crosshatching, charcoal` |

### Lighting keywords

- `golden hour`, `blue hour`, `dramatic lighting`, `rim lighting`
- `studio lighting`, `soft diffused light`, `volumetric light`
- `neon glow`, `bioluminescent`, `moonlit`, `candlelight`

### What works well with FLUX.1-schnell

- **Clear subject + style** — "red panda in a cozy library, watercolor style"
- **Landscape scenes** — fjords, forests, cities, abstract environments
- **Portrait shots** — animals and characters with descriptive appearance
- **Concept art** — futuristic cities, sci-fi environments, fantasy scenes
- **Low step counts** — 4 steps is designed to be near-optimal for this model

### What to avoid

- **Booru-style tag dumps** (FLUX handles natural language better than SD1.5)
- **Contradictory instructions** — "dark AND bright", "realistic AND cartoon"
- **Overly complex scenes** at very small resolutions

### Using the negative prompt

FLUX.1-schnell has reduced CFG guidance so negative prompts have less impact than in SDXL.
Use them for broad exclusions:

```
negative_prompt="blurry, out of focus, watermark, text, signature, low quality, artifacts"
```

### Reproducibility

Always save the seed from the TextContent output if you want to reproduce a result:

```
Seed: 3847291045
```

Then pass it back: `seed=3847291045`

---

## 10. FLUX.2 Klein 4B with Heretic Abliteration (New)

**New in this release:** Support for **FLUX.2 Klein 4B** using an **abliterated Qwen3-4B text encoder** via Heretic.

### Why Heretic?

FLUX.2 Klein uses a full LLM (Qwen3-4B) as its text encoder instead of CLIP+T5. This LLM has safety alignment that can refuse certain prompts. Heretic removes this alignment with **zero measurable KL divergence** (0.0000) and only 3/100 refusals.

### How to use it

```python
generate_image(
    prompt="a beautiful cyberpunk fox in neon tokyo, highly detailed",
    model="flux-2-klein-4b-fp8.safetensors",
    width=1024,
    height=1024,
    steps=4
)
```

### Models to download

```bash
# 1. FLUX.2 Klein 4B (distilled, fp8)
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  flux-2-klein-4b-fp8.safetensors \
  --local-dir ~/ComfyUI/models/diffusion_models/

# 2. FLUX.2 VAE
huggingface-cli download black-forest-labs/FLUX.2-klein-4B \
  flux2-vae.safetensors \
  --local-dir ~/ComfyUI/models/vae/

# 3. Heretic-abliterated Qwen3-4B (from DreamFast)
huggingface-cli download DreamFast/qwen3-4b-heretic \
  --local-dir /tmp/qwen3-heretic/
cp /tmp/qwen3-heretic/model.safetensors \
  ~/ComfyUI/models/text_encoders/qwen_3_4b_heretic.safetensors
```

### Supported models (via `model=` parameter)

| Model | Description | VRAM | Speed | Censorship |
|-------|-------------|------|-------|------------|
| `flux1-schnell.safetensors` | Original (default) | ~8GB | Very fast | None |
| `flux-2-klein-4b-fp8.safetensors` | **New** — with Heretic Qwen3-4B | ~12GB | Fast | **Removed** |

---

## 11. Known Limitations

### ComfyUI must run locally

The MCP server connects to `COMFYUI_URL` (default: `http://localhost:8188`).
ComfyUI is a local application — it does not have a cloud API. You must start it before requesting image generation. The server returns a clear error message if ComfyUI is not reachable.

### Model must be pre-loaded

ComfyUI loads checkpoint models into VRAM on first use. The first generation with a model takes longer as VRAM is allocated (FLUX.1-schnell: ~8GB). Subsequent generations with the same model are faster.

```bash
# Verify model is installed before generation
# → ask Lumen: "list available models in ComfyUI"
```

### AMD ROCm setup complexity

AMD GPU support requires:

1. ROCm drivers installed (`rocm-smi` working)
2. PyTorch built with ROCm support (not the default CUDA build)
3. `HSA_OVERRIDE_GFX_VERSION=11.0.0` for RX 7900 XTX (gfx1100)

Without these, ComfyUI will fall back to CPU — very slow (minutes per image vs. ~8 seconds on RX 7900 XTX).

Check GPU is being used:

```bash
# In another terminal while generating:
watch -n 1 rocm-smi
# VRAM usage should spike to ~8GB during generation
```

### Timeout on large images

The default `COMFYUI_TIMEOUT=120` (2 minutes) may not be enough for:

- Very large resolutions (2048×2048+)
- High step counts (20+)
- First generation loading a new model

Increase via env var:

```bash
export COMFYUI_TIMEOUT=300  # 5 minutes
```

If `generate_image` returns a timeout error, the job may still be running in ComfyUI. Use `get_generation_status(prompt_id)` to check.

### Ollama image gen is macOS-only (April 2026)

Ollama launched experimental image generation in January 2026, but it is **macOS-only** as of April 2026. Linux support is announced as "coming soon." When Linux support arrives, the server can switch backends via `IMAGE_BACKEND=ollama` without changing any tool signatures.

### ComfyUI history is ephemeral

ComfyUI keeps generation history in memory — it is lost on restart. The `get_generation_status` tool will return `"not_found"` for old prompt IDs after a ComfyUI restart. The saved PNG file on disk persists regardless.
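
To make the `"not_found"` case concrete, here is a minimal sketch of how the documented status values could be derived from a queue payload and a history payload. It is not the server's actual implementation. The function name `classify_status` is hypothetical, and the payload shapes are assumptions: queue entries are taken to be lists whose second element is the prompt ID, and history is taken to be a dict keyed by prompt ID.

```python
from typing import Any


def classify_status(prompt_id: str, queue: dict[str, Any], history: dict[str, Any]) -> str:
    """Map queue/history payloads to one of the documented status strings.

    Hypothetical helper; payload shapes are assumptions, see the text above.
    """

    def in_queue(entries: list[Any]) -> bool:
        # Assumed entry layout: [number, prompt_id, ...]
        return any(len(e) > 1 and e[1] == prompt_id for e in entries)

    if in_queue(queue.get("queue_running", [])):
        return "running"
    if in_queue(queue.get("queue_pending", [])):
        return "pending"
    if prompt_id in history:
        return "completed"
    # Neither queued nor in history, e.g. after a ComfyUI restart cleared it.
    return "not_found"


empty_queue = {"queue_running": [], "queue_pending": []}
print(classify_status("abc-123", empty_queue, {"abc-123": {}}))
print(classify_status("abc-123", empty_queue, {}))
```

The two calls model the same prompt ID before and after a restart: once history is empty, nothing distinguishes a finished job from one that never existed, which is why the PNG on disk is the durable record.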