pi_mcps/mcp/mcp-image-gen/USAGE.md
pplate 2f01ff0639 fix(mcp-image-gen): correct ComfyUI install instructions in USAGE.md
ComfyUI is NOT on PyPI — `pip install comfyui` fails with
"No matching distribution found". Remove the wrong Option A.

Replace with:
- Warning note that pip install does not work
- Only correct method: git clone from GitHub + pip install -r requirements.txt

ROCm status confirmed: rocm-smi 3.1.0 / ROCm-SMI-LIB 7.7.0 installed.
2026-04-04 12:20:28 +02:00

mcp-image-gen — Usage Guide

Comprehensive reference for using the ComfyUI-backed image generation MCP server


Table of Contents

  1. Prerequisites — ComfyUI Setup
  2. Quick Start — Running the MCP Server
  3. How to Ask Lumen to Generate Images
  4. Available Tools
  5. Parameters Reference
  6. Output Format
  7. Environment Variables
  8. Test Status
  9. Prompt Tips for FLUX.1-schnell
  10. Known Limitations

1. Prerequisites — ComfyUI Setup

ComfyUI must be running before any image generation tool call succeeds.

The MCP server connects to ComfyUI's REST API at http://localhost:8188. If ComfyUI is not running, generate_image and list_available_models will return a graceful error message — no crash.

Install ComfyUI

⚠️ ComfyUI is NOT on PyPI: pip install comfyui will fail with "No matching distribution found". It must be installed from source via git clone.

# Clone from source (the only correct installation method)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Install PyTorch with ROCm (AMD RX 7900 XTX)

Patrick's RX 7900 XTX (gfx1100, 24GB VRAM) uses the ROCm backend. Standard CUDA builds will not work on AMD hardware.

# PyTorch with ROCm 6.1 support
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1

ROCm version note: ROCm 7.2.1 is the current production release as of April 2026. Check rocm-smi to confirm your ROCm version before installing torch, and use the PyTorch wheel index that matches it (the command above targets ROCm 6.1).

Download FLUX.1-schnell (Primary Model)

FLUX.1-schnell is the recommended model — fast (4 steps), Apache 2.0 licensed, excellent quality.

# Download (~8GB) — place in ComfyUI/models/checkpoints/
wget https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors \
     -O ~/ComfyUI/models/checkpoints/flux1-schnell.safetensors

# Or use huggingface_hub:
huggingface-cli download black-forest-labs/FLUX.1-schnell \
    flux1-schnell.safetensors \
    --local-dir ~/ComfyUI/models/checkpoints/

You'll also need the CLIP and VAE models. See the ComfyUI FLUX guide for the full model list.

Start ComfyUI (AMD ROCm)

# Standard start — listens on all interfaces at port 8188
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen

# Or with explicit port
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen --port 8188

HSA_OVERRIDE_GFX_VERSION=11.0.0 — Required for RX 7900 XTX (gfx1100). Without this, ROCm may fail to detect the GPU correctly. This tells the HIP runtime to treat the GPU as gfx1100 architecture.

Verify ComfyUI is Running

curl -s http://localhost:8188/system_stats | python3 -m json.tool | head -20

The expected response includes a system object with python_version, pytorch_version, embedded_python, and comfyui_version.
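The same check can be scripted. The following is a minimal sketch, not part of the server: the helper names are hypothetical, and the only fields it assumes are the four listed above.

```python
import json
import urllib.request

# Keys named in the "expected response" sentence above.
EXPECTED_SYSTEM_KEYS = (
    "python_version",
    "pytorch_version",
    "embedded_python",
    "comfyui_version",
)


def missing_system_keys(payload: dict) -> list[str]:
    """Return the expected keys that are absent from payload["system"]."""
    system = payload.get("system", {})
    return [key for key in EXPECTED_SYSTEM_KEYS if key not in system]


def fetch_system_stats(base_url: str = "http://localhost:8188", timeout: float = 5.0) -> dict:
    """GET /system_stats and decode the JSON body.

    Raises urllib.error.URLError when ComfyUI is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as response:
        return json.load(response)
```

Calling missing_system_keys(fetch_system_stats()) returns an empty list when every expected field is present.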


2. Quick Start — Running the MCP Server

cd /home/pplate/pi_mcps/mcp/mcp-image-gen
./run.sh

run.sh automatically:

  • Sets PATH to include ~/.local/bin for uv
  • Creates IMAGE_OUTPUT_DIR (~/Pictures/mcp-generated) if it doesn't exist
  • Launches the FastMCP server via uv run src/server.py (stdio transport)

Via uv directly

cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv run src/server.py

Wired into .roo/mcp.json

The server is already configured in .roo/mcp.json:

"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run", "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}

Roo Code / Claude Desktop will auto-start the server when any image generation tool is invoked. The MCP server itself starts in ~1 second — ComfyUI must already be running separately.

Install dependencies (first time)

cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv sync

3. How to Ask Lumen to Generate Images

Just speak naturally. Lumen will call the appropriate MCP tool automatically.

Basic generation

"Generate an image of a futuristic city at sunset"

→ generate_image(prompt="futuristic city at sunset", width=1024, height=1024, steps=4)

Specific style and size

"Create a portrait of a red fox in watercolor style, 1024x1024"

→ generate_image(
    prompt="portrait of a red fox, watercolor style, detailed fur, soft brushstrokes",
    width=1024, height=1024
  )

Reproducible with a fixed seed

"Make an image with seed 42 so I can reproduce it"

→ generate_image(prompt="...", seed=42)

The seed is reported in the text output so you can use the same seed again.

Landscape format

"Generate a wide cinematic landscape of a Norwegian fjord, 1920x1080"

→ generate_image(prompt="Norwegian fjord, cinematic, golden hour", width=1920, height=1080)

Excluding unwanted elements

"Generate a clean product photo of a coffee mug, no background clutter, no text"

→ generate_image(
    prompt="product photo of a ceramic coffee mug, studio lighting, white background",
    negative_prompt="clutter, text, watermark, blurry, shadows"
  )

More inference steps for higher quality

"Generate a highly detailed oil painting of a medieval castle, use 20 steps"

→ generate_image(
    prompt="oil painting of a medieval castle, highly detailed, dramatic lighting",
    steps=20,
    model="flux1-dev.safetensors"   # FLUX.1-dev supports higher step counts better
  )

Check what models are available

"List what models are available in ComfyUI"

→ list_available_models()

Check status of a long-running job

"What's the status of prompt ID abc-123?"

→ get_generation_status(prompt_id="abc-123")

Find out where images are saved

"Where are my generated images being saved?"

→ get_output_directory()

4. Available Tools

generate_image

Generate an image from a text prompt using ComfyUI's FLUX.1-schnell workflow.

Full signature:

async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list[TextContent | ImageContent]

What it does:

  1. Loads the bundled flux_schnell.json ComfyUI API workflow template
  2. Injects your prompt, dimensions, seed, model into the correct workflow nodes
  3. Submits the workflow to ComfyUI via POST /api/prompt
  4. Polls /api/queue every 2 seconds until the job leaves the queue
  5. Fetches history via /api/history/{prompt_id} to find the output filename
  6. Downloads the PNG from /api/view
  7. Saves the PNG to disk as YYYYMMDD_HHMMSS_{seed}.png
  8. Returns [TextContent(path + metadata), ImageContent(base64 PNG)]
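Step 2 above (parameter injection) can be sketched as follows. This is an illustration under stated assumptions, not the bundled flux_schnell.json: the template is a trimmed, hypothetical stand-in, and all node IDs are invented except "33", which the test table lists as the negative prompt node.

```python
import copy
import random

# Hypothetical, trimmed-down stand-in for the workflow template.
# Only node "33" (negative prompt) comes from this guide; the other IDs are assumptions.
TEMPLATE = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 0}},
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ""}},
    "5": {"class_type": "EmptyLatentImage", "inputs": {"width": 0, "height": 0, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "33": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
}


def build_workflow(prompt, width=1024, height=1024, steps=4,
                   model="flux1-schnell.safetensors", seed=-1, negative_prompt=""):
    """Return (workflow, seed) with the parameters written into a copy of TEMPLATE."""
    workflow = copy.deepcopy(TEMPLATE)  # never mutate the shared template
    if seed < 0:
        seed = random.randint(0, 2**32 - 1)  # seed=-1 means "pick a fresh seed"
    workflow["6"]["inputs"]["text"] = prompt
    workflow["33"]["inputs"]["text"] = negative_prompt
    workflow["5"]["inputs"].update(width=width, height=height)
    workflow["4"]["inputs"]["ckpt_name"] = model
    workflow["3"]["inputs"].update(seed=seed, steps=steps)
    return workflow, seed
```

Returning the resolved seed alongside the workflow lets the caller report it and reuse it in the output filename.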

list_available_models

List all checkpoint models currently available in ComfyUI.

async def list_available_models() -> list[str]

Calls /object_info/CheckpointLoaderSimple and extracts the checkpoint name list. Use this to discover what models are installed before passing a model name to generate_image.

Example return:

["flux1-schnell.safetensors", "flux1-dev.safetensors", "sd_xl_base_1.0.safetensors"]
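Extracting the checkpoint list might look like the sketch below. It assumes the /object_info response nests the allowed values as the first element of input.required.ckpt_name; that shape can differ between ComfyUI versions, and the function name is hypothetical.

```python
def extract_checkpoint_names(object_info: dict) -> list[str]:
    """Pull the checkpoint filenames out of an /object_info/CheckpointLoaderSimple payload.

    Assumed shape:
      {"CheckpointLoaderSimple": {"input": {"required": {"ckpt_name": [[names...], ...]}}}}
    Returns an empty list when the payload does not match that shape.
    """
    node = object_info.get("CheckpointLoaderSimple", {})
    spec = node.get("input", {}).get("required", {}).get("ckpt_name", [])
    if spec and isinstance(spec[0], list):
        return list(spec[0])
    return []
```

Returning an empty list on an unexpected shape keeps the tool from raising when the payload layout changes.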

get_generation_status

Check the status of a queued or running generation job.

async def get_generation_status(prompt_id: str) -> dict

Return values:

| status | Meaning |
|---|---|
| "pending" | Job is in the queue, not yet started |
| "running" | Job is currently being processed |
| "completed" | Job finished; image is in ComfyUI's history |
| "not_found" | Unknown prompt_id; may have expired from history |
| "error" | ComfyUI was unreachable |

Useful when generate_image times out (default 120s) — the job may still be running in ComfyUI.
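The mapping from queue and history data to a status string can be sketched as a pure function. This assumes each queue entry is a list with the prompt ID at index 1 and that history is keyed by prompt ID; the "error" status is omitted because it comes from a failed HTTP call rather than from the payloads.

```python
def classify_status(prompt_id: str, queue: dict, history: dict) -> str:
    """Map /queue and /history payloads to one of the status strings above."""

    def queued_ids(entries) -> set:
        # Assumption: each queue entry is a sequence whose second item is the prompt ID.
        return {entry[1] for entry in entries if len(entry) > 1}

    if prompt_id in queued_ids(queue.get("queue_running", [])):
        return "running"
    if prompt_id in queued_ids(queue.get("queue_pending", [])):
        return "pending"
    if prompt_id in history:
        return "completed"
    return "not_found"
```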


get_output_directory

Return the absolute path where generated images will be saved.

def get_output_directory() -> str

Returns the expanded, absolute path derived from IMAGE_OUTPUT_DIR env var (or ~/Pictures/mcp-generated default). The directory may not exist yet — generate_image creates it on first use.


5. Parameters Reference

Full parameter table for generate_image:

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | (required) | Text description of the image. Goes into the positive CLIP text encoder node. |
| width | int | 1024 | Image width in pixels. FLUX.1-schnell: 512–2048 recommended. |
| height | int | 1024 | Image height in pixels. FLUX.1-schnell: 512–2048 recommended. |
| steps | int | 4 | Number of KSampler inference steps. FLUX.1-schnell is designed for 1–8 steps. |
| model | str | "flux1-schnell.safetensors" | Checkpoint model filename as listed by list_available_models. |
| seed | int | -1 | RNG seed for reproducibility. -1 = new random seed each call (0 to 2³²−1). |
| negative_prompt | str | "" | Text description of things to exclude. Goes into the negative CLIP encoder node. |
| output_dir | str | "" | Override save directory. Empty = uses IMAGE_OUTPUT_DIR env var or default. |

| Use case | Width | Height |
|---|---|---|
| Square (default) | 1024 | 1024 |
| Portrait | 768 | 1024 |
| Landscape | 1024 | 768 |
| Widescreen | 1280 | 720 |
| HD widescreen | 1920 | 1080 |
| Tall portrait | 512 | 768 |

VRAM note: Patrick's RX 7900 XTX has 24GB VRAM. FLUX.1-schnell requires ~8GB, so you can comfortably run 1920×1080 and even larger. FLUX.1-dev requires ~12GB.


6. Output Format

generate_image returns a list with two items when successful:

Item 1 — TextContent (file path + metadata)

Generated: /home/pplate/Pictures/mcp-generated/20260404_121500_3847291045.png
Seed: 3847291045
Elapsed: 8.3s
Size: 1024x1024, Steps: 4, Model: flux1-schnell.safetensors

The filename format is YYYYMMDD_HHMMSS_{seed}.png — the seed is embedded so you can reproduce the exact image by passing it back as the seed parameter.
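As an illustration of that naming scheme, the sketch below builds a filename and recovers the seed from it. Both helper names are hypothetical; only the YYYYMMDD_HHMMSS_{seed}.png pattern comes from this guide.

```python
import re
from datetime import datetime
from typing import Optional

_FILENAME_RE = re.compile(r"^\d{8}_\d{6}_(\d+)\.png$")


def build_filename(seed: int, now: datetime) -> str:
    """Format a timestamp and seed as YYYYMMDD_HHMMSS_{seed}.png."""
    return f"{now:%Y%m%d_%H%M%S}_{seed}.png"


def seed_from_filename(filename: str) -> Optional[int]:
    """Recover the seed from a generated filename, or None if it does not match."""
    match = _FILENAME_RE.match(filename)
    return int(match.group(1)) if match else None
```

For the example path above, seed_from_filename("20260404_121500_3847291045.png") returns 3847291045.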

Item 2 — ImageContent (inline base64 PNG)

The image displays directly in Roo Code / Claude Desktop chat as an inline image — no need to open a file browser. The same PNG is also saved to disk at the path shown in the TextContent.

{
  "type": "image",
  "mimeType": "image/png",
  "data": "<base64-encoded PNG bytes>"
}
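A client that wants the raw bytes back can decode the data field. The sketch below is a hypothetical helper, assuming the item arrives as a plain dict with the three keys shown; it also checks the standard 8-byte PNG signature.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_png(item: dict) -> bytes:
    """Decode an ImageContent-style dict into PNG bytes, validating type and signature."""
    if item.get("type") != "image" or item.get("mimeType") != "image/png":
        raise ValueError("not a PNG ImageContent item")
    data = base64.b64decode(item["data"], validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload does not start with the PNG signature")
    return data
```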

Error responses

When ComfyUI is unreachable or an error occurs, only one TextContent is returned (no ImageContent):

ComfyUI not reachable at http://localhost:8188. Start it with: python main.py --listen
Generation timed out after 120s. prompt_id=abc-123 — use get_generation_status to check

7. Environment Variables

Configure via environment variables in .roo/mcp.json or shell:

| Variable | Default | Description |
|---|---|---|
| COMFYUI_URL | http://localhost:8188 | Base URL of the running ComfyUI REST API. Change this if ComfyUI runs on a different host or port. |
| IMAGE_OUTPUT_DIR | ~/Pictures/mcp-generated | Directory where generated PNG files are saved. Supports ~ expansion. Created automatically on first generation. |
| COMFYUI_TIMEOUT | 120 | Maximum seconds to wait for a generation job before returning a timeout error. Increase for very large images or slow hardware. |

Setting via shell

export COMFYUI_URL="http://localhost:8188"
export IMAGE_OUTPUT_DIR="/home/pplate/Pictures/ai-art"
export COMFYUI_TIMEOUT="300"
./run.sh

Setting via mcp.json env block

"mcp-image-gen": {
  "command": "uv",
  "args": ["--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", "run", "src/server.py"],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated",
    "COMFYUI_TIMEOUT": "120"
  }
}
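Reading these three variables with their documented defaults could look like the sketch below. The Settings class and load_settings function are hypothetical names, and the real server may parse its configuration differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    image_output_dir: str
    timeout_seconds: float


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        # Strip a trailing slash so paths can be appended with a single "/".
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/"),
        # Expand "~" and make the path absolute.
        image_output_dir=os.path.abspath(
            os.path.expanduser(env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated"))
        ),
        timeout_seconds=float(env.get("COMFYUI_TIMEOUT", "120")),
    )
```

Accepting a mapping instead of reading os.environ directly lets tests pass a plain dict.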

8. Test Status

19 pytest tests — all passing. Tests mock all ComfyUI HTTP calls using respx. No running ComfyUI instance is needed to run the tests.

cd /home/pplate/pi_mcps/mcp/mcp-image-gen
uv run pytest tests/ -v

Test coverage breakdown

| Test file | Tests | Coverage area |
|---|---|---|
| tests/test_server.py | 19 | All 4 tools + workflow builder |

| Test name | What it verifies |
|---|---|
| test_build_flux_workflow_structure | Workflow has correct node class_types |
| test_build_flux_workflow_params_injected | All params injected into correct nodes |
| test_negative_prompt_included | Negative prompt goes to node 33 |
| test_random_seed_generated | seed=-1 produces a valid integer in _meta |
| test_list_available_models | Returns model list from mocked /object_info |
| test_list_available_models_comfyui_offline | ConnectError → graceful error string |
| test_get_generation_status_pending | prompt_id in queue_pending → "pending" |
| test_get_generation_status_running | prompt_id in queue_running → "running" |
| test_get_generation_status_complete | Not in queue + in history → "completed" |
| test_get_output_directory_default | No env var → ~/Pictures/mcp-generated expanded |
| test_get_output_directory_custom | Custom env var → that path returned |
| test_generate_image_success | Full lifecycle: queue→poll→history→view→save |
| test_generate_image_comfyui_unavailable | ConnectError → single TextContent error |
| test_generate_image_timeout | COMFYUI_TIMEOUT=0 → timeout TextContent |
| test_generate_image_empty_prompt | Empty string prompt → still succeeds |
| test_generate_image_long_prompt | 500-char prompt → not truncated, succeeds |
| test_generate_image_invalid_model | 404 from /prompt → error TextContent, no file saved |
| test_generate_image_custom_output_dir | Custom output_dir param → saved there, dir created |
| test_generate_image_random_seed_variance | seed=-1 × 2 → different seeds, different filenames |

Test mock stack

  • respx — HTTP-level mocking for all ComfyUI API endpoints
  • Pillow (in conftest) — generates real PNG bytes for image response fixtures
  • monkeypatch — env vars (IMAGE_OUTPUT_DIR, COMFYUI_URL, COMFYUI_TIMEOUT) and server module attributes

Real image generation requires ComfyUI to be running. Tests prove the tool logic is correct at the protocol level.


9. Prompt Tips for FLUX.1-schnell

FLUX.1-schnell is a distilled model designed for speed at 1–8 steps. It responds differently from SDXL or SD1.5.

Prompt structure that works well

[subject], [style/medium], [lighting], [camera/composition], [mood/atmosphere], [quality modifiers]

Example:

ancient library at night, oil painting, warm candlelight, wide angle, mysterious atmosphere, highly detailed, sharp focus
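The slot structure above can be assembled programmatically. This is a small illustrative helper with a hypothetical name; empty slots are skipped so partial prompts stay clean.

```python
def compose_prompt(subject: str, style: str = "", lighting: str = "",
                   composition: str = "", mood: str = "", quality: str = "") -> str:
    """Join the non-empty prompt slots in the recommended order, comma-separated."""
    parts = [subject, style, lighting, composition, mood, quality]
    return ", ".join(part.strip() for part in parts if part and part.strip())
```

Passing the six phrases from the example, in order, reproduces the example prompt exactly.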

Style keywords

| Style | Prompt keywords |
|---|---|
| Photography | cinematic photograph, DSLR, 85mm lens, shallow depth of field, bokeh |
| Oil painting | oil painting, thick brushstrokes, textured canvas, impressionist |
| Watercolor | watercolor painting, soft washes, paper texture, flowing colors |
| Digital art | digital art, concept art, artstation, octane render |
| Anime/illustration | anime style, cel shading, vibrant colors, clean linework |
| Sketch | pencil sketch, hand drawn, crosshatching, charcoal |

Lighting keywords

  • golden hour, blue hour, dramatic lighting, rim lighting
  • studio lighting, soft diffused light, volumetric light
  • neon glow, bioluminescent, moonlit, candlelight

What works well with FLUX.1-schnell

  • Clear subject + style — "red panda in a cozy library, watercolor style"
  • Landscape scenes — fjords, forests, cities, abstract environments
  • Portrait shots — animals and characters with descriptive appearance
  • Concept art — futuristic cities, sci-fi environments, fantasy scenes
  • Low step counts — 4 steps is designed to be near-optimal for this model

What to avoid

  • Booru-style tag dumps (FLUX handles natural language better than SD1.5)
  • Contradictory instructions — "dark AND bright", "realistic AND cartoon"
  • Overly complex scenes at very small resolutions

Using the negative prompt

FLUX.1-schnell has reduced CFG guidance so negative prompts have less impact than in SDXL. Use them for broad exclusions:

negative_prompt="blurry, out of focus, watermark, text, signature, low quality, artifacts"

Reproducibility

Always save the seed from the TextContent output if you want to reproduce a result:

Seed: 3847291045

Then pass it back: seed=3847291045


10. Known Limitations

ComfyUI must run locally

The MCP server connects to COMFYUI_URL (default: http://localhost:8188). ComfyUI is a local application — it does not have a cloud API. You must start it before requesting image generation. The server returns a clear error message if ComfyUI is not reachable.

Model must be pre-loaded

ComfyUI loads checkpoint models into VRAM on first use. The first generation with a model takes longer as VRAM is allocated (FLUX.1-schnell: ~8GB). Subsequent generations with the same model are faster.

# Verify model is installed before generation
# → ask Lumen: "list available models in ComfyUI"

AMD ROCm setup complexity

AMD GPU support requires:

  1. ROCm drivers installed (rocm-smi working)
  2. PyTorch built with ROCm support (not the default CUDA build)
  3. HSA_OVERRIDE_GFX_VERSION=11.0.0 for RX 7900 XTX (gfx1100)

Without these, ComfyUI will fall back to CPU — very slow (minutes per image vs. ~8 seconds on RX 7900 XTX).

Check GPU is being used:

# In another terminal while generating:
watch -n 1 rocm-smi
# VRAM usage should spike to ~8GB during generation
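A complementary check from Python, run inside the environment ComfyUI uses, is sketched below. It relies on the fact that ROCm builds of PyTorch expose the GPU through the torch.cuda API and set torch.version.hip; the function name is hypothetical.

```python
def describe_torch_backend() -> str:
    """Report which accelerator backend PyTorch sees, without raising if torch is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "no GPU visible to PyTorch (CPU fallback)"
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    hip_version = getattr(torch.version, "hip", None)
    backend = f"ROCm/HIP {hip_version}" if hip_version else "CUDA"
    return f"{backend}: {torch.cuda.get_device_name(0)}"
```

If this reports the CPU fallback, revisit the three requirements listed above before starting ComfyUI.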

Timeout on large images

The default COMFYUI_TIMEOUT=120 (2 minutes) may not be enough for:

  • Very large resolutions (2048×2048+)
  • High step counts (20+)
  • First generation loading a new model

Increase via env var:

export COMFYUI_TIMEOUT=300  # 5 minutes

If generate_image returns a timeout error, the job may still be running in ComfyUI. Use get_generation_status(prompt_id) to check.
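The poll-until-timeout behaviour can be sketched as a small loop. The function name and the injectable clock and sleep parameters are assumptions added for testability; only the 2-second interval and the timeout semantics come from this guide.

```python
import time
from typing import Callable


def wait_until_done(is_queued: Callable[[], bool], timeout: float, interval: float = 2.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_queued() until it returns False.

    Returns True when the job left the queue, or False when the deadline passed first.
    """
    deadline = clock() + timeout
    while is_queued():
        if clock() >= deadline:
            return False  # caller should surface a timeout error with the prompt_id
        sleep(interval)
    return True
```

With timeout=0 a job that is still queued times out on the first check, which is consistent with the COMFYUI_TIMEOUT=0 test case listed in the Test Status section.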

Ollama image gen is macOS-only (April 2026)

Ollama launched experimental image generation in January 2026, but it is macOS-only as of April 2026. Linux support is announced as "coming soon." When Linux support arrives, the server can switch backends via IMAGE_BACKEND=ollama without changing any tool signatures.

ComfyUI history is ephemeral

ComfyUI keeps generation history in memory — it is lost on restart. The get_generation_status tool will return "not_found" for old prompt IDs after a ComfyUI restart. The saved PNG file on disk persists regardless.