mcp-image-gen — Implementation Plan
Date: 2026-04-04
Author: Lumen (for Patrick / pplate)
Status: Ready for implementation
Assessment: ASSESSMENT.md
Research Session: 39809470-6ac8-4713-adf2-79ac0eb36ba7
1. Directory Structure
mcp/mcp-image-gen/
├── ASSESSMENT.md              ← Architecture assessment (this session)
├── PLAN.md                    ← This file
├── README.md                  ← Usage docs, tool table, env vars
├── pyproject.toml             ← uv project + deps
├── run.sh                     ← Launch script (used by .roo/mcp.json)
├── src/
│   ├── __init__.py
│   ├── server.py              ← FastMCP server + all tools
│   └── workflows/
│       └── flux_schnell.json  ← Minimal ComfyUI API-format workflow
└── tests/
    ├── __init__.py
    ├── conftest.py            ← sys.path + shared fixtures
    └── test_server.py         ← All tool tests (mocked ComfyUI)
2. Tool Definitions
2.1 generate_image
@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    """
    Generate an image from a text prompt using ComfyUI.

    Returns both a file path (for persistence) and an inline base64 image
    (for display in Claude / Roo Code chat).

    Args:
        prompt: Text description of the image to generate.
        width: Image width in pixels (default: 1024).
        height: Image height in pixels (default: 1024).
        steps: Number of inference steps. FLUX.1-schnell works well at 4.
        model: ComfyUI model filename (default: flux1-schnell.safetensors).
        seed: Random seed for reproducibility. -1 = random.
        negative_prompt: Things to exclude from the image (optional).
        output_dir: Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
            or ~/Pictures/mcp-generated.

    Returns:
        [TextContent(path + metadata), ImageContent(base64 PNG)]
    """
Return type: list containing:
- mcp.types.TextContent: human-readable summary with file path, seed, and elapsed time
- mcp.types.ImageContent: type="image", data=base64_encoded_png, mimeType="image/png"
⚠️ FastMCP 3.x rule: NEVER annotate the return type as -> Image (the fastmcp utility type). It triggers output_schema generation, which breaks the early-return path. Return mcp.types.ImageContent directly as part of a list; it is a ContentBlock and passes through cleanly.
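The two-item result described above can be sketched without the mcp package by using plain dicts as stand-ins. The helper name build_result and the summary wording are hypothetical; the real tool would construct mcp.types.TextContent and mcp.types.ImageContent with the same field values.

```python
import base64


def build_result(png_bytes: bytes, path: str, seed: int, elapsed_s: float) -> list[dict]:
    """Build the [text, image] result as plain dicts mirroring the fields named above.

    Stand-in only: swap the dicts for mcp.types.TextContent / ImageContent
    in the actual server.
    """
    summary = f"Saved: {path}\nSeed: {seed}\nElapsed: {elapsed_s:.1f}s"
    # ImageContent carries the PNG as base64 text, not raw bytes.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return [
        {"type": "text", "text": summary},
        {"type": "image", "data": encoded, "mimeType": "image/png"},
    ]


result = build_result(b"\x89PNG", "/tmp/example.png", seed=7, elapsed_s=1.234)
print(result[0]["text"])
```

Keeping the text item first means clients that cannot render images still show the saved path and seed.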
2.2 list_available_models
@mcp.tool()
async def list_available_models() -> str:
    """
    List all checkpoint models available in ComfyUI.

    Returns a newline-separated list of model filenames.
    Requires ComfyUI to be running at COMFYUI_URL.
    """
Implementation: GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple → parse
input.required.ckpt_name[0] list → join with newlines.
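The parsing step can be kept separate from the HTTP call so it is easy to test. The sketch below is a hypothetical helper (parse_checkpoint_names is not a name from this plan). It assumes the response is keyed by the node class name and that input.required.ckpt_name[0] holds the list of filenames; check both against a live ComfyUI response.

```python
from typing import Any


def parse_checkpoint_names(object_info: dict[str, Any]) -> str:
    """Extract checkpoint filenames from an /object_info/CheckpointLoaderSimple payload.

    Assumed shape: {"CheckpointLoaderSimple": {"input": {"required":
    {"ckpt_name": [[<filenames>], ...]}}}}
    """
    node = object_info.get("CheckpointLoaderSimple", {})
    ckpt_spec = node.get("input", {}).get("required", {}).get("ckpt_name", [])
    names = ckpt_spec[0] if ckpt_spec and isinstance(ckpt_spec[0], list) else []
    return "\n".join(names) if names else "No checkpoint models found."


sample = {
    "CheckpointLoaderSimple": {
        "input": {"required": {"ckpt_name": [["flux1-schnell.safetensors", "sdxl.safetensors"]]}}
    }
}
print(parse_checkpoint_names(sample))
```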
2.3 get_generation_status
@mcp.tool()
async def get_generation_status(prompt_id: str) -> str:
    """
    Check the status of a queued or running generation job.

    Args:
        prompt_id: The prompt ID returned by a previous generate_image call.

    Returns:
        Status string: "pending", "running", "completed", or "not_found".
    """
Implementation: GET {COMFYUI_URL}/api/queue → check queue_running and queue_pending
lists for matching prompt_id. If not found in either, check history endpoint.
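The lookup order above (running, then pending, then history) can be sketched as a pure function over already-fetched payloads. classify_status is a hypothetical name. The shape of a queue entry is an assumption: the sketch accepts a list whose second element is the prompt ID, or a dict with a "prompt_id" key, so confirm the real layout against your ComfyUI version.

```python
from typing import Any


def _entry_ids(entries: list[Any]) -> set[str]:
    """Collect prompt IDs from queue entries (list-shaped or dict-shaped)."""
    found: set[str] = set()
    for entry in entries:
        if isinstance(entry, dict) and "prompt_id" in entry:
            found.add(str(entry["prompt_id"]))
        elif isinstance(entry, (list, tuple)) and len(entry) > 1:
            found.add(str(entry[1]))
    return found


def classify_status(prompt_id: str, queue: dict[str, Any], history: dict[str, Any]) -> str:
    """Map queue and history payloads to one of the four status strings."""
    if prompt_id in _entry_ids(queue.get("queue_running", [])):
        return "running"
    if prompt_id in _entry_ids(queue.get("queue_pending", [])):
        return "pending"
    # A job that has left the queue appears in history keyed by its prompt ID.
    if prompt_id in history:
        return "completed"
    return "not_found"


queue = {"queue_running": [[0, "job-a", {}, {}, []]], "queue_pending": [{"prompt_id": "job-b"}]}
print(classify_status("job-a", queue, {}))
```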
2.4 get_output_directory
@mcp.tool()
def get_output_directory() -> str:
    """
    Return the directory where generated images are saved.

    Returns:
        Absolute path to the output directory.
    """
Implementation: Resolve IMAGE_OUTPUT_DIR env var or default ~/Pictures/mcp-generated,
expand ~, return as string.
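A minimal sketch of that resolution order, assuming a hypothetical helper named resolve_output_dir that also accepts the per-call output_dir override from generate_image:

```python
import os
from pathlib import Path

DEFAULT_OUTPUT_DIR = "~/Pictures/mcp-generated"


def resolve_output_dir(override: str = "") -> Path:
    """Return the absolute output directory: override, then env var, then default."""
    raw = override or os.environ.get("IMAGE_OUTPUT_DIR") or DEFAULT_OUTPUT_DIR
    # expanduser handles "~"; resolve makes the path absolute.
    return Path(raw).expanduser().resolve()


print(resolve_output_dir())
```

An empty IMAGE_OUTPUT_DIR is treated the same as an unset one, which avoids writing into the current working directory by accident.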
3. ComfyUI Integration
3.1 Workflow: Submit → Poll → Retrieve
generate_image()
│
├── 1. Load flux_schnell.json workflow template
├── 2. Parameterize: inject prompt, width, height, steps, seed, model
├── 3. POST {COMFYUI_URL}/api/prompt → {"prompt_id": "uuid"}
│
├── 4. POLL loop (max 120s, sleep 2s between)
│ GET {COMFYUI_URL}/api/queue
│ → check queue_running[].prompt_id == our id
│ → check queue_pending[].prompt_id == our id
│ → if neither: job is done
│
├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
│ → find output image filename + subfolder
│
├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
│ → raw PNG bytes
│
├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
└── 8. Return [TextContent(path + meta), ImageContent(base64)]
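Step 4 of the flow above can be sketched as a small polling helper. To stay runnable without ComfyUI, the queue fetch is injected as a coroutine; in the server it would wrap an httpx GET to /api/queue. The names wait_until_done and fetch_queue are hypothetical, and the queue entry layout (prompt ID as the second element of each entry) is an assumption to verify.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

QueueFetcher = Callable[[], Awaitable[dict[str, Any]]]


def _queued_ids(queue: dict[str, Any]) -> set[str]:
    """Collect prompt IDs from both queue lists (assumed entry shape: [number, prompt_id, ...])."""
    ids: set[str] = set()
    for key in ("queue_running", "queue_pending"):
        for entry in queue.get(key, []):
            if isinstance(entry, (list, tuple)) and len(entry) > 1:
                ids.add(str(entry[1]))
    return ids


async def wait_until_done(
    prompt_id: str,
    fetch_queue: QueueFetcher,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll until prompt_id has left the queue. Returns False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        queue = await fetch_queue()
        if prompt_id not in _queued_ids(queue):
            return True  # neither running nor pending: the job is done
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval_s)


async def _demo() -> None:
    async def empty_queue() -> dict[str, Any]:
        return {"queue_running": [], "queue_pending": []}

    print(await wait_until_done("some-id", empty_queue, timeout_s=1, interval_s=0.1))


asyncio.run(_demo())
```

Returning a boolean rather than raising lets generate_image build the timeout message, including the prompt_id, in one place.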
3.2 API Endpoints Used
| Endpoint | Method | Purpose |
|---|---|---|
| /api/prompt | POST | Submit workflow for generation |
| /api/queue | GET | Poll queue status (pending + running) |
| /api/history/{prompt_id} | GET | Get completed job output filenames |
| /api/view | GET | Download image bytes by filename |
| /object_info/CheckpointLoaderSimple | GET | List available checkpoint models |
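To show how the history and view endpoints connect, here is a rough sketch that pulls the first output image from a history payload and builds the matching /api/view URL. The helper names are hypothetical, and the payload shape follows the mock history fixture used in the tests of this plan rather than a verified ComfyUI response.

```python
from typing import Any, Optional
from urllib.parse import urlencode


def first_output_image(history: dict[str, Any], prompt_id: str) -> Optional[dict[str, str]]:
    """Return the first image record found in any output node, or None."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        images = node_output.get("images", [])
        if images:
            return images[0]
    return None


def view_url(base_url: str, image: dict[str, str]) -> str:
    """Build the /api/view URL for an image record from the history payload."""
    query = urlencode(
        {
            "filename": image["filename"],
            "subfolder": image.get("subfolder", ""),
            "type": image.get("type", "output"),
        }
    )
    return f"{base_url.rstrip('/')}/api/view?{query}"


history = {
    "test-uuid-1234": {
        "outputs": {
            "9": {"images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]}
        }
    }
}
image = first_output_image(history, "test-uuid-1234")
if image is not None:
    print(view_url("http://localhost:8188", image))
```

Using urlencode instead of string formatting keeps filenames with spaces or special characters from producing a malformed query string.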
3.3 Error Handling
| Condition | Response |
|---|---|
| ComfyUI unreachable | "ComfyUI not reachable at {url}. Start it with: python main.py --listen" |
| Timeout (>120s) | "Generation timed out after 120s. prompt_id={id} — use get_generation_status to check" |
| ComfyUI returns error in history | Extract and return the error message from history response |
| Invalid model name | ComfyUI returns error in history; surface it clearly |
| Output dir not writable | "Cannot write to output directory: {path}" |
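The last row of the table can be checked before any generation starts, so a bad path fails fast instead of after the GPU has done its work. The sketch below uses a hypothetical helper, check_output_dir, which returns the error string from the table or None when the directory is usable.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_output_dir(path: Path) -> Optional[str]:
    """Return an error message if path cannot be created or written to, else None."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        # Creating (and auto-deleting) a temp file proves write access.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return f"Cannot write to output directory: {path}"
    return None


with tempfile.TemporaryDirectory() as tmp:
    print(check_output_dir(Path(tmp) / "images"))
```

Returning a string rather than raising matches the table's convention of surfacing errors as tool output.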
4. Configuration
All configuration via environment variables. No hardcoded paths.
| Variable | Default | Description |
|---|---|---|
| COMFYUI_URL | http://localhost:8188 | Base URL of running ComfyUI instance |
| IMAGE_OUTPUT_DIR | ~/Pictures/mcp-generated | Where to save generated PNG files |
| COMFYUI_TIMEOUT | 120 | Max seconds to wait for generation (int) |
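One way to read these variables is a small settings object built once at startup. This is a sketch only: the Settings dataclass and load_settings function are hypothetical names, and raising on a non-integer COMFYUI_TIMEOUT is a design assumption rather than something this plan specifies.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    image_output_dir: str
    timeout_s: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables from the table, applying the documented defaults."""
    raw_timeout = env.get("COMFYUI_TIMEOUT", "120")
    try:
        timeout_s = int(raw_timeout)
    except ValueError:
        raise ValueError(f"COMFYUI_TIMEOUT must be an integer, got {raw_timeout!r}") from None
    return Settings(
        # Strip a trailing slash so endpoint paths can be appended directly.
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/"),
        image_output_dir=env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated"),
        timeout_s=timeout_s,
    )


print(load_settings({}))
```

Passing the environment in as a mapping keeps the function testable without touching os.environ.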
.roo/mcp.json entry (to be added during implementation):
"mcp-image-gen": {
"command": "uv",
"args": [
"--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
"run", "src/server.py"
],
"env": {
"COMFYUI_URL": "http://localhost:8188",
"IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
}
}
5. pyproject.toml
[project]
name = "mcp-image-gen"
version = "0.1.0"
requires-python = ">=3.11"
description = "MCP server for local AI image generation via ComfyUI"
dependencies = [
"fastmcp>=0.1.0",
"httpx>=0.27.0",
"pillow>=10.0.0",
]
[project.optional-dependencies]
test = [
"pytest>=7.0",
"pytest-mock>=3.0",
"pytest-cov>=4.0",
"pytest-asyncio>=0.23",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
asyncio_mode = "auto"
Dependency rationale:
- fastmcp: MCP framework
- httpx: async HTTP client for the ComfyUI REST API
- pillow: validates PNG output; may be used later for thumbnail generation
- pytest-asyncio: needed for async tool tests
6. FLUX.1-schnell Workflow JSON
The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image. This is the "API format" (node-graph JSON), not the UI export format.
File: src/workflows/flux_schnell.json
{
"6": {
"class_type": "CLIPTextEncode",
"inputs": {
"clip": ["30", 1],
"text": "PROMPT_PLACEHOLDER"
}
},
"8": {
"class_type": "VAEDecode",
"inputs": {
"samples": ["13", 0],
"vae": ["30", 2]
}
},
"9": {
"class_type": "SaveImage",
"inputs": {
"filename_prefix": "mcp-image-gen",
"images": ["8", 0]
}
},
"13": {
"class_type": "KSampler",
"inputs": {
"cfg": 1.0,
"denoise": 1.0,
"latent_image": ["27", 0],
"model": ["30", 0],
"negative": ["33", 0],
"positive": ["6", 0],
"sampler_name": "euler",
"scheduler": "simple",
"seed": 42,
"steps": 4
}
},
"27": {
"class_type": "EmptySD3LatentImage",
"inputs": {
"batch_size": 1,
"height": 1024,
"width": 1024
}
},
"30": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "flux1-schnell.safetensors"
}
},
"33": {
"class_type": "CLIPTextEncode",
"inputs": {
"clip": ["30", 1],
"text": "NEGATIVE_PLACEHOLDER"
}
}
}
Parameterization at runtime (in server.py):
import json
import random
from pathlib import Path

WORKFLOW_PATH = Path(__file__).parent / "workflows" / "flux_schnell.json"

def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
    # json.load returns a fresh dict on every call, so no deepcopy is needed
    # and the template on disk is never mutated.
    with open(WORKFLOW_PATH) as f:
        wf = json.load(f)
    wf["6"]["inputs"]["text"] = prompt
    wf["33"]["inputs"]["text"] = negative_prompt
    wf["27"]["inputs"]["width"] = width
    wf["27"]["inputs"]["height"] = height
    wf["13"]["inputs"]["steps"] = steps
    # seed == -1 means "pick a random seed"
    wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
    wf["30"]["inputs"]["ckpt_name"] = model
    return wf
7. Testing Strategy
7.1 Test Structure (tests/test_server.py)
All tests mock httpx.AsyncClient — no real ComfyUI needed.
| Test | Description |
|---|---|
| test_generate_image_happy_path | Mock submit → poll done → history → view → returns TextContent + ImageContent |
| test_generate_image_comfyui_offline | httpx.ConnectError → returns clear error string |
| test_generate_image_timeout | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
| test_generate_image_saves_file | Verify PNG written to output_dir with correct filename pattern |
| test_generate_image_random_seed | seed=-1 → seed in output filename is a valid integer |
| test_generate_image_custom_params | Non-default width/height/steps/model passed through to workflow |
| test_generate_image_returns_image_content | Second item in result list is mcp.types.ImageContent with valid base64 |
| test_list_available_models_happy_path | Mock object_info response → returns model name list |
| test_list_available_models_offline | ConnectError → returns error string |
| test_get_generation_status_pending | prompt_id found in queue_pending → "pending" |
| test_get_generation_status_running | prompt_id found in queue_running → "running" |
| test_get_generation_status_not_found | prompt_id not in queue, not in history → "not_found" |
| test_get_output_directory_default | No env var → returns expanded ~/Pictures/mcp-generated |
| test_get_output_directory_custom | IMAGE_OUTPUT_DIR set → returns that path |
| test_build_workflow_parameterization | _build_workflow() injects all params correctly into JSON |
7.2 conftest.py fixtures
import base64
import sys
from pathlib import Path

import pytest

# Make src/ importable so tests can "import server" directly.
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

@pytest.fixture
def mock_comfyui_submit_response():
    return {"prompt_id": "test-uuid-1234"}

@pytest.fixture
def mock_comfyui_queue_empty():
    return {"queue_running": [], "queue_pending": []}

@pytest.fixture
def mock_comfyui_history():
    return {
        "test-uuid-1234": {
            "outputs": {
                "9": {
                    "images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]
                }
            }
        }
    }

@pytest.fixture
def sample_png_bytes():
    """Minimal valid 1x1 PNG in bytes."""
    # 1x1 red pixel PNG
    data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
    return base64.b64decode(data)
7.3 Run command
cd mcp/mcp-image-gen && uv run pytest tests/ -v --cov=src --cov-report=term-missing
8. run.sh
#!/usr/bin/env bash
BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$HOME/.local/bin:$PATH"
# Create output dir if it doesn't exist
OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
mkdir -p "$OUTPUT_DIR"
cd "$BASEDIR"
exec uv run src/server.py
9. Future: Ollama Migration Path
When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026):
Adapter pattern (no breaking changes to MCP tool signatures)
import os

BACKEND = os.getenv("IMAGE_BACKEND", "comfyui")  # or "ollama"

async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # current ComfyUI implementation
    ...

async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # POST http://localhost:11434/api/generate
    # with model=Z-Image-Turbo or FLUX.2-Klein
    # width, height, steps in request body
    # save returned image path
    ...

@mcp.tool()
async def generate_image(prompt, width=1024, height=1024, steps=4,
                         model="flux1-schnell.safetensors", seed=-1,
                         negative_prompt="", output_dir=""):
    # Same signature as section 2.1; only the backend dispatch is new.
    backend = _generate_ollama if BACKEND == "ollama" else _generate_comfyui
    return await backend(prompt, width, height, steps, model, seed, negative_prompt, output_dir)
No changes to tool signatures, return types, or test structure. The only new env var is IMAGE_BACKEND.
10. Implementation Order (for Code mode)
1. src/workflows/flux_schnell.json: write and validate JSON structure
2. pyproject.toml: set up project + deps
3. src/__init__.py: empty
4. src/server.py: implement all 4 tools + _build_workflow + polling helpers
5. tests/conftest.py: fixtures + sys.path
6. tests/test_server.py: all 15 tests
7. run.sh: launch script
8. README.md: usage docs
9. .roo/mcp.json: wire server in (requires switching to Code or Homelab mode for that file)
10. uv sync --extra test && uv run pytest tests/ -v: confirm all tests pass (the test extra is needed because pytest is declared under optional dependencies)
11. ComfyUI Setup Notes (for README)
These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:
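# Install ComfyUI from source (ROCm/AMD); install a ROCm build of PyTorch first
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt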
# Download FLUX.1-schnell model (~8GB)
# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors
# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell
# Start ComfyUI with AMD ROCm
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen
# Verify API is running
curl http://localhost:8188/system_stats
The HSA_OVERRIDE_GFX_VERSION=11.0.0 env var may be needed for the RX 7900 XTX (gfx1100) to be identified correctly by the ROCm libraries.