feat(mcp-image-gen): scaffold ComfyUI-backed image generation MCP server

- FastMCP server with 4 tools: generate_image, list_available_models,
  get_generation_status, get_output_directory
- ComfyUI REST API client (httpx) polling lifecycle
- FLUX.1-schnell workflow JSON template
- Dual output: TextContent (path + seed) + ImageContent (base64 PNG)
- 14 passing pytest tests with respx HTTP mocking
- ROCm/AMD RX 7900 XTX optimized setup in README
- Ollama Linux migration path documented (future)
This commit is contained in:
Patrick Plate
2026-04-04 11:49:31 +02:00
parent ba7d4bc248
commit 8112ff2f12
11 changed files with 1748 additions and 0 deletions
+496
View File
@@ -0,0 +1,496 @@
# mcp-image-gen — Implementation Plan
**Date:** 2026-04-04
**Author:** Lumen (for Patrick / pplate)
**Status:** Ready for implementation
**Assessment:** [ASSESSMENT.md](./ASSESSMENT.md)
**Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7`
---
## 1. Directory Structure
```
mcp/mcp-image-gen/
├── ASSESSMENT.md ← Architecture assessment (this session)
├── PLAN.md ← This file
├── README.md ← Usage docs, tool table, env vars
├── pyproject.toml ← uv project + deps
├── run.sh ← Launch script (used by .roo/mcp.json)
├── src/
│ ├── __init__.py
│ ├── server.py ← FastMCP server + all tools
│ └── workflows/
│ └── flux_schnell.json ← Minimal ComfyUI API-format workflow
└── tests/
├── __init__.py
├── conftest.py ← sys.path + shared fixtures
└── test_server.py ← All tool tests (mocked ComfyUI)
```
---
## 2. Tool Definitions
### 2.1 `generate_image`
```python
@mcp.tool()
async def generate_image(
prompt: str,
width: int = 1024,
height: int = 1024,
steps: int = 4,
model: str = "flux1-schnell.safetensors",
seed: int = -1,
negative_prompt: str = "",
output_dir: str = "",
) -> list:
"""
Generate an image from a text prompt using ComfyUI.
Returns both a file path (for persistence) and an inline base64 image
(for display in Claude / Roo Code chat).
Args:
prompt: Text description of the image to generate.
width: Image width in pixels (default: 1024).
height: Image height in pixels (default: 1024).
steps: Number of inference steps. FLUX.1-schnell works well at 4.
model: ComfyUI model filename (default: flux1-schnell.safetensors).
seed: Random seed for reproducibility. -1 = random.
negative_prompt: Things to exclude from the image (optional).
output_dir: Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
or ~/Pictures/mcp-generated.
Returns:
[TextContent(path + metadata), ImageContent(base64 PNG)]
"""
```
**Return type:** `list` containing:
1. `mcp.types.TextContent` — human-readable summary with file path, seed, elapsed time
2. `mcp.types.ImageContent``type="image"`, `data=base64_encoded_png`, `mimeType="image/png"`
> ⚠️ **FastMCP 3.x rule:** NEVER annotate return as `-> Image` (fastmcp utility type). It triggers
> `output_schema` generation which breaks the early-return path. Return `mcp.types.ImageContent`
> directly as part of a `list` — it is a `ContentBlock` and passes through cleanly.
---
### 2.2 `list_available_models`
```python
@mcp.tool()
async def list_available_models() -> str:
"""
List all checkpoint models available in ComfyUI.
Returns a newline-separated list of model filenames.
Requires ComfyUI to be running at COMFYUI_URL.
"""
```
**Implementation:** `GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple` → parse
`input.required.ckpt_name[0]` list → join with newlines.
---
### 2.3 `get_generation_status`
```python
@mcp.tool()
async def get_generation_status(prompt_id: str) -> str:
"""
Check the status of a queued or running generation job.
Args:
prompt_id: The prompt ID returned by a previous generate_image call.
Returns:
Status string: "pending", "running", "completed", or "not_found".
"""
```
**Implementation:** `GET {COMFYUI_URL}/api/queue` → check `queue_running` and `queue_pending`
lists for matching `prompt_id`. If not found in either, check history endpoint.
---
### 2.4 `get_output_directory`
```python
@mcp.tool()
def get_output_directory() -> str:
"""
Return the directory where generated images are saved.
Returns:
Absolute path to the output directory.
"""
```
**Implementation:** Resolve `IMAGE_OUTPUT_DIR` env var or default `~/Pictures/mcp-generated`,
expand `~`, return as string.
---
## 3. ComfyUI Integration
### 3.1 Workflow: Submit → Poll → Retrieve
```
generate_image()
├── 1. Load flux_schnell.json workflow template
├── 2. Parameterize: inject prompt, width, height, steps, seed, model
├── 3. POST {COMFYUI_URL}/api/prompt → {"prompt_id": "uuid"}
├── 4. POLL loop (max 120s, sleep 2s between)
│ GET {COMFYUI_URL}/api/queue
│ → check queue_running[].prompt_id == our id
│ → check queue_pending[].prompt_id == our id
│ → if neither: job is done
├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
│ → find output image filename + subfolder
├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
│ → raw PNG bytes
├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
└── 8. Return [TextContent(path + meta), ImageContent(base64)]
```
### 3.2 API Endpoints Used
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/prompt` | POST | Submit workflow for generation |
| `/api/queue` | GET | Poll queue status (pending + running) |
| `/api/history/{prompt_id}` | GET | Get completed job output filenames |
| `/api/view` | GET | Download image bytes by filename |
| `/object_info/CheckpointLoaderSimple` | GET | List available checkpoint models |
### 3.3 Error Handling
| Condition | Response |
|-----------|----------|
| ComfyUI unreachable | `"ComfyUI not reachable at {url}. Start it with: python main.py --listen"` |
| Timeout (>120s) | `"Generation timed out after 120s. prompt_id={id} — use get_generation_status to check"` |
| ComfyUI returns error in history | Extract and return the error message from history response |
| Invalid model name | ComfyUI returns error in history; surface it clearly |
| Output dir not writable | `"Cannot write to output directory: {path}"` |
---
## 4. Configuration
All configuration via environment variables. No hardcoded paths.
| Variable | Default | Description |
|----------|---------|-------------|
| `COMFYUI_URL` | `http://localhost:8188` | Base URL of running ComfyUI instance |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Where to save generated PNG files |
| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation (int) |
### `.roo/mcp.json` entry (to be added during implementation):
```json
"mcp-image-gen": {
"command": "uv",
"args": [
"--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
"run", "src/server.py"
],
"env": {
"COMFYUI_URL": "http://localhost:8188",
"IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
}
}
```
---
## 5. `pyproject.toml`
```toml
[project]
name = "mcp-image-gen"
version = "0.1.0"
requires-python = ">=3.11"
description = "MCP server for local AI image generation via ComfyUI"
dependencies = [
"fastmcp>=0.1.0",
"httpx>=0.27.0",
"pillow>=10.0.0",
]
[project.optional-dependencies]
test = [
"pytest>=7.0",
"pytest-mock>=3.0",
"pytest-cov>=4.0",
"pytest-asyncio>=0.23",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.pytest.ini_options]
asyncio_mode = "auto"
```
**Dependency rationale:**
- `fastmcp` — MCP framework
- `httpx` — async HTTP client for ComfyUI REST API
- `pillow` — validate PNG output, potential future thumbnail generation
- `pytest-asyncio` — needed for async tool tests
---
## 6. FLUX.1-schnell Workflow JSON
The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image.
This is the "API format" (node-graph JSON), not the UI export format.
File: `src/workflows/flux_schnell.json`
```json
{
"6": {
"class_type": "CLIPTextEncode",
"inputs": {
"clip": ["30", 1],
"text": "PROMPT_PLACEHOLDER"
}
},
"8": {
"class_type": "VAEDecode",
"inputs": {
"samples": ["13", 0],
"vae": ["30", 2]
}
},
"9": {
"class_type": "SaveImage",
"inputs": {
"filename_prefix": "mcp-image-gen",
"images": ["8", 0]
}
},
"13": {
"class_type": "KSampler",
"inputs": {
"cfg": 1.0,
"denoise": 1.0,
"latent_image": ["27", 0],
"model": ["30", 0],
"negative": ["33", 0],
"positive": ["6", 0],
"sampler_name": "euler",
"scheduler": "simple",
"seed": 42,
"steps": 4
}
},
"27": {
"class_type": "EmptySD3LatentImage",
"inputs": {
"batch_size": 1,
"height": 1024,
"width": 1024
}
},
"30": {
"class_type": "CheckpointLoaderSimple",
"inputs": {
"ckpt_name": "flux1-schnell.safetensors"
}
},
"33": {
"class_type": "CLIPTextEncode",
"inputs": {
"clip": ["30", 1],
"text": "NEGATIVE_PLACEHOLDER"
}
}
}
```
**Parameterization at runtime** (in `server.py`):
```python
import json, copy
def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
with open(Path(__file__).parent / "workflows/flux_schnell.json") as f:
wf = json.load(f)
wf = copy.deepcopy(wf)
wf["6"]["inputs"]["text"] = prompt
wf["33"]["inputs"]["text"] = negative_prompt
wf["27"]["inputs"]["width"] = width
wf["27"]["inputs"]["height"] = height
wf["13"]["inputs"]["steps"] = steps
wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
wf["30"]["inputs"]["ckpt_name"] = model
return wf
```
---
## 7. Testing Strategy
### 7.1 Test Structure (`tests/test_server.py`)
All tests mock `httpx.AsyncClient` — no real ComfyUI needed.
| Test | Description |
|------|-------------|
| `test_generate_image_happy_path` | Mock submit → poll done → history → view → returns TextContent + ImageContent |
| `test_generate_image_comfyui_offline` | httpx.ConnectError → returns clear error string |
| `test_generate_image_timeout` | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
| `test_generate_image_saves_file` | Verify PNG written to output_dir with correct filename pattern |
| `test_generate_image_random_seed` | seed=-1 → seed in output filename is a valid integer |
| `test_generate_image_custom_params` | Non-default width/height/steps/model passed through to workflow |
| `test_generate_image_returns_image_content` | Second item in result list is `mcp.types.ImageContent` with valid base64 |
| `test_list_available_models_happy_path` | Mock object_info response → returns model name list |
| `test_list_available_models_offline` | ConnectError → returns error string |
| `test_get_generation_status_pending` | prompt_id found in queue_pending → "pending" |
| `test_get_generation_status_running` | prompt_id found in queue_running → "running" |
| `test_get_generation_status_not_found` | prompt_id not in queue, not in history → "not_found" |
| `test_get_output_directory_default` | No env var → returns expanded ~/Pictures/mcp-generated |
| `test_get_output_directory_custom` | IMAGE_OUTPUT_DIR set → returns that path |
| `test_build_workflow_parameterization` | _build_workflow() injects all params correctly into JSON |
### 7.2 conftest.py fixtures
```python
import sys
from pathlib import Path
import pytest
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
@pytest.fixture
def mock_comfyui_submit_response():
return {"prompt_id": "test-uuid-1234"}
@pytest.fixture
def mock_comfyui_queue_empty():
return {"queue_running": [], "queue_pending": []}
@pytest.fixture
def mock_comfyui_history():
return {
"test-uuid-1234": {
"outputs": {
"9": {
"images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]
}
}
}
}
@pytest.fixture
def sample_png_bytes():
"""Minimal valid 1x1 PNG in bytes."""
import base64
# 1x1 red pixel PNG
data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
return base64.b64decode(data)
```
### 7.3 Run command
```bash
cd mcp/mcp-image-gen && uv run pytest tests/ -v --cov=src --cov-report=term-missing
```
---
## 8. `run.sh`
```bash
#!/usr/bin/env bash
BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$HOME/.local/bin:$PATH"
# Create output dir if it doesn't exist
OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
mkdir -p "$OUTPUT_DIR"
cd "$BASEDIR"
exec uv run src/server.py
```
---
## 9. Future: Ollama Migration Path
When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026):
### Adapter pattern (no breaking changes to MCP tool signatures)
```python
BACKEND = os.getenv("IMAGE_BACKEND", "comfyui") # or "ollama"
async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
# current ComfyUI implementation
...
async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
# POST http://localhost:11434/api/generate
# with model=Z-Image-Turbo or FLUX.2-Klein
# width, height, steps in request body
# save returned image path
...
@mcp.tool()
async def generate_image(prompt, width=1024, height=1024, steps=4, ...):
if BACKEND == "ollama":
return await _generate_ollama(...)
return await _generate_comfyui(...)
```
**No changes to:** tool signatures, return types, env vars (add `IMAGE_BACKEND`), tests structure.
---
## 10. Implementation Order (for Code mode)
1. `src/workflows/flux_schnell.json` — write and validate JSON structure
2. `pyproject.toml` — set up project + deps
3. `src/__init__.py` — empty
4. `src/server.py` — implement all 4 tools + `_build_workflow` + polling helpers
5. `tests/conftest.py` — fixtures + sys.path
6. `tests/test_server.py` — all 15 tests
7. `run.sh` — launch script
8. `README.md` — usage docs
9. `.roo/mcp.json` — wire server in (requires switching to Code or Homelab mode for that file)
10. `uv sync && uv run pytest tests/ -v` — confirm all tests pass
---
## 11. ComfyUI Setup Notes (for README)
These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:
```bash
# Install ComfyUI (ROCm/AMD)
pip install comfyui
# Download FLUX.1-schnell model (~8GB)
# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors
# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell
# Start ComfyUI with AMD ROCm
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen
# Verify API is running
curl http://localhost:8188/system_stats
```
> The `HSA_OVERRIDE_GFX_VERSION=11.0.0` env var may be needed for RX 7900 XTX (gfx1100)
> to identify correctly to ROCm libraries.