# mcp-image-gen — Implementation Plan

**Date:** 2026-04-04
**Author:** Lumen (for Patrick / pplate)
**Status:** Ready for implementation
**Assessment:** [ASSESSMENT.md](./ASSESSMENT.md)
**Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7`

---

## 1. Directory Structure

```
mcp/mcp-image-gen/
├── ASSESSMENT.md              ← Architecture assessment (this session)
├── PLAN.md                    ← This file
├── README.md                  ← Usage docs, tool table, env vars
├── pyproject.toml             ← uv project + deps
├── run.sh                     ← Launch script (used by .roo/mcp.json)
├── src/
│   ├── __init__.py
│   ├── server.py              ← FastMCP server + all tools
│   └── workflows/
│       └── flux_schnell.json  ← Minimal ComfyUI API-format workflow
└── tests/
    ├── __init__.py
    ├── conftest.py            ← sys.path + shared fixtures
    └── test_server.py         ← All tool tests (mocked ComfyUI)
```

---

## 2. Tool Definitions

### 2.1 `generate_image`

```python
@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    """
    Generate an image from a text prompt using ComfyUI.

    Returns both a file path (for persistence) and an inline base64 image
    (for display in Claude / Roo Code chat).

    Args:
        prompt: Text description of the image to generate.
        width: Image width in pixels (default: 1024).
        height: Image height in pixels (default: 1024).
        steps: Number of inference steps. FLUX.1-schnell works well at 4.
        model: ComfyUI model filename (default: flux1-schnell.safetensors).
        seed: Random seed for reproducibility. -1 = random.
        negative_prompt: Things to exclude from the image (optional). Has no
            effect at cfg=1.0, which the FLUX.1-schnell workflow uses.
        output_dir: Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
            or ~/Pictures/mcp-generated.

    Returns:
        [TextContent(path + metadata), ImageContent(base64 PNG)]
    """
```

**Return type:** `list` containing:

1. `mcp.types.TextContent`: human-readable summary with file path, seed, and elapsed time
2. `mcp.types.ImageContent`: `type="image"`, `data=base64_encoded_png`, `mimeType="image/png"`

> ⚠️ **FastMCP 3.x rule:** NEVER annotate the return as `-> Image` (fastmcp utility type). It triggers
> `output_schema` generation, which breaks the early-return path. Return `mcp.types.ImageContent`
> directly as part of a `list`; it is a `ContentBlock` and passes through cleanly.

---

### 2.2 `list_available_models`

```python
@mcp.tool()
async def list_available_models() -> str:
    """
    List all checkpoint models available in ComfyUI.

    Returns a newline-separated list of model filenames.
    Requires ComfyUI to be running at COMFYUI_URL.
    """
```

**Implementation:** `GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple` → the response is keyed by the node class name, so parse
`CheckpointLoaderSimple.input.required.ckpt_name[0]` (the list of filenames) → join with newlines.

---

### 2.3 `get_generation_status`

```python
@mcp.tool()
async def get_generation_status(prompt_id: str) -> str:
    """
    Check the status of a queued or running generation job.

    Args:
        prompt_id: The prompt ID returned by a previous generate_image call.

    Returns:
        Status string: "pending", "running", "completed", or "not_found".
    """
```

**Implementation:** `GET {COMFYUI_URL}/api/queue` → check the `queue_running` and `queue_pending` lists for a matching `prompt_id`. Each entry is an array, and its element at index 1 is the prompt ID. If the ID is in neither list, check `GET {COMFYUI_URL}/api/history/{prompt_id}`. An entry for the ID means `"completed"`, and an empty `{}` means `"not_found"`.

---

### 2.4 `get_output_directory`

```python
@mcp.tool()
def get_output_directory() -> str:
    """
    Return the directory where generated images are saved.

    Returns:
        Absolute path to the output directory.
    """
```

**Implementation:** Resolve the `IMAGE_OUTPUT_DIR` env var or the default `~/Pictures/mcp-generated`,
expand `~`, and return it as a string.

---

## 3. ComfyUI Integration

### 3.1 Workflow: Submit → Poll → Retrieve

```
generate_image()
  │
  ├── 1. Load flux_schnell.json workflow template
  ├── 2. Parameterize: inject prompt, width, height, steps, seed, model
  ├── 3. POST {COMFYUI_URL}/api/prompt  body: {"prompt": <workflow>}  → {"prompt_id": "uuid", ...}
  │
  ├── 4. POLL loop (max COMFYUI_TIMEOUT, default 120s; sleep 2s between)
  │       GET {COMFYUI_URL}/api/queue
  │       → check queue_running[] for our id (each entry is a list; index 1 = prompt_id)
  │       → check queue_pending[] for our id
  │       → if neither: job is done
  │
  ├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
  │       → find output image filename + subfolder
  │
  ├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
  │       → raw PNG bytes
  │
  ├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
  └── 8. Return [TextContent(path + meta), ImageContent(base64)]
```

### 3.2 API Endpoints Used

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/prompt` | POST | Submit workflow for generation |
| `/api/queue` | GET | Poll queue status (pending + running) |
| `/api/history/{prompt_id}` | GET | Get completed job output filenames |
| `/api/view` | GET | Download image bytes by filename |
| `/object_info/CheckpointLoaderSimple` | GET | List available checkpoint models |

### 3.3 Error Handling

| Condition | Response |
|-----------|----------|
| ComfyUI unreachable | `"ComfyUI not reachable at {url}. Start it with: python main.py --listen"` |
| Timeout (> `COMFYUI_TIMEOUT`, default 120s) | `"Generation timed out after {timeout}s. prompt_id={id} — use get_generation_status to check"` |
| ComfyUI returns error in history | Extract and return the error message from the history response |
| Invalid model name | ComfyUI rejects the workflow at submission (`/api/prompt` returns HTTP 400 with `node_errors`); surface that message clearly |
| Output dir not writable | `"Cannot write to output directory: {path}"` |

---

## 4. Configuration

All configuration is via environment variables. No hardcoded paths.

| Variable | Default | Description |
|----------|---------|-------------|
| `COMFYUI_URL` | `http://localhost:8188` | Base URL of running ComfyUI instance |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Where to save generated PNG files |
| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation (int) |
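
A minimal loader for the two ComfyUI variables might look like the sketch below. It reads from an injectable mapping so tests can avoid touching `os.environ`. `Settings` and `load_settings` are hypothetical names. Stripping the trailing slash is a design choice so that `f"{url}/api/prompt"` never produces `//`. `IMAGE_OUTPUT_DIR` is left to the output-directory resolution described in section 2.4.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    timeout: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read COMFYUI_URL and COMFYUI_TIMEOUT, falling back to the documented defaults."""
    url = env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/")
    raw_timeout = env.get("COMFYUI_TIMEOUT", "120")
    try:
        timeout = int(raw_timeout)
    except ValueError:
        raise ValueError(
            f"COMFYUI_TIMEOUT must be an integer number of seconds, got {raw_timeout!r}"
        ) from None
    if timeout <= 0:
        raise ValueError("COMFYUI_TIMEOUT must be positive")
    return Settings(comfyui_url=url, timeout=timeout)
```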

### `.roo/mcp.json` entry (to be added during implementation)

The entry goes inside the file's top-level `mcpServers` object:

```json
"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run", "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}
```

---

## 5. `pyproject.toml`

```toml
[project]
name = "mcp-image-gen"
version = "0.1.0"
requires-python = ">=3.11"
description = "MCP server for local AI image generation via ComfyUI"
dependencies = [
    "fastmcp>=0.1.0",
    "httpx>=0.27.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
test = [
    "pytest>=7.0",
    "pytest-mock>=3.0",
    "pytest-cov>=4.0",
    "pytest-asyncio>=0.23",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
asyncio_mode = "auto"
```

**Dependency rationale:**
- `fastmcp`: MCP framework
- `httpx`: async HTTP client for the ComfyUI REST API
- `pillow`: validate PNG output; potential future thumbnail generation
- `pytest-asyncio`: needed for async tool tests

---

## 6. FLUX.1-schnell Workflow JSON

The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image.
This is the "API format" (node-graph JSON), not the UI export format.

File: `src/workflows/flux_schnell.json`

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "PROMPT_PLACEHOLDER"
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {
      "samples": ["13", 0],
      "vae": ["30", 2]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "mcp-image-gen",
      "images": ["8", 0]
    }
  },
  "13": {
    "class_type": "KSampler",
    "inputs": {
      "cfg": 1.0,
      "denoise": 1.0,
      "latent_image": ["27", 0],
      "model": ["30", 0],
      "negative": ["33", 0],
      "positive": ["6", 0],
      "sampler_name": "euler",
      "scheduler": "simple",
      "seed": 42,
      "steps": 4
    }
  },
  "27": {
    "class_type": "EmptySD3LatentImage",
    "inputs": {
      "batch_size": 1,
      "height": 1024,
      "width": 1024
    }
  },
  "30": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "flux1-schnell.safetensors"
    }
  },
  "33": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "NEGATIVE_PLACEHOLDER"
    }
  }
}
```
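
**Parameterization at runtime** (in `server.py`):

```python
import copy
import json
import random
from pathlib import Path

# Load the template once; deep-copy it per request so callers never mutate the cached dict
_TEMPLATE_PATH = Path(__file__).parent / "workflows" / "flux_schnell.json"
_TEMPLATE = json.loads(_TEMPLATE_PATH.read_text())


def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
    wf = copy.deepcopy(_TEMPLATE)
    wf["6"]["inputs"]["text"] = prompt             # positive CLIPTextEncode
    wf["33"]["inputs"]["text"] = negative_prompt   # negative CLIPTextEncode
    wf["27"]["inputs"]["width"] = width            # EmptySD3LatentImage
    wf["27"]["inputs"]["height"] = height
    wf["13"]["inputs"]["steps"] = steps            # KSampler
    # seed=-1 means "pick one"; the chosen value stays in the workflow so the caller
    # can read it back for the output filename and the TextContent summary
    wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
    wf["30"]["inputs"]["ckpt_name"] = model        # CheckpointLoaderSimple
    return wf
```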

---

## 7. Testing Strategy

### 7.1 Test Structure (`tests/test_server.py`)

All tests mock `httpx.AsyncClient` — no real ComfyUI needed.

| Test | Description |
|------|-------------|
| `test_generate_image_happy_path` | Mock submit → poll done → history → view → returns TextContent + ImageContent |
| `test_generate_image_comfyui_offline` | httpx.ConnectError → returns clear error string |
| `test_generate_image_timeout` | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
| `test_generate_image_saves_file` | Verify PNG written to output_dir with correct filename pattern |
| `test_generate_image_random_seed` | seed=-1 → seed in output filename is a valid integer |
| `test_generate_image_custom_params` | Non-default width/height/steps/model passed through to workflow |
| `test_generate_image_returns_image_content` | Second item in result list is `mcp.types.ImageContent` with valid base64 |
| `test_list_available_models_happy_path` | Mock object_info response → returns model name list |
| `test_list_available_models_offline` | ConnectError → returns error string |
| `test_get_generation_status_pending` | prompt_id found in queue_pending → "pending" |
| `test_get_generation_status_running` | prompt_id found in queue_running → "running" |
| `test_get_generation_status_not_found` | prompt_id not in queue, not in history → "not_found" |
| `test_get_output_directory_default` | No env var → returns expanded ~/Pictures/mcp-generated |
| `test_get_output_directory_custom` | IMAGE_OUTPUT_DIR set → returns that path |
| `test_build_workflow_parameterization` | `_build_workflow()` injects all params correctly into JSON |

### 7.2 conftest.py fixtures

```python
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))


@pytest.fixture
def mock_comfyui_submit_response():
    return {"prompt_id": "test-uuid-1234"}


@pytest.fixture
def mock_comfyui_queue_empty():
    return {"queue_running": [], "queue_pending": []}


@pytest.fixture
def mock_comfyui_history():
    return {
        "test-uuid-1234": {
            "outputs": {
                "9": {
                    "images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]
                }
            }
        }
    }


@pytest.fixture
def sample_png_bytes():
    """Minimal valid 1x1 PNG in bytes."""
    import base64
    # 1x1 red pixel PNG
    data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
    return base64.b64decode(data)
```

### 7.3 Run command

The test tools live in the `test` optional-dependency group, so request that extra explicitly:

```bash
cd mcp/mcp-image-gen && uv run --extra test pytest tests/ -v --cov=src --cov-report=term-missing
```

---

## 8. `run.sh`

```bash
#!/usr/bin/env bash
BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$HOME/.local/bin:$PATH"

# Create output dir if it doesn't exist
OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
mkdir -p "$OUTPUT_DIR"

cd "$BASEDIR"
exec uv run src/server.py
```

---

## 9. Future: Ollama Migration Path

When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026):

### Adapter pattern (no breaking changes to MCP tool signatures)

```python
import os

BACKEND = os.getenv("IMAGE_BACKEND", "comfyui")  # or "ollama"


async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # current ComfyUI implementation
    ...


async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # POST http://localhost:11434/api/generate
    # with model=Z-Image-Turbo or FLUX.2-Klein
    # width, height, steps in request body
    # save returned image path
    ...


# `mcp` is the FastMCP instance defined in server.py; the signature matches section 2.1
@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    backend = _generate_ollama if BACKEND == "ollama" else _generate_comfyui
    return await backend(prompt, width, height, steps, model, seed, negative_prompt, output_dir)
```

**Unchanged:** tool signatures, return types, existing env vars, and test structure. The only addition is the `IMAGE_BACKEND` env var.

---

## 10. Implementation Order (for Code mode)

1. `src/workflows/flux_schnell.json` — write and validate JSON structure
2. `pyproject.toml` — set up project + deps
3. `src/__init__.py` — empty
4. `src/server.py` — implement all 4 tools + `_build_workflow` + polling helpers
5. `tests/conftest.py` — fixtures + sys.path
6. `tests/test_server.py` — all 15 tests
7. `run.sh` — launch script
8. `README.md` — usage docs
9. `.roo/mcp.json` — wire server in (requires switching to Code or Homelab mode for that file)
10. `uv sync --extra test && uv run --extra test pytest tests/ -v` — confirm all tests pass

---

## 11. ComfyUI Setup Notes (for README)

These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:

```bash
# Install ComfyUI (ROCm/AMD): install a ROCm build of PyTorch first
# (see pytorch.org for the current ROCm wheel index), then ComfyUI's requirements
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# Download FLUX.1-schnell model
# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors
# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell

# Start ComfyUI with AMD ROCm
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen

# Verify API is running
curl http://localhost:8188/system_stats
```
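
> `HSA_OVERRIDE_GFX_VERSION=11.0.0` tells ROCm to treat the GPU as gfx1100. The RX 7900 XTX already
> reports itself as gfx1100, so the override is usually unnecessary on that card. Keep it only if ROCm
> fails to detect the GPU.
>
> `CheckpointLoaderSimple` (node 30 in the workflow) expects an all-in-one checkpoint that bundles the
> FLUX model, text encoders, and VAE. The `flux1-schnell.safetensors` file in the Black Forest Labs
> repository contains only the diffusion model. Used as-is, it belongs in `models/unet` and needs
> separate text-encoder and VAE loader nodes. ComfyUI's Flux examples describe an fp8 all-in-one
> schnell checkpoint that works with the plain checkpoint loader.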