# mcp-image-gen: Implementation Plan

**Date:** 2026-04-04
**Author:** Lumen (for Patrick / pplate)
**Status:** Ready for implementation
**Assessment:** [ASSESSMENT.md](./ASSESSMENT.md)
**Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7`

---

## 1. Directory Structure

```
mcp/mcp-image-gen/
├── ASSESSMENT.md              ← Architecture assessment (this session)
├── PLAN.md                    ← This file
├── README.md                  ← Usage docs, tool table, env vars
├── pyproject.toml             ← uv project + deps
├── run.sh                     ← Launch script (used by .roo/mcp.json)
├── src/
│   ├── __init__.py
│   ├── server.py              ← FastMCP server + all tools
│   └── workflows/
│       └── flux_schnell.json  ← Minimal ComfyUI API-format workflow
└── tests/
    ├── __init__.py
    ├── conftest.py            ← sys.path + shared fixtures
    └── test_server.py         ← All tool tests (mocked ComfyUI)
```

---

## 2. Tool Definitions

### 2.1 `generate_image`

```python
@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    """
    Generate an image from a text prompt using ComfyUI.

    Returns both a file path (for persistence) and an inline base64 image
    (for display in Claude / Roo Code chat).

    Args:
        prompt: Text description of the image to generate.
        width: Image width in pixels (default: 1024).
        height: Image height in pixels (default: 1024).
        steps: Number of inference steps. FLUX.1-schnell works well at 4.
        model: ComfyUI model filename (default: flux1-schnell.safetensors).
        seed: Random seed for reproducibility. -1 = random.
        negative_prompt: Things to exclude from the image (optional).
        output_dir: Override output directory. Defaults to IMAGE_OUTPUT_DIR
            env var or ~/Pictures/mcp-generated.

    Returns:
        [TextContent(path + metadata), ImageContent(base64 PNG)]
    """
```

**Return type:** `list` containing:

1. `mcp.types.TextContent`: human-readable summary with file path, seed, elapsed time
2. `mcp.types.ImageContent`: `type="image"`, `data=base64_encoded_png`, `mimeType="image/png"`

> ⚠️ **FastMCP 3.x rule:** NEVER annotate return as `-> Image` (fastmcp utility type). It triggers
> `output_schema` generation which breaks the early-return path. Return `mcp.types.ImageContent`
> directly as part of a `list`; it is a `ContentBlock` and passes through cleanly.

---

### 2.2 `list_available_models`

```python
@mcp.tool()
async def list_available_models() -> str:
    """
    List all checkpoint models available in ComfyUI.

    Returns a newline-separated list of model filenames.
    Requires ComfyUI to be running at COMFYUI_URL.
    """
```

**Implementation:** `GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple` → parse `input.required.ckpt_name[0]` list → join with newlines.

---

### 2.3 `get_generation_status`

```python
@mcp.tool()
async def get_generation_status(prompt_id: str) -> str:
    """
    Check the status of a queued or running generation job.

    Args:
        prompt_id: The prompt ID returned by a previous generate_image call.

    Returns:
        Status string: "pending", "running", "completed", or "not_found".
    """
```

**Implementation:** `GET {COMFYUI_URL}/api/queue` → check `queue_running` and `queue_pending` lists for matching `prompt_id`. If not found in either, check history endpoint.

---

### 2.4 `get_output_directory`

```python
@mcp.tool()
def get_output_directory() -> str:
    """
    Return the directory where generated images are saved.

    Returns:
        Absolute path to the output directory.
    """
```

**Implementation:** Resolve `IMAGE_OUTPUT_DIR` env var or default `~/Pictures/mcp-generated`, expand `~`, return as string.

---

## 3. ComfyUI Integration

### 3.1 Workflow: Submit → Poll → Retrieve

```
generate_image()
    │
    ├── 1. Load flux_schnell.json workflow template
    ├── 2. Parameterize: inject prompt, width, height, steps, seed, model
    ├── 3. POST {COMFYUI_URL}/api/prompt → {"prompt_id": "uuid"}
    │
    ├── 4. POLL loop (max 120s, sleep 2s between)
    │       GET {COMFYUI_URL}/api/queue
    │       → check queue_running[].prompt_id == our id
    │       → check queue_pending[].prompt_id == our id
    │       → if neither: job is done
    │
    ├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
    │       → find output image filename + subfolder
    │
    ├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
    │       → raw PNG bytes
    │
    ├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
    └── 8. Return [TextContent(path + meta), ImageContent(base64)]
```

### 3.2 API Endpoints Used

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/prompt` | POST | Submit workflow for generation |
| `/api/queue` | GET | Poll queue status (pending + running) |
| `/api/history/{prompt_id}` | GET | Get completed job output filenames |
| `/api/view` | GET | Download image bytes by filename |
| `/object_info/CheckpointLoaderSimple` | GET | List available checkpoint models |

### 3.3 Error Handling

| Condition | Response |
|-----------|----------|
| ComfyUI unreachable | `"ComfyUI not reachable at {url}. Start it with: python main.py --listen"` |
| Timeout (>120s) | `"Generation timed out after 120s. prompt_id={id}; use get_generation_status to check"` |
| ComfyUI returns error in history | Extract and return the error message from history response |
| Invalid model name | ComfyUI returns error in history; surface it clearly |
| Output dir not writable | `"Cannot write to output directory: {path}"` |

---

## 4. Configuration

All configuration via environment variables. No hardcoded paths.
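As a rough illustration of reading these settings in one place, the sketch below resolves all three variables at startup. `Settings` and `load_settings` are hypothetical names, not part of the plan; the variable names and defaults are the ones listed in the table in this section.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    output_dir: Path
    timeout_s: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Resolve configuration from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Strip a trailing slash so f"{url}/api/..." never produces a double slash.
    url = env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/")
    out = Path(env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")).expanduser()
    timeout = int(env.get("COMFYUI_TIMEOUT", "120"))
    if timeout <= 0:
        raise ValueError("COMFYUI_TIMEOUT must be a positive integer")
    return Settings(url, out, timeout)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.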
| Variable | Default | Description |
|----------|---------|-------------|
| `COMFYUI_URL` | `http://localhost:8188` | Base URL of running ComfyUI instance |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Where to save generated PNG files |
| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation (int) |

### `.roo/mcp.json` entry (to be added during implementation):

```json
"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory",
    "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run",
    "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}
```

---

## 5. `pyproject.toml`

```toml
[project]
name = "mcp-image-gen"
version = "0.1.0"
requires-python = ">=3.11"
description = "MCP server for local AI image generation via ComfyUI"
dependencies = [
    "fastmcp>=0.1.0",
    "httpx>=0.27.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
test = [
    "pytest>=7.0",
    "pytest-mock>=3.0",
    "pytest-cov>=4.0",
    "pytest-asyncio>=0.23",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

# hatchling cannot infer the package from the project name with this layout
# (src/__init__.py rather than src/mcp_image_gen/), so name it explicitly.
[tool.hatch.build.targets.wheel]
packages = ["src"]

[tool.pytest.ini_options]
asyncio_mode = "auto"
```

**Dependency rationale:**

- `fastmcp`: MCP framework (the return-type rule in section 2.1 assumes 3.x, so the `>=0.1.0` floor is looser than what the plan relies on)
- `httpx`: async HTTP client for ComfyUI REST API
- `pillow`: validate PNG output, potential future thumbnail generation
- `pytest-asyncio`: needed for async tool tests

---

## 6. FLUX.1-schnell Workflow JSON

The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image. This is the "API format" (node-graph JSON), not the UI export format.
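In API format, each top-level key is a node id, and an input whose value is a two-element list `[node_id, output_index]` is a link to another node's output. A small check along these lines could catch dangling links before a workflow is submitted. `find_dangling_links` is a hypothetical helper, and it treats any `[str, int]` pair as a link, an assumption that holds for the workflow in this section but not necessarily for arbitrary list-valued inputs.

```python
from typing import Any, Dict, List, Tuple


def find_dangling_links(workflow: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (node_id, input_name, missing_target) for links to absent nodes."""
    problems = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            # Assumption: a [str, int] pair is a link to another node's output.
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                problems.append((node_id, name, value[0]))
    return problems


# Two-node example: node "8" reads from node "13", which is not defined here.
example = {
    "8": {"class_type": "VAEDecode", "inputs": {"samples": ["13", 0], "vae": ["30", 2]}},
    "30": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "flux1-schnell.safetensors"}},
}
print(find_dangling_links(example))
```

An empty result means every link resolves to a node that exists in the file; it does not verify that the output index is valid for that node's class.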
File: `src/workflows/flux_schnell.json`

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "PROMPT_PLACEHOLDER"
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {
      "samples": ["13", 0],
      "vae": ["30", 2]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "mcp-image-gen",
      "images": ["8", 0]
    }
  },
  "13": {
    "class_type": "KSampler",
    "inputs": {
      "cfg": 1.0,
      "denoise": 1.0,
      "latent_image": ["27", 0],
      "model": ["30", 0],
      "negative": ["33", 0],
      "positive": ["6", 0],
      "sampler_name": "euler",
      "scheduler": "simple",
      "seed": 42,
      "steps": 4
    }
  },
  "27": {
    "class_type": "EmptySD3LatentImage",
    "inputs": {
      "batch_size": 1,
      "height": 1024,
      "width": 1024
    }
  },
  "30": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "flux1-schnell.safetensors"
    }
  },
  "33": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "NEGATIVE_PLACEHOLDER"
    }
  }
}
```

**Parameterization at runtime** (in `server.py`):

```python
import json
import random
from pathlib import Path


def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
    # Load the template fresh on every call; json.load returns a new object,
    # so mutating it below cannot leak state between generations.
    with open(Path(__file__).parent / "workflows/flux_schnell.json") as f:
        wf = json.load(f)

    wf["6"]["inputs"]["text"] = prompt            # positive prompt
    wf["33"]["inputs"]["text"] = negative_prompt  # negative prompt
    wf["27"]["inputs"]["width"] = width
    wf["27"]["inputs"]["height"] = height
    wf["13"]["inputs"]["steps"] = steps
    # seed == -1 means "pick a random seed"
    wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
    wf["30"]["inputs"]["ckpt_name"] = model
    return wf
```

---

## 7. Testing Strategy

### 7.1 Test Structure (`tests/test_server.py`)

All tests mock `httpx.AsyncClient`, so no real ComfyUI is needed.
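One way to stand in for the client using only the standard library is `unittest.mock.AsyncMock`. The sketch below is illustrative: `fetch_queue` and `make_fake_client` are hypothetical helpers, and the real tests would more likely patch `httpx.AsyncClient` where `server.py` creates it.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_queue(client, base_url: str) -> dict:
    """Hypothetical helper: GET the queue endpoint and return its JSON body."""
    response = await client.get(f"{base_url}/api/queue")
    response.raise_for_status()
    return response.json()


def make_fake_client(payload: dict) -> AsyncMock:
    """Build a stand-in for httpx.AsyncClient whose get() yields `payload`."""
    response = MagicMock()
    response.json.return_value = payload  # httpx's Response.json() is synchronous
    client = AsyncMock()
    client.get.return_value = response    # awaiting client.get(...) returns `response`
    return client


client = make_fake_client({"queue_running": [], "queue_pending": []})
result = asyncio.run(fetch_queue(client, "http://localhost:8188"))
client.get.assert_awaited_once_with("http://localhost:8188/api/queue")
```

To simulate an offline ComfyUI in the same style, `client.get.side_effect` can be set to an exception instance so that awaiting the call raises it.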
| Test | Description |
|------|-------------|
| `test_generate_image_happy_path` | Mock submit → poll done → history → view → returns TextContent + ImageContent |
| `test_generate_image_comfyui_offline` | httpx.ConnectError → returns clear error string |
| `test_generate_image_timeout` | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
| `test_generate_image_saves_file` | Verify PNG written to output_dir with correct filename pattern |
| `test_generate_image_random_seed` | seed=-1 → seed in output filename is a valid integer |
| `test_generate_image_custom_params` | Non-default width/height/steps/model passed through to workflow |
| `test_generate_image_returns_image_content` | Second item in result list is `mcp.types.ImageContent` with valid base64 |
| `test_list_available_models_happy_path` | Mock object_info response → returns model name list |
| `test_list_available_models_offline` | ConnectError → returns error string |
| `test_get_generation_status_pending` | prompt_id found in queue_pending → "pending" |
| `test_get_generation_status_running` | prompt_id found in queue_running → "running" |
| `test_get_generation_status_not_found` | prompt_id not in queue, not in history → "not_found" |
| `test_get_output_directory_default` | No env var → returns expanded ~/Pictures/mcp-generated |
| `test_get_output_directory_custom` | IMAGE_OUTPUT_DIR set → returns that path |
| `test_build_workflow_parameterization` | _build_workflow() injects all params correctly into JSON |

### 7.2 conftest.py fixtures

```python
import base64
import sys
from pathlib import Path

import pytest

# Make src/ importable so tests can `import server`
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))


@pytest.fixture
def mock_comfyui_submit_response():
    return {"prompt_id": "test-uuid-1234"}


@pytest.fixture
def mock_comfyui_queue_empty():
    return {"queue_running": [], "queue_pending": []}


@pytest.fixture
def mock_comfyui_history():
    return {
        "test-uuid-1234": {
            "outputs": {
                "9": {
                    "images": [
                        {"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}
                    ]
                }
            }
        }
    }


@pytest.fixture
def sample_png_bytes():
    """Minimal valid 1x1 PNG in bytes."""
    # 1x1 red pixel PNG
    data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
    return base64.b64decode(data)
```

### 7.3 Run command

```bash
cd mcp/mcp-image-gen && uv run pytest tests/ -v --cov=src --cov-report=term-missing
```

---

## 8. `run.sh`

```bash
#!/usr/bin/env bash
BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$HOME/.local/bin:$PATH"

# Create output dir if it doesn't exist
OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
mkdir -p "$OUTPUT_DIR"

cd "$BASEDIR"
exec uv run src/server.py
```

---

## 9. Future: Ollama Migration Path

When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026):

### Adapter pattern (no breaking changes to MCP tool signatures)

```python
import os

BACKEND = os.getenv("IMAGE_BACKEND", "comfyui")  # or "ollama"


async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # current ComfyUI implementation
    ...


async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # POST http://localhost:11434/api/generate
    # with model=Z-Image-Turbo or FLUX.2-Klein
    # width, height, steps in request body
    # save returned image path
    ...


@mcp.tool()  # `mcp` is the FastMCP instance defined in server.py
async def generate_image(prompt, width=1024, height=1024, steps=4,
                         model="flux1-schnell.safetensors", seed=-1,
                         negative_prompt="", output_dir=""):
    # Same signature as today; only the dispatch target changes.
    args = (prompt, width, height, steps, model, seed, negative_prompt, output_dir)
    if BACKEND == "ollama":
        return await _generate_ollama(*args)
    return await _generate_comfyui(*args)
```

**No changes to:** tool signatures, return types, existing env vars (one addition: `IMAGE_BACKEND`), test structure.

---

## 10. Implementation Order (for Code mode)

1. `src/workflows/flux_schnell.json`: write and validate JSON structure
2. `pyproject.toml`: set up project + deps
3. `src/__init__.py`: empty
4. `src/server.py`: implement all 4 tools + `_build_workflow` + polling helpers
5. `tests/conftest.py`: fixtures + sys.path
6. `tests/test_server.py`: all 15 tests
7. `run.sh`: launch script
8. `README.md`: usage docs
9. `.roo/mcp.json`: wire server in (requires switching to Code or Homelab mode for that file)
10. `uv sync && uv run pytest tests/ -v`: confirm all tests pass

---

## 11. ComfyUI Setup Notes (for README)

These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:

```bash
# Install ComfyUI from source (ROCm/AMD).
# Install the ROCm build of PyTorch first; see the ComfyUI README.
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# Download FLUX.1-schnell model (~8GB)
# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors
# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell
# Note: CheckpointLoaderSimple expects a checkpoint that bundles model, CLIP, and VAE

# Start ComfyUI with AMD ROCm
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen

# Verify API is running
curl http://localhost:8188/system_stats
```

> The `HSA_OVERRIDE_GFX_VERSION=11.0.0` env var may be needed for the RX 7900 XTX (gfx1100)
> to be identified correctly by the ROCm libraries.
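The `curl` check above can also be run from Python, for example before the server starts or when deciding whether to return the "not reachable" message from section 3.3. This is a sketch using only the standard library. `comfyui_is_up` and its `fetch` parameter are hypothetical, and the only assumption about `/system_stats` is that it answers with a JSON object.

```python
import json
import urllib.request
from typing import Callable, Optional


def _http_get(url: str, timeout: float) -> bytes:
    """Default fetcher: plain GET via urllib."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def comfyui_is_up(
    base_url: str = "http://localhost:8188",
    timeout: float = 3.0,
    fetch: Optional[Callable[[str, float], bytes]] = None,
) -> bool:
    """Return True if GET {base_url}/system_stats yields a JSON object."""
    fetch = fetch or _http_get
    try:
        body = fetch(f"{base_url.rstrip('/')}/system_stats", timeout)
        return isinstance(json.loads(body), dict)
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # invalid JSON raises ValueError (json.JSONDecodeError).
        return False
```

The injectable `fetch` argument exists only so the check can be exercised without a running ComfyUI; in normal use it is left at its default.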