Patrick Plate 8112ff2f12 feat(mcp-image-gen): scaffold ComfyUI-backed image generation MCP server
- FastMCP server with 4 tools: generate_image, list_available_models,
  get_generation_status, get_output_directory
- ComfyUI REST API client (httpx) polling lifecycle
- FLUX.1-schnell workflow JSON template
- Dual output: TextContent (path + seed) + ImageContent (base64 PNG)
- 14 passing pytest tests with respx HTTP mocking
- ROCm/AMD RX 7900 XTX optimized setup in README
- Ollama Linux migration path documented (future)
2026-04-04 11:49:31 +02:00


mcp-image-gen — Implementation Plan

Date: 2026-04-04
Author: Lumen (for Patrick / pplate)
Status: Ready for implementation
Assessment: ASSESSMENT.md
Research Session: 39809470-6ac8-4713-adf2-79ac0eb36ba7


1. Directory Structure

mcp/mcp-image-gen/
├── ASSESSMENT.md               ← Architecture assessment (this session)
├── PLAN.md                     ← This file
├── README.md                   ← Usage docs, tool table, env vars
├── pyproject.toml              ← uv project + deps
├── run.sh                      ← Launch script (alternative to the uv command in .roo/mcp.json)
├── src/
│   ├── __init__.py
│   ├── server.py               ← FastMCP server + all tools
│   └── workflows/
│       └── flux_schnell.json   ← Minimal ComfyUI API-format workflow
└── tests/
    ├── __init__.py
    ├── conftest.py             ← sys.path + shared fixtures
    └── test_server.py          ← All tool tests (mocked ComfyUI)

2. Tool Definitions

2.1 generate_image

@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    """
    Generate an image from a text prompt using ComfyUI.

    Returns both a file path (for persistence) and an inline base64 image
    (for display in Claude / Roo Code chat).

    Args:
        prompt:          Text description of the image to generate.
        width:           Image width in pixels (default: 1024).
        height:          Image height in pixels (default: 1024).
        steps:           Number of inference steps. FLUX.1-schnell works well at 4.
        model:           ComfyUI model filename (default: flux1-schnell.safetensors).
        seed:            Random seed for reproducibility. -1 = random.
        negative_prompt: Things to exclude from the image (optional). With FLUX.1-schnell
                         at cfg=1.0 ComfyUI skips the negative pass, so this is effectively ignored.
        output_dir:      Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
                         or ~/Pictures/mcp-generated.

    Returns:
        [TextContent(path + metadata), ImageContent(base64 PNG)]
    """

Return type: list containing:

  1. mcp.types.TextContent: human-readable summary with file path, seed, prompt_id and elapsed time
  2. mcp.types.ImageContent: type="image", data=base64_encoded_png, mimeType="image/png"

⚠️ FastMCP 3.x rule: NEVER annotate return as -> Image (fastmcp utility type). It triggers output_schema generation which breaks the early-return path. Return mcp.types.ImageContent directly as part of a list — it is a ContentBlock and passes through cleanly.


2.2 list_available_models

@mcp.tool()
async def list_available_models() -> str:
    """
    List all checkpoint models available in ComfyUI.

    Returns a newline-separated list of model filenames.
    Requires ComfyUI to be running at COMFYUI_URL.
    """

Implementation: GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple → the response is keyed by node class, so parse CheckpointLoaderSimple.input.required.ckpt_name[0] (the list of filenames) → join with newlines.


2.3 get_generation_status

@mcp.tool()
async def get_generation_status(prompt_id: str) -> str:
    """
    Check the status of a queued or running generation job.

    Args:
        prompt_id: The prompt ID returned by a previous generate_image call.

    Returns:
        Status string: "pending", "running", "completed", or "not_found".
    """

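Implementation: GET {COMFYUI_URL}/api/queue → look for prompt_id in queue_running ("running") and queue_pending ("pending"). Each queue entry is a list whose second element (index 1) is the prompt_id. If it is in neither list, GET {COMFYUI_URL}/api/history/{prompt_id}: a non-empty result means "completed", and an empty {} means "not_found".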


2.4 get_output_directory

@mcp.tool()
def get_output_directory() -> str:
    """
    Return the directory where generated images are saved.

    Returns:
        Absolute path to the output directory.
    """

Implementation: Resolve IMAGE_OUTPUT_DIR env var or default ~/Pictures/mcp-generated, expand ~, return as string.


3. ComfyUI Integration

3.1 Workflow: Submit → Poll → Retrieve

generate_image()
    │
    ├── 1. Load flux_schnell.json workflow template
    ├── 2. Parameterize: inject prompt, negative_prompt, width, height, steps, seed, model
    ├── 3. POST {COMFYUI_URL}/api/prompt  body {"prompt": workflow}  →  {"prompt_id": "uuid", ...}
    │
    ├── 4. POLL loop (max COMFYUI_TIMEOUT, default 120s; sleep 2s between)
    │       GET {COMFYUI_URL}/api/queue
    │       → check queue_running[i][1] == our id   (entries are lists; index 1 = prompt_id)
    │       → check queue_pending[i][1] == our id
    │       → if neither: job is done
    │
    ├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
    │       → find output image filename + subfolder
    │
    ├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
    │       → raw PNG bytes
    │
    ├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
    └── 8. Return [TextContent(path + meta), ImageContent(base64)]

3.2 API Endpoints Used

| Endpoint | Method | Purpose |
|---|---|---|
| /api/prompt | POST | Submit workflow for generation |
| /api/queue | GET | Poll queue status (pending + running) |
| /api/history/{prompt_id} | GET | Get completed job output filenames |
| /api/view | GET | Download image bytes by filename |
| /object_info/CheckpointLoaderSimple | GET | List available checkpoint models |

3.3 Error Handling

| Condition | Response |
|---|---|
| ComfyUI unreachable | "ComfyUI not reachable at {url}. Start it with: python main.py --listen" |
| Timeout (exceeds COMFYUI_TIMEOUT, default 120s) | "Generation timed out after {timeout}s. prompt_id={id}. Use get_generation_status to check." |
| ComfyUI returns error in history | Extract and return the error message from the history response |
| Invalid model name | POST /api/prompt is rejected (HTTP 400) with a node_errors entry for ckpt_name ("Value not in list") before anything is queued; surface that message clearly |
| Output dir not writable | "Cannot write to output directory: {path}" |

4. Configuration

All configuration via environment variables. No hardcoded paths.

| Variable | Default | Description |
|---|---|---|
| COMFYUI_URL | http://localhost:8188 | Base URL of running ComfyUI instance |
| IMAGE_OUTPUT_DIR | ~/Pictures/mcp-generated | Where to save generated PNG files |
| COMFYUI_TIMEOUT | 120 | Max seconds to wait for generation (int) |

.roo/mcp.json entry under "mcpServers" (to be added during implementation):

"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run", "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}

5. pyproject.toml

[project]
name = "mcp-image-gen"
version = "0.1.0"
requires-python = ">=3.11"
description = "MCP server for local AI image generation via ComfyUI"
dependencies = [
    "fastmcp>=3.0",  # the plan relies on FastMCP 3.x behaviour (section 2.1)
    "httpx>=0.27.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
test = [
    "pytest>=7.0",
    "pytest-mock>=3.0",
    "pytest-cov>=4.0",
    "pytest-asyncio>=0.23",
    "respx>=0.21",  # httpx request mocking used by the test suite
]

[build-system]
requires = ["hatchling"]
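build-backend = "hatchling.build"

# hatchling cannot infer a package from the name "mcp-image-gen" plus a bare src/, so name it
[tool.hatch.build.targets.wheel]
packages = ["src"]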

[tool.pytest.ini_options]
asyncio_mode = "auto"

Dependency rationale:

  • fastmcp — MCP framework
  • httpx — async HTTP client for ComfyUI REST API
  • pillow — validate PNG output, potential future thumbnail generation
  • pytest-asyncio — needed for async tool tests

6. FLUX.1-schnell Workflow JSON

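The minimal ComfyUI workflow for FLUX.1-schnell text-to-image, in "API format": a flat map of node IDs to {class_type, inputs}, as accepted by POST /api/prompt, not the UI export format. CheckpointLoaderSimple (node 30) supplies the model, CLIP and VAE, so the checkpoint must be an all-in-one FLUX.1-schnell file.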

File: src/workflows/flux_schnell.json

{
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "PROMPT_PLACEHOLDER"
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {
      "samples": ["13", 0],
      "vae": ["30", 2]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "mcp-image-gen",
      "images": ["8", 0]
    }
  },
  "13": {
    "class_type": "KSampler",
    "inputs": {
      "cfg": 1.0,
      "denoise": 1.0,
      "latent_image": ["27", 0],
      "model": ["30", 0],
      "negative": ["33", 0],
      "positive": ["6", 0],
      "sampler_name": "euler",
      "scheduler": "simple",
      "seed": 42,
      "steps": 4
    }
  },
  "27": {
    "class_type": "EmptySD3LatentImage",
    "inputs": {
      "batch_size": 1,
      "height": 1024,
      "width": 1024
    }
  },
  "30": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "flux1-schnell.safetensors"
    }
  },
  "33": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "NEGATIVE_PLACEHOLDER"
    }
  }
}

Parameterization at runtime (in server.py):

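import json
import random
from pathlib import Path

WORKFLOW_PATH = Path(__file__).parent / "workflows" / "flux_schnell.json"

def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
    # json.load returns a fresh dict on every call, so the template is never mutated
    with open(WORKFLOW_PATH, encoding="utf-8") as f:
        wf = json.load(f)
    wf["6"]["inputs"]["text"] = prompt               # positive CLIPTextEncode
    wf["33"]["inputs"]["text"] = negative_prompt     # negative CLIPTextEncode
    wf["27"]["inputs"]["width"] = width              # EmptySD3LatentImage
    wf["27"]["inputs"]["height"] = height
    wf["13"]["inputs"]["steps"] = steps              # KSampler
    # seed=-1 means random; the caller reads the final value back from
    # wf["13"]["inputs"]["seed"] for the output filename and summary
    wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
    wf["30"]["inputs"]["ckpt_name"] = model          # CheckpointLoaderSimple
    return wf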

7. Testing Strategy

7.1 Test Structure (tests/test_server.py)

All tests mock httpx.AsyncClient — no real ComfyUI needed.

| Test | Description |
|---|---|
| test_generate_image_happy_path | Mock submit → poll done → history → view → returns TextContent + ImageContent |
| test_generate_image_comfyui_offline | httpx.ConnectError → returns clear error string |
| test_generate_image_timeout | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
| test_generate_image_saves_file | Verify PNG written to output_dir with correct filename pattern |
| test_generate_image_random_seed | seed=-1 → seed in output filename is a valid integer |
| test_generate_image_custom_params | Non-default width/height/steps/model passed through to workflow |
| test_generate_image_returns_image_content | Second item in result list is mcp.types.ImageContent with valid base64 |
| test_list_available_models_happy_path | Mock object_info response → returns model name list |
| test_list_available_models_offline | ConnectError → returns error string |
| test_get_generation_status_pending | prompt_id found in queue_pending → "pending" |
| test_get_generation_status_running | prompt_id found in queue_running → "running" |
| test_get_generation_status_not_found | prompt_id not in queue, not in history → "not_found" |
| test_get_output_directory_default | No env var → returns expanded ~/Pictures/mcp-generated |
| test_get_output_directory_custom | IMAGE_OUTPUT_DIR set → returns that path |
| test_build_workflow_parameterization | _build_workflow() injects all params correctly into JSON |

7.2 conftest.py fixtures

import sys
from pathlib import Path
import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

@pytest.fixture
def mock_comfyui_submit_response():
    return {"prompt_id": "test-uuid-1234"}

@pytest.fixture
def mock_comfyui_queue_empty():
    return {"queue_running": [], "queue_pending": []}

@pytest.fixture
def mock_comfyui_history():
    return {
        "test-uuid-1234": {
            "outputs": {
                "9": {
                    "images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]
                }
            }
        }
    }

@pytest.fixture
def sample_png_bytes():
    """Minimal valid 1x1 PNG in bytes."""
    import base64
    # 1x1 red pixel PNG
    data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
    return base64.b64decode(data)

7.3 Run command

cd mcp/mcp-image-gen && uv run --extra test pytest tests/ -v --cov=src --cov-report=term-missing

8. run.sh

#!/usr/bin/env bash
BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$HOME/.local/bin:$PATH"

# Create output dir if it doesn't exist
OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
mkdir -p "$OUTPUT_DIR"

cd "$BASEDIR"
exec uv run src/server.py

9. Future: Ollama Migration Path

When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026):

Adapter pattern (no breaking changes to MCP tool signatures)

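import os

BACKEND = os.getenv("IMAGE_BACKEND", "comfyui")  # "comfyui" (default) or "ollama"

async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # current ComfyUI implementation (section 3)
    ...

async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # POST http://localhost:11434/api/generate
    # with model=Z-Image-Turbo or FLUX.2-Klein
    # width, height, steps in request body
    # save returned image path
    ...

# mcp is the FastMCP instance created at the top of server.py
@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    # Same public signature for both backends; only the private helper changes
    backend = _generate_ollama if BACKEND == "ollama" else _generate_comfyui
    return await backend(prompt, width, height, steps, model, seed, negative_prompt, output_dir)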

No changes to tool signatures, return types or test structure; the only addition is the IMAGE_BACKEND env var.


10. Implementation Order (for Code mode)

  1. src/workflows/flux_schnell.json — write and validate JSON structure
  2. pyproject.toml — set up project + deps
  3. src/__init__.py — empty
  4. src/server.py — implement all 4 tools + _build_workflow + polling helpers
  5. tests/conftest.py — fixtures + sys.path
  6. tests/test_server.py — all 15 tests
  7. run.sh — launch script
  8. README.md — usage docs
  9. .roo/mcp.json — wire server in (requires switching to Code or Homelab mode for that file)
  10. uv sync --extra test && uv run pytest tests/ -v: confirm all tests pass

11. ComfyUI Setup Notes (for README)

These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:

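# Install ComfyUI (ROCm/AMD)
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
# Install a ROCm build of PyTorch first (the current --index-url is listed on pytorch.org),
# then the remaining dependencies
pip install -r requirements.txt

# Download an all-in-one FLUX.1-schnell checkpoint (UNet + CLIP + VAE), e.g.
# flux1-schnell-fp8.safetensors from https://huggingface.co/Comfy-Org/flux1-schnell
# Place it in ComfyUI/models/checkpoints/ and either rename it to flux1-schnell.safetensors
# (the default model) or pass its filename as `model`.
# The black-forest-labs/FLUX.1-schnell weights are diffusion-model only and need separate
# CLIP/VAE loaders, which the section 6 workflow does not use.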

# Start ComfyUI with AMD ROCm
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen

# Verify API is running
curl http://localhost:8188/system_stats

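The RX 7900 XTX is gfx1100, which ROCm supports natively, so HSA_OVERRIDE_GFX_VERSION=11.0.0 should normally be redundant for this card. Keep it only if ROCm fails to detect the GPU. The override mainly matters for GPUs whose gfx target ROCm does not ship kernels for.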