# mcp-image-gen — Implementation Plan

**Date:** 2026-04-04
**Author:** Lumen (for Patrick / pplate)
**Status:** Ready for implementation
**Assessment:** [ASSESSMENT.md](./ASSESSMENT.md)
**Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7`

---

## 1. Directory Structure

```
mcp/mcp-image-gen/
├── ASSESSMENT.md               ← Architecture assessment (this session)
├── PLAN.md                     ← This file
├── README.md                   ← Usage docs, tool table, env vars
├── pyproject.toml              ← uv project + deps
├── run.sh                      ← Launch script (creates output dir, starts server)
├── src/
│   ├── __init__.py
│   ├── server.py               ← FastMCP server + all tools
│   └── workflows/
│       └── flux_schnell.json   ← Minimal ComfyUI API-format workflow
└── tests/
    ├── __init__.py
    ├── conftest.py             ← sys.path + shared fixtures
    └── test_server.py          ← All tool tests (mocked ComfyUI)
```

---

## 2. Tool Definitions

### 2.1 `generate_image`

```python
@mcp.tool()
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    """
    Generate an image from a text prompt using ComfyUI.

    Returns both a file path (for persistence) and an inline base64 image
    (for display in Claude / Roo Code chat).

    Args:
        prompt:          Text description of the image to generate.
        width:           Image width in pixels (default: 1024).
        height:          Image height in pixels (default: 1024).
        steps:           Number of inference steps. FLUX.1-schnell works well at 4.
        model:           ComfyUI model filename (default: flux1-schnell.safetensors).
        seed:            Random seed for reproducibility. -1 = random.
        negative_prompt: Things to exclude from the image (optional).
        output_dir:      Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
                         or ~/Pictures/mcp-generated.

    Returns:
        [TextContent(path + metadata), ImageContent(base64 PNG)]
    """
```

**Return type:** `list` containing:
1. `mcp.types.TextContent` — human-readable summary with file path, seed, elapsed time
2. `mcp.types.ImageContent` — `type="image"`, `data=base64_encoded_png`, `mimeType="image/png"`

> ⚠️ **FastMCP 3.x rule:** NEVER annotate the return type as `-> Image` (the fastmcp utility type). Doing so
> triggers `output_schema` generation, which breaks the early-return path. Return `mcp.types.ImageContent`
> directly as part of a `list`; it is a `ContentBlock` and passes through cleanly.
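
As a minimal sketch of how the two list items could be prepared, the helper below builds the summary text and the base64 payload from raw PNG bytes using only the standard library. The name `build_result_parts` and the exact summary layout are assumptions made for illustration, not part of the plan.

```python
import base64

# Every valid PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_result_parts(png_bytes: bytes, path: str, seed: int, elapsed_s: float) -> tuple[str, str]:
    """Return (summary_text, base64_png) for the two content blocks (hypothetical helper)."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("ComfyUI did not return PNG data")
    # Human-readable summary: file path, seed, elapsed time.
    summary = f"Saved: {path}\nSeed: {seed}\nElapsed: {elapsed_s:.1f}s"
    # ImageContent expects the image as a base64 string, not raw bytes.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return summary, encoded
```

In the tool body, `summary` would then feed `TextContent(type="text", text=summary)` and `encoded` would feed `ImageContent(type="image", data=encoded, mimeType="image/png")`.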

---

### 2.2 `list_available_models`

```python
@mcp.tool()
async def list_available_models() -> str:
    """
    List all checkpoint models available in ComfyUI.

    Returns a newline-separated list of model filenames.
    Requires ComfyUI to be running at COMFYUI_URL.
    """
```

**Implementation:** `GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple` → parse the
`CheckpointLoaderSimple.input.required.ckpt_name[0]` list from the JSON response → join with newlines.
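
The parsing step can be sketched as a pure function that is easy to unit-test without HTTP. It assumes the response is keyed by the node class name (`CheckpointLoaderSimple`) and that the first element of `ckpt_name` is the list of filenames; `parse_checkpoint_names` is a hypothetical helper name.

```python
def parse_checkpoint_names(object_info: dict) -> str:
    """Extract checkpoint filenames from an object_info response (assumed shape)."""
    node = object_info.get("CheckpointLoaderSimple", {})
    ckpt_spec = node.get("input", {}).get("required", {}).get("ckpt_name", [])
    # Assumption: ckpt_spec[0] is the list of available filenames.
    names = ckpt_spec[0] if ckpt_spec and isinstance(ckpt_spec[0], list) else []
    return "\n".join(names)
```

Returning an empty string for an unexpected payload keeps the tool from raising on a malformed response; whether that or an explicit error message is preferable is a design choice left open here.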

---

### 2.3 `get_generation_status`

```python
@mcp.tool()
async def get_generation_status(prompt_id: str) -> str:
    """
    Check the status of a queued or running generation job.

    Args:
        prompt_id: The prompt ID returned by a previous generate_image call.

    Returns:
        Status string: "pending", "running", "completed", or "not_found".
    """
```

**Implementation:** `GET {COMFYUI_URL}/api/queue` → check the `queue_running` and `queue_pending`
lists for a matching `prompt_id`. If it is in neither, `GET {COMFYUI_URL}/api/history/{prompt_id}`:
a matching entry means `"completed"`, otherwise `"not_found"`.
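
A minimal sketch of the status decision, kept separate from the HTTP calls so it can be tested with plain dicts. It assumes each queue entry is a list with the prompt ID at index 1 and that the history response is keyed by prompt ID; both should be verified against the running ComfyUI version. `classify_status` is a hypothetical name.

```python
def classify_status(queue: dict, history: dict, prompt_id: str) -> str:
    """Map queue and history payloads to one of the four status strings."""

    def contains(entries: list) -> bool:
        # Assumption: each entry is a list whose second element is the prompt ID.
        return any(len(entry) > 1 and entry[1] == prompt_id for entry in entries)

    if contains(queue.get("queue_running", [])):
        return "running"
    if contains(queue.get("queue_pending", [])):
        return "pending"
    # Not queued: a history entry means the job already finished.
    if prompt_id in history:
        return "completed"
    return "not_found"
```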

---

### 2.4 `get_output_directory`

```python
@mcp.tool()
def get_output_directory() -> str:
    """
    Return the directory where generated images are saved.

    Returns:
        Absolute path to the output directory.
    """
```

**Implementation:** Resolve `IMAGE_OUTPUT_DIR` env var or default `~/Pictures/mcp-generated`,
expand `~`, return as string.
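
One way to express this resolution order, shown as a sketch: an explicit override wins, then the `IMAGE_OUTPUT_DIR` env var, then the default. The `override` and `env` parameters are assumptions added so the same helper can serve `generate_image(output_dir=...)` and be tested without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_OUTPUT_DIR = "~/Pictures/mcp-generated"


def resolve_output_dir(override: str = "", env: Optional[Mapping[str, str]] = None) -> str:
    """Return the absolute output directory (hypothetical helper)."""
    env = os.environ if env is None else env
    # Empty strings count as "not set", matching the tool's output_dir="" default.
    raw = override or env.get("IMAGE_OUTPUT_DIR") or DEFAULT_OUTPUT_DIR
    return str(Path(raw).expanduser().resolve())
```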

---

## 3. ComfyUI Integration

### 3.1 Workflow: Submit → Poll → Retrieve

```
generate_image()
    │
    ├── 1. Load flux_schnell.json workflow template
    ├── 2. Parameterize: inject prompt, width, height, steps, seed, model
    ├── 3. POST {COMFYUI_URL}/api/prompt  →  {"prompt_id": "uuid"}
    │
    ├── 4. POLL loop (max 120s, sleep 2s between)
    │       GET {COMFYUI_URL}/api/queue
    │       → check queue_running[].prompt_id == our id
    │       → check queue_pending[].prompt_id == our id
    │       → if neither: job is done
    │
    ├── 5. GET {COMFYUI_URL}/api/history/{prompt_id}
    │       → find output image filename + subfolder
    │
    ├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output
    │       → raw PNG bytes
    │
    ├── 7. Save PNG to output_dir/{timestamp}_{seed}.png
    └── 8. Return [TextContent(path + meta), ImageContent(base64)]
```
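
Step 4 can be sketched as a small coroutine that takes the queue fetcher as a parameter, which keeps the loop testable without a real HTTP client. The function name, the injected `fetch_queue` callable, and the queue entry layout (prompt ID at index 1) are assumptions for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_done(
    fetch_queue: Callable[[], Awaitable[dict]],
    prompt_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> bool:
    """Return True once prompt_id has left the queue, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        queue = await fetch_queue()
        entries = queue.get("queue_running", []) + queue.get("queue_pending", [])
        # Assumption: queue entries are lists with the prompt ID at index 1.
        if not any(len(entry) > 1 and entry[1] == prompt_id for entry in entries):
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval_s)
```

In the server, `fetch_queue` would wrap the `GET {COMFYUI_URL}/api/queue` call; a `False` result maps to the timeout message in the error handling table.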

### 3.2 API Endpoints Used

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/prompt` | POST | Submit workflow for generation |
| `/api/queue` | GET | Poll queue status (pending + running) |
| `/api/history/{prompt_id}` | GET | Get completed job output filenames |
| `/api/view` | GET | Download image bytes by filename |
| `/object_info/CheckpointLoaderSimple` | GET | List available checkpoint models |
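
To illustrate how the history and view endpoints connect, the sketch below picks the first output image from a history response and builds the matching `/api/view` URL. The history shape follows the `mock_comfyui_history` fixture in section 7.2; the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def first_output_image(history: dict, prompt_id: str) -> Optional[dict]:
    """Return the first image record for prompt_id, or None if there is none."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        for image in node_output.get("images", []):
            return image
    return None


def view_url(base_url: str, image: dict) -> str:
    """Build the download URL for one image record."""
    # urlencode escapes filenames and subfolders that contain spaces or slashes.
    query = urlencode({
        "filename": image["filename"],
        "subfolder": image.get("subfolder", ""),
        "type": image.get("type", "output"),
    })
    return f"{base_url.rstrip('/')}/api/view?{query}"
```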

### 3.3 Error Handling

| Condition | Response |
|-----------|----------|
| ComfyUI unreachable | `"ComfyUI not reachable at {url}. Start it with: python main.py --listen"` |
| Timeout (>120s) | `"Generation timed out after 120s. prompt_id={id} — use get_generation_status to check"` |
| ComfyUI returns error in history | Extract and return the error message from history response |
| Invalid model name | ComfyUI returns error in history; surface it clearly |
| Output dir not writable | `"Cannot write to output directory: {path}"` |
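
The last row of the table could be implemented along these lines. This is a sketch using only the standard library; `check_output_dir` is a hypothetical name, and probing with a temporary file is one possible approach rather than the plan's stated method.

```python
import os
import tempfile
from typing import Optional


def check_output_dir(path: str) -> Optional[str]:
    """Return an error message if path cannot be written to, else None."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating a real temp file catches cases that a permission-bit check can miss.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return f"Cannot write to output directory: {path}"
    return None
```

Running this check before submitting the workflow avoids spending generation time on an image that cannot be saved.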

---

## 4. Configuration

All configuration via environment variables. No hardcoded paths.

| Variable | Default | Description |
|----------|---------|-------------|
| `COMFYUI_URL` | `http://localhost:8188` | Base URL of running ComfyUI instance |
| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Where to save generated PNG files |
| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation (int) |
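
A minimal sketch of reading these three variables once at startup. The `Settings` dataclass and `load_settings` function are illustrative names, not part of the plan; the defaults mirror the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    comfyui_url: str
    output_dir: str
    timeout_s: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from env vars, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_timeout = env.get("COMFYUI_TIMEOUT", "120")
    try:
        timeout_s = int(raw_timeout)
    except ValueError:
        raise ValueError(f"COMFYUI_TIMEOUT must be an integer, got {raw_timeout!r}") from None
    return Settings(
        # Strip a trailing slash so f"{url}/api/..." never produces "//".
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/"),
        output_dir=os.path.expanduser(env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")),
        timeout_s=timeout_s,
    )
```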

### `.roo/mcp.json` entry (to be added during implementation):

```json
"mcp-image-gen": {
  "command": "uv",
  "args": [
    "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen",
    "run", "src/server.py"
  ],
  "env": {
    "COMFYUI_URL": "http://localhost:8188",
    "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated"
  }
}
```

---

## 5. `pyproject.toml`

```toml
[project]
name = "mcp-image-gen"
version = "0.1.0"
requires-python = ">=3.11"
description = "MCP server for local AI image generation via ComfyUI"
dependencies = [
    "fastmcp>=0.1.0",
    "httpx>=0.27.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
test = [
    "pytest>=7.0",
    "pytest-mock>=3.0",
    "pytest-cov>=4.0",
    "pytest-asyncio>=0.23",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
asyncio_mode = "auto"
```

**Dependency rationale:**
- `fastmcp` — MCP framework
- `httpx` — async HTTP client for ComfyUI REST API
- `pillow` — validate PNG output, potential future thumbnail generation
- `pytest-asyncio` — needed for async tool tests

---

## 6. FLUX.1-schnell Workflow JSON

The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image.
This is the "API format" (node-graph JSON), not the UI export format.

File: `src/workflows/flux_schnell.json`

```json
{
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "PROMPT_PLACEHOLDER"
    }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": {
      "samples": ["13", 0],
      "vae": ["30", 2]
    }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": {
      "filename_prefix": "mcp-image-gen",
      "images": ["8", 0]
    }
  },
  "13": {
    "class_type": "KSampler",
    "inputs": {
      "cfg": 1.0,
      "denoise": 1.0,
      "latent_image": ["27", 0],
      "model": ["30", 0],
      "negative": ["33", 0],
      "positive": ["6", 0],
      "sampler_name": "euler",
      "scheduler": "simple",
      "seed": 42,
      "steps": 4
    }
  },
  "27": {
    "class_type": "EmptySD3LatentImage",
    "inputs": {
      "batch_size": 1,
      "height": 1024,
      "width": 1024
    }
  },
  "30": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {
      "ckpt_name": "flux1-schnell.safetensors"
    }
  },
  "33": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "clip": ["30", 1],
      "text": "NEGATIVE_PLACEHOLDER"
    }
  }
}
```
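
In this format, an input value such as `["30", 1]` is a link: output index 1 of node `"30"`. A quick structural check, sketched below, can catch links that point at a node ID missing from the file before the workflow is ever submitted. `find_dangling_links` is a hypothetical helper; it does not validate node classes or output indices.

```python
def find_dangling_links(workflow: dict) -> list[str]:
    """Return a list of problems found in an API-format workflow dict."""
    problems = []
    for node_id, node in workflow.items():
        if "class_type" not in node:
            problems.append(f"node {node_id}: missing class_type")
        for name, value in node.get("inputs", {}).items():
            # A link is a two-item list: [source_node_id, output_index].
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                problems.append(f"node {node_id}: input {name!r} links to missing node {value[0]}")
    return problems
```

Applied to the JSON above, the check should return an empty list, since every link targets one of nodes `6`, `8`, `13`, `27`, `30`, or `33`.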

**Parameterization at runtime** (in `server.py`):

```python
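import json
import random
from pathlib import Path

# Resolved relative to this file so the server works from any working directory.
WORKFLOW_PATH = Path(__file__).parent / "workflows" / "flux_schnell.json"

def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model):
    """Return the FLUX.1-schnell workflow with the request parameters injected."""
    # json.load() returns a fresh dict on every call, so no deepcopy is needed.
    with open(WORKFLOW_PATH) as f:
        wf = json.load(f)
    wf["6"]["inputs"]["text"] = prompt             # positive prompt
    wf["33"]["inputs"]["text"] = negative_prompt   # negative prompt
    wf["27"]["inputs"]["width"] = width
    wf["27"]["inputs"]["height"] = height
    wf["13"]["inputs"]["steps"] = steps
    # seed == -1 means "pick a random seed"
    wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1)
    wf["30"]["inputs"]["ckpt_name"] = model
    return wf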
```
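
Step 7 of the workflow in section 3.1 names the saved file `{timestamp}_{seed}.png`, so the caller needs the seed that was actually used. One option, sketched here, is to resolve `-1` before building the workflow and pass the concrete value in. The helper names and the timestamp format are assumptions.

```python
import random
from datetime import datetime
from typing import Optional


def resolve_seed(seed: int) -> int:
    """Turn the -1 sentinel into a concrete random seed; pass other values through."""
    return random.randint(0, 2**32 - 1) if seed == -1 else seed


def output_filename(seed: int, now: Optional[datetime] = None) -> str:
    """Build the output filename; the timestamp layout is an assumed format."""
    now = now or datetime.now()
    return f"{now:%Y%m%d_%H%M%S}_{seed}.png"
```

With this split, `generate_image` would call `resolve_seed(seed)` once and reuse the result for both the workflow and the filename.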

---

## 7. Testing Strategy

### 7.1 Test Structure (`tests/test_server.py`)

All tests mock `httpx.AsyncClient` — no real ComfyUI needed.

| Test | Description |
|------|-------------|
| `test_generate_image_happy_path` | Mock submit → poll done → history → view → returns TextContent + ImageContent |
| `test_generate_image_comfyui_offline` | httpx.ConnectError → returns clear error string |
| `test_generate_image_timeout` | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id |
| `test_generate_image_saves_file` | Verify PNG written to output_dir with correct filename pattern |
| `test_generate_image_random_seed` | seed=-1 → seed in output filename is a valid integer |
| `test_generate_image_custom_params` | Non-default width/height/steps/model passed through to workflow |
| `test_generate_image_returns_image_content` | Second item in result list is `mcp.types.ImageContent` with valid base64 |
| `test_list_available_models_happy_path` | Mock object_info response → returns model name list |
| `test_list_available_models_offline` | ConnectError → returns error string |
| `test_get_generation_status_pending` | prompt_id found in queue_pending → "pending" |
| `test_get_generation_status_running` | prompt_id found in queue_running → "running" |
| `test_get_generation_status_not_found` | prompt_id not in queue, not in history → "not_found" |
| `test_get_output_directory_default` | No env var → returns expanded ~/Pictures/mcp-generated |
| `test_get_output_directory_custom` | IMAGE_OUTPUT_DIR set → returns that path |
| `test_build_workflow_parameterization` | _build_workflow() injects all params correctly into JSON |

### 7.2 conftest.py fixtures

```python
import sys
from pathlib import Path
import pytest

sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

@pytest.fixture
def mock_comfyui_submit_response():
    return {"prompt_id": "test-uuid-1234"}

@pytest.fixture
def mock_comfyui_queue_empty():
    return {"queue_running": [], "queue_pending": []}

@pytest.fixture
def mock_comfyui_history():
    return {
        "test-uuid-1234": {
            "outputs": {
                "9": {
                    "images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}]
                }
            }
        }
    }

@pytest.fixture
def sample_png_bytes():
    """Minimal valid 1x1 PNG in bytes."""
    import base64
    # 1x1 red pixel PNG
    data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg=="
    return base64.b64decode(data)
```

### 7.3 Run command

```bash
cd mcp/mcp-image-gen && uv run pytest tests/ -v --cov=src --cov-report=term-missing
```

---

## 8. `run.sh`

```bash
#!/usr/bin/env bash
BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
export PATH="$HOME/.local/bin:$PATH"

# Create output dir if it doesn't exist
OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}"
mkdir -p "$OUTPUT_DIR"

cd "$BASEDIR"
exec uv run src/server.py
```

---

## 9. Future: Ollama Migration Path

Once Ollama adds image generation support on Linux (announced as "coming soon" in April 2026, ETA unknown):

### Adapter pattern (no breaking changes to MCP tool signatures)

```python
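import os

BACKEND = os.getenv("IMAGE_BACKEND", "comfyui")  # or "ollama"

async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # current ComfyUI implementation
    ...

async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir):
    # POST http://localhost:11434/api/generate
    # with model=Z-Image-Turbo or FLUX.2-Klein
    # width, height, steps in request body
    # save returned image path
    ...

@mcp.tool()  # `mcp` is the FastMCP instance defined in server.py
async def generate_image(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    steps: int = 4,
    model: str = "flux1-schnell.safetensors",
    seed: int = -1,
    negative_prompt: str = "",
    output_dir: str = "",
) -> list:
    # Same signature as section 2.1; only the dispatch below is new.
    args = (prompt, width, height, steps, model, seed, negative_prompt, output_dir)
    if BACKEND == "ollama":
        return await _generate_ollama(*args)
    return await _generate_comfyui(*args)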

**No changes to:** tool signatures, return types, existing env vars (`IMAGE_BACKEND` is the only addition), test structure.

---

## 10. Implementation Order (for Code mode)

1. `src/workflows/flux_schnell.json` — write and validate JSON structure
2. `pyproject.toml` — set up project + deps
3. `src/__init__.py` — empty
4. `src/server.py` — implement all 4 tools + `_build_workflow` + polling helpers
5. `tests/conftest.py` — fixtures + sys.path
6. `tests/test_server.py` — all 15 tests
7. `run.sh` — launch script
8. `README.md` — usage docs
9. `.roo/mcp.json` — wire server in (requires switching to Code or Homelab mode for that file)
10. `uv sync && uv run pytest tests/ -v` — confirm all tests pass

---

## 11. ComfyUI Setup Notes (for README)

These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed:

```bash
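# Install ComfyUI from source (ROCm/AMD); main.py lives in the cloned repo
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt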

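# Download FLUX.1-schnell model (~8GB)
# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors
# The workflow loads model, CLIP, and VAE from one CheckpointLoaderSimple node,
# so the file must be an all-in-one checkpoint that bundles all three
# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell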

# Start ComfyUI with AMD ROCm
HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen

# Verify API is running
curl http://localhost:8188/system_stats
```

> The `HSA_OVERRIDE_GFX_VERSION=11.0.0` env var may be needed so that ROCm libraries
> correctly identify the RX 7900 XTX (gfx1100).