mcp-image-gen — Architecture Assessment
Date: 2026-04-04
Author: Lumen (for Patrick / pplate)
Status: ✅ APPROVED — ready for implementation
BigMind Research Session: 39809470-6ac8-4713-adf2-79ac0eb36ba7
1. Problem Statement
LLM agents (Claude, local models via Ollama) have no native ability to generate images. While language models excel at text, creative and technical workflows increasingly need image output — concept art, diagrams, product mockups, illustrations — all driven by a text prompt.
A FastMCP wrapper around a local image generation backend would give any MCP-capable IDE or agent the ability to produce images on demand, with full control over resolution, steps, model, and seed — without sending data to external cloud APIs.
Gap being filled: local AI image generation accessible to LLM agents via MCP, running entirely on Patrick's AMD RX 7900 XTX (24GB VRAM) with ROCm.
2. Requirements
2.1 Functional Requirements
| ID | Requirement |
|---|---|
| F-1 | Generate an image from a text prompt |
| F-2 | Support configurable resolution (width × height) |
| F-3 | Support configurable inference steps and seed for reproducibility |
| F-4 | Support negative prompts to exclude unwanted content |
| F-5 | List available models from the backend |
| F-6 | Check the status of an in-progress generation job |
| F-7 | Return generated image as both a file path AND inline base64 for agent display |
| F-8 | Configure output directory for saved images |
| F-9 | Support FLUX.1-schnell as the default model |
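As a rough illustration of F-7 and F-8, the sketch below saves the PNG returned by the backend under the {timestamp}_{seed}.png naming scheme chosen in section 4.2 and builds the two content blocks. It is a minimal sketch under assumptions, not the project's implementation: the function name save_and_build_content is hypothetical, and the blocks are plain dicts mirroring the MCP TextContent and ImageContent shapes (type/text and type/data/mimeType) so the example runs on the standard library alone. The real server would construct mcp.types objects instead.

```python
import base64
import time
from pathlib import Path
from typing import Dict, List


def save_and_build_content(png_bytes: bytes, seed: int, output_dir: Path) -> List[Dict[str, str]]:
    """Persist the PNG (F-8) and return text + image content blocks (F-7).

    Assumption: dicts stand in for mcp.types.TextContent / ImageContent.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    # {timestamp}_{seed}.png: sortable by time, seed recoverable from the name
    timestamp = time.strftime("%Y%m%d-%H%M%S")
    path = output_dir / f"{timestamp}_{seed}.png"
    path.write_bytes(png_bytes)

    text_block = {"type": "text", "text": f"Saved {path} (seed={seed})"}
    image_block = {
        "type": "image",
        # base64 payload lets the client render the image inline
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }
    return [text_block, image_block]
```

Putting the text block first keeps the file path and seed readable even in clients that do not render images.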
2.2 Non-Functional Requirements
| ID | Requirement |
|---|---|
| NF-1 | Generation time < 30 seconds for FLUX.1-schnell at 1024×1024, 4 steps |
| NF-2 | VRAM footprint < 12GB (leaves headroom on 24GB for Ollama co-existence) |
| NF-3 | Must work on AMD ROCm — no CUDA-only dependencies in the MCP server layer |
| NF-4 | No cloud API calls — fully local execution |
| NF-5 | Graceful error messages when ComfyUI is not running |
| NF-6 | MCP tools must work with FastMCP and be discoverable by Claude / Roo Code |
3. Technology Decision
3.1 Candidate Backends
| Backend | Stars | ROCm | REST API | FLUX Support | Verdict |
|---|---|---|---|---|---|
| ComfyUI | 108k | ✅ Native | ✅ localhost:8188 | ✅ FLUX.1-schnell, FLUX.1-dev | ✅ CHOSEN |
| stable-diffusion.cpp | ~15k | ✅ ROCm/Vulkan | ❌ CLI only | ✅ FLUX.1-schnell | ⚠️ Viable alternative |
| PyTorch + diffusers | — | ✅ ROCm 7.2.1 | ❌ No REST | ✅ All models | ❌ Too complex to manage |
| Ollama image gen | — | ❌ Linux: N/A | ✅ /api/generate | ✅ FLUX.2, Z-Image | ❌ macOS-only as of April 2026 |
| A1111 / Forge WebUI | — | ⚠️ Limited | ✅ :7860 | ❌ SDXL primary | ❌ Not FLUX-native |
3.2 Why ComfyUI
- ROCm native — ComfyUI's PyTorch backend runs on AMD GPUs via ROCm without forks or patches.
- REST API — ComfyUI exposes a stable HTTP API at localhost:8188, making it trivially wrappable with httpx. No subprocess management or binary spawning needed.
- Workflow-based — ComfyUI workflows are JSON graphs. The MCP server ships a minimal FLUX.1-schnell workflow that can be parameterized with prompt, size, steps, and seed at runtime.
- Model ecosystem — ComfyUI's model manager supports FLUX.1, SDXL, SD3.5, ControlNet, LoRA — giving a future-proof upgrade path.
- Community size — 108k GitHub stars; extensive community support, model nodes, extensions.
- VRAM efficiency — FLUX.1-schnell requires ~8GB VRAM. Patrick's 24GB card runs it comfortably alongside Ollama.
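To make the workflow-based point concrete, here is a minimal sketch of runtime parameterization. It assumes the shipped template is an API-format workflow graph (node ID mapped to class_type and inputs) containing a single KSampler whose positive, negative, and latent_image links point directly at CLIPTextEncode and EmptyLatentImage-style nodes. Real FLUX graphs may route through other nodes (for example a custom sampler or guidance node), in which case the lookups would need adjusting. The function name parameterize_workflow is hypothetical.

```python
import copy
import random
from typing import Optional


def parameterize_workflow(template: dict, prompt: str, negative: str = "",
                          width: int = 1024, height: int = 1024,
                          steps: int = 4, seed: Optional[int] = None) -> dict:
    """Return a copy of an API-format workflow with runtime values filled in."""
    wf = copy.deepcopy(template)  # never mutate the shipped template
    if seed is None:
        seed = random.randint(0, 2**32 - 1)

    sampler = next((n for n in wf.values() if n.get("class_type") == "KSampler"), None)
    if sampler is None:
        raise ValueError("template has no KSampler node")
    s_in = sampler["inputs"]
    s_in["seed"], s_in["steps"] = seed, steps

    # Links are [node_id, output_index]; follow them instead of hardcoding IDs.
    wf[s_in["positive"][0]]["inputs"]["text"] = prompt
    wf[s_in["negative"][0]]["inputs"]["text"] = negative
    latent = wf[s_in["latent_image"][0]]["inputs"]
    latent["width"], latent["height"] = width, height
    return wf
```

Following the sampler's links keeps the code working if the template is re-exported with different node IDs. One caveat for F-4: FLUX.1-schnell is usually sampled at cfg 1.0, where ComfyUI may skip the negative conditioning entirely, so negative prompts might need a different mechanism for this model.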
3.3 Why NOT the Alternatives
- Ollama: Image generation is not available on Linux, with no announced ETA.
- stable-diffusion.cpp: CLI-based only — the MCP server would need to manage a subprocess, parse stdout, handle crashes. More fragile than an HTTP API.
- PyTorch + diffusers direct: Requires managing Python environments, device placement, model loading, memory management inside the MCP server process — adds significant complexity and risk of VRAM conflicts.
4. Architecture Decision
4.1 System Overview
┌─────────────────────────────────────────────────────────┐
│ LLM Agent (Claude / Roo Code / local Ollama)            │
└───────────────────────────┬─────────────────────────────┘
                            │ MCP Protocol (stdio)
┌───────────────────────────▼─────────────────────────────┐
│ mcp-image-gen (FastMCP Python server)                   │
│                                                         │
│ Tools:                                                  │
│  • generate_image(prompt, width, height, steps, ...)    │
│  • list_available_models()                              │
│  • get_generation_status(prompt_id)                     │
│  • get_output_directory()                               │
└───────────────────────────┬─────────────────────────────┘
                            │ HTTP REST (httpx)
┌───────────────────────────▼─────────────────────────────┐
│ ComfyUI (localhost:8188)                                │
│ AMD ROCm + PyTorch                                      │
│ FLUX.1-schnell model                                    │
└───────────────────────────┬─────────────────────────────┘
                            │
                    ┌───────▼───────┐
                    │ ~/Pictures/   │
                    │ mcp-generated │
                    └───────────────┘
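The HTTP leg between mcp-image-gen and ComfyUI comes down to a few request shapes. The helpers below are a hedged, standard-library-only sketch: a JSON body for POST /api/prompt with the workflow under "prompt" plus a client_id, extraction of image descriptors (filename, subfolder, type) from the outputs of a /api/history/{prompt_id} entry, and the /api/view URL that downloads one of them. Helper names are hypothetical, and the field names should be checked against the pinned ComfyUI version.

```python
import json
import uuid
from typing import Optional
from urllib.parse import urlencode


def build_prompt_payload(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """JSON body for POST {COMFYUI_URL}/api/prompt."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return json.dumps(body).encode("utf-8")


def extract_images(history_entry: dict) -> list:
    """Collect image descriptors from every output node of one history entry."""
    images = []
    for node_output in history_entry.get("outputs", {}).values():
        images.extend(node_output.get("images", []))
    return images


def build_view_url(base_url: str, image: dict) -> str:
    """URL for GET /api/view, which serves the raw image bytes."""
    query = urlencode({
        "filename": image["filename"],
        "subfolder": image.get("subfolder", ""),
        "type": image.get("type", "output"),
    })
    return f"{base_url.rstrip('/')}/api/view?{query}"
```

Keeping these as pure functions means the httpx calls stay thin and the request shapes can be unit-tested without respx or a running ComfyUI.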
4.2 Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| HTTP client | httpx (async) | Already used in webscraper; async-friendly; clean timeout handling |
| Image return | dual: path + base64 | File path for persistence; base64 ImageContent for inline Claude display |
| ImageContent type | mcp.types.ImageContent | FastMCP 3.x: never use fastmcp.utilities.types.Image with -> Image annotation — it breaks serialization. Return ImageContent directly as a ContentBlock. |
| Job polling | loop with sleep | ComfyUI /api/queue lists only running and pending jobs; a finished job appears under /api/history/{prompt_id}, so poll history until the entry appears or the timeout expires |
| Workflow format | ComfyUI API JSON | Minimal FLUX.1-schnell graph parameterized at runtime |
| Config | env vars | COMFYUI_URL, IMAGE_OUTPUT_DIR — no hardcoded paths |
| Output naming | {timestamp}_{seed}.png | Reproducible, collision-free, sortable |
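The job-polling decision could look roughly like this minimal async sketch. It assumes completion is detected when the prompt_id appears in the /api/history/{prompt_id} response, and it takes the HTTP call as an injected coroutine so the loop can be tested without a live server. wait_for_prompt and FetchHistory are hypothetical names.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

# Hypothetical: returns the parsed JSON of GET /api/history/{prompt_id}
FetchHistory = Callable[[str], Awaitable[dict]]


async def wait_for_prompt(prompt_id: str, fetch_history: FetchHistory,
                          timeout: float = 60.0, interval: float = 0.5) -> Optional[dict]:
    """Poll until the prompt shows up in ComfyUI's history, or time out.

    Returns the history entry, or None on timeout so the caller can hand the
    prompt_id back to the agent for manual polling.
    """
    deadline = time.monotonic() + timeout
    while True:
        history = await fetch_history(prompt_id)
        if prompt_id in history:  # finished prompts are keyed by their ID
            return history[prompt_id]
        if time.monotonic() >= deadline:
            return None
        await asyncio.sleep(interval)  # yield to the event loop between polls
```

In the server, fetch_history might wrap an httpx.AsyncClient GET on the history endpoint followed by .json(). Returning None on timeout matches the mitigation in section 5, letting the tool return the prompt_id for get_generation_status. A fuller version would also inspect the entry's status field for execution errors.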
5. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| ComfyUI not running when tool is called | High | High | Return clear error: "ComfyUI not reachable at {url}. Start with: python main.py --listen" |
| Generation timeout (>60s) | Medium | Medium | Configurable timeout; return partial status message with prompt_id so agent can poll manually |
| VRAM contention with Ollama | Medium | Medium | FLUX.1-schnell uses ~8GB, leaving ~16GB of the 24GB card. Document that running an Ollama model whose loaded footprint exceeds that remaining ~16GB at the same time will compete for VRAM |
| ROCm driver instability | Low | High | ComfyUI can be started in CPU mode (--cpu) if ROCm is unavailable; slow but functional. Document ROCm setup. |
| ComfyUI API changes | Low | Medium | Pin ComfyUI version in setup docs; the /api/prompt, /api/queue, /api/view endpoints are stable |
| Large output files | Low | Low | PNG default; add optional JPEG quality param in v2 |
| Malformed workflow JSON | Low | High | Ship a tested, minimal FLUX.1-schnell workflow; validate before submit |
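For the first risk, one way to keep an unreachable ComfyUI from surfacing as an MCP server exception is to catch transport errors at the tool boundary and return the suggested message. This is a sketch under assumptions: the helper names are hypothetical, the default URL http://127.0.0.1:8188 is assumed, and the example catches OSError (which covers ConnectionRefusedError and urllib's URLError) so it runs on the standard library. With httpx the tuple would instead be (httpx.TransportError,), since httpx exceptions do not subclass OSError.

```python
import os
from typing import Callable, Tuple, Type, TypeVar, Union

T = TypeVar("T")

# Assumed default; COMFYUI_URL overrides it (see section 4.2).
DEFAULT_COMFYUI_URL = "http://127.0.0.1:8188"


def comfyui_url() -> str:
    """Resolve the ComfyUI base URL from the environment, without trailing slash."""
    return os.environ.get("COMFYUI_URL", DEFAULT_COMFYUI_URL).rstrip("/")


def call_or_explain(
    action: Callable[[], T],
    url: str,
    unreachable: Tuple[Type[BaseException], ...] = (OSError,),
) -> Union[T, str]:
    """Run a backend call; turn connection failures into an agent-readable string."""
    try:
        return action()
    except unreachable:
        return (
            f"ComfyUI not reachable at {url}. "
            "Start with: python main.py --listen"
        )
```

Returning a string rather than raising is what the success criterion "Tool returns error string, no MCP server exception" asks for.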
6. Alternatives Considered
6.1 Ollama (Blocked)
Ollama added image generation in January 2026 (Z-Image Turbo, FLUX.2 Klein) but the feature is macOS-only as of April 2026. Linux support is listed as "coming soon" with no ETA. This was the originally preferred path (uniform API with text generation), but it is not viable on Fedora Linux today.
Migration path: When Ollama Linux image gen ships, a thin backend adapter can be added to mcp-image-gen so it routes to Ollama instead of ComfyUI, with the same MCP tool signatures and a different HTTP target.
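The migration path amounts to a small backend interface. A hedged sketch, with every name hypothetical (ImageBackend, the two backend classes, and an IMAGE_BACKEND environment variable): the MCP tools would call select_backend() and never know which HTTP target sits behind it. The generate bodies are stubs, since the Ollama side does not exist on Linux yet.

```python
import os
from typing import Mapping, Optional, Protocol


class ImageBackend(Protocol):
    """Hypothetical adapter interface shared by all image backends."""

    name: str

    def generate(self, prompt: str, width: int, height: int,
                 steps: int, seed: int) -> bytes:
        """Return PNG bytes for the requested image."""
        ...


class ComfyUIBackend:
    name = "comfyui"

    def generate(self, prompt: str, width: int, height: int,
                 steps: int, seed: int) -> bytes:
        # Stub: submit workflow to /api/prompt, poll history, fetch via /api/view.
        raise NotImplementedError


class OllamaBackend:
    name = "ollama"

    def generate(self, prompt: str, width: int, height: int,
                 steps: int, seed: int) -> bytes:
        # Stub: call Ollama's HTTP API once Linux image generation ships.
        raise NotImplementedError


_BACKENDS = {"comfyui": ComfyUIBackend, "ollama": OllamaBackend}


def select_backend(env: Optional[Mapping[str, str]] = None) -> ImageBackend:
    """Pick a backend from IMAGE_BACKEND (hypothetical), defaulting to ComfyUI."""
    env = os.environ if env is None else env
    choice = env.get("IMAGE_BACKEND", "comfyui").lower()
    if choice not in _BACKENDS:
        raise ValueError(f"Unknown IMAGE_BACKEND {choice!r}; expected one of {sorted(_BACKENDS)}")
    return _BACKENDS[choice]()
```

A structural Protocol keeps the adapters decoupled: neither backend has to import or inherit from the interface.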
6.2 stable-diffusion.cpp
DiffuGen MCP server uses this approach. Requires:
- Building sd.cpp with ROCm/Vulkan flags
- Spawning a subprocess and parsing CLI output
- No REST API — process management in Python
Viable, but more fragile than ComfyUI's HTTP API; a fallback only if ComfyUI proves unworkable.
6.3 diffusers (Python library, direct)
Would run diffusion pipeline inside the MCP server process. Problems:
- MCP server process cannot easily share GPU memory with Ollama
- Model loading adds a 5-15s cold start to the first call after each server start, and to every call if the model is unloaded between calls to free VRAM
- Complex device placement / fp16 / ROCm configuration in server code
- Risk: VRAM OOM crashes the MCP server process entirely
7. Success Criteria
| Criterion | Measure |
|---|---|
| generate_image returns a valid PNG | File exists on disk; base64 decodes to valid PNG bytes |
| Claude can display the image inline | ImageContent returned in tool response, visible in Roo Code chat |
| FLUX.1-schnell at 1024×1024 4-step completes in <30s | Measured on RX 7900 XTX with ROCm |
| list_available_models returns ComfyUI model list | At minimum includes flux1-schnell.safetensors |
| ComfyUI offline → clear error, not crash | Tool returns error string, no MCP server exception |
| All pytest tests pass | uv run pytest tests/ -v exits 0 with ≥80% coverage |
| Server wired into .roo/mcp.json | Tool appears in Roo Code MCP tool list |
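The first success criterion can be checked in pytest without extra dependencies. The helper below has a hypothetical name and performs only a structural check: it verifies that the string is strict base64 and that the decoded bytes start with the 8-byte PNG signature followed by a 13-byte IHDR chunk, as the PNG format requires. It does not decode pixel data; a stricter test could additionally open the bytes with Pillow and call verify().

```python
import base64
import binascii
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_b64(data: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) if data is base64 for a PNG header, else None."""
    try:
        raw = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Signature (8) + chunk length (4) + chunk type (4) + IHDR payload (13)
    if len(raw) < 29 or not raw.startswith(PNG_SIGNATURE):
        return None
    length, chunk_type = struct.unpack(">I4s", raw[8:16])
    if chunk_type != b"IHDR" or length != 13:
        return None
    # IHDR begins with big-endian width and height
    width, height = struct.unpack(">II", raw[16:24])
    return width, height
```

Returning the dimensions lets a test assert the requested 1024×1024 resolution as well as PNG validity.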
8. Open Questions
| # | Question | Owner | Priority |
|---|---|---|---|
| Q1 | Should generate_image be synchronous (block until done) or return a prompt_id immediately? | Patrick | High — MVP will be synchronous; async polling is v2 |
| Q2 | Default output directory: ~/Pictures/mcp-generated or ~/mcp-images? | Patrick | Low — configurable via env var |
| Q3 | Should we support SDXL as a second model in v1, or FLUX.1-schnell only? | Patrick | Low — FLUX.1-schnell only for v1 |
| Q4 | WebSocket API vs REST polling for job status? | — | ComfyUI has both; REST polling is simpler for v1 |