diff --git a/mcp/mcp-image-gen/ASSESSMENT.md b/mcp/mcp-image-gen/ASSESSMENT.md new file mode 100644 index 0000000..5d9dd4d --- /dev/null +++ b/mcp/mcp-image-gen/ASSESSMENT.md @@ -0,0 +1,199 @@ +# mcp-image-gen — Architecture Assessment + +**Date:** 2026-04-04 +**Author:** Lumen (for Patrick / pplate) +**Status:** ✅ APPROVED — ready for implementation +**BigMind Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7` + +--- + +## 1. Problem Statement + +LLM agents (Claude, local models via Ollama) have no native ability to generate images. While +language models excel at text, creative and technical workflows increasingly need image output — +concept art, diagrams, product mockups, illustrations — all driven by a text prompt. + +A FastMCP wrapper around a local image generation backend would give any MCP-capable IDE or +agent the ability to produce images on demand, with full control over resolution, steps, model, +and seed — without sending data to external cloud APIs. + +**Gap being filled:** Local AI image generation accessible to LLM agents via MCP protocol, +running entirely on Patrick's AMD RX 7900 XTX (24GB VRAM) with ROCm. + +--- + +## 2. 
Requirements + +### 2.1 Functional Requirements + +| ID | Requirement | +|----|-------------| +| F-1 | Generate an image from a text prompt | +| F-2 | Support configurable resolution (width × height) | +| F-3 | Support configurable inference steps and seed for reproducibility | +| F-4 | Support negative prompts to exclude unwanted content | +| F-5 | List available models from the backend | +| F-6 | Check the status of an in-progress generation job | +| F-7 | Return generated image as both a file path AND inline base64 for agent display | +| F-8 | Configure output directory for saved images | +| F-9 | Support FLUX.1-schnell as the default model | + +### 2.2 Non-Functional Requirements + +| ID | Requirement | +|----|-------------| +| NF-1 | Generation time < 30 seconds for FLUX.1-schnell at 1024×1024, 4 steps | +| NF-2 | VRAM footprint < 12GB (leaves headroom on 24GB for Ollama co-existence) | +| NF-3 | Must work on AMD ROCm — no CUDA-only dependencies in the MCP server layer | +| NF-4 | No cloud API calls — fully local execution | +| NF-5 | Graceful error messages when ComfyUI is not running | +| NF-6 | MCP tools must work with FastMCP and be discoverable by Claude / Roo Code | + +--- + +## 3. Technology Decision + +### 3.1 Candidate Backends + +| Backend | Stars | ROCm | REST API | FLUX Support | Verdict | +|---------|-------|------|----------|--------------|---------| +| **ComfyUI** | 108k | ✅ Native | ✅ localhost:8188 | ✅ FLUX.1-schnell, FLUX.1-dev | ✅ **CHOSEN** | +| stable-diffusion.cpp | ~15k | ✅ ROCm/Vulkan | ❌ CLI only | ✅ FLUX.1-schnell | ⚠️ Viable alternative | +| PyTorch + diffusers | — | ✅ ROCm 7.2.1 | ❌ No REST | ✅ All models | ❌ Too complex to manage | +| Ollama image gen | — | ❌ Linux: N/A | ✅ /api/generate | ✅ FLUX.2, Z-Image | ❌ macOS-only as of April 2026 | +| A1111 / Forge WebUI | — | ⚠️ Limited | ✅ :7860 | ❌ SDXL primary | ❌ Not FLUX-native | + +### 3.2 Why ComfyUI + +1. 
**ROCm native** — ComfyUI's PyTorch backend runs on AMD GPUs via ROCm without forks or patches. +2. **REST API** — ComfyUI exposes a stable HTTP API at `localhost:8188` making it trivially + wrappable with `httpx`. No subprocess management or binary spawning needed. +3. **Workflow-based** — ComfyUI workflows are JSON graphs. The MCP server ships a minimal + FLUX.1-schnell workflow that can be parameterized with prompt, size, steps, seed at runtime. +4. **Model ecosystem** — ComfyUI's model manager supports FLUX.1, SDXL, SD3.5, ControlNet, + LoRA — giving a future-proof upgrade path. +5. **Community size** — 108k GitHub stars; extensive community support, model nodes, extensions. +6. **VRAM efficiency** — FLUX.1-schnell requires ~8GB VRAM. Patrick's 24GB card runs it + comfortably alongside Ollama. + +### 3.3 Why NOT the Alternatives + +- **Ollama:** Definitively blocked on Linux until further notice. No ETA for Linux image gen. +- **stable-diffusion.cpp:** CLI-based only — the MCP server would need to manage a subprocess, + parse stdout, handle crashes. More fragile than an HTTP API. +- **PyTorch + diffusers direct:** Requires managing Python environments, device placement, model + loading, memory management inside the MCP server process — adds significant complexity and + risk of VRAM conflicts. + +--- + +## 4. Architecture Decision + +### 4.1 System Overview + +``` +┌─────────────────────────────────────────────────────────┐ +│ LLM Agent (Claude / Roo Code / local Ollama) │ +└───────────────────────────┬─────────────────────────────┘ + │ MCP Protocol (stdio) +┌───────────────────────────▼─────────────────────────────┐ +│ mcp-image-gen (FastMCP Python server) │ +│ │ +│ Tools: │ +│ • generate_image(prompt, width, height, steps, ...) 
│ +│ • list_available_models() │ +│ • get_generation_status(prompt_id) │ +│ • get_output_directory() │ +└───────────────────────────┬─────────────────────────────┘ + │ HTTP REST (httpx) +┌───────────────────────────▼─────────────────────────────┐ +│ ComfyUI (localhost:8188) │ +│ AMD ROCm + PyTorch │ +│ FLUX.1-schnell model │ +└─────────────────────────────────────────────────────────┘ + │ + ┌───────▼───────┐ + │ ~/Pictures/ │ + │ mcp-generated│ + └───────────────┘ +``` + +### 4.2 Key Decisions + +| Decision | Choice | Rationale | +|----------|--------|-----------| +| HTTP client | `httpx` (async) | Already used in webscraper; async-friendly; clean timeout handling | +| Image return | dual: path + base64 | File path for persistence; base64 `ImageContent` for inline Claude display | +| ImageContent type | `mcp.types.ImageContent` | FastMCP 3.x: **never** use `fastmcp.utilities.types.Image` with `-> Image` annotation — it breaks serialization. Return `ImageContent` directly as a `ContentBlock`. | +| Job polling | loop with sleep | ComfyUI `/api/queue` returns pending/running/done status; poll until done or timeout | +| Workflow format | ComfyUI API JSON | Minimal FLUX.1-schnell graph parameterized at runtime | +| Config | env vars | `COMFYUI_URL`, `IMAGE_OUTPUT_DIR` — no hardcoded paths | +| Output naming | `{timestamp}_{seed}.png` | Reproducible, collision-free, sortable | + +--- + +## 5. Risks + +| Risk | Likelihood | Impact | Mitigation | +|------|------------|--------|------------| +| ComfyUI not running when tool is called | High | High | Return clear error: "ComfyUI not reachable at {url}. Start with: `python main.py --listen`" | +| Generation timeout (>60s) | Medium | Medium | Configurable timeout; return partial status message with `prompt_id` so agent can poll manually | +| VRAM contention with Ollama | Medium | Medium | FLUX.1-schnell uses ~8GB; 24GB card has 16GB headroom. 
Document that running both simultaneously may compete for VRAM once Ollama model sizes exceed ~16GB | +| ROCm driver instability | Low | High | ComfyUI falls back to CPU if ROCm unavailable — slow but functional. Document ROCm setup. | +| ComfyUI API changes | Low | Medium | Pin ComfyUI version in setup docs; the `/api/prompt`, `/api/queue`, `/api/view` endpoints are stable | +| Large output files | Low | Low | PNG default; add optional JPEG quality param in v2 | +| Malformed workflow JSON | Low | High | Ship a tested, minimal FLUX.1-schnell workflow; validate before submit | + +--- + +## 6. Alternatives Considered + +### 6.1 Ollama (Blocked) +Ollama added image generation in January 2026 (Z-Image Turbo, FLUX.2 Klein) but the feature is +**macOS-only** as of April 2026. Linux support is listed as "coming soon" with no ETA. This was +the originally preferred path (uniform API with text generation), but it is not viable on Fedora +Linux today. + +**Migration path:** When Ollama Linux image gen ships, a thin backend adapter can be added to +`mcp-image-gen` so it routes to Ollama instead of ComfyUI — same MCP tool signatures, different +HTTP target. + +### 6.2 stable-diffusion.cpp +DiffuGen MCP server uses this approach. Requires: +- Building sd.cpp with ROCm/Vulkan flags +- Spawning a subprocess and parsing CLI output +- No REST API — process management in Python + +Viable but more fragile than ComfyUI's HTTP API. Would be chosen only if ComfyUI proves unworkable. + +### 6.3 diffusers (Python library, direct) +Would run the diffusion pipeline inside the MCP server process. Problems: +- MCP server process cannot easily share GPU memory with Ollama +- Model loading adds 5-15s cold start to every MCP invocation +- Complex device placement / fp16 / ROCm configuration in server code +- Risk: VRAM OOM crashes the MCP server process entirely + +--- + +## 7. 
Success Criteria + +| Criterion | Measure | +|-----------|---------| +| `generate_image` returns a valid PNG | File exists on disk, base64 decodes to valid PNG bytes | +| Claude can display the image inline | `ImageContent` returned in tool response, visible in Roo Code chat | +| FLUX.1-schnell at 1024×1024 4-step completes in <30s | Measured on RX 7900 XTX with ROCm | +| `list_available_models` returns ComfyUI model list | At minimum includes `flux1-schnell.safetensors` | +| ComfyUI offline → clear error, not crash | Tool returns error string, no MCP server exception | +| All pytest tests pass | `uv run pytest tests/ -v` exits 0 with ≥80% coverage | +| Server wired into `.roo/mcp.json` | Tool appears in Roo Code MCP tool list | + +--- + +## 8. Open Questions + +| # | Question | Owner | Priority | +|---|----------|-------|----------| +| Q1 | Should `generate_image` be synchronous (block until done) or return a `prompt_id` immediately? | Patrick | High — MVP will be synchronous; async polling is v2 | +| Q2 | Default output directory: `~/Pictures/mcp-generated` or `~/mcp-images`? | Patrick | Low — configurable via env var | +| Q3 | Should we support SDXL as a second model in v1, or FLUX.1-schnell only? | Patrick | Low — FLUX.1-schnell only for v1 | +| Q4 | WebSocket API vs REST polling for job status? | — | ComfyUI has both; REST polling is simpler for v1 | diff --git a/mcp/mcp-image-gen/PLAN.md b/mcp/mcp-image-gen/PLAN.md new file mode 100644 index 0000000..c24cf7e --- /dev/null +++ b/mcp/mcp-image-gen/PLAN.md @@ -0,0 +1,496 @@ +# mcp-image-gen — Implementation Plan + +**Date:** 2026-04-04 +**Author:** Lumen (for Patrick / pplate) +**Status:** Ready for implementation +**Assessment:** [ASSESSMENT.md](./ASSESSMENT.md) +**Research Session:** `39809470-6ac8-4713-adf2-79ac0eb36ba7` + +--- + +## 1. 
Directory Structure + +``` +mcp/mcp-image-gen/ +├── ASSESSMENT.md ← Architecture assessment (this session) +├── PLAN.md ← This file +├── README.md ← Usage docs, tool table, env vars +├── pyproject.toml ← uv project + deps +├── run.sh ← Launch script (used by .roo/mcp.json) +├── src/ +│ ├── __init__.py +│ ├── server.py ← FastMCP server + all tools +│ └── workflows/ +│ └── flux_schnell.json ← Minimal ComfyUI API-format workflow +└── tests/ + ├── __init__.py + ├── conftest.py ← sys.path + shared fixtures + └── test_server.py ← All tool tests (mocked ComfyUI) +``` + +--- + +## 2. Tool Definitions + +### 2.1 `generate_image` + +```python +@mcp.tool() +async def generate_image( + prompt: str, + width: int = 1024, + height: int = 1024, + steps: int = 4, + model: str = "flux1-schnell.safetensors", + seed: int = -1, + negative_prompt: str = "", + output_dir: str = "", +) -> list: + """ + Generate an image from a text prompt using ComfyUI. + + Returns both a file path (for persistence) and an inline base64 image + (for display in Claude / Roo Code chat). + + Args: + prompt: Text description of the image to generate. + width: Image width in pixels (default: 1024). + height: Image height in pixels (default: 1024). + steps: Number of inference steps. FLUX.1-schnell works well at 4. + model: ComfyUI model filename (default: flux1-schnell.safetensors). + seed: Random seed for reproducibility. -1 = random. + negative_prompt: Things to exclude from the image (optional). + output_dir: Override output directory. Defaults to IMAGE_OUTPUT_DIR env var + or ~/Pictures/mcp-generated. + + Returns: + [TextContent(path + metadata), ImageContent(base64 PNG)] + """ +``` + +**Return type:** `list` containing: +1. `mcp.types.TextContent` — human-readable summary with file path, seed, elapsed time +2. `mcp.types.ImageContent` — `type="image"`, `data=base64_encoded_png`, `mimeType="image/png"` + +> ⚠️ **FastMCP 3.x rule:** NEVER annotate return as `-> Image` (fastmcp utility type). 
It triggers +> `output_schema` generation which breaks the early-return path. Return `mcp.types.ImageContent` +> directly as part of a `list` — it is a `ContentBlock` and passes through cleanly. + +--- + +### 2.2 `list_available_models` + +```python +@mcp.tool() +async def list_available_models() -> str: + """ + List all checkpoint models available in ComfyUI. + + Returns a newline-separated list of model filenames. + Requires ComfyUI to be running at COMFYUI_URL. + """ +``` + +**Implementation:** `GET {COMFYUI_URL}/object_info/CheckpointLoaderSimple` → parse +`input.required.ckpt_name[0]` list → join with newlines. + +--- + +### 2.3 `get_generation_status` + +```python +@mcp.tool() +async def get_generation_status(prompt_id: str) -> str: + """ + Check the status of a queued or running generation job. + + Args: + prompt_id: The prompt ID returned by a previous generate_image call. + + Returns: + Status string: "pending", "running", "completed", or "not_found". + """ +``` + +**Implementation:** `GET {COMFYUI_URL}/api/queue` → check `queue_running` and `queue_pending` +lists for matching `prompt_id`. If not found in either, check history endpoint. + +--- + +### 2.4 `get_output_directory` + +```python +@mcp.tool() +def get_output_directory() -> str: + """ + Return the directory where generated images are saved. + + Returns: + Absolute path to the output directory. + """ +``` + +**Implementation:** Resolve `IMAGE_OUTPUT_DIR` env var or default `~/Pictures/mcp-generated`, +expand `~`, return as string. + +--- + +## 3. ComfyUI Integration + +### 3.1 Workflow: Submit → Poll → Retrieve + +``` +generate_image() + │ + ├── 1. Load flux_schnell.json workflow template + ├── 2. Parameterize: inject prompt, width, height, steps, seed, model + ├── 3. POST {COMFYUI_URL}/api/prompt → {"prompt_id": "uuid"} + │ + ├── 4. 
POLL loop (max 120s, sleep 2s between) + │ GET {COMFYUI_URL}/api/queue + │ → check queue_running[].prompt_id == our id + │ → check queue_pending[].prompt_id == our id + │ → if neither: job is done + │ + ├── 5. GET {COMFYUI_URL}/api/history/{prompt_id} + │ → find output image filename + subfolder + │ + ├── 6. GET {COMFYUI_URL}/api/view?filename={name}&subfolder={subfolder}&type=output + │ → raw PNG bytes + │ + ├── 7. Save PNG to output_dir/{timestamp}_{seed}.png + └── 8. Return [TextContent(path + meta), ImageContent(base64)] +``` + +### 3.2 API Endpoints Used + +| Endpoint | Method | Purpose | +|----------|--------|---------| +| `/api/prompt` | POST | Submit workflow for generation | +| `/api/queue` | GET | Poll queue status (pending + running) | +| `/api/history/{prompt_id}` | GET | Get completed job output filenames | +| `/api/view` | GET | Download image bytes by filename | +| `/object_info/CheckpointLoaderSimple` | GET | List available checkpoint models | + +### 3.3 Error Handling + +| Condition | Response | +|-----------|----------| +| ComfyUI unreachable | `"ComfyUI not reachable at {url}. Start it with: python main.py --listen"` | +| Timeout (>120s) | `"Generation timed out after 120s. prompt_id={id} — use get_generation_status to check"` | +| ComfyUI returns error in history | Extract and return the error message from history response | +| Invalid model name | ComfyUI returns error in history; surface it clearly | +| Output dir not writable | `"Cannot write to output directory: {path}"` | + +--- + +## 4. Configuration + +All configuration via environment variables. No hardcoded paths. 
+ +| Variable | Default | Description | +|----------|---------|-------------| +| `COMFYUI_URL` | `http://localhost:8188` | Base URL of running ComfyUI instance | +| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Where to save generated PNG files | +| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation (int) | + +### `.roo/mcp.json` entry (to be added during implementation): + +```json +"mcp-image-gen": { + "command": "uv", + "args": [ + "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", + "run", "src/server.py" + ], + "env": { + "COMFYUI_URL": "http://localhost:8188", + "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated" + } +} +``` + +--- + +## 5. `pyproject.toml` + +```toml +[project] +name = "mcp-image-gen" +version = "0.1.0" +requires-python = ">=3.11" +description = "MCP server for local AI image generation via ComfyUI" +dependencies = [ + "fastmcp>=0.1.0", + "httpx>=0.27.0", + "pillow>=10.0.0", +] + +[project.optional-dependencies] +test = [ + "pytest>=7.0", + "pytest-mock>=3.0", + "pytest-cov>=4.0", + "pytest-asyncio>=0.23", +] + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[tool.pytest.ini_options] +asyncio_mode = "auto" +``` + +**Dependency rationale:** +- `fastmcp` — MCP framework +- `httpx` — async HTTP client for ComfyUI REST API +- `pillow` — validate PNG output, potential future thumbnail generation +- `pytest-asyncio` — needed for async tool tests + +--- + +## 6. FLUX.1-schnell Workflow JSON + +The minimal ComfyUI API-format workflow for FLUX.1-schnell text-to-image. +This is the "API format" (node-graph JSON), not the UI export format. 
+ +File: `src/workflows/flux_schnell.json` + +```json +{ + "6": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["30", 1], + "text": "PROMPT_PLACEHOLDER" + } + }, + "8": { + "class_type": "VAEDecode", + "inputs": { + "samples": ["13", 0], + "vae": ["30", 2] + } + }, + "9": { + "class_type": "SaveImage", + "inputs": { + "filename_prefix": "mcp-image-gen", + "images": ["8", 0] + } + }, + "13": { + "class_type": "KSampler", + "inputs": { + "cfg": 1.0, + "denoise": 1.0, + "latent_image": ["27", 0], + "model": ["30", 0], + "negative": ["33", 0], + "positive": ["6", 0], + "sampler_name": "euler", + "scheduler": "simple", + "seed": 42, + "steps": 4 + } + }, + "27": { + "class_type": "EmptySD3LatentImage", + "inputs": { + "batch_size": 1, + "height": 1024, + "width": 1024 + } + }, + "30": { + "class_type": "CheckpointLoaderSimple", + "inputs": { + "ckpt_name": "flux1-schnell.safetensors" + } + }, + "33": { + "class_type": "CLIPTextEncode", + "inputs": { + "clip": ["30", 1], + "text": "NEGATIVE_PLACEHOLDER" + } + } +} +``` + +**Parameterization at runtime** (in `server.py`): + +```python +import copy, json, random +from pathlib import Path + +def _build_workflow(prompt, negative_prompt, width, height, steps, seed, model): + with open(Path(__file__).parent / "workflows/flux_schnell.json") as f: + wf = json.load(f) + wf = copy.deepcopy(wf) + wf["6"]["inputs"]["text"] = prompt + wf["33"]["inputs"]["text"] = negative_prompt + wf["27"]["inputs"]["width"] = width + wf["27"]["inputs"]["height"] = height + wf["13"]["inputs"]["steps"] = steps + wf["13"]["inputs"]["seed"] = seed if seed != -1 else random.randint(0, 2**32 - 1) + wf["30"]["inputs"]["ckpt_name"] = model + return wf +``` + +--- + +## 7. Testing Strategy + +### 7.1 Test Structure (`tests/test_server.py`) + +All tests mock `httpx.AsyncClient` — no real ComfyUI needed. 
+ +| Test | Description | +|------|-------------| +| `test_generate_image_happy_path` | Mock submit → poll done → history → view → returns TextContent + ImageContent | +| `test_generate_image_comfyui_offline` | httpx.ConnectError → returns clear error string | +| `test_generate_image_timeout` | Poll loop exceeds COMFYUI_TIMEOUT → returns timeout message with prompt_id | +| `test_generate_image_saves_file` | Verify PNG written to output_dir with correct filename pattern | +| `test_generate_image_random_seed` | seed=-1 → seed in output filename is a valid integer | +| `test_generate_image_custom_params` | Non-default width/height/steps/model passed through to workflow | +| `test_generate_image_returns_image_content` | Second item in result list is `mcp.types.ImageContent` with valid base64 | +| `test_list_available_models_happy_path` | Mock object_info response → returns model name list | +| `test_list_available_models_offline` | ConnectError → returns error string | +| `test_get_generation_status_pending` | prompt_id found in queue_pending → "pending" | +| `test_get_generation_status_running` | prompt_id found in queue_running → "running" | +| `test_get_generation_status_not_found` | prompt_id not in queue, not in history → "not_found" | +| `test_get_output_directory_default` | No env var → returns expanded ~/Pictures/mcp-generated | +| `test_get_output_directory_custom` | IMAGE_OUTPUT_DIR set → returns that path | +| `test_build_workflow_parameterization` | _build_workflow() injects all params correctly into JSON | + +### 7.2 conftest.py fixtures + +```python +import sys +from pathlib import Path +import pytest + +sys.path.insert(0, str(Path(__file__).parent.parent / "src")) + +@pytest.fixture +def mock_comfyui_submit_response(): + return {"prompt_id": "test-uuid-1234"} + +@pytest.fixture +def mock_comfyui_queue_empty(): + return {"queue_running": [], "queue_pending": []} + +@pytest.fixture +def mock_comfyui_history(): + return { + "test-uuid-1234": { + "outputs": 
{ + "9": { + "images": [{"filename": "mcp-image-gen_00001_.png", "subfolder": "", "type": "output"}] + } + } + } + } + +@pytest.fixture +def sample_png_bytes(): + """Minimal valid 1x1 PNG in bytes.""" + import base64 + # 1x1 red pixel PNG + data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8BQDwADhQGAWjR9awAAAABJRU5ErkJggg==" + return base64.b64decode(data) +``` + +### 7.3 Run command + +```bash +cd mcp/mcp-image-gen && uv run pytest tests/ -v --cov=src --cov-report=term-missing +``` + +--- + +## 8. `run.sh` + +```bash +#!/usr/bin/env bash +BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +export PATH="$HOME/.local/bin:$PATH" + +# Create output dir if it doesn't exist +OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}" +mkdir -p "$OUTPUT_DIR" + +cd "$BASEDIR" +exec uv run src/server.py +``` + +--- + +## 9. Future: Ollama Migration Path + +When Ollama adds Linux image generation support (ETA unknown, announced "coming soon" April 2026): + +### Adapter pattern (no breaking changes to MCP tool signatures) + +```python +BACKEND = os.getenv("IMAGE_BACKEND", "comfyui") # or "ollama" + +async def _generate_comfyui(prompt, width, height, steps, model, seed, negative_prompt, output_dir): + # current ComfyUI implementation + ... + +async def _generate_ollama(prompt, width, height, steps, model, seed, negative_prompt, output_dir): + # POST http://localhost:11434/api/generate + # with model=Z-Image-Turbo or FLUX.2-Klein + # width, height, steps in request body + # save returned image path + ... + +@mcp.tool() +async def generate_image(prompt, width=1024, height=1024, steps=4, ...): + if BACKEND == "ollama": + return await _generate_ollama(...) + return await _generate_comfyui(...) +``` + +**No changes to:** tool signatures, return types, env vars (add `IMAGE_BACKEND`), tests structure. + +--- + +## 10. Implementation Order (for Code mode) + +1. `src/workflows/flux_schnell.json` — write and validate JSON structure +2. 
`pyproject.toml` — set up project + deps +3. `src/__init__.py` — empty +4. `src/server.py` — implement all 4 tools + `_build_workflow` + polling helpers +5. `tests/conftest.py` — fixtures + sys.path +6. `tests/test_server.py` — all 15 tests +7. `run.sh` — launch script +8. `README.md` — usage docs +9. `.roo/mcp.json` — wire server in (requires switching to Code or Homelab mode for that file) +10. `uv sync && uv run pytest tests/ -v` — confirm all tests pass + +--- + +## 11. ComfyUI Setup Notes (for README) + +These are prerequisites for the MCP server to work. Patrick must have ComfyUI installed: + +```bash +# Install ComfyUI from source (ROCm/AMD); main.py lives in the repository root +git clone https://github.com/comfyanonymous/ComfyUI && cd ComfyUI && pip install -r requirements.txt + +# Download FLUX.1-schnell model (~8GB) +# Place in ComfyUI/models/checkpoints/flux1-schnell.safetensors +# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell + +# Start ComfyUI with AMD ROCm +HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen + +# Verify API is running +curl http://localhost:8188/system_stats +``` + +> The `HSA_OVERRIDE_GFX_VERSION=11.0.0` env var may be needed for RX 7900 XTX (gfx1100) +> to be identified correctly by ROCm libraries. diff --git a/mcp/mcp-image-gen/README.md b/mcp/mcp-image-gen/README.md new file mode 100644 index 0000000..dea9a71 --- /dev/null +++ b/mcp/mcp-image-gen/README.md @@ -0,0 +1,178 @@ +# mcp-image-gen + +**FastMCP server for AI image generation via ComfyUI.** + +This MCP server wraps a locally running [ComfyUI](https://github.com/comfyanonymous/ComfyUI) instance, exposing image generation as MCP tools callable from Roo Code, Claude Desktop, or any MCP-compatible client. It supports FLUX.1-schnell, FLUX.1-dev, SDXL, and any other ComfyUI-compatible checkpoint model. Generated images are saved to disk **and** returned as inline base64 so Claude can display them directly in chat. + +--- + +## Prerequisites + +1. **ComfyUI** installed and running at `http://localhost:8188` +2. At least one checkpoint model downloaded (see ComfyUI Setup below) +3. 
**Python 3.11+** and **uv** installed on the system + +--- + +## Installation + +```bash +cd mcp/mcp-image-gen +uv sync +``` + +--- + +## Configuration + +All configuration is via environment variables: + +| Variable | Default | Description | +|---|---|---| +| `COMFYUI_URL` | `http://localhost:8188` | Base URL of the running ComfyUI instance | +| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Directory where generated PNG files are saved | +| `COMFYUI_TIMEOUT` | `120` | Max seconds to wait for generation before timeout | + +--- + +## Usage + +### Add to `.roo/mcp.json` (Roo Code) + +```json +"mcp-image-gen": { + "command": "uv", + "args": [ + "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", + "run", "src/server.py" + ], + "env": { + "COMFYUI_URL": "http://localhost:8188", + "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated" + } +} +``` + +### Add to Claude Desktop (`claude_desktop_config.json`) + +```json +{ + "mcpServers": { + "mcp-image-gen": { + "command": "uv", + "args": [ + "--directory", "/home/pplate/pi_mcps/mcp/mcp-image-gen", + "run", "src/server.py" + ], + "env": { + "COMFYUI_URL": "http://localhost:8188", + "IMAGE_OUTPUT_DIR": "/home/pplate/Pictures/mcp-generated" + } + } + } +} +``` + +### Run directly + +```bash +cd mcp/mcp-image-gen +./run.sh +``` + +--- + +## Available Tools + +| Tool | Description | +|---|---| +| `generate_image` | Generate an image from a text prompt. Returns file path + inline base64 PNG. | +| `list_available_models` | List all checkpoint models loaded in ComfyUI. | +| `get_generation_status` | Check status of a running/queued generation by `prompt_id`. | +| `get_output_directory` | Return the current output directory path. 
| + +### `generate_image` parameters + +| Parameter | Default | Description | +|---|---|---| +| `prompt` | *(required)* | Text description of the image | +| `width` | `1024` | Image width in pixels | +| `height` | `1024` | Image height in pixels | +| `steps` | `4` | Inference steps (FLUX.1-schnell: 4 is optimal) | +| `model` | `flux1-schnell.safetensors` | Checkpoint model filename | +| `seed` | `-1` | Seed for reproducibility (`-1` = random) | +| `negative_prompt` | `""` | Things to exclude from the image | +| `output_dir` | *(IMAGE_OUTPUT_DIR)* | Override output directory | + +--- + +## ComfyUI Setup (Fedora + AMD ROCm) + +```bash +# Install ComfyUI from source (main.py lives in the repository root) +git clone https://github.com/comfyanonymous/ComfyUI && cd ComfyUI && pip install -r requirements.txt + +# Download FLUX.1-schnell model (~8GB, Apache 2.0) +# Place in: ComfyUI/models/checkpoints/flux1-schnell.safetensors +# Source: https://huggingface.co/black-forest-labs/FLUX.1-schnell + +# Start ComfyUI with ROCm support for AMD RX 7900 XTX +HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py --listen + +# Verify the API is reachable +curl http://localhost:8188/system_stats +``` + +> **Note:** `HSA_OVERRIDE_GFX_VERSION=11.0.0` may be needed for the RX 7900 XTX (gfx1100) +> to be recognized correctly by ROCm libraries. + +### PyTorch with ROCm (if needed separately) + +```bash +pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1 +``` + +--- + +## Testing + +```bash +cd mcp/mcp-image-gen +uv run pytest tests/ -v +``` + +All tests mock the ComfyUI HTTP API — no running ComfyUI instance needed. + +--- + +## Ollama Migration Path + +When Ollama adds Linux image generation support (announced "coming soon" as of April 2026, currently macOS-only), this server can switch backends via a single env var: + +```bash +IMAGE_BACKEND=ollama # currently only "comfyui" is implemented +``` + +The tool signatures, return types, and MCP interface will remain unchanged — only the underlying HTTP calls switch from ComfyUI to Ollama's `/api/generate` endpoint. 
+ +--- + +## Architecture + +``` +Roo Code / Claude Desktop + │ + │ MCP (stdio) + ▼ + mcp-image-gen (FastMCP) + │ + │ HTTP REST + ▼ + ComfyUI @ localhost:8188 + │ + │ ROCm / AMD GPU + ▼ + FLUX.1-schnell / SDXL / SD3.5 +``` + +The server submits a FLUX.1-schnell ComfyUI API-format workflow, polls until complete, downloads the PNG, saves it to disk, and returns both a text summary and a base64-encoded inline image. diff --git a/mcp/mcp-image-gen/pyproject.toml b/mcp/mcp-image-gen/pyproject.toml new file mode 100644 index 0000000..acd24b5 --- /dev/null +++ b/mcp/mcp-image-gen/pyproject.toml @@ -0,0 +1,41 @@ +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "mcp-image-gen" +version = "0.1.0" +description = "MCP server for AI image generation via ComfyUI (FLUX, SDXL)" +readme = "README.md" +requires-python = ">=3.11" +license = "MIT" +authors = [{name = "Patrick Plate", email = "patrickplate@gmx.de"}] +dependencies = [ + "fastmcp>=2.0.0", + "httpx>=0.27.0", + "pillow>=10.0.0", +] + +[tool.hatch.version] +path = "src/__init__.py" + +[tool.hatch.build.targets.sdist] +include = ["/src", "/tests"] + +[tool.hatch.build.targets.wheel] +include = ["/src", "/tests"] + +[tool.pytest.ini_options] +testpaths = ["tests"] +python_files = "test_*.py" +python_classes = "Test*" +python_functions = "test_*" +asyncio_mode = "auto" + +[dependency-groups] +dev = [ + "pytest>=8.0.0", + "pytest-asyncio>=0.23.0", + "respx>=0.21.0", + "pillow>=10.0.0", +] diff --git a/mcp/mcp-image-gen/run.sh b/mcp/mcp-image-gen/run.sh new file mode 100755 index 0000000..9a03279 --- /dev/null +++ b/mcp/mcp-image-gen/run.sh @@ -0,0 +1,13 @@ +#!/usr/bin/env bash +# Run mcp-image-gen MCP server +set -euo pipefail + +BASEDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +export PATH="$HOME/.local/bin:$PATH" + +# Create output dir if it doesn't exist +OUTPUT_DIR="${IMAGE_OUTPUT_DIR:-$HOME/Pictures/mcp-generated}" +mkdir -p "$OUTPUT_DIR" + +cd "$BASEDIR" +exec uv run 
src/server.py diff --git a/mcp/mcp-image-gen/src/__init__.py b/mcp/mcp-image-gen/src/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/mcp/mcp-image-gen/src/server.py b/mcp/mcp-image-gen/src/server.py new file mode 100644 index 0000000..8451b7d --- /dev/null +++ b/mcp/mcp-image-gen/src/server.py @@ -0,0 +1,384 @@ +"""mcp-image-gen — FastMCP server for AI image generation via ComfyUI.""" + +import asyncio +import base64 +import copy +import json +import os +import random +import time +from datetime import datetime +from pathlib import Path + +import httpx +from fastmcp import FastMCP +from mcp.types import ImageContent, TextContent + +# --------------------------------------------------------------------------- +# Configuration +# --------------------------------------------------------------------------- + +COMFYUI_URL = os.environ.get("COMFYUI_URL", "http://localhost:8188").rstrip("/") +IMAGE_OUTPUT_DIR = os.environ.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated") +COMFYUI_TIMEOUT = int(os.environ.get("COMFYUI_TIMEOUT", "120")) + +# Path to the bundled FLUX.1-schnell workflow template +_WORKFLOW_PATH = Path(__file__).parent / "workflows" / "flux_schnell.json" + +mcp = FastMCP("mcp-image-gen") + + +# --------------------------------------------------------------------------- +# ComfyUI client +# --------------------------------------------------------------------------- + +class ComfyUIClient: + """Async HTTP client wrapper for the ComfyUI REST API.""" + + def __init__(self, base_url: str = COMFYUI_URL): + self.base_url = base_url.rstrip("/") + + async def queue_prompt(self, workflow: dict) -> str: + """Submit a workflow to ComfyUI and return the prompt_id.""" + payload = {"prompt": workflow} + async with httpx.AsyncClient(timeout=30.0) as client: + resp = await client.post(f"{self.base_url}/api/prompt", json=payload) + resp.raise_for_status() + return resp.json()["prompt_id"] + + async def get_status(self, prompt_id: str) -> dict: + """Return 
the current queue state (queue_running + queue_pending lists)."""
+        async with httpx.AsyncClient(timeout=10.0) as client:
+            resp = await client.get(f"{self.base_url}/api/queue")
+            resp.raise_for_status()
+            return resp.json()
+
+    async def get_history(self, prompt_id: str) -> dict:
+        """Return the history entry for a completed prompt_id."""
+        async with httpx.AsyncClient(timeout=10.0) as client:
+            resp = await client.get(f"{self.base_url}/api/history/{prompt_id}")
+            resp.raise_for_status()
+            return resp.json()
+
+    async def get_image(self, filename: str, subfolder: str, folder_type: str) -> bytes:
+        """Download image bytes from ComfyUI's /api/view endpoint."""
+        params = {"filename": filename, "subfolder": subfolder, "type": folder_type}
+        async with httpx.AsyncClient(timeout=60.0) as client:
+            resp = await client.get(f"{self.base_url}/api/view", params=params)
+            resp.raise_for_status()
+            return resp.content
+
+    async def get_models(self) -> list[str]:
+        """Return the list of available checkpoint model filenames."""
+        async with httpx.AsyncClient(timeout=10.0) as client:
+            resp = await client.get(
+                f"{self.base_url}/object_info/CheckpointLoaderSimple"
+            )
+            resp.raise_for_status()
+            data = resp.json()
+        # ComfyUI returns: {"CheckpointLoaderSimple": {"input": {"required": {"ckpt_name": [["model1.safetensors", ...], ...]}}}}
+        node_info = data.get("CheckpointLoaderSimple", {})
+        ckpt_list = (
+            node_info.get("input", {})
+            .get("required", {})
+            .get("ckpt_name", [[]])[0]
+        )
+        return ckpt_list if isinstance(ckpt_list, list) else []
+
+
+# ---------------------------------------------------------------------------
+# Workflow builder
+# ---------------------------------------------------------------------------
+
+def build_flux_workflow(
+    prompt: str,
+    neg_prompt: str,
+    width: int,
+    height: int,
+    steps: int,
+    seed: int,
+    model: str,
+) -> dict:
+    """Build a ComfyUI API-format workflow dict for FLUX.1-schnell text-to-image.
+
+    Loads the bundled template from disk on every call and draws a random seed when seed is -1; performs no network I/O, so it can be tested without HTTP mocks.
+    """
+    with open(_WORKFLOW_PATH) as f:
+        wf = json.load(f)
+    wf = copy.deepcopy(wf)
+
+    actual_seed = seed if seed != -1 else random.randint(0, 2**32 - 1)
+
+    wf["6"]["inputs"]["text"] = prompt
+    wf["33"]["inputs"]["text"] = neg_prompt
+    wf["27"]["inputs"]["width"] = width
+    wf["27"]["inputs"]["height"] = height
+    wf["13"]["inputs"]["steps"] = steps
+    wf["13"]["inputs"]["seed"] = actual_seed
+    wf["30"]["inputs"]["ckpt_name"] = model
+
+    # Attach the actual seed as metadata so callers can retrieve it
+    wf["_meta"] = {"actual_seed": actual_seed}
+    return wf
+
+
+# ---------------------------------------------------------------------------
+# Tools
+# ---------------------------------------------------------------------------
+
+@mcp.tool()
+async def generate_image(
+    prompt: str,
+    width: int = 1024,
+    height: int = 1024,
+    steps: int = 4,
+    model: str = "flux1-schnell.safetensors",
+    seed: int = -1,
+    negative_prompt: str = "",
+    output_dir: str = "",
+) -> list:
+    """Generate an image from a text prompt using ComfyUI.
+
+    Returns both a file path (for persistence) and an inline base64 image
+    (for display in Claude / Roo Code chat).
+
+    Args:
+        prompt: Text description of the image to generate.
+        width: Image width in pixels (default: 1024).
+        height: Image height in pixels (default: 1024).
+        steps: Number of inference steps. FLUX.1-schnell works well at 4.
+        model: ComfyUI model filename (default: flux1-schnell.safetensors).
+        seed: Random seed for reproducibility. -1 = random.
+        negative_prompt: Things to exclude from the image (optional).
+        output_dir: Override output directory. Defaults to IMAGE_OUTPUT_DIR env var
+            or ~/Pictures/mcp-generated.
+
+    Returns:
+        [TextContent(path + metadata), ImageContent(base64 PNG)]
+    """
+    # Resolve output directory
+    resolved_output_dir = Path(
+        output_dir or IMAGE_OUTPUT_DIR
+    ).expanduser().resolve()
+
+    client = ComfyUIClient(COMFYUI_URL)
+
+    # Build and submit workflow
+    try:
+        workflow = build_flux_workflow(
+            prompt=prompt,
+            neg_prompt=negative_prompt,
+            width=width,
+            height=height,
+            steps=steps,
+            seed=seed,
+            model=model,
+        )
+        actual_seed = workflow["_meta"]["actual_seed"]
+
+        prompt_id = await client.queue_prompt(workflow)
+    except httpx.ConnectError:
+        return [
+            TextContent(
+                type="text",
+                text=(
+                    f"ComfyUI not reachable at {COMFYUI_URL}. "
+                    "Start it with: python main.py --listen"
+                ),
+            )
+        ]
+    except httpx.HTTPStatusError as e:
+        return [
+            TextContent(
+                type="text",
+                text=f"ComfyUI returned an error: {e.response.status_code} — {e.response.text}",
+            )
+        ]
+
+    # Poll until done
+    start = time.time()
+    while True:
+        elapsed = time.time() - start
+        if elapsed > COMFYUI_TIMEOUT:
+            return [
+                TextContent(
+                    type="text",
+                    text=(
+                        f"Generation timed out after {COMFYUI_TIMEOUT}s. 
" + f"prompt_id={prompt_id} — use get_generation_status to check" + ), + ) + ] + + try: + queue = await client.get_status(prompt_id) + except (httpx.ConnectError, httpx.HTTPStatusError): + await asyncio.sleep(2) + continue + + running_ids = [item[1] for item in queue.get("queue_running", [])] + pending_ids = [item[1] for item in queue.get("queue_pending", [])] + + if prompt_id not in running_ids and prompt_id not in pending_ids: + break # Job is done + + await asyncio.sleep(2) + + elapsed = time.time() - start + + # Retrieve history to find output filename + try: + history = await client.get_history(prompt_id) + except (httpx.ConnectError, httpx.HTTPStatusError) as e: + return [ + TextContent( + type="text", + text=f"Failed to retrieve generation history: {e}", + ) + ] + + job = history.get(prompt_id, {}) + outputs = job.get("outputs", {}) + + # Find SaveImage node output (node "9" in our workflow) + image_info = None + for node_id, node_output in outputs.items(): + images = node_output.get("images", []) + if images: + image_info = images[0] + break + + if not image_info: + return [ + TextContent( + type="text", + text=f"No output image found in history for prompt_id={prompt_id}", + ) + ] + + # Download image bytes + try: + image_bytes = await client.get_image( + filename=image_info["filename"], + subfolder=image_info.get("subfolder", ""), + folder_type=image_info.get("type", "output"), + ) + except (httpx.ConnectError, httpx.HTTPStatusError) as e: + return [ + TextContent( + type="text", + text=f"Failed to download generated image: {e}", + ) + ] + + # Save to disk + try: + resolved_output_dir.mkdir(parents=True, exist_ok=True) + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + filename = f"{timestamp}_{actual_seed}.png" + out_path = resolved_output_dir / filename + out_path.write_bytes(image_bytes) + except OSError as e: + return [ + TextContent( + type="text", + text=f"Cannot write to output directory: {resolved_output_dir} — {e}", + ) + ] + + # Encode as 
base64 for inline display
+    b64_data = base64.b64encode(image_bytes).decode("utf-8")
+
+    return [
+        TextContent(
+            type="text",
+            text=(
+                f"Generated: {out_path}\n"
+                f"Seed: {actual_seed}\n"
+                f"Elapsed: {elapsed:.1f}s\n"
+                f"Size: {width}x{height}, Steps: {steps}, Model: {model}"
+            ),
+        ),
+        ImageContent(
+            type="image",
+            data=b64_data,
+            mimeType="image/png",
+        ),
+    ]
+
+
+@mcp.tool()
+async def list_available_models() -> list[str]:
+    """List all checkpoint models available in ComfyUI.
+
+    Returns a list of model filenames available for use with generate_image.
+    Requires ComfyUI to be running at COMFYUI_URL.
+    """
+    client = ComfyUIClient(COMFYUI_URL)
+    try:
+        return await client.get_models()
+    except httpx.ConnectError:
+        return [
+            f"ComfyUI not reachable at {COMFYUI_URL}. "
+            "Start it with: python main.py --listen"
+        ]
+    except httpx.HTTPStatusError as e:
+        return [f"ComfyUI error: {e.response.status_code}"]
+
+
+@mcp.tool()
+async def get_generation_status(prompt_id: str) -> dict:
+    """Check the status of a queued or running generation job.
+
+    Args:
+        prompt_id: The prompt ID reported by generate_image (included in its timeout message).
+
+    Returns:
+        Dict with 'status' key: "pending", "running", "completed", "not_found", or "error".
+    """
+    client = ComfyUIClient(COMFYUI_URL)
+    try:
+        queue = await client.get_status(prompt_id)
+        running_ids = [item[1] for item in queue.get("queue_running", [])]
+        pending_ids = [item[1] for item in queue.get("queue_pending", [])]
+
+        if prompt_id in running_ids:
+            return {"status": "running", "prompt_id": prompt_id}
+        if prompt_id in pending_ids:
+            return {"status": "pending", "prompt_id": prompt_id}
+
+        # Not in queue — check history
+        try:
+            history = await client.get_history(prompt_id)
+            if prompt_id in history:
+                return {"status": "completed", "prompt_id": prompt_id}
+        except (httpx.ConnectError, httpx.HTTPStatusError):
+            pass
+
+        return {"status": "not_found", "prompt_id": prompt_id}
+
+    except httpx.ConnectError:
+        return {
+            "status": "error",
+            "message": f"ComfyUI not reachable at {COMFYUI_URL}",
+        }
+    except httpx.HTTPStatusError as e:
+        return {"status": "error", "message": f"HTTP {e.response.status_code}"}
+
+
+@mcp.tool()
+def get_output_directory() -> str:
+    """Return the directory where generated images are saved.
+
+    Returns:
+        Absolute path to the output directory (may not exist yet).
+    """
+    return str(Path(IMAGE_OUTPUT_DIR).expanduser().resolve())
+
+
+# ---------------------------------------------------------------------------
+# Entry point
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    mcp.run(transport="stdio")
diff --git a/mcp/mcp-image-gen/src/workflows/flux_schnell.json b/mcp/mcp-image-gen/src/workflows/flux_schnell.json
new file mode 100644
index 0000000..e809015
--- /dev/null
+++ b/mcp/mcp-image-gen/src/workflows/flux_schnell.json
@@ -0,0 +1,59 @@
+{
+  "6": {
+    "class_type": "CLIPTextEncode",
+    "inputs": {
+      "clip": ["30", 1],
+      "text": "PROMPT_PLACEHOLDER"
+    }
+  },
+  "8": {
+    "class_type": "VAEDecode",
+    "inputs": {
+      "samples": ["13", 0],
+      "vae": ["30", 2]
+    }
+  },
+  "9": {
+    "class_type": "SaveImage",
+    "inputs": {
+      "filename_prefix": "mcp-image-gen",
+      "images": ["8", 0]
+    }
+  },
+  "13": {
+    "class_type": "KSampler",
+    "inputs": {
+      "cfg": 1.0,
+      "denoise": 1.0,
+      "latent_image": ["27", 0],
+      "model": ["30", 0],
+      "negative": ["33", 0],
+      "positive": ["6", 0],
+      "sampler_name": "euler",
+      "scheduler": "simple",
+      "seed": 42,
+      "steps": 4
+    }
+  },
+  "27": {
+    "class_type": "EmptySD3LatentImage",
+    "inputs": {
+      "batch_size": 1,
+      "height": 1024,
+      "width": 1024
+    }
+  },
+  "30": {
+    "class_type": "CheckpointLoaderSimple",
+    "inputs": {
+      "ckpt_name": "flux1-schnell.safetensors"
+    }
+  },
+  "33": {
+    "class_type": "CLIPTextEncode",
+    "inputs": {
+      "clip": ["30", 1],
+      "text": "NEGATIVE_PLACEHOLDER"
+    }
+  }
+}
diff --git a/mcp/mcp-image-gen/tests/__init__.py b/mcp/mcp-image-gen/tests/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/mcp/mcp-image-gen/tests/conftest.py b/mcp/mcp-image-gen/tests/conftest.py
new file mode 100644
index 0000000..23ace3e
--- /dev/null
+++ b/mcp/mcp-image-gen/tests/conftest.py
@@ -0,0 +1,76 @@
+"""Pytest fixtures for mcp-image-gen tests."""
+
+import base64
+import io
+import sys
+from pathlib import Path
+
+import pytest
+
+# Make src/ importable
+sys.path.insert(0, str(Path(__file__).parent.parent / "src"))
+
+
+@pytest.fixture(autouse=True)
+def comfyui_url(monkeypatch):
+    """Set COMFYUI_URL to a test URL for all tests."""
+    monkeypatch.setenv("COMFYUI_URL", "http://test-comfyui:8188")
+    # Also patch the module-level constant in server
+    import server
+    monkeypatch.setattr(server, "COMFYUI_URL", "http://test-comfyui:8188")
+
+
+@pytest.fixture
+def sample_image_bytes():
+    """Generate a 1x1 red pixel PNG as bytes using Pillow."""
+    from PIL import Image
+
+    img = Image.new("RGB", (1, 1), color=(255, 0, 0))
+    buf = io.BytesIO()
+    img.save(buf, format="PNG")
+    return buf.getvalue()
+
+
+@pytest.fixture
+def mock_history_response():
+    """Sample ComfyUI history response for prompt_id='test-uuid-1234'."""
+    return {
+        "test-uuid-1234": {
+            "outputs": {
+                "9": {
+                    "images": [
+                        {
+                            "filename": "mcp-image-gen_00001_.png",
+                            "subfolder": "",
+                            "type": "output",
+                        }
+                    ]
+                }
+            },
+            "status": {"completed": True},
+        }
+    }
+
+
+@pytest.fixture
+def queue_empty():
+    """ComfyUI queue response with nothing running or pending."""
+    return {"queue_running": [], "queue_pending": []}
+
+
+@pytest.fixture
+def queue_with_pending():
+    """ComfyUI queue response with our test prompt pending."""
+    return {
+        "queue_running": [],
+        "queue_pending": [[1, "test-uuid-1234", {}, {}]],
+    }
+
+
+@pytest.fixture
+def queue_with_running():
+    """ComfyUI queue response with our test prompt running."""
+    return {
+        "queue_running": [[1, "test-uuid-1234", {}, {}]],
+        "queue_pending": [],
+    }
diff --git a/mcp/mcp-image-gen/tests/test_server.py b/mcp/mcp-image-gen/tests/test_server.py
new file mode 100644
index 0000000..f055011
--- /dev/null
+++ b/mcp/mcp-image-gen/tests/test_server.py
@@ -0,0 +1,302 @@
+"""Tests for mcp-image-gen server — all ComfyUI HTTP calls mocked via respx."""
+
+import base64
+import json
+import os
+from pathlib import Path
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import httpx
+import pytest
+import respx
+
+# Import the server module (sys.path set by conftest.py)
+import server
+from server import (
+    ComfyUIClient,
+    build_flux_workflow,
+    generate_image,
+    get_generation_status,
+    get_output_directory,
+    list_available_models,
+)
+
+COMFYUI_BASE = "http://test-comfyui:8188"
+
+
+# ---------------------------------------------------------------------------
+# build_flux_workflow — pure function, no mocking needed
+# ---------------------------------------------------------------------------
+
+
+def test_build_flux_workflow_structure():
+    """Verify build_flux_workflow returns a dict with correct node types."""
+    wf = build_flux_workflow(
+        prompt="a red cat",
+        neg_prompt="ugly",
+        width=512,
+        height=768,
+        steps=8,
+        seed=42,
+        model="flux1-schnell.safetensors",
+    )
+    assert wf["6"]["class_type"] == "CLIPTextEncode"
+    assert wf["8"]["class_type"] == "VAEDecode"
+    assert wf["9"]["class_type"] == "SaveImage"
+    assert wf["13"]["class_type"] == "KSampler"
+    assert wf["27"]["class_type"] == "EmptySD3LatentImage"
+    assert wf["30"]["class_type"] == "CheckpointLoaderSimple"
+    assert wf["33"]["class_type"] == "CLIPTextEncode"
+
+
+def test_build_flux_workflow_params_injected():
+    """Verify all parameters are injected into correct nodes."""
+    wf = build_flux_workflow(
+        prompt="a blue whale",
+        neg_prompt="cartoonish",
+        width=512,
+        height=768,
+        steps=8,
+        seed=12345,
+        model="sdxl.safetensors",
+    )
+    assert wf["6"]["inputs"]["text"] == "a blue whale"
+    assert wf["33"]["inputs"]["text"] == "cartoonish"
+    assert wf["27"]["inputs"]["width"] == 512
+    assert wf["27"]["inputs"]["height"] == 768
+    assert wf["13"]["inputs"]["steps"] == 8
+    assert wf["13"]["inputs"]["seed"] == 12345
+    assert wf["30"]["inputs"]["ckpt_name"] == "sdxl.safetensors"
+
+
+def test_negative_prompt_included():
+    """Verify negative prompt appears in workflow node 33 when provided."""
+    wf = build_flux_workflow(
+        prompt="forest",
+        neg_prompt="blurry, dark",
+        width=1024,
+
height=1024,
+        steps=4,
+        seed=1,
+        model="flux1-schnell.safetensors",
+    )
+    assert wf["33"]["inputs"]["text"] == "blurry, dark"
+
+
+def test_random_seed_generated():
+    """seed=-1 generates a random seed each call."""
+    wf1 = build_flux_workflow("cat", "", 512, 512, 4, -1, "flux1-schnell.safetensors")
+    wf2 = build_flux_workflow("cat", "", 512, 512, 4, -1, "flux1-schnell.safetensors")
+    seed1 = wf1["_meta"]["actual_seed"]
+    seed2 = wf2["_meta"]["actual_seed"]
+    # Both are valid integers
+    assert isinstance(seed1, int) and isinstance(seed2, int)
+    assert 0 <= seed1 < 2**32 and 0 <= seed2 < 2**32
+    # With overwhelming probability they differ
+    # (1/2^32 chance of collision — negligible for a test)
+    # We just verify _meta is populated
+    assert "_meta" in wf1
+    assert "_meta" in wf2
+
+
+# ---------------------------------------------------------------------------
+# list_available_models
+# ---------------------------------------------------------------------------
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_list_available_models():
+    """Mock /object_info, verify model list is returned."""
+    mock_response = {
+        "CheckpointLoaderSimple": {
+            "input": {
+                "required": {
+                    "ckpt_name": [
+                        ["flux1-schnell.safetensors", "sdxl.safetensors"],
+                        {},
+                    ]
+                }
+            }
+        }
+    }
+    respx.get(f"{COMFYUI_BASE}/object_info/CheckpointLoaderSimple").mock(
+        return_value=httpx.Response(200, json=mock_response)
+    )
+
+    result = await list_available_models()
+    assert "flux1-schnell.safetensors" in result
+    assert "sdxl.safetensors" in result
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_list_available_models_comfyui_offline():
+    """When ComfyUI is unreachable, list_available_models returns error message."""
+    respx.get(f"{COMFYUI_BASE}/object_info/CheckpointLoaderSimple").mock(
+        side_effect=httpx.ConnectError("connection refused")
+    )
+
+    result = await list_available_models()
+    assert len(result) == 1
+    assert "not reachable" in result[0].lower()
+
+
+#
---------------------------------------------------------------------------
+# get_generation_status
+# ---------------------------------------------------------------------------
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_get_generation_status_pending(queue_with_pending):
+    """prompt_id in queue_pending → status is 'pending'."""
+    respx.get(f"{COMFYUI_BASE}/api/queue").mock(
+        return_value=httpx.Response(200, json=queue_with_pending)
+    )
+
+    result = await get_generation_status("test-uuid-1234")
+    assert result["status"] == "pending"
+    assert result["prompt_id"] == "test-uuid-1234"
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_get_generation_status_running(queue_with_running):
+    """prompt_id in queue_running → status is 'running'."""
+    respx.get(f"{COMFYUI_BASE}/api/queue").mock(
+        return_value=httpx.Response(200, json=queue_with_running)
+    )
+
+    result = await get_generation_status("test-uuid-1234")
+    assert result["status"] == "running"
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_get_generation_status_complete(queue_empty, mock_history_response):
+    """prompt_id not in queue + found in history → status is 'completed'."""
+    respx.get(f"{COMFYUI_BASE}/api/queue").mock(
+        return_value=httpx.Response(200, json=queue_empty)
+    )
+    respx.get(f"{COMFYUI_BASE}/api/history/test-uuid-1234").mock(
+        return_value=httpx.Response(200, json=mock_history_response)
+    )
+
+    result = await get_generation_status("test-uuid-1234")
+    assert result["status"] == "completed"
+
+
+# ---------------------------------------------------------------------------
+# get_output_directory
+# ---------------------------------------------------------------------------
+
+
+def test_get_output_directory_default(monkeypatch):
+    """No IMAGE_OUTPUT_DIR env var → returns expanded ~/Pictures/mcp-generated."""
+    monkeypatch.delenv("IMAGE_OUTPUT_DIR", raising=False)
+    monkeypatch.setattr(server, "IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")
+
+    result =
get_output_directory()
+    assert result == str(Path("~/Pictures/mcp-generated").expanduser().resolve())
+    assert "~" not in result  # expanded
+
+
+def test_get_output_directory_custom(monkeypatch, tmp_path):
+    """IMAGE_OUTPUT_DIR set → returns that path."""
+    custom = str(tmp_path / "custom-output")
+    monkeypatch.setenv("IMAGE_OUTPUT_DIR", custom)
+    monkeypatch.setattr(server, "IMAGE_OUTPUT_DIR", custom)
+
+    result = get_output_directory()
+    assert result == str(Path(custom).expanduser().resolve())
+
+
+# ---------------------------------------------------------------------------
+# generate_image
+# ---------------------------------------------------------------------------
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_generate_image_success(
+    tmp_path, sample_image_bytes, mock_history_response, queue_empty, monkeypatch
+):
+    """Mock full lifecycle: queue → poll done → history → view. Verify outputs."""
+    monkeypatch.setattr(server, "IMAGE_OUTPUT_DIR", str(tmp_path))
+
+    # 1. POST /api/prompt → prompt_id
+    respx.post(f"{COMFYUI_BASE}/api/prompt").mock(
+        return_value=httpx.Response(200, json={"prompt_id": "test-uuid-1234"})
+    )
+    # 2. GET /api/queue → empty (job done immediately)
+    respx.get(f"{COMFYUI_BASE}/api/queue").mock(
+        return_value=httpx.Response(200, json=queue_empty)
+    )
+    # 3. GET /api/history/test-uuid-1234
+    respx.get(f"{COMFYUI_BASE}/api/history/test-uuid-1234").mock(
+        return_value=httpx.Response(200, json=mock_history_response)
+    )
+    # 4.
GET /api/view → image bytes
+    respx.get(f"{COMFYUI_BASE}/api/view").mock(
+        return_value=httpx.Response(200, content=sample_image_bytes)
+    )
+
+    result = await generate_image(
+        prompt="a red cat",
+        output_dir=str(tmp_path),
+    )
+
+    # Should return [TextContent, ImageContent]
+    assert len(result) == 2
+    text_content = result[0]
+    image_content = result[1]
+
+    # TextContent has path info
+    assert "Generated:" in text_content.text
+    assert str(tmp_path) in text_content.text
+
+    # ImageContent has valid base64 PNG
+    assert image_content.type == "image"
+    assert image_content.mimeType == "image/png"
+    decoded = base64.b64decode(image_content.data)
+    assert decoded[:8] == b"\x89PNG\r\n\x1a\n"  # PNG magic bytes
+
+    # File was actually saved
+    saved_files = list(tmp_path.glob("*.png"))
+    assert len(saved_files) == 1
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_generate_image_comfyui_unavailable():
+    """ComfyUI unreachable → returns graceful error message as single TextContent."""
+    respx.post(f"{COMFYUI_BASE}/api/prompt").mock(
+        side_effect=httpx.ConnectError("connection refused")
+    )
+
+    result = await generate_image(prompt="a cat")
+
+    assert len(result) == 1
+    assert "not reachable" in result[0].text.lower()
+
+
+@respx.mock
+@pytest.mark.asyncio
+async def test_generate_image_timeout(monkeypatch, queue_with_pending):
+    """Poll loop never completes within timeout → returns timeout error."""
+    monkeypatch.setattr(server, "COMFYUI_TIMEOUT", 0)  # instant timeout
+
+    respx.post(f"{COMFYUI_BASE}/api/prompt").mock(
+        return_value=httpx.Response(200, json={"prompt_id": "test-uuid-1234"})
+    )
+    # Queue always shows job pending → never finishes
+    respx.get(f"{COMFYUI_BASE}/api/queue").mock(
+        return_value=httpx.Response(200, json=queue_with_pending)
+    )
+
+    result = await generate_image(prompt="slow image")
+
+    assert len(result) == 1
+    assert "timed out" in result[0].text.lower()
+    assert "test-uuid-1234" in result[0].text
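
The queue-polling behaviour that the timeout test above exercises can also be sketched in isolation, without respx or the server module. This is a minimal illustration under stated assumptions, not the server's code: `poll_until_done`, `fetch_queue`, and `make_fake_queue` are hypothetical names introduced here, and only the `queue_running` / `queue_pending` response shape is taken from the fixtures in this diff. Unlike the loop in `generate_image`, this variant checks the queue once before it checks the deadline.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until_done(
    prompt_id: str,
    fetch_queue: Callable[[], Awaitable[dict]],  # hypothetical stand-in for ComfyUIClient.get_status
    timeout: float,
    interval: float = 0.01,
) -> bool:
    """Return True once prompt_id has left the queue, False if the deadline passes first."""
    deadline = time.monotonic() + timeout
    while True:
        queue = await fetch_queue()
        # Each queue entry is a list whose second element is the prompt_id.
        active = {
            item[1]
            for key in ("queue_running", "queue_pending")
            for item in queue.get(key, [])
        }
        if prompt_id not in active:
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


def make_fake_queue(polls_until_empty: int) -> Callable[[], Awaitable[dict]]:
    """Hypothetical fake backend: reports the job as pending for N polls, then as gone."""
    remaining = polls_until_empty

    async def fetch() -> dict:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return {"queue_running": [], "queue_pending": [[1, "test-uuid-1234", {}, {}]]}
        return {"queue_running": [], "queue_pending": []}

    return fetch
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments, and injecting `fetch_queue` lets the loop be tested with a short `interval` instead of the fixed two-second sleep.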