# Task: Add ESRGAN Upscaler to mcp-image-gen

**Date:** 2026-04-10
**Status:** Ready to implement
**Depends on:** mcp-image-gen working ✅, FLUX.2 Klein Heretic working ✅

---

## Goal

Add an `upscale_image()` MCP tool that takes an existing PNG path (from a previous `generate_image()` call) and upscales it 2× or 4× using a Real-ESRGAN model. There is **no diffusion re-generation**, just fast post-processing (~5–10s).

Result: a 1024×1024 → 4096×4096 pipeline in two tool calls:

```python
result = generate_image("...", model="flux-2-klein-4b.safetensors", steps=20)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345.png

upscaled = upscale_image(
    input_path="~/Pictures/mcp-generated/foo_20260410_123456_12345.png",
    scale=4
)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345_4x.png (4096×4096)
```

---

## Why ESRGAN (Option B) over Latent Upscale

| Method | Time overhead | Quality | Requires diffusion? |
|--------|--------------|---------|---------------------|
| ESRGAN image upscale | ~5–10s | ✅ Very sharp details | ❌ No |
| Latent upscale + KSampler | ~50% extra gen time | ✅ Good, consistent style | ✅ Yes |
| UltimateSDUpscale (tiled) | ~4× gen time | ✅ Highest quality | ✅ Yes |

ESRGAN is the clear winner for "I want a bigger version of this image quickly."

---

## Model to Use

**`4x-UltraSharp.pth`**: a widely used community model for photorealistic upscaling.
- Source: https://huggingface.co/Kim2091/UltraSharp
- Download: `huggingface-cli download Kim2091/UltraSharp 4x-UltraSharp.pth --local-dir ~/ComfyUI/models/upscale_models/`
- Size: ~67MB
- Scale factor: 4× (can also be used for 2× by resizing the result afterwards)

Alternative: `RealESRGAN_x4plus.pth` (in ComfyUI's model downloader, general purpose)

---

## ComfyUI Workflow: `esrgan_upscale.json`

Minimal workflow with 4 nodes:

```
LoadImage → UpscaleModelLoader + ImageUpscaleWithModel → SaveImage
```

Node layout:

```json
{
  "1": {
    "class_type": "LoadImage",
    "inputs": { "image": "__INPUT_PATH__" }
  },
  "2": {
    "class_type": "UpscaleModelLoader",
    "inputs": { "model_name": "4x-UltraSharp.pth" }
  },
  "3": {
    "class_type": "ImageUpscaleWithModel",
    "inputs": { "upscale_model": ["2", 0], "image": ["1", 0] }
  },
  "4": {
    "class_type": "SaveImage",
    "inputs": { "images": ["3", 0], "filename_prefix": "__OUTPUT_PREFIX__" }
  }
}
```

**Note:** `LoadImage` in ComfyUI requires the image to be in `~/ComfyUI/input/`, so the workflow builder must copy the input file there first (or use `ETN_LoadImageBase64` if available). See "Implementation Notes" below.

---

## MCP Tool Signature

Add to [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py):

```python
@mcp.tool()
async def upscale_image(
    input_path: Annotated[str, Field(
        description="Path to input PNG (absolute or ~-relative). "
                    "Must be a file previously generated by generate_image()."
    )],
    scale: Annotated[int, Field(
        description="Upscale factor: 2 or 4 (default: 4). "
                    "4x-UltraSharp always runs at 4x; scale=2 applies a 0.5 resize after."
    )] = 4,
    output_dir: Annotated[str, Field(
        description="Override output directory. Defaults to same dir as input_path."
    )] = "",
    name: Annotated[str, Field(
        description="Optional output filename prefix. "
                    "Defaults to input filename + _4x or _2x."
    )] = "",
) -> list:
    """Upscale an existing image using Real-ESRGAN (4x-UltraSharp).

    No diffusion re-generation, pure post-processing (~5-10s).
    Input must be a PNG file.
    Output is saved alongside the input by default.
    Returns both a file path and an inline base64 image for display.
    """
```

---

## Implementation Notes

### The `LoadImage` ComfyUI constraint

ComfyUI's built-in `LoadImage` node only accepts filenames relative to `~/ComfyUI/input/`, not arbitrary paths. Two solutions:

**Solution A (simplest):** Copy the input to `~/ComfyUI/input/` before submitting the workflow, use its basename as the `image` param, and delete the copy afterwards.

**Solution B:** Use the `ETN_LoadImageBase64` node (provided by the `comfyui-tooling-nodes` custom node pack), which accepts a base64-encoded image directly. Check if installed:

```bash
ls ~/ComfyUI/custom_nodes/ | grep -i -e etn -e tooling
```

**Recommended:** Start with Solution A (copy to input dir), which has no dependencies. If the node pack is present, prefer Solution B for cleanliness.

### Scale=2 handling

`4x-UltraSharp.pth` always outputs 4×. For `scale=2`, upscale at 4×, then resize the result image to 50% with PIL before saving. This is still sharper than native 2× bilinear upscaling.

### Output filename convention

- Input: `foo_20260410_123456_12345.png`
- Output `scale=4`: `foo_20260410_123456_12345_4x.png`
- Output `scale=2`: `foo_20260410_123456_12345_2x.png`

---

## Files to Create/Modify

| File | Change |
|------|--------|
| [`mcp/mcp-image-gen/src/workflows/esrgan_upscale.json`](../mcp/mcp-image-gen/src/workflows/esrgan_upscale.json) | New: ESRGAN workflow |
| [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py) | Add `upscale_image()` tool + helpers |
| [`mcp/mcp-image-gen/tests/test_upscale.py`](../mcp/mcp-image-gen/tests/test_upscale.py) | New test file |

**No changes to:** workflow registry, existing tools, `generate_image()`.

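
The implementation notes (Solution A staging, `scale=2` handling, and the filename convention) could be covered by a few small helpers. The following is only a sketch under stated assumptions: the names `derive_output_path`, `stage_for_comfyui`, and `halve_for_2x` are hypothetical, and it assumes that an explicit `name` replaces the whole default stem, including the `_4x`/`_2x` suffix. The resize helper needs Pillow; the other two use only the standard library.

```python
import shutil
from pathlib import Path


def derive_output_path(input_path: str, scale: int,
                       output_dir: str = "", name: str = "") -> Path:
    """Build the output path using the `<input stem>_<scale>x.png` convention."""
    if scale not in (2, 4):
        raise ValueError(f"scale must be 2 or 4, got {scale}")
    src = Path(input_path).expanduser()
    # Default: save next to the input file.
    target_dir = Path(output_dir).expanduser() if output_dir else src.parent
    # Assumption: a caller-supplied name replaces the default stem entirely.
    stem = name or f"{src.stem}_{scale}x"
    return target_dir / f"{stem}.png"


def stage_for_comfyui(input_path: str, comfy_input_dir: str) -> Path:
    """Solution A: copy the source image into ComfyUI's input directory.

    The caller passes `staged.name` as the LoadImage `image` value and
    removes the staged copy (`staged.unlink()`) once the workflow finishes.
    """
    src = Path(input_path).expanduser()
    if not src.is_file():
        raise FileNotFoundError(f"Input image not found: {src}")
    dest_dir = Path(comfy_input_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)
    staged = dest_dir / src.name
    shutil.copy2(src, staged)
    return staged


def halve_for_2x(upscaled_path: Path) -> None:
    """scale=2 handling: shrink a 4x result to 50% in place (requires Pillow)."""
    from PIL import Image  # third-party; imported lazily on purpose

    with Image.open(upscaled_path) as im:
        half = im.resize((im.width // 2, im.height // 2), Image.LANCZOS)
    half.save(upscaled_path)


out = derive_output_path(
    "~/Pictures/mcp-generated/foo_20260410_123456_12345.png", scale=4
)
print(out.name)  # foo_20260410_123456_12345_4x.png
```

Staging by basename can collide with an unrelated file already in the input directory, so a real implementation may want to add a unique prefix to the staged filename.
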

---

## Pre-flight: Download Model

```bash
huggingface-cli download Kim2091/UltraSharp \
  4x-UltraSharp.pth \
  --local-dir ~/ComfyUI/models/upscale_models/
```

Verify ComfyUI sees it:

```bash
curl -s http://localhost:8188/object_info/UpscaleModelLoader | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d['UpscaleModelLoader']['input']['required']['model_name'][0]))"
```

---

## Test Cases

| Test | Input | Expected |
|------|-------|----------|
| `test_upscale_4x` | 1024×1024 PNG | 4096×4096 PNG, `_4x.png` suffix |
| `test_upscale_2x` | 1024×1024 PNG | 2048×2048 PNG, `_2x.png` suffix |
| `test_invalid_path` | nonexistent path | Error TextContent returned |
| `test_output_dir_override` | valid PNG + `output_dir=/tmp` | saved to /tmp |
| `test_default_output_dir` | valid PNG, no output_dir | saved alongside input |

---

## Success Criteria

- [ ] `4x-UltraSharp.pth` present in `~/ComfyUI/models/upscale_models/`
- [ ] `upscale_image("path/to/1024.png", scale=4)` returns 4096×4096 PNG
- [ ] Output file saved with `_4x.png` suffix
- [ ] Inline base64 image returned for display in chat
- [ ] All 5 test cases pass
- [ ] No changes to existing `generate_image()` tests
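
The dimension checks in the test table and success criteria (1024×1024 in, 4096×4096 or 2048×2048 out) can be verified without decoding the image, because a PNG stores its width and height in the IHDR chunk right after the 8-byte signature. The sketch below shows one way the tests might do this with the standard library only; `png_size` and `check_upscale` are hypothetical helper names, not part of the existing code.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with Path(path).expanduser().open("rb") as fh:
        header = fh.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"Not a valid PNG: {path}")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def check_upscale(input_png: str, output_png: str, scale: int) -> bool:
    """True if the output is exactly `scale` times the input in both dimensions."""
    in_w, in_h = png_size(input_png)
    return png_size(output_png) == (in_w * scale, in_h * scale)
```

In `test_upscale_4x`, for example, an assertion such as `check_upscale(src, out, 4)` would cover the 4096×4096 expectation, and the same call with `scale=2` would cover `test_upscale_2x`.
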