Task: Add ESRGAN Upscaler to mcp-image-gen
Date: 2026-04-10
Status: Ready to implement
Depends on: mcp-image-gen working ✅, FLUX.2 Klein Heretic working ✅
Goal
Add an upscale_image() MCP tool that takes an existing PNG path (from a previous generate_image() call) and upscales it 2× or 4× using a Real-ESRGAN model — no diffusion re-generation, just fast post-processing (~5–10s).
Result: A 1024×1024 → 4096×4096 pipeline in two tool calls:
result = generate_image("...", model="flux-2-klein-4b.safetensors", steps=20)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345.png
upscaled = upscale_image(
    input_path="~/Pictures/mcp-generated/foo_20260410_123456_12345.png",
    scale=4
)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345_4x.png (4096×4096)
Why ESRGAN (Option B) over Latent Upscale
| Method | Time overhead | Quality | Requires diffusion? |
|---|---|---|---|
| ESRGAN image upscale | ~5–10s | ✅ Very sharp details | ❌ No |
| Latent upscale + KSampler | ~50% extra gen time | ✅ Good, consistent style | ✅ Yes |
| UltimateSDUpscale (tiled) | ~4× gen time | ✅ Highest quality | ✅ Yes |
ESRGAN is the clear winner for "I want a bigger version of this image quickly."
Model to Use
4x-UltraSharp.pth, a widely used community model for photorealistic upscaling.
- Source: https://huggingface.co/Kim2091/UltraSharp
- Download:
  huggingface-cli download Kim2091/UltraSharp 4x-UltraSharp.pth --local-dir ~/ComfyUI/models/upscale_models/
- Size: ~67MB
- Scale factor: 4× (a 2× result is obtained by resizing the 4× output afterwards)
Alternative: RealESRGAN_x4plus.pth (in ComfyUI's model downloader, general purpose)
ComfyUI Workflow: esrgan_upscale.json
Minimal workflow with 4 nodes:
LoadImage → UpscaleModelLoader + ImageUpscaleWithModel → SaveImage
Node layout:
{
  "1": {
    "class_type": "LoadImage",
    "inputs": {
      "image": "__INPUT_PATH__"
    }
  },
  "2": {
    "class_type": "UpscaleModelLoader",
    "inputs": {
      "model_name": "4x-UltraSharp.pth"
    }
  },
  "3": {
    "class_type": "ImageUpscaleWithModel",
    "inputs": {
      "upscale_model": ["2", 0],
      "image": ["1", 0]
    }
  },
  "4": {
    "class_type": "SaveImage",
    "inputs": {
      "images": ["3", 0],
      "filename_prefix": "__OUTPUT_PREFIX__"
    }
  }
}
Note: LoadImage in ComfyUI requires the image to be in ~/ComfyUI/input/ — the workflow builder must copy the input file there first (or use ETN_LoadImageBase64 if available). See "Implementation Notes" below.
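The placeholder substitution for this workflow might look like the sketch below. It edits the parsed workflow by node id rather than doing string replacement on the JSON text, so paths with quotes or backslashes cannot break the document. The function name `build_upscale_workflow` and the inline `WORKFLOW_TEMPLATE` are hypothetical; the real server would presumably load the template from esrgan_upscale.json.

```python
import copy

# Hypothetical in-memory copy of the node layout above; the real project would
# presumably load this from esrgan_upscale.json with json.load().
WORKFLOW_TEMPLATE = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "__INPUT_PATH__"}},
    "2": {"class_type": "UpscaleModelLoader",
          "inputs": {"model_name": "4x-UltraSharp.pth"}},
    "3": {"class_type": "ImageUpscaleWithModel",
          "inputs": {"upscale_model": ["2", 0], "image": ["1", 0]}},
    "4": {"class_type": "SaveImage",
          "inputs": {"images": ["3", 0], "filename_prefix": "__OUTPUT_PREFIX__"}},
}


def build_upscale_workflow(template, input_name, output_prefix):
    """Return a filled-in copy of the workflow; the template is not mutated."""
    workflow = copy.deepcopy(template)
    # Node "1" is LoadImage: it expects a filename inside ComfyUI's input dir.
    workflow["1"]["inputs"]["image"] = input_name
    # Node "4" is SaveImage: ComfyUI appends its own counter to this prefix.
    workflow["4"]["inputs"]["filename_prefix"] = output_prefix
    return workflow


workflow = build_upscale_workflow(
    WORKFLOW_TEMPLATE,
    input_name="foo_20260410_123456_12345.png",
    output_prefix="foo_20260410_123456_12345_4x",
)
```

Because the template is deep-copied, the same loaded template can be reused across calls.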
MCP Tool Signature
Add to mcp/mcp-image-gen/src/server.py:
@mcp.tool()
async def upscale_image(
    input_path: Annotated[str, Field(description="Path to input PNG (absolute or ~-relative). Must be a file previously generated by generate_image().")],
    scale: Annotated[int, Field(description="Upscale factor: 2 or 4 (default: 4). 4x-UltraSharp always runs at 4x; scale=2 applies a 0.5 resize after.")] = 4,
    output_dir: Annotated[str, Field(description="Override output directory. Defaults to same dir as input_path.")] = "",
    name: Annotated[str, Field(description="Optional output filename prefix. Defaults to input filename + _4x or _2x.")] = "",
) -> list:
    """Upscale an existing image using Real-ESRGAN (4x-UltraSharp).

    No diffusion re-generation: pure post-processing (~5-10s).
    Input must be a PNG file. Output is saved alongside the input by default.
    Returns both a file path and an inline base64 image for display.
    """
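The constraints stated in the signature (scale is 2 or 4, input exists, input is a PNG) could be checked up front before anything is sent to ComfyUI. The following is a minimal sketch; `validate_upscale_args` and its error strings are hypothetical and not part of the existing server. Returning a message instead of raising is one way to feed the "Error TextContent returned" behaviour expected by the tests.

```python
from pathlib import Path

ALLOWED_SCALES = (2, 4)


def validate_upscale_args(input_path, scale):
    """Return an error message string, or None when the arguments are usable."""
    if scale not in ALLOWED_SCALES:
        return f"scale must be 2 or 4, got {scale}"
    path = Path(input_path).expanduser()  # accept ~-relative paths
    if not path.is_file():
        return f"input file not found: {path}"
    if path.suffix.lower() != ".png":
        return f"input must be a PNG file: {path}"
    return None
```

The extension check is only a cheap first filter; it does not prove the file contents are valid PNG data.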
Implementation Notes
The LoadImage ComfyUI constraint
ComfyUI's built-in LoadImage node only accepts filenames relative to ~/ComfyUI/input/, not arbitrary paths. Two solutions:
Solution A (simplest): Copy input to ~/ComfyUI/input/ before submitting workflow, use basename as image param, delete after.
Solution B: Use the ETN_LoadImageBase64 node, which accepts a base64-encoded image directly. It ships with the comfyui-tooling-nodes custom node pack (its nodes carry the ETN_ prefix), so the install directory name may not contain "etn". Check if installed:
ls ~/ComfyUI/custom_nodes/ | grep -i -e etn -e tooling
Recommended: Start with Solution A (copy to input dir), which has no dependencies. If the ETN nodes are present, prefer Solution B for cleanliness.
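Solution A can be wrapped in a context manager so that the staged copy is removed even when the workflow fails. This is a sketch under assumptions: `staged_input` is a hypothetical helper name, and the random prefix on the staged filename is an added precaution against clobbering files that already live in the input directory.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_input(src, comfy_input_dir="~/ComfyUI/input"):
    """Copy src into ComfyUI's input dir and yield the basename for LoadImage."""
    src = Path(src).expanduser()
    input_dir = Path(comfy_input_dir).expanduser()
    input_dir.mkdir(parents=True, exist_ok=True)
    # Unique prefix so an existing file with the same name is never overwritten.
    staged = input_dir / f"mcp_upscale_{uuid.uuid4().hex[:8]}_{src.name}"
    shutil.copy2(src, staged)
    try:
        yield staged.name
    finally:
        # Always clean up, whether the workflow succeeded or raised.
        staged.unlink(missing_ok=True)
```

Usage would look like `with staged_input(input_path) as image_name: ...`, passing `image_name` as the `image` input of the LoadImage node.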
Scale=2 handling
4x-UltraSharp.pth always outputs 4×. For scale=2, upscale at 4× then resize the result image to 50% with PIL before saving. This is still sharper than native 2× bilinear upscaling.
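The 50% resize could be sketched as follows. The function names are hypothetical, and the choice of the Lanczos filter is an assumption (the plan only says "resize with PIL"). Pillow is imported lazily so the pure size calculation stays usable without it.

```python
def half_size(width, height):
    """Size of the 2x result given the 4x output size (never below 1 pixel)."""
    return max(1, width // 2), max(1, height // 2)


def downscale_4x_to_2x(src_path, dst_path):
    """Resize a 4x-upscaled PNG to 50% and save it as the 2x result."""
    from PIL import Image  # Pillow, imported lazily; assumed to be installed

    with Image.open(src_path) as img:
        # Lanczos is an assumed filter choice; it is a common pick for downscaling.
        resized = img.resize(half_size(*img.size), Image.LANCZOS)
        resized.save(dst_path, format="PNG")
```

For a 1024×1024 source, the 4× output is 4096×4096 and `half_size(4096, 4096)` gives the 2048×2048 target.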
Output filename convention
Input: foo_20260410_123456_12345.png
Output scale=4: foo_20260410_123456_12345_4x.png
Output scale=2: foo_20260410_123456_12345_2x.png
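The naming convention, combined with the `output_dir` and `name` parameters from the tool signature, might be implemented like this. `upscaled_output_path` is a hypothetical helper, and it assumes that a custom `name` replaces the input stem while the `_4x`/`_2x` suffix is still appended; the plan does not spell that out.

```python
from pathlib import Path


def upscaled_output_path(input_path, scale, output_dir="", name=""):
    """Build the output path: <dir>/<stem>_<scale>x.png."""
    src = Path(input_path).expanduser()
    # Default: save alongside the input file.
    target_dir = Path(output_dir).expanduser() if output_dir else src.parent
    # Assumption: a custom name replaces the stem, the scale suffix stays.
    stem = name if name else src.stem
    return target_dir / f"{stem}_{scale}x.png"
```

For example, `upscaled_output_path("/pics/foo_20260410_123456_12345.png", 4)` yields `/pics/foo_20260410_123456_12345_4x.png`.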
Files to Create/Modify
| File | Change |
|---|---|
| mcp/mcp-image-gen/src/workflows/esrgan_upscale.json | New: ESRGAN workflow |
| mcp/mcp-image-gen/src/server.py | Add upscale_image() tool + helpers |
| mcp/mcp-image-gen/tests/test_upscale.py | New test file |
No changes to: workflow registry, existing tools, generate_image().
Pre-flight: Download Model
huggingface-cli download Kim2091/UltraSharp \
4x-UltraSharp.pth \
--local-dir ~/ComfyUI/models/upscale_models/
Verify ComfyUI sees it:
curl -s http://localhost:8188/object_info/UpscaleModelLoader | \
python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d['UpscaleModelLoader']['input']['required']['model_name'][0]))"
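The same check can be done from Python, for instance at server startup. This is a sketch: the function names are hypothetical, and the response shape is taken from the one-liner above (the first element of `model_name` is the list of available files). The parsing is kept separate from the HTTP call so it can be tested without a running ComfyUI.

```python
import json
import urllib.request


def upscale_models_from_object_info(info):
    """Extract the available upscale model filenames from an object_info reply."""
    return list(info["UpscaleModelLoader"]["input"]["required"]["model_name"][0])


def comfyui_has_model(model_name, base_url="http://localhost:8188"):
    """Ask a running ComfyUI instance whether it can see the given model file."""
    url = f"{base_url}/object_info/UpscaleModelLoader"
    with urllib.request.urlopen(url, timeout=5) as resp:
        info = json.load(resp)
    return model_name in upscale_models_from_object_info(info)
```

Calling `comfyui_has_model("4x-UltraSharp.pth")` requires ComfyUI to be listening on port 8188.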
Test Cases
| Test | Input | Expected |
|---|---|---|
| test_upscale_4x | 1024×1024 PNG | 4096×4096 PNG, _4x.png suffix |
| test_upscale_2x | 1024×1024 PNG | 2048×2048 PNG, _2x.png suffix |
| test_invalid_path | nonexistent path | Error TextContent returned |
| test_output_dir_override | valid PNG + output_dir=/tmp | saved to /tmp |
| test_default_output_dir | valid PNG, no output_dir | saved alongside input |
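The size expectations in the table could be asserted without an imaging library by reading the width and height from the PNG IHDR chunk, which always follows the 8-byte signature. `png_size` is a hypothetical test helper, not part of the existing test suite.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"not a PNG file: {path}")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

A test such as test_upscale_4x could then assert `png_size(output_path) == (4096, 4096)`.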
Success Criteria
- 4x-UltraSharp.pth present in ~/ComfyUI/models/upscale_models/
- upscale_image("path/to/1024.png", scale=4) returns 4096×4096 PNG
- Output file saved with _4x.png suffix
- Inline base64 image returned for display in chat
- All 5 test cases pass
- No changes to existing generate_image() tests