# Task: Add ESRGAN Upscaler to mcp-image-gen

**Date:** 2026-04-10
**Status:** Ready to implement
**Depends on:** mcp-image-gen working ✅, FLUX.2 Klein Heretic working ✅

---

## Goal

Add an `upscale_image()` MCP tool that takes an existing PNG path (from a previous `generate_image()` call) and upscales it 2× or 4× using a Real-ESRGAN model: **no diffusion re-generation**, just fast post-processing (~5–10s).

Result: A 1024×1024 → 4096×4096 pipeline in two tool calls:

```python
result = generate_image("...", model="flux-2-klein-4b.safetensors", steps=20)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345.png

upscaled = upscale_image(
    input_path="~/Pictures/mcp-generated/foo_20260410_123456_12345.png",
    scale=4
)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345_4x.png (4096×4096)
```

---

## Why ESRGAN (Option B) over Latent Upscale

| Method | Time overhead | Quality | Requires diffusion? |
|--------|--------------|---------|---------------------|
| ESRGAN image upscale | ~5–10s | ✅ Very sharp details | ❌ No |
| Latent upscale + KSampler | ~50% extra gen time | ✅ Good, consistent style | ✅ Yes |
| UltimateSDUpscale (tiled) | ~4× gen time | ✅ Highest quality | ✅ Yes |

ESRGAN is the clear winner for "I want a bigger version of this image quickly."

---

## Model to Use

**`4x-UltraSharp.pth`**: the community standard for photorealistic upscaling.

- Source: https://huggingface.co/Kim2091/UltraSharp
- Download: `huggingface-cli download Kim2091/UltraSharp 4x-UltraSharp.pth --local-dir ~/ComfyUI/models/upscale_models/`
- Size: ~67MB
- Scale factor: 4× (2× is obtained by resizing the 4× output afterwards)

Alternative: `RealESRGAN_x4plus.pth` (in ComfyUI's model downloader, general purpose)

---

## ComfyUI Workflow: `esrgan_upscale.json`

Minimal workflow, 4 nodes:

```
LoadImage → UpscaleModelLoader + ImageUpscaleWithModel → SaveImage
```

Node layout:

```json
{
  "1": {
    "class_type": "LoadImage",
    "inputs": {
      "image": "__INPUT_PATH__"
    }
  },
  "2": {
    "class_type": "UpscaleModelLoader",
    "inputs": {
      "model_name": "4x-UltraSharp.pth"
    }
  },
  "3": {
    "class_type": "ImageUpscaleWithModel",
    "inputs": {
      "upscale_model": ["2", 0],
      "image": ["1", 0]
    }
  },
  "4": {
    "class_type": "SaveImage",
    "inputs": {
      "images": ["3", 0],
      "filename_prefix": "__OUTPUT_PREFIX__"
    }
  }
}
```

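
The two placeholders in the node layout, `__INPUT_PATH__` and `__OUTPUT_PREFIX__`, have to be substituted before the workflow is submitted. The following is a minimal sketch of one way to do that, assuming the template is plain JSON. `fill_placeholders` is a hypothetical helper name, not something that exists in `server.py` today, and the template is embedded inline only to keep the example self-contained (in practice it would be read from `esrgan_upscale.json`). The template is deep-copied so that the loaded JSON can be reused across calls.

```python
import copy
import json

# Inline copy of the workflow above, only so this sketch runs on its own.
TEMPLATE = json.loads("""
{
  "1": {"class_type": "LoadImage", "inputs": {"image": "__INPUT_PATH__"}},
  "2": {"class_type": "UpscaleModelLoader", "inputs": {"model_name": "4x-UltraSharp.pth"}},
  "3": {"class_type": "ImageUpscaleWithModel",
        "inputs": {"upscale_model": ["2", 0], "image": ["1", 0]}},
  "4": {"class_type": "SaveImage",
        "inputs": {"images": ["3", 0], "filename_prefix": "__OUTPUT_PREFIX__"}}
}
""")


def fill_placeholders(template: dict, values: dict) -> dict:
    """Return a copy of `template` with placeholder strings replaced.

    Hypothetical helper: only string inputs that exactly match a key in
    `values` are replaced; node links such as ["2", 0] are left untouched.
    """
    workflow = copy.deepcopy(template)
    for node in workflow.values():
        for key, value in node["inputs"].items():
            if isinstance(value, str) and value in values:
                node["inputs"][key] = values[value]
    return workflow


workflow = fill_placeholders(
    TEMPLATE,
    {"__INPUT_PATH__": "foo.png", "__OUTPUT_PREFIX__": "foo_4x"},
)
```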
**Note:** `LoadImage` in ComfyUI requires the image to be in `~/ComfyUI/input/`, so the workflow builder must copy the input file there first (or use `ETN_LoadImageBase64` if available). See "Implementation Notes" below.

---

## MCP Tool Signature

Add to [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py):

```python
# Likely already imported in server.py for generate_image(); shown for completeness.
from typing import Annotated

from pydantic import Field

# `mcp` is the server instance already defined in server.py.


@mcp.tool()
async def upscale_image(
    input_path: Annotated[str, Field(description="Path to input PNG (absolute or ~-relative). Must be a file previously generated by generate_image().")],
    scale: Annotated[int, Field(description="Upscale factor: 2 or 4 (default: 4). 4x-UltraSharp always runs at 4x; scale=2 applies a 0.5 resize after.")] = 4,
    output_dir: Annotated[str, Field(description="Override output directory. Defaults to same dir as input_path.")] = "",
    name: Annotated[str, Field(description="Optional output filename prefix. Defaults to input filename + _4x or _2x.")] = "",
) -> list:
    """Upscale an existing image using Real-ESRGAN (4x-UltraSharp).

    No diffusion re-generation: pure post-processing (~5-10s).
    Input must be a PNG file. Output is saved alongside the input by default.

    Returns both a file path and an inline base64 image for display.
    """
```

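
The signature implies a few checks before any work is sent to ComfyUI: `scale` must be 2 or 4, and `input_path` must expand to an existing PNG. The sketch below shows one possible shape for that validation. `validate_upscale_args` and `ALLOWED_SCALES` are hypothetical names, and raising exceptions is an assumption; the tool itself would presumably catch them and return the error as text content, as the `test_invalid_path` case expects.

```python
from pathlib import Path

# Scales the tool accepts, taken from the signature's description.
ALLOWED_SCALES = (2, 4)


def validate_upscale_args(input_path: str, scale: int) -> Path:
    """Check the arguments and return the resolved input path.

    Hypothetical helper: raises ValueError or FileNotFoundError with a
    message the tool can pass back to the caller.
    """
    if scale not in ALLOWED_SCALES:
        raise ValueError(f"scale must be 2 or 4, got {scale}")

    # Accept both absolute and ~-relative paths.
    path = Path(input_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"input image not found: {path}")
    if path.suffix.lower() != ".png":
        raise ValueError(f"input must be a PNG file: {path.name}")
    return path
```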
---

## Implementation Notes

### The `LoadImage` ComfyUI constraint

ComfyUI's built-in `LoadImage` node only accepts filenames relative to `~/ComfyUI/input/`, not arbitrary paths. Two solutions:

**Solution A (simplest):** Copy the input to `~/ComfyUI/input/` before submitting the workflow, use the basename as the `image` param, and delete the copy afterwards.

**Solution B:** Use the `ETN_LoadImageBase64` node (part of the `comfyui-tooling-nodes` custom node extension), which accepts a base64-encoded image directly. Check if installed:

```bash
ls ~/ComfyUI/custom_nodes/ | grep -i tooling
```

**Recommended:** Start with Solution A (copy to input dir), which has no dependencies. If `comfyui-tooling-nodes` is present, prefer Solution B for cleanliness.

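
Solution A amounts to copy, run, clean up, which maps naturally onto a context manager. This is a sketch under assumptions rather than the final implementation: `staged_input` and `COMFYUI_INPUT_DIR` are hypothetical names, and the unique filename prefix is an addition to avoid overwriting an unrelated file with the same name that is already in `input/`.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path

# Assumed default location; adjust if ComfyUI is installed elsewhere.
COMFYUI_INPUT_DIR = Path("~/ComfyUI/input").expanduser()


@contextmanager
def staged_input(source: Path, input_dir: Path = COMFYUI_INPUT_DIR):
    """Copy `source` into the ComfyUI input dir and yield the staged basename.

    The copy is deleted when the block exits, even if the workflow fails.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    # Unique prefix so a same-named file already in input/ is never clobbered.
    staged = input_dir / f"upscale_{uuid.uuid4().hex[:8]}_{source.name}"
    shutil.copy2(source, staged)
    try:
        yield staged.name
    finally:
        staged.unlink(missing_ok=True)
```

Inside the `with staged_input(path) as image_name:` block, `image_name` is the value that would go into the `LoadImage` node's `image` input.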
### Scale=2 handling

`4x-UltraSharp.pth` always outputs 4×. For `scale=2`, upscale at 4× then resize the result image to 50% with PIL before saving. This is still sharper than native 2× bilinear upscaling.

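
A sketch of the 50% resize step, assuming Pillow 9.1 or newer (for `Image.Resampling`). The Lanczos filter is an assumption; the text above only says "resize with PIL". `target_size` and `downscale_to_requested` are hypothetical helper names. Pillow is imported lazily so that the `scale=4` path does not need it at all.

```python
from pathlib import Path

# 4x-UltraSharp.pth always upscales by this factor.
MODEL_SCALE = 4


def target_size(upscaled_size: tuple, scale: int) -> tuple:
    """Final (width, height) given the 4x output size and the requested scale."""
    width, height = upscaled_size
    return (width * scale // MODEL_SCALE, height * scale // MODEL_SCALE)


def downscale_to_requested(path: Path, scale: int) -> None:
    """Shrink the 4x output in place when a smaller scale was requested."""
    if scale == MODEL_SCALE:
        return  # nothing to do for scale=4

    from PIL import Image  # lazy import: Pillow is only needed for scale=2

    with Image.open(path) as img:
        # Lanczos is an assumed filter choice, picked for downscaling quality.
        resized = img.resize(target_size(img.size, scale), Image.Resampling.LANCZOS)
    resized.save(path)
```

For a 1024×1024 input, the model produces 4096×4096 and `target_size((4096, 4096), 2)` gives `(2048, 2048)`, matching the `test_upscale_2x` expectation.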
### Output filename convention

- Input: `foo_20260410_123456_12345.png`
- Output `scale=4`: `foo_20260410_123456_12345_4x.png`
- Output `scale=2`: `foo_20260410_123456_12345_2x.png`

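
The naming convention can be expressed with `pathlib` in a few lines. This is a minimal sketch: `build_output_path` is a hypothetical helper, and it deliberately leaves out the optional `name` parameter, whose exact interaction with the `_4x` / `_2x` suffix is not specified here. An empty `output_dir` means "same directory as the input", matching the tool signature's default.

```python
from pathlib import Path


def build_output_path(input_path: Path, scale: int, output_dir: str = "") -> Path:
    """Derive the output path: <input stem>_<scale>x.png.

    Hypothetical helper; saves next to the input unless output_dir is given.
    """
    directory = Path(output_dir).expanduser() if output_dir else input_path.parent
    return directory / f"{input_path.stem}_{scale}x.png"


default_path = build_output_path(Path("/pics/foo_20260410_123456_12345.png"), 4)
override_path = build_output_path(Path("/pics/foo_20260410_123456_12345.png"), 2, "/tmp")
```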
---

## Files to Create/Modify

| File | Change |
|------|--------|
| [`mcp/mcp-image-gen/src/workflows/esrgan_upscale.json`](../mcp/mcp-image-gen/src/workflows/esrgan_upscale.json) | New: ESRGAN workflow |
| [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py) | Add `upscale_image()` tool + helpers |
| [`mcp/mcp-image-gen/tests/test_upscale.py`](../mcp/mcp-image-gen/tests/test_upscale.py) | New test file |

**No changes to:** workflow registry, existing tools, `generate_image()`.

---

## Pre-flight: Download Model

```bash
huggingface-cli download Kim2091/UltraSharp \
  4x-UltraSharp.pth \
  --local-dir ~/ComfyUI/models/upscale_models/
```

Verify ComfyUI sees it:

```bash
curl -s http://localhost:8188/object_info/UpscaleModelLoader | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d['UpscaleModelLoader']['input']['required']['model_name'][0]))"
```

---

## Test Cases

| Test | Input | Expected |
|------|-------|----------|
| `test_upscale_4x` | 1024×1024 PNG | 4096×4096 PNG, `_4x.png` suffix |
| `test_upscale_2x` | 1024×1024 PNG | 2048×2048 PNG, `_2x.png` suffix |
| `test_invalid_path` | nonexistent path | Error TextContent returned |
| `test_output_dir_override` | valid PNG + `output_dir=/tmp` | saved to /tmp |
| `test_default_output_dir` | valid PNG, no output_dir | saved alongside input |

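
The first two test cases need to assert output dimensions. One way to do that without decoding the whole image is to read the width and height straight from the PNG's `IHDR` chunk, which always directly follows the 8-byte signature. This is a sketch of a possible test helper; `png_size` is a hypothetical name, and the tests could equally use Pillow's `Image.open(path).size`.

```python
import struct
from pathlib import Path

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple[int, int]:
    """Return (width, height) read from the PNG's IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte type "IHDR",
    then width and height as big-endian unsigned 32-bit integers.
    """
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"not a valid PNG file: {path}")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```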
---

## Success Criteria

- [ ] `4x-UltraSharp.pth` present in `~/ComfyUI/models/upscale_models/`
- [ ] `upscale_image("path/to/1024.png", scale=4)` returns 4096×4096 PNG
- [ ] Output file saved with `_4x.png` suffix
- [ ] Inline base64 image returned for display in chat
- [ ] All 5 test cases pass
- [ ] No changes to existing `generate_image()` tests