# Task: Add ESRGAN Upscaler to mcp-image-gen
**Date:** 2026-04-10

**Status:** Ready to implement

**Depends on:** mcp-image-gen working ✅, FLUX.2 Klein Heretic working ✅

---
## Goal
Add an `upscale_image()` MCP tool that takes the path of an existing PNG (from a previous `generate_image()` call) and upscales it 2× or 4× with an ESRGAN-based model (4x-UltraSharp). There is **no diffusion re-generation**, just fast post-processing (~5-10s).

Result: a 1024×1024 → 4096×4096 pipeline in two tool calls:
```python
result = generate_image("...", model="flux-2-klein-4b.safetensors", steps=20)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345.png
upscaled = upscale_image(
    input_path="~/Pictures/mcp-generated/foo_20260410_123456_12345.png",
    scale=4,
)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345_4x.png (4096×4096)
```
---
## Why ESRGAN (Option B) over Latent Upscale
| Method | Time overhead | Quality | Requires diffusion? |
|--------|--------------|---------|---------------------|
| ESRGAN image upscale | ~5-10s | ✅ Very sharp details | ❌ No |
| Latent upscale + KSampler | ~50% extra gen time | ✅ Good, consistent style | ✅ Yes |
| UltimateSDUpscale (tiled) | ~4× gen time | ✅ Highest quality | ✅ Yes |

ESRGAN is the clear winner for "I want a bigger version of this image quickly."

---
## Model to Use
**`4x-UltraSharp.pth`**, a widely used community model for photorealistic upscaling.
- Source: https://huggingface.co/Kim2091/UltraSharp
- Download: `huggingface-cli download Kim2091/UltraSharp 4x-UltraSharp.pth --local-dir ~/ComfyUI/models/upscale_models/`
- Size: ~67MB
- Scale factor: 4× (for 2× output, downscale the 4× result by 50%)

Alternative: `RealESRGAN_x4plus.pth` (general-purpose; available through ComfyUI's model downloader)

---
## ComfyUI Workflow: `esrgan_upscale.json`
Minimal workflow with 4 nodes:
```
LoadImage          ─┐
                    ├→ ImageUpscaleWithModel → SaveImage
UpscaleModelLoader ─┘
```
API-format prompt (node IDs as keys, links written as `[source_node_id, output_index]`):
```json
{
  "1": {
    "class_type": "LoadImage",
    "inputs": {
      "image": "__INPUT_PATH__"
    }
  },
  "2": {
    "class_type": "UpscaleModelLoader",
    "inputs": {
      "model_name": "4x-UltraSharp.pth"
    }
  },
  "3": {
    "class_type": "ImageUpscaleWithModel",
    "inputs": {
      "upscale_model": ["2", 0],
      "image": ["1", 0]
    }
  },
  "4": {
    "class_type": "SaveImage",
    "inputs": {
      "images": ["3", 0],
      "filename_prefix": "__OUTPUT_PREFIX__"
    }
  }
}
```
**Note:** ComfyUI's `LoadImage` node resolves `image` as a file name inside `~/ComfyUI/input/`, so the workflow builder must copy the input file there first (or use `ETN_LoadImageBase64` if available) and substitute that file name for `__INPUT_PATH__`. See "Implementation Notes" below.

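
Filling the two placeholders could look like the sketch below. `build_upscale_prompt` is a hypothetical helper name, and the sketch assumes the template is loaded once (e.g. with `json.load`) and must not be mutated between calls, hence the deep copy.

```python
import copy

# Placeholder tokens used in esrgan_upscale.json.
INPUT_TOKEN = "__INPUT_PATH__"
OUTPUT_TOKEN = "__OUTPUT_PREFIX__"


def build_upscale_prompt(template: dict, image_name: str, output_prefix: str) -> dict:
    """Return a copy of the API-format workflow with its placeholders filled in.

    `image_name` must already be resolvable by LoadImage, i.e. a file name
    inside ~/ComfyUI/input/ (see the LoadImage note above).
    """
    prompt = copy.deepcopy(template)  # keep the cached template pristine
    for node in prompt.values():
        inputs = node.get("inputs", {})
        for key, value in inputs.items():
            # Only string placeholders are replaced; links like ["3", 0] are left alone.
            if value == INPUT_TOKEN:
                inputs[key] = image_name
            elif value == OUTPUT_TOKEN:
                inputs[key] = output_prefix
    return prompt
```

The resulting dict is what gets submitted to ComfyUI as the body `{"prompt": prompt}` on `POST /prompt`.
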
---
## MCP Tool Signature
Add to [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py):
```python
from typing import Annotated

from pydantic import Field

# `mcp` is the MCP server instance already defined in server.py.
@mcp.tool()
async def upscale_image(
    input_path: Annotated[str, Field(description="Path to input PNG (absolute or ~-relative). Must be a file previously generated by generate_image().")],
    scale: Annotated[int, Field(description="Upscale factor: 2 or 4 (default: 4). 4x-UltraSharp always runs at 4x; scale=2 applies a 0.5 resize after.")] = 4,
    output_dir: Annotated[str, Field(description="Override output directory. Defaults to same dir as input_path.")] = "",
    name: Annotated[str, Field(description="Optional output filename prefix. Defaults to input filename + _4x or _2x.")] = "",
) -> list:
    """Upscale an existing image using an ESRGAN model (4x-UltraSharp).

    No diffusion re-generation: pure post-processing (~5-10s).
    Input must be a PNG file. Output is saved alongside the input by default.
    Returns both a file path and an inline base64 image for display.
    """
```
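
A minimal sketch of the input validation implied by the `input_path` description. `resolve_input_path` is a hypothetical helper; it checks the PNG signature rather than trusting the extension, and it does not attempt to verify that the file really came from `generate_image()`. The tool would presumably catch the `ValueError` and return it as an error `TextContent` (the `test_invalid_path` case).

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def resolve_input_path(input_path: str) -> Path:
    """Expand ~, make the path absolute, and check it is an existing PNG.

    Raises ValueError with a message suitable for an error TextContent.
    """
    path = Path(input_path).expanduser().resolve()
    if not path.is_file():
        raise ValueError(f"Input file not found: {path}")
    # Read the 8-byte PNG signature instead of relying on the .png suffix.
    with path.open("rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError(f"Not a PNG file: {path}")
    return path
```
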
---
## Implementation Notes
### The `LoadImage` ComfyUI constraint
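ComfyUI's built-in `LoadImage` node only accepts file names relative to `~/ComfyUI/input/`, not arbitrary paths. Two solutions:

**Solution A (simplest):** Copy the input into `~/ComfyUI/input/` before submitting the workflow, pass its basename as the `image` input, and delete the copy once the prompt has finished.

**Solution B:** Use the `ETN_LoadImageBase64` node, which accepts a base64-encoded image directly. The `ETN_` nodes ship in the `comfyui-tooling-nodes` custom node pack, whose directory name need not contain "etn", so ask ComfyUI whether the node class is registered instead of grepping `custom_nodes/`:
```bash
# Prints a non-empty JSON object if the node is registered, {} otherwise
curl -s http://localhost:8188/object_info/ETN_LoadImageBase64
```
**Recommended:** Start with Solution A (copy to the input dir), which has no dependencies. If `ETN_LoadImageBase64` is available, prefer Solution B because it avoids temporary files.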
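
Solution A can be sketched as a context manager. The helper name `staged_input`, the `mcp_<random>_` file-name prefix, and the default `~/ComfyUI/input` location are assumptions. Because `POST /prompt` only queues the job, the caller should leave the `with` block only after the prompt has completed (e.g. once `/history/<prompt_id>` reports outputs); otherwise the copy may vanish before `LoadImage` reads it.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path

# Assumed default location; adjust if ComfyUI is installed elsewhere.
COMFY_INPUT_DIR = Path("~/ComfyUI/input").expanduser()


@contextmanager
def staged_input(src: Path, input_dir: Path = COMFY_INPUT_DIR):
    """Copy `src` into ComfyUI's input dir and yield the name LoadImage needs.

    A random prefix avoids clashes when two same-named files are upscaled at
    once; the copy is removed even if the workflow raises.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    staged = input_dir / f"mcp_{uuid.uuid4().hex[:8]}_{src.name}"
    shutil.copy2(src, staged)
    try:
        yield staged.name  # basename only: LoadImage resolves it against input/
    finally:
        staged.unlink(missing_ok=True)
```
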
### `scale=2` handling
`4x-UltraSharp.pth` always outputs 4×. For `scale=2`, run the 4× upscale, then resize the retrieved result to 50% with PIL before writing the final file. This is typically sharper than a plain 2× bilinear resize of the original image.
### Output filename convention
- Input: `foo_20260410_123456_12345.png`
- Output, `scale=4`: `foo_20260410_123456_12345_4x.png`
- Output, `scale=2`: `foo_20260410_123456_12345_2x.png`

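
ComfyUI's `SaveImage` node appends a counter to `filename_prefix` (files come out as `<prefix>_00001_.png`), so the final name presumably has to be produced by the tool itself, e.g. by copying the retrieved image to a path like the one below. `output_path_for` is a hypothetical helper, and treating a non-empty `name` as a full replacement for `<stem>_<scale>x` is an assumption about the `name` parameter.

```python
from pathlib import Path


def output_path_for(input_path: Path, scale: int, output_dir: str = "", name: str = "") -> Path:
    """Build the final output path following the _4x / _2x convention."""
    if scale not in (2, 4):
        raise ValueError("scale must be 2 or 4")
    # Default: same directory as the input (test_default_output_dir).
    target_dir = Path(output_dir).expanduser() if output_dir else input_path.parent
    # Assumption: a non-empty `name` replaces the derived "<stem>_<scale>x" stem.
    stem = name or f"{input_path.stem}_{scale}x"
    return target_dir / f"{stem}.png"
```
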
---
## Files to Create/Modify
| File | Change |
|------|--------|
| [`mcp/mcp-image-gen/src/workflows/esrgan_upscale.json`](../mcp/mcp-image-gen/src/workflows/esrgan_upscale.json) | New — ESRGAN workflow |
| [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py) | Add `upscale_image()` tool + helpers |
| [`mcp/mcp-image-gen/tests/test_upscale.py`](../mcp/mcp-image-gen/tests/test_upscale.py) | New test file |

**No changes to:** workflow registry, existing tools, `generate_image()`.

---
## Pre-flight: Download Model
```bash
huggingface-cli download Kim2091/UltraSharp \
4x-UltraSharp.pth \
--local-dir ~/ComfyUI/models/upscale_models/
```
Verify ComfyUI sees it:
```bash
curl -s http://localhost:8188/object_info/UpscaleModelLoader | \
python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d['UpscaleModelLoader']['input']['required']['model_name'][0]))"
```
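
The same check could run from Python as a pre-flight inside the server, sketched below with the standard library only. `fetch_upscale_models` and `parse_upscale_models` are hypothetical helpers; the first branch mirrors the `[0]` index used by the curl one-liner, while the `["COMBO", {"options": [...]}]` branch is an assumption about how some newer ComfyUI builds may report combo inputs.

```python
import json
import urllib.request

COMFY_URL = "http://localhost:8188"  # same host/port as the curl check


def parse_upscale_models(info: dict) -> list[str]:
    """Extract model file names from an /object_info/UpscaleModelLoader response."""
    spec = info["UpscaleModelLoader"]["input"]["required"]["model_name"]
    if isinstance(spec[0], list):
        # Shape relied on by the curl one-liner: [[name, ...], ...]
        return list(spec[0])
    # Assumed alternative shape: ["COMBO", {"options": [name, ...]}]
    return list(spec[1].get("options", []))


def fetch_upscale_models(base_url: str = COMFY_URL) -> list[str]:
    """Query a running ComfyUI instance for the installed upscale models."""
    url = f"{base_url}/object_info/UpscaleModelLoader"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_upscale_models(json.load(resp))
```

A pre-flight would then be `"4x-UltraSharp.pth" in fetch_upscale_models()`.
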
---
## Test Cases
| Test | Input | Expected |
|------|-------|----------|
| `test_upscale_4x` | 1024×1024 PNG | 4096×4096 PNG, `_4x.png` suffix |
| `test_upscale_2x` | 1024×1024 PNG | 2048×2048 PNG, `_2x.png` suffix |
| `test_invalid_path` | nonexistent path | Error TextContent returned |
| `test_output_dir_override` | valid PNG + `output_dir=/tmp` | saved to /tmp |
| `test_default_output_dir` | valid PNG, no output_dir | saved alongside input |
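
For `tests/test_upscale.py`, fixtures along these lines could create the 1024×1024 input and check the 4096×4096 / 2048×2048 results without depending on Pillow. `write_solid_png` and `png_size` are hypothetical helpers built on the standard PNG chunk layout (length, type, data, CRC-32).

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Length + type + data + CRC-32 computed over type and data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_solid_png(path: Path, width: int, height: int, rgb=(128, 128, 128)) -> None:
    """Write a solid-colour 8-bit RGB PNG, e.g. a 1024x1024 test input."""
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default compression/filter/interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + bytes(rgb) * width  # filter byte 0 followed by the pixels
    idat = zlib.compress(row * height)
    path.write_bytes(PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


def png_size(path: Path) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk without decoding pixels."""
    with path.open("rb") as f:
        header = f.read(24)
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"Not a PNG file: {path}")
    return struct.unpack(">II", header[16:24])
```

A test such as `test_upscale_4x` could then assert `png_size(out) == (4096, 4096)` after upscaling a `write_solid_png(inp, 1024, 1024)` input.
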
---
## Success Criteria
- [ ] `4x-UltraSharp.pth` present in `~/ComfyUI/models/upscale_models/`
- [ ] `upscale_image("path/to/1024.png", scale=4)` returns 4096×4096 PNG
- [ ] Output file saved with `_4x.png` suffix
- [ ] Inline base64 image returned for display in chat
- [ ] All 5 test cases pass
- [ ] No changes to existing `generate_image()` tests