# Task: Add ESRGAN Upscaler to mcp-image-gen

**Date:** 2026-04-10  
**Status:** Ready to implement  
**Depends on:** mcp-image-gen working ✅, FLUX.2 Klein Heretic working ✅

---

## Goal

Add an `upscale_image()` MCP tool that takes an existing PNG path (from a previous `generate_image()` call) and upscales it 2× or 4× using a Real-ESRGAN model — **no diffusion re-generation**, just fast post-processing (~5–10s).

Result: A 1024×1024 → 4096×4096 pipeline in two tool calls:
```python
result = generate_image("...", model="flux-2-klein-4b.safetensors", steps=20)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345.png

upscaled = upscale_image(
    input_path="~/Pictures/mcp-generated/foo_20260410_123456_12345.png",
    scale=4
)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345_4x.png (4096×4096)
```

---

## Why ESRGAN (Option B) over Latent Upscale

| Method | Time overhead | Quality | Requires diffusion? |
|--------|--------------|---------|---------------------|
| ESRGAN image upscale | ~5–10s | ✅ Very sharp details | ❌ No |
| Latent upscale + KSampler | ~50% extra gen time | ✅ Good, consistent style | ✅ Yes |
| UltimateSDUpscale (tiled) | ~4× gen time | ✅ Highest quality | ✅ Yes |

ESRGAN is the clear winner for "I want a bigger version of this image quickly."

---

## Model to Use

**`4x-UltraSharp.pth`**: a widely used community model for photorealistic upscaling.

- Source: https://huggingface.co/Kim2091/UltraSharp  
- Download: `huggingface-cli download Kim2091/UltraSharp 4x-UltraSharp.pth --local-dir ~/ComfyUI/models/upscale_models/`
- Size: ~67 MB  
- Scale factor: 4× (for 2×, downscale the 4× output afterwards; see "Scale=2 handling")

Alternative: `RealESRGAN_x4plus.pth` (in ComfyUI's model downloader, general purpose)

---

## ComfyUI Workflow: `esrgan_upscale.json`

Minimal workflow with 4 nodes:

```
LoadImage → UpscaleModelLoader + ImageUpscaleWithModel → SaveImage
```

Node layout:

```json
{
  "1": {
    "class_type": "LoadImage",
    "inputs": {
      "image": "__INPUT_PATH__"
    }
  },
  "2": {
    "class_type": "UpscaleModelLoader",
    "inputs": {
      "model_name": "4x-UltraSharp.pth"
    }
  },
  "3": {
    "class_type": "ImageUpscaleWithModel",
    "inputs": {
      "upscale_model": ["2", 0],
      "image": ["1", 0]
    }
  },
  "4": {
    "class_type": "SaveImage",
    "inputs": {
      "images": ["3", 0],
      "filename_prefix": "__OUTPUT_PREFIX__"
    }
  }
}
```

**Note:** `LoadImage` in ComfyUI requires the image to be in `~/ComfyUI/input/` — the workflow builder must copy the input file there first (or use `ETN_LoadImageBase64` if available). See "Implementation Notes" below.
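
A minimal sketch of how the two placeholders might be filled before the workflow is submitted. The helper name `build_upscale_workflow` is hypothetical, and the template is inlined here only to keep the example self-contained; the server would presumably load it from `esrgan_upscale.json` instead.

```python
import copy
import json

# Mirrors esrgan_upscale.json above (inlined so the sketch is self-contained).
WORKFLOW_TEMPLATE = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "__INPUT_PATH__"}},
    "2": {"class_type": "UpscaleModelLoader",
          "inputs": {"model_name": "4x-UltraSharp.pth"}},
    "3": {"class_type": "ImageUpscaleWithModel",
          "inputs": {"upscale_model": ["2", 0], "image": ["1", 0]}},
    "4": {"class_type": "SaveImage",
          "inputs": {"images": ["3", 0], "filename_prefix": "__OUTPUT_PREFIX__"}},
}


def build_upscale_workflow(image_name: str, output_prefix: str) -> dict:
    """Return a copy of the template with both placeholders replaced.

    Hypothetical helper: values are set by node ID instead of string-replacing
    the raw JSON, so unusual characters in filenames cannot break the document.
    """
    workflow = copy.deepcopy(WORKFLOW_TEMPLATE)
    workflow["1"]["inputs"]["image"] = image_name
    workflow["4"]["inputs"]["filename_prefix"] = output_prefix
    return workflow


if __name__ == "__main__":
    print(json.dumps(build_upscale_workflow("foo.png", "foo_4x"), indent=2))
```

The deep copy keeps the shared template untouched, so concurrent tool calls would not see each other's values.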

---

## MCP Tool Signature

Add to [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py):

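```python
# Imports the signature relies on (likely already present in server.py).
from typing import Annotated
from pydantic import Field

@mcp.tool()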
async def upscale_image(
    input_path: Annotated[str, Field(description="Path to input PNG (absolute or ~-relative). Must be a file previously generated by generate_image().")],
    scale: Annotated[int, Field(description="Upscale factor: 2 or 4 (default: 4). 4x-UltraSharp always runs at 4x; scale=2 applies a 0.5 resize after.")] = 4,
    output_dir: Annotated[str, Field(description="Override output directory. Defaults to same dir as input_path.")] = "",
    name: Annotated[str, Field(description="Optional output filename prefix. Defaults to input filename + _4x or _2x.")] = "",
) -> list:
    """Upscale an existing image using Real-ESRGAN (4x-UltraSharp).

    No diffusion re-generation — pure post-processing (~5-10s).
    Input must be a PNG file. Output is saved alongside the input by default.

    Returns both a file path and an inline base64 image for display.
    """
```
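
The constraints stated in the field descriptions (scale of 2 or 4, an existing PNG file, `~` expansion) could be enforced up front. The following is a sketch only; `validate_upscale_args` and `ALLOWED_SCALES` are hypothetical names, and raising `ValueError` is an assumption about how the tool body would turn failures into an error response.

```python
from pathlib import Path

ALLOWED_SCALES = (2, 4)  # the only factors the tool signature documents


def validate_upscale_args(input_path: str, scale: int) -> Path:
    """Resolve the input path and check both arguments.

    Raises ValueError with a readable message; the caller is assumed to
    convert that into the error text returned to the MCP client.
    """
    if scale not in ALLOWED_SCALES:
        raise ValueError(f"scale must be one of {ALLOWED_SCALES}, got {scale}")
    path = Path(input_path).expanduser().resolve()
    if not path.is_file():
        raise ValueError(f"input file not found: {path}")
    if path.suffix.lower() != ".png":
        raise ValueError(f"input must be a PNG file, got: {path.name}")
    return path
```

Validating before any file is copied or any workflow is queued keeps the failure path cheap, which is what `test_invalid_path` exercises.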

---

## Implementation Notes

### The `LoadImage` ComfyUI constraint

ComfyUI's built-in `LoadImage` node only accepts filenames relative to `~/ComfyUI/input/`, not arbitrary paths. Two solutions:

**Solution A (simplest):** Copy the input file to `~/ComfyUI/input/` before submitting the workflow, pass its basename as the `image` parameter, and delete the copy afterwards.

**Solution B:** Use the `ETN_LoadImageBase64` node, which accepts a base64-encoded image directly. It ships with the `comfyui-tooling-nodes` custom node pack (the `ETN_` prefix stands for "external tooling nodes"). Check if it is installed:
```bash
ls ~/ComfyUI/custom_nodes/ | grep -i tooling
```

**Recommended:** Start with Solution A (copy to the input dir), which needs no extra dependencies. If `comfyui-tooling-nodes` is present, prefer Solution B because it avoids the temporary copy.
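
Solution A could be wrapped in a context manager so the temporary copy is removed even if the workflow fails. This is a sketch under assumptions: `staged_input` is a hypothetical helper, `COMFYUI_INPUT_DIR` assumes the default install location used throughout this document, and the unique filename prefix is an added precaution against name collisions.

```python
import contextlib
import shutil
import uuid
from pathlib import Path
from typing import Iterator

# Assumed default location; adjust if ComfyUI is installed elsewhere.
COMFYUI_INPUT_DIR = Path("~/ComfyUI/input").expanduser()


@contextlib.contextmanager
def staged_input(source: Path, input_dir: Path = COMFYUI_INPUT_DIR) -> Iterator[str]:
    """Copy `source` into the input dir, yield its basename, then delete the copy."""
    input_dir.mkdir(parents=True, exist_ok=True)
    # A unique prefix avoids overwriting an unrelated file with the same name.
    staged = input_dir / f"mcp_upscale_{uuid.uuid4().hex[:8]}_{source.name}"
    shutil.copy2(source, staged)
    try:
        yield staged.name  # this is the value for LoadImage's `image` input
    finally:
        staged.unlink(missing_ok=True)
```

Usage would look like `with staged_input(path) as image_name: ...`, with the workflow submitted and awaited inside the `with` block.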

### Scale=2 handling

`4x-UltraSharp.pth` always outputs 4×. For `scale=2`, upscale at 4× then resize the result image to 50% with PIL before saving. This is still sharper than native 2× bilinear upscaling.
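
The 50% resize might look like the sketch below. The function names are hypothetical, and the LANCZOS filter is an assumption (a common choice for downscaling), not something this task prescribes. `Image.Resampling` requires Pillow 9.1 or newer.

```python
from pathlib import Path


def halved_size(width: int, height: int) -> tuple[int, int]:
    """Half of each dimension, never below 1 pixel."""
    return max(1, width // 2), max(1, height // 2)


def downscale_to_2x(upscaled_4x: Path, destination: Path) -> None:
    """Resize a 4x result to 50 % so the net factor is 2x (requires Pillow)."""
    from PIL import Image  # imported lazily; Pillow is a third-party dependency

    with Image.open(upscaled_4x) as img:
        # LANCZOS is an assumption: a common high-quality downscaling filter.
        resized = img.resize(halved_size(*img.size), Image.Resampling.LANCZOS)
        resized.save(destination, format="PNG")
```

For a 1024×1024 source, the 4× output is 4096×4096 and `halved_size` gives 2048×2048, matching `test_upscale_2x`.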

### Output filename convention

Input: `foo_20260410_123456_12345.png`  
Output `scale=4`: `foo_20260410_123456_12345_4x.png`  
Output `scale=2`: `foo_20260410_123456_12345_2x.png`
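
The convention can be expressed as a small path helper. This is a sketch with a hypothetical function name; it covers the default naming and the `output_dir` override, and leaves out the `name` parameter because how a custom prefix combines with the suffix is not specified here.

```python
from pathlib import Path


def upscaled_output_path(input_path: str, scale: int, output_dir: str = "") -> Path:
    """Build <dir>/<input stem>_<scale>x.png.

    <dir> is `output_dir` when given, otherwise the directory of the input.
    """
    source = Path(input_path).expanduser()
    target_dir = Path(output_dir).expanduser() if output_dir else source.parent
    return target_dir / f"{source.stem}_{scale}x.png"


if __name__ == "__main__":
    print(upscaled_output_path("~/Pictures/mcp-generated/foo_20260410_123456_12345.png", 4))
```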

---

## Files to Create/Modify

| File | Change |
|------|--------|
| [`mcp/mcp-image-gen/src/workflows/esrgan_upscale.json`](../mcp/mcp-image-gen/src/workflows/esrgan_upscale.json) | New — ESRGAN workflow |
| [`mcp/mcp-image-gen/src/server.py`](../mcp/mcp-image-gen/src/server.py) | Add `upscale_image()` tool + helpers |
| [`mcp/mcp-image-gen/tests/test_upscale.py`](../mcp/mcp-image-gen/tests/test_upscale.py) | New test file |

**No changes to:** workflow registry, existing tools, `generate_image()`.

---

## Pre-flight: Download Model

```bash
huggingface-cli download Kim2091/UltraSharp \
  4x-UltraSharp.pth \
  --local-dir ~/ComfyUI/models/upscale_models/
```

Verify ComfyUI sees it:
```bash
curl -s http://localhost:8188/object_info/UpscaleModelLoader | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d['UpscaleModelLoader']['input']['required']['model_name'][0]))"
```
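
The same check could run from Python, for example at server start-up or in a test fixture. This is a sketch: both function names are hypothetical, and the response layout is assumed to match the key path used in the shell one-liner above.

```python
import json
import urllib.request

COMFYUI_URL = "http://localhost:8188"  # same address as the curl command above


def upscale_models_from_object_info(object_info: dict) -> list[str]:
    """Extract the model list using the same key path as the shell one-liner."""
    return list(object_info["UpscaleModelLoader"]["input"]["required"]["model_name"][0])


def model_is_available(model_name: str, base_url: str = COMFYUI_URL) -> bool:
    """Ask a running ComfyUI instance whether `model_name` is listed."""
    url = f"{base_url}/object_info/UpscaleModelLoader"
    with urllib.request.urlopen(url, timeout=10) as response:
        object_info = json.load(response)
    return model_name in upscale_models_from_object_info(object_info)
```

Keeping the parsing separate from the HTTP call means the parsing can be unit-tested without a running ComfyUI instance.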

---

## Test Cases

| Test | Input | Expected |
|------|-------|----------|
| `test_upscale_4x` | 1024×1024 PNG | 4096×4096 PNG, `_4x.png` suffix |
| `test_upscale_2x` | 1024×1024 PNG | 2048×2048 PNG, `_2x.png` suffix |
| `test_invalid_path` | nonexistent path | Error TextContent returned |
| `test_output_dir_override` | valid PNG + `output_dir=/tmp` | saved to /tmp |
| `test_default_output_dir` | valid PNG, no output_dir | saved alongside input |

---

## Success Criteria

- [ ] `4x-UltraSharp.pth` present in `~/ComfyUI/models/upscale_models/`
- [ ] `upscale_image("path/to/1024.png", scale=4)` returns 4096×4096 PNG
- [ ] Output file saved with `_4x.png` suffix
- [ ] Inline base64 image returned for display in chat
- [ ] All 5 test cases pass
- [ ] No changes to existing `generate_image()` tests