Task: Add ESRGAN Upscaler to mcp-image-gen

Date: 2026-04-10
Status: Ready to implement
Depends on: mcp-image-gen working, FLUX.2 Klein Heretic working


Goal

Add an upscale_image() MCP tool that takes an existing PNG path (from a previous generate_image() call) and upscales it 2× or 4× using a Real-ESRGAN model. No diffusion re-generation is involved, just fast post-processing (~5-10s).

Result: A 1024×1024 → 4096×4096 pipeline in two tool calls:

result = generate_image("...", model="flux-2-klein-4b.safetensors", steps=20)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345.png

upscaled = upscale_image(
    input_path="~/Pictures/mcp-generated/foo_20260410_123456_12345.png",
    scale=4
)
# → ~/Pictures/mcp-generated/foo_20260410_123456_12345_4x.png (4096×4096)

Why ESRGAN (Option B) over Latent Upscale

| Method | Time overhead | Quality | Requires diffusion? |
|---|---|---|---|
| ESRGAN image upscale | ~5-10s | Very sharp details | No |
| Latent upscale + KSampler | ~50% extra gen time | Good, consistent style | Yes |
| UltimateSDUpscale (tiled) | ~4× gen time | Highest quality | Yes |

ESRGAN is the clear winner for "I want a bigger version of this image quickly."


Model to Use

4x-UltraSharp.pth — the community standard for photorealistic upscaling.

  • Source: https://huggingface.co/Kim2091/UltraSharp
  • Download: huggingface-cli download Kim2091/UltraSharp 4x-UltraSharp.pth --local-dir ~/ComfyUI/models/upscale_models/
  • Size: ~67MB
  • Scale factor: 4× (can also be used for 2× via image resize after)

Alternative: RealESRGAN_x4plus.pth (in ComfyUI's model downloader, general purpose)


ComfyUI Workflow: esrgan_upscale.json

Minimal workflow, 4 nodes:

LoadImage → UpscaleModelLoader + ImageUpscaleWithModel → SaveImage

Node layout:

{
  "1": {
    "class_type": "LoadImage",
    "inputs": {
      "image": "__INPUT_PATH__"
    }
  },
  "2": {
    "class_type": "UpscaleModelLoader",
    "inputs": {
      "model_name": "4x-UltraSharp.pth"
    }
  },
  "3": {
    "class_type": "ImageUpscaleWithModel",
    "inputs": {
      "upscale_model": ["2", 0],
      "image": ["1", 0]
    }
  },
  "4": {
    "class_type": "SaveImage",
    "inputs": {
      "images": ["3", 0],
      "filename_prefix": "__OUTPUT_PREFIX__"
    }
  }
}

Note: LoadImage in ComfyUI requires the image to be in ~/ComfyUI/input/ — the workflow builder must copy the input file there first (or use ETN_LoadImageBase64 if available). See "Implementation Notes" below.
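The two placeholders in the node layout have to be filled before the workflow is submitted. A minimal sketch of one way to do that, assuming the template has already been parsed into a dict; the function name build_upscale_workflow and the inline TEMPLATE copy are hypothetical and only mirror the JSON above:

```python
import copy

# Hypothetical in-memory copy of esrgan_upscale.json (same node layout as above).
TEMPLATE = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "__INPUT_PATH__"}},
    "2": {"class_type": "UpscaleModelLoader",
          "inputs": {"model_name": "4x-UltraSharp.pth"}},
    "3": {"class_type": "ImageUpscaleWithModel",
          "inputs": {"upscale_model": ["2", 0], "image": ["1", 0]}},
    "4": {"class_type": "SaveImage",
          "inputs": {"images": ["3", 0], "filename_prefix": "__OUTPUT_PREFIX__"}},
}


def build_upscale_workflow(template: dict, image_name: str, output_prefix: str) -> dict:
    """Return a copy of the template with both placeholders filled in."""
    workflow = copy.deepcopy(template)  # never mutate the shared template
    for node in workflow.values():
        inputs = node["inputs"]
        for key, value in inputs.items():
            # Node links are lists such as ["2", 0], so they never match here.
            if value == "__INPUT_PATH__":
                inputs[key] = image_name
            elif value == "__OUTPUT_PREFIX__":
                inputs[key] = output_prefix
    return workflow
```

Substituting on the parsed dict rather than on the raw JSON text avoids producing invalid JSON when a filename contains quotes or backslashes.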


MCP Tool Signature

Add to mcp/mcp-image-gen/src/server.py:

@mcp.tool()
async def upscale_image(
    input_path: Annotated[str, Field(description="Path to input PNG (absolute or ~-relative). Must be a file previously generated by generate_image().")],
    scale: Annotated[int, Field(description="Upscale factor: 2 or 4 (default: 4). 4x-UltraSharp always runs at 4x; scale=2 applies a 0.5 resize after.")] = 4,
    output_dir: Annotated[str, Field(description="Override output directory. Defaults to same dir as input_path.")] = "",
    name: Annotated[str, Field(description="Optional output filename prefix. Defaults to input filename + _4x or _2x.")] = "",
) -> list:
    """Upscale an existing image using Real-ESRGAN (4x-UltraSharp).

    No diffusion re-generation — pure post-processing (~5-10s).
    Input must be a PNG file. Output is saved alongside the input by default.

    Returns both a file path and an inline base64 image for display.
    """
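
The signature above leaves argument checking to the tool body. The following is a rough sketch of the checks the description implies (scale is 2 or 4, the path exists, the file is a PNG). The helper name check_upscale_args and the error wording are assumptions, and the real tool would wrap the message in whatever error content type the server already uses:

```python
from pathlib import Path
from typing import Optional

ALLOWED_SCALES = (2, 4)


def check_upscale_args(input_path: str, scale: int) -> Optional[str]:
    """Return an error message, or None when the arguments look valid."""
    if scale not in ALLOWED_SCALES:
        return f"scale must be 2 or 4, got {scale}"
    path = Path(input_path).expanduser()  # accept ~-relative paths
    if not path.is_file():
        return f"input file not found: {path}"
    if path.suffix.lower() != ".png":
        return f"input must be a PNG file: {path}"
    return None
```

Returning a message instead of raising keeps the happy path flat: the tool can return early with an error result and never submit a workflow for bad input.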

Implementation Notes

The LoadImage ComfyUI constraint

ComfyUI's built-in LoadImage node only accepts filenames relative to ~/ComfyUI/input/, not arbitrary paths. Two solutions:

Solution A (simplest): Copy input to ~/ComfyUI/input/ before submitting workflow, use basename as image param, delete after.

Solution B: Use the ETN_LoadImageBase64 node (part of the comfyui-tooling-nodes custom node pack), which accepts a base64-encoded image directly. Check if installed:

ls ~/ComfyUI/custom_nodes/ | grep -i -e etn -e tooling

Recommended: Start with Solution A (copy to input dir), which needs no extra dependencies. If the tooling nodes are present, prefer Solution B for cleanliness.
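
Solution A (copy, use the basename, delete after) could be sketched as a context manager so the cleanup also runs when the workflow fails. The name staged_input and the unique filename prefix are assumptions added here to avoid clobbering an unrelated file with the same name in the input directory:

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_input(src, comfy_input_dir):
    """Copy src into ComfyUI's input dir, yield the staged basename, then delete it."""
    src = Path(src).expanduser()
    input_dir = Path(comfy_input_dir).expanduser()
    # Unique prefix (an assumption of this sketch) so parallel calls cannot collide.
    staged = input_dir / f"upscale_{uuid.uuid4().hex[:8]}_{src.name}"
    shutil.copy2(src, staged)
    try:
        yield staged.name  # this is the value for LoadImage's "image" input
    finally:
        staged.unlink(missing_ok=True)  # remove only the copy, never the original
```

Usage would look like `with staged_input(path, "~/ComfyUI/input") as image_name:` around the workflow submission.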

Scale=2 handling

4x-UltraSharp.pth always outputs 4×. For scale=2, upscale at 4× then resize the result image to 50% with PIL before saving. This is still sharper than native 2× bilinear upscaling.
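
A minimal sketch of the scale=2 step, assuming Pillow is installed. The helper names are hypothetical, and the Lanczos filter is a choice made for this sketch; the plan only says "resize with PIL":

```python
def final_size(width_4x: int, height_4x: int, scale: int):
    """Size of the saved image, given the 4x model output and the requested scale."""
    if scale == 4:
        return width_4x, height_4x
    if scale == 2:
        return width_4x // 2, height_4x // 2  # 50% of the 4x result
    raise ValueError(f"scale must be 2 or 4, got {scale}")


def save_at_scale(path_4x: str, out_path: str, scale: int) -> None:
    """Write the 4x result to out_path, downscaled to 50% when scale == 2."""
    from PIL import Image  # Pillow, imported lazily so final_size has no dependency

    with Image.open(path_4x) as img:
        size = final_size(img.width, img.height, scale)
        # Lanczos is an assumption of this sketch, picked for sharp downscaling.
        result = img if size == img.size else img.resize(size, Image.LANCZOS)
        result.save(out_path)
```

For a 1024×1024 input the model produces 4096×4096, so scale=2 yields 2048×2048, matching the test table.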

Output filename convention

Input: foo_20260410_123456_12345.png
Output scale=4: foo_20260410_123456_12345_4x.png
Output scale=2: foo_20260410_123456_12345_2x.png
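
The naming convention, together with the output_dir and name parameters, can be sketched with pathlib. The function name upscaled_output_path is hypothetical, and treating name as a replacement for the whole default stem is one reading of the parameter description:

```python
from pathlib import Path


def upscaled_output_path(input_path: str, scale: int,
                         output_dir: str = "", name: str = "") -> Path:
    """Build the output path: <dir>/<input stem>_<scale>x.png by default."""
    src = Path(input_path).expanduser()
    # Default directory is wherever the input lives.
    directory = Path(output_dir).expanduser() if output_dir else src.parent
    # Assumption: an explicit name replaces the default "<stem>_<scale>x".
    stem = name if name else f"{src.stem}_{scale}x"
    return directory / f"{stem}.png"
```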


Files to Create/Modify

| File | Change |
|---|---|
| mcp/mcp-image-gen/src/workflows/esrgan_upscale.json | New: ESRGAN workflow |
| mcp/mcp-image-gen/src/server.py | Add upscale_image() tool + helpers |
| mcp/mcp-image-gen/tests/test_upscale.py | New test file |

No changes to: workflow registry, existing tools, generate_image().


Pre-flight: Download Model

huggingface-cli download Kim2091/UltraSharp \
  4x-UltraSharp.pth \
  --local-dir ~/ComfyUI/models/upscale_models/

Verify ComfyUI sees it:

curl -s http://localhost:8188/object_info/UpscaleModelLoader | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print('\n'.join(d['UpscaleModelLoader']['input']['required']['model_name'][0]))"
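
The same check could run from Python, for example as a startup guard in the server. This sketch assumes the response shape used by the one-liner above (the model list is the first element of model_name); other ComfyUI versions may lay this out differently. The function names are hypothetical:

```python
import json
import urllib.request


def installed_upscale_models(object_info: dict) -> list:
    """Extract the model filenames from an /object_info/UpscaleModelLoader response."""
    # Shape assumed from the curl one-liner: model_name[0] is the list of files.
    return list(object_info["UpscaleModelLoader"]["input"]["required"]["model_name"][0])


def fetch_upscale_models(base_url: str = "http://localhost:8188") -> list:
    """Query a running ComfyUI instance for its available upscale models."""
    url = f"{base_url}/object_info/UpscaleModelLoader"
    with urllib.request.urlopen(url) as resp:
        return installed_upscale_models(json.load(resp))
```

Checking `"4x-UltraSharp.pth" in fetch_upscale_models()` before queuing a workflow gives a clearer error than a failed ComfyUI job.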

Test Cases

| Test | Input | Expected |
|---|---|---|
| test_upscale_4x | 1024×1024 PNG | 4096×4096 PNG, _4x.png suffix |
| test_upscale_2x | 1024×1024 PNG | 2048×2048 PNG, _2x.png suffix |
| test_invalid_path | nonexistent path | Error TextContent returned |
| test_output_dir_override | valid PNG + output_dir=/tmp | saved to /tmp |
| test_default_output_dir | valid PNG, no output_dir | saved alongside input |

Success Criteria

  • 4x-UltraSharp.pth present in ~/ComfyUI/models/upscale_models/
  • upscale_image("path/to/1024.png", scale=4) returns 4096×4096 PNG
  • Output file saved with _4x.png suffix
  • Inline base64 image returned for display in chat
  • All 5 test cases pass
  • No changes to existing generate_image() tests