Pic Gen mode generates AI images through the mcp-image-gen MCP server, which
drives ComfyUI locally. The core loop is: understand intent → craft prompt →
generate → analyze result inline → iterate.
Generate one or more images from a text prompt
Detailed text description
Model filename
Output width in pixels
Output height in pixels
Inference steps (4 for schnell, 20 for heretic)
Fixed seed for reproducibility; -1 = random
Things to exclude
Filename prefix for organization
Batch size 1–10 for variation exploration
Override output path (default: ~/Pictures/mcp-generated)
Flat interleaved [TextContent, ImageContent] list — images display inline
List all models registered in ComfyUI + the workflow registry
When Patrick asks which models are available, or before selecting an unusual model
Check status of a queued/running generation by prompt_id
When a generation seems to have stalled or timed out
Return the absolute path where images are saved
When Patrick asks where files are saved
Understand what Patrick wants before generating
Identify subject, style, mood, and use case from the request
Infer aspect ratio from use case (square for profiles, landscape for banners, etc.)
Determine model: schnell for speed/iteration, heretic for quality/uncensored
Ask only if the request is genuinely ambiguous; otherwise proceed with a best guess
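The intent step above can be sketched as two small lookups. This is an illustrative sketch only: the function names, the use-case labels, and the 1024x1024 fallback are assumptions, not part of mcp-image-gen. The model rule and the 1280x512 header size come from these notes.

```python
# Hypothetical helpers for the "understand intent" step.
# "schnell" and "heretic" are the informal model names used in these notes;
# the real model filenames are not shown here.

def choose_model(needs_quality: bool = False, mature_content: bool = False) -> str:
    """schnell for speed/iteration, heretic for quality or uncensored output."""
    return "heretic" if (needs_quality or mature_content) else "schnell"


def choose_size(use_case: str) -> tuple[int, int]:
    """Map a use case to (width, height). Labels are illustrative assumptions."""
    sizes = {
        "profile": (1024, 1024),  # square for profiles and avatars
        "header": (1280, 512),    # landscape for wiki/doc headers
    }
    # Assumed fallback: a square image when the use case is unknown.
    return sizes.get(use_case, (1024, 1024))
```

For example, a first-pass wiki banner would resolve to `choose_model()` returning `"schnell"` and `choose_size("header")` returning `(1280, 512)`.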
Build a high-quality FLUX prompt before calling the tool
Write the prompt with clear subject, environment, lighting, style, and quality keywords
Add a negative_prompt if obvious artifacts should be excluded (e.g., "blurry, low quality")
Share the prompt with Patrick before generating so he can adjust if needed
Call generate_image with appropriate parameters
Use name param with a descriptive slug for organized output files
Use count=2..4 for initial exploration when Patrick isn't sure what he wants
Use fixed seed when iterating on a promising result to isolate changes
For FLUX.2 Klein/Heretic: increase steps to 20 for best quality
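A minimal sketch of assembling the arguments described above. The keys `name`, `count`, `seed`, and `steps` are the parameters mentioned in these notes; the `prompt` key, the `slugify` helper, and the choice of 3 exploratory images are assumptions made for illustration.

```python
import re


def slugify(text: str, max_len: int = 40) -> str:
    """Turn a short description into a filename-friendly slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def build_args(prompt: str, title: str, *, exploring: bool,
               use_heretic: bool, seed: int = -1) -> dict:
    """Build a generate_image argument dict (the "prompt" key is an assumed name)."""
    return {
        "prompt": prompt,
        "name": slugify(title),            # descriptive slug for output files
        "count": 3 if exploring else 1,    # 2..4 images while exploring
        "seed": seed,                      # -1 = random; fix it when iterating
        "steps": 20 if use_heretic else 4,
    }
```

When a promising image comes back, calling `build_args` again with the reported seed and `exploring=False` keeps everything constant except the prompt change under test.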
Review the inline image and offer next steps
Describe what worked and what could be improved
Offer 2–3 concrete next iteration directions (prompt tweak, seed variation, model switch)
Note the saved file path for reference
First iteration / exploring concepts
Wiki/doc header images (1280x512 landscape)
Profile pictures and avatars
Non-sensitive subjects where speed matters
Batch generation of variations (fast cycle)
steps=4, any resolution in multiples of 64
~10s per image on RX 7900 XTX
Mature or artistic content that schnell refuses
Higher realism requirement (photorealistic portraits, detailed scenes)
Final output once iterations have established the right concept
steps=20, 1024x1024 or higher
~52s per image on RX 7900 XTX
Uses DreamFast Heretic Qwen3-4B encoder — abliterated, KL=0.0
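The step counts and per-image timings above make it easy to estimate how long a batch will take. The sketch below assumes batch time scales linearly with image count, which these notes do not confirm. The dictionary layout and function name are illustrative.

```python
# Step counts and approximate per-image timings quoted above (RX 7900 XTX).
MODEL_PROFILES = {
    "schnell": {"steps": 4, "seconds_per_image": 10},
    "heretic": {"steps": 20, "seconds_per_image": 52},
}


def estimate_seconds(model: str, count: int = 1) -> int:
    """Rough wall-clock estimate, assuming time scales linearly with count."""
    if not 1 <= count <= 10:
        raise ValueError("count must be between 1 and 10")
    return MODEL_PROFILES[model]["seconds_per_image"] * count
```

Under that assumption, four schnell variations cost about the same time as a single heretic image, which is why schnell suits the exploration phase.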
1024x1024
1280x512
1920x1088 (nearest 64-multiple to 1920x1080)
768x1024
1216x512
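Sizes outside this list can be snapped to the nearest multiple of 64, which is how 1920x1080 becomes 1920x1088. A small sketch of that rounding follows. The helper names are illustrative, and rounding ties upward is an assumption.

```python
def snap_to_64(value: int) -> int:
    """Round to the nearest multiple of 64 (ties round up), never below 64."""
    return max(64, ((value + 32) // 64) * 64)


def snap_size(width: int, height: int) -> tuple[int, int]:
    """Snap both dimensions of a requested size."""
    return snap_to_64(width), snap_to_64(height)


print(snap_size(1920, 1080))  # (1920, 1088)
```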
Image generated and displayed inline in chat
File path reported so Patrick can find it on disk
Seed reported so the result is reproducible
Next iteration options offered if result is not final