Patrick Plate 4a99a3625a fix(mcp-image-gen): rewrite flux2_klein_heretic workflow with CFGGuider + correct node types

- Replace FluxDisableGuidance+BasicGuider chain with CFGGuider (cfg=5)
- CLIPLoader: add device='default', keep type='flux2'
- UNETLoader: weight_dtype='default' (not fp8_e4m3fn — avoids dimension mismatch)
- VAEDecode/SaveImage: updated node IDs (11→VAEDecode, 12→SaveImage)
- Encoder: qwen_3_4b_bfl.safetensors (7.5GB BFL-merged shards)
- Tests: update heretic model assertions for new node structure (37/37 pass)
- Add RECAP doc with root cause analysis and session history

2026-04-10 20:21:12 +02:00


FLUX.2 Klein 4B + Heretic — Session Recap

Date: 2026-04-10
Status: Code complete, live generation BLOCKED by encoder dimension mismatch

What We Achieved ✅

Code Infrastructure (Solid)

mcp-image-gen/src/server.py — Generic workflow registry with model-based dispatch, _inject_workflow_params() works recursively on any node layout
mcp-image-gen/tests/test_server.py — 37/37 tests passing
Gitea — pushed to main (commit 38d26ad)
The architecture is right: adding a new model = add 1 JSON file + 1 registry entry
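The registry-plus-injection idea can be sketched roughly as follows. This is a minimal illustration, not the code in server.py: the `{{name}}` placeholder syntax, the `WORKFLOW_REGISTRY` name, and the example node IDs are all assumptions made for the sketch.

```python
# Hypothetical registry: model name -> workflow JSON file (names are illustrative).
WORKFLOW_REGISTRY = {
    "flux2_klein_heretic": "flux2_klein_heretic.json",
}


def inject_workflow_params(node, params):
    """Return a copy of `node` with "{{key}}" strings replaced by params[key].

    Recurses through dicts and lists, so it does not care how the
    workflow graph is laid out or which node holds the placeholder.
    """
    if isinstance(node, dict):
        return {key: inject_workflow_params(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [inject_workflow_params(item, params) for item in node]
    if isinstance(node, str) and node.startswith("{{") and node.endswith("}}"):
        # Unknown placeholders are left untouched rather than raising.
        return params.get(node[2:-2].strip(), node)
    return node


# Tiny API-format fragment, invented for the example.
workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "{{prompt}}", "clip": ["4", 0]}},
    "25": {"class_type": "RandomNoise", "inputs": {"noise_seed": "{{seed}}"}},
}
filled = inject_workflow_params(workflow, {"prompt": "a red fox", "seed": 42})
```

Because the function rebuilds every container it visits, the template loaded from disk stays unchanged and can be reused for the next request.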

Models Downloaded (on disk)

File	Location	Status
`flux-2-klein-4b.safetensors`	`~/ComfyUI/models/diffusion_models/`	✅ 7.3GB
`qwen_3_4b_bfl.safetensors`	`~/ComfyUI/models/text_encoders/`	✅ merged from BFL shards
`qwen_3_4b.safetensors` (z_image)	`~/ComfyUI/models/text_encoders/split_files/`	✅ on disk, but wrong model (Z-Image encoder)
`Qwen3-4B-Q8_0.gguf`	`~/ComfyUI/models/text_encoders/`	✅ on disk, but wrong architecture (standard Qwen3)
ComfyUI-GGUF extension	`~/ComfyUI/custom_nodes/ComfyUI-GGUF`	✅ installed

What Failed and Why ❌

The Error (persistent)

mat1 and mat2 shapes cannot be multiplied (512x4096 and 7680x3072)

Root Cause Analysis

Node 13 (SamplerCustomAdvanced) fails, which means the conditioning tensor produced by the text encoder does not have the feature width the diffusion model expects as input.

Component	Expected	Got
FLUX.2 Klein 4B conditioning input	7680-dim (2560 × 3)	4096-dim

Why 7680 = 2560 × 3?
FLUX.2 Klein builds its conditioning by concatenating three hidden-state outputs of the text encoder along the feature axis. The BFL Qwen3 encoder has hidden_size=2560, so the concatenated width is 2560 × 3 = 7680.

Why 4096?
The other Qwen3 files we tried (z_image_turbo, the official Qwen repo GGUF) yield 4096-dim conditioning in this setup. They are meant for Z-Image and text generation respectively, NOT for FLUX.2 Klein.
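The arithmetic behind the error can be checked with a few lines of Python. This is only a shape check using the numbers from the error message and the table above; the variable names are made up for the illustration.

```python
def can_matmul(a_shape, b_shape):
    """A (rows x k) matrix can only be multiplied by a (k x cols) matrix."""
    return a_shape[1] == b_shape[0]


ENCODER_HIDDEN = 2560   # hidden_size of the BFL Qwen3 encoder (from the recap)
CONCAT_FACTOR = 3       # number of hidden-state outputs concatenated (from the recap)
expected_dim = ENCODER_HIDDEN * CONCAT_FACTOR

conditioning_got = (512, 4096)   # mat1 in the error message
input_projection = (7680, 3072)  # mat2 in the error message

mismatch = not can_matmul(conditioning_got, input_projection)
would_work = can_matmul((512, expected_dim), input_projection)
```

The first check fails on 4096 vs 7680, which is exactly what the error reports; a 512 × 7680 conditioning tensor would pass.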

What We Tried (and Why Each Failed)

CLIPLoader type=flux → wrong architecture (FLUX.1 style)
CLIPLoader type=flux2 → correct node, wrong encoder file (z_image Qwen)
CLIPLoaderGGUF type=flux2 → correct node, wrong GGUF (standard Qwen3)
CLIPLoader type=flux2 + qwen_3_4b_bfl.safetensors → merged BFL shards, but still fails
Workflow: KSampler → doesn't work with FLUX.2 (different architecture)
Workflow: SamplerCustomAdvanced + BasicGuider + Flux2Scheduler → correct architecture but encoding mismatch persists

The Real Missing Piece

The BFL FLUX.2 Klein text encoder in Diffusers format is designed for use via transformers/diffusers pipeline, NOT via ComfyUI's CLIPLoader. ComfyUI reads the weights differently. The weights are there but ComfyUI doesn't know how to map model.embed_tokens, model.layers.N.* etc. to the CLIP interface it expects.

The correct encoder for ComfyUI is published as Comfy-Org/vae-text-encorder-for-flux-klein-4b. The 7.5GB file we downloaded IS the right one, but ComfyUI is likely loading it with the wrong adapter in the CLIPLoader.
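One way to check what is actually inside an encoder file, without loading it into ComfyUI, is to read the safetensors header directly: the file starts with an 8-byte little-endian length followed by a JSON header listing tensor names and shapes. The sketch below uses only the standard library. The tensor name `model.embed_tokens.weight` and the tiny fake header are assumptions for demonstration; the real file may use different key names.

```python
import io
import json
import struct


def read_safetensors_header(stream):
    """Read only the JSON header of a safetensors file from a binary stream."""
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def last_dim(header, tensor_name):
    """Feature width of a tensor, i.e. the last entry of its shape."""
    return header[tensor_name]["shape"][-1]


# Fake, header-only file for the demo (shape invented; no tensor data follows).
fake_header = {
    "model.embed_tokens.weight": {"dtype": "BF16", "shape": [8, 2560], "data_offsets": [0, 40960]}
}
raw = json.dumps(fake_header).encode("utf-8")
fake_file = io.BytesIO(struct.pack("<Q", len(raw)) + raw)

header = read_safetensors_header(fake_file)
width = last_dim(header, "model.embed_tokens.weight")
```

On a real file, open it with `open(path, "rb")` and pass the handle in. Comparing the key names and the embedding width across the candidate encoder files should show which one matches the 2560-wide encoder described above.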

Clean Approach — What We Need to Do

Option A: Use ComfyUI Web UI (Easiest)

Open http://localhost:8188 in browser
Load the "Flux.2 Klein 4B Text-to-Image" workflow template (it's in the UI Templates)
Export the working API JSON (Ctrl+Shift+E or Settings → Save as API format)
Replace our flux2_klein_heretic.json with the exported JSON
Add placeholders and test

This gives us a verified, working node graph without guessing, and should take about 10 minutes.
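Before wiring the exported file into the server, a quick sanity check of the API-format JSON can help. In that format the workflow is a dict of node ID to `{"class_type": ..., "inputs": ...}`, and ComfyUI accepts it wrapped as `{"prompt": workflow}` on its `/prompt` endpoint. The helper names, the `client_id` value, and the sample nodes below are assumptions for the sketch, not the contents of the real template.

```python
import json


def missing_types(workflow, required):
    """Return the required class_type names that no node in the workflow uses."""
    present = {node["class_type"] for node in workflow.values()}
    return set(required) - present


def build_prompt_payload(workflow, client_id="mcp-image-gen"):
    """Serialize a workflow into the JSON body expected by POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


# Invented fragment standing in for the exported JSON.
exported = {
    "1": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "flux-2-klein-4b.safetensors", "weight_dtype": "default"}},
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "qwen_3_4b_bfl.safetensors", "type": "flux2", "device": "default"}},
    "13": {"class_type": "SamplerCustomAdvanced", "inputs": {}},
}

missing = missing_types(exported, {"UNETLoader", "CLIPLoader", "VAEDecode"})
payload = build_prompt_payload(exported)

# To submit for real (assumes ComfyUI is listening on localhost:8188):
# import urllib.request
# req = urllib.request.Request("http://localhost:8188/prompt", data=payload,
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```

Running `missing_types` against both the exported file and our current flux2_klein_heretic.json is a cheap way to see which node types differ.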

Option B: Find a Working API JSON online

Reddit r/comfyui has working FLUX.2 Klein workflows
The API export format is what we need

Then: Add Heretic

Once we have a working standard workflow:

Download the actual Heretic-abliterated version of the BFL encoder (once it's published)
Swap encoder filename in the JSON
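Swapping the encoder can be done programmatically instead of by hand. The sketch below patches the `clip_name` input of every text-encoder loader node in an API-format workflow. The function name and the Heretic filename are hypothetical (no such file has been published yet), and the node IDs are invented.

```python
import copy


def swap_text_encoder(workflow, new_name, loader_types=("CLIPLoader", "CLIPLoaderGGUF")):
    """Return (patched_copy, changed_node_ids) with clip_name set to new_name."""
    patched = copy.deepcopy(workflow)  # never mutate the template in place
    changed = []
    for node_id, node in patched.items():
        inputs = node.get("inputs", {})
        if node.get("class_type") in loader_types and "clip_name" in inputs:
            inputs["clip_name"] = new_name
            changed.append(node_id)
    return patched, changed


base = {
    "2": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "qwen_3_4b_bfl.safetensors", "type": "flux2", "device": "default"}},
    "11": {"class_type": "VAEDecode", "inputs": {}},
}
# Hypothetical filename for the abliterated encoder.
heretic, changed_ids = swap_text_encoder(base, "qwen_3_4b_bfl_heretic.safetensors")
```

Returning the list of changed node IDs makes it easy to fail loudly if the workflow contains no loader node at all.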

My Recommendation

Do Option A right now. Open http://localhost:8188, load the template, export to API format, paste the JSON. We'll be running in 10 minutes instead of guessing node names.

The MCP server code is solid — the only broken piece is flux2_klein_heretic.json. Once we have the right JSON from the UI, everything else works.

Files to Clean Up (After We Have the Right JSON)

# Remove wrong encoders (save ~8GB)
rm ~/ComfyUI/models/text_encoders/qwen_3_4b.safetensors   # z_image version
rm ~/ComfyUI/models/text_encoders/qwen_3_4b_flux2.safetensors

# Keep
# ~/ComfyUI/models/text_encoders/qwen_3_4b_bfl.safetensors  ← correct encoder
# ~/ComfyUI/models/text_encoders/Qwen3-4B-Q8_0.gguf          ← maybe useful later
