fix(mcp-image-gen): rewrite flux2_klein_heretic workflow with CFGGuider + correct node types

- Replace FluxDisableGuidance+BasicGuider chain with CFGGuider (cfg=5)
- CLIPLoader: add device='default', keep type='flux2'
- UNETLoader: weight_dtype='default' (not fp8_e4m3fn; avoids dimension mismatch)
- VAEDecode/SaveImage: updated node IDs (11→VAEDecode, 12→SaveImage)
- Encoder: qwen_3_4b_bfl.safetensors (7.5GB BFL-merged shards)
- Tests: update heretic model assertions for new node structure (37/37 pass)
- Add RECAP doc with root cause analysis and session history
2026-04-10 20:21:12 +02:00
parent 38d26adb1f
commit 4a99a3625a
3 changed files with 192 additions and 55 deletions
@@ -0,0 +1,104 @@
+# FLUX.2 Klein 4B + Heretic — Session Recap
+
+**Date:** 2026-04-10  
+**Status:** Code complete, live generation BLOCKED by encoder dimension mismatch  
+
+---
+
+## What We Achieved ✅
+
+### Code Infrastructure (Solid)
+- **`mcp-image-gen/src/server.py`**: generic workflow registry with model-based dispatch; `_inject_workflow_params()` recurses through any node layout
+- **`mcp-image-gen/tests/test_server.py`**: 37/37 tests passing
+- **Gitea**: pushed to main (commit `38d26ad`)
+- The architecture is right: adding a new model takes one workflow JSON file plus one registry entry
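
The recursive injection described above can be sketched as follows. This is a minimal illustration, not the actual `_inject_workflow_params()` from `server.py`, which is not shown in this commit view. The `{{name}}` placeholder syntax, the function name `inject_params`, and the sample node IDs are all assumptions.

```python
from typing import Any


def inject_params(node: Any, params: dict[str, Any]) -> Any:
    """Return a copy of `node` with "{{name}}" placeholders replaced.

    Hypothetical helper: walks dicts and lists recursively, so it does not
    care which node IDs or input names a workflow uses.
    """
    if isinstance(node, dict):
        return {key: inject_params(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [inject_params(value, params) for value in node]
    if isinstance(node, str) and node.startswith("{{") and node.endswith("}}"):
        name = node[2:-2].strip()
        # A whole-string placeholder keeps the parameter's type (an int seed stays an int).
        if name in params:
            return params[name]
    return node


# Assumed workflow fragment in ComfyUI API format (node IDs are made up).
workflow = {
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "{{prompt}}", "clip": ["10", 0]}},
    "25": {"class_type": "RandomNoise",
           "inputs": {"noise_seed": "{{seed}}"}},
}
filled = inject_params(workflow, {"prompt": "a lighthouse at dusk", "seed": 42})
```

Because the function builds new containers instead of mutating its input, the template loaded from disk can be reused for every request.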
+
+### Models Downloaded (on disk)
+| File | Location | Status |
+|------|----------|--------|
+| `flux-2-klein-4b.safetensors` | `~/ComfyUI/models/diffusion_models/` | ✅ 7.3GB |
+| `qwen_3_4b_bfl.safetensors` | `~/ComfyUI/models/text_encoders/` | ✅ merged from BFL shards |
+| `qwen_3_4b.safetensors` (z_image) | `~/ComfyUI/models/text_encoders/split_files/` | ⚠️ on disk, wrong model |
+| `Qwen3-4B-Q8_0.gguf` | `~/ComfyUI/models/text_encoders/` | ⚠️ on disk, wrong arch |
+| ComfyUI-GGUF extension | `~/ComfyUI/custom_nodes/ComfyUI-GGUF` | ✅ installed |
+
+---
+
+## What Failed and Why ❌
+
+### The Error (persistent)
+```
+mat1 and mat2 shapes cannot be multiplied (512x4096 and 7680x3072)
+```
+
+### Root Cause Analysis
+
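+**Node 13** (`SamplerCustomAdvanced`) fails, which means the conditioning tensor from the text encoder does not match the input width the diffusion model expects. In the error, `mat1` (512x4096) is the conditioning and `mat2` (7680x3072) is a weight whose input width is 7680.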
+
+| Component | Expected | Got |
+|-----------|----------|-----|
+| FLUX.2 Klein 4B conditioning input | **7680-dim** (2560 × 3) | **4096-dim** |
+
+**Why 7680 = 2560 × 3?**  
+FLUX.2 Klein's conditioning appears to concatenate hidden states taken from three layers of the text encoder, not embeddings across time steps. The BFL Qwen3 encoder has `hidden_size=2560`, so the concatenated width is 2560 × 3 = 7680.
+
+**Why 4096?**  
+The other Qwen3 files we tried (z_image_turbo, the official Qwen repo GGUF) came out of the loader as 4096-dim conditioning. They are built for Z-Image and plain text generation respectively, NOT for FLUX.2 Klein.
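
The shape arithmetic above can be checked without loading any model. The sketch below is plain Python and only mirrors the matrix-multiply rule (the inner dimensions must match). The 512-token length and the `7680x3072` weight shape are read off the error message; `matmul_shape` is a hypothetical helper, not a PyTorch or ComfyUI function.

```python
def matmul_shape(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Return the shape of a @ b for 2-D shapes, or raise on a mismatch."""
    if a[1] != b[0]:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({a[0]}x{a[1]} and {b[0]}x{b[1]})"
        )
    return (a[0], b[1])


HIDDEN_SIZE = 2560       # BFL Qwen3 encoder width, per the recap
N_CONCAT = 3             # three 2560-wide embeddings concatenated
TOKENS = 512             # sequence length, from the error message
WEIGHT = (7680, 3072)    # mat2 in the error message

expected_cond = (TOKENS, HIDDEN_SIZE * N_CONCAT)  # (512, 7680)
got_cond = (TOKENS, 4096)                         # what the sampler received

ok_shape = matmul_shape(expected_cond, WEIGHT)    # multiplies cleanly

try:
    matmul_shape(got_cond, WEIGHT)
    error_text = ""
except ValueError as exc:
    error_text = str(exc)                         # same text as the logged error
```

Printing the conditioning shape just before the sampler node is the quickest way to confirm which encoder path ComfyUI actually took.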
+
+### What We Tried (and Why Each Failed)
+1. `CLIPLoader type=flux` → wrong architecture (FLUX.1 style)
+2. `CLIPLoader type=flux2` → correct node, wrong encoder file (z_image Qwen)
+3. `CLIPLoaderGGUF type=flux2` → correct node, wrong GGUF (standard Qwen3)
+4. `CLIPLoader type=flux2 + qwen_3_4b_bfl.safetensors` → merged BFL shards, but still fails
+5. Workflow: `KSampler` → doesn't work with FLUX.2 (different architecture)
+6. Workflow: `SamplerCustomAdvanced + BasicGuider + Flux2Scheduler` → correct architecture but encoding mismatch persists
+
+### The Real Missing Piece
+
+The BFL FLUX.2 Klein text encoder ships in Diffusers format, intended for the `transformers`/`diffusers` pipeline rather than ComfyUI's `CLIPLoader`. The weights are present, but ComfyUI does not appear to map `model.embed_tokens`, `model.layers.N.*`, etc. onto the CLIP interface it expects.
+
+**The correct encoder for ComfyUI** comes from `Comfy-Org/vae-text-encorder-for-flux-klein-4b`. The 7.5GB file we downloaded IS the right one, but `CLIPLoader` is likely loading it with the wrong adapter.
+
+---
+
+## Clean Approach — What We Need to Do
+
+### Option A: Use ComfyUI Web UI (Easiest)
+1. Open `http://localhost:8188` in browser
+2. Load the "Flux.2 Klein 4B Text-to-Image" workflow template (it's in the UI Templates)
+3. **Export the working API JSON** (Ctrl+Shift+E or Settings → Save as API format)
+4. Replace our `flux2_klein_heretic.json` with the exported JSON
+5. Add placeholders and test
+
+This gives us the **verified working node graph** without guessing. It should take about 10 minutes.
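
Before wiring an exported file into the registry, it may help to confirm that it really is in API format and contains the node types we expect. The sketch below assumes the usual API-format layout (a top-level mapping of node ID to `{"class_type": ..., "inputs": {...}}`), whereas a UI-format save has top-level `nodes` and `links` keys. The helper names and the sample export are hypothetical; which node types the template actually uses is unknown until it is exported.

```python
from typing import Any


def is_api_format(workflow: dict[str, Any]) -> bool:
    """True if every top-level value looks like an API-format node."""
    return bool(workflow) and all(
        isinstance(node, dict) and "class_type" in node and "inputs" in node
        for node in workflow.values()
    )


def missing_node_types(workflow: dict[str, Any], required: set[str]) -> set[str]:
    """Return the required class_type names that the workflow does not contain."""
    present = {node["class_type"] for node in workflow.values()}
    return required - present


# Made-up export used only to exercise the helpers.
exported = {
    "10": {"class_type": "CLIPLoader",
           "inputs": {"clip_name": "qwen_3_4b_bfl.safetensors", "type": "flux2"}},
    "11": {"class_type": "VAEDecode", "inputs": {}},
    "12": {"class_type": "SaveImage", "inputs": {}},
    "13": {"class_type": "SamplerCustomAdvanced", "inputs": {}},
}
ui_style = {"nodes": [], "links": []}  # what a UI-format save looks like

missing = missing_node_types(exported, {"CLIPLoader", "UNETLoader", "SaveImage"})
```

A non-empty `missing` set is a cue to re-check the export, not proof that the workflow is wrong, since templates may use different loader nodes.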
+
+### Option B: Find a Working API JSON online
+- Reddit r/comfyui has working FLUX.2 Klein workflows
+- We need the API export format, not the UI-format workflow JSON
+
+### Then: Add Heretic
+Once we have a working standard workflow:
+1. Download the actual Heretic-abliterated version of the BFL encoder (once it's published)
+2. Swap encoder filename in the JSON
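
Swapping the encoder can be done programmatically once a working JSON exists. This is a sketch under assumptions: it treats `clip_name` as the filename input on `CLIPLoader`-style nodes (verify against the exported JSON), and the Heretic filename below is a placeholder because that file has not been published yet.

```python
import copy
from typing import Any

LOADER_TYPES = {"CLIPLoader", "CLIPLoaderGGUF"}  # loader nodes named in this recap


def swap_encoder(workflow: dict[str, Any], new_name: str) -> dict[str, Any]:
    """Return a copy of the workflow with every text-encoder filename replaced."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") in LOADER_TYPES and "clip_name" in node.get("inputs", {}):
            node["inputs"]["clip_name"] = new_name
    return patched


# Made-up fragment of an API-format workflow.
base = {
    "10": {"class_type": "CLIPLoader",
           "inputs": {"clip_name": "qwen_3_4b_bfl.safetensors", "type": "flux2"}},
    "12": {"class_type": "SaveImage", "inputs": {"filename_prefix": "klein"}},
}
# Placeholder filename: the real Heretic encoder name is not known yet.
heretic = swap_encoder(base, "HYPOTHETICAL_heretic_encoder.safetensors")
```

Keeping the standard workflow untouched and deriving the Heretic variant from it means both registry entries can share one verified node graph.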
+
+---
+
+## My Recommendation
+
+**Do Option A right now.** Open `http://localhost:8188`, load the template, export to API format, paste the JSON. We'll be running in 10 minutes instead of guessing node names.
+
+The MCP server code is solid — the only broken piece is `flux2_klein_heretic.json`. Once we have the right JSON from the UI, everything else works.
+
+---
+
+## Files to Clean Up (After We Have the Right JSON)
+
+```bash
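+# Remove wrong encoders (save ~8GB). Check each path with ls first:
+# the table above lists the z_image file under text_encoders/split_files/.
+rm ~/ComfyUI/models/text_encoders/qwen_3_4b.safetensors   # z_image version
+rm ~/ComfyUI/models/text_encoders/qwen_3_4b_flux2.safetensors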
+
+# Keep
+# ~/ComfyUI/models/text_encoders/qwen_3_4b_bfl.safetensors  ← correct encoder
+# ~/ComfyUI/models/text_encoders/Qwen3-4B-Q8_0.gguf          ← maybe useful later
+```