pi_mcps/plans/heretic-flux2-klein-RECAP.md
Patrick Plate 4a99a3625a fix(mcp-image-gen): rewrite flux2_klein_heretic workflow with CFGGuider + correct node types
- Replace FluxDisableGuidance+BasicGuider chain with CFGGuider (cfg=5)
- CLIPLoader: add device='default', keep type='flux2'
- UNETLoader: weight_dtype='default' (not fp8_e4m3fn — avoids dimension mismatch)
- VAEDecode/SaveImage: updated node IDs (11→VAEDecode, 12→SaveImage)
- Encoder: qwen_3_4b_bfl.safetensors (7.5GB BFL-merged shards)
- Tests: update heretic model assertions for new node structure (37/37 pass)
- Add RECAP doc with root cause analysis and session history
2026-04-10 20:21:12 +02:00

# FLUX.2 Klein 4B + Heretic — Session Recap
**Date:** 2026-04-10
**Status:** Code complete, live generation BLOCKED by encoder dimension mismatch
---
## What We Achieved ✅
### Code Infrastructure (Solid)
- **`mcp-image-gen/src/server.py`** — Generic workflow registry with model-based dispatch, `_inject_workflow_params()` works recursively on any node layout
- **`mcp-image-gen/tests/test_server.py`** — 37/37 tests passing
- **Gitea** — pushed to main (commit `38d26ad`)
- The architecture is right: adding a new model = add 1 JSON file + 1 registry entry
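The registry-plus-injection idea can be sketched roughly as below. This is an illustration only, not the code in `server.py`: the `{{name}}` placeholder syntax, the `WORKFLOWS` dict, and the names `inject_params` and `build_workflow` are all assumptions made for the example.

```python
from typing import Any

# Hypothetical registry: model name -> workflow template.
# In the real server each template would presumably be loaded from a JSON file.
WORKFLOWS: dict[str, dict[str, Any]] = {
    "demo_model": {
        "1": {"class_type": "CLIPTextEncode", "inputs": {"text": "{{prompt}}"}},
        "2": {
            "class_type": "SaveImage",
            "inputs": {"filename_prefix": "{{prefix}}", "images": ["1", 0]},
        },
    },
}


def inject_params(node: Any, params: dict[str, Any]) -> Any:
    """Recursively replace "{{name}}" placeholder strings with values from params.

    New dicts and lists are built on the way down, so the template is not mutated.
    """
    if isinstance(node, dict):
        return {key: inject_params(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [inject_params(item, params) for item in node]
    if isinstance(node, str) and node.startswith("{{") and node.endswith("}}"):
        # Unknown placeholders are left untouched so they are easy to spot later.
        return params.get(node[2:-2], node)
    return node


def build_workflow(model: str, params: dict[str, Any]) -> dict[str, Any]:
    """Model-based dispatch: look up the template, then fill in its placeholders."""
    return inject_params(WORKFLOWS[model], params)
```

Because the walk does not care about node ids or class types, adding a model only means adding one more entry to the registry.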
### Models Downloaded (on disk)
| File | Location | Status |
|------|----------|--------|
| `flux-2-klein-4b.safetensors` | `~/ComfyUI/models/diffusion_models/` | ✅ 7.3GB |
| `qwen_3_4b_bfl.safetensors` | `~/ComfyUI/models/text_encoders/` | ✅ merged from BFL shards |
| `qwen_3_4b.safetensors` (z_image) | `~/ComfyUI/models/text_encoders/split_files/` | ✅ on disk, but wrong model (Z-Image encoder) |
| `Qwen3-4B-Q8_0.gguf` | `~/ComfyUI/models/text_encoders/` | ✅ on disk, but wrong architecture |
| ComfyUI-GGUF extension | `~/ComfyUI/custom_nodes/ComfyUI-GGUF` | ✅ installed |
---
## What Failed and Why ❌
### The Error (persistent)
```
mat1 and mat2 shapes cannot be multiplied (512x4096 and 7680x3072)
```
### Root Cause Analysis
The failure surfaces at **Node 13** (`SamplerCustomAdvanced`): the conditioning tensor produced by the text encoder is 4096 wide, but the diffusion model's input projection expects 7680.
| Component | Expected | Got |
|-----------|----------|-----|
| FLUX.2 Klein 4B conditioning input | **7680-dim** (2560 × 3) | **4096-dim** |
**Why 7680 = 2560 × 3?**
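FLUX.2 builds its conditioning by concatenating text-encoder hidden states along the feature dimension; as far as we can tell these come from three different encoder layers, not from multiple time steps. The BFL Qwen3 encoder has `hidden_size=2560`, so the concatenated output is 2560 × 3 = 7680 wide.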
**Why 4096?**
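The error shows a 4096-wide conditioning tensor. We attributed this to the other Qwen3 files we tried (z_image_turbo, official Qwen repo GGUF), which are built for Z-Image and text generation respectively, NOT for FLUX.2 Klein. One caveat: 4096 is the `hidden_size` of Qwen3-8B, while the published Qwen3-4B config uses 2560, so where the 4096 width actually comes from still needs confirming.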
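The arithmetic behind the error can be reproduced without loading any model. The sketch below only checks shapes; `matmul_shape` is a hypothetical helper, and the numbers are taken from the error message and the table above.

```python
def matmul_shape(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Return the result shape of a @ b, or raise if the inner dimensions differ."""
    if a[1] != b[0]:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({a[0]}x{a[1]} and {b[0]}x{b[1]})"
        )
    return (a[0], b[1])


HIDDEN_SIZE = 2560     # hidden_size of the BFL Qwen3 encoder, per this recap
NUM_CONCATENATED = 3   # three embeddings concatenated along the feature axis
TOKENS = 512           # sequence length seen in the error message

expected_width = HIDDEN_SIZE * NUM_CONCATENATED  # 2560 * 3 = 7680
projection = (expected_width, 3072)              # mat2 from the error message

# A 7680-wide conditioning tensor multiplies cleanly.
ok_shape = matmul_shape((TOKENS, expected_width), projection)

# A 4096-wide tensor (what the sampler actually received) does not.
try:
    matmul_shape((TOKENS, 4096), projection)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
```

Here `mismatch` ends up holding the same message ComfyUI reported, which confirms that the only disagreement is the width of the encoder output.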
### What We Tried (and Why Each Failed)
1. `CLIPLoader type=flux` → wrong architecture (FLUX.1 style)
2. `CLIPLoader type=flux2` → correct node, wrong encoder file (z_image Qwen)
3. `CLIPLoaderGGUF type=flux2` → correct node, wrong GGUF (standard Qwen3)
4. `CLIPLoader type=flux2 + qwen_3_4b_bfl.safetensors` → merged BFL shards, but still fails
5. Workflow: `KSampler` → doesn't work with FLUX.2 (different architecture)
6. Workflow: `SamplerCustomAdvanced + BasicGuider + Flux2Scheduler` → correct architecture but encoding mismatch persists
### The Real Missing Piece
The BFL FLUX.2 Klein text encoder in Diffusers format is designed for use via `transformers/diffusers` pipeline, NOT via ComfyUI's `CLIPLoader`. ComfyUI reads the weights differently. The weights are there but ComfyUI doesn't know how to map `model.embed_tokens`, `model.layers.N.*` etc. to the CLIP interface it expects.
**The correct encoder for ComfyUI** comes from the `Comfy-Org/vae-text-encorder-for-flux-klein-4b` repository. The 7.5GB file we downloaded should be the right one, but `CLIPLoader` is likely loading it with the wrong adapter.
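One way to check what a given encoder file actually contains is to read its safetensors header, which lists every tensor name and shape without loading the weights. The sketch below relies only on the documented safetensors layout (an 8-byte little-endian length followed by a JSON header); the helper names are hypothetical, and the exact key to look for (for example one containing `embed_tokens`) is an assumption that may differ per file.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with path.open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))  # little-endian u64 length prefix
        return json.loads(fh.read(header_len).decode("utf-8"))


def tensor_shapes(path: Path, needle: str) -> dict[str, list[int]]:
    """Map tensor names containing `needle` to their shapes."""
    header = read_safetensors_header(path)
    return {
        name: info["shape"]
        for name, info in header.items()
        if name != "__metadata__" and needle in name  # skip the metadata entry
    }
```

For an embedding matrix the second dimension is the hidden size, so calling `tensor_shapes(Path("~/ComfyUI/models/text_encoders/qwen_3_4b_bfl.safetensors").expanduser(), "embed_tokens")` should show whether the file is 2560 or 4096 wide, and the key names show which naming scheme the file uses.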
---
## Clean Approach — What We Need to Do
### Option A: Use ComfyUI Web UI (Easiest)
1. Open `http://localhost:8188` in browser
2. Load the "Flux.2 Klein 4B Text-to-Image" workflow template (it's in the UI Templates)
3. **Export the working API JSON** (Ctrl+Shift+E or Settings → Save as API format)
4. Replace our `flux2_klein_heretic.json` with the exported JSON
5. Add placeholders and test
This gives us the **verified working node graph** without guessing. 10 minutes.
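Before replacing `flux2_klein_heretic.json`, it is worth confirming that the export really is in API format and not the UI graph format (which has top-level `nodes` and `links` keys). A minimal sketch, with hypothetical helper names:

```python
import json
from typing import Any


def is_api_format(workflow: Any) -> bool:
    """True if every top-level entry looks like an API-format node."""
    if not isinstance(workflow, dict) or not workflow:
        return False
    return all(
        isinstance(node, dict)
        and "class_type" in node
        and isinstance(node.get("inputs"), dict)
        for node in workflow.values()
    )


def summarize_nodes(workflow: dict[str, Any]) -> dict[str, str]:
    """Map node id -> class_type, handy when updating test assertions."""
    return {node_id: node["class_type"] for node_id, node in workflow.items()}


# Inline sample standing in for the exported file.
exported = json.loads('{"13": {"class_type": "SamplerCustomAdvanced", "inputs": {}}}')
summary = summarize_nodes(exported) if is_api_format(exported) else {}
```

The node summary gives the ids and class types that the tests in `test_server.py` would need to assert against.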
### Option B: Find a Working API JSON online
- Reddit r/comfyui has working FLUX.2 Klein workflows
- Export format is what we need
### Then: Add Heretic
Once we have a working standard workflow:
1. Download the actual Heretic-abliterated version of the BFL encoder (once it's published)
2. Swap encoder filename in the JSON
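The filename swap could be scripted along these lines. This assumes the loader node has `class_type` `CLIPLoader` and stores the file in an input named `clip_name`, which should be checked against the exported JSON; the function name and the Heretic filename in the usage note are hypothetical.

```python
from typing import Any


def swap_text_encoder(workflow: dict[str, Any], new_file: str) -> list[str]:
    """Point every CLIPLoader node at new_file and return the ids of changed nodes.

    The workflow dict is modified in place; other inputs (type, device) are untouched.
    """
    changed = []
    for node_id, node in workflow.items():
        inputs = node.get("inputs", {})
        if node.get("class_type") == "CLIPLoader" and "clip_name" in inputs:
            inputs["clip_name"] = new_file
            changed.append(node_id)
    return changed
```

A call such as `swap_text_encoder(workflow, "hypothetical_heretic_encoder.safetensors")` returning an empty list would mean the graph uses a different loader node, which is worth catching before a generation run.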
---
## My Recommendation
**Do Option A right now.** Open `http://localhost:8188`, load the template, export to API format, paste the JSON. We'll be running in 10 minutes instead of guessing node names.
The MCP server code is solid — the only broken piece is `flux2_klein_heretic.json`. Once we have the right JSON from the UI, everything else works.
---
## Files to Clean Up (After We Have the Right JSON)
```bash
# Remove wrong encoders (save ~8GB)
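# z_image version (the table above lists it under split_files/, so confirm the path first)
rm ~/ComfyUI/models/text_encoders/qwen_3_4b.safetensors
# Not listed in the table above; confirm it exists before removing
rm ~/ComfyUI/models/text_encoders/qwen_3_4b_flux2.safetensors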
# Keep
# ~/ComfyUI/models/text_encoders/qwen_3_4b_bfl.safetensors ← correct encoder
# ~/ComfyUI/models/text_encoders/Qwen3-4B-Q8_0.gguf ← maybe useful later
```