docs: promote webscraper_search_hint in wiki and mode rules

merge: fix/webscraper/search-hint-quality → main
fix(mcp-webscraper): improve search_hint quality — quote_plus, richer hint, dedup, result_count
2026-04-05 10:11:33 +02:00 · 2026-04-05 09:57:47 +02:00 · 2026-04-05 09:57:43 +02:00 · 2026-04-05 09:53:08 +02:00 · 2026-04-05 09:53:05 +02:00 · 2026-04-05 09:48:22 +02:00
20 changed files with 1539 additions and 22 deletions
@@ -72,3 +72,10 @@ Thumbs.db

 # ── Logs ──────────────────────────────────────────────────────────────────────
 *.log
+
+# ── Wiki (separate git repo — local clone of pi_mcps.wiki.git) ────────────────
+# Edit pages in docs/wiki/pages/*.md (tracked here in pi_mcps).
+# Clone with: git clone http://pplate:TOKEN@192.168.188.119:30008/pplate/pi_mcps.wiki.git wiki/
+# Deploy with: ./docs/wiki/deploy_wiki.sh
+# Note: /wiki/ is anchored to root so docs/wiki/ (source files) is NOT ignored.
+/wiki/
@@ -20,6 +20,28 @@ Patrick is in MCP Builder mindset. He is building or extending MCP servers in th
      README.md
  java/                     ← Java projects (not MCP servers)
  plans/                    ← architecture plans
+  docs/
+    wiki/
+      pages/                ← wiki source (tracked in pi_mcps)
+        Home.md, _Sidebar.md, ...
+      deploy_wiki.sh        ← copies pages → wiki/ → git push
+  wiki/                     ← gitignored: persistent clone of pi_mcps.wiki.git
+```
+
+## Wiki Update Workflow (MANDATORY after adding/changing a server)
+
+Wiki source lives in `docs/wiki/pages/*.md` — real Markdown files, tracked in the main repo.
+
+```bash
+# 1. Edit the relevant page(s) in docs/wiki/pages/
+# 2. Deploy to Gitea wiki:
+./docs/wiki/deploy_wiki.sh "docs: describe your change"
+```
+
+First-time setup (wiki/ clone, done once):
+```bash
+TOKEN="<your-gitea-access-token>"   # placeholder; never commit a real token
+git clone http://pplate:${TOKEN}@192.168.188.119:30008/pplate/pi_mcps.wiki.git wiki/
 ```

 ## FastMCP Pattern (non-negotiable)
@@ -81,5 +103,6 @@ test = ["pytest", "pytest-mock", "pytest-cov"]
 1. **Store Fact:** `memory_store_fact("codebase", "mcp/{name} has N tools: [list]. Stack: X. Env vars: Y.")`
 2. **Wire into .roo/mcp.json:** Add the server entry with correct uv path
 3. **Update root README.md:** Add to MCPs table
-4. **Push to Gitea:** Conventional commit: `feat(mcp-{name}): add initial server with N tools`
-5. **Resolve Hypothesis:** Was the tool count and auth pattern as predicted?
+4. **Update wiki:** Create or update `docs/wiki/pages/{server-name}.md` + update `MCP-Servers-Overview.md`, then run `./docs/wiki/deploy_wiki.sh`
+5. **Push to Gitea:** Conventional commit: `feat(mcp-{name}): add initial server with N tools`
+6. **Resolve Hypothesis:** Was the tool count and auth pattern as predicted?
@@ -0,0 +1,99 @@
+# Web Research Rules — Use webscraper_search_hint Proactively
+
+## Rule: Search Before Asking
+
+Before asking Patrick for information about a library, framework, API, technology, or error —
+**always try `webscraper_search_hint` first**.
+
+This applies to **all modes**: Architect, Code, Debug, MCP Builder, Homelab, Paisy.
+
+### Why
+
+- `webscraper_search_hint` uses Brave Search — no API key, no setup, always available
+- Brave returns real results without CAPTCHA or consent walls (Google and DuckDuckGo both block scraped requests)
+- Handles special characters correctly (C++, &, %, etc. — URL-encoded automatically)
+- The `hint` field gives immediately actionable title + URL + snippet without further calls
+
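The URL-encoding point above can be sketched with the standard library's `quote_plus`, which the commit title names. This is only an illustration: the `BRAVE_SEARCH_URL` constant and the `build_search_url` helper are assumptions, not the server's actual code.

```python
from urllib.parse import quote_plus

# Assumed base URL for Brave's public search page (not confirmed by this document).
BRAVE_SEARCH_URL = "https://search.brave.com/search?q="


def build_search_url(query: str) -> str:
    """Return a search URL with the query safely URL-encoded (hypothetical helper)."""
    # quote_plus turns spaces into "+" and percent-escapes characters
    # such as "+", "&" and "%" that would otherwise corrupt the query string.
    return BRAVE_SEARCH_URL + quote_plus(query.strip())


url = build_search_url("C++ & 100% coverage")
print(url)
```

Without the encoding step, `C++` would reach the search engine as `C` followed by two spaces, and `&` would start a new query parameter.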
+---
+
+## The Two-Step Pattern
+
+```
+Step 1: webscraper_search_hint("2-4 keyword query") → structured results + hint string
+Step 2: webscraper_fetch(best_url, max_chars=8000)   → full page content
+```
+
+**Never skip Step 1.** It costs one tool call and often reveals the exact page to read.
+
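A minimal sketch of the two-step flow, assuming the two tools are reachable as plain Python callables. The `research` function and both stand-in callables are hypothetical and exist only so the example runs without an MCP client. Taking the first result is a naive choice; in practice the most relevant URL is picked by reading the hint.

```python
from typing import Callable, Optional


def research(
    query: str,
    search: Callable[[str], dict],
    fetch: Callable[..., str],
    max_chars: int = 8000,
) -> Optional[str]:
    """Search once, then fetch the top result; return None when nothing was found."""
    found = search(query)                         # Step 1: one orienting search call
    results = found.get("results", [])
    if not results:
        return None                               # nothing to read; rethink the query
    best_url = results[0]["url"]                  # naive pick: first result
    return fetch(best_url, max_chars=max_chars)   # Step 2: read the full page


# Stand-ins for webscraper_search_hint / webscraper_fetch (assumed shapes).
def fake_search(query: str) -> dict:
    return {"results": [{"title": "Docs", "url": "https://example.invalid/docs", "snippet": ""}]}


def fake_fetch(url: str, max_chars: int = 5000) -> str:
    return f"# Page at {url}"[:max_chars]


page = research("httpx timeout settings", fake_search, fake_fetch)
print(page)
```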
+### Step 1 Output
+
+The tool returns:
+- `hint` — pipe-separated entries of the form `"Title (url): snippet"`, each snippet truncated to 120 characters — read this first
+- `results[]` — array of `{title, url, snippet}` — pick the most relevant URL
+- `search_url` — the Brave search URL used (useful for debugging)
+- `result_count` — number of results returned
+
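To make the output fields concrete, here is a rough sketch of how such a payload could be assembled from raw results. The `build_hint` function, the ` | ` separator, and deduplication by URL are assumptions for illustration; the real server may differ in all three.

```python
def build_hint(results: list[dict], snippet_len: int = 120) -> dict:
    """Deduplicate results by URL and join them into a pipe-separated hint (hypothetical)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in results:
        if item["url"] in seen:       # drop repeated URLs, keep the first occurrence
            continue
        seen.add(item["url"])
        unique.append(item)
    parts = [
        f"{r['title']} ({r['url']}): {r['snippet'][:snippet_len]}"  # truncate long snippets
        for r in unique
    ]
    return {"hint": " | ".join(parts), "results": unique, "result_count": len(unique)}


sample = [
    {"title": "A", "url": "https://a.invalid", "snippet": "first"},
    {"title": "A again", "url": "https://a.invalid", "snippet": "duplicate"},
    {"title": "B", "url": "https://b.invalid", "snippet": "x" * 200},
]
out = build_hint(sample)
print(out["result_count"], out["hint"][:60])
```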
+### Step 2 Output
+
+`webscraper_fetch(url)` returns full page as Markdown. Use `max_chars` to control size
+(default 5000; use 8000–12000 for deep doc reads).
+
+---
+
+## Mode-Specific Guidance
+
+### 🏗️ Architect Mode
+- Before designing any system or feature: search for existing patterns, reference architectures, and official docs
+- Example: planning a new MCP server → `webscraper_search_hint("FastMCP server patterns 2025")`
+- Example: choosing between two libraries → search both and read their official comparison pages
+
+### 🪲 Debug Mode
+- Search the **exact error message** before forming hypotheses
+- Example: `webscraper_search_hint("sqlite3 ProgrammingError Cannot operate closed database Python")`
+- If the error is long, take the most distinctive phrase (2-5 words) as the query
+
+### 💻 Code Mode
+- Before implementing a feature using an unfamiliar API: search for the official docs first
+- Example: `webscraper_search_hint("httpx async client connection pool settings")`
+
+### 🔧 MCP Builder Mode
+- Check FastMCP changelog/docs before implementing new patterns
+- Example: `webscraper_search_hint("FastMCP tool decorator async 2025")`
+- Example: `webscraper_search_hint("FastMCP context lifespan")`
+
+### 🏠 Homelab Mode
+- Look up Docker/TrueNAS configs, package versions, service docs before asking Patrick
+- Example: `webscraper_search_hint("Gitea webhook payload format")`
+
+---
+
+## Query Crafting Tips
+
+| ✅ Good queries | ❌ Bad queries |
+|---|---|
+| `"httpx timeout settings"` | `"how do I configure httpx timeouts in Python async code"` |
+| `"FastMCP tool decorator"` | `"mcp server python tool registration method"` |
+| `"sqlite WAL mode enable"` | `"sqlite performance mode for concurrent reads"` |
+| `"Brave Search API no key"` | `"search engine that works without api key or captcha"` |
+
+- Use 2–4 keywords, not full sentences
+- Prefer library/framework name + specific feature
+- For errors: distinctive phrase from the message, not the full stack trace
+
+---
+
+## Known Limitations
+
+- **Reddit / Stack Overflow snippets** — these platforms block snippet extraction; you may get empty snippets. The URL is still valid — fetch it directly if needed.
+- **Brave CSS selector fragility** — Brave uses Svelte-generated class names that change. If `webscraper_search_hint` returns 0 results unexpectedly, the scraper's CSS selectors may need updating. Last verified working: 2026-04-05.
+- **Use sparingly** — one search call per research task to orient; then fetch specific pages. Don't call it in a loop.
+
+---
+
+## Anti-Patterns to Avoid
+
+- ❌ Asking Patrick "what's the FastMCP syntax for X?" before searching
+- ❌ Designing architecture without looking up existing solutions first
+- ❌ Forming a debug hypothesis without searching the error message
+- ❌ Writing code against an API from memory without verifying current docs
+- ❌ Calling `webscraper_search_hint` more than 2-3 times for the same topic (rethink the keywords, broader or narrower, instead of retrying near-identical queries)
@@ -9,6 +9,7 @@ description: Commits and pushes code to the homelab Gitea server using conventio
 - Finished a homelab change and need to commit + push
 - Finished an MCP server build or update
 - BigMind feature complete
+- Wiki pages were added or updated (always deploy wiki after docs changes)

 ## When NOT to use
 - ADP/Paisy work — that goes to the corporate Bitbucket, not homelab Gitea
@@ -18,12 +18,24 @@ workshop/

 ---

-## 🐍 MCP Servers (`mcp/`)
+## 📖 Wiki
+
+Full documentation lives in the [Gitea wiki](http://192.168.188.119:30008/pplate/pi_mcps/wiki).
+
+**Wiki source:** [`docs/wiki/pages/`](docs/wiki/pages/) — edit here, deploy with:
+```bash
+./docs/wiki/deploy_wiki.sh
+```
+
+---
+
+## 🐍 MCP Servers (`mcp/`)

 | Server | Description | Stack |
 |---|---|---|
 | [`mcp/bigmind/`](mcp/bigmind/) | Persistent AI memory — sessions, facts, hypotheses, profile UI | Python, FastMCP, SQLite, Flask |
-| [`mcp/webscraper/`](mcp/webscraper/) | Web scraping — fetch, links, tables, sections, sitemaps | Python, FastMCP, httpx, BeautifulSoup |
+| [`mcp/webscraper/`](mcp/webscraper/) | Web scraping, search — fetch, links, tables, Brave Search | Python, FastMCP, httpx, BeautifulSoup |
+| [`mcp/mcp-image-gen/`](mcp/mcp-image-gen/) | AI image generation — text-to-image via ComfyUI + FLUX.1-schnell | Python, FastMCP, httpx, ComfyUI |

 **Run a server:**
 ```bash
@@ -0,0 +1,90 @@
+#!/usr/bin/env bash
+# deploy_wiki.sh — Sync docs/wiki/pages/*.md to the local wiki git clone and push to Gitea
+#
+# ── Convention ────────────────────────────────────────────────────────────────
+# The Gitea wiki is a SEPARATE git repo (pi_mcps.wiki.git).
+# We keep a persistent local clone at wiki/ in the repo root.
+# That folder is gitignored so it doesn't conflict with the main repo.
+#
+# First-time setup (run once):
+#   git clone http://pplate:TOKEN@192.168.188.119:30008/pplate/pi_mcps.wiki.git wiki/
+#
+# ── Daily workflow ────────────────────────────────────────────────────────────
+# 1. Edit pages in docs/wiki/pages/*.md  (tracked in pi_mcps main repo)
+# 2. Run:  ./docs/wiki/deploy_wiki.sh
+#          ./docs/wiki/deploy_wiki.sh "docs: describe your change"
+#
+# The script copies pages into wiki/, commits, and pushes to Gitea.
+# ─────────────────────────────────────────────────────────────────────────────
+
+set -euo pipefail
+
+# ── Config ────────────────────────────────────────────────────────────────────
+GITEA_URL="http://192.168.188.119:30008"
+OWNER="pplate"
+REPO="pi_mcps"
+
+# Resolve paths relative to repo root (two levels up from docs/wiki/)
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+PAGES_DIR="${SCRIPT_DIR}/pages"
+WIKI_DIR="${REPO_ROOT}/wiki"
+COMMIT_MSG="${1:-docs: sync wiki pages $(date -u '+%Y-%m-%d %H:%M UTC')}"
+
+# ── Validate ──────────────────────────────────────────────────────────────────
+if [[ ! -d "${WIKI_DIR}/.git" ]]; then
+    echo "❌ Wiki repo not set up. Run first-time setup:"
+    echo ""
+    echo "   TOKEN=<your-gitea-access-token>"
+    echo "   git clone http://pplate:\${TOKEN}@192.168.188.119:30008/pplate/pi_mcps.wiki.git wiki/"
+    echo ""
+    exit 1
+fi
+
+if [[ ! -d "${PAGES_DIR}" ]]; then
+    echo "❌ Pages directory not found: ${PAGES_DIR}"
+    exit 1
+fi
+
+PAGE_COUNT=$(find "${PAGES_DIR}" -maxdepth 1 -name "*.md" | wc -l)
+if [[ "${PAGE_COUNT}" -eq 0 ]]; then
+    echo "❌ No .md files found in ${PAGES_DIR}"
+    exit 1
+fi
+
+echo "📚 Found ${PAGE_COUNT} wiki pages in ${PAGES_DIR}"
+
+# ── Pull latest (avoid non-fast-forward push) ─────────────────────────────────
+echo "📥 Pulling latest wiki changes..."
+git -C "${WIKI_DIR}" pull --quiet --rebase origin main
+
+# ── Copy pages ────────────────────────────────────────────────────────────────
+echo "📋 Copying pages to ${WIKI_DIR}/..."
+for md_file in "${PAGES_DIR}"/*.md; do
+    filename="$(basename "${md_file}")"
+    cp "${md_file}" "${WIKI_DIR}/${filename}"
+    echo "   → ${filename}"
+done
+
+# ── Commit and push ───────────────────────────────────────────────────────────
+cd "${WIKI_DIR}"
+
+git add -A
+
+if git diff --cached --quiet; then
+    echo "✅ No changes detected — wiki is already up to date."
+    exit 0
+fi
+
+CHANGED=$(git diff --cached --name-only | wc -l)
+echo "📝 Committing ${CHANGED} changed file(s)..."
+git commit --quiet -m "${COMMIT_MSG}"
+
+echo "🚀 Pushing to Gitea wiki..."
+git push --quiet origin main
+
+echo ""
+echo "✅ Wiki deployed successfully!"
+echo "   Pages:   ${PAGE_COUNT} total, ${CHANGED} updated"
+echo "   Message: ${COMMIT_MSG}"
+echo "   URL:     ${GITEA_URL}/${OWNER}/${REPO}/wiki"
@@ -0,0 +1,125 @@
+# 🧠 BigMind — Persistent AI Memory
+
+![BigMind Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/bigmind-banner.png)
+
+**BigMind** is the persistent memory backbone for all AI development sessions. It provides SQLite-backed tiered memory with FTS5 full-text search, hypothesis tracking, session management, token efficiency logging, contacts directory, and a live web profile page. It is the reason Lumen (Patrick's AI colleague) remembers everything across sessions.
+
+## Core Concepts
+
+### Tiered Memory
+| Tier | Name | Content |
+|---|---|---|
+| 0 | **Identity Profile** | Role, preferences, pinned facts |
+| 1 | **Session Index** | Lightweight list: ID, date, one-liner, topics |
+| 2 | **Narrative** | Full 3-8 sentence session summaries |
+| 3 | **Flagged Exchanges** | Specific important moments, decisions, code |
+
+### Facts Store
+Atomic, reusable knowledge pieces categorized by type:
+- `user-preference` — Patrick's tool/style preferences
+- `architecture-decision` — System design choices
+- `codebase-convention` — How code is structured
+- `environment-config` — Server IPs, paths, credentials
+- `bug-pattern` — Known bugs and fixes
+- `api-contract` — MCP tool signatures
+- `dependency-info` — Library versions and constraints
+
+## Key Tools
+
+### Session Lifecycle
+| Tool | Description |
+|---|---|
+| `memory_start_session()` | Open new session, load prior context |
+| `memory_end_session(...)` | Close session with summary, topics, outcome |
+| `memory_announce_focus(...)` | Declare files to be touched this session |
+| `memory_close_stale_sessions(...)` | Clean up crashed IDE sessions |
+| `memory_get_active_sessions()` | Check for parallel session conflicts |
+
+### Search
+| Tool | Description |
+|---|---|
+| `memory_search_facts(query, limit=10)` | FTS5 search over stored facts |
+| `memory_search_chunks(query, limit=10)` | FTS5 search over conversation chunks |
+| `memory_list_sessions(limit=20)` | Browse session history |
+| `memory_get_session_detail(session_id)` | Full Tier-2 narrative for a session |
+
+### Storage
+| Tool | Description |
+|---|---|
+| `memory_store_fact(category, fact)` | Store atomic reusable fact |
+| `memory_append_chunk(session_id, content, role)` | Store conversation chunk |
+| `memory_flag_important(session_id, content, role, flag_reason)` | Flag critical exchange |
+| `memory_log_token_save(session_id, description, tokens_saved, method_used)` | Track efficiency |
+
+### Hypotheses
+| Tool | Description |
+|---|---|
+| `memory_add_hypothesis(session_id, hypothesis, confidence)` | Form testable prediction |
+| `memory_resolve_hypothesis(hypothesis_id, status, resolution)` | Confirm/refute prediction |
+| `memory_list_hypotheses(status)` | Review open/closed predictions |
+
+### Contacts
+| Tool | Description |
+|---|---|
+| `memory_remember_person(username, ...)` | Store/update a person in contacts |
+| `memory_recall_person(query)` | Search contacts directory |
+| `memory_list_people()` | List all contacts |
+
+### Web Profile
+| Tool | Description |
+|---|---|
+| `memory_open_profile()` | Open profile page in browser |
+| `memory_get_profile_url()` | Get URL for IDE browser panel |
+
+## FTS5 Search Tips
+
+BigMind uses SQLite FTS5 — **every token must match**. Use 2-3 focused keywords:
+
+```
+✅  memory_search_facts("TrueNAS Docker")
+✅  memory_search_facts("mcp.json config")
+❌  memory_search_facts("homelab infrastructure TrueNAS Docker server")  → 0 results
+```
+
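The "every token must match" behaviour comes from FTS5 treating space-separated terms as an implicit AND. A small sketch with the standard `sqlite3` module shows the effect. It assumes the local SQLite build includes FTS5, and the `facts` table and `search` helper are illustrative, not BigMind's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative single-column FTS5 table (not BigMind's real schema).
conn.execute("CREATE VIRTUAL TABLE facts USING fts5(fact)")
conn.execute(
    "INSERT INTO facts (fact) VALUES (?)",
    ("TrueNAS hosts Gitea and Docker containers",),
)


def search(query: str) -> list[str]:
    """Run an FTS5 MATCH query; space-separated terms are ANDed together."""
    rows = conn.execute("SELECT fact FROM facts WHERE facts MATCH ?", (query,))
    return [row[0] for row in rows]


hits = search("TrueNAS Docker")                                 # both terms occur
misses = search("homelab infrastructure TrueNAS Docker server")  # "homelab" does not
print(len(hits), len(misses))
```

Each extra keyword is another condition the stored fact has to satisfy, which is why long natural-language queries tend to return nothing.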
+## Achievement System
+
+BigMind tracks 39 achievements (19 procedural + 20 tiered PNG badges):
+
+| Category | Tiers | Criteria |
+|---|---|---|
+| Networker | 🥉🥈🥇💎 | People added to contacts |
+| Token Sniper | 🥉🥈🥇💎 | Token savings logged |
+| Hypothesis Master | 🥉🥈🥇💎 | Confirmed hypotheses |
+| Memory Architect | 🥉🥈🥇💎 | Facts stored |
+| Session Veteran | 🥉🥈🥇💎 | Sessions completed |
+
+## Stats (2026-04-05)
+
+| Metric | Value |
+|---|---|
+| DB size | ~800KB |
+| Sessions | 100+ |
+| Facts | 100+ |
+| Schema version | v8 |
+| Tests | 297/297 ✅ |
+
+## DB Location
+
+`~/.mcp/bigmind/memory.db` — outside the repo, never committed.
+
+## Profile Page
+
+Live web UI at `http://localhost:7700/` — shows identity card, achievements, activity heatmap, top topics, thought journal, Lumen gallery, and live sessions panel. Auto-refreshes every 30 seconds.
+
+## Session Ritual
+
+Every session **must** follow this ritual:
+
+**Start (in order):**
+1. `memory_start_session()`
+2. `memory_list_hypotheses(status="open")`
+3. `memory_announce_focus(session_id, description, files, ide_hint)`
+4. `memory_close_stale_sessions(session_id)`
+
+**End:**
+1. `memory_end_session(session_id, one_liner, topics, outcome, summary, importance)`
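The ordering of the ritual can be sketched as a small wrapper. Everything here is hypothetical: the `tools` object, the `session_id` key in the start result, and the placeholder argument values are assumptions made so the sketch runs, not BigMind's documented return shapes.

```python
class RecordingTools:
    """Stand-in for an MCP client; records tool names in call order (hypothetical)."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def __getattr__(self, name: str):
        def call(*args, **kwargs):
            self.calls.append(name)
            # Assumed return shape: the start call yields a session id.
            return {"session_id": "s-1"} if name == "memory_start_session" else {}
        return call


def run_session(tools, description: str, files: list[str], work) -> None:
    """Run `work` between the start and end steps of the ritual."""
    session_id = tools.memory_start_session()["session_id"]
    tools.memory_list_hypotheses(status="open")
    tools.memory_announce_focus(session_id, description, files, "ide-hint")
    tools.memory_close_stale_sessions(session_id)
    try:
        work(session_id)
    finally:
        # Always close the session, even when the work raised an error.
        tools.memory_end_session(session_id, "one-liner", ["topic"], "outcome", "summary", 3)


tools = RecordingTools()
run_session(tools, "update wiki docs", ["docs/wiki/pages/Home.md"], lambda sid: None)
print(tools.calls)
```

The `try`/`finally` is a design choice in this sketch, so that a crashed task still ends its session rather than leaving a stale one behind.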
@@ -0,0 +1,184 @@
+# 🛠️ Development Conventions
+
+![Dev Conventions Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/dev-conventions-banner.png)
+
+All MCP servers in this repo follow a consistent set of conventions to ensure maintainability, testability, and compatibility with Roo Code tooling.
+
+## Directory Structure
+
+Each MCP server lives at `mcp/<server-name>/` with this layout:
+
+```
+mcp/<server-name>/
+├── src/
+│   ├── __init__.py
+│   └── server.py          ← FastMCP server entry point
+├── tests/
+│   ├── conftest.py        ← sys.path + shared fixtures
+│   └── test_server.py     ← pytest test suite (100% mock coverage)
+├── pyproject.toml         ← uv-managed dependencies
+├── README.md              ← server documentation
+├── PLAN.md                ← architecture plan (pre-implementation)
+└── ASSESSMENT.md          ← pre-implementation assessment
+```
+
+## FastMCP Pattern
+
+```python
+from fastmcp import FastMCP
+
+mcp = FastMCP("server-name")
+
+@mcp.tool()
+def my_tool(param: str) -> str:
+    """Tool description shown to the AI."""
+    return f"received: {param}"  # placeholder result; real tools return their own output
+
+if __name__ == "__main__":
+    mcp.run()
+```
+
+## Package Management
+
+**All projects use `uv`** — never `pip` directly:
+
+```bash
+# Create new server
+uv init mcp/my-server
+cd mcp/my-server
+uv add fastmcp httpx
+
+# Sync dependencies
+uv sync
+
+# Run server
+uv run python src/server.py
+
+# Run tests
+uv run pytest tests/ -v
+```
+
+## pyproject.toml Template
+
+```toml
+[project]
+name = "mcp-my-server"
+version = "0.1.0"
+requires-python = ">=3.11"
+dependencies = [
+    "fastmcp>=2.0.0",
+    "httpx",
+]
+
+[project.optional-dependencies]
+test = ["pytest", "pytest-mock", "pytest-cov"]
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+```
+
+## Testing Conventions
+
+- Tests live in `tests/test_server.py`
+- `conftest.py` sets `sys.path` so imports work without install
+- Use `pytest` via `uv run pytest`
+- Mock **all** external calls (HTTP, filesystem, subprocess) with `pytest-mock` or `respx`
+- `monkeypatch` for env vars and module-level state
+- Aim for 100% tool function coverage
+- All tests must pass before committing
+
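As an illustration of the "mock all external calls" rule, the sketch below replaces a network call with a mock so the test never touches the network. It uses the standard library's `unittest.mock` in place of `pytest-mock` or `respx` to stay dependency-free, and `fetch_status` is a hypothetical tool function.

```python
import urllib.request
from unittest.mock import patch


def fetch_status(url: str) -> int:
    """Hypothetical tool function: return the HTTP status code for a URL."""
    with urllib.request.urlopen(url) as resp:
        return resp.status


def test_fetch_status_is_mocked() -> None:
    # Replace urlopen for the duration of the test; no real request is made.
    with patch("urllib.request.urlopen") as fake_urlopen:
        fake_urlopen.return_value.__enter__.return_value.status = 200
        assert fetch_status("http://example.invalid/") == 200
        fake_urlopen.assert_called_once_with("http://example.invalid/")


test_fetch_status_is_mocked()
```

Under pytest the final call is unnecessary, since the `test_` function is collected automatically. With `pytest-mock`, the same patch would go through the `mocker` fixture.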
+## Branching Strategy
+
+**Never commit to main directly.**
+
+```
+Branch format: type/scope/short-description
+
+Types:   feat / fix / docs / chore / spike
+Scopes:  bigmind / webscraper / cannamanage / workshop / roo / plans / homelab / mcp / wiki
+
+Examples:
+  feat/mcp/new-gitea-server
+  fix/bigmind/achievement-card-images
+  docs/wiki/update-conventions
+  chore/roo/update-mcp-json
+```
+
+Merge to main with `--no-ff` after push to Gitea.
+
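The branch format above could be checked mechanically, for example in a pre-push hook. The following is a sketch only: `is_valid_branch` is a hypothetical helper, and it validates the shape of the scope and description rather than matching the scope against a fixed list.

```python
import re

TYPES = ("feat", "fix", "docs", "chore", "spike")

# type/scope/short-description, lowercase letters, digits, and hyphens only (assumed charset).
BRANCH_RE = re.compile(rf"^({'|'.join(TYPES)})/([a-z0-9-]+)/([a-z0-9-]+)$")


def is_valid_branch(name: str) -> bool:
    """Return True when the branch name follows type/scope/short-description."""
    return BRANCH_RE.fullmatch(name) is not None


for candidate in ("fix/bigmind/achievement-card-images", "main", "feature/x/y"):
    print(candidate, is_valid_branch(candidate))
```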
+## Commit Convention
+
+Follow **Conventional Commits** format:
+
+```
+feat(mcp-webscraper): add webscraper_search_hint tool using Brave Search
+fix(bigmind): achievement card images missing background-image CSS
+docs(wiki): add Java projects pages
+test(mcp-image-gen): add edge case tests for generate_image
+refactor(bigmind): extract profile builder to separate module
+chore(roo): update mcp.json with new server entry
+```
+
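A commit subject in this style can be split into its parts with one regular expression. This is a sketch under assumptions: `parse_commit` is a hypothetical helper, the scope is treated as optional (as Conventional Commits allows), and only lowercase types are accepted.

```python
import re
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    type: str
    scope: Optional[str]
    subject: str


COMMIT_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[a-z0-9-]+)\))?: (?P<subject>.+)$"
)


def parse_commit(line: str) -> Optional[Commit]:
    """Parse 'type(scope): subject'; return None if the line does not match."""
    match = COMMIT_RE.match(line)
    return Commit(**match.groupdict()) if match else None


parsed = parse_commit("fix(bigmind): achievement card images missing background-image CSS")
print(parsed)
```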
+## Wiki Update Workflow
+
+Wiki pages live as real Markdown files in `docs/wiki/pages/`. To update and deploy:
+
+```bash
+# 1. Edit the .md files in docs/wiki/pages/
+# 2. Deploy to Gitea wiki git repo:
+./docs/wiki/deploy_wiki.sh
+```
+
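+The deploy script expects a persistent local clone of the wiki git repo (`pi_mcps.wiki.git`) at `wiki/`. It pulls the latest changes, copies all `.md` files from `docs/wiki/pages/`, commits, and pushes.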
+
+## Creating a New MCP Server
+
+Use the `new-mcp-server` Roo skill in MCP Builder mode for full scaffolding:
+
+```
+1. Switch to 🔧 MCP Builder mode in Roo Code
+2. Say: "Create a new MCP server for <purpose>"
+3. Roo will load the new-mcp-server skill and scaffold everything
+```
+
+## Web Research with mcp-webscraper
+
+Before asking Patrick for information about a library, framework, API, or technology — **search first**.
+
+The webscraper MCP server provides `webscraper_search_hint` (Brave Search, no API key, always available) as the entry point for all research tasks. Use the two-step pattern:
+
+```
+Step 1: webscraper_search_hint("topic or error message") → get candidate URLs
+Step 2: webscraper_fetch(best_url)                       → read the full page
+```
+
+### When to search
+
+| Situation | Action |
+|---|---|
+| Need docs for a library or framework | `webscraper_search_hint("library-name official docs")` |
+| Investigating an error or stack trace | `webscraper_search_hint("exact error message language")` |
+| Planning a feature — need design patterns | `webscraper_search_hint("pattern-name best practices")` |
+| Checking latest version / changelog | `webscraper_search_hint("library-name changelog release")` |
+| Looking up API contracts | `webscraper_fetch(official_docs_url)` directly |
+
+### Especially useful in
+
+- **🏗️ Architect mode** — look up patterns and docs *before* designing. Don't design blind.
+- **🪲 Debug mode** — search the exact error message before forming hypotheses.
+- **🔧 MCP Builder mode** — check FastMCP changelog for new patterns before implementing.
+
+### Known caveats
+
+- Reddit and Stack Overflow may return empty snippets (platform blocks)
+- Brave uses Svelte CSS classes that can change — if `webscraper_search_hint` returns 0 results, selectors may need updating (last verified: 2026-04-05)
+
+## Gitea Repository
+
+Code is hosted at: `http://192.168.188.119:30008/pplate/pi_mcps`
+
+Push with the `gitea-push` Roo skill to ensure conventional commit format and correct branch workflow.
@@ -0,0 +1,56 @@
+# 🔧 pi_mcps — Patrick's Homelab Monorepo
+
+![Home Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/home-banner.png)
+
+Welcome to **pi_mcps**, Patrick's personal homelab monorepo. This repository houses MCP (Model Context Protocol) servers, Java projects, and homelab tooling — all built and maintained on a Fedora Linux workstation with an AMD Ryzen 9 5900X + RX 7900 XTX.
+
+## What's in this repo?
+
+| Directory | Contents |
+|---|---|
+| [`mcp/mcp-image-gen/`](../src/branch/main/mcp/mcp-image-gen) | 🎨 AI image generation via ComfyUI + FLUX.1-schnell |
+| [`mcp/webscraper/`](../src/branch/main/mcp/webscraper) | 🕸️ Web scraping and data extraction |
+| [`mcp/bigmind/`](../src/branch/main/mcp/bigmind) | 🧠 Persistent AI memory system |
+| [`java/`](../src/branch/main/java) | ☕ Java EE / Spring projects |
+| [`plans/`](../src/branch/main/plans) | 📋 Architecture decisions and health reports |
+
+## Stack
+
+- **Language:** Python 3.11+ (MCP servers), Java 8–17 (legacy projects)
+- **MCP Framework:** FastMCP 2.x
+- **Package Manager:** `uv` (all Python projects)
+- **Testing:** `pytest`
+- **GPU:** AMD RX 7900 XTX (ROCm / HSA)
+- **Server:** TrueNAS.local at `192.168.188.119` (Gitea, Docker)
+
+## MCP Servers
+
+Three production-ready MCP servers power Patrick's AI development environment:
+
+| Server | Status | Description |
+|---|---|---|
+| [mcp-image-gen](mcp-image-gen) | ✅ Live | Generate images from text prompts via ComfyUI |
+| [mcp-webscraper](mcp-webscraper) | ✅ Live | Scrape web pages, search hints, extract tables |
+| [BigMind](BigMind) | ✅ Live | Persistent AI memory across all sessions |
+
+## Java Projects
+
+Legacy Java EE web applications used for learning and reference:
+
+| Project | Stack | Description |
+|---|---|---|
+| [wellmann-shop](Java-wellmann-shop) | Java 8, PrimeFaces 6.2, EclipseLink, MySQL | JSF e-commerce storefront |
+| [mss-failsafe](Java-mss-failsafe) | Java 11, PrimeFaces 10, Soteria | Multi-module enterprise web app |
+
+## Wiki Sections
+
+- 🔌 [MCP Servers Overview](MCP-Servers-Overview)
+- 🎨 [mcp-image-gen](mcp-image-gen) — Image generation
+- 🕸️ [mcp-webscraper](mcp-webscraper) — Web scraping
+- 🧠 [BigMind](BigMind) — AI memory system
+- ☕ [Java Projects Overview](Java-Projects)
+- 🛠️ [Development Conventions](Development-Conventions)
+
+---
+
+*Built and maintained by Patrick Plate (pplate) · Homelab: TrueNAS.local · AI Colleague: Lumen*
@@ -0,0 +1,164 @@
+# 📐 Java Architecture Patterns
+
+![Java Architecture Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/java-architecture-banner.png)
+
+This page documents the shared architectural patterns used across all Java projects in this monorepo. These patterns also align with Patrick's professional work on the ADP Germany Paisy payroll system.
+
+## JSF MVC Pattern
+
+All projects use JavaServer Faces (JSF) with the MVC pattern:
+
+```
+Browser (HTTP) → FacesServlet → XHTML View (Facelets)
+                                      │
+                                      ▼
+                              CDI Backing Bean (@Named)
+                                      │
+                                      ▼
+                              Service Layer (EJB / CDI)
+                                      │
+                                      ▼
+                              JPA Repository / EntityManager
+                                      │
+                                      ▼
+                              Database (MySQL / H2)
+```
+
+## JPA Entity Mapping
+
+Standard JPA annotation patterns used across projects:
+
+```java
+@Entity
+@Table(name = "users")
+public class User implements Serializable {
+    
+    @Id
+    @GeneratedValue(strategy = GenerationType.IDENTITY)
+    private Long id;
+    
+    @Column(name = "username", nullable = false, unique = true)
+    private String username;
+    
+    @OneToMany(mappedBy = "user", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
+    private List<Order> orders = new ArrayList<>();
+    
+    // getters/setters
+}
+```
+
+## Backing Bean Pattern
+
+CDI backing beans power the JSF views:
+
+```java
+@Named
+@ViewScoped  // or @SessionScoped / @RequestScoped
+public class UserBean implements Serializable {
+
+    @Inject
+    private UserService userService;
+    
+    private String username;   // login form input
+    private String password;   // login form input
+    private User currentUser;
+    
+    public String login() {
+        currentUser = userService.authenticate(username, password);
+        return currentUser != null ? "/user/welcome?faces-redirect=true" : null;
+    }
+    
+    // getters/setters
+}
+```
+
+## Security Layers
+
+### Legacy: JAAS (wellmann-shop)
+
+```xml
+<!-- web.xml -->
+<security-constraint>
+    <web-resource-collection>
+        <web-resource-name>Admin Pages</web-resource-name>
+        <url-pattern>/admin/*</url-pattern>
+    </web-resource-collection>
+    <auth-constraint>
+        <role-name>admin</role-name>
+    </auth-constraint>
+</security-constraint>
+```
+
+### Modern: Soteria / Jakarta Security (mss-failsafe)
+
+```java
+@ApplicationScoped
+public class ApplicationSecurityConfig implements HttpAuthenticationMechanism {
+    // Soteria CDI-based authentication
+}
+```
+
+## Maven Multi-Module Pattern (mss-failsafe)
+
+```xml
+<!-- Parent pom.xml -->
+<modules>
+    <module>mssfailsafe.datalayer</module>
+    <module>userdata</module>
+    <module>userManagement</module>
+</modules>
+
+<!-- Dependency ordering: datalayer → userdata → userManagement -->
+```
+
+## XHTML Facelets Templating
+
+```xml
+<!-- Template: resources/layout/template.xhtml -->
+<h:body>
+    <ui:insert name="content">Default Content</ui:insert>
+</h:body>
+
+<!-- Page using template -->
+<ui:composition template="/resources/layout/template.xhtml">
+    <ui:define name="content">
+        <p:dataTable var="item" value="#{bean.items}">
+            <p:column headerText="Name">#{item.name}</p:column>
+        </p:dataTable>
+    </ui:define>
+</ui:composition>
+```
+
+## Deployment Descriptor Pattern
+
+All projects target JBoss/WildFly with consistent descriptor files:
+
+| File | Purpose |
+|---|---|
+| `WEB-INF/web.xml` | Servlet config, security constraints, welcome files |
+| `WEB-INF/jboss-web.xml` | Context root, security domain mapping |
+| `WEB-INF/jboss-app.xml` | JBoss application descriptor |
+| `META-INF/persistence.xml` | JPA datasource JNDI reference |
+
+## persistence.xml Pattern
+
+```xml
+<persistence-unit name="mss-failsafe-PU" transaction-type="JTA">
+    <jta-data-source>java:jboss/datasources/MySQLDS</jta-data-source>
+    <properties>
+        <property name="eclipselink.ddl-generation" value="create-tables"/>
+        <property name="eclipselink.logging.level" value="FINE"/>
+    </properties>
+</persistence-unit>
+```
+
+## Patrick's Java Specializations
+
+Based on professional and homelab experience:
+
+| Domain | Depth | Notes |
+|---|---|---|
+| JPA / EclipseLink | ⭐⭐⭐⭐⭐ | Authored custom annotation parsers |
+| JSF / PrimeFaces | ⭐⭐⭐⭐⭐ | Built wellmann-shop solo |
+| JAXB | ⭐⭐⭐⭐ | XML binding for payroll formats |
+| Maven | ⭐⭐⭐⭐ | Multi-module, plugins |
+| Jakarta EE | ⭐⭐⭐⭐ | CDI, Security, JTA |
+| Spring Boot | ⭐⭐⭐ | CannaManage SaaS target stack |
@@ -0,0 +1,43 @@
+# ☕ Java Projects Overview
+
+![Java Overview Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/java-overview-banner.png)
+
+The `java/` directory contains Patrick's legacy Java EE web applications. These are fully functional projects used for reference, learning, and portfolio purposes. They predate the MCP server work and showcase deep expertise in the Java EE ecosystem.
+
+## Projects
+
+| Project | Java | Framework | DB / Persistence | Description |
+|---|---|---|---|---|
+| [wellmann-shop](Java-wellmann-shop) | 8 | PrimeFaces 6.2 + JSF 2.x | MySQL + EclipseLink | E-commerce storefront |
+| [mss-failsafe](Java-mss-failsafe) | 11 | PrimeFaces 10 + Soteria | JPA multi-module | Enterprise web application |
+
+## Common Stack
+
+All Java projects use:
+
+- **Maven** — build and dependency management
+- **Jakarta EE / Java EE** — enterprise APIs (JPA, CDI, JSF, Security)
+- **PrimeFaces** — JSF component library (rich UI widgets)
+- **JBoss/WildFly** — application server target (jboss-web.xml, jboss-app.xml)
+- **EclipseLink or Hibernate** — JPA persistence provider
+- **XHTML** — Facelets templating for JSF views
+
+## Patrick's Java Expertise
+
+Patrick has expert-level Java experience:
+
+- **JPA/EclipseLink** — deep knowledge, authored custom annotation-style flatfile parsers
+- **JAXB** — XML binding for payroll data formats
+- **PrimeFaces JSF** — built wellmann-shop from scratch without AI assistance
+- **Maven** — multi-module project management
+- **Jakarta EE** — CDI, Security (Soteria), JTA
+
+> 📝 Patrick works professionally with Java at ADP Germany (Paisy payroll monorepo with euBP/EAU processing). The homelab Java projects demonstrate similar patterns in a learning/portfolio context.
+
+## Architecture Patterns
+
+See [Java Architecture](Java-Architecture) for shared patterns across both projects:
+- JSF + MVC with backing beans
+- JPA entity mapping
+- Security with JAAS/Soteria
+- XHTML Facelets templating
@@ -0,0 +1,94 @@
+# 🏢 mss-failsafe — Multi-Module Enterprise Application
+
+![MSS Failsafe Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/mss-failsafe-banner.png)
+
+**mss-failsafe** is a multi-module Java EE enterprise web application demonstrating advanced patterns: modular Maven builds, Jakarta Security (Soteria), and multi-layer JPA architecture.
+
+## Tech Stack
+
+| Component | Technology |
+|---|---|
+| **Language** | Java 11 |
+| **Web Framework** | JSF 2.3 (Facelets/XHTML) |
+| **UI Components** | PrimeFaces 10 |
+| **Persistence** | JPA (multi-module) |
+| **Security** | Jakarta Security / Soteria |
+| **Build** | Maven multi-module |
+| **App Server** | WildFly/JBoss |
+
+## Module Structure
+
+```
+java/mss-failsafe/
+├── pom.xml                    ← Parent POM (multi-module)
+├── mssfailsafe.datalayer/     ← JPA entities + persistence
+│   ├── pom.xml
+│   └── src/main/resources/META-INF/persistence.xml
+├── userdata/                  ← User data model module
+│   └── pom.xml
+└── userManagement/            ← Web UI module (JSF/PrimeFaces)
+    ├── pom.xml
+    ├── nb-configuration.xml   ← NetBeans config
+    └── src/main/webapp/
+        ├── index.xhtml        ← Landing page
+        ├── error.xhtml        ← Error handling page
+        ├── admin/
+        │   └── welcome.xhtml  ← Admin dashboard
+        ├── user/
+        │   └── welcome.xhtml  ← User welcome page
+        └── WEB-INF/
+            ├── web.xml
+            ├── jboss-web.xml
+            └── jboss-app.xml
+```
+
+## Architecture Layers
+
+```
+userManagement (Web/UI layer)
+      │
+      ▼
+userdata (Domain model layer)
+      │
+      ▼
+mssfailsafe.datalayer (JPA persistence layer)
+      │
+      ▼
+Database (via persistence.xml datasource)
+```
+
+## Key Features
+
+- **Multi-Module Maven** — Clean separation of concerns across three modules (`mssfailsafe.datalayer`, `userdata`, `userManagement`) under a parent POM
+- **Jakarta Security (Soteria)** — Modern declarative security replacing legacy JAAS
+- **Role-Based Access** — Admin vs User role segregation (`admin/` and `user/` view paths)
+- **PrimeFaces 10** — Modern PrimeFaces with updated component API
+- **Error Handling** — Dedicated `error.xhtml` with JSF error page mapping
+
+## Security Model
+
+Soteria-based security with two roles:
+
+| Role | Path | Access |
+|---|---|---|
+| `admin` | `/admin/*` | Full admin dashboard |
+| `user` | `/user/*` | Standard user views |
+
+## Building
+
+```bash
+cd java/mss-failsafe
+mvn clean install  # builds all modules in dependency order
+# Deploy userManagement.war to WildFly
+```
+
+## Notes
+
+- Represents a more mature architecture than wellmann-shop (Java 11, PrimeFaces 10)
+- Demonstrates multi-module Maven project management
+- Soteria replaces legacy JAAS — more modern Jakarta EE security approach
+- Pattern mirrors what Patrick uses professionally in the Paisy/ADP codebase
+
+## Source
+
+[`java/mss-failsafe/`](../src/branch/main/java/mss-failsafe)
@@ -0,0 +1,71 @@
+# 🛍️ wellmann-shop — JSF E-Commerce Application
+
+![Wellmann Shop Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/wellmann-shop-banner.png)
+
+**wellmann-shop** is a Java EE JSF e-commerce storefront built entirely from scratch without AI assistance. It demonstrates Patrick's deep expertise in PrimeFaces, JPA/EclipseLink, and the full Java EE web stack.
+
+## Tech Stack
+
+| Component | Technology |
+|---|---|
+| **Language** | Java 8 |
+| **Web Framework** | JSF 2.x (Facelets/XHTML) |
+| **UI Components** | PrimeFaces 6.2 |
+| **Persistence** | JPA with EclipseLink |
+| **Database** | MySQL |
+| **Build** | Maven |
+| **App Server** | WildFly/JBoss |
+| **Security** | JAAS container-managed |
+
+## Project Structure
+
+```
+java/wellmann-shop/
+├── src/main/
+│   ├── java/
+│   │   └── httpauthenticationmechanism/
+│   │       ├── ApplicationConfig.java    ← JAX-RS app config
+│   │       └── LoginBean.java            ← CDI backing bean for auth
+│   ├── resources/
+│   │   ├── log4j.properties
+│   │   └── META-INF/persistence.xml     ← JPA datasource config
+│   └── webapp/
+│       ├── index.html / index.xhtml     ← Landing page
+│       ├── login.xhtml                  ← Authentication form
+│       ├── welcome.xhtml                ← Post-login welcome
+│       ├── welcomePrimefaces.xhtml      ← PrimeFaces demo page
+│       ├── resources/
+│       │   ├── css/                     ← Custom stylesheets
+│       │   └── images/                  ← Product images
+│       └── WEB-INF/
+│           ├── web.xml                  ← Servlet config
+│           ├── jboss-web.xml            ← Context root
+│           └── jboss-app.xml            ← JBoss app descriptor
+```
+
+## Key Features
+
+- **Authentication** — JAAS-based login with `LoginBean` CDI backing bean
+- **PrimeFaces UI** — Rich JSF components (DataTable, InputText, CommandButton, etc.)
+- **JPA Persistence** — EclipseLink ORM with MySQL via `persistence.xml`
+- **Responsive Layout** — Custom CSS with multiple breakpoint stylesheets
+- **Image Gallery** — Professional product photography
+
+## Building
+
+```bash
+cd java/wellmann-shop
+mvn clean package
+# Deploy .war to WildFly/JBoss
+```
+
+## Notes
+
+- Built as a learning/portfolio project demonstrating JSF mastery
+- Patrick built this **entirely without AI assistance** — proof of deep Java EE expertise
+- PrimeFaces 6.2 was current at time of development (Java 8 era)
+- Modern equivalent would use PrimeFaces 13+ / Jakarta EE 10 / Java 21
+
+## Source
+
+[`java/wellmann-shop/`](../src/branch/main/java/wellmann-shop)
@@ -0,0 +1,42 @@
+# 🔌 MCP Servers Overview
+
+![MCP Overview Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/mcp-overview-banner.png)
+
+This repo contains three production-grade MCP (Model Context Protocol) servers, each specialized for a different capability domain. Together they give Roo Code / Claude Desktop persistent memory, image generation, and web research.
+
+## The Three Pillars
+
+```
+Roo Code / Claude Desktop
+       │
+       ├── bigmind ──────────► ~/.mcp/bigmind/memory.db  (persistent memory)
+       ├── mcp-image-gen ────► ComfyUI @ localhost:8188   (image generation)
+       └── webscraper ───────► Internet / Intranet         (web scraping + search)
+```
+
+## Comparison Table
+
+| Feature | mcp-image-gen | webscraper | bigmind |
+|---|---|---|---|
+| **Purpose** | Generate images from text | Scrape & parse web, search | Persistent AI memory |
+| **Tools** | 4 | 8 | 20+ |
+| **Backend** | ComfyUI / FLUX.1-schnell | httpx + BeautifulSoup4 + Brave | SQLite + FTS5 |
+| **GPU required** | ✅ AMD RX 7900 XTX | ❌ | ❌ |
+| **Tests** | 19/19 ✅ | 28/28 ✅ | 297/297 ✅ |
+| **Schema version** | n/a | n/a | v8 |
+
+## Quick Links
+
+- 🎨 [mcp-image-gen](mcp-image-gen) — Image generation docs
+- 🕸️ [mcp-webscraper](mcp-webscraper) — Web scraping docs
+- 🧠 [BigMind](BigMind) — Memory system docs
+- 🛠️ [Development Conventions](Development-Conventions) — How all servers are built
+
+## Adding a New Server
+
+All servers follow the [FastMCP convention](Development-Conventions). Use the `new-mcp-server` Roo skill to scaffold:
+
+```bash
+# In Roo Code MCP Builder mode, load skill:
+# skill: new-mcp-server
+```
@@ -0,0 +1,21 @@
+## 🔧 pi_mcps Wiki
+
+### Overview
+- [🏠 Home](Home)
+- [🔌 MCP Servers](MCP-Servers-Overview)
+- [🛠️ Dev Conventions](Development-Conventions)
+
+### MCP Servers
+- [🎨 mcp-image-gen](mcp-image-gen)
+- [⚙️ ComfyUI Setup](mcp-image-gen-ComfyUI-Setup)
+- [🕸️ mcp-webscraper](mcp-webscraper)
+- [🧠 BigMind](BigMind)
+
+### Java Projects
+- [☕ Java Overview](Java-Projects)
+- [🛍️ wellmann-shop](Java-wellmann-shop)
+- [🏢 mss-failsafe](Java-mss-failsafe)
+- [📐 Java Architecture](Java-Architecture)
+
+---
+*[Gitea Repo](http://192.168.188.119:30008/pplate/pi_mcps)*
@@ -0,0 +1,112 @@
+# ⚙️ ComfyUI Setup Guide (AMD ROCm)
+
+This guide covers installing ComfyUI with FLUX.1-schnell on a Fedora Linux system with an AMD GPU.
+
+## Prerequisites
+
+- AMD GPU with ROCm support (tested: RX 7900 XTX)
+- Fedora Linux (tested: Fedora 43 / kernel 6.19)
+- Python 3.11+
+- ~15GB free disk space (model weights)
+- HuggingFace account with FLUX license accepted
+
+## Step 1: Install ComfyUI
+
+ComfyUI is **not on PyPI** — must be cloned from source:
+
+```bash
+cd ~
+git clone https://github.com/comfyanonymous/ComfyUI
+cd ComfyUI
+python -m venv .venv
+source .venv/bin/activate
+
+# Install PyTorch ROCm build (CRITICAL for AMD GPUs)
+pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
+
+# Install ComfyUI dependencies
+pip install -r requirements.txt
+```
+
+## Step 2: Download FLUX.1-schnell
+
+FLUX.1-schnell is **gated on HuggingFace** — you must:
+1. Create a HuggingFace account
+2. Accept the FLUX.1-schnell license at https://huggingface.co/black-forest-labs/FLUX.1-schnell
+3. Generate an access token at https://huggingface.co/settings/tokens
+
+```bash
+# Install huggingface_hub
+pip install huggingface_hub
+
+# Download model (requires HF token)
+huggingface-cli download black-forest-labs/FLUX.1-schnell \
+  flux1-schnell.safetensors \
+  --local-dir ~/ComfyUI/models/checkpoints \
+  --token YOUR_HF_TOKEN_HERE
+```
+
+## Step 3: Download VAE and CLIP Models
+
+FLUX.1-schnell also requires VAE and CLIP text encoders:
+
+```bash
+# VAE
+huggingface-cli download black-forest-labs/FLUX.1-schnell \
+  ae.safetensors \
+  --local-dir ~/ComfyUI/models/vae
+
+# CLIP models (T5 and CLIP-L)
+huggingface-cli download comfyanonymous/flux_text_encoders \
+  t5xxl_fp8_e4m3fn.safetensors clip_l.safetensors \
+  --local-dir ~/ComfyUI/models/clip
+```
+
+## Step 4: Start ComfyUI
+
+```bash
+cd ~/ComfyUI
+
+# AMD GPU REQUIRES this environment variable
+HSA_OVERRIDE_GFX_VERSION=11.0.0 \
+  nohup .venv/bin/python main.py --listen --port 8188 > /tmp/comfyui.log 2>&1 &
+
+echo "ComfyUI PID: $!"
+```
+
+> ⚠️ `HSA_OVERRIDE_GFX_VERSION=11.0.0` is mandatory for RX 7900 XTX on ROCm. Without it, model loading fails silently.
+
+## Step 5: Verify ComfyUI is Running
+
+```bash
+curl http://localhost:8188/system_stats
+# Should return JSON with GPU info
+```
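
The same check can be scripted, for example as a pre-flight step before starting mcp-image-gen. This is a minimal sketch, not part of the server: the helper name `comfyui_stats` and the injectable `opener` parameter are illustrative assumptions, and only the `/system_stats` endpoint comes from the guide above.

```python
import json
import urllib.error
import urllib.request


def comfyui_stats(base_url="http://localhost:8188", timeout=5.0,
                  opener=urllib.request.urlopen):
    """Return the parsed /system_stats JSON, or None if ComfyUI is unreachable.

    `opener` is injectable so the function can be exercised without a
    running ComfyUI instance (hypothetical design choice for this sketch).
    """
    try:
        with opener(f"{base_url}/system_stats", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as "not up"
        return None
```

Calling `comfyui_stats()` with no arguments targets the default port used in this guide; a return value of `None` means ComfyUI should be started first.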
+
+## Step 6: Configure mcp-image-gen
+
+```bash
+cd /home/pplate/pi_mcps/mcp/mcp-image-gen
+
+# Environment variables (set in .roo/mcp.json or shell):
+# COMFYUI_URL=http://localhost:8188
+# IMAGE_OUTPUT_DIR=~/Pictures/mcp-generated
+# COMFYUI_TIMEOUT=120
+```
+
+## Performance
+
+| GPU | Model | Resolution | Steps | Time |
+|---|---|---|---|---|
+| AMD RX 7900 XTX | FLUX.1-schnell | 1024×1024 | 4 | ~8s |
+| AMD RX 7900 XTX | FLUX.1-schnell | 1280×512 | 4 | ~7s |
+
+## Troubleshooting
+
+| Problem | Solution |
+|---|---|
+| `HTTP 401` downloading model | Accept FLUX license on HuggingFace first |
+| GPU not detected | Ensure `HSA_OVERRIDE_GFX_VERSION=11.0.0` is set |
+| `Connection refused` from mcp-image-gen | Start ComfyUI first, check port 8188 |
+| Slow generation (>60s) | ComfyUI may be running on CPU — check ROCm install |
+| Ollama image gen | As of April 2026: macOS-only, not available on Linux |
@@ -0,0 +1,89 @@
+# 🎨 mcp-image-gen — AI Image Generation
+
+![Image Gen Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/image-gen-banner.png)
+
+**mcp-image-gen** is a FastMCP server that wraps the ComfyUI REST API, enabling Roo Code and Claude Desktop to generate images directly from text prompts using FLUX.1-schnell running on an AMD RX 7900 XTX GPU.
+
+## Architecture
+
+```
+Roo Code / Claude Desktop
+  │ MCP (stdio)
+  ▼
+mcp-image-gen (FastMCP, Python 3.11+)
+  │ HTTP REST
+  ▼
+ComfyUI @ localhost:8188
+  │ ROCm / HSA_OVERRIDE_GFX_VERSION=11.0.0
+  ▼
+FLUX.1-schnell (~8s/image @ 1024×1024)
+```
+
+## Tools
+
+| Tool | Description |
+|---|---|
+| `generate_image` | Generate PNG from text prompt; returns file path + inline base64 |
+| `list_available_models` | List ComfyUI checkpoint models |
+| `get_generation_status` | Check status of a queued/running job |
+| `get_output_directory` | Return configured output directory path |
+
+## Key Parameters — `generate_image`
+
+| Parameter | Default | Description |
+|---|---|---|
+| `prompt` | required | Text description of the image |
+| `width` | `1024` | Image width in pixels |
+| `height` | `1024` | Image height in pixels |
+| `steps` | `4` | Inference steps (FLUX.1-schnell is 4-step) |
+| `model` | `flux1-schnell.safetensors` | Model checkpoint name |
+| `seed` | `-1` (random) | Generation seed for reproducibility |
+| `negative_prompt` | `""` | Things to avoid in the image |
+| `output_dir` | `~/Pictures/mcp-generated` | Where to save output PNG |
+
+## Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `COMFYUI_URL` | `http://localhost:8188` | ComfyUI API endpoint |
+| `IMAGE_OUTPUT_DIR` | `~/Pictures/mcp-generated` | Default output directory |
+| `COMFYUI_TIMEOUT` | `120` | Request timeout in seconds |
+
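As an illustration of how the three variables and their defaults fit together, here is a small sketch of a settings loader. The names `ImageGenSettings` and `load_settings` are hypothetical and not taken from the server source; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ImageGenSettings:
    comfyui_url: str
    output_dir: Path
    timeout_s: float


def load_settings(env=None) -> ImageGenSettings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ImageGenSettings(
        # Strip a trailing slash so endpoint paths can be appended safely
        comfyui_url=env.get("COMFYUI_URL", "http://localhost:8188").rstrip("/"),
        output_dir=Path(env.get("IMAGE_OUTPUT_DIR", "~/Pictures/mcp-generated")).expanduser(),
        timeout_s=float(env.get("COMFYUI_TIMEOUT", "120")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.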
+## Return Value
+
+The tool returns **two content items**:
+1. `TextContent` — file path, seed used, elapsed time
+2. `ImageContent` — base64-encoded PNG (displays inline in Roo Code chat)
+
+> ⚠️ **Known FastMCP Bug:** Never use `fastmcp.utilities.types.Image` as return type — it breaks serialization in FastMCP 3.x. Use `mcp.types.ImageContent` directly.
+
+## Setup
+
+See [ComfyUI Setup Guide](mcp-image-gen-ComfyUI-Setup) for full installation instructions.
+
+### Quick Start
+
+```bash
+cd mcp/mcp-image-gen
+uv sync
+# Ensure ComfyUI is running at localhost:8188
+uv run python src/server.py
+```
+
+### Run Tests
+
+```bash
+cd mcp/mcp-image-gen
+uv run pytest tests/ -v
+# 19/19 tests passing
+```
+
+## Lumen Profile Images
+
+The first images generated with this server were Lumen's visual identity portraits, stored in [`mcp/mcp-image-gen/lumen_profiles/`](../src/branch/main/mcp/mcp-image-gen/lumen_profiles).
+
+17 gallery images registered in BigMind DB — viewable at `http://localhost:7700/gallery`.
+
+![Lumen Profile](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/lumen-profile.png)
+
+*Primary profile: seed `568659042` — constellation face interpretation of Lumen.*
@@ -0,0 +1,137 @@
+# 🕸️ mcp-webscraper — Web Scraping
+
+![Webscraper Banner](http://192.168.188.119:30008/pplate/pi_mcps/raw/branch/main/docs/wiki/images/webscraper-banner.png)
+
+**mcp-webscraper** is a FastMCP server providing comprehensive web scraping, data extraction, and search capabilities. It fetches pages, converts HTML to clean Markdown, extracts tables, links, CSS-selected sections, metadata, and sitemaps, and performs web searches via Brave Search.
+
+## Tools
+
+| Tool | Description |
+|---|---|
+| `webscraper_fetch(url, max_chars=5000)` | Title + full page as Markdown + metadata |
+| `webscraper_fetch_links(url, deduplicate=True)` | All `href` links found on the page |
+| `webscraper_fetch_tables(url)` | All HTML tables converted to Markdown |
+| `webscraper_fetch_all(url, max_chars=5000)` | Everything in one call (fetch + links + tables + meta) |
+| `webscraper_fetch_section(url, selector)` | Specific CSS selector section only |
+| `webscraper_fetch_meta(url)` | Title, description, Open Graph tags |
+| `webscraper_fetch_sitemap(url, max_urls=100)` | Parse sitemap.xml, return URL list |
+| `webscraper_search_hint(query, max_results=5)` | Brave Search — top URLs + snippets for a query |
+
+## Stack
+
+- **HTTP client:** `httpx` (async, with SSL support, Chrome/Linux User-Agent)
+- **HTML parser:** `BeautifulSoup4` + `lxml`
+- **Markdown converter:** `html2text`
+- **Search backend:** Brave Search (`search.brave.com`) — works without CAPTCHA
+- **SSL:** Custom cert bundle for Fedora 43 compatibility
+
+---
+
+## 🔍 Search: The Two-Step Research Pattern
+
+`webscraper_search_hint` is the **entry point for all web research**. The recommended workflow is:
+
+```
+Step 1: webscraper_search_hint("your query") → get candidate URLs + snippets
+Step 2: webscraper_fetch(best_url)           → get full page content
+```
+
+This avoids scraping irrelevant pages and gives you an overview before committing to a deep read.
+
+### Why Brave Search?
+
+`webscraper_search_hint` uses Brave Search (`search.brave.com`) because:
+- ✅ Returns real results without CAPTCHA or consent walls
+- ✅ No API key required — works with plain HTTP GET
+- ✅ Handles special characters (C++, &, %, etc.) via URL encoding
+- ❌ Google blocks plain HTTP with 302 consent redirect
+- ❌ DuckDuckGo blocks with CAPTCHA
+
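To make the URL-encoding point concrete, the following sketch compares a plain space replacement with `urllib.parse.quote_plus`, which is what the search URL is built with. The query string is only an example.

```python
from urllib.parse import quote_plus

query = "C++ tutorial & guide 50%"

# Naive approach: only spaces are replaced, so '+', '&' and '%' reach the URL raw
naive = query.replace(" ", "+")

# quote_plus percent-encodes reserved characters and turns spaces into '+'
encoded = quote_plus(query)

search_url = f"https://search.brave.com/search?q={encoded}&source=web"

print(naive)    # C+++tutorial+&+guide+50%
print(encoded)  # C%2B%2B+tutorial+%26+guide+50%25
```

With the naive form, the raw `&` would end the `q` parameter early and the literal `+` signs would be read as spaces, so the search engine would see a different query.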
+### Return Value
+
+The tool returns a structured dict:
+
+```json
+{
+  "query": "FastMCP tool decorator",
+  "search_url": "https://search.brave.com/search?q=FastMCP+tool+decorator&source=web",
+  "result_count": 5,
+  "hint": "FastMCP Docs (https://docs.fastmcp.dev): The @mcp.tool() decorator registers a function as... | PyPI FastMCP (https://pypi.org/project/fastmcp/): FastMCP 2.x — modern MCP server framework... | ...",
+  "results": [
+    {
+      "title": "FastMCP Docs",
+      "url": "https://docs.fastmcp.dev",
+      "snippet": "The @mcp.tool() decorator registers a function as an MCP tool..."
+    },
+    ...
+  ]
+}
+```
+
+The `hint` field is a pipe-separated string of `"Title (url): snippet[:120]"` entries — immediately actionable for deciding which URL to fetch next.
+
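The hint format can be reproduced in a few lines. This is a standalone sketch that mirrors the format described above; the function name `build_hint` and the `snippet_chars` parameter are illustrative and do not exist in the server.

```python
from typing import Dict, List


def build_hint(results: List[Dict[str, str]], snippet_chars: int = 120) -> str:
    """Join results as 'Title (url): snippet' entries separated by ' | '."""
    if not results:
        return "No results found"
    parts = []
    for r in results:
        part = f"{r['title']} ({r['url']})"
        if r["snippet"]:
            # Snippets are truncated so the hint stays short
            part += f": {r['snippet'][:snippet_chars]}"
        parts.append(part)
    return " | ".join(parts)
```

Results without a snippet still appear in the hint, just without the trailing `: snippet` part.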
+### Example: Two-Step Research Flow
+
+```python
+# Step 1: Orient — what pages exist about this topic?
+result = webscraper_search_hint("httpx async client timeout settings", max_results=5)
+# hint: "HTTPX Docs (https://www.python-httpx.org/...): Configure timeout... | ..."
+
+# Step 2: Deep-dive the most relevant result
+content = webscraper_fetch("https://www.python-httpx.org/advanced/timeouts/", max_chars=8000)
+```
+
+### Known Limitations
+
+- **Reddit / Stack Overflow snippets** may be empty — these platforms block snippet extraction
+- **Brave CSS selectors** use Svelte-generated class names that may change. If you get 0 results, the scraper's selectors may need updating (last verified: 2026-04-05)
+- **Use sparingly** — once per research task to get oriented, not for every query
+
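Given these limitations, a caller may want to handle empty results and empty snippets explicitly before step 2. The helper below is a hypothetical sketch (`pick_url` is not part of the server); it relies only on the `result_count` and `results` fields shown in the return value above.

```python
from typing import Dict, Optional


def pick_url(search_result: Dict) -> Optional[str]:
    """Choose a URL to fetch from a webscraper_search_hint result dict."""
    if search_result.get("result_count", 0) == 0:
        # A genuine miss, a network error, or stale Brave selectors
        return None
    results = search_result["results"]
    # Prefer the first result with a snippet; some sites yield empty ones
    for r in results:
        if r.get("snippet"):
            return r["url"]
    # Fall back to the top-ranked result
    return results[0]["url"]
```

A `None` return is a cue to inspect `search_url` manually rather than to retry the same query.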
+---
+
+## SSL Note — Fedora 43 Comodo Root CA
+
+Fedora 43 is missing the **Comodo AAA Services Root CA** needed for Cloudflare-protected sites. The fix is bundled at [`mcp/webscraper/certs/comodo-aaa-services-root.pem`](../src/branch/main/mcp/webscraper/certs/).
+
+The server automatically uses this cert bundle — no manual configuration needed.
+
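For readers who want to apply the same fix in their own scripts, one way to add an extra root CA on top of the system trust store is sketched below. This is an assumption about the general approach, not the server's actual code, which is not shown here; `build_ssl_context` is a hypothetical name.

```python
import ssl
from pathlib import Path
from typing import Optional


def build_ssl_context(extra_ca: Optional[Path] = None) -> ssl.SSLContext:
    """Start from the system trust store and add one extra root CA if present."""
    ctx = ssl.create_default_context()
    if extra_ca is not None and extra_ca.is_file():
        # Adds the PEM file's certificates to the set of trusted roots
        ctx.load_verify_locations(cafile=str(extra_ca))
    return ctx
```

The resulting context can be passed to `httpx` through its `verify` argument, for example `httpx.get(url, verify=ctx)`.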
+## Quick Start
+
+```bash
+cd mcp/webscraper
+uv sync
+uv run python src/server.py
+```
+
+## Run Tests
+
+```bash
+cd mcp/webscraper
+uv run pytest tests/ -v
+# 28/28 tests passing
+```
+
+## Usage Examples
+
+```python
+# Step 1: Search — get candidate URLs for a topic
+webscraper_search_hint("FastMCP tool decorator syntax", max_results=5)
+
+# Step 2: Deep-dive the most relevant URL
+webscraper_fetch("https://docs.fastmcp.dev", max_chars=10000)
+
+# Extract all links from Gitea repo
+webscraper_fetch_links("http://192.168.188.119:30008/pplate/pi_mcps")
+
+# Get all tables from a documentation page
+webscraper_fetch_tables("https://pypi.org/project/fastmcp/")
+
+# Get Open Graph metadata
+webscraper_fetch_meta("https://github.com/comfyanonymous/ComfyUI")
+
+# Fetch specific section by CSS selector
+webscraper_fetch_section("https://docs.python.org", "#content")
+
+# Search with special characters (C++, &, % all work)
+webscraper_search_hint("C++ std::optional usage", max_results=3)
+```
@@ -3,7 +3,7 @@
 import httpx
 from bs4 import BeautifulSoup
 from html2text import html2text
-from urllib.parse import urljoin
+from urllib.parse import urljoin, quote_plus
 from typing import List, Dict, Tuple
 import re
 import ssl
@@ -275,15 +275,21 @@ def webscraper_search_hint(query: str, max_results: int = 5) -> Dict:
        max_results: Maximum number of results to return (default: 5)

    Returns:
-        Dict with 'query', 'results' (list of {title, url, snippet}), 'hint'
+        Dict with 'query', 'search_url', 'results' (list of {title, url, snippet}),
+        'result_count', 'hint'
    """
+    search_url = f"https://search.brave.com/search?q={quote_plus(query)}&source=web"
    try:
-        search_url = f"https://search.brave.com/search?q={query.replace(' ', '+')}&source=web"
        _, soup = _fetch_page(search_url)

        results = []
-        # Brave Search result cards: each <a> with class snippet contains title + description
-        for card in soup.select('.snippet')[:max_results]:
+        seen_urls: set = set()
+
+        # Brave Search result cards: each div.snippet contains title, URL, description
+        for card in soup.select('.snippet'):
+            if len(results) >= max_results:
+                break
+
            title_el = card.select_one('.snippet-title')
            url_el = card.select_one('a')
            desc_el = card.select_one('.snippet-description')
@@ -292,20 +298,48 @@ def webscraper_search_hint(query: str, max_results: int = 5) -> Dict:
            url = url_el['href'] if url_el and url_el.get('href') else ""
            snippet = desc_el.get_text(strip=True) if desc_el else ""

-            if url and url.startswith('http'):
-                results.append({"title": title, "url": url, "snippet": snippet})
+            # Filter: must have a valid http(s) URL
+            if not url or not url.startswith('http'):
+                continue

-        hint = "; ".join(
-            f"{r['title']}: {r['url']}" for r in results
-        ) if results else "No results found"
+            # Filter: skip results with no useful content at all
+            if not title and not snippet:
+                continue
+
+            # Deduplicate by URL
+            if url in seen_urls:
+                continue
+            seen_urls.add(url)
+
+            results.append({"title": title, "url": url, "snippet": snippet})
+
+        # Richer hint: title + url + first 120 chars of snippet for AI context
+        if results:
+            hint_parts = []
+            for r in results:
+                part = f"{r['title']} ({r['url']})"
+                if r['snippet']:
+                    part += f": {r['snippet'][:120]}"
+                hint_parts.append(part)
+            hint = " | ".join(hint_parts)
+        else:
+            hint = "No results found"

        return {
            "query": query,
+            "search_url": search_url,
            "results": results,
+            "result_count": len(results),
            "hint": hint,
        }
    except (httpx.RequestError, httpx.HTTPStatusError) as e:
-        return {"query": query, "results": [], "hint": f"Error: {str(e)}"}
+        return {
+            "query": query,
+            "search_url": search_url,
+            "results": [],
+            "result_count": 0,
+            "hint": f"Error: {str(e)}",
+        }


 if __name__ == "__main__":
@@ -234,18 +234,92 @@ def mock_brave_response():
    return mock_resp


+@pytest.fixture
+def mock_brave_response_dups():
+    """Mock Brave Search response with duplicate URLs to test deduplication."""
+    mock_resp = MagicMock()
+    mock_resp.status_code = 200
+    mock_resp.text = """
+    <html><body>
+        <div class="snippet">
+            <a href="https://example.com/dup">Dup Result A</a>
+            <div class="snippet-title">Dup Result A</div>
+            <div class="snippet-description">First occurrence.</div>
+        </div>
+        <div class="snippet">
+            <a href="https://example.com/dup">Dup Result B</a>
+            <div class="snippet-title">Dup Result B</div>
+            <div class="snippet-description">Second occurrence — same URL.</div>
+        </div>
+        <div class="snippet">
+            <a href="https://example.com/unique">Unique Result</a>
+            <div class="snippet-title">Unique Result</div>
+            <div class="snippet-description">Only once.</div>
+        </div>
+    </body></html>
+    """
+    mock_resp.headers = {"content-type": "text/html"}
+    return mock_resp
+
+
+@pytest.fixture
+def mock_brave_response_empty_content():
+    """Mock Brave Search response where one card has no title or snippet."""
+    mock_resp = MagicMock()
+    mock_resp.status_code = 200
+    mock_resp.text = """
+    <html><body>
+        <div class="snippet">
+            <a href="https://example.com/ghost"></a>
+            <div class="snippet-title"></div>
+            <div class="snippet-description"></div>
+        </div>
+        <div class="snippet">
+            <a href="https://example.com/real">Real Result</a>
+            <div class="snippet-title">Real Result</div>
+            <div class="snippet-description">Has content.</div>
+        </div>
+    </body></html>
+    """
+    mock_resp.headers = {"content-type": "text/html"}
+    return mock_resp
+
+
@patch('httpx.get')
 def test_webscraper_search_hint_returns_structure(mock_get, mock_brave_response):
-    """Test that search hint returns correct dict structure."""
+    """Test that search hint returns all required dict fields."""
    mock_get.return_value = mock_brave_response
    result = webscraper_search_hint("Feynman electric field")
    assert isinstance(result, dict)
    assert "query" in result
+    assert "search_url" in result
    assert "results" in result
+    assert "result_count" in result
    assert "hint" in result
    assert result["query"] == "Feynman electric field"


+@patch('httpx.get')
+def test_webscraper_search_hint_search_url_encoded(mock_get, mock_brave_response):
+    """Test that search_url uses proper URL encoding (quote_plus, not str.replace)."""
+    mock_get.return_value = mock_brave_response
+    # Query with special chars that str.replace(' ', '+') would leave unencoded
+    result = webscraper_search_hint("C++ tutorial & guide 50%")
+    search_url = result["search_url"]
+    # quote_plus encodes '+' as %2B, '&' as %26, '%' as %25
+    assert "C%2B%2B" in search_url or "c%2b%2b" in search_url.lower()
+    assert "%26" in search_url
+    assert "%25" in search_url
+
+
+@patch('httpx.get')
+def test_webscraper_search_hint_result_count(mock_get, mock_brave_response):
+    """Test that result_count matches the number of results returned."""
+    mock_get.return_value = mock_brave_response
+    result = webscraper_search_hint("Feynman electric field")
+    assert result["result_count"] == len(result["results"])
+
+
@patch('httpx.get')
 def test_webscraper_search_hint_filters_non_http(mock_get, mock_brave_response):
    """Test that javascript: URLs are excluded from results."""
@@ -262,25 +336,64 @@ def test_webscraper_search_hint_max_results(mock_get, mock_brave_response):
    mock_get.return_value = mock_brave_response
    result = webscraper_search_hint("Feynman electric field", max_results=1)
    assert len(result["results"]) <= 1
+    assert result["result_count"] <= 1
+
+
+@patch('httpx.get')
+def test_webscraper_search_hint_deduplicates_urls(mock_get, mock_brave_response_dups):
+    """Test that duplicate URLs are deduplicated — only first occurrence kept."""
+    mock_get.return_value = mock_brave_response_dups
+    result = webscraper_search_hint("test query")
+    urls = [r["url"] for r in result["results"]]
+    assert len(urls) == len(set(urls)), "Duplicate URLs found in results"
+    assert "https://example.com/dup" in urls
+    assert "https://example.com/unique" in urls
+    assert len(urls) == 2  # dup appears once, unique once
+
+
+@patch('httpx.get')
+def test_webscraper_search_hint_filters_empty_content(mock_get, mock_brave_response_empty_content):
+    """Test that cards with no title AND no snippet are excluded."""
+    mock_get.return_value = mock_brave_response_empty_content
+    result = webscraper_search_hint("test query")
+    urls = [r["url"] for r in result["results"]]
+    # The ghost card has an empty title and an empty snippet, so it must be dropped
+    assert "https://example.com/ghost" not in urls
+    # The real result must still be present
+    assert "https://example.com/real" in urls


@patch('httpx.get')
 def test_webscraper_search_hint_error(mock_get):
-    """Test error handling in search hint."""
+    """Test error handling in search hint — returns all required fields."""
    mock_get.side_effect = httpx.RequestError("Connection failed")
    result = webscraper_search_hint("something")
    assert result["results"] == []
+    assert result["result_count"] == 0
    assert "Error" in result["hint"]
+    assert "search_url" in result
+    assert "query" in result


@patch('httpx.get')
-def test_webscraper_search_hint_hint_string(mock_get, mock_brave_response):
-    """Test that hint string is non-empty when results exist."""
+def test_webscraper_search_hint_hint_includes_snippet(mock_get, mock_brave_response):
+    """Test that the hint string includes snippet content, not just title+url."""
    mock_get.return_value = mock_brave_response
    result = webscraper_search_hint("Feynman electric field")
-    # hint should summarise results
-    assert len(result["hint"]) > 0
+    # hint should contain snippet text
+    assert "electric field" in result["hint"].lower()
    assert "No results found" not in result["hint"]
+    assert len(result["hint"]) > 0


-# Total: 23 tests covering all tools and edge cases
+@patch('httpx.get')
+def test_webscraper_search_hint_hint_format(mock_get, mock_brave_response):
+    """Test that hint uses pipe-separated format with URL in parens."""
+    mock_get.return_value = mock_brave_response
+    result = webscraper_search_hint("Feynman electric field")
+    # Format: "Title (url): snippet | Title2 (url2): snippet2"
+    assert "(" in result["hint"]
+    assert ")" in result["hint"]
+
+
+# Total: 28 tests covering all tools and edge cases
Author	SHA1	Message	Date
Patrick Plate	4107b8ede2	docs: promote webscraper_search_hint in wiki and mode rules	2026-04-05 10:11:33 +02:00
Patrick Plate	4202094f01	merge: fix/webscraper/search-hint-quality → main	2026-04-05 09:57:47 +02:00
Patrick Plate	62c3b67e66	fix(mcp-webscraper): improve search_hint quality — quote_plus, richer hint, dedup, result_count - Use urllib.parse.quote_plus instead of str.replace(' ', '+') for correct URL encoding of special chars (&, %, +, #, =) - Add search_url field to return dict so caller can verify/debug the query - Add result_count field for quick summary without len(results) - Deduplicate results by URL via seen_urls set - Filter cards with both empty title AND empty snippet - Richer hint string: 'Title (url): snippet[:120]' pipe-separated - Max-results guard now breaks early (no over-fetching) - 5 new tests (23→28): URL encoding, result_count, dedup, empty filter, hint format	2026-04-05 09:57:43 +02:00
Patrick Plate	c2dd262727	chore(roo): document git-based wiki workflow in rules, skill, and README	2026-04-05 09:53:08 +02:00
Patrick Plate	9c2422d0a7	chore(roo): document git-based wiki workflow in rules, skill, and README - mcp-builder rules: add wiki/ to structure diagram, add Wiki Update Workflow section (MANDATORY), update After Building a Server checklist - gitea-push skill: add wiki deploy as a valid use case - README.md: add wiki section with deploy_wiki.sh pointer, add mcp-image-gen to MCP servers table	2026-04-05 09:53:05 +02:00
Patrick Plate	9a8403ad57	docs(wiki): migrate to git-based workflow with persistent wiki/ clone	2026-04-05 09:48:22 +02:00
Patrick Plate	dabdda167f	docs(wiki): migrate to git-based workflow with persistent wiki/ clone - Extract all wiki content from create_wiki_pages.py into docs/wiki/pages/*.md - Add docs/wiki/deploy_wiki.sh: copies pages to wiki/ repo, commits, pushes - Add /wiki/ to .gitignore (anchored — does not affect docs/wiki/) - 12 pages: Home, MCP-Servers-Overview, mcp-image-gen, ComfyUI-Setup, mcp-webscraper (8 tools incl. search_hint), BigMind (schema v8), Development-Conventions, Java-Projects, Java-wellmann-shop, Java-mss-failsafe, Java-Architecture, _Sidebar - Workflow: edit docs/wiki/pages/*.md → ./docs/wiki/deploy_wiki.sh	2026-04-05 09:48:19 +02:00
Patrick Plate	da90781cad	Merge feat/webscraper/brave-search-hint into main	2026-04-05 09:37:38 +02:00