chore: reorganize into polyglot monorepo (workshop)

- Move bigmind/ -> mcp/bigmind/
- Move webscraper/ -> mcp/webscraper/
- Move mss-failsafe/ -> java/mss-failsafe/
- Move Wellmann-Shop/ -> java/wellmann-shop/ (normalize to kebab-case)
- Add .roo/ IDE config files to tracking
- Add plans/REPO_STRATEGY.md (monorepo strategy document)
- Expand .gitignore: Java/Maven, Node/TS, coverage, uv.lock
- Rewrite README.md as navigation index
- Update .roo/mcp.json webscraper path to mcp/webscraper/
Patrick Plate
2026-04-04 08:51:15 +02:00
parent 4167e15ed9
commit 155d56e8e8
1598 changed files with 19429 additions and 23 deletions
# Webscraper SSL Certificate Verification — Assessment
**Date:** 2026-04-03

**Status:** ✅ RESOLVED

**Severity:** High (SSL verification completely disabled via `verify=False`)

---
## 1. Problem Statement
The webscraper MCP server cannot verify SSL certificates when making HTTPS requests.
The current code uses `verify=False` in `_fetch_page()` (line 15 of `src/server.py`) as a
band-aid, which **disables all SSL verification** — leaving the scraper vulnerable to
man-in-the-middle attacks and silently accepting invalid/expired certificates.
## 2. Reproduction
```
$ uv run python -c "import httpx; httpx.get('https://example.com', timeout=10)"
httpx.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:
unable to get local issuer certificate (_ssl.c:1081)
```
Even `openssl s_client` fails:
```
depth=2 C=US, O=SSL Corporation, CN=SSL.com TLS Transit ECC CA R2
verify error:num=20:unable to get local issuer certificate
Verify return code: 20 (unable to get local issuer certificate)
```
Yet `curl https://example.com` **succeeds** (exit code 0).
## 3. Root Cause Analysis
### 3.1 Hypotheses Considered (7)
| # | Hypothesis | Verdict |
|---|-----------|---------|
| 1 | certifi bundle outdated/missing root CA | ✅ **CONFIRMED** — "AAA Certificate Services" (Comodo root) is absent from certifi 2026.02.25 |
| 2 | System PEM bundle missing root CA | ✅ **CONFIRMED** — 0 matches for "AAA Certificate Services" in `/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem` |
| 3 | Python 3.14 SSL behavior change | ❌ System Python 3.14 has same issue — not Python-version specific |
| 4 | OpenSSL 3.5.4 incompatibility | ❌ curl uses same OpenSSL and succeeds |
| 5 | Expired/revoked certificate | ❌ Certificate chain is valid (curl succeeds) |
| 6 | Missing intermediate certificates | ❌ Server sends full chain (3 certs), only root is missing from stores |
| 7 | httpx library bug | ❌ Same failure with raw `ssl.create_default_context()` |
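
Hypotheses 1 and 2 both come down to whether a given root is present in a given bundle. The following is a minimal standard-library sketch for answering that from Python. The helper names are hypothetical and not part of the webscraper code. It loads a PEM bundle into a fresh client context and searches the decoded `get_ca_certs()` entries by common name:

```python
import ssl


def common_names(cert: dict) -> list[str]:
    """Extract commonName values from an ssl-style decoded certificate dict."""
    names = []
    # 'subject' is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    return names


def find_ca(ctx: ssl.SSLContext, common_name: str) -> list[dict]:
    """Return the CA certificates loaded into ctx whose CN matches exactly."""
    # get_ca_certs() lists certificates loaded via cafile/cadata; entries that
    # live only in a capath directory are not listed until they have been used.
    return [c for c in ctx.get_ca_certs() if common_name in common_names(c)]


def bundle_has_ca(cafile: str, common_name: str) -> bool:
    """Load a PEM bundle into a fresh client context and search it by CN."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # starts with no CAs loaded
    ctx.load_verify_locations(cafile=cafile)
    return bool(find_ca(ctx, common_name))
```

For example, `bundle_has_ca(certifi.where(), "AAA Certificate Services")` would be expected to return `False` on the affected machine, matching hypothesis 1. Matching on the decoded subject does not rely on any comment lines a bundle may or may not carry above each certificate.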
### 3.2 The Actual Root Cause (2 issues)
**Issue A — PEM bundle gap:** The Cloudflare certificate chain for `example.com`
terminates at "AAA Certificate Services" (a Comodo root CA). This root CA is:
- **Missing** from `certifi` 2026.02.25 (`cacert.pem`, 272KB)
- **Missing** from Fedora's extracted PEM bundle (`/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`)
- **Present** in Fedora's p11-kit native trust store (`trust list` shows "Comodo AAA Services root")

This is why `curl` succeeds: curl on Fedora 43 uses the OpenSSL provider mechanism,
which can access p11-kit's PKCS#11 trust store directly and bypass the PEM file.
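
A default `ssl` context typically loads CA certificates from the file and directory locations compiled into OpenSSL. The `SSL_CERT_FILE` and `SSL_CERT_DIR` environment variables can override those locations. As a hypothetical diagnostic that is not part of the project, `ssl.get_default_verify_paths()` shows which locations apply on a given machine:

```python
import os
import ssl


def describe_verify_paths() -> dict:
    """Report where OpenSSL (as used by Python's ssl module) looks for CA certs."""
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved locations; None when the file/directory does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Compiled-in defaults of this OpenSSL build.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # (variable name, current value) of the environment overrides.
        "cafile_env": (paths.openssl_cafile_env, os.environ.get(paths.openssl_cafile_env)),
        "capath_env": (paths.openssl_capath_env, os.environ.get(paths.openssl_capath_env)),
    }


if __name__ == "__main__":
    for key, value in describe_verify_paths().items():
        print(f"{key}: {value}")
```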
**Issue B — `verify=False` band-aid:** Instead of fixing the certificate verification,
the current code disables it entirely with `verify=False`, which:
- Accepts expired certificates
- Accepts self-signed certificates
- Is vulnerable to MITM attacks
- Produces `InsecureRequestWarning` noise in logs
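
At the `ssl` level, turning verification off means disabling both hostname checking and chain validation. The sketch below contrasts a default client context with one configured roughly the way `verify=False` behaves. It is an approximation for illustration, not a copy of httpx's internals:

```python
import ssl


def secure_context() -> ssl.SSLContext:
    """Default client context: certificate chain and hostname are both verified."""
    return ssl.create_default_context()


def insecure_context() -> ssl.SSLContext:
    """Approximation of verify=False: accepts any certificate for any host."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can become CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With a context like `insecure_context()`, a handshake completes equally well against an expired, self-signed, or attacker-supplied certificate, which is exactly the exposure listed above. The order of the two assignments matters: setting `verify_mode` to `CERT_NONE` while `check_hostname` is still enabled raises `ValueError`.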
### 3.3 Environment Details
| Component | Version |
|-----------|---------|
| Python | 3.14.3 (Fedora system) |
| OpenSSL | 3.5.4 |
| httpx | 0.28.1 |
| certifi | 2026.02.25 |
| ca-certificates | 2025.2.80_v9.0.304-1.2.fc43 |
| OS | Fedora 43 (kernel 6.19) |
## 4. Proposed Fix
### Initial proposal: use `truststore` to access the native OS trust store
The [`truststore`](https://truststore.readthedocs.io/) library provides an `ssl.SSLContext`-compatible API
that verifies certificates against the **native OS certificate store** (Security framework on macOS,
CryptoAPI on Windows, and the system OpenSSL CA locations on Linux). This is the [official recommendation from httpx](https://www.python-httpx.org/advanced/ssl/).

**Approaches tried:**
### Approach A: truststore (REJECTED — did not work)
`truststore.SSLContext` was tested but loaded 0 certs on this Fedora 43 / OpenSSL 3.5.4 setup.
`cert_store_stats()` raises `NotImplementedError`. The PKCS#11 provider in `openssl.cnf` is
commented out. This approach was abandoned.
### Approach B: certifi + extra certs directory (IMPLEMENTED ✅)
1. **`webscraper/certs/comodo-aaa-services-root.pem`** — Missing root CA extracted from p11-kit
2. **`src/server.py`** — New `_build_ssl_context()` at module load:
```python
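import ssl
from pathlib import Path

import certifi

# src/server.py -> src/ -> webscraper/, so this resolves to webscraper/certs/
_EXTRA_CERTS_DIR = Path(__file__).resolve().parent.parent / "certs"


def _build_ssl_context() -> ssl.SSLContext:
    """Build an SSL context from certifi + extra bundled root certs."""
    # Start from certifi's CA bundle; create_default_context() enables
    # CERT_REQUIRED and hostname checking, and loads only this cafile.
    ctx = ssl.create_default_context(cafile=certifi.where())
    if _EXTRA_CERTS_DIR.is_dir():
        # Add each project-bundled root CA (e.g. the missing Comodo root).
        for pem in _EXTRA_CERTS_DIR.glob("*.pem"):
            ctx.load_verify_locations(cafile=str(pem))
    return ctx


# Built once at module load and shared by every request.
_SSL_CTX = _build_ssl_context()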
```
### Why this approach?
| Approach | Outcome |
|----------|---------|
| `verify=False` | **Previous** — disabled all security |
| `verify=certifi.where()` | certifi bundle doesn't have the Comodo root CA |
| `ssl.create_default_context()` | Uses the same broken system PEM file |
| `sudo update-ca-trust` | System-level fix, requires root, didn't fully work |
| `truststore.SSLContext` | ❌ Loaded 0 certs on this setup, NotImplementedError |
| **certifi + extra certs dir** | ✅ **Works!** Certifi base + project-bundled missing CAs |
### Benefits of this approach:
- No `verify=False` — proper SSL verification restored
- Missing CAs can be added by dropping `.pem` files into `certs/`
- No extra dependencies beyond certifi (already a transitive dep of httpx)
- SSL context built once at module load — no per-request overhead
- Works on all platforms (certifi is cross-platform)
### System-level fix (optional, for curl and other apps):
```bash
sudo cp webscraper/certs/comodo-aaa-services-root.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
```
## 5. Test Impact
- Existing tests use mocked `httpx.get` calls → **no test changes needed for SSL**
- Fixed pre-existing `test_404` bug: `HTTPStatusError` requires `request=` kwarg (httpx API)
- Fixed `test_404` assertion: error message must include "404" text
- **18/18 tests passing**
## 6. Risk Assessment
| Risk | Level | Mitigation |
|------|-------|------------|
| Bundled cert expires (2028-12-31) | Low | Well before then, certifi/system will include it |
| Some Cloudflare URLs fail on other machines | Low | Same cert can be added to `certs/` |
| New missing CAs in the future | Low | Drop `.pem` into `certs/` — no code change needed |