Webscraper SSL Certificate Verification — Assessment
Date: 2026-04-03
Status: ✅ RESOLVED
Severity: High — SSL verification completely disabled (verify=False)
1. Problem Statement
The webscraper MCP server cannot verify SSL certificates when making HTTPS requests.
The current code uses verify=False in _fetch_page() (line 15 of src/server.py) as a
band-aid, which disables all SSL verification — leaving the scraper vulnerable to
man-in-the-middle attacks and silently accepting invalid/expired certificates.
2. Reproduction
$ uv run python -c "import httpx; httpx.get('https://example.com', timeout=10)"
httpx.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:
unable to get local issuer certificate (_ssl.c:1081)
Even openssl s_client fails:
depth=2 C=US, O=SSL Corporation, CN=SSL.com TLS Transit ECC CA R2
verify error:num=20:unable to get local issuer certificate
Verify return code: 20 (unable to get local issuer certificate)
Yet curl https://example.com succeeds (exit code 0).
3. Root Cause Analysis
3.1 Hypotheses Considered (7)
| # | Hypothesis | Verdict |
|---|---|---|
| 1 | certifi bundle outdated/missing root CA | ✅ CONFIRMED — "AAA Certificate Services" (Comodo root) is absent from certifi 2026.02.25 |
| 2 | System PEM bundle missing root CA | ✅ CONFIRMED — 0 matches for "AAA Certificate Services" in /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem |
| 3 | Python 3.14 SSL behavior change | ❌ System Python 3.14 has same issue — not Python-version specific |
| 4 | OpenSSL 3.5.4 incompatibility | ❌ curl uses same OpenSSL and succeeds |
| 5 | Expired/revoked certificate | ❌ Certificate chain is valid (curl succeeds) |
| 6 | Missing intermediate certificates | ❌ Server sends full chain (3 certs), only root is missing from stores |
| 7 | httpx library bug | ❌ Same failure with raw ssl.create_default_context() |
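Hypothesis 7 can be checked without httpx. The following is a minimal sketch, not the script used for this assessment: `tls_probe` is a hypothetical helper name. It performs a TLS handshake with a plain `ssl.create_default_context()` (or any context passed in) and reports the verification error, if there is one.

```python
import socket
import ssl
from typing import Optional


def tls_probe(host: str, port: int = 443, *,
              context: Optional[ssl.SSLContext] = None,
              timeout: float = 10.0) -> str:
    """Try a TLS handshake and summarize the outcome as a short string."""
    ctx = context or ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # The handshake (and certificate verification) happens here.
            with ctx.wrap_socket(raw, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # verify_code 20 corresponds to "unable to get local issuer certificate".
        return f"verify_failed ({exc.verify_code}): {exc.verify_message}"
    except OSError as exc:
        # Covers refused connections, timeouts, DNS errors and other SSL errors.
        return f"connect_error: {exc}"
```

If the failure is independent of httpx, `tls_probe("example.com")` on the affected machine should return a `verify_failed` result rather than `ok`.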
3.2 The Actual Root Cause (2 issues)
Issue A — PEM bundle gap: The Cloudflare certificate chain for example.com
terminates at "AAA Certificate Services" (a Comodo root CA). This root CA is:
- ❌ Missing from certifi 2026.02.25 (cacert.pem, 272KB)
- ❌ Missing from Fedora's extracted PEM bundle (/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem)
- ✅ Present in Fedora's p11-kit native trust store (trust list shows "Comodo AAA Services root")
This is why curl succeeds — curl on Fedora 43 uses the OpenSSL provider mechanism
which can access p11-kit's PKCS#11 trust store directly, bypassing the PEM file.
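The "missing from bundle" checks can also be done from Python instead of searching the PEM text. This is a sketch under assumptions: `common_names` and `bundle_has_root` are hypothetical helper names, and the match is on the subject commonName only.

```python
import ssl


def common_names(cert: dict) -> list[str]:
    """Extract commonName values from a cert dict as returned by get_ca_certs()."""
    return [
        value
        for rdn in cert.get("subject", ())
        for key, value in rdn
        if key == "commonName"
    ]


def bundle_has_root(cafile: str, common_name: str) -> bool:
    """Return True if the PEM bundle at cafile contains a CA with this commonName."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)  # certs from a cafile are loaded eagerly
    return any(common_name in common_names(cert) for cert in ctx.get_ca_certs())
```

For example, `bundle_has_root(certifi.where(), "AAA Certificate Services")` would correspond to the check behind hypothesis 1, assuming certifi is importable in that environment.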
Issue B — verify=False band-aid: Instead of fixing the certificate verification,
the current code disables it entirely with verify=False, which:
- Accepts expired certificates
- Accepts self-signed certificates
- Is vulnerable to MITM attacks
- Produces InsecureRequestWarning noise in logs
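To make the impact concrete, the sketch below shows roughly what verify=False amounts to at the level of Python's ssl module. It is an illustration using the standard library only; the helper names are hypothetical, and the exact way httpx configures its context internally is an assumption here.

```python
import ssl


def insecure_context() -> ssl.SSLContext:
    """A context with verification switched off (what the band-aid amounts to)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # chain is no longer validated at all
    return ctx


def describe(ctx: ssl.SSLContext) -> dict[str, bool]:
    """Summarize the two settings that matter for MITM protection."""
    return {
        "check_hostname": ctx.check_hostname,
        "verifies_chain": ctx.verify_mode == ssl.CERT_REQUIRED,
    }
```

A default context reports both settings as enabled, while `insecure_context()` reports both as disabled. With verification off, any certificate from any party is accepted.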
3.3 Environment Details
| Component | Version |
|---|---|
| Python | 3.14.3 (Fedora system) |
| OpenSSL | 3.5.4 |
| httpx | 0.28.1 |
| certifi | 2026.02.25 |
| ca-certificates | 2025.2.80_v9.0.304-1.2.fc43 |
| OS | Fedora 43 (kernel 6.19) |
4. Proposed Fix
Initial proposal: use truststore to access the native OS trust store
The truststore library provides an ssl.SSLContext-compatible API that verifies certificates against
the native OS trust store (Security framework on macOS, CryptoAPI on Windows; on Linux it relies on
OpenSSL's default certificate locations). The httpx documentation points to it as the way to use system certificates.
Changes implemented:
Approach A: truststore (REJECTED — did not work)
truststore.SSLContext was tested but loaded 0 certs on this Fedora 43 / OpenSSL 3.5.4 setup.
cert_store_stats() raises NotImplementedError. The PKCS#11 provider in openssl.cnf is
commented out. This approach was abandoned.
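When comparing contexts by "number of certs loaded", a small helper keeps the check uniform. This is a sketch only; `loaded_ca_count` is a hypothetical name. Note that for a standard ssl.SSLContext a count of 0 is not conclusive by itself: Python's ssl documentation states that certificates in a capath directory are loaded lazily, so they do not appear in the stats until used.

```python
import ssl
from typing import Optional


def loaded_ca_count(ctx: ssl.SSLContext) -> Optional[int]:
    """Return how many CA certs are loaded in ctx, or None if it cannot report stats."""
    try:
        return ctx.cert_store_stats()["x509_ca"]
    except NotImplementedError:
        # Some SSLContext-like objects do not implement cert_store_stats().
        return None
```

A `None` result means the context offers no stats, so a real handshake is the more reliable test of whether it can verify a given host.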
Approach B: certifi + extra certs directory (IMPLEMENTED ✅)
- webscraper/certs/comodo-aaa-services-root.pem: missing root CA extracted from p11-kit
- src/server.py: new _build_ssl_context(), called once at module load:
    import ssl
    from pathlib import Path

    import certifi

    # Extra root CAs shipped with the project (webscraper/certs/*.pem)
    _EXTRA_CERTS_DIR = Path(__file__).resolve().parent.parent / "certs"


    def _build_ssl_context() -> ssl.SSLContext:
        """Build an SSL context from certifi + extra bundled root certs."""
        ctx = ssl.create_default_context(cafile=certifi.where())
        if _EXTRA_CERTS_DIR.is_dir():
            for pem in _EXTRA_CERTS_DIR.glob("*.pem"):
                ctx.load_verify_locations(cafile=str(pem))
        return ctx


    _SSL_CTX = _build_ssl_context()
Why this approach?
| Approach | Problem |
|---|---|
| verify=False | Previous approach: disabled all security |
| verify=certifi.where() | certifi bundle doesn't have the Comodo root CA |
| ssl.create_default_context() | Uses the same broken system PEM file |
| sudo update-ca-trust | System-level fix, requires root, didn't fully work |
| truststore.SSLContext | ❌ Loaded 0 certs on this setup, NotImplementedError |
| certifi + extra certs dir | ✅ Works! Certifi base + project-bundled missing CAs |
Benefits of this approach:
- No verify=False: proper SSL verification restored
- Missing CAs can be added by dropping .pem files into certs/
- No extra dependencies beyond certifi (already a transitive dep of httpx)
- SSL context built once at module load — no per-request overhead
- Works on all platforms (certifi is cross-platform)
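Because the context is built at import time, a malformed file dropped into certs/ would make load_verify_locations() raise and could stop the server from starting. One possible safeguard, sketched here with a hypothetical helper name (`check_extra_certs`) that is not part of the implemented change, is to test each file in isolation before relying on it:

```python
import ssl
from pathlib import Path


def check_extra_certs(certs_dir: Path) -> dict[str, str]:
    """Map each *.pem file in certs_dir to "ok" or a short error description."""
    report: dict[str, str] = {}
    for pem in sorted(certs_dir.glob("*.pem")):
        # Use a throwaway context so a bad file cannot affect the real one.
        probe = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        try:
            probe.load_verify_locations(cafile=str(pem))
            report[pem.name] = "ok"
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            report[pem.name] = f"unreadable: {exc}"
    return report
```

Such a check could run in a unit test or CI step, so that a broken certificate file is caught before deployment rather than at module load.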
System-level fix (optional, for curl and other apps):
sudo cp webscraper/certs/comodo-aaa-services-root.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
5. Test Impact
- Existing tests use mocked httpx.get calls → no test changes needed for SSL
- Fixed pre-existing test_404 bug: HTTPStatusError requires request= kwarg (httpx API)
- Fixed test_404 assertion: error message must include "404" text
- 18/18 tests passing
6. Risk Assessment
| Risk | Level | Mitigation |
|---|---|---|
| Bundled cert expires (2028-12-31) | Low | certifi or the system bundle is expected to include the root well before then, making the bundled copy unnecessary |
| Some Cloudflare URLs fail on other machines | Low | Same cert can be added to certs/ |
| New missing CAs in the future | Low | Drop .pem into certs/ — no code change needed |
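The expiry risk can be monitored rather than remembered. The sketch below is an assumption-laden illustration: `days_until_expiry` is a hypothetical helper, and it expects the notAfter string in the format that ssl's get_ca_certs() returns (the time of day used in the usage note is a placeholder, since only the date 2028-12-31 is given above).

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days between now and a certificate's notAfter timestamp.

    not_after uses the ssl module's format, e.g. "Dec 31 23:59:59 2028 GMT".
    A negative result means the certificate has already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)
```

A test or startup log line could call, for example, `days_until_expiry("Dec 31 23:59:59 2028 GMT")` and warn once the value drops below a chosen threshold.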