
Webscraper SSL Certificate Verification — Assessment

Date: 2026-04-03
Status: RESOLVED
Severity: High (SSL verification completely disabled via verify=False)


1. Problem Statement

The webscraper MCP server cannot verify SSL certificates when making HTTPS requests. As a band-aid, _fetch_page() (line 15 of src/server.py) passes verify=False, which disables all SSL verification: the scraper is exposed to man-in-the-middle attacks and silently accepts invalid or expired certificates.

2. Reproduction

$ uv run python -c "import httpx; httpx.get('https://example.com', timeout=10)"
httpx.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:
  unable to get local issuer certificate (_ssl.c:1081)

Even openssl s_client fails:

depth=2 C=US, O=SSL Corporation, CN=SSL.com TLS Transit ECC CA R2
verify error:num=20:unable to get local issuer certificate
Verify return code: 20 (unable to get local issuer certificate)

Yet curl https://example.com succeeds (exit code 0).
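
To separate a trust-store problem from an httpx problem, the same handshake can be attempted with only the standard library. The following is a minimal sketch, not the project's code: the `probe` helper and its return strings are hypothetical, and it assumes the default context reads the same system bundle that fails above.

```python
import socket
import ssl


def probe(host, port=443, ctx=None, timeout=10.0):
    """Attempt a TLS handshake and summarize the result as a short string.

    Hypothetical helper for illustration; not part of src/server.py.
    """
    if ctx is None:
        ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The handshake, including certificate verification, runs here.
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # verify_code 20 is "unable to get local issuer certificate".
        return f"verify failed ({exc.verify_code}): {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts, other TLS errors.
        return f"connect error: {exc}"
```

Calling `probe("example.com")` on the affected machine would be expected to report a verify failure with code 20, matching the openssl s_client output above; if it does, httpx itself is unlikely to be the cause.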

3. Root Cause Analysis

3.1 Hypotheses Considered (7)

| # | Hypothesis | Verdict |
|---|------------|---------|
| 1 | certifi bundle outdated/missing root CA | CONFIRMED: "AAA Certificate Services" (Comodo root) is absent from certifi 2026.02.25 |
| 2 | System PEM bundle missing root CA | CONFIRMED: 0 matches for "AAA Certificate Services" in /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem |
| 3 | Python 3.14 SSL behavior change | Ruled out: system Python 3.14 has the same issue, so it is not Python-version specific |
| 4 | OpenSSL 3.5.4 incompatibility | Ruled out: curl uses the same OpenSSL and succeeds |
| 5 | Expired/revoked certificate | Ruled out: the certificate chain is valid (curl succeeds) |
| 6 | Missing intermediate certificates | Ruled out: the server sends the full chain (3 certs); only the root is missing from the stores |
| 7 | httpx library bug | Ruled out: same failure with raw ssl.create_default_context() |
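
The bundle checks behind hypotheses 1 and 2 can be scripted rather than grepped. This is a sketch under assumptions: `common_names` and `bundle_has_root` are hypothetical helpers, and matching on the subject commonName is a simplification (a fingerprint comparison would be stricter).

```python
import ssl


def common_names(ca_certs):
    """Collect subject commonName values from get_ca_certs()-style dicts."""
    names = set()
    for cert in ca_certs:
        # 'subject' is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.add(value)
    return names


def bundle_has_root(cafile, common_name):
    """Return True if the PEM bundle at cafile contains a CA with that CN."""
    # A bare client context starts with an empty store, so only cafile counts.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return common_name in common_names(ctx.get_ca_certs())
```

For example, `bundle_has_root(certifi.where(), "AAA Certificate Services")` would be expected to return False on the setup described here, per hypothesis 1.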

3.2 The Actual Root Cause (2 issues)

Issue A — PEM bundle gap: The Cloudflare certificate chain for example.com terminates at "AAA Certificate Services" (a Comodo root CA). This root CA is:

  • Missing from certifi 2026.02.25 (cacert.pem, 272KB)
  • Missing from Fedora's extracted PEM bundle (/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem)
  • Present in Fedora's p11-kit native trust store (trust list shows "Comodo AAA Services root")

This is why curl succeeds — curl on Fedora 43 uses the OpenSSL provider mechanism which can access p11-kit's PKCS#11 trust store directly, bypassing the PEM file.

Issue B — verify=False band-aid: Instead of fixing the certificate verification, the current code disables it entirely with verify=False, which:

  • Accepts expired certificates
  • Accepts self-signed certificates
  • Is vulnerable to MITM attacks
  • Produces InsecureRequestWarning noise in logs
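
In terms of the standard ssl module, disabling verification roughly corresponds to a context with hostname checking off and verify_mode set to CERT_NONE. The sketch below only contrasts the two configurations; `describe` is a hypothetical helper, and the mapping to how httpx handles verify=False internally is an assumption.

```python
import ssl


def describe(ctx):
    """Summarize the two settings that decide whether a peer is verified."""
    return {
        "verify_mode": ctx.verify_mode.name,
        "check_hostname": ctx.check_hostname,
    }


# Default client context: chain validation and hostname checking are on.
verifying = ssl.create_default_context()

# Approximation of verify=False: neither the chain nor the hostname is checked.
unverified = ssl.create_default_context()
unverified.check_hostname = False  # must be cleared before setting CERT_NONE
unverified.verify_mode = ssl.CERT_NONE

print(describe(verifying))
print(describe(unverified))
```

With the second configuration, any certificate the server presents is accepted, which is what makes the expired, self-signed, and MITM cases above possible.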

3.3 Environment Details

| Component | Version |
|-----------|---------|
| Python | 3.14.3 (Fedora system) |
| OpenSSL | 3.5.4 |
| httpx | 0.28.1 |
| certifi | 2026.02.25 |
| ca-certificates | 2025.2.80_v9.0.304-1.2.fc43 |
| OS | Fedora 43 (kernel 6.19) |

4. Proposed Fix

Initial proposal: use truststore to access the native OS trust store

The truststore library provides an ssl.SSLContext-like API that verifies certificates against the platform's trust store (the OpenSSL-configured system CA locations on Linux, the Security framework on macOS, CryptoAPI on Windows). The httpx documentation points to truststore for using the system certificate store.

Changes implemented:

Approach A: truststore (REJECTED — did not work)

truststore.SSLContext was tested but loaded 0 certs on this Fedora 43 / OpenSSL 3.5.4 setup. cert_store_stats() raises NotImplementedError. The PKCS#11 provider in openssl.cnf is commented out. This approach was abandoned.

Approach B: certifi + extra certs directory (IMPLEMENTED)

  1. webscraper/certs/comodo-aaa-services-root.pem — Missing root CA extracted from p11-kit
  2. src/server.py — New _build_ssl_context() at module load:
import ssl
import certifi
from pathlib import Path

_EXTRA_CERTS_DIR = Path(__file__).resolve().parent.parent / "certs"

def _build_ssl_context() -> ssl.SSLContext:
    """Build an SSL context from certifi + extra bundled root certs."""
    ctx = ssl.create_default_context(cafile=certifi.where())
    if _EXTRA_CERTS_DIR.is_dir():
        for pem in _EXTRA_CERTS_DIR.glob("*.pem"):
            ctx.load_verify_locations(cafile=str(pem))
    return ctx

_SSL_CTX = _build_ssl_context()
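
Because the context is built once at import time, a startup self-check can catch a silently empty or permissive context, such as the "0 certs loaded" result seen in Approach A. This is a sketch only: `context_problems` and its `min_cas` threshold are hypothetical and not part of src/server.py.

```python
import ssl


def context_problems(ctx, min_cas=1):
    """Return a list of human-readable problems found in an SSL context.

    Hypothetical helper; min_cas is an illustrative threshold.
    """
    problems = []
    if ctx.verify_mode != ssl.CERT_REQUIRED:
        problems.append("certificate verification is not required")
    if not ctx.check_hostname:
        problems.append("hostname checking is disabled")
    # x509_ca counts CA certificates loaded so far; certificates in a capath
    # directory are loaded lazily and are not counted until they are used.
    if ctx.cert_store_stats()["x509_ca"] < min_cas:
        problems.append("fewer CA certificates loaded than expected")
    return problems
```

One possible use is to call `context_problems(_SSL_CTX)` right after the context is built and raise if the returned list is non-empty, so a broken trust store fails at startup rather than on the first request.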

Why this approach?

| Approach | Outcome |
|----------|---------|
| verify=False | Previous state: disabled all security |
| verify=certifi.where() | certifi bundle doesn't have the Comodo root CA |
| ssl.create_default_context() | Uses the same broken system PEM file |
| sudo update-ca-trust | System-level fix, requires root, didn't fully work |
| truststore.SSLContext | Loaded 0 certs on this setup, NotImplementedError |
| certifi + extra certs dir | Works: certifi base plus project-bundled missing CAs |

Benefits of this approach:

  • No verify=False — proper SSL verification restored
  • Missing CAs can be added by dropping .pem files into certs/
  • No extra dependencies beyond certifi (already a transitive dep of httpx)
  • SSL context built once at module load — no per-request overhead
  • Works on all platforms (certifi is cross-platform)

System-level fix (optional, for curl and other apps):

sudo cp webscraper/certs/comodo-aaa-services-root.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract

5. Test Impact

  • Existing tests use mocked httpx.get calls → no test changes needed for SSL
  • Fixed pre-existing test_404 bug: HTTPStatusError requires request= kwarg (httpx API)
  • Fixed test_404 assertion: error message must include "404" text
  • 18/18 tests passing

6. Risk Assessment

| Risk | Level | Mitigation |
|------|-------|------------|
| Bundled cert expires (2028-12-31) | Low | Well before then, certifi/system will include it |
| Some Cloudflare URLs fail on other machines | Low | Same cert can be added to certs/ |
| New missing CAs in the future | Low | Drop .pem into certs/, no code change needed |
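
The expiry risk in the first row can be monitored with a small check over the bundled PEM files. The following is a sketch under assumptions: `days_until_expiry`, `expiring_soon`, and the 180-day window are hypothetical, and the time of day in the usage example is assumed (the assessment only gives the date 2028-12-31).

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after, now):
    """Days from `now` (epoch seconds) until a notAfter timestamp.

    not_after uses the ssl module's format, e.g. "Dec 31 23:59:59 2028 GMT".
    """
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def expiring_soon(cafile, now, warn_days=180):
    """List (notAfter, days_left) for CA certs in cafile expiring within warn_days."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    results = []
    for cert in ctx.get_ca_certs():
        days_left = days_until_expiry(cert["notAfter"], now)
        if days_left <= warn_days:
            results.append((cert["notAfter"], round(days_left)))
    return results
```

Running `expiring_soon(path, time.time())` over each file in certs/ (for instance from a scheduled CI job) would flag the bundled root ahead of its expiry date instead of relying on someone remembering it.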