# Webscraper SSL Certificate Verification: Assessment

**Date:** 2026-04-03

**Status:** ✅ RESOLVED

**Severity:** High (SSL verification completely disabled via `verify=False`)

---

## 1. Problem Statement

Before the fix, the webscraper MCP server could not verify SSL certificates when making HTTPS requests. As a band-aid, `_fetch_page()` passed `verify=False` (line 15 of `src/server.py`), which **disables all SSL verification**: the scraper was exposed to man-in-the-middle attacks and silently accepted invalid or expired certificates.

## 2. Reproduction

```
$ uv run python -c "import httpx; httpx.get('https://example.com', timeout=10)"
httpx.ConnectError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1081)
```

Even `openssl s_client` fails:

```
depth=2 C=US, O=SSL Corporation, CN=SSL.com TLS Transit ECC CA R2
verify error:num=20:unable to get local issuer certificate
Verify return code: 20 (unable to get local issuer certificate)
```

Yet `curl https://example.com` **succeeds** (exit code 0).

## 3. Root Cause Analysis

### 3.1 Hypotheses Considered (7)

| # | Hypothesis | Verdict |
|---|-----------|---------|
| 1 | certifi bundle outdated/missing root CA | ✅ **CONFIRMED**: "AAA Certificate Services" (Comodo root) is absent from certifi 2026.02.25 |
| 2 | System PEM bundle missing root CA | ✅ **CONFIRMED**: 0 matches for "AAA Certificate Services" in `/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem` |
| 3 | Python 3.14 SSL behavior change | ❌ System Python 3.14 has the same issue, so it is not Python-version specific |
| 4 | OpenSSL 3.5.4 incompatibility | ❌ curl uses the same OpenSSL and succeeds |
| 5 | Expired/revoked certificate | ❌ Certificate chain is valid (curl succeeds) |
| 6 | Missing intermediate certificates | ❌ Server sends the full chain (3 certs); only the root is missing from the stores |
| 7 | httpx library bug | ❌ Same failure with raw `ssl.create_default_context()` |

### 3.2 The Actual Root Cause (2 issues)

**Issue A (PEM bundle gap):** The Cloudflare certificate chain for `example.com` terminates at "AAA Certificate Services" (a Comodo root CA). This root CA is:

- ❌ **Missing** from `certifi` 2026.02.25 (`cacert.pem`, 272KB)
- ❌ **Missing** from Fedora's extracted PEM bundle (`/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`)
- ✅ **Present** in Fedora's p11-kit native trust store (`trust list` shows "Comodo AAA Services root")

This is why `curl` succeeds: curl on Fedora 43 uses the OpenSSL provider mechanism, which can access p11-kit's PKCS#11 trust store directly, bypassing the PEM file.
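The bundle gap described in Issue A can be checked from Python rather than by grepping PEM files. The following is a minimal sketch using only the standard `ssl` module. The helper names (`load_ca_certs`, `common_names`, `bundle_has_ca`) are hypothetical and not part of the project, and the check assumes the root CA carries a `commonName` attribute, which not every root does.

```python
import ssl


def load_ca_certs(bundle_path):
    """Load a PEM bundle into a fresh context and return its CA certs as dicts.

    Hypothetical helper: a bare SSLContext starts with no trusted CAs, so
    get_ca_certs() reflects only what the given bundle contains.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=str(bundle_path))
    return ctx.get_ca_certs()


def common_names(ca_certs):
    """Collect the subject commonName of each cert dict (if it has one)."""
    names = set()
    for cert in ca_certs:
        # "subject" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.add(value)
    return names


def bundle_has_ca(ca_certs, common_name):
    """Return True if any loaded CA has the given subject commonName."""
    return common_name in common_names(ca_certs)
```

For example, `bundle_has_ca(load_ca_certs(certifi.where()), "AAA Certificate Services")` would be expected to return `False` on the setup described here, and the same call against `/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem` likewise.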
**Issue B (`verify=False` band-aid):** Instead of fixing certificate verification, the previous code disabled it entirely with `verify=False`, which:

- Accepts expired certificates
- Accepts self-signed certificates
- Is vulnerable to MITM attacks
- Produces `InsecureRequestWarning` noise in logs

### 3.3 Environment Details

| Component | Version |
|-----------|---------|
| Python | 3.14.3 (Fedora system) |
| OpenSSL | 3.5.4 |
| httpx | 0.28.1 |
| certifi | 2026.02.25 |
| ca-certificates | 2025.2.80_v9.0.304-1.2.fc43 |
| OS | Fedora 43 (kernel 6.19) |

## 4. Proposed Fix

### Initial proposal: use `truststore` to access the native OS trust store

The [`truststore`](https://truststore.readthedocs.io/) library provides an `ssl.SSLContext`-like API that verifies certificates against the **native OS certificate store** (OpenSSL's configured default store on Linux, Security framework on macOS, CryptoAPI on Windows). The [httpx SSL documentation](https://www.python-httpx.org/advanced/ssl/) describes this option.

**Approaches evaluated:**

### Approach A: truststore (REJECTED, did not work)

`truststore.SSLContext` was tested but loaded 0 certs on this Fedora 43 / OpenSSL 3.5.4 setup. `cert_store_stats()` raises `NotImplementedError`. The PKCS#11 provider in `openssl.cnf` is commented out. This approach was abandoned.

### Approach B: certifi + extra certs directory (IMPLEMENTED ✅)

1. **`webscraper/certs/comodo-aaa-services-root.pem`**: the missing root CA, extracted from p11-kit
2. **`src/server.py`**: new `_build_ssl_context()`, called at module load:

```python
import ssl
from pathlib import Path

import certifi

# Directory holding extra root CAs bundled with the project.
_EXTRA_CERTS_DIR = Path(__file__).resolve().parent.parent / "certs"


def _build_ssl_context() -> ssl.SSLContext:
    """Build an SSL context from certifi + extra bundled root certs."""
    ctx = ssl.create_default_context(cafile=certifi.where())
    if _EXTRA_CERTS_DIR.is_dir():
        # Add every bundled PEM on top of the certifi base.
        for pem in _EXTRA_CERTS_DIR.glob("*.pem"):
            ctx.load_verify_locations(cafile=str(pem))
    return ctx


# Built once at import time and reused for every request.
_SSL_CTX = _build_ssl_context()
```

### Why this approach?
| Approach | Problem |
|----------|---------|
| `verify=False` | **Previous behavior**: disabled all security |
| `verify=certifi.where()` | certifi bundle doesn't have the Comodo root CA |
| `ssl.create_default_context()` | Uses the same incomplete system PEM file |
| `sudo update-ca-trust` | System-level fix, requires root, didn't fully work |
| `truststore.SSLContext` | ❌ Loaded 0 certs on this setup, `NotImplementedError` |
| **certifi + extra certs dir** | ✅ **Works!** Certifi base + project-bundled missing CAs |

### Benefits of this approach:

- No `verify=False`: proper SSL verification is restored
- Missing CAs can be added by dropping `.pem` files into `certs/`
- No extra dependencies beyond certifi (already a transitive dep of httpx)
- SSL context is built once at module load, so there is no per-request overhead
- Works on all platforms (certifi is cross-platform)

### System-level fix (optional, for curl and other apps):

```bash
sudo cp webscraper/certs/comodo-aaa-services-root.pem /etc/pki/ca-trust/source/anchors/
sudo update-ca-trust extract
```

## 5. Test Impact

- Existing tests use mocked `httpx.get` calls, so **no test changes were needed for SSL**
- Fixed pre-existing `test_404` bug: `HTTPStatusError` requires the `request=` kwarg (httpx API)
- Fixed `test_404` assertion: the error message must include the text "404"
- **18/18 tests passing**

## 6. Risk Assessment

| Risk | Level | Mitigation |
|------|-------|------------|
| Bundled cert expires (2028-12-31) | Low | certifi or the system bundle is expected to include it well before then |
| Some Cloudflare URLs fail on other machines | Low | Same cert can be added to `certs/` |
| New missing CAs in the future | Low | Drop a `.pem` into `certs/`; no code change needed |
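The expiry risk in the table above could be monitored with a small check over the bundled certificates. This is a sketch only, using the standard `ssl` module. The function names (`load_bundled_certs`, `expiring_certs`) and the idea of running it in CI are assumptions, not part of the current project.

```python
import ssl
import time
from pathlib import Path


def load_bundled_certs(certs_dir):
    """Load every *.pem in certs_dir and return the CA certs as dicts.

    Hypothetical helper; certs_dir would be the project's `certs/` directory.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    for pem in sorted(Path(certs_dir).glob("*.pem")):
        ctx.load_verify_locations(cafile=str(pem))
    return ctx.get_ca_certs()


def expiring_certs(ca_certs, within_days, now=None):
    """Return (notAfter, days_left) for certs expiring within `within_days`.

    `now` is a Unix timestamp; it defaults to the current time and is
    injectable so the check can be tested deterministically.
    """
    now = time.time() if now is None else now
    flagged = []
    for cert in ca_certs:
        # notAfter uses the "Dec 31 23:59:59 2028 GMT" format that
        # ssl.cert_time_to_seconds() parses.
        expires = ssl.cert_time_to_seconds(cert["notAfter"])
        days_left = int((expires - now) // 86400)
        if days_left <= within_days:
            flagged.append((cert["notAfter"], days_left))
    return flagged
```

A call such as `expiring_certs(load_bundled_certs("webscraper/certs"), within_days=180)` would return an empty list until a bundled root comes within roughly six months of its `notAfter` date.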