Webscraper MCP Server
MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.
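Sitemap parsing is mostly a matter of collecting the `<loc>` elements from the sitemaps.org XML namespace and stopping at a cap, as the `max_urls` parameter of the sitemap tool suggests. Below is a minimal sketch under stated assumptions: it uses only the standard library's `xml.etree.ElementTree` (not the server's actual stack), works on an XML string that has already been downloaded, and `parse_sitemap` is a hypothetical helper rather than the server's code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str, max_urls: int = 100) -> list[str]:
    """Hypothetical helper: return at most max_urls <loc> values from a sitemap."""
    root = ET.fromstring(xml_text)
    urls: list[str] = []
    # ".//sm:loc" matches <loc> under <url> (urlset) and <sitemap> (sitemap index)
    for loc in root.iterfind(".//sm:loc", SITEMAP_NS):
        if len(urls) >= max_urls:
            break  # respect the cap before adding anything else
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>"""

print(parse_sitemap(sample, max_urls=2))
# → ['https://example.com/', 'https://example.com/about']
```

A sitemap index returns the URLs of child sitemaps rather than pages; a real tool would have to decide whether to follow them.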
Tools
- webscraper_fetch(url, max_chars=5000): Title + markdown body + metadata
- webscraper_fetch_links(url, deduplicate=True): Extract all hrefs
- webscraper_fetch_tables(url): HTML tables as markdown
- webscraper_fetch_all(url, max_chars=5000): Everything in one call
- webscraper_fetch_section(url, selector): Section matching a CSS selector
- webscraper_fetch_meta(url): Title, description, OG tags
- webscraper_fetch_sitemap(url, max_urls=100): Sitemap URL list
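The `deduplicate=True` flag on the links tool implies collecting every `href` and optionally dropping repeats. Here is a rough sketch of that idea, assuming order-preserving deduplication (keep the first occurrence) and no resolution of relative URLs; the server itself parses HTML with BeautifulSoup4, whereas this self-contained version swaps in the standard library's `html.parser`. `LinkCollector` and `extract_links` are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser: records the href of every <a> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lowercases tag and attribute names before calling this
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html: str, deduplicate: bool = True) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    if deduplicate:
        # dict keys keep insertion order, so the first occurrence of each href wins
        return list(dict.fromkeys(parser.hrefs))
    return parser.hrefs


page = '<a href="/a">A</a><p><a href="/b">B</a></p><a href="/a">A again</a>'
print(extract_links(page))                     # → ['/a', '/b']
print(extract_links(page, deduplicate=False))  # → ['/a', '/b', '/a']
```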
Stack
- httpx (HTTP client)
- BeautifulSoup4 + lxml (HTML parsing)
- html2text (HTML to markdown)
Run
./run.sh # uv sync && uv run src/server.py
Tests
uv run pytest tests/ --cov=src
MCP Config
Add this entry to the "mcpServers" object in .roo/mcp.json:
"webscraper": {
"command": "uv",
"args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}
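If .roo/mcp.json already lists other servers, the entry has to be merged rather than pasted over the file. The sketch below assumes Roo's format of a top-level "mcpServers" object keyed by server name; `add_mcp_server` is a hypothetical helper, and the "other"/"npx" entry is a made-up placeholder that shows existing servers being preserved. The demo writes into a temporary directory so it never touches a real config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Hypothetical helper: insert or replace one entry under "mcpServers"."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # setdefault creates "mcpServers" if missing and keeps any existing servers
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Real use would be: add_mcp_server(Path(".roo/mcp.json"), "webscraper", "uv", [...])
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".roo" / "mcp.json"
    path.parent.mkdir(parents=True)
    # Placeholder pre-existing server, only to show that it survives the merge
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "npx", "args": []}}}))
    cfg = add_mcp_server(
        path,
        "webscraper",
        "uv",
        ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"],
    )
    written = json.loads(path.read_text())

print(sorted(cfg["mcpServers"]))  # → ['other', 'webscraper']
```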