🕸️ mcp-webscraper — Web Scraping

mcp-webscraper is a FastMCP server for web scraping and data extraction. It fetches pages, converts HTML to clean Markdown, and extracts tables, links, sections matched by a CSS selector, page metadata, and sitemap URLs.

Tools

Tool	Description
`webscraper_fetch(url, max_chars=5000)`	Title + full page as Markdown + metadata
`webscraper_fetch_links(url, deduplicate=True)`	All `href` links found on the page
`webscraper_fetch_tables(url)`	All HTML tables converted to Markdown
`webscraper_fetch_all(url, max_chars=5000)`	Everything in one call
`webscraper_fetch_section(url, selector)`	Only the section matching a CSS selector
`webscraper_fetch_meta(url)`	Title, description, Open Graph tags
`webscraper_fetch_sitemap(url, max_urls=100)`	Parse sitemap.xml, return URL list
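
To make the `deduplicate` flag of `webscraper_fetch_links` concrete, here is a minimal sketch of link extraction with order-preserving deduplication. It is not the server's actual code: it uses the standard-library `html.parser` instead of BeautifulSoup to stay dependency-free, the names `LinkCollector` and `extract_links` are hypothetical, and resolving relative links against the page URL is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href or with an empty one
            if name == "href" and value:
                # Assumption: relative links are made absolute
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str, deduplicate: bool = True) -> list[str]:
    """Hypothetical helper: return the links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    if not deduplicate:
        return collector.links
    # dict.fromkeys drops repeats while keeping first-seen order
    return list(dict.fromkeys(collector.links))
```

With `deduplicate=False` the same function returns every occurrence, which can be useful when link frequency matters.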

Stack

HTTP client: httpx (async, with SSL support)
HTML parser: BeautifulSoup4 + lxml
Markdown converter: html2text

SSL Note — Fedora 43

Fedora 43 is missing the Comodo AAA Services Root CA needed for Cloudflare-protected sites. The certificate is bundled at mcp/webscraper/certs/comodo-aaa-services-root.pem and is applied automatically, so no manual configuration is needed.
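
For readers curious how an extra root CA can be added on top of the system trust store, the following is a rough sketch using Python's standard `ssl` module. The function name `build_ssl_context` is hypothetical, and how the server actually wires in the certificate is an assumption; only the certificate path comes from this page.

```python
import ssl
from pathlib import Path

# Path taken from this page, relative to the repository root
BUNDLED_CA = "mcp/webscraper/certs/comodo-aaa-services-root.pem"


def build_ssl_context(extra_ca: str = BUNDLED_CA) -> ssl.SSLContext:
    """Hypothetical helper: system trust store plus one extra root CA."""
    # Starts from the platform's default CA bundle with verification enabled
    context = ssl.create_default_context()
    ca_path = Path(extra_ca)
    if ca_path.is_file():
        # Adds the bundled certificate to the trusted set without replacing the defaults
        context.load_verify_locations(cafile=str(ca_path))
    return context


# httpx accepts an ssl.SSLContext through its `verify` argument, for example:
#   async with httpx.AsyncClient(verify=build_ssl_context()) as client:
#       response = await client.get("https://example.com")
```

Extending the default context, rather than pointing `verify` at the single `.pem` file, keeps every other site verifiable with the normal system certificates.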

Quick Start

cd mcp/webscraper
uv sync
./run.sh

Usage Examples

# Fetch a page as Markdown
webscraper_fetch("https://docs.fastmcp.dev", max_chars=10000)

# Extract all links from a Gitea repo
webscraper_fetch_links("http://192.168.188.119:30008/pplate/pi_mcps")

# Get all tables
webscraper_fetch_tables("https://pypi.org/project/fastmcp/")

# Get Open Graph metadata
webscraper_fetch_meta("https://github.com/comfyanonymous/ComfyUI")

# Fetch specific section by CSS selector
webscraper_fetch_section("https://docs.python.org", "#content")
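
The examples above do not cover `webscraper_fetch_sitemap`. As an illustration of what turning a sitemap.xml into a capped URL list involves, here is a minimal sketch with the standard-library XML parser. The function name `parse_sitemap` is hypothetical and this is not necessarily how the server implements it; only the `max_urls` parameter mirrors the tool signature.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str, max_urls: int = 100) -> list[str]:
    """Hypothetical helper: return up to max_urls <loc> values from a sitemap."""
    root = ET.fromstring(xml_text)
    urls: list[str] = []
    # <loc> appears in both <urlset> sitemaps and <sitemapindex> files
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        if len(urls) >= max_urls:
            break
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls
```

For a sitemap index the returned entries are themselves sitemap URLs, so a caller would need to fetch and parse each one in turn.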
