pplate edited this page 2026-04-04 14:35:21 +02:00
🕸️ mcp-webscraper — Web Scraping
mcp-webscraper is a FastMCP server for web scraping and data extraction. It fetches pages, converts HTML to clean Markdown, and extracts tables, links, sections selected by CSS selector, page metadata, and sitemap URLs.
Tools
| Tool | Description |
|---|---|
| `webscraper_fetch(url, max_chars=5000)` | Title, full page as Markdown, and metadata |
| `webscraper_fetch_links(url, deduplicate=True)` | All `href` links found on the page |
| `webscraper_fetch_tables(url)` | All HTML tables, converted to Markdown |
| `webscraper_fetch_all(url, max_chars=5000)` | Everything in one call |
| `webscraper_fetch_section(url, selector)` | Only the section matching a CSS selector |
| `webscraper_fetch_meta(url)` | Title, description, and Open Graph tags |
| `webscraper_fetch_sitemap(url, max_urls=100)` | Parses `sitemap.xml` and returns a list of URLs |
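The `max_urls` limit on `webscraper_fetch_sitemap` can be illustrated with a minimal parser. This is a hypothetical sketch, not the server's code: `parse_sitemap` is an illustrative name, it uses the standard-library `xml.etree.ElementTree` instead of the BeautifulSoup/lxml stack listed under Stack, it assumes the sitemap uses the standard sitemaps.org namespace, and it takes already-downloaded XML text rather than a URL.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to be present)
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str, max_urls: int = 100) -> list[str]:
    """Return up to max_urls <loc> values in document order.

    Hypothetical helper: mirrors the documented max_urls parameter of
    webscraper_fetch_sitemap, but is not the server's implementation.
    """
    root = ET.fromstring(xml_text)
    urls: list[str] = []
    # <loc> sits under <url> in a urlset and under <sitemap> in a sitemap index
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        if len(urls) >= max_urls:
            break
        text = (loc.text or "").strip()
        if text:
            urls.append(text)
    return urls
```

For a sitemap index, this sketch returns the child sitemap URLs without following them; whether the real tool recurses into child sitemaps is not stated here.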
Stack
- HTTP client: `httpx` (async, with SSL support)
- HTML parser: `BeautifulSoup4` + `lxml`
- Markdown converter: `html2text`
SSL Note — Fedora 43
Fedora 43 is missing the Comodo AAA Services Root CA, which is needed for Cloudflare-protected sites. The certificate is bundled at `mcp/webscraper/certs/comodo-aaa-services-root.pem` and applied automatically, so no manual configuration is needed.
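This page does not show how the bundled certificate is applied. One plausible approach, sketched here as an assumption, is to extend Python's default trust store with the extra PEM file using the standard `ssl` module. The path is taken from the note above and is assumed to be relative to the repository root; `build_ssl_context` is a hypothetical name, and skipping the file when it is absent is also an assumption.

```python
import ssl
from pathlib import Path

# Assumed location (relative to the repo root), taken from the SSL note
EXTRA_CA = Path("mcp/webscraper/certs/comodo-aaa-services-root.pem")


def build_ssl_context(extra_ca: Path = EXTRA_CA) -> ssl.SSLContext:
    """Trust the system CA store plus an optional bundled root certificate."""
    # create_default_context() loads the system's default CA certificates
    # and enables both certificate verification and hostname checking
    ctx = ssl.create_default_context()
    if extra_ca.is_file():
        # Adds the bundled root alongside the system roots; it does not replace them
        ctx.load_verify_locations(cafile=str(extra_ca))
    return ctx
```

Such a context can be handed to the HTTP client as `httpx.AsyncClient(verify=ctx)`, since httpx accepts an `ssl.SSLContext` for its `verify` argument.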
Quick Start
```bash
cd mcp/webscraper
uv sync
./run.sh
```
Usage Examples
```python
# Fetch a page as Markdown
webscraper_fetch("https://docs.fastmcp.dev", max_chars=10000)

# Extract all links from a Gitea repo
webscraper_fetch_links("http://192.168.188.119:30008/pplate/pi_mcps")

# Get all tables
webscraper_fetch_tables("https://pypi.org/project/fastmcp/")

# Get Open Graph metadata
webscraper_fetch_meta("https://github.com/comfyanonymous/ComfyUI")

# Fetch a specific section by CSS selector
webscraper_fetch_section("https://docs.python.org", "#content")
```
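To illustrate what a link-extraction call like `webscraper_fetch_links(url, deduplicate=True)` might do with a page, the sketch below collects `href` values using the standard-library `html.parser` rather than BeautifulSoup. The helper names are hypothetical, the function works on HTML that has already been fetched, and resolving relative links against the page URL is an assumption about the tool's output.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect href values from <a> tags in document order (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # html.parser lowercases tag and attribute names before calling this
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())


def extract_links(html: str, base_url: str, deduplicate: bool = True) -> list[str]:
    """Hypothetical stand-in for the link extraction behind webscraper_fetch_links."""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    # Assumption: relative hrefs are resolved against the page URL
    links = [urljoin(base_url, href) for href in parser.hrefs]
    if deduplicate:
        # dict.fromkeys drops repeats while keeping first-seen order
        links = list(dict.fromkeys(links))
    return links
```

`dict.fromkeys` is used for deduplication because it removes repeats while preserving the order in which links first appear on the page, which a plain `set` would not guarantee.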
