Webscraper MCP Server
MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.
Tools
- webscraper_fetch(url, max_chars=5000): title, markdown body, and metadata
- webscraper_fetch_links(url, deduplicate=True): extract all hrefs
- webscraper_fetch_tables(url): HTML tables as markdown
- webscraper_fetch_all(url, max_chars=5000): everything in one call
- webscraper_fetch_section(url, selector): the section matching a CSS selector
- webscraper_fetch_meta(url): title, description, and OG tags
- webscraper_fetch_sitemap(url, max_urls=100): sitemap URL list
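To make the `deduplicate` and `max_urls` parameters concrete, here is a minimal sketch of how link extraction and sitemap parsing could behave. It is not the server's code: it uses only the Python standard library instead of BeautifulSoup4/lxml, works on already-downloaded text rather than fetching a URL, and the names `extract_links` and `parse_sitemap` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, deduplicate=True):
    """Hypothetical helper: absolute link URLs from an HTML string."""
    collector = _HrefCollector()
    collector.feed(html)
    # Resolve relative hrefs against the page URL.
    links = [urljoin(base_url, href) for href in collector.hrefs]
    if deduplicate:
        # dict.fromkeys drops repeats while keeping first-seen order.
        links = list(dict.fromkeys(links))
    return links


def parse_sitemap(xml_text, max_urls=100):
    """Hypothetical helper: up to max_urls <loc> values from sitemap XML."""
    root = ET.fromstring(xml_text)
    locs = []
    for el in root.iter():
        # Strip the "{namespace}" prefix so namespaced <loc> tags match too.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            locs.append(el.text.strip())
    return locs[:max_urls]
```

Both helpers take text as input, so they can be exercised without network access; the real tools accept a URL and presumably fetch it with httpx first.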
Stack
- httpx (HTTP client)
- BeautifulSoup4 + lxml (HTML parsing)
- html2text (HTML to markdown)
Run
./run.sh # uv sync && uv run src/server.py
Tests
uv run pytest tests/ --cov=src
MCP Config
Add to .roo/mcp.json:
"webscraper": {
"command": "uv",
"args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}