# Webscraper MCP Server

MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.

## Tools

- `webscraper_fetch(url, max_chars=5000)`: Title + markdown body + metadata
- `webscraper_fetch_links(url, deduplicate=True)`: Extract all hrefs
- `webscraper_fetch_tables(url)`: HTML tables as markdown
- `webscraper_fetch_all(url, max_chars=5000)`: Everything in one call
- `webscraper_fetch_section(url, selector)`: Specific CSS section
- `webscraper_fetch_meta(url)`: Title, description, OG tags
- `webscraper_fetch_sitemap(url, max_urls=100)`: Sitemap URL list

## Stack

- httpx (HTTP client)
- BeautifulSoup4 + lxml (HTML parsing)
- html2text (HTML to markdown)

## Run

```bash
./run.sh  # uv sync && uv run src/server.py
```

## Tests

```bash
uv run pytest tests/ --cov=src
```

## MCP Config

Add to `.roo/mcp.json`:

```json
"webscraper": {
  "command": "uv",
  "args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}
```
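
To give a feel for what a tool like `webscraper_fetch_links(url, deduplicate=True)` might do once the page has been downloaded, here is a minimal sketch of href extraction with optional de-duplication. It is not the server's actual code: the function name `extract_links`, its `base_url` argument, and the choice to resolve relative links are assumptions for illustration, and it uses the standard library's `html.parser` in place of BeautifulSoup4 + lxml so it runs without extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href or with an empty one.
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html, base_url, deduplicate=True):
    """Hypothetical helper: return absolute link URLs found in `html`.

    Relative hrefs are resolved against `base_url` (an assumption; the
    real tool may return them unchanged).
    """
    collector = _LinkCollector()
    collector.feed(html)
    links = [urljoin(base_url, href) for href in collector.hrefs]
    if deduplicate:
        # dict.fromkeys drops repeats while keeping first-seen order.
        links = list(dict.fromkeys(links))
    return links


if __name__ == "__main__":
    sample = '<a href="/docs">Docs</a> <a href="/docs">Docs again</a>'
    print(extract_links(sample, "https://example.com/"))
```

De-duplicating with `dict.fromkeys` rather than `set` keeps the links in page order, which tends to be more useful when the result is read by a person or a model.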