pi_mcps/mcp/webscraper/README.md

# Webscraper MCP Server

MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.

## Tools

- `webscraper_fetch(url, max_chars=5000)` — Title + markdown body + metadata
- `webscraper_fetch_links(url, deduplicate=True)` — Extract all hrefs
- `webscraper_fetch_tables(url)` — HTML tables as markdown
- `webscraper_fetch_all(url, max_chars=5000)` — Everything in one call
- `webscraper_fetch_section(url, selector)` — Specific CSS section
- `webscraper_fetch_meta(url)` — Title, description, OG tags
- `webscraper_fetch_sitemap(url, max_urls=100)` — Sitemap URL list

## Stack

- httpx (HTTP client)
- BeautifulSoup4 + lxml (HTML parsing)
- html2text (HTML to markdown)

## Run

```bash
./run.sh  # uv sync && uv run src/server.py
```

## Tests

```bash
uv run pytest tests/ --cov=src
```

## MCP Config

Add to `.roo/mcp.json`:

```json
"webscraper": {
  "command": "uv",
  "args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}
```