Webscraper MCP Server
MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.
Tools
- webscraper_fetch(url, max_chars=5000): Title + markdown body + metadata
- webscraper_fetch_links(url, deduplicate=True): Extract all hrefs
- webscraper_fetch_tables(url): HTML tables as markdown
- webscraper_fetch_all(url, max_chars=5000): Everything in one call
- webscraper_fetch_section(url, selector): Specific CSS section
- webscraper_fetch_meta(url): Title, description, OG tags
- webscraper_fetch_sitemap(url, max_urls=100): Sitemap URL list
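To illustrate what a tool like webscraper_fetch_links does with deduplicate=True, here is a minimal sketch. It is not the server's implementation. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without dependencies, and it works on an HTML string rather than fetching a URL. The helper name extract_links and the choice to resolve relative hrefs against a base URL are assumptions.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href or with an empty one.
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html: str, base_url: str, deduplicate: bool = True) -> list[str]:
    """Hypothetical helper: return absolute link URLs found in `html`."""
    collector = _LinkCollector()
    collector.feed(html)
    # Assumption: relative hrefs are resolved against the page URL.
    links = [urljoin(base_url, href) for href in collector.hrefs]
    if deduplicate:
        # dict.fromkeys keeps the first occurrence and preserves order.
        links = list(dict.fromkeys(links))
    return links
```

Calling extract_links(page_html, "https://example.com/") returns each distinct link once, in the order it first appears. Passing deduplicate=False keeps repeats.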
Stack
- httpx (HTTP client)
- BeautifulSoup4 + lxml (HTML parsing)
- html2text (HTML to markdown)
Run
./run.sh # uv sync && uv run src/server.py
Tests
uv run pytest tests/ --cov=src
MCP Config
Add to .roo/mcp.json:
"webscraper": {
  "command": "uv",
  "args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}
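The fragment above is a single server entry. As a sketch of where it might sit in the file, the snippet below builds a complete config in Python and prints it as JSON. The top-level "mcpServers" key is an assumption based on the usual MCP client config layout, so check it against your existing .roo/mcp.json. The helper names webscraper_entry and add_server are hypothetical, and the directory should be replaced with the location of your own checkout.

```python
import json


def webscraper_entry(directory: str) -> dict:
    """Build the server entry shown above for a given checkout directory."""
    return {
        "command": "uv",
        "args": ["run", "--directory", directory, "src/server.py"],
    }


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` registered under `name`.

    Assumption: servers live under a top-level "mcpServers" object.
    """
    servers = dict(config.get("mcpServers", {}))
    servers[name] = entry
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Path taken from the example entry; adjust it to your machine.
    config = add_server({}, "webscraper", webscraper_entry("/home/pplate/pi_mcps/webscraper"))
    print(json.dumps(config, indent=2))
```

Because add_server returns a copy, existing entries in the config are preserved and the input is not modified.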