chore: reorganize into polyglot monorepo (workshop)

- Move bigmind/ -> mcp/bigmind/
- Move webscraper/ -> mcp/webscraper/
- Move mss-failsafe/ -> java/mss-failsafe/
- Move Wellmann-Shop/ -> java/wellmann-shop/ (normalize to kebab-case)
- Add .roo/ IDE config files to tracking
- Add plans/REPO_STRATEGY.md (monorepo strategy document)
- Expand .gitignore: Java/Maven, Node/TS, coverage, uv.lock
- Rewrite README.md as navigation index
- Update .roo/mcp.json webscraper path to mcp/webscraper/
Patrick Plate
2026-04-04 08:51:15 +02:00
parent 4167e15ed9
commit 155d56e8e8
1598 changed files with 19429 additions and 23 deletions
@@ -0,0 +1,42 @@
# Webscraper MCP Server
MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.
## Tools
- `webscraper_fetch(url, max_chars=5000)` — Title + markdown body + metadata
- `webscraper_fetch_links(url, deduplicate=True)` — Extract all hrefs
- `webscraper_fetch_tables(url)` — HTML tables as markdown
- `webscraper_fetch_all(url, max_chars=5000)` — Everything in one call
- `webscraper_fetch_section(url, selector)` — Specific CSS section
- `webscraper_fetch_meta(url)` — Title, description, OG tags
- `webscraper_fetch_sitemap(url, max_urls=100)` — Sitemap URL list
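As a rough illustration of what the `deduplicate` flag on `webscraper_fetch_links` might do, here is a minimal sketch that works on an HTML string. It uses the standard-library `html.parser` instead of the server's BeautifulSoup4 stack so that it runs without dependencies. The names `LinkCollector` and `extract_links` are hypothetical, and resolving relative hrefs against a base URL is an assumption, not confirmed behaviour of the tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html, base_url, deduplicate=True):
    """Return absolute link URLs found in html, optionally de-duplicated."""
    parser = LinkCollector()
    parser.feed(html)
    # Assumption: relative hrefs are resolved against the page URL.
    links = [urljoin(base_url, href) for href in parser.hrefs]
    if deduplicate:
        # dict.fromkeys keeps the first occurrence and preserves order.
        links = list(dict.fromkeys(links))
    return links


sample = '<a href="/a">A</a> <a href="/b">B</a> <a href="/a">A again</a>'
print(extract_links(sample, "https://example.com"))
# prints ['https://example.com/a', 'https://example.com/b']
```

With `deduplicate=False` the same input yields three entries, since the repeated `/a` link is kept.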
## Stack
- httpx (HTTP client)
- BeautifulSoup4 + lxml (HTML parsing)
- html2text (HTML to markdown)
## Run
```bash
./run.sh # uv sync && uv run src/server.py
```
## Tests
```bash
uv run pytest tests/ --cov=src
```
## MCP Config
Add to `.roo/mcp.json`:
```json
"webscraper": {
  "command": "uv",
  "args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}
```
```
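The entry above could also be generated rather than typed by hand. The sketch below builds the same `command`/`args` structure for a given server directory and registers it in a config dictionary. It assumes the entry lives under a top-level `mcpServers` key, which is how such files are commonly laid out but is not stated here. The helper names and the `/path/to/webscraper` placeholder are hypothetical.

```python
import json


def webscraper_entry(server_dir):
    """Build the server entry shown above for a given checkout directory."""
    return {
        "command": "uv",
        "args": ["run", "--directory", server_dir, "src/server.py"],
    }


def add_server(config, name, entry):
    """Return a copy of config with entry registered under "mcpServers".

    Assumption: servers are grouped under a top-level "mcpServers" key.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = entry
    updated["mcpServers"] = servers
    return updated


# "/path/to/webscraper" is a placeholder; substitute the real directory.
config = add_server({}, "webscraper", webscraper_entry("/path/to/webscraper"))
print(json.dumps(config, indent=2))
```

Because `add_server` copies the dictionaries it touches, an existing config loaded with `json.load` is left unmodified and other registered servers are preserved in the returned copy.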