Files

Patrick Plate 155d56e8e8 chore: reorganize into polyglot monorepo (workshop)

- Move bigmind/ -> mcp/bigmind/
- Move webscraper/ -> mcp/webscraper/
- Move mss-failsafe/ -> java/mss-failsafe/
- Move Wellmann-Shop/ -> java/wellmann-shop/ (normalize to kebab-case)
- Add .roo/ IDE config files to tracking
- Add plans/REPO_STRATEGY.md (monorepo strategy document)
- Expand .gitignore: Java/Maven, Node/TS, coverage, uv.lock
- Rewrite README.md as navigation index
- Update .roo/mcp.json webscraper path to mcp/webscraper/

2026-04-04 08:51:15 +02:00

certs
src
tests
.coverage
ASSESSMENT.md
coverage.xml
pyproject.toml
README.md
run.sh
uv.lock

Last change for every entry: chore: reorganize into polyglot monorepo (workshop), 2026-04-04 08:51:15 +02:00

README.md

Webscraper MCP Server

MCP server for web scraping operations: fetch pages, extract links/tables, parse sitemaps.

Tools

webscraper_fetch(url, max_chars=5000) — Title + markdown body + metadata
webscraper_fetch_links(url, deduplicate=True) — Extract all hrefs
webscraper_fetch_tables(url) — HTML tables as markdown
webscraper_fetch_all(url, max_chars=5000) — Everything in one call
webscraper_fetch_section(url, selector) — Specific CSS section
webscraper_fetch_meta(url) — Title, description, OG tags
webscraper_fetch_sitemap(url, max_urls=100) — Sitemap URL list
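
To illustrate what the deduplicate flag on webscraper_fetch_links plausibly means, here is a minimal sketch of link extraction with order-preserving deduplication. It is not the server's code: it uses only the Python standard library instead of httpx and BeautifulSoup4, works on an HTML string rather than fetching a URL, and the names extract_links and _LinkCollector are hypothetical. Resolving relative hrefs against the page URL is also an assumption about the tool's behavior.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html: str, base_url: str, deduplicate: bool = True) -> list[str]:
    """Hypothetical helper: return absolute link URLs found in html."""
    collector = _LinkCollector()
    collector.feed(html)
    # Assumption: relative hrefs are resolved against the page URL.
    links = [urljoin(base_url, href) for href in collector.hrefs]
    if deduplicate:
        # dict.fromkeys keeps the first occurrence and preserves order.
        links = list(dict.fromkeys(links))
    return links
```

With deduplicate=True a link that appears several times on a page is reported once, at the position of its first occurrence; with deduplicate=False every occurrence is returned.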

Stack

httpx (HTTP client)
BeautifulSoup4 + lxml (HTML parsing)
html2text (HTML to markdown)

Run

./run.sh  # uv sync && uv run src/server.py

Tests

uv run pytest tests/ --cov=src

MCP Config

Add to .roo/mcp.json, adjusting the --directory path to your local checkout:

"webscraper": {
  "command": "uv",
  "args": ["run", "--directory", "/home/pplate/pi_mcps/webscraper", "src/server.py"]
}
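
The entry above is a fragment, so it has to be merged into an existing config object. The sketch below shows one way to do that from Python. It assumes that server entries live under a top-level "mcpServers" key, which the README does not state, and the function name add_webscraper_entry and its parameters are hypothetical.

```python
import json
from pathlib import Path


def add_webscraper_entry(config_path: Path, server_dir: str) -> dict:
    """Merge a "webscraper" entry into an MCP config file and return the result."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # Assumption: server entries live under a top-level "mcpServers" object.
    servers = config.setdefault("mcpServers", {})
    servers["webscraper"] = {
        "command": "uv",
        "args": ["run", "--directory", server_dir, "src/server.py"],
    }
    # Create the parent directory if needed, then write the merged config back.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Existing entries in the file are left untouched; only the "webscraper" key is added or overwritten.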