pi_mcps/.roo/rules/05-webscraper-research.md

Web Research Rules — Use webscraper_search_hint Proactively

Rule: Search Before Asking

Before asking Patrick for information about a library, framework, API, technology, or error — always try webscraper_search_hint first.

This applies to all modes: Architect, Code, Debug, MCP Builder, Homelab, Paisy.

Why

  • webscraper_search_hint uses Brave Search — no API key, no setup, always available
  • Brave returns real results without CAPTCHA or consent walls (Google and DuckDuckGo both block scraped requests)
  • Handles special characters correctly (C++, &, %, etc. — URL-encoded automatically)
  • The hint field gives immediately actionable title + URL + snippet without further calls
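
The automatic URL-encoding mentioned above can be illustrated with the standard library. This is a minimal sketch, not the tool's actual code: the base URL and the helper name build_search_url are assumptions made for illustration.

```python
from urllib.parse import quote_plus

# Assumed base URL; the real tool may construct its search_url differently.
BRAVE_SEARCH_BASE = "https://search.brave.com/search?q="


def build_search_url(query: str) -> str:
    """Percent-encode a query so characters like '+', '&' and '%' survive."""
    return BRAVE_SEARCH_BASE + quote_plus(query)


print(build_search_url("C++ & 100%"))
# prints https://search.brave.com/search?q=C%2B%2B+%26+100%25
```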

The Two-Step Pattern

Step 1: webscraper_search_hint("2-3 keyword query") → structured results + hint string
Step 2: webscraper_fetch(best_url, max_chars=8000)   → full page content

Never skip Step 1. It costs one tool call and often reveals the exact page to read.
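
The two steps can be sketched as a small Python function. The tools are MCP tools rather than Python functions, so search_hint and fetch here are hypothetical stand-in callables passed in by the caller; picking the first result is a naive placeholder for reading the hint and choosing the best URL.

```python
from typing import Callable, Optional


def research(
    query: str,
    search_hint: Callable[[str], dict],
    fetch: Callable[..., str],
    max_chars: int = 8000,
) -> Optional[str]:
    """Step 1: search for orientation. Step 2: fetch one chosen page."""
    response = search_hint(query)            # Step 1: structured results + hint
    results = response.get("results", [])
    if not results:
        return None                          # nothing to fetch
    best_url = results[0]["url"]             # naive choice, for illustration only
    return fetch(best_url, max_chars=max_chars)  # Step 2: full page content
```

Passing the tools in as callables keeps the sketch testable without a running MCP server.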

Step 1 Output

The tool returns:

  • hint — pipe-separated "Title (url): snippet[:120]" — read this first
  • results[] — array of {title, url, snippet} — pick the most relevant URL
  • search_url — the Brave search URL used (useful for debugging)
  • result_count — number of results returned
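
To make the hint format concrete, the sketch below rebuilds a hint string from a results array. It assumes the separator is " | " and that snippets are cut at 120 characters, as the format description suggests; build_hint is a hypothetical helper, not part of the tool.

```python
def build_hint(results: list[dict], snippet_chars: int = 120) -> str:
    """Join results as 'Title (url): snippet', with each snippet truncated."""
    return " | ".join(
        f"{r['title']} ({r['url']}): {r['snippet'][:snippet_chars]}"
        for r in results
    )


example_results = [
    {"title": "A", "url": "https://a.example", "snippet": "x" * 200},
    {"title": "B", "url": "https://b.example", "snippet": "short"},
]
print(build_hint(example_results))
```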

Step 2 Output

webscraper_fetch(url) returns full page as Markdown. Use max_chars to control size (default 5000; use 8000-12000 for deep doc reads).


Mode-Specific Guidance

🏗️ Architect Mode

  • Before designing any system or feature: search for existing patterns, reference architectures, and official docs
  • Example: planning a new MCP server → webscraper_search_hint("FastMCP server patterns 2025")
  • Example: choosing between two libraries → search both and read their official comparison pages

🪲 Debug Mode

  • Search the exact error message before forming hypotheses
  • Example: webscraper_search_hint("sqlite3 ProgrammingError Cannot operate closed database Python")
  • If the error is long, take the most distinctive phrase (2-5 words) as the query
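
One way to pull a distinctive phrase out of a long traceback is sketched below. It is a crude heuristic under stated assumptions: the last non-empty line is taken to be the "ExceptionType: message" line, and very short filler words are dropped. The helper name error_to_query is hypothetical.

```python
def error_to_query(traceback_text: str, max_words: int = 5) -> str:
    """Turn the final line of a traceback into a short search query."""
    lines = [ln.strip() for ln in traceback_text.splitlines() if ln.strip()]
    if not lines:
        return ""
    exc_type, _, message = lines[-1].partition(":")
    # Strip surrounding punctuation, then drop filler words such as "on" or "a".
    cleaned = (w.strip(".,;:'\"") for w in message.split())
    words = [w for w in cleaned if len(w) > 2]
    return " ".join([exc_type.replace(".", " ").strip(), *words[:max_words]])


sample = """Traceback (most recent call last):
  File "app.py", line 12, in <module>
    cur.execute("SELECT 1")
sqlite3.ProgrammingError: Cannot operate on a closed database.
"""
print(error_to_query(sample))
# prints sqlite3 ProgrammingError Cannot operate closed database
```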

💻 Code Mode

  • Before implementing a feature using an unfamiliar API: search for the official docs first
  • Example: webscraper_search_hint("httpx async client connection pool settings")

🔧 MCP Builder Mode

  • Check FastMCP changelog/docs before implementing new patterns
  • Example: webscraper_search_hint("FastMCP tool decorator async 2025")
  • Example: webscraper_search_hint("FastMCP context lifespan")

🏠 Homelab Mode

  • Look up Docker/TrueNAS configs, package versions, service docs before asking Patrick
  • Example: webscraper_search_hint("Gitea webhook payload format")

Query Crafting Tips

| Good queries | Bad queries |
| --- | --- |
| "httpx timeout settings" | "how do I configure httpx timeouts in Python async code" |
| "FastMCP tool decorator" | "mcp server python tool registration method" |
| "sqlite WAL mode enable" | "sqlite performance mode for concurrent reads" |
| "Brave Search API no key" | "search engine that works without api key or captcha" |

  • Use 2-4 keywords, not full sentences
  • Prefer library/framework name + specific feature
  • For errors: distinctive phrase from the message, not the full stack trace
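
The tips above can be approximated mechanically. The sketch below drops a small, hand-picked set of filler words and keeps the first few remaining keywords. The stopword list and the helper name to_keywords are assumptions for illustration, and the result still deserves a manual check.

```python
# Hand-picked filler words; extend as needed.
STOPWORDS = {
    "how", "do", "i", "in", "the", "a", "an", "to",
    "for", "that", "of", "or", "is", "with", "without",
}


def to_keywords(text: str, max_keywords: int = 4) -> str:
    """Reduce a full-sentence question to a short keyword query."""
    words = [w for w in text.split() if w.lower() not in STOPWORDS]
    return " ".join(words[:max_keywords])


print(to_keywords("how do I configure httpx timeouts in Python async code"))
# prints configure httpx timeouts Python
```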

Known Limitations

  • Reddit / Stack Overflow snippets — these platforms block snippet extraction; you may get empty snippets. The URL is still valid — fetch it directly if needed.
  • Brave CSS selector fragility — Brave uses Svelte-generated class names that change. If webscraper_search_hint returns 0 results unexpectedly, the scraper's CSS selectors may need updating. Last verified working: 2026-04-05.
  • Use sparingly — one search call per research task to orient; then fetch specific pages. Don't call it in a loop.
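
The first two limitations suggest some defensive handling of the search response, sketched below. It assumes the response is a dict shaped like the Step 1 output; candidate_urls is a hypothetical helper. Results with empty snippets are kept because their URLs can still be fetched, and zero results raise an error so the stale-selector case is not silently ignored.

```python
def candidate_urls(response: dict) -> list[str]:
    """Return fetchable URLs, keeping results whose snippet is empty."""
    results = response.get("results", [])
    if not results:
        raise LookupError(
            "0 results: the query may be too narrow, "
            "or the scraper's CSS selectors may be stale"
        )
    # An empty snippet (e.g. Reddit, Stack Overflow) does not invalidate the URL.
    return [r["url"] for r in results if r.get("url")]
```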

Anti-Patterns to Avoid

  • Asking Patrick "what's the FastMCP syntax for X?" before searching
  • Designing architecture without looking up existing solutions first
  • Forming a debug hypothesis without searching the error message
  • Writing code against an API from memory without verifying current docs
  • Calling webscraper_search_hint more than 2-3 times for the same topic (broaden or narrow the query instead of repeating it)
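
The last anti-pattern could be enforced with a simple per-topic call budget. This is a minimal sketch: the class name SearchBudget and the default limit of 3 are assumptions based on the "2-3 times" guidance, not an existing feature of the tool.

```python
from collections import Counter


class SearchBudget:
    """Track search calls per topic and refuse calls beyond a limit."""

    def __init__(self, max_calls_per_topic: int = 3):
        self.max_calls = max_calls_per_topic
        self.calls = Counter()

    def allow(self, topic: str) -> bool:
        """Record an attempt; return False once the topic's budget is spent."""
        if self.calls[topic] >= self.max_calls:
            return False
        self.calls[topic] += 1
        return True


budget = SearchBudget()
print([budget.allow("fastmcp") for _ in range(4)])
# prints [True, True, True, False]
```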