- sprint12-analysis.md (full page audit)
- sprint12-plan.md (button fix plan)
- sprint12-testplan.md (button fix test plan)
- sprint12-phase2-integration-tests.md (v3, expert-approved)
- sprint12-phase2-panel-review.md (3 review cycles, 95% confidence)
- sprint12-code-review.md (approved with comments, blockers fixed)
Expert Panel Review: Sprint 12 Phase 2 — Integration Tests with Seed DB
Date: 2026-06-18
Artifact: docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md (v1)
Panel: Domain Expert 🏛️ | Architecture Expert 🔧 | Risk & Reliability Expert 🛡️
Ticket: CANNAMANAGE-SPRINT12
🏛️ Expert 1: Domain Expert (Cannabis Club Regulatory Compliance / KCanG)
Assessment: ⚠️ Mostly Sound — 2 Gaps
Strengths
- Under-21 quota testing is present — Test case 3.3 #4 explicitly tests the 30g/month limit for under-21 members (Jonas Weber, `is_under_21=true`). This is critical under KCanG §3 Abs. 2.
- Destruction records in seed data — Plan includes compliance audit trail (V23 `destruction_records`), which is mandatory for KCanG §16 documentation requirements.
- Multi-role seed accounts — Admin, Staff, Member accounts cover the role hierarchy that regulatory audits require (who did what, with what authority).
- Deterministic UUIDs — Critical for regulatory audit trail assertions: you can verify exactly which member received which distribution.
Findings
| # | Severity | Finding | Recommendation |
|---|---|---|---|
| D-1 | ⚠️ Warning | Adult quota limit not tested. Plan tests under-21 limit (30g/month) but does NOT test the adult limit (50g/month, max 25g/day). KCanG §3 Abs. 1 requires both. | Add test case: Attempt >25g single distribution for adult member → expect rejection |
| D-2 | ⚠️ Warning | No THC% limit test for under-21. KCanG §3 Abs. 2 Nr. 3 limits THC to 10% for under-21 members. Jonas Weber (under-21) is distributed "CBD Critical Mass" (5% THC) — but there's no test that tries to distribute a high-THC strain to an under-21 member. | Add negative test: Distribute "Amnesia Haze" (22% THC) to Jonas → expect rejection with specific error |
| D-3 | ℹ️ Info | Compliance deadlines seed uses fixed statuses. The plan mentions PENDING, OVERDUE, COMPLETED but uses fixed dates. If tests rely on OVERDUE status, date drift will eventually make assertions wrong. | Use NOW() - INTERVAL for overdue deadlines, or recalculate status at assertion time |
| D-4 | ℹ️ Info | Distribution recorded_by references member UUID, not admin UUID. In practice, distributions should be recorded by staff/admin, not self-service. The seed shows member c1...001 as recorded_by which is technically allowed but non-standard for audits. | Consider using admin UUID b1000000-...001 as recorded_by for realism |
| D-5 | ✅ Good | Document retention testing. Documents seed covers all categories (SATZUNG, PROTOKOLL, VERTRAG, SONSTIGES) — important for KCanG §19 documentation duties. | |
| D-6 | ✅ Good | Grow tracking covers lifecycle. SEEDLING → VEGETATIVE → FLOWERING stages match KCanG §2 cultivation documentation requirements. | |
Domain Verdict: ⚠️ ACCEPTABLE with D-1 and D-2 as recommended additions
The plan covers the core regulatory-critical paths (quota enforcement, audit trail, compliance deadlines, document management). The two missing negative tests (D-1 daily limit, D-2 THC% limit for under-21) are important regulatory edge cases but not plan-blocking — they can be added during implementation.
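The two recommended negative tests (D-1, D-2) reduce to a small decision rule. The following is a minimal sketch of that rule, assuming the limits exactly as quoted in this review (25g/day, 10% THC cap for under-21 members). The names `checkDistribution`, `Member`, `Strain` and the reason codes are hypothetical; the real enforcement lives in the backend and may be structured differently. Applying the daily cap to every member is also an assumption.

```typescript
// Hypothetical types and names, for illustration only.
// Limits are the ones quoted in this review, not read from the codebase.
interface Member {
  name: string;
  isUnder21: boolean;
  distributedTodayG: number; // grams already handed out today
}

interface Strain {
  name: string;
  thcPercent: number;
}

const DAILY_LIMIT_G = 25; // per-day cap cited under D-1
const UNDER21_MAX_THC_PERCENT = 10; // THC cap cited under D-2

type RejectionReason = "DAILY_LIMIT_EXCEEDED" | "THC_LIMIT_UNDER_21";
type CheckResult = { ok: true } | { ok: false; reason: RejectionReason };

function checkDistribution(member: Member, strain: Strain, grams: number): CheckResult {
  // D-2: an under-21 member must not receive a strain above the THC cap.
  if (member.isUnder21 && strain.thcPercent > UNDER21_MAX_THC_PERCENT) {
    return { ok: false, reason: "THC_LIMIT_UNDER_21" };
  }
  // D-1: today's running total plus this request must not exceed the daily cap.
  // Assumption: the daily cap is applied to every member, not only adults.
  if (member.distributedTodayG + grams > DAILY_LIMIT_G) {
    return { ok: false, reason: "DAILY_LIMIT_EXCEEDED" };
  }
  return { ok: true };
}
```

With this rule, a 26g request for an adult with nothing distributed today is rejected, and "Amnesia Haze" (22% THC) for Jonas Weber is rejected regardless of the amount. An integration test would assert the same outcomes through the UI or API instead of calling a function directly.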
🔧 Expert 2: Architecture Expert (Next.js + Spring Boot + Playwright + Docker)
Assessment: ⚠️ Concerns — 3 Issues (1 blocking)
Strengths
- Flyway repeatable migration for seed (Option A) — Cleanest solution. `R__seed_test_data.sql` with profile-gated location is idiomatic Spring Boot + Flyway. Correct choice.
- API client abstraction — Separating UI assertions from API-level verification is architecturally sound. Enables both black-box and white-box testing.
- Suite-level reset with ordered execution — Given `fullyParallel: false` is already in the config, this is the pragmatic choice over per-test isolation.
- Existing Playwright project structure — Plan correctly identifies adding an `integration` project with `dependencies: ["setup"]` — aligns with the existing 3-project pattern.
Findings
| # | Severity | Finding | Recommendation |
|---|---|---|---|
| A-1 | ❌ Blocking | Seed timing contradiction. Section 5.3 correctly identifies that init.sql as /docker-entrypoint-initdb.d/99-seed.sql runs at Postgres init — BEFORE Flyway creates tables (Flyway runs on backend startup). The recommended Option A (Flyway R__seed_test_data.sql) solves this, but Section 5.1 still shows ./scripts/seed/init.sql:/docker-entrypoint-initdb.d/99-seed.sql:ro in the docker-compose override. These are contradictory — you can't do both. If you use Option A, the volume mount becomes dead code. If you keep the volume mount, Option A is unnecessary. | Remove the 99-seed.sql volume mount from docker-compose.test.yml Section 5.1. The seed must come from Flyway R__ migration only. The scripts/seed/ files become the source-of-truth for copy into src/main/resources/db/migration/test/. |
| A-2 | ⚠️ Warning | tmpfs for Postgres may cause issues with Docker Desktop on macOS. Docker Desktop's Linux VM doesn't handle tmpfs the same as native Linux. On some versions, Postgres fails to start with tmpfs due to permission issues or the VM not forwarding tmpfs syscalls correctly. CI (GitHub Actions on ubuntu-latest) is fine, but local development on macOS may fail. | Add a conditional: use tmpfs only when CI=true, otherwise use regular volume. Or document this as "CI-only optimization" and keep the named volume for local test runs. |
| A-3 | ⚠️ Warning | The seed container is still referenced in docker-compose.test.yml but plan says "Remove seed container." Section 5.2 states "Remove seed container" as an improvement, but the existing file still has it. The plan must be explicit: in the new docker-compose.test.yml, the seed service is replaced by the Flyway R__ migration + the global-setup DB readiness check. | Explicitly state: delete the seed service from docker-compose.test.yml. Replace the depends_on: seed: condition: service_completed_successfully on the playwright service with depends_on: backend: condition: service_healthy. |
| A-4 | ⚠️ Warning | Playwright runs inside Docker container but mounts host ./cannamanage-frontend. The volume mount ./cannamanage-frontend:/app means the container uses host node_modules (if present) or needs to install them. Since Playwright image doesn't include project deps, the test command (npx playwright test) will fail unless deps are installed first. Current system-test.spec.ts works because it's a single file with minimal deps, but 12 integration spec files with helpers will need the full pnpm install step. | Add pnpm install --frozen-lockfile before the test command in the playwright service, or use a multi-stage Dockerfile for the playwright service that pre-installs deps. |
| A-5 | ℹ️ Info | No explicit base URL override for API client. The ApiClient connects to the backend directly (baseUrl). Inside Docker, this should be http://backend:8080, not http://localhost:8080. The plan shows BASE_URL: http://frontend:3000 in the environment but doesn't define a BACKEND_URL for direct API calls. | Add BACKEND_URL: http://backend:8080 to the playwright service environment. The API client should read from process.env.BACKEND_URL ?? 'http://localhost:8080'. |
| A-6 | ✅ Good | Authentication reuse via storageState — Correct pattern, avoids per-test login overhead. | |
| A-7 | ✅ Good | Profile-gated test endpoint (POST /api/v1/test/reset-db only on test profile) — Proper security boundary. | |
Architecture Verdict: ⚠️ REVISE — A-1 is contradictory and needs resolution
The seed timing contradiction (A-1) is a plan consistency error that will cause confusion during implementation. A-2 through A-5 are addressable during implementation without plan revision, but A-1 needs explicit resolution in the plan document.
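The A-5 recommendation can be sketched as a small resolver. This is an illustrative sketch, not the plan's actual ApiClient: `resolveBackendUrl` and `apiUrl` are hypothetical helper names, and the environment is passed in as a plain object (in the real client it would be `process.env`) so the fallback behaviour is easy to verify.

```typescript
// Hypothetical helpers; the environment is injected so the sketch has no Node-specific dependency.
type Env = Record<string, string | undefined>;

const DEFAULT_BACKEND_URL = "http://localhost:8080";

// Inside Docker, BACKEND_URL would be set to http://backend:8080;
// on a developer machine it is unset and the localhost default applies.
function resolveBackendUrl(env: Env): string {
  const raw = env.BACKEND_URL ?? DEFAULT_BACKEND_URL;
  return raw.replace(/\/+$/, ""); // drop trailing slashes so paths join cleanly
}

// Joins the resolved base URL with an API path, tolerating a leading slash.
function apiUrl(env: Env, path: string): string {
  return `${resolveBackendUrl(env)}/${path.replace(/^\/+/, "")}`;
}
```

Called with `{ BACKEND_URL: "http://backend:8080" }` and the path `/api/v1/test/reset-db`, `apiUrl` yields `http://backend:8080/api/v1/test/reset-db`; with an empty environment it falls back to `http://localhost:8080`.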
🛡️ Expert 3: Risk & Reliability Expert (Test Reliability, CI Flakiness, Coverage)
Assessment: ⚠️ Acceptable — manageable risks
Strengths
- "Never use `waitForTimeout`" — Explicitly stated in Section 4.3. This is the #1 rule for non-flaky Playwright tests.
- Trace collection on first retry — Correct debugging strategy for CI.
- Liberal timeouts for Docker networking — 90s test / 60s navigation accounts for cold-start Docker overhead.
- DB verification via API — Tests don't solely rely on UI state, which is inherently more fragile.
- Test ordering strategy — Read-only tests first, CRUD tests last. Reduces state pollution.
Findings
| # | Severity | Finding | Recommendation |
|---|---|---|---|
| R-1 | ⚠️ Warning | 60+ tests in <5 minutes is ambitious. Each test navigates, waits for API response, performs assertions. With Docker networking latency, expect 3-8 seconds per test. 60 tests × 5s average = 5 minutes. CRUD tests that submit forms and wait for toasts will be slower. Real expectation: 6-8 minutes. | Adjust success criterion to "< 8 minutes" or reduce test count per spec. Alternatively, enable parallel test execution for read-only specs (they don't mutate state). |
| R-2 | ⚠️ Warning | Suite-level reset means CRUD test failures corrupt state for subsequent tests. If test 3.2#3 (create member) fails mid-way (e.g., form submitted but toast timeout), the DB now has a partial member. All subsequent tests that count rows will fail with confusing "expected 5, got 6" errors. | Add a "CRUD section reset" mechanism: before each CRUD-heavy describe block, call apiClient.resetDb(). Or structure specs so each CRUD test verifies against its own created data, not against absolute counts. |
| R-3 | ⚠️ Warning | No mention of data-testid strategy. Section 9 lists this as an "open question" but it's actually critical for test reliability. CSS selectors and text-based selectors ("5 Mitglieder") are brittle — a translation change, number format change, or design refactor breaks tests. | Decide NOW: use data-testid attributes on all interactive elements and key display elements. This is not optional for a 60+ test suite — it's a prerequisite. |
| R-4 | ⚠️ Warning | Hardcoded expected values in tests. Many assertions reference specific values: "5 Mitglieder", "Northern Lights 18.5% THC", "30€". If the seed data changes (even a typo fix), these tests break. | Create a seed-constants.ts file that exports expected values derived from a single source of truth. Tests import from this file. When seed changes, update one file. |
| R-5 | ℹ️ Info | No retry strategy for the DB reset endpoint. If the POST /api/v1/test/reset-db call fails or times out (backend GC pause, connection pool exhaustion), the entire suite fails. | Add retry logic (3 attempts, 2s backoff) in the global-setup for the DB reset call. |
| R-6 | ℹ️ Info | Monthly quota seed uses fixed year/month (2024/12). Tests checking quota display may show "no quota for current month" because the seed references December 2024, not the current month. | Use dynamic month in seed OR test assertions should navigate to the historical period, not assume "current month" view shows seed data. |
| R-7 | ℹ️ Info | No parallel execution plan for read-only tests. The plan states fullyParallel: false globally, but read-only tests (all #1 and #2 cases per spec) could safely run in parallel since they don't mutate state. This would cut execution time by 30-40%. | Consider splitting into two Playwright projects: integration-read (parallel) and integration-write (serial). |
| R-8 | ✅ Good | Artifact collection strategy is comprehensive — screenshots, traces, backend logs, HTML report. This is sufficient for debugging CI failures. | |
| R-9 | ✅ Good | CI timeout of 10 minutes — Appropriate safety margin above the expected 5-8 minute runtime. | |
Reliability Verdict: ⚠️ ACCEPTABLE — R-2 and R-3 are the main risks
The plan will produce working tests, but without data-testid (R-3) and with suite-level-only reset (R-2), expect a 15-25% maintenance burden from flaky/cascading failures within the first 3 months. These are addressable during implementation without plan revision.
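The retry recommended in R-5 (3 attempts, 2s backoff) could look roughly like the sketch below. `withRetry` and `RetryOptions` are hypothetical names, the backoff is a fixed delay rather than exponential, and the `sleep` function is injectable purely so the helper can be exercised without real waiting.

```typescript
// Hypothetical helper, not part of the plan; a fixed-delay retry wrapper.
interface RetryOptions {
  attempts: number; // total tries including the first; expected to be >= 1
  backoffMs: number; // fixed pause between tries
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt < options.attempts) {
        await sleep(options.backoffMs);
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

In global-setup this would wrap the reset call, for example `withRetry(() => apiClient.resetDb(), { attempts: 3, backoffMs: 2000 })`, so a single GC pause or pool hiccup does not fail the whole suite.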
Panel Synthesis
Confidence Scores
| Expert | Confidence | Reasoning |
|---|---|---|
| 🏛️ Domain | 82% | Core regulatory paths covered; 2 missing edge cases (daily limit, THC% limit for under-21) are non-blocking |
| 🔧 Architecture | 65% | Seed timing contradiction (A-1) is a consistency error that will cause implementation confusion |
| 🛡️ Reliability | 75% | Fundamentally sound approach but data-testid decision and reset granularity need resolution |
Overall Panel Confidence: 74%
Combined Findings by Severity
❌ Blocking (1)
| ID | Expert | Finding |
|---|---|---|
| A-1 | 🔧 Architecture | Seed timing contradiction: docker-compose.test.yml still mounts init.sql to docker-entrypoint-initdb.d while recommending Flyway R__ migration. These are mutually exclusive approaches — plan must pick one and remove the other. |
⚠️ Warnings (9)
| ID | Expert | Finding |
|---|---|---|
| D-1 | 🏛️ Domain | No adult daily quota limit (25g/day) test |
| D-2 | 🏛️ Domain | No THC% limit test for under-21 members |
| A-2 | 🔧 Architecture | tmpfs may fail on Docker Desktop macOS |
| A-3 | 🔧 Architecture | seed container removal not explicitly reflected in compose |
| A-4 | 🔧 Architecture | Playwright container needs pnpm install before tests |
| A-5 | 🔧 Architecture | No BACKEND_URL env for API client inside Docker |
| R-1 | 🛡️ Reliability | 5-minute target unrealistic for 60+ tests with Docker overhead |
| R-2 | 🛡️ Reliability | Suite-level reset causes cascading failures on CRUD test errors |
| R-3 | 🛡️ Reliability | data-testid strategy must be decided before implementation |
ℹ️ Info (6)
| ID | Expert | Finding |
|---|---|---|
| D-3 | 🏛️ Domain | Fixed-date compliance deadlines may drift |
| D-4 | 🏛️ Domain | recorded_by should reference admin, not member |
| R-4 | 🛡️ Reliability | Hardcoded expected values — use seed-constants.ts |
| R-5 | 🛡️ Reliability | No retry on DB reset endpoint |
| R-6 | 🛡️ Reliability | Monthly quota seed uses fixed 2024/12, not current month |
| R-7 | 🛡️ Reliability | Read-only tests could run in parallel for speed |
Panel Verdict
🔄 REVISE — 1 blocking finding must be resolved in the plan
Required before implementation:
1. Resolve A-1: Remove the `docker-entrypoint-initdb.d` volume mount from the proposed docker-compose.test.yml changes. Make it unambiguous that seed data flows through `R__seed_test_data.sql` via Flyway only.
Strongly recommended (can be addressed during implementation):
2. Resolve R-3: Commit to data-testid attributes as the selector strategy. Add a section "Selector Strategy" to the plan.
3. Add D-1 and D-2 test cases to the Distributions spec (regulatory completeness).
4. Add pnpm install to the playwright service command (A-4).
Nice-to-have (implementation decisions):
5. Adjust execution time target from 5 min to 8 min (R-1).
6. Add BACKEND_URL environment variable for API client (A-5).
7. Create seed-constants.ts for maintainable assertions (R-4).
Panel review completed 2026-06-18 by Plan Reviewer mode.
v2 Re-Review (2026-06-18)
Reviewer: Roo (Plan Reviewer)
Document reviewed: docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md (v2)
Purpose: Verify all v1 findings have been properly addressed
❌ Blocker Resolution
| ID | v1 Finding | v2 Status | Evidence |
|---|---|---|---|
| A-1 | Seed timing contradiction: docker-compose mounts init.sql to docker-entrypoint-initdb.d while recommending Flyway R__ migration | ✅ Resolved | Section 1.3 explicitly declares "Decision (v2): Flyway-only seeding — NO Docker docker-entrypoint-initdb.d mount." Section 5.1 docker-compose.test.yml has NO volume mount on the db service, with comment confirming intent. The contradiction is eliminated. |
⚠️ Warning Resolution
| ID | Expert | v1 Finding | v2 Status | Evidence |
|---|---|---|---|---|
| D-1 | 🏛️ | No adult daily quota limit (25g/day) test | ✅ Resolved | New 13-kcang-regulatory.spec.ts — test #1 (26g → rejection), test #6 (23g+3g exceeds daily), test #7 (23g+2g = exactly 25g → success). Thorough coverage of boundary. |
| D-2 | 🏛️ | No THC% limit test for under-21 members | ✅ Resolved | KCanG spec tests #3 (22% THC to U21 → rejection), #4 (0.5% THC to U21 → success), #5 (30g+ low-THC to U21 → quota error), #9 (UI shows "max. 10% THC" notice). Excellent coverage. |
| A-2 | 🔧 | tmpfs may fail on Docker Desktop macOS; network aliases unclear | ⚠️ Partially | Section 5.4 clearly documents Docker service-name networking (no custom aliases needed). However, tmpfs is still unconditional — no CI-only gating. Risk is LOW: CI runs on ubuntu-latest (native Linux), and local devs can override with docker compose -f docker-compose.yml -f docker-compose.test.yml up. Acceptable for implementation. |
| A-3 | 🔧 | seed container removal not explicitly reflected in compose | ✅ Resolved | Section 5.1 docker-compose.test.yml defines exactly 4 services: db, backend, frontend, playwright. No seed service exists. playwright depends on backend: condition: service_healthy. Clean. |
| A-4 | 🔧 | Playwright container needs pnpm install before tests | ✅ Resolved | Section 5.2 introduces Dockerfile.playwright with pnpm install --frozen-lockfile at build time. Dependencies are pre-installed in the image. |
| A-5 | 🔧 | No BACKEND_URL env for API client inside Docker | ✅ Resolved | Section 5.1 playwright service has API_URL: http://backend:8080. Section 2.7 global-setup uses this for health checks and API client initialization. |
| R-1 | 🛡️ | 5-minute target unrealistic for 60+ tests with Docker overhead | ⚠️ Partially | Plan now has per-test reset (~500ms × 70 = 35s overhead), making 5 min MORE ambitious. CI timeout is 10 min (appropriate). Test tagging splits @smoke (PR, <2 min) from @full (merge, 10 min). Section 7 success criteria still says "< 5 minutes total" — this is optimistic but the CI timeout and tagging strategy make it non-blocking. |
| R-2 | 🛡️ | Suite-level reset causes cascading failures on CRUD test errors | ✅ Resolved | Section 2.1 switches to per-test reset: "Decision (v2): Per-test reset via backend API endpoint + beforeEach hook." Each test calls apiClient.resetDb() — complete state isolation between tests. |
| R-3 | 🛡️ | data-testid strategy must be decided before implementation | ✅ Resolved | Section 2.2 commits to data-testid as mandatory. Naming convention defined (<page>-<component>-<identifier>), selectors.ts centralized file shown, implementation tracked as Phase 2C sub-task. |
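The `<page>-<component>-<identifier>` convention noted for R-3 can be enforced with a tiny builder. The sketch below is an assumption about how a centralized selectors.ts might be organized: `testId`, `byTestId`, and the example entries under `selectors` are hypothetical and do not reproduce the plan's actual file.

```typescript
// Hypothetical builder for ids that follow <page>-<component>-<identifier>.
function testId(page: string, component: string, identifier: string): string {
  const parts = [page, component, identifier].map((segment) =>
    segment
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into a dash
      .replace(/^-+|-+$/g, "") // no leading or trailing dashes
  );
  if (parts.some((segment) => segment.length === 0)) {
    throw new Error("testId: every segment must be non-empty");
  }
  return parts.join("-");
}

// CSS attribute selector form, usable wherever a plain selector string is accepted.
function byTestId(id: string): string {
  return `[data-testid="${id}"]`;
}

// Example entries only; the real selectors.ts will have its own structure.
const selectors = {
  members: {
    count: testId("members", "summary", "count"),
    submit: testId("members", "form", "submit"),
  },
} as const;
```

Specs would then reference `selectors.members.count` instead of text such as "5 Mitglieder". Playwright's `page.getByTestId(id)` accepts the bare id, so `byTestId` is only needed where a raw selector string is required.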
ℹ️ Info Finding Resolution
| ID | Expert | v1 Finding | v2 Status | Evidence |
|---|---|---|---|---|
| D-3 | 🏛️ | Fixed-date compliance deadlines may drift | ✅ Resolved | Section 1.4 principle #3: "Use relative dates where possible (NOW() - INTERVAL '7 days') for time-sensitive tests". Per-test reset re-runs seed each time, keeping relative dates fresh. |
| D-4 | 🏛️ | recorded_by should reference admin, not member | ℹ️ Noted | Not explicitly addressed in v2 — minor realism improvement, can be done during seed implementation. |
| R-4 | 🛡️ | Hardcoded expected values — use seed-constants.ts | ⚠️ Partially | No explicit seed-constants.ts file created, but deterministic UUIDs (Section 1.4 #1) + per-test reset ensure assertions are stable. The selectors.ts centralizes locators but not data values. Acceptable — can be added during implementation when duplication becomes apparent. |
| R-5 | 🛡️ | No retry on DB reset endpoint | ℹ️ Noted | No explicit retry logic shown for resetDb(). Implementation detail — the global-setup health check ensures backend is ready before any reset calls. Low risk. |
| R-6 | 🛡️ | Monthly quota seed uses fixed 2024/12, not current month | ✅ Resolved | Relative dates principle + Section 4.2 test tagging strategy (@smoke/@full with --grep). Seed data for quotas will use relative dates per principle #3. |
| R-7 | 🛡️ | Read-only tests could run in parallel for speed | ℹ️ Noted | Not adopted in v2. Per-test reset makes parallelism less critical (each test is independent). Can be optimized later if execution time becomes a concern. |
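The date-drift concern behind D-3 and R-6 is easier to see with the status rule written out. This is a minimal sketch assuming a status is derived from the due date at evaluation time; `deadlineStatus` and `daysFrom` are hypothetical names, and the three statuses are the ones listed in the v1 finding.

```typescript
// Hypothetical helpers illustrating why relative dates keep assertions stable.
type DeadlineStatus = "PENDING" | "OVERDUE" | "COMPLETED";

const DAY_MS = 24 * 60 * 60 * 1000;

// TypeScript analogue of NOW() + INTERVAL 'n days' (negative n for the past).
function daysFrom(now: Date, days: number): Date {
  return new Date(now.getTime() + days * DAY_MS);
}

function deadlineStatus(dueDate: Date, completed: boolean, now: Date): DeadlineStatus {
  if (completed) return "COMPLETED";
  return dueDate.getTime() < now.getTime() ? "OVERDUE" : "PENDING";
}
```

A deadline seeded as "7 days ago" is OVERDUE on whatever day the suite runs, whereas a deadline seeded with a fixed calendar date that is still in the future today reads PENDING now and silently flips to OVERDUE once that date passes.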
New Findings in v2
| ID | Severity | Finding | Impact |
|---|---|---|---|
| v2-1 | ℹ️ Info | Volume mount + Dockerfile build overlap. Section 5.1 mounts ./cannamanage-frontend/e2e:/app/e2e:ro into the playwright container which also COPYs e2e/ in Dockerfile.playwright. The volume mount overrides the built-in files at runtime. | This is an intentional Docker pattern (enables test iteration without rebuild) but should be documented to avoid confusion. Non-blocking. |
| v2-2 | ℹ️ Info | Success criterion "< 5 minutes total" vs per-test reset overhead. 70+ tests × ~500ms reset + test execution (3-8s each) = realistic estimate is 6-8 minutes for @full suite. CI timeout of 10 min is correct, but the stated target is optimistic. | Cosmetic — CI timeout is appropriate. Adjust success criterion to "< 8 minutes" post-implementation based on actual measurements. |
| v2-3 | ℹ️ Info | Dockerfile.playwright pinned to v1.49.0 — this should match the Playwright version in package.json to avoid browser/API version mismatches. | Document: keep Dockerfile image version in sync with @playwright/test version in package.json. |
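The arithmetic behind v2-2 can be made explicit. The sketch below uses only the figures quoted in this review (70 tests, ~500ms reset, 3-8s per test, 5s as the average from R-1) and assumes strictly serial execution; `estimateMinutes` is a hypothetical helper, and real timings would need to be measured.

```typescript
// Back-of-the-envelope runtime model: serial execution, one DB reset per test.
interface SuiteEstimate {
  tests: number;
  resetMs: number; // per-test reset overhead
  perTestSeconds: number; // navigation + assertions, excluding reset
}

function estimateMinutes({ tests, resetMs, perTestSeconds }: SuiteEstimate): number {
  const totalSeconds = tests * (resetMs / 1000 + perTestSeconds);
  return totalSeconds / 60;
}

const fast = estimateMinutes({ tests: 70, resetMs: 500, perTestSeconds: 3 });
const average = estimateMinutes({ tests: 70, resetMs: 500, perTestSeconds: 5 });
const slow = estimateMinutes({ tests: 70, resetMs: 500, perTestSeconds: 8 });

console.log(fast.toFixed(1), average.toFixed(1), slow.toFixed(1)); // prints "4.1 6.4 9.9"
```

The reset alone contributes 70 × 0.5s = 35s. At the 5s average the suite lands at roughly 6.4 minutes, which sits inside the 6-8 minute estimate, while the all-slow case of about 9.9 minutes only just fits under the 10-minute CI timeout.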
Updated Confidence Scores
| Expert | v1 Confidence | v2 Confidence | Change | Reasoning |
|---|---|---|---|---|
| 🏛️ Domain | 82% | 95% | +13 | D-1 and D-2 fully resolved with 9 comprehensive KCanG test cases. Regulatory coverage is now excellent. |
| 🔧 Architecture | 65% | 92% | +27 | A-1 blocker completely eliminated. Dockerfile.playwright, API_URL, health checks — all addressed. Only tmpfs (A-2) remains partially open but low-risk. |
| 🛡️ Reliability | 75% | 90% | +15 | Per-test reset (R-2) and data-testid commitment (R-3) are the two highest-impact improvements. Execution time target (R-1) is slightly optimistic but non-blocking. |
Overall Panel Confidence: 92% (v1: 74%, Δ: +18)
Panel Verdict (v2)
✅ APPROVED — Plan is ready for implementation
All blocking findings resolved. All critical warnings addressed. Remaining items (A-2 tmpfs conditional, R-1 time target, R-4 seed-constants.ts) are implementation-time decisions that don't require plan revision.
Key improvements in v2:
- Seed timing contradiction eliminated — Flyway-only, no Docker init.sql
- Per-test DB reset — complete test isolation via `beforeEach` + API endpoint
- `data-testid` strategy committed — naming convention, selectors.ts, mandatory for Phase 2C
- KCanG regulatory spec added — 9 comprehensive edge case tests covering §3 Abs. 1 + Abs. 2
- Dedicated Playwright Dockerfile with pre-installed dependencies
- Clear Docker networking documentation and health check strategy
Recommendation: Proceed to implementation. The 3 ℹ️ info items (v2-1, v2-2, v2-3) can be addressed during Phase 2C/2D without plan revision.
v2 Re-review completed 2026-06-18 by Plan Reviewer mode.
v3 Final Review (2026-06-18)
Reviewer: Roo (Plan Reviewer)
Document reviewed: docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md (v3 — final revision)
Purpose: Final gate review before implementation begins. All v2 partial/info items should now be fully resolved.
🏛️ Expert 1: Domain Expert (KCanG Regulatory Compliance)
Assessment: ✅ Excellent — No remaining gaps
| # | Check | Verdict | Evidence |
|---|---|---|---|
| D-1 | Adult daily 25g limit tested | ✅ | KCanG spec tests #1, #6, #7 — boundary cases (26g reject, 23+3 reject, 23+2 pass) |
| D-2 | Under-21 THC% limit tested | ✅ | KCanG spec tests #3, #4, #5, #9 — Amnesia Haze 22% → reject, CBD Critical Mass 0.5% → pass, UI notice |
| D-3 | Compliance deadlines use relative dates | ✅ | Section 1.4 principle #3: NOW() - INTERVAL for time-sensitive tests, per-test reset keeps fresh |
| D-4 | recorded_by references admin UUID | ✅ | Section 1.2: explicitly states "v3: recorded_by = admin UUID b1000000-...001, not member UUID" |
| D-5 | Monthly quota limits differentiated (50g adult vs 30g U21) | ✅ | seed-constants.ts exports KCANG.ADULT_MONTHLY_LIMIT_G: 50 and UNDER21_MONTHLY_LIMIT_G: 30 |
| D-6 | Seed covers all regulatory-critical entity types | ✅ | Destruction records, compliance deadlines, distribution audit trail, grow lifecycle all seeded |
New observation (v3):
| # | Severity | Finding | Impact |
|---|---|---|---|
| D-7 | ✅ Good | seed-constants.ts exports KCanG limits as constants. Tests import KCANG.ADULT_DAILY_LIMIT_G etc. — if regulations change (unlikely short-term, but possible with KCanG amendments), there is a single point of update. | Excellent maintainability for regulatory compliance. |
| D-8 | ℹ️ Info | No "combined monthly" test (adult at 50g boundary). Tests cover 25g daily and under-21 30g monthly, but no test explicitly hits the adult 50g monthly ceiling. KCanG spec #2 tests 45g+6g=51g exceeding monthly, which implicitly covers it. | Covered via test #2 — no action needed. |
Domain Verdict: ✅ APPROVED — confidence 97%
All KCanG regulatory paths are comprehensively tested. The seed-constants.ts file makes regulatory limit changes trivial to propagate. The recorded_by fix ensures audit trail realism. No remaining domain gaps.
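To illustrate how the exported limits keep assertions in one place, here is a sketch of what a small excerpt of seed-constants.ts might look like. Only the three `KCANG` keys and their values are taken from this review; the helper functions `monthlyLimitG` and `exceedsMonthlyLimit` are hypothetical additions for the example.

```typescript
// Possible excerpt only; the real seed-constants.ts is described as having 80+ exports.
const KCANG = {
  ADULT_DAILY_LIMIT_G: 25,
  ADULT_MONTHLY_LIMIT_G: 50,
  UNDER21_MONTHLY_LIMIT_G: 30,
} as const;

// Hypothetical helper: picks the monthly ceiling for a member.
function monthlyLimitG(isUnder21: boolean): number {
  return isUnder21 ? KCANG.UNDER21_MONTHLY_LIMIT_G : KCANG.ADULT_MONTHLY_LIMIT_G;
}

// Hypothetical helper: true when the request would push the member past the ceiling.
function exceedsMonthlyLimit(isUnder21: boolean, usedThisMonthG: number, requestedG: number): boolean {
  return usedThisMonthG + requestedG > monthlyLimitG(isUnder21);
}
```

The 45g + 6g case from KCanG spec #2 evaluates to `exceedsMonthlyLimit(false, 45, 6) === true`, while 45g + 5g lands exactly on the 50g ceiling and passes. If a limit ever changes, only the constant needs editing.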
🔧 Expert 2: Architecture Expert (Next.js + Spring Boot + Playwright + Docker)
Assessment: ✅ Sound — All concerns resolved
| # | v2 Finding | v3 Resolution | Verdict |
|---|---|---|---|
| A-1 | Seed timing contradiction | ✅ Eliminated in v2 (Flyway-only, confirmed in v3) | ✅ |
| A-2 | tmpfs macOS issues | ✅ docker-compose.test.local.yml override with named volume (Section 5.5) | ✅ |
| A-3 | Seed container removal | ✅ Only 4 services in compose: db, backend, frontend, playwright | ✅ |
| A-4 | Playwright needs pnpm install | ✅ Dockerfile.playwright with pnpm install --frozen-lockfile at build time | ✅ |
| A-5 | No BACKEND_URL for API client | ✅ API_URL: http://backend:8080 in playwright service env | ✅ |
| v2-1 | Volume + Dockerfile overlap confusion | ✅ Explicit comment in Section 5.1 explaining the intentional pattern | ✅ |
| v2-3 | Playwright version pinning | ✅ Bold warning in Section 5.2: must match @playwright/test in package.json | ✅ |
Architecture analysis of v3 additions:
| # | Severity | Finding | Impact |
|---|---|---|---|
| A-8 | ✅ Good | docker-compose.test.local.yml is a clean override pattern. Using compose file stacking (-f base -f override) is the Docker-native way to handle env-specific differences. No conditional logic in the base file — declarative and predictable. | Well-architected. |
| A-9 | ✅ Good | seed-constants.ts placement at cannamanage-frontend/e2e/seed-constants.ts. Correctly lives in the test scope, not in application code. Imported by specs via relative path. No runtime dependency. | Clean separation of concerns. |
| A-10 | ℹ️ Info | No automated enforcement of seed-constants.ts usage. The "Rule: never hardcode seed values in specs" is documented but not lint-enforced. A custom ESLint rule (e.g., forbid UUID/number literals in spec files) could catch violations during PR review. | Low priority — PR review process is sufficient for a team of this size. Future improvement. |
| A-11 | ℹ️ Info | R__seed_test_data.sql checksum behavior. Flyway repeatable migrations re-execute ONLY when the file checksum changes. If a developer adds seed data but forgets to update seed-constants.ts, tests will pass locally (fresh DB) but may cause confusion on the next run. | Documented in Section 8 risks. Acceptable. |
Architecture Verdict: ✅ APPROVED — confidence 95%
All v1/v2 architectural concerns are fully resolved. The tmpfs override pattern is clean. The Dockerfile.playwright with version pinning is well-documented. Docker networking is clear. The volume mount explanation in v3 eliminates the last source of implementation confusion.
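The version-pinning rule is documented rather than automated. If a check were wanted, it could be as small as the sketch below. This is not part of the plan: the helper names are hypothetical, the image reference in the usage note is an assumed example of the tag format, and the comparison looks only at declared versions, not at what the lockfile actually resolves.

```typescript
// Hypothetical consistency check between the Docker image tag and package.json.

// Extracts "1.49.0" from an image reference whose tag looks like ":v1.49.0" or ":v1.49.0-<suffix>".
function imageVersion(image: string): string | null {
  const match = /:v(\d+\.\d+\.\d+)/.exec(image);
  return match?.[1] ?? null;
}

// Extracts "1.49.0" from "1.49.0", "^1.49.0" or "~1.49.0"; other range syntaxes yield null.
function declaredVersion(range: string): string | null {
  const match = /^[\^~]?(\d+\.\d+\.\d+)$/.exec(range.trim());
  return match?.[1] ?? null;
}

function playwrightVersionsInSync(image: string, packageJsonRange: string): boolean {
  const fromImage = imageVersion(image);
  const fromPackage = declaredVersion(packageJsonRange);
  return fromImage !== null && fromImage === fromPackage;
}
```

For example, `playwrightVersionsInSync("mcr.microsoft.com/playwright:v1.49.0-jammy", "^1.49.0")` returns true, while a package.json entry of `"^1.50.0"` against the same image returns false. Because a caret range can resolve to a newer release, pinning the exact version in package.json makes the check meaningful.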
🛡️ Expert 3: Risk & Reliability Expert (Test Reliability, CI Flakiness, Maintenance)
Assessment: ✅ Solid — all high-risk items resolved
| # | v2 Finding | v3 Resolution | Verdict |
|---|---|---|---|
| R-1 | 5-minute target unrealistic | ✅ Changed to "< 8 minutes for @full, < 2 minutes for @smoke" (Section 7) | ✅ |
| R-2 | Suite-level reset causes cascading failures | ✅ Resolved in v2 (per-test reset via beforeEach) | ✅ |
| R-3 | data-testid strategy undecided | ✅ Resolved in v2 (mandatory, naming convention, selectors.ts) | ✅ |
| R-4 | Hardcoded expected values | ✅ Complete seed-constants.ts with UUIDs, member data, KCanG limits, counts (Section 2.8) | ✅ |
| R-5 | No retry on DB reset endpoint | ℹ️ Not explicitly added, but global-setup health check warms backend + pool before tests | Acceptable |
| R-6 | Fixed year/month in quota seed | ✅ Relative dates principle (Section 1.4 #3) + per-test reset re-runs seed | ✅ |
| R-7 | No parallel execution for read-only tests | ℹ️ Not adopted — per-test reset makes it less critical | Acceptable |
Reliability analysis of v3 additions:
| # | Severity | Finding | Impact |
|---|---|---|---|
| R-10 | ✅ Good | seed-constants.ts is comprehensive. Covers UUIDs, member metadata (isUnder21, quotaUsedG), strain THC%, KCanG limits, counts for every entity type. 80+ exported constants. | Single source of truth dramatically reduces maintenance burden from spec-level hardcoding. |
| R-11 | ✅ Good | Time targets are now realistic. 8-minute full suite with 10-minute CI timeout gives 25% safety margin. 2-minute smoke suite is achievable with ~15 tagged tests at 5-8s each. | Prevents false-failure CI red due to optimistic expectations. |
| R-12 | ✅ Good | tmpfs override is opt-in, not opt-out. Developers only use the local override if they experience issues. Default (tmpfs) works on CI and most macOS Docker Desktop versions. No "if CI then X else Y" conditional complexity. | Simple mental model for developers. |
| R-13 | ℹ️ Info | seed-constants.ts → R__seed_test_data.sql synchronization is manual. When someone changes the SQL seed, they must also update the TypeScript constants file. There's no automated check that these stay in sync. | Acceptable risk for a small team. Could add a CI check later (parse SQL → generate constants → compare). |
Flakiness risk assessment (final):
| Risk Factor | v1 Score | v3 Score | Change |
|---|---|---|---|
| State pollution between tests | 🔴 High | 🟢 Low | Per-test reset eliminates |
| Selector brittleness | 🔴 High | 🟢 Low | data-testid mandatory |
| Hardcoded assertion values | 🟡 Medium | 🟢 Low | seed-constants.ts |
| Time-dependent test data | 🟡 Medium | 🟢 Low | Relative dates in seed |
| Docker networking flakiness | 🟡 Medium | 🟢 Low | Health checks + liberal timeouts |
| CI execution time pressure | 🟡 Medium | 🟢 Low | Realistic 8-min target + 10-min timeout |
| macOS local dev issues | 🟡 Medium | 🟢 Low | docker-compose.test.local.yml override |
Estimated maintenance burden: < 5% of test development time (was 15-25% estimated in v1 review).
Reliability Verdict: ✅ APPROVED — confidence 94%
All high-risk flakiness vectors have been systematically addressed. The seed-constants.ts file is the most impactful v3 addition from a reliability perspective — it eliminates the "change seed, break 40 tests" failure mode. The realistic time targets prevent CI false-reds. The per-test reset (from v2) provides complete test isolation.
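The manual SQL-to-TypeScript synchronization noted in R-13 could later be backed by a lightweight CI check. The sketch below is one possible shape, not something the plan specifies: it compares only the UUIDs found in each file's text, and all names (`extractUuids`, `compareSeedUuids`, `SyncReport`) are hypothetical.

```typescript
// Hypothetical drift check: which UUIDs appear in the SQL seed but not in the constants, and vice versa.
const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

function extractUuids(text: string): Set<string> {
  const matches = text.match(UUID_PATTERN) ?? [];
  return new Set(matches.map((uuid) => uuid.toLowerCase()));
}

interface SyncReport {
  missingInConstants: string[]; // seeded in SQL, never referenced by the constants file
  missingInSql: string[]; // referenced by the constants file, never seeded
}

function compareSeedUuids(sqlSource: string, constantsSource: string): SyncReport {
  const inSql = extractUuids(sqlSource);
  const inConstants = extractUuids(constantsSource);
  return {
    missingInConstants: Array.from(inSql).filter((uuid) => !inConstants.has(uuid)).sort(),
    missingInSql: Array.from(inConstants).filter((uuid) => !inSql.has(uuid)).sort(),
  };
}
```

A CI step would read both files, call `compareSeedUuids`, and fail when either list is non-empty. It does not catch drift in non-UUID values such as counts or THC percentages, so it narrows the R-13 risk rather than removing it.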
Panel Synthesis (v3 Final)
Confidence Scores
| Expert | v1 | v2 | v3 | Reasoning |
|---|---|---|---|---|
| 🏛️ Domain | 82% | 95% | 97% | recorded_by fix + KCanG constants in seed-constants.ts close the last gaps |
| 🔧 Architecture | 65% | 92% | 95% | tmpfs override + volume explanation + version pinning = all concerns fully resolved |
| 🛡️ Reliability | 75% | 90% | 94% | seed-constants.ts + realistic time targets eliminate remaining medium-risk items |
Overall Panel Confidence: 95% (v1: 74% → v2: 92% → v3: 95%)
Remaining Items (all ℹ️ Info — none blocking)
| ID | Expert | Finding | Priority |
|---|---|---|---|
| A-10 | 🔧 | No lint enforcement of seed-constants usage | Future improvement |
| A-11 | 🔧 | SQL ↔ TS constants sync is manual | Acceptable for team size |
| R-5 | 🛡️ | No explicit retry on DB reset endpoint | Low risk, health check mitigates |
| R-13 | 🛡️ | seed-constants.ts sync with SQL is manual | Can add CI check post-implementation |
None of these require plan revision. All are implementation-time decisions or future improvements.
Panel Verdict (v3 Final)
✅ APPROVED — Plan is complete, correct, and ready for implementation
GO recommendation: Proceed immediately to Phase 2A.
All 3 experts approve without blocking findings. The plan has matured through 3 iterations:
| Version | Verdict | Blockers | Confidence |
|---|---|---|---|
| v1 | 🔄 REVISE | 1 (seed timing) | 74% |
| v2 | ✅ APPROVED | 0 | 92% |
| v3 | ✅ APPROVED | 0 | 95% |
v3 specifically resolved:
- ✅ tmpfs conditional — `docker-compose.test.local.yml` override (clean, declarative)
- ✅ Realistic time targets — `@full` < 8 min, `@smoke` < 2 min (was "<5 min")
- ✅ `seed-constants.ts` — 80+ exported constants, single source of truth for all test assertions
- ✅ `recorded_by` fix — admin UUID for audit trail realism
- ✅ Volume overlap documentation — intentional pattern explained inline
- ✅ Version pinning — Playwright Docker image ↔ package.json sync rule with bold warning
Quality assessment: This is a production-grade integration test plan. The architecture (Flyway seed → per-test reset → data-testid selectors → seed-constants.ts) forms a cohesive, maintainable system. The phased implementation (2A→2E over 4 days) is realistic and correctly ordered.
No further revision needed. Implementation can begin.
v3 Final panel review completed 2026-06-18 by Plan Reviewer mode.