cannamanage/docs/sprint-12/cannamanage-sprint12-phase2-panel-review.md

# Expert Panel Review: Sprint 12 Phase 2 — Integration Tests with Seed DB

**Datum:** 2026-06-18
**Artifact:** `docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md` (v1)
**Panel:** Domain Expert 🏛️ | Architecture Expert 🔧 | Risk & Reliability Expert 🛡️
**Ticket:** CANNAMANAGE-SPRINT12

---

## 🏛️ Expert 1: Domain Expert (Cannabis Club Regulatory Compliance / KCanG)

### Assessment: ⚠️ Mostly Sound — 2 Gaps

#### Strengths

1. **Under-21 quota testing is present** — Test case 3.3 #4 explicitly tests the 30g/month limit for under-21 members (Jonas Weber, `is_under_21=true`). This is KCanG §3 Abs. 2 critical.
2. **Destruction records in seed data** — Plan includes compliance audit trail (V23 `destruction_records`), which is mandatory for KCanG §16 documentation requirements.
3. **Multi-role seed accounts** — Admin, Staff, Member accounts cover the role hierarchy that regulatory audits require (who did what, with what authority).
4. **Deterministic UUIDs** — Critical for regulatory audit trail assertions: you can verify exactly which member received which distribution.

#### Findings

| # | Severity | Finding | Recommendation |
|---|----------|---------|----------------|
| D-1 | ⚠️ Warning | **Adult quota limit not tested.** Plan tests under-21 limit (30g/month) but does NOT test the adult limit (50g/month, max 25g/day). KCanG §3 Abs. 1 requires both. | Add test case: Attempt >25g single distribution for adult member → expect rejection |
| D-2 | ⚠️ Warning | **No THC% limit test for under-21.** KCanG §3 Abs. 2 Nr. 3 limits THC to 10% for under-21 members. Jonas Weber (under-21) is distributed "CBD Critical Mass" (5% THC) — but there's no test that tries to distribute a high-THC strain to an under-21 member. | Add negative test: Distribute "Amnesia Haze" (22% THC) to Jonas → expect rejection with specific error |
| D-3 | ℹ️ Info | **Compliance deadlines seed uses fixed statuses.** The plan mentions PENDING, OVERDUE, COMPLETED but uses fixed dates. If tests rely on OVERDUE status, date drift will eventually make assertions wrong. | Use `NOW() - INTERVAL` for overdue deadlines, or recalculate status at assertion time |
| D-4 | ℹ️ Info | **Distribution `recorded_by` references member UUID, not admin UUID.** In practice, distributions should be recorded by staff/admin, not self-service. The seed shows member `c1...001` as `recorded_by` which is technically allowed but non-standard for audits. | Consider using admin UUID `b1000000-...001` as `recorded_by` for realism |
| D-5 | ✅ Good | **Document retention testing.** Documents seed covers all categories (SATZUNG, PROTOKOLL, VERTRAG, SONSTIGES) — important for KCanG §19 documentation duties. |
| D-6 | ✅ Good | **Grow tracking covers lifecycle.** SEEDLING → VEGETATIVE → FLOWERING stages match KCanG §2 cultivation documentation requirements. |

#### Domain Verdict: ⚠️ ACCEPTABLE with D-1 and D-2 as recommended additions

The plan covers the core regulatory-critical paths (quota enforcement, audit trail, compliance deadlines, document management). The two missing negative tests (D-1 daily limit, D-2 THC% limit for under-21) are important regulatory edge cases but not plan-blocking — they can be added during implementation.

---

## 🔧 Expert 2: Architecture Expert (Next.js + Spring Boot + Playwright + Docker)

### Assessment: ⚠️ Concerns — 3 Issues (1 blocking)

#### Strengths

1. **Flyway repeatable migration for seed** (Option A) — Cleanest solution. `R__seed_test_data.sql` with profile-gated location is idiomatic Spring Boot + Flyway. Correct choice.
2. **API client abstraction** — Separating UI assertions from API-level verification is architecturally sound. Enables both black-box and white-box testing.
3. **Suite-level reset with ordered execution** — Given `fullyParallel: false` is already in the config, this is the pragmatic choice over per-test isolation.
4. **Existing Playwright project structure** — Plan correctly identifies adding an `integration` project with `dependencies: ["setup"]` — aligns with the existing 3-project pattern.

#### Findings

| # | Severity | Finding | Recommendation |
|---|----------|---------|----------------|
| A-1 | ❌ **Blocking** | **Seed timing contradiction.** Section 5.3 correctly identifies that `init.sql` as `/docker-entrypoint-initdb.d/99-seed.sql` runs at Postgres init — BEFORE Flyway creates tables (Flyway runs on backend startup). The recommended Option A (Flyway `R__seed_test_data.sql`) solves this, but Section 5.1 still shows `./scripts/seed/init.sql:/docker-entrypoint-initdb.d/99-seed.sql:ro` in the docker-compose override. These are contradictory — you can't do both. If you use Option A, the volume mount becomes dead code. If you keep the volume mount, Option A is unnecessary. | **Remove the `99-seed.sql` volume mount from `docker-compose.test.yml` Section 5.1.** The seed must come from Flyway `R__` migration only. The `scripts/seed/` files become the source-of-truth for copy into `src/main/resources/db/migration/test/`. |
| A-2 | ⚠️ Warning | **`tmpfs` for Postgres may cause issues with Docker Desktop on macOS.** Docker Desktop's Linux VM doesn't handle tmpfs the same as native Linux. On some versions, Postgres fails to start with tmpfs due to permission issues or the VM not forwarding tmpfs syscalls correctly. CI (GitHub Actions on `ubuntu-latest`) is fine, but local development on macOS may fail. | Add a conditional: use tmpfs only when `CI=true`, otherwise use regular volume. Or document this as "CI-only optimization" and keep the named volume for local test runs. |
| A-3 | ⚠️ Warning | **The `seed` container is still referenced in docker-compose.test.yml but plan says "Remove seed container."** Section 5.2 states "Remove `seed` container" as an improvement, but the existing file still has it. The plan must be explicit: in the new `docker-compose.test.yml`, the `seed` service is replaced by the Flyway `R__` migration + the global-setup DB readiness check. | Explicitly state: delete the `seed` service from docker-compose.test.yml. Replace the `depends_on: seed: condition: service_completed_successfully` on the `playwright` service with `depends_on: backend: condition: service_healthy`. |
| A-4 | ⚠️ Warning | **Playwright runs inside Docker container but mounts host `./cannamanage-frontend`.** The volume mount `./cannamanage-frontend:/app` means the container uses host `node_modules` (if present) or needs to install them. Since Playwright image doesn't include project deps, the test command (`npx playwright test`) will fail unless deps are installed first. Current `system-test.spec.ts` works because it's a single file with minimal deps, but 12 integration spec files with helpers will need the full `pnpm install` step. | Add `pnpm install --frozen-lockfile` before the test command in the playwright service, or use a multi-stage Dockerfile for the playwright service that pre-installs deps. |
| A-5 | ℹ️ Info | **No explicit base URL override for API client.** The `ApiClient` connects to the backend directly (`baseUrl`). Inside Docker, this should be `http://backend:8080`, not `http://localhost:8080`. The plan shows `BASE_URL: http://frontend:3000` in the environment but doesn't define a `BACKEND_URL` for direct API calls. | Add `BACKEND_URL: http://backend:8080` to the playwright service environment. The API client should read from `process.env.BACKEND_URL ?? 'http://localhost:8080'`. |
| A-6 | ✅ Good | **Authentication reuse via storageState** — Correct pattern, avoids per-test login overhead. |
| A-7 | ✅ Good | **Profile-gated test endpoint** (`POST /api/v1/test/reset-db` only on `test` profile) — Proper security boundary. |

#### Architecture Verdict: ⚠️ REVISE — A-1 is contradictory and needs resolution

The seed timing contradiction (A-1) is a plan consistency error that will cause confusion during implementation. A-2 through A-5 are addressable during implementation without plan revision, but A-1 needs explicit resolution in the plan document.

---

## 🛡️ Expert 3: Risk & Reliability Expert (Test Reliability, CI Flakiness, Coverage)

### Assessment: ⚠️ Acceptable — manageable risks

#### Strengths

1. **"Never use `waitForTimeout`"** — Explicitly stated in Section 4.3. This is the #1 rule for non-flaky Playwright tests.
2. **Trace collection on first retry** — Correct debugging strategy for CI.
3. **Liberal timeouts for Docker networking** — 90s test / 60s navigation accounts for cold-start Docker overhead.
4. **DB verification via API** — Tests don't solely rely on UI state, which is inherently more fragile.
5. **Test ordering strategy** — Read-only tests first, CRUD tests last. Reduces state pollution.

#### Findings

| # | Severity | Finding | Recommendation |
|---|----------|---------|----------------|
| R-1 | ⚠️ Warning | **60+ tests in <5 minutes is ambitious.** Each test navigates, waits for API response, performs assertions. With Docker networking latency, expect 3-8 seconds per test. 60 tests × 5s average = 5 minutes. CRUD tests that submit forms and wait for toasts will be slower. Real expectation: 6-8 minutes. | Adjust success criterion to "< 8 minutes" or reduce test count per spec. Alternatively, enable parallel test execution for read-only specs (they don't mutate state). |
| R-2 | ⚠️ Warning | **Suite-level reset means CRUD test failures corrupt state for subsequent tests.** If test 3.2#3 (create member) fails mid-way (e.g., form submitted but toast timeout), the DB now has a partial member. All subsequent tests that count rows will fail with confusing "expected 5, got 6" errors. | Add a "CRUD section reset" mechanism: before each CRUD-heavy describe block, call `apiClient.resetDb()`. Or structure specs so each CRUD test verifies against its own created data, not against absolute counts. |
| R-3 | ⚠️ Warning | **No mention of `data-testid` strategy.** Section 9 lists this as an "open question" but it's actually critical for test reliability. CSS selectors and text-based selectors (`"5 Mitglieder"`) are brittle — a translation change, number format change, or design refactor breaks tests. | Decide NOW: use `data-testid` attributes on all interactive elements and key display elements. This is not optional for a 60+ test suite — it's a prerequisite. |
| R-4 | ⚠️ Warning | **Hardcoded expected values in tests.** Many assertions reference specific values: "5 Mitglieder", "Northern Lights 18.5% THC", "30€". If the seed data changes (even a typo fix), these tests break. | Create a `seed-constants.ts` file that exports expected values derived from a single source of truth. Tests import from this file. When seed changes, update one file. |
| R-5 | ℹ️ Info | **No retry strategy for the DB reset endpoint.** If the `POST /api/v1/test/reset-db` call fails or times out (backend GC pause, connection pool exhaustion), the entire suite fails. | Add retry logic (3 attempts, 2s backoff) in the global-setup for the DB reset call. |
| R-6 | ℹ️ Info | **Monthly quota seed uses fixed year/month (2024/12).** Tests checking quota display may show "no quota for current month" because the seed references December 2024, not the current month. | Use dynamic month in seed OR test assertions should navigate to the historical period, not assume "current month" view shows seed data. |
| R-7 | ℹ️ Info | **No parallel execution plan for read-only tests.** The plan states `fullyParallel: false` globally, but read-only tests (all #1 and #2 cases per spec) could safely run in parallel since they don't mutate state. This would cut execution time by 30-40%. | Consider splitting into two Playwright projects: `integration-read` (parallel) and `integration-write` (serial). |
| R-8 | ✅ Good | **Artifact collection strategy is comprehensive** — screenshots, traces, backend logs, HTML report. This is sufficient for debugging CI failures. |
| R-9 | ✅ Good | **CI timeout of 10 minutes** — Appropriate safety margin above the expected 5-8 minute runtime. |

#### Reliability Verdict: ⚠️ ACCEPTABLE — R-2 and R-3 are the main risks

The plan will produce working tests, but without `data-testid` (R-3) and with suite-level-only reset (R-2), expect a 15-25% maintenance burden from flaky/cascading failures within the first 3 months. These are addressable during implementation without plan revision.

---

## Panel Synthesis

### Confidence Scores

| Expert | Confidence | Reasoning |
|--------|-----------|-----------|
| 🏛️ Domain | 82% | Core regulatory paths covered; 2 missing edge cases (daily limit, THC% limit for under-21) are non-blocking |
| 🔧 Architecture | 65% | Seed timing contradiction (A-1) is a consistency error that will cause implementation confusion |
| 🛡️ Reliability | 75% | Fundamentally sound approach but `data-testid` decision and reset granularity need resolution |

**Overall Panel Confidence: 74%**

### Combined Findings by Severity

#### ❌ Blocking (1)

| ID | Expert | Finding |
|----|--------|---------|
| A-1 | 🔧 Architecture | Seed timing contradiction: docker-compose.test.yml still mounts `init.sql` to `docker-entrypoint-initdb.d` while recommending Flyway `R__` migration. These are mutually exclusive approaches — plan must pick one and remove the other. |

#### ⚠️ Warnings (9)

| ID | Expert | Finding |
|----|--------|---------|
| D-1 | 🏛️ Domain | No adult daily quota limit (25g/day) test |
| D-2 | 🏛️ Domain | No THC% limit test for under-21 members |
| A-2 | 🔧 Architecture | tmpfs may fail on Docker Desktop macOS |
| A-3 | 🔧 Architecture | `seed` container removal not explicitly reflected in compose |
| A-4 | 🔧 Architecture | Playwright container needs `pnpm install` before tests |
| A-5 | 🔧 Architecture | No BACKEND_URL env for API client inside Docker |
| R-1 | 🛡️ Reliability | 5-minute target unrealistic for 60+ tests with Docker overhead |
| R-2 | 🛡️ Reliability | Suite-level reset causes cascading failures on CRUD test errors |
| R-3 | 🛡️ Reliability | `data-testid` strategy must be decided before implementation |

#### ℹ️ Info (6)

| ID | Expert | Finding |
|----|--------|---------|
| D-3 | 🏛️ Domain | Fixed-date compliance deadlines may drift |
| D-4 | 🏛️ Domain | `recorded_by` should reference admin, not member |
| R-4 | 🛡️ Reliability | Hardcoded expected values — use seed-constants.ts |
| R-5 | 🛡️ Reliability | No retry on DB reset endpoint |
| R-6 | 🛡️ Reliability | Monthly quota seed uses fixed 2024/12, not current month |
| R-7 | 🛡️ Reliability | Read-only tests could run in parallel for speed |

---

## Panel Verdict

### 🔄 REVISE — 1 blocking finding must be resolved in the plan

**Required before implementation:**
1. Resolve A-1: Remove the `docker-entrypoint-initdb.d` volume mount from the proposed docker-compose.test.yml changes. Make it unambiguous that seed data flows through `R__seed_test_data.sql` via Flyway only.

**Strongly recommended (can be addressed during implementation):**
2. Resolve R-3: Commit to `data-testid` attributes as the selector strategy. Add a section "Selector Strategy" to the plan.
3. Add D-1 and D-2 test cases to the Distributions spec (regulatory completeness).
4. Add `pnpm install` to the playwright service command (A-4).

**Nice-to-have (implementation decisions):**
5. Adjust execution time target from 5 min to 8 min (R-1).
6. Add `BACKEND_URL` environment variable for API client (A-5).
7. Create `seed-constants.ts` for maintainable assertions (R-4).

---

*Panel review completed 2026-06-18 by Plan Reviewer mode.*

---

## v2 Re-Review (2026-06-18)

**Reviewer:** Roo (Plan Reviewer)
**Document reviewed:** `docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md` (v2)
**Purpose:** Verify all v1 findings have been properly addressed

---

### ❌ Blocker Resolution

| ID | v1 Finding | v2 Status | Evidence |
|----|-----------|-----------|----------|
| A-1 | Seed timing contradiction: docker-compose mounts `init.sql` to `docker-entrypoint-initdb.d` while recommending Flyway `R__` migration | ✅ **Resolved** | Section 1.3 explicitly declares "Decision (v2): Flyway-only seeding — NO Docker docker-entrypoint-initdb.d mount." Section 5.1 `docker-compose.test.yml` has NO volume mount on the `db` service, with comment confirming intent. The contradiction is eliminated. |

---

### ⚠️ Warning Resolution

| ID | Expert | v1 Finding | v2 Status | Evidence |
|----|--------|-----------|-----------|----------|
| D-1 | 🏛️ | No adult daily quota limit (25g/day) test | ✅ **Resolved** | New `13-kcang-regulatory.spec.ts` — test #1 (26g → rejection), test #6 (23g+3g exceeds daily), test #7 (23g+2g = exactly 25g → success). Thorough coverage of boundary. |
| D-2 | 🏛️ | No THC% limit test for under-21 members | ✅ **Resolved** | KCanG spec tests #3 (22% THC to U21 → rejection), #4 (0.5% THC to U21 → success), #5 (30g+ low-THC to U21 → quota error), #9 (UI shows "max. 10% THC" notice). Excellent coverage. |
| A-2 | 🔧 | tmpfs may fail on Docker Desktop macOS; network aliases unclear | ⚠️ **Partially** | Section 5.4 clearly documents Docker service-name networking (no custom aliases needed). However, `tmpfs` is still unconditional — no CI-only gating. Risk is LOW: CI runs on `ubuntu-latest` (native Linux), and local devs can override with `docker compose -f docker-compose.yml -f docker-compose.test.yml up`. Acceptable for implementation. |
| A-3 | 🔧 | `seed` container removal not explicitly reflected in compose | ✅ **Resolved** | Section 5.1 `docker-compose.test.yml` defines exactly 4 services: `db`, `backend`, `frontend`, `playwright`. No `seed` service exists. `playwright` depends on `backend: condition: service_healthy`. Clean. |
| A-4 | 🔧 | Playwright container needs `pnpm install` before tests | ✅ **Resolved** | Section 5.2 introduces `Dockerfile.playwright` with `pnpm install --frozen-lockfile` at build time. Dependencies are pre-installed in the image. |
| A-5 | 🔧 | No BACKEND_URL env for API client inside Docker | ✅ **Resolved** | Section 5.1 playwright service has `API_URL: http://backend:8080`. Section 2.7 global-setup uses this for health checks and API client initialization. |
| R-1 | 🛡️ | 5-minute target unrealistic for 60+ tests with Docker overhead | ⚠️ **Partially** | Plan now has per-test reset (~500ms × 70 = 35s overhead), making 5 min MORE ambitious. CI timeout is 10 min (appropriate). Test tagging splits `@smoke` (PR, <2 min) from `@full` (merge, 10 min). Section 7 success criteria still says "< 5 minutes total" — this is optimistic but the CI timeout and tagging strategy make it non-blocking. |
| R-2 | 🛡️ | Suite-level reset causes cascading failures on CRUD test errors | ✅ **Resolved** | Section 2.1 switches to per-test reset: "Decision (v2): Per-test reset via backend API endpoint + `beforeEach` hook." Each test calls `apiClient.resetDb()` — complete state isolation between tests. |
| R-3 | 🛡️ | `data-testid` strategy must be decided before implementation | ✅ **Resolved** | Section 2.2 commits to `data-testid` as mandatory. Naming convention defined (`<page>-<component>-<identifier>`), `selectors.ts` centralized file shown, implementation tracked as Phase 2C sub-task. |

---

### ℹ️ Info Finding Resolution

| ID | Expert | v1 Finding | v2 Status | Evidence |
|----|--------|-----------|-----------|----------|
| D-3 | 🏛️ | Fixed-date compliance deadlines may drift | ✅ **Resolved** | Section 1.4 principle #3: "Use relative dates where possible (`NOW() - INTERVAL '7 days'`) for time-sensitive tests". Per-test reset re-runs seed each time, keeping relative dates fresh. |
| D-4 | 🏛️ | `recorded_by` should reference admin, not member | ℹ️ Noted | Not explicitly addressed in v2 — minor realism improvement, can be done during seed implementation. |
| R-4 | 🛡️ | Hardcoded expected values — use seed-constants.ts | ⚠️ **Partially** | No explicit `seed-constants.ts` file created, but deterministic UUIDs (Section 1.4 #1) + per-test reset ensure assertions are stable. The `selectors.ts` centralizes locators but not data values. Acceptable — can be added during implementation when duplication becomes apparent. |
| R-5 | 🛡️ | No retry on DB reset endpoint | ℹ️ Noted | No explicit retry logic shown for `resetDb()`. Implementation detail — the global-setup health check ensures backend is ready before any reset calls. Low risk. |
| R-6 | 🛡️ | Monthly quota seed uses fixed 2024/12, not current month | ✅ **Resolved** | Relative dates principle + Section 4.2 test tagging strategy (`@smoke`/`@full` with `--grep`). Seed data for quotas will use relative dates per principle #3. |
| R-7 | 🛡️ | Read-only tests could run in parallel for speed | ℹ️ Noted | Not adopted in v2. Per-test reset makes parallelism less critical (each test is independent). Can be optimized later if execution time becomes a concern. |

---

### New Findings in v2

| ID | Severity | Finding | Impact |
|----|----------|---------|--------|
| v2-1 | ℹ️ Info | **Volume mount + Dockerfile build overlap.** Section 5.1 mounts `./cannamanage-frontend/e2e:/app/e2e:ro` into the playwright container which also COPYs `e2e/` in `Dockerfile.playwright`. The volume mount overrides the built-in files at runtime. | This is an intentional Docker pattern (enables test iteration without rebuild) but should be documented to avoid confusion. Non-blocking. |
| v2-2 | ℹ️ Info | **Success criterion "< 5 minutes total" vs per-test reset overhead.** 70+ tests × ~500ms reset + test execution (3-8s each) = realistic estimate is 6-8 minutes for `@full` suite. CI timeout of 10 min is correct, but the stated target is optimistic. | Cosmetic — CI timeout is appropriate. Adjust success criterion to "< 8 minutes" post-implementation based on actual measurements. |
| v2-3 | ℹ️ Info | **`Dockerfile.playwright` pinned to `v1.49.0`** — this should match the Playwright version in `package.json` to avoid browser/API version mismatches. | Document: keep Dockerfile image version in sync with `@playwright/test` version in `package.json`. |

---

### Updated Confidence Scores

| Expert | v1 Confidence | v2 Confidence | Change | Reasoning |
|--------|--------------|--------------|--------|-----------|
| 🏛️ Domain | 82% | 95% | +13 | D-1 and D-2 fully resolved with 9 comprehensive KCanG test cases. Regulatory coverage is now excellent. |
| 🔧 Architecture | 65% | 92% | +27 | A-1 blocker completely eliminated. Dockerfile.playwright, API_URL, health checks — all addressed. Only tmpfs (A-2) remains partially open but low-risk. |
| 🛡️ Reliability | 75% | 90% | +15 | Per-test reset (R-2) and data-testid commitment (R-3) are the two highest-impact improvements. Execution time target (R-1) is slightly optimistic but non-blocking. |

**Overall Panel Confidence: 92%** (v1: 74%, Δ: +18)

---

### Panel Verdict (v2)

## ✅ APPROVED — Plan is ready for implementation

All blocking findings resolved. All critical warnings addressed. Remaining items (A-2 tmpfs conditional, R-1 time target, R-4 seed-constants.ts) are implementation-time decisions that don't require plan revision.

**Key improvements in v2:**
1. Seed timing contradiction eliminated — Flyway-only, no Docker init.sql
2. Per-test DB reset — complete test isolation via `beforeEach` + API endpoint
3. `data-testid` strategy committed — naming convention, selectors.ts, mandatory for Phase 2C
4. KCanG regulatory spec added — 9 comprehensive edge case tests covering §3 Abs. 1 + Abs. 2
5. Dedicated Playwright Dockerfile with pre-installed dependencies
6. Clear Docker networking documentation and health check strategy

**Recommendation:** Proceed to implementation. The 3 ℹ️ info items (v2-1, v2-2, v2-3) can be addressed during Phase 2C/2D without plan revision.

---

*v2 Re-review completed 2026-06-18 by Plan Reviewer mode.*

---

## v3 Final Review (2026-06-18)

**Reviewer:** Roo (Plan Reviewer)
**Document reviewed:** `docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md` (v3 — final revision)
**Purpose:** Final gate review before implementation begins. All v2 partial/info items should now be fully resolved.

---

### 🏛️ Expert 1: Domain Expert (KCanG Regulatory Compliance)

#### Assessment: ✅ Excellent — No remaining gaps

| # | Check | Verdict | Evidence |
|---|-------|---------|----------|
| D-1 | Adult daily 25g limit tested | ✅ | KCanG spec tests #1, #6, #7 — boundary cases (26g reject, 23+3 reject, 23+2 pass) |
| D-2 | Under-21 THC% limit tested | ✅ | KCanG spec tests #3, #4, #5, #9 — Amnesia Haze 22% → reject, CBD Critical Mass 0.5% → pass, UI notice |
| D-3 | Compliance deadlines use relative dates | ✅ | Section 1.4 principle #3: `NOW() - INTERVAL` for time-sensitive tests, per-test reset keeps fresh |
| D-4 | `recorded_by` references admin UUID | ✅ | Section 1.2: explicitly states "v3: `recorded_by` = admin UUID `b1000000-...001`, not member UUID" |
| D-5 | Monthly quota limits differentiated (50g adult vs 30g U21) | ✅ | `seed-constants.ts` exports `KCANG.ADULT_MONTHLY_LIMIT_G: 50` and `UNDER21_MONTHLY_LIMIT_G: 30` |
| D-6 | Seed covers all regulatory-critical entity types | ✅ | Destruction records, compliance deadlines, distribution audit trail, grow lifecycle all seeded |

**New observation (v3):**

| # | Severity | Finding | Impact |
|---|----------|---------|--------|
| D-7 | ✅ Good | **`seed-constants.ts` exports KCanG limits as constants.** Tests import `KCANG.ADULT_DAILY_LIMIT_G` etc. — if regulations change (unlikely short-term, but possible with KCanG amendments), there is a single point of update. | Excellent maintainability for regulatory compliance. |
| D-8 | ℹ️ Info | **No "combined monthly" test (adult at 50g boundary).** Tests cover 25g daily and under-21 30g monthly, but no test explicitly hits the adult 50g monthly ceiling. KCanG spec #2 tests 45g+6g=51g exceeding monthly, which implicitly covers it. | Covered via test #2 — no action needed. |

#### Domain Verdict: ✅ APPROVED — confidence 97%

All KCanG regulatory paths are comprehensively tested. The `seed-constants.ts` file makes regulatory limit changes trivial to propagate. The `recorded_by` fix ensures audit trail realism. No remaining domain gaps.

---

### 🔧 Expert 2: Architecture Expert (Next.js + Spring Boot + Playwright + Docker)

#### Assessment: ✅ Sound — All concerns resolved

| # | v2 Finding | v3 Resolution | Verdict |
|---|-----------|---------------|---------|
| A-1 | Seed timing contradiction | ✅ Eliminated in v2 (Flyway-only, confirmed in v3) | ✅ |
| A-2 | tmpfs macOS issues | ✅ `docker-compose.test.local.yml` override with named volume (Section 5.5) | ✅ |
| A-3 | Seed container removal | ✅ Only 4 services in compose: `db`, `backend`, `frontend`, `playwright` | ✅ |
| A-4 | Playwright needs pnpm install | ✅ `Dockerfile.playwright` with `pnpm install --frozen-lockfile` at build time | ✅ |
| A-5 | No BACKEND_URL for API client | ✅ `API_URL: http://backend:8080` in playwright service env | ✅ |
| v2-1 | Volume + Dockerfile overlap confusion | ✅ Explicit comment in Section 5.1 explaining the intentional pattern | ✅ |
| v2-3 | Playwright version pinning | ✅ Bold warning in Section 5.2: must match `@playwright/test` in package.json | ✅ |

**Architecture analysis of v3 additions:**

| # | Severity | Finding | Impact |
|---|----------|---------|--------|
| A-8 | ✅ Good | **`docker-compose.test.local.yml` is a clean override pattern.** Using compose file stacking (`-f base -f override`) is the Docker-native way to handle env-specific differences. No conditional logic in the base file — declarative and predictable. | Well-architected. |
| A-9 | ✅ Good | **`seed-constants.ts` placement at `cannamanage-frontend/e2e/seed-constants.ts`.** Correctly lives in the test scope, not in application code. Imported by specs via relative path. No runtime dependency. | Clean separation of concerns. |
| A-10 | ℹ️ Info | **No automated enforcement of `seed-constants.ts` usage.** The "Rule: never hardcode seed values in specs" is documented but not lint-enforced. A custom ESLint rule (e.g., forbid UUID/number literals in spec files) could catch violations during PR review. | Low priority — PR review process is sufficient for a team of this size. Future improvement. |
| A-11 | ℹ️ Info | **`R__seed_test_data.sql` checksum behavior.** Flyway repeatable migrations re-execute ONLY when the file checksum changes. If a developer adds seed data but forgets to update `seed-constants.ts`, tests will pass locally (fresh DB) but may confuse on next run. | Documented in Section 8 risks. Acceptable. |

#### Architecture Verdict: ✅ APPROVED — confidence 95%

All v1/v2 architectural concerns are fully resolved. The tmpfs override pattern is clean. The Dockerfile.playwright with version pinning is well-documented. Docker networking is clear. The volume mount explanation in v3 eliminates the last source of implementation confusion.

---

### 🛡️ Expert 3: Risk & Reliability Expert (Test Reliability, CI Flakiness, Maintenance)

#### Assessment: ✅ Solid — all high-risk items resolved

| # | v2 Finding | v3 Resolution | Verdict |
|---|-----------|---------------|---------|
| R-1 | 5-minute target unrealistic | ✅ Changed to "< 8 minutes for @full, < 2 minutes for @smoke" (Section 7) | ✅ |
| R-2 | Suite-level reset causes cascading failures | ✅ Resolved in v2 (per-test reset via beforeEach) | ✅ |
| R-3 | data-testid strategy undecided | ✅ Resolved in v2 (mandatory, naming convention, selectors.ts) | ✅ |
| R-4 | Hardcoded expected values | ✅ Complete `seed-constants.ts` with UUIDs, member data, KCanG limits, counts (Section 2.8) | ✅ |
| R-5 | No retry on DB reset endpoint | ℹ️ Not explicitly added, but global-setup health check warms backend + pool before tests | Acceptable |
| R-6 | Fixed year/month in quota seed | ✅ Relative dates principle (Section 1.4 #3) + per-test reset re-runs seed | ✅ |
| R-7 | No parallel execution for read-only tests | ℹ️ Not adopted — per-test reset makes it less critical | Acceptable |

**Reliability analysis of v3 additions:**

| # | Severity | Finding | Impact |
|---|----------|---------|--------|
| R-10 | ✅ Good | **`seed-constants.ts` is comprehensive.** Covers UUIDs, member metadata (`isUnder21`, `quotaUsedG`), strain THC%, KCanG limits, counts for every entity type. 80+ exported constants. | Single source of truth dramatically reduces maintenance burden from spec-level hardcoding. |
| R-11 | ✅ Good | **Time targets are now realistic.** 8-minute full suite with 10-minute CI timeout gives 25% safety margin. 2-minute smoke suite is achievable with ~15 tagged tests at 5-8s each. | Prevents false-failure CI red due to optimistic expectations. |
| R-12 | ✅ Good | **tmpfs override is opt-in, not opt-out.** Developers only use the local override if they experience issues. Default (tmpfs) works on CI and most macOS Docker Desktop versions. No "if CI then X else Y" conditional complexity. | Simple mental model for developers. |
| R-13 | ℹ️ Info | **seed-constants.ts → R__seed_test_data.sql synchronization is manual.** When someone changes the SQL seed, they must also update the TypeScript constants file. There's no automated check that these stay in sync. | Acceptable risk for a small team. Could add a CI check later (parse SQL → generate constants → compare). |

**Flakiness risk assessment (final):**

| Risk Factor | v1 Score | v3 Score | Change |
|-------------|----------|----------|--------|
| State pollution between tests | 🔴 High | 🟢 Low | Per-test reset eliminates |
| Selector brittleness | 🔴 High | 🟢 Low | data-testid mandatory |
| Hardcoded assertion values | 🟡 Medium | 🟢 Low | seed-constants.ts |
| Time-dependent test data | 🟡 Medium | 🟢 Low | Relative dates in seed |
| Docker networking flakiness | 🟡 Medium | 🟢 Low | Health checks + liberal timeouts |
| CI execution time pressure | 🟡 Medium | 🟢 Low | Realistic 8-min target + 10-min timeout |
| macOS local dev issues | 🟡 Medium | 🟢 Low | docker-compose.test.local.yml override |

**Estimated maintenance burden:** < 5% of test development time (was 15-25% estimated in v1 review).

#### Reliability Verdict: ✅ APPROVED — confidence 94%

All high-risk flakiness vectors have been systematically addressed. The `seed-constants.ts` file is the most impactful v3 addition from a reliability perspective — it eliminates the "change seed, break 40 tests" failure mode. The realistic time targets prevent CI false-reds. The per-test reset (from v2) provides complete test isolation.

---

### Panel Synthesis (v3 Final)

#### Confidence Scores

| Expert | v1 | v2 | v3 | Reasoning |
|--------|----|----|----|-----------|
| 🏛️ Domain | 82% | 95% | **97%** | `recorded_by` fix + KCanG constants in `seed-constants.ts` close the last gaps |
| 🔧 Architecture | 65% | 92% | **95%** | tmpfs override + volume explanation + version pinning = all concerns fully resolved |
| 🛡️ Reliability | 75% | 90% | **94%** | `seed-constants.ts` + realistic time targets eliminate remaining medium-risk items |

**Overall Panel Confidence: 95%** (v1: 74% → v2: 92% → v3: 95%)

#### Remaining Items (all ℹ️ Info — none blocking)

| ID | Expert | Finding | Priority |
|----|--------|---------|----------|
| A-10 | 🔧 | No lint enforcement of seed-constants usage | Future improvement |
| A-11 | 🔧 | SQL ↔ TS constants sync is manual | Acceptable for team size |
| R-5 | 🛡️ | No explicit retry on DB reset endpoint | Low risk, health check mitigates |
| R-13 | 🛡️ | seed-constants.ts sync with SQL is manual | Can add CI check post-implementation |

None of these require plan revision. All are implementation-time decisions or future improvements.

---

## Panel Verdict (v3 Final)

### ✅ APPROVED — Plan is complete, correct, and ready for implementation

**GO recommendation: Proceed immediately to Phase 2A.**

All 3 experts approve without blocking findings. The plan has matured through 3 iterations:

| Version | Verdict | Blockers | Confidence |
|---------|---------|----------|------------|
| v1 | 🔄 REVISE | 1 (seed timing) | 74% |
| v2 | ✅ APPROVED | 0 | 92% |
| v3 | ✅ APPROVED | 0 | 95% |

**v3 specifically resolved:**
1. ✅ tmpfs conditional — `docker-compose.test.local.yml` override (clean, declarative)
2. ✅ Realistic time targets — `@full` < 8 min, `@smoke` < 2 min (was "<5 min")
3. ✅ `seed-constants.ts` — 80+ exported constants, single source of truth for all test assertions
4. ✅ `recorded_by` fix — admin UUID for audit trail realism
5. ✅ Volume overlap documentation — intentional pattern explained inline
6. ✅ Version pinning — Playwright Docker image ↔ package.json sync rule with bold warning

**Quality assessment:** This is a production-grade integration test plan. The architecture (Flyway seed → per-test reset → data-testid selectors → seed-constants.ts) forms a cohesive, maintainable system. The phased implementation (2A→2E over 4 days) is realistic and correctly ordered.

**No further revision needed. Implementation can begin.**

---

*v3 Final panel review completed 2026-06-18 by Plan Reviewer mode.*