Patrick Plate be932c1930 docs: Sprint 12 planning, analysis, reviews, and code review

- sprint12-analysis.md (full page audit)
- sprint12-plan.md (button fix plan)
- sprint12-testplan.md (button fix test plan)
- sprint12-phase2-integration-tests.md (v3, expert-approved)
- sprint12-phase2-panel-review.md (3 review cycles, 95% confidence)
- sprint12-code-review.md (approved with comments, blockers fixed)

2026-06-18 14:43:25 +02:00


Expert Panel Review: Sprint 12 Phase 2 — Integration Tests with Seed DB

Date: 2026-06-18
Artifact: docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md (v1)
Panel: Domain Expert 🏛️ | Architecture Expert 🔧 | Risk & Reliability Expert 🛡️
Ticket: CANNAMANAGE-SPRINT12

🏛️ Expert 1: Domain Expert (Cannabis Club Regulatory Compliance / KCanG)

Assessment: ⚠️ Mostly Sound — 2 Gaps

Strengths

Under-21 quota testing is present — Test case 3.3 #4 explicitly tests the 30g/month limit for under-21 members (Jonas Weber, is_under_21=true). This is KCanG §3 Abs. 2 critical.
Destruction records in seed data — Plan includes compliance audit trail (V23 destruction_records), which is mandatory for KCanG §16 documentation requirements.
Multi-role seed accounts — Admin, Staff, Member accounts cover the role hierarchy that regulatory audits require (who did what, with what authority).
Deterministic UUIDs — Critical for regulatory audit trail assertions: you can verify exactly which member received which distribution.

Findings

#	Severity	Finding	Recommendation
D-1	⚠️ Warning	Adult quota limit not tested. Plan tests under-21 limit (30g/month) but does NOT test the adult limit (50g/month, max 25g/day). KCanG §3 Abs. 1 requires both.	Add test case: Attempt >25g single distribution for adult member → expect rejection
D-2	⚠️ Warning	No THC% limit test for under-21. KCanG §3 Abs. 2 Nr. 3 limits THC to 10% for under-21 members. Jonas Weber (under-21) is distributed "CBD Critical Mass" (5% THC) — but there's no test that tries to distribute a high-THC strain to an under-21 member.	Add negative test: Distribute "Amnesia Haze" (22% THC) to Jonas → expect rejection with specific error
D-3	ℹ️ Info	Compliance deadlines seed uses fixed statuses. The plan mentions PENDING, OVERDUE, COMPLETED but uses fixed dates. If tests rely on OVERDUE status, date drift will eventually make assertions wrong.	Use `NOW() - INTERVAL` for overdue deadlines, or recalculate status at assertion time
D-4	ℹ️ Info	Distribution `recorded_by` references member UUID, not admin UUID. In practice, distributions should be recorded by staff/admin, not self-service. The seed shows member `c1...001` as `recorded_by` which is technically allowed but non-standard for audits.	Consider using admin UUID `b1000000-...001` as `recorded_by` for realism
D-5	✅ Good	Document retention testing. Documents seed covers all categories (SATZUNG, PROTOKOLL, VERTRAG, SONSTIGES) — important for KCanG §19 documentation duties.
D-6	✅ Good	Grow tracking covers lifecycle. SEEDLING → VEGETATIVE → FLOWERING stages match KCanG §2 cultivation documentation requirements.
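
The second option in D-3, recalculating the status at assertion time, could look roughly like the sketch below. Only the three status names come from the plan; the helper names, the field names, and the rule that a deadline counts as overdue once its due date has passed without completion are assumptions made for illustration.

```typescript
// Hypothetical helper: derive the status a test should expect at run time
// instead of trusting a status literal frozen into the seed.
type DeadlineStatus = "PENDING" | "OVERDUE" | "COMPLETED";

interface SeedDeadline {
  dueDate: Date;            // assumed field name
  completedAt: Date | null; // assumed field name
}

function expectedDeadlineStatus(d: SeedDeadline, now: Date): DeadlineStatus {
  if (d.completedAt !== null) return "COMPLETED";
  // Assumption: overdue as soon as the due date lies in the past.
  return d.dueDate.getTime() < now.getTime() ? "OVERDUE" : "PENDING";
}

// Relative offsets mirror the NOW() - INTERVAL idea on the test side.
const DAY_MS = 24 * 60 * 60 * 1000;

function daysFrom(now: Date, days: number): Date {
  return new Date(now.getTime() + days * DAY_MS);
}
```

With a helper of this kind, an assertion compares the UI or API status against `expectedDeadlineStatus(seedRow, new Date())`, so the expectation moves with the clock.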

Domain Verdict: ⚠️ ACCEPTABLE with D-1 and D-2 as recommended additions

The plan covers the core regulatory-critical paths (quota enforcement, audit trail, compliance deadlines, document management). The two missing negative tests (D-1 daily limit, D-2 THC% limit for under-21) are important regulatory edge cases but not plan-blocking — they can be added during implementation.
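
To make the two missing negative cases concrete, here is a minimal sketch of the rule set that D-1 and D-2 would exercise. The numeric limits are the ones cited in this review; every type and function name, the order of the checks, and applying the 25g daily limit to under-21 members as well are assumptions, not the backend's actual implementation.

```typescript
// Limits as cited in this review; all identifiers are hypothetical.
const DAILY_LIMIT_G = 25;
const ADULT_MONTHLY_LIMIT_G = 50;
const UNDER21_MONTHLY_LIMIT_G = 30;
const UNDER21_MAX_THC_PERCENT = 10;

interface DistributionRequest {
  isUnder21: boolean;
  grams: number;
  strainThcPercent: number;
  gramsAlreadyToday: number;
  gramsAlreadyThisMonth: number;
}

type Rejection = "THC_LIMIT" | "DAILY_LIMIT" | "MONTHLY_LIMIT";

// Returns the first violated rule, or null when the distribution is allowed.
function checkDistribution(req: DistributionRequest): Rejection | null {
  if (req.isUnder21 && req.strainThcPercent > UNDER21_MAX_THC_PERCENT) {
    return "THC_LIMIT";
  }
  // Assumption: the daily limit applies regardless of age.
  if (req.gramsAlreadyToday + req.grams > DAILY_LIMIT_G) {
    return "DAILY_LIMIT";
  }
  const monthlyLimit = req.isUnder21 ? UNDER21_MONTHLY_LIMIT_G : ADULT_MONTHLY_LIMIT_G;
  if (req.gramsAlreadyThisMonth + req.grams > monthlyLimit) {
    return "MONTHLY_LIMIT";
  }
  return null;
}
```

In these terms, D-1 corresponds to an adult request of more than 25g in one distribution, and D-2 to an under-21 request for a 22% THC strain.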

🔧 Expert 2: Architecture Expert (Next.js + Spring Boot + Playwright + Docker)

Assessment: ⚠️ Concerns — 3 Issues (1 blocking)

Strengths

Flyway repeatable migration for seed (Option A) — Cleanest solution. R__seed_test_data.sql with profile-gated location is idiomatic Spring Boot + Flyway. Correct choice.
API client abstraction — Separating UI assertions from API-level verification is architecturally sound. Enables both black-box and white-box testing.
Suite-level reset with ordered execution — Given fullyParallel: false is already in the config, this is the pragmatic choice over per-test isolation.
Existing Playwright project structure — Plan correctly identifies adding an integration project with dependencies: ["setup"] — aligns with the existing 3-project pattern.

Findings

#	Severity	Finding	Recommendation
A-1	❌ Blocking	Seed timing contradiction. Section 5.3 correctly identifies that `init.sql` as `/docker-entrypoint-initdb.d/99-seed.sql` runs at Postgres init — BEFORE Flyway creates tables (Flyway runs on backend startup). The recommended Option A (Flyway `R__seed_test_data.sql`) solves this, but Section 5.1 still shows `./scripts/seed/init.sql:/docker-entrypoint-initdb.d/99-seed.sql:ro` in the docker-compose override. These are contradictory — you can't do both. If you use Option A, the volume mount becomes dead code. If you keep the volume mount, Option A is unnecessary.	Remove the `99-seed.sql` volume mount from `docker-compose.test.yml` Section 5.1. The seed must come from Flyway `R__` migration only. The `scripts/seed/` files become the source-of-truth for copy into `src/main/resources/db/migration/test/`.
A-2	⚠️ Warning	`tmpfs` for Postgres may cause issues with Docker Desktop on macOS. Docker Desktop's Linux VM doesn't handle tmpfs the same as native Linux. On some versions, Postgres fails to start with tmpfs due to permission issues or the VM not forwarding tmpfs syscalls correctly. CI (GitHub Actions on `ubuntu-latest`) is fine, but local development on macOS may fail.	Add a conditional: use tmpfs only when `CI=true`, otherwise use regular volume. Or document this as "CI-only optimization" and keep the named volume for local test runs.
A-3	⚠️ Warning	The `seed` container is still referenced in docker-compose.test.yml but plan says "Remove seed container." Section 5.2 states "Remove `seed` container" as an improvement, but the existing file still has it. The plan must be explicit: in the new `docker-compose.test.yml`, the `seed` service is replaced by the Flyway `R__` migration + the global-setup DB readiness check.	Explicitly state: delete the `seed` service from docker-compose.test.yml. Replace the `depends_on: seed: condition: service_completed_successfully` on the `playwright` service with `depends_on: backend: condition: service_healthy`.
A-4	⚠️ Warning	Playwright runs inside Docker container but mounts host `./cannamanage-frontend`. The volume mount `./cannamanage-frontend:/app` means the container uses host `node_modules` (if present) or needs to install them. Since Playwright image doesn't include project deps, the test command (`npx playwright test`) will fail unless deps are installed first. Current `system-test.spec.ts` works because it's a single file with minimal deps, but 12 integration spec files with helpers will need the full `pnpm install` step.	Add `pnpm install --frozen-lockfile` before the test command in the playwright service, or use a multi-stage Dockerfile for the playwright service that pre-installs deps.
A-5	ℹ️ Info	No explicit base URL override for API client. The `ApiClient` connects to the backend directly (`baseUrl`). Inside Docker, this should be `http://backend:8080`, not `http://localhost:8080`. The plan shows `BASE_URL: http://frontend:3000` in the environment but doesn't define a `BACKEND_URL` for direct API calls.	Add `BACKEND_URL: http://backend:8080` to the playwright service environment. The API client should read from `process.env.BACKEND_URL ?? 'http://localhost:8080'`.
A-6	✅ Good	Authentication reuse via storageState — Correct pattern, avoids per-test login overhead.
A-7	✅ Good	Profile-gated test endpoint (`POST /api/v1/test/reset-db` only on `test` profile) — Proper security boundary.

Architecture Verdict: ⚠️ REVISE — A-1 is contradictory and needs resolution

The seed timing contradiction (A-1) is a plan consistency error that will cause confusion during implementation. A-2 through A-5 are addressable during implementation without plan revision, but A-1 needs explicit resolution in the plan document.
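
For A-5, the recommended environment lookup can be sketched as follows. `BACKEND_URL`, the two URLs, and the reset endpoint path come from the findings above; the helper names and the trailing-slash handling are illustrative assumptions.

```typescript
// Hypothetical helpers; the environment is passed in so the logic is easy to test.
type Env = Record<string, string | undefined>;

function resolveBackendUrl(env: Env): string {
  // Inside Docker the compose file would set BACKEND_URL=http://backend:8080.
  const raw = env.BACKEND_URL ?? "http://localhost:8080";
  return raw.replace(/\/+$/, ""); // drop trailing slashes so paths join cleanly
}

function endpoint(env: Env, path: string): string {
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
  return `${resolveBackendUrl(env)}${normalizedPath}`;
}
```

An API client would call `endpoint(process.env, "/api/v1/test/reset-db")` and get the service-name URL inside the container and the localhost URL on a developer machine.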

🛡️ Expert 3: Risk & Reliability Expert (Test Reliability, CI Flakiness, Coverage)

Assessment: ⚠️ Acceptable — manageable risks

Strengths

"Never use waitForTimeout" — Explicitly stated in Section 4.3. This is the #1 rule for non-flaky Playwright tests.
Trace collection on first retry — Correct debugging strategy for CI.
Liberal timeouts for Docker networking — 90s test / 60s navigation accounts for cold-start Docker overhead.
DB verification via API — Tests don't solely rely on UI state, which is inherently more fragile.
Test ordering strategy — Read-only tests first, CRUD tests last. Reduces state pollution.

Findings

#	Severity	Finding	Recommendation
R-1	⚠️ Warning	60+ tests in <5 minutes is ambitious. Each test navigates, waits for API response, performs assertions. With Docker networking latency, expect 3-8 seconds per test. 60 tests × 5s average = 5 minutes. CRUD tests that submit forms and wait for toasts will be slower. Real expectation: 6-8 minutes.	Adjust success criterion to "< 8 minutes" or reduce test count per spec. Alternatively, enable parallel test execution for read-only specs (they don't mutate state).
R-2	⚠️ Warning	Suite-level reset means CRUD test failures corrupt state for subsequent tests. If test 3.2#3 (create member) fails mid-way (e.g., form submitted but toast timeout), the DB now has a partial member. All subsequent tests that count rows will fail with confusing "expected 5, got 6" errors.	Add a "CRUD section reset" mechanism: before each CRUD-heavy describe block, call `apiClient.resetDb()`. Or structure specs so each CRUD test verifies against its own created data, not against absolute counts.
R-3	⚠️ Warning	No mention of `data-testid` strategy. Section 9 lists this as an "open question" but it's actually critical for test reliability. CSS selectors and text-based selectors (`"5 Mitglieder"`) are brittle — a translation change, number format change, or design refactor breaks tests.	Decide NOW: use `data-testid` attributes on all interactive elements and key display elements. This is not optional for a 60+ test suite — it's a prerequisite.
R-4	⚠️ Warning	Hardcoded expected values in tests. Many assertions reference specific values: "5 Mitglieder", "Northern Lights 18.5% THC", "30€". If the seed data changes (even a typo fix), these tests break.	Create a `seed-constants.ts` file that exports expected values derived from a single source of truth. Tests import from this file. When seed changes, update one file.
R-5	ℹ️ Info	No retry strategy for the DB reset endpoint. If the `POST /api/v1/test/reset-db` call fails or times out (backend GC pause, connection pool exhaustion), the entire suite fails.	Add retry logic (3 attempts, 2s backoff) in the global-setup for the DB reset call.
R-6	ℹ️ Info	Monthly quota seed uses fixed year/month (2024/12). Tests checking quota display may show "no quota for current month" because the seed references December 2024, not the current month.	Use dynamic month in seed OR test assertions should navigate to the historical period, not assume "current month" view shows seed data.
R-7	ℹ️ Info	No parallel execution plan for read-only tests. The plan states `fullyParallel: false` globally, but read-only tests (all #1 and #2 cases per spec) could safely run in parallel since they don't mutate state. This would cut execution time by 30-40%.	Consider splitting into two Playwright projects: `integration-read` (parallel) and `integration-write` (serial).
R-8	✅ Good	Artifact collection strategy is comprehensive — screenshots, traces, backend logs, HTML report. This is sufficient for debugging CI failures.
R-9	✅ Good	CI timeout of 10 minutes — Appropriate safety margin above the expected 5-8 minute runtime.
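
The runtime arithmetic behind R-1 can be written out as a small worked example. The test count and per-test durations are the figures quoted in the finding; the function names are made up for this sketch, and the model assumes strictly serial execution with no fixed startup cost.

```typescript
// Back-of-the-envelope runtime estimate for a serial suite.
function estimateSuiteSeconds(testCount: number, secondsPerTest: number): number {
  return testCount * secondsPerTest;
}

function toMinutes(seconds: number): number {
  return seconds / 60;
}

// 60 tests at 3s, 5s, and 8s per test.
const bestCaseMinutes = toMinutes(estimateSuiteSeconds(60, 3));  // 3
const averageMinutes = toMinutes(estimateSuiteSeconds(60, 5));   // 5
const worstCaseMinutes = toMinutes(estimateSuiteSeconds(60, 8)); // 8
```

The 5-minute target therefore sits at the average case, with no headroom for slower CRUD tests.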

Reliability Verdict: ⚠️ ACCEPTABLE — R-2 and R-3 are the main risks

The plan will produce working tests, but without data-testid (R-3) and with suite-level-only reset (R-2), expect flaky and cascading failures to consume roughly 15-25% of test development time within the first 3 months. Both risks are addressable during implementation without plan revision.
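
The retry recommended in R-5 (3 attempts, 2s backoff) might be sketched like this. `withRetry` and its parameters are hypothetical; `action` stands in for whatever function issues the `POST /api/v1/test/reset-db` call, and the injectable `sleep` exists only so the wrapper can be tested without waiting.

```typescript
// Hypothetical retry wrapper with a fixed backoff between attempts.
async function withRetry<T>(
  action: () => Promise<T>,
  attempts: number = 3,
  backoffMs: number = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt follows.
      if (attempt < attempts) await sleep(backoffMs);
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}
```

In a global setup, the reset call would be wrapped as `await withRetry(() => apiClient.resetDb())`, so a single slow response does not fail the whole suite.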

Panel Synthesis

Confidence Scores

Expert	Confidence	Reasoning
🏛️ Domain	82%	Core regulatory paths covered; 2 missing edge cases (daily limit, THC% limit for under-21) are non-blocking
🔧 Architecture	65%	Seed timing contradiction (A-1) is a consistency error that will cause implementation confusion
🛡️ Reliability	75%	Fundamentally sound approach but `data-testid` decision and reset granularity need resolution

Overall Panel Confidence: 74%

Combined Findings by Severity

❌ Blocking (1)

ID	Expert	Finding
A-1	🔧 Architecture	Seed timing contradiction: docker-compose.test.yml still mounts `init.sql` to `docker-entrypoint-initdb.d` while recommending Flyway `R__` migration. These are mutually exclusive approaches — plan must pick one and remove the other.

⚠️ Warnings (9)

ID	Expert	Finding
D-1	🏛️ Domain	No adult daily quota limit (25g/day) test
D-2	🏛️ Domain	No THC% limit test for under-21 members
A-2	🔧 Architecture	tmpfs may fail on Docker Desktop macOS
A-3	🔧 Architecture	`seed` container removal not explicitly reflected in compose
A-4	🔧 Architecture	Playwright container needs `pnpm install` before tests
R-1	🛡️ Reliability	5-minute target unrealistic for 60+ tests with Docker overhead
R-2	🛡️ Reliability	Suite-level reset causes cascading failures on CRUD test errors
R-3	🛡️ Reliability	`data-testid` strategy must be decided before implementation
R-4	🛡️ Reliability	Hardcoded expected values; use seed-constants.ts

ℹ️ Info (6)

ID	Expert	Finding
D-3	🏛️ Domain	Fixed-date compliance deadlines may drift
D-4	🏛️ Domain	`recorded_by` should reference admin, not member
A-5	🔧 Architecture	No BACKEND_URL env for API client inside Docker
R-5	🛡️ Reliability	No retry on DB reset endpoint
R-6	🛡️ Reliability	Monthly quota seed uses fixed 2024/12, not current month
R-7	🛡️ Reliability	Read-only tests could run in parallel for speed

Panel Verdict

🔄 REVISE — 1 blocking finding must be resolved in the plan

Required before implementation:

1. Resolve A-1: Remove the docker-entrypoint-initdb.d volume mount from the proposed docker-compose.test.yml changes. Make it unambiguous that seed data flows through R__seed_test_data.sql via Flyway only.

Strongly recommended (can be addressed during implementation):

2. Resolve R-3: Commit to data-testid attributes as the selector strategy. Add a section "Selector Strategy" to the plan.
3. Add D-1 and D-2 test cases to the Distributions spec (regulatory completeness).
4. Add pnpm install to the playwright service command (A-4).

Nice-to-have (implementation decisions):

5. Adjust execution time target from 5 min to 8 min (R-1).
6. Add BACKEND_URL environment variable for API client (A-5).
7. Create seed-constants.ts for maintainable assertions (R-4).

Panel review completed 2026-06-18 by Plan Reviewer mode.

v2 Re-Review (2026-06-18)

Reviewer: Roo (Plan Reviewer)
Document reviewed: docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md (v2)
Purpose: Verify all v1 findings have been properly addressed

❌ Blocker Resolution

ID	v1 Finding	v2 Status	Evidence
A-1	Seed timing contradiction: docker-compose mounts `init.sql` to `docker-entrypoint-initdb.d` while recommending Flyway `R__` migration	✅ Resolved	Section 1.3 explicitly declares "Decision (v2): Flyway-only seeding — NO Docker docker-entrypoint-initdb.d mount." Section 5.1 `docker-compose.test.yml` has NO volume mount on the `db` service, with comment confirming intent. The contradiction is eliminated.

⚠️ Warning Resolution

ID	Expert	v1 Finding	v2 Status	Evidence
D-1	🏛️	No adult daily quota limit (25g/day) test	✅ Resolved	New `13-kcang-regulatory.spec.ts` — test #1 (26g → rejection), test #6 (23g+3g exceeds daily), test #7 (23g+2g = exactly 25g → success). Thorough coverage of boundary.
D-2	🏛️	No THC% limit test for under-21 members	✅ Resolved	KCanG spec tests #3 (22% THC to U21 → rejection), #4 (0.5% THC to U21 → success), #5 (30g+ low-THC to U21 → quota error), #9 (UI shows "max. 10% THC" notice). Excellent coverage.
A-2	🔧	tmpfs may fail on Docker Desktop macOS; network aliases unclear	⚠️ Partially	Section 5.4 clearly documents Docker service-name networking (no custom aliases needed). However, `tmpfs` is still unconditional — no CI-only gating. Risk is LOW: CI runs on `ubuntu-latest` (native Linux), and local devs can override with `docker compose -f docker-compose.yml -f docker-compose.test.yml up`. Acceptable for implementation.
A-3	🔧	`seed` container removal not explicitly reflected in compose	✅ Resolved	Section 5.1 `docker-compose.test.yml` defines exactly 4 services: `db`, `backend`, `frontend`, `playwright`. No `seed` service exists. `playwright` depends on `backend: condition: service_healthy`. Clean.
A-4	🔧	Playwright container needs `pnpm install` before tests	✅ Resolved	Section 5.2 introduces `Dockerfile.playwright` with `pnpm install --frozen-lockfile` at build time. Dependencies are pre-installed in the image.
R-1	🛡️	5-minute target unrealistic for 60+ tests with Docker overhead	⚠️ Partially	Plan now has per-test reset (~500ms × 70 = 35s overhead), making 5 min MORE ambitious. CI timeout is 10 min (appropriate). Test tagging splits `@smoke` (PR, <2 min) from `@full` (merge, 10 min). The Section 7 success criterion still says "< 5 minutes total"; this is optimistic, but the CI timeout and tagging strategy make it non-blocking.
R-2	🛡️	Suite-level reset causes cascading failures on CRUD test errors	✅ Resolved	Section 2.1 switches to per-test reset: "Decision (v2): Per-test reset via backend API endpoint + `beforeEach` hook." Each test calls `apiClient.resetDb()`, giving complete state isolation between tests.
R-3	🛡️	`data-testid` strategy must be decided before implementation	✅ Resolved	Section 2.2 commits to `data-testid` as mandatory. Naming convention defined (`<page>-<component>-<identifier>`), `selectors.ts` centralized file shown, implementation tracked as Phase 2C sub-task.
R-4	🛡️	Hardcoded expected values; use seed-constants.ts	⚠️ Partially	No explicit `seed-constants.ts` file created, but deterministic UUIDs (Section 1.4 #1) + per-test reset ensure assertions are stable. The `selectors.ts` centralizes locators but not data values. Acceptable: can be added during implementation when duplication becomes apparent.

ℹ️ Info Finding Resolution

ID	Expert	v1 Finding	v2 Status	Evidence
D-3	🏛️	Fixed-date compliance deadlines may drift	✅ Resolved	Section 1.4 principle #3: "Use relative dates where possible (`NOW() - INTERVAL '7 days'`) for time-sensitive tests". Per-test reset re-runs seed each time, keeping relative dates fresh.
D-4	🏛️	`recorded_by` should reference admin, not member	ℹ️ Noted	Not explicitly addressed in v2; a minor realism improvement that can be done during seed implementation.
A-5	🔧	No BACKEND_URL env for API client inside Docker	✅ Resolved	Section 5.1 playwright service has `API_URL: http://backend:8080`. Section 2.7 global-setup uses this for health checks and API client initialization.
R-5	🛡️	No retry on DB reset endpoint	ℹ️ Noted	No explicit retry logic shown for `resetDb()`. Implementation detail — the global-setup health check ensures backend is ready before any reset calls. Low risk.
R-6	🛡️	Monthly quota seed uses fixed 2024/12, not current month	✅ Resolved	Relative dates principle + Section 4.2 test tagging strategy (`@smoke`/`@full` with `--grep`). Seed data for quotas will use relative dates per principle #3.
R-7	🛡️	Read-only tests could run in parallel for speed	ℹ️ Noted	Not adopted in v2. Per-test reset makes parallelism less critical (each test is independent). Can be optimized later if execution time becomes a concern.

New Findings in v2

ID	Severity	Finding	Impact
v2-1	ℹ️ Info	Volume mount + Dockerfile build overlap. Section 5.1 mounts `./cannamanage-frontend/e2e:/app/e2e:ro` into the playwright container which also COPYs `e2e/` in `Dockerfile.playwright`. The volume mount overrides the built-in files at runtime.	This is an intentional Docker pattern (enables test iteration without rebuild) but should be documented to avoid confusion. Non-blocking.
v2-2	ℹ️ Info	Success criterion "< 5 minutes total" vs per-test reset overhead. 70+ tests × ~500ms reset + test execution (3-8s each) = realistic estimate is 6-8 minutes for `@full` suite. CI timeout of 10 min is correct, but the stated target is optimistic.	Cosmetic — CI timeout is appropriate. Adjust success criterion to "< 8 minutes" post-implementation based on actual measurements.
v2-3	ℹ️ Info	`Dockerfile.playwright` pinned to `v1.49.0` — this should match the Playwright version in `package.json` to avoid browser/API version mismatches.	Document: keep Dockerfile image version in sync with `@playwright/test` version in `package.json`.
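
The version-sync rule from v2-3 lends itself to an automated check. The sketch below is one possible shape: the function names are hypothetical, and it compares only the version declared in `package.json`, not the version actually installed from the lockfile.

```typescript
// Hypothetical CI check: compare the Playwright image tag with the
// @playwright/test version string from package.json.
function extractVersion(text: string): string | null {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  return match ? match[0] : null;
}

function playwrightVersionsMatch(imageTag: string, packageRange: string): boolean {
  const image = extractVersion(imageTag);
  const pkg = extractVersion(packageRange);
  // Both must contain a semantic version and the versions must be identical.
  return image !== null && image === pkg;
}
```

Because a caret range such as `^1.49.0` can resolve to a newer release, pinning the exact version in `package.json` or reading the lockfile would make this check stricter.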

Updated Confidence Scores

Expert	v1 Confidence	v2 Confidence	Change	Reasoning
🏛️ Domain	82%	95%	+13	D-1 and D-2 fully resolved with 9 comprehensive KCanG test cases. Regulatory coverage is now excellent.
🔧 Architecture	65%	92%	+27	A-1 blocker completely eliminated. Dockerfile.playwright, API_URL, health checks — all addressed. Only tmpfs (A-2) remains partially open but low-risk.
🛡️ Reliability	75%	90%	+15	Per-test reset (R-2) and data-testid commitment (R-3) are the two highest-impact improvements. Execution time target (R-1) is slightly optimistic but non-blocking.

Overall Panel Confidence: 92% (v1: 74%, Δ: +18)

Panel Verdict (v2)

✅ APPROVED — Plan is ready for implementation

All blocking findings resolved. All critical warnings addressed. Remaining items (A-2 tmpfs conditional, R-1 time target, R-4 seed-constants.ts) are implementation-time decisions that don't require plan revision.

Key improvements in v2:

Seed timing contradiction eliminated — Flyway-only, no Docker init.sql
Per-test DB reset — complete test isolation via beforeEach + API endpoint
data-testid strategy committed — naming convention, selectors.ts, mandatory for Phase 2C
KCanG regulatory spec added — 9 comprehensive edge case tests covering §3 Abs. 1 + Abs. 2
Dedicated Playwright Dockerfile with pre-installed dependencies
Clear Docker networking documentation and health check strategy
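
The `<page>-<component>-<identifier>` naming convention for `data-testid` values could be enforced with a small builder, sketched below. The convention itself is from the plan as summarized in this review; `testId`, `byTestId`, and the two example entries are illustrative assumptions and not the contents of the actual `selectors.ts`.

```typescript
// Hypothetical builder for the <page>-<component>-<identifier> convention.
function testId(page: string, component: string, identifier: string): string {
  const parts = [page, component, identifier].map((p) =>
    p
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
      .replace(/^-+|-+$/g, ""),    // strip leading and trailing dashes
  );
  if (parts.some((p) => p.length === 0)) {
    throw new Error("testId: every segment must be non-empty");
  }
  return parts.join("-");
}

// CSS attribute selector usable with any DOM query API.
function byTestId(id: string): string {
  return `[data-testid="${id}"]`;
}

// Example entries of a centralized selectors file (names are illustrative).
const selectors = {
  membersTableRow: byTestId(testId("members", "table", "row")),
  distributionsFormSubmit: byTestId(testId("distributions", "form", "submit")),
} as const;
```

In Playwright specs the raw id can also be passed to `page.getByTestId`, which looks up the `data-testid` attribute by default.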

Recommendation: Proceed to implementation. The 3 ℹ️ info items (v2-1, v2-2, v2-3) can be addressed during Phase 2C/2D without plan revision.

v2 Re-review completed 2026-06-18 by Plan Reviewer mode.

v3 Final Review (2026-06-18)

Reviewer: Roo (Plan Reviewer)
Document reviewed: docs/sprint-12/cannamanage-sprint12-phase2-integration-tests.md (v3, final revision)
Purpose: Final gate review before implementation begins. All v2 partial/info items should now be fully resolved.

🏛️ Expert 1: Domain Expert (KCanG Regulatory Compliance)

Assessment: ✅ Excellent — No remaining gaps

#	Check	Verdict	Evidence
D-1	Adult daily 25g limit tested	✅	KCanG spec tests #1, #6, #7 — boundary cases (26g reject, 23+3 reject, 23+2 pass)
D-2	Under-21 THC% limit tested	✅	KCanG spec tests #3, #4, #5, #9 — Amnesia Haze 22% → reject, CBD Critical Mass 0.5% → pass, UI notice
D-3	Compliance deadlines use relative dates	✅	Section 1.4 principle #3: `NOW() - INTERVAL` for time-sensitive tests, per-test reset keeps fresh
D-4	`recorded_by` references admin UUID	✅	Section 1.2: explicitly states "v3: `recorded_by` = admin UUID `b1000000-...001`, not member UUID"
D-5	Monthly quota limits differentiated (50g adult vs 30g U21)	✅	`seed-constants.ts` exports `KCANG.ADULT_MONTHLY_LIMIT_G: 50` and `UNDER21_MONTHLY_LIMIT_G: 30`
D-6	Seed covers all regulatory-critical entity types	✅	Destruction records, compliance deadlines, distribution audit trail, grow lifecycle all seeded

New observation (v3):

#	Severity	Finding	Impact
D-7	✅ Good	`seed-constants.ts` exports KCanG limits as constants. Tests import `KCANG.ADULT_DAILY_LIMIT_G` etc. — if regulations change (unlikely short-term, but possible with KCanG amendments), there is a single point of update.	Excellent maintainability for regulatory compliance.
D-8	ℹ️ Info	No dedicated "combined monthly" test at the adult 50g boundary. Tests cover the 25g daily limit and the under-21 30g monthly limit; the adult 50g monthly ceiling is exercised only implicitly by KCanG spec test #2 (45g + 6g = 51g, which exceeds the monthly limit).	Covered via test #2; no action needed.

Domain Verdict: ✅ APPROVED — confidence 97%

All KCanG regulatory paths are comprehensively tested. The seed-constants.ts file makes regulatory limit changes trivial to propagate. The recorded_by fix ensures audit trail realism. No remaining domain gaps.
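
For orientation, the KCanG portion of `seed-constants.ts` might look roughly like this. `ADULT_DAILY_LIMIT_G`, `ADULT_MONTHLY_LIMIT_G`, and `UNDER21_MONTHLY_LIMIT_G` are named in this review; `UNDER21_MAX_THC_PERCENT` and the `adultDailyBoundary` helper are assumed names, and the real file is described as exporting these values rather than declaring them locally.

```typescript
// Sketch of the KCANG block. In the real file these would be exported.
const KCANG = {
  ADULT_DAILY_LIMIT_G: 25,
  ADULT_MONTHLY_LIMIT_G: 50,
  UNDER21_MONTHLY_LIMIT_G: 30,
  UNDER21_MAX_THC_PERCENT: 10, // assumed name for the 10% THC limit
} as const;

// Boundary inputs derived from the constants instead of literals in specs.
const adultDailyBoundary = {
  atLimit: KCANG.ADULT_DAILY_LIMIT_G,       // expected to succeed
  overLimit: KCANG.ADULT_DAILY_LIMIT_G + 1, // expected to be rejected
};
```

Deriving `overLimit` this way reproduces the 26g rejection case of KCanG spec test #1 without writing the number 26 into a spec file.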

🔧 Expert 2: Architecture Expert (Next.js + Spring Boot + Playwright + Docker)

Assessment: ✅ Sound — All concerns resolved

#	v2 Finding	v3 Resolution	Verdict
A-1	Seed timing contradiction	✅ Eliminated in v2 (Flyway-only, confirmed in v3)	✅
A-2	tmpfs macOS issues	✅ `docker-compose.test.local.yml` override with named volume (Section 5.5)	✅
A-3	Seed container removal	✅ Only 4 services in compose: `db`, `backend`, `frontend`, `playwright`	✅
A-4	Playwright needs pnpm install	✅ `Dockerfile.playwright` with `pnpm install --frozen-lockfile` at build time	✅
A-5	No BACKEND_URL for API client	✅ `API_URL: http://backend:8080` in playwright service env	✅
v2-1	Volume + Dockerfile overlap confusion	✅ Explicit comment in Section 5.1 explaining the intentional pattern	✅
v2-3	Playwright version pinning	✅ Bold warning in Section 5.2: must match `@playwright/test` in package.json	✅

Architecture analysis of v3 additions:

#	Severity	Finding	Impact
A-8	✅ Good	`docker-compose.test.local.yml` is a clean override pattern. Using compose file stacking (`-f base -f override`) is the Docker-native way to handle env-specific differences. No conditional logic in the base file — declarative and predictable.	Well-architected.
A-9	✅ Good	`seed-constants.ts` placement at `cannamanage-frontend/e2e/seed-constants.ts`. Correctly lives in the test scope, not in application code. Imported by specs via relative path. No runtime dependency.	Clean separation of concerns.
A-10	ℹ️ Info	No automated enforcement of `seed-constants.ts` usage. The "Rule: never hardcode seed values in specs" is documented but not lint-enforced. A custom ESLint rule (e.g., forbid UUID/number literals in spec files) could catch violations during PR review.	Low priority — PR review process is sufficient for a team of this size. Future improvement.
A-11	ℹ️ Info	`R__seed_test_data.sql` checksum behavior. Flyway repeatable migrations re-execute ONLY when the file checksum changes. If a developer adds seed data but forgets to update `seed-constants.ts`, tests will pass locally (fresh DB) but may cause confusion on the next run.	Documented in Section 8 risks. Acceptable.

Architecture Verdict: ✅ APPROVED — confidence 95%

All v1/v2 architectural concerns are fully resolved. The tmpfs override pattern is clean. The Dockerfile.playwright with version pinning is well-documented. Docker networking is clear. The volume mount explanation in v3 eliminates the last source of implementation confusion.

🛡️ Expert 3: Risk & Reliability Expert (Test Reliability, CI Flakiness, Maintenance)

Assessment: ✅ Solid — all high-risk items resolved

#	v2 Finding	v3 Resolution	Verdict
R-1	5-minute target unrealistic	✅ Changed to "< 8 minutes for @full, < 2 minutes for @smoke" (Section 7)	✅
R-2	Suite-level reset causes cascading failures	✅ Resolved in v2 (per-test reset via beforeEach)	✅
R-3	data-testid strategy undecided	✅ Resolved in v2 (mandatory, naming convention, selectors.ts)	✅
R-4	Hardcoded expected values	✅ Complete `seed-constants.ts` with UUIDs, member data, KCanG limits, counts (Section 2.8)	✅
R-5	No retry on DB reset endpoint	ℹ️ Not explicitly added, but global-setup health check warms backend + pool before tests	Acceptable
R-6	Fixed year/month in quota seed	✅ Relative dates principle (Section 1.4 #3) + per-test reset re-runs seed	✅
R-7	No parallel execution for read-only tests	ℹ️ Not adopted — per-test reset makes it less critical	Acceptable

Reliability analysis of v3 additions:

#	Severity	Finding	Impact
R-10	✅ Good	`seed-constants.ts` is comprehensive. Covers UUIDs, member metadata (`isUnder21`, `quotaUsedG`), strain THC%, KCanG limits, counts for every entity type. 80+ exported constants.	Single source of truth dramatically reduces maintenance burden from spec-level hardcoding.
R-11	✅ Good	Time targets are now realistic. 8-minute full suite with 10-minute CI timeout gives 25% safety margin. 2-minute smoke suite is achievable with ~15 tagged tests at 5-8s each.	Prevents false-failure CI red due to optimistic expectations.
R-12	✅ Good	tmpfs override is opt-in, not opt-out. Developers only use the local override if they experience issues. Default (tmpfs) works on CI and most macOS Docker Desktop versions. No "if CI then X else Y" conditional complexity.	Simple mental model for developers.
R-13	ℹ️ Info	seed-constants.ts → R__seed_test_data.sql synchronization is manual. When someone changes the SQL seed, they must also update the TypeScript constants file. There's no automated check that these stay in sync.	Acceptable risk for a small team. Could add a CI check later (parse SQL → generate constants → compare).
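
The CI check suggested in R-13 could start as a simple text comparison rather than full SQL parsing. The sketch below checks that every UUID in the constants appears in the seed SQL and vice versa; all names are hypothetical, and the approach is a heuristic scan that would not catch drift in non-UUID values such as counts or THC percentages.

```typescript
// Hypothetical sync check between the seed SQL and the TypeScript constants.
const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

function extractUuids(text: string): Set<string> {
  const matches = text.match(UUID_RE) ?? [];
  return new Set(matches.map((u) => u.toLowerCase()));
}

interface SyncReport {
  missingInSql: string[];       // in the constants, absent from the SQL
  missingInConstants: string[]; // in the SQL, absent from the constants
}

function compareSeedUuids(sql: string, constantUuids: string[]): SyncReport {
  const inSql = extractUuids(sql);
  const inConstants = new Set(constantUuids.map((u) => u.toLowerCase()));
  return {
    missingInSql: Array.from(inConstants).filter((u) => !inSql.has(u)).sort(),
    missingInConstants: Array.from(inSql).filter((u) => !inConstants.has(u)).sort(),
  };
}
```

A CI step would read both files, call `compareSeedUuids`, and fail the build when either list is non-empty.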

Flakiness risk assessment (final):

Risk Factor	v1 Score	v3 Score	Change
State pollution between tests	🔴 High	🟢 Low	Per-test reset eliminates
Selector brittleness	🔴 High	🟢 Low	data-testid mandatory
Hardcoded assertion values	🟡 Medium	🟢 Low	seed-constants.ts
Time-dependent test data	🟡 Medium	🟢 Low	Relative dates in seed
Docker networking flakiness	🟡 Medium	🟢 Low	Health checks + liberal timeouts
CI execution time pressure	🟡 Medium	🟢 Low	Realistic 8-min target + 10-min timeout
macOS local dev issues	🟡 Medium	🟢 Low	docker-compose.test.local.yml override

Estimated maintenance burden: < 5% of test development time (was 15-25% estimated in v1 review).

Reliability Verdict: ✅ APPROVED — confidence 94%

All high-risk flakiness vectors have been systematically addressed. The seed-constants.ts file is the most impactful v3 addition from a reliability perspective — it eliminates the "change seed, break 40 tests" failure mode. The realistic time targets prevent CI false-reds. The per-test reset (from v2) provides complete test isolation.

Panel Synthesis (v3 Final)

Confidence Scores

Expert	v1	v2	v3	Reasoning
🏛️ Domain	82%	95%	97%	`recorded_by` fix + KCanG constants in `seed-constants.ts` close the last gaps
🔧 Architecture	65%	92%	95%	tmpfs override + volume explanation + version pinning = all concerns fully resolved
🛡️ Reliability	75%	90%	94%	`seed-constants.ts` + realistic time targets eliminate remaining medium-risk items

Overall Panel Confidence: 95% (v1: 74% → v2: 92% → v3: 95%)
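
The overall figures in all three review cycles are consistent with a rounded arithmetic mean of the three expert scores. The review does not state how the overall value was derived, so the formula below is an inference, shown only as a worked check.

```typescript
// Assumption: overall confidence = rounded mean of the expert scores.
function overallConfidence(scores: number[]): number {
  const sum = scores.reduce((acc, s) => acc + s, 0);
  return Math.round(sum / scores.length);
}

const v1Overall = overallConfidence([82, 65, 75]); // 222 / 3 = 74
const v2Overall = overallConfidence([95, 92, 90]); // 277 / 3 ≈ 92.33, rounds to 92
const v3Overall = overallConfidence([97, 95, 94]); // 286 / 3 ≈ 95.33, rounds to 95
```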

Remaining Items (all ℹ️ Info — none blocking)

ID	Expert	Finding	Priority
A-10	🔧	No lint enforcement of seed-constants usage	Future improvement
A-11	🔧	`R__seed_test_data.sql` re-executes only when its checksum changes	Documented in Section 8 risks
R-5	🛡️	No explicit retry on DB reset endpoint	Low risk, health check mitigates
R-13	🛡️	seed-constants.ts sync with SQL is manual	Can add CI check post-implementation

None of these require plan revision. All are implementation-time decisions or future improvements.

Panel Verdict (v3 Final)

✅ APPROVED — Plan is complete, correct, and ready for implementation

GO recommendation: Proceed immediately to Phase 2A.

All 3 experts approve without blocking findings. The plan has matured through 3 iterations:

Version	Verdict	Blockers	Confidence
v1	🔄 REVISE	1 (seed timing)	74%
v2	✅ APPROVED	0	92%
v3	✅ APPROVED	0	95%

v3 specifically resolved:

✅ tmpfs conditional — docker-compose.test.local.yml override (clean, declarative)
✅ Realistic time targets — @full < 8 min, @smoke < 2 min (was "<5 min")
✅ seed-constants.ts — 80+ exported constants, single source of truth for all test assertions
✅ recorded_by fix — admin UUID for audit trail realism
✅ Volume overlap documentation — intentional pattern explained inline
✅ Version pinning — Playwright Docker image ↔ package.json sync rule with bold warning

Quality assessment: This is a production-grade integration test plan. The architecture (Flyway seed → per-test reset → data-testid selectors → seed-constants.ts) forms a cohesive, maintainable system. The phased implementation (2A→2E over 4 days) is realistic and correctly ordered.

No further revision needed. Implementation can begin.

v3 Final panel review completed 2026-06-18 by Plan Reviewer mode.
