Files
cannamanage/docs/sprint-10/cannamanage-sprint10-analysis.md
Patrick Plate 55110c95af feat(sprint10): Phase 1 — Data model + bank statement parsers (MT940, CAMT.053, CSV)
Implements the Sprint 10 Phase 1 foundation for the Smart Payment Import feature:

Domain layer:
- 3 new enums: BankFormat (MT940, CAMT053, CSV), ImportSessionStatus, MatchStatus
- StaffPermission.FINANCE_IMPORT
- AuditEventType: BANK_IMPORT_STARTED/COMPLETED/FAILED + BANK_PAYMENT_CONFIRMED
- NotificationType.BANK_IMPORT_COMPLETED
- ConsentType.BANK_DATA (DSGVO consent for IBAN storage)
- 3 new entities: BankImportSession, BankTransaction, CsvColumnMapping
- Member: + iban (VARCHAR 34) + ibanConsentDate
- MemberStatus.LEFT (semantic alias for RESIGNED, referenced by Sprint 9 RetentionService)

Persistence:
- V30__bank_import_sessions.sql
- V31__bank_transactions.sql
- V32__csv_column_mappings.sql (also adds iban + iban_consent_date to members)
- 3 Spring Data repositories

Parser infrastructure (cannamanage-service/src/main/java/de/cannamanage/service/bankimport):
- BankStatementParser interface (Strategy pattern, Spring-injected list)
- ParsedTransaction + ParseResult records
- BankStatementParseException (parse errors)
- Mt940Parser: custom state machine, CENTURY_BOUNDARY=70 for YY→YYYY, proprietary
  header tolerance (skips lines before first :20: for StarMoney/WISO/Hibiscus wrappers)
- Camt053Parser: StAX streaming with XXE hardening (IS_SUPPORTING_EXTERNAL_ENTITIES,
  SUPPORT_DTD, IS_REPLACING_ENTITY_REFERENCES all false); supports camt.053.001.02
  and camt.053.001.08 namespaces
- CsvBankParser: Apache Commons CSV with configurable columns per club; German number
  format ("1.234,56"); ISO-8859-1 default encoding
- BankStatementParserService: filename-extension hint + content probe; throws
  UnrecognizedFormatException when no parser claims the file

Build verified via Docker (cannamanage-api:sprint10-phase1).

Sprint 9 fix (incidental, required to compile):
- Added MemberStatus.LEFT (Sprint 9 RetentionService referenced it but the enum
  value was missing)
- MemberListRegistryGenerator: added LEFT to formatStatus() switch (mapped to
  "Ausgetreten", same as RESIGNED)

Sprint 10 docs: analysis, plan, plan-review, testplan.

Co-Authored-By: Lumen <lumen@cannamanage.de>
2026-06-15 17:21:55 +02:00

14 KiB
Raw Permalink Blame History

Sprint 10 Analysis — Smart Payment Import (Kontoauszug-Abgleich)

Date: 2026-06-15 Author: Patrick Plate / Lumen (Architect) Status: Draft v1 Sprint Goal: Automated bank statement import with intelligent payment matching


1. Problem Statement

Cannabis clubs (Anbauvereinigungen) collect monthly/quarterly member fees. Currently, the admin must manually record each payment in the system — comparing bank statements line-by-line against expected member fees. For a club with 200 members making monthly payments, this means 200+ manual Payment entries per month.

Pain points:

  • Time-consuming: ~2-4 hours/month for 200-member clubs
  • Error-prone: manual transcription leads to mismatches
  • Delayed bookkeeping: payments sit unrecorded for days/weeks
  • No audit trail of the import source (original bank file)
  • Difficult to identify missing payments (overdue detection lags)

Opportunity: German banks universally support MT940 (SWIFT standard) and increasingly CAMT.053 (ISO 20022 XML) for electronic bank statement export. Automating the import and matching process reduces the workload from hours to minutes.


2. Affected Components

Existing (Sprint 8 — Finance Module)

Component Path Role
FinanceService cannamanage-service/.../service/FinanceService.java Payment recording, ledger management
Payment entity cannamanage-domain/.../entity/Payment.java Payment records with member + period
LedgerEntry entity cannamanage-domain/.../entity/LedgerEntry.java Double-entry financial journal
FeeSchedule cannamanage-domain/.../entity/FeeSchedule.java Fee definitions (amount, interval)
MemberFeeAssignment cannamanage-domain/.../entity/MemberFeeAssignment.java Member ↔ Fee schedule link
PaymentMethod enum cannamanage-domain/.../enums/PaymentMethod.java CASH, BANK_TRANSFER, SEPA_LASTSCHRIFT, OTHER
finance.ts service cannamanage-frontend/src/services/finance.ts Frontend API hooks for finance
Finance pages cannamanage-frontend/src/app/.../finance/ Admin finance management UI

New (Sprint 10)

Component Purpose
BankImportSession entity Tracks each uploaded file + processing status
BankTransaction entity Individual parsed transactions from bank file
CsvColumnMapping entity Saved CSV column templates per bank
BankStatementParserService Format detection + delegation to format-specific parsers
Mt940Parser SWIFT MT940 text format parser
Camt053Parser ISO 20022 XML parser
CsvBankParser Configurable CSV parser
PaymentMatchingService Weighted matching algorithm with confidence scoring
BankImportService Orchestration: upload → parse → match → confirm
Import UI wizard 4-step frontend wizard for the import flow

3. Current State (Ist-Zustand)

Finance Module (Sprint 8)

The finance module already supports:

  • Fee Schedules: Define fee amounts + intervals (MONTHLY, QUARTERLY, YEARLY, ONE_TIME)
  • Member Fee Assignments: Link members to fee schedules with validity periods
  • Payment Recording: Manual payment entry with member, amount, method, period, reference
  • Ledger Entries: Automatic INCOME entry on payment, EXPENSE entries for club costs
  • Member Balance: Calculated outstanding amounts per member
  • Notifications: PAYMENT_REMINDER, PAYMENT_OVERDUE, PAYMENT_RECEIVED types exist
  • Receipts: Auto-generated on payment confirmation

What's Missing

  • No file upload capability for bank statements
  • No MT940/CAMT.053 parsing
  • No automatic matching logic
  • No import session tracking / audit trail
  • No member IBAN storage (for enhanced matching)
  • No CSV column mapping configuration
  • No GDPR consent type for bank data / IBAN storage
  • No bulk payment confirmation workflow

4. Format Specifications

4.1 MT940 (SWIFT Standard)

The most common bank statement export format from German banks (Sparkasse, Volksbank, Deutsche Bank, Commerzbank, ING DiBa, DKB).

Structure:

:20:TRANSACTION_REF        -- Transaction reference number
:25:BLZKTO/IBAN            -- Account identification (BLZ + Account or IBAN)
:28C:STATEMENT/PAGE        -- Statement number / sequence
:60F:D/CYYMMDDCURRENCY AMT -- Opening balance (D=debit, C=credit)
:61:YYMMDDYYMMDDCD AMOUNT  -- Transaction line (value date, booking date, D/C, amount)
:86:PURPOSE TEXT            -- Multi-line Verwendungszweck (up to 6 lines of 27 chars)
:62F:D/CYYMMDDCURRENCY AMT -- Closing balance
:64:D/CYYMMDDCURRENCY AMT  -- Available balance

Key parsing challenges:

  • Multi-line :86: field contains the Verwendungszweck (payment reference)
  • Amount uses comma as decimal separator (German: 1234,56)
  • Date format is YYMMDD (2-digit year)
  • D = debit (outgoing), C = credit (incoming)
  • Some banks prefix IBAN in :25:, others use BLZ + account number
  • Field :61: encodes transaction type codes (N = normal, S = SEPA, etc.)

Example transaction block:

:61:2506150615CR50,00NMSCNONREF
:86:SVWZ+M-0042 Mitgliedsbeitrag Juni
EREF+NOTPROVIDED
KREF+NOTPROVIDED
MREF+NOTPROVIDED
CRED+DE98ZZZ09999999999
DEBT+DE89370400440532013000

4.2 CAMT.053 (ISO 20022 XML)

Newer XML-based format, richer structured data. Increasingly used by German banks alongside MT940.

Structure (simplified):

<BkToCstmrStmt>
  <Stmt>
    <Id>Statement-ID</Id>
    <Acct><Id><IBAN>DE89370400440532013000</IBAN></Id></Acct>
    <Bal><!-- Opening/Closing balances --></Bal>
    <Ntry>
      <Amt Ccy="EUR">50.00</Amt>
      <CdtDbtInd>CRDT</CdtDbtInd>  <!-- CRDT=incoming, DBIT=outgoing -->
      <BookgDt><Dt>2025-06-15</Dt></BookgDt>
      <ValDt><Dt>2025-06-15</Dt></ValDt>
      <NtryDtls>
        <TxDtls>
          <RmtInf>
            <Ustrd>M-0042 Mitgliedsbeitrag Juni</Ustrd>  <!-- Verwendungszweck -->
          </RmtInf>
          <RltdPties>
            <Dbtr><Nm>Max Mustermann</Nm></Dbtr>
            <DbtrAcct><Id><IBAN>DE12345678901234567890</IBAN></Id></DbtrAcct>
          </RltdPties>
        </TxDtls>
      </NtryDtls>
    </Ntry>
  </Stmt>
</BkToCstmrStmt>

Advantages over MT940:

  • Full counterparty IBAN always available
  • Counterparty name in structured field
  • Verwendungszweck clearly separated (Ustrd = unstructured, Strd = structured)
  • ISO date format (YYYY-MM-DD)
  • Amount with dot decimal separator
  • Richer metadata (end-to-end reference, mandate ID for SEPA)

4.3 Generic CSV

Every German bank offers CSV download but with wildly different column layouts:

Bank Date Col Amount Col Reference Col Separator Encoding
Sparkasse 0 4 8 ; ISO-8859-1
ING DiBa 0 7 4 ; ISO-8859-1
DKB 0 7 3 ; ISO-8859-1
Commerzbank 0 4 3 ; UTF-8
Volksbank 0 11 8 ; ISO-8859-1

Common German CSV traits:

  • Semicolon delimiter (not comma)
  • ISO-8859-1 encoding (not UTF-8) for most banks
  • German date format: dd.MM.yyyy
  • German number format: 1.234,56 (dot as thousands separator, comma as decimal)
  • Header rows (1-2 lines to skip)
  • Quote character: " for fields containing semicolons

5. Matching Algorithm Design

5.1 Candidate Generation

For each incoming transaction (credit only — debits are expenses):

  1. Filter members with active fee assignments in the club
  2. Calculate expected payment amount for current/recent periods
  3. Generate candidate pairs: (transaction, member)

5.2 Weighted Scoring

Criterion Weight Score Logic
Amount match 35% 100 if exact match; 80 if within ±5%; 0 otherwise
Verwendungszweck contains member number 30% 100 if "M-XXXX" found; 50 if partial match
Verwendungszweck contains member name 15% 100 if full name match; 70 if last name only; 50 if fuzzy (Levenshtein ≤2)
IBAN match 15% 100 if exact IBAN match (requires stored IBAN)
Date within payment window 5% 100 if within expected period; 50 if ±30 days; 0 otherwise

5.3 Confidence Thresholds

Confidence Classification Action
≥90% AUTO_MATCHED Green badge, ready for bulk confirm
6089% SUGGESTED Yellow badge, needs manual review
<60% UNMATCHED Red badge, manual assignment required

5.4 Conflict Resolution

  • If multiple members match a single transaction: pick highest confidence, mark as SUGGESTED (never AUTO)
  • If a member matches multiple transactions: possible double payment — flag for review
  • Negative amounts (outgoing): skip matching, offer expense categorization

6.1 DSGVO (GDPR)

Aspect Legal Basis Implementation
IBAN storage Art. 6(1)(b) — contract performance + Art. 6(1)(a) — explicit consent New BANK_DATA consent type, opt-in per member
Bank statement data Art. 6(1)(f) — legitimate interest (bookkeeping) Process & match, don't store raw file permanently
Counterparty names Art. 6(1)(f) — legitimate interest Only display during import review, don't persist for non-members
Data minimization Art. 5(1)(c) Delete raw import file after 30 days, keep only parsed transactions
Right to erasure Art. 17 bank_transactions linked to member — cascade on member deletion

6.2 Financial Retention (§147 AO / GoBD)

Data Retention Period Basis
Bank import sessions 10 years §147 Abs. 1 Nr. 5 AO (Buchungsbelege)
Bank transactions (parsed) 10 years §147 Abs. 1 Nr. 5 AO
Confirmed payments 10 years §147 Abs. 1 Nr. 5 AO
Raw import files 30 days (then delete) Data minimization — parsed data is the Beleg
CSV column mappings Until club deletion Configuration data, no retention requirement

6.3 Tier Restrictions

Feature Starter Pro Enterprise
Imports per month 1 Unlimited Unlimited
Formats CSV only MT940 + CAMT.053 + CSV All formats
Saved column templates 0 3 Unlimited
Auto-confirm (≥90%)
Import history retention 3 months 12 months Unlimited

7. Technology Decisions

7.1 Parser Libraries

Format Approach Rationale
MT940 Custom parser (no external lib) Format is simple enough, no mature Java MT940 lib exists for Spring Boot 3.x. Hand-rolled parser gives full control over German bank quirks.
CAMT.053 JAXB with XSD-generated classes ISO 20022 has official XSD schemas. JAXB (Jakarta XML Binding) generates type-safe classes. Already used in PAISY for GKV data exchange.
CSV Apache Commons CSV Already a pattern in the project (semicolon + ISO-8859-1). Configurable delimiter, quote, encoding.

7.2 File Upload

  • Spring Boot multipart upload (MultipartFile)
  • Max file size: 10 MB (sufficient for years of bank statements)
  • Temporary storage: server filesystem during processing, then delete
  • No S3/cloud storage needed — parsed data is persisted in DB

7.3 Fuzzy Matching

  • Levenshtein distance for name matching: Apache Commons Text StringUtils.getLevenshteinDistance()
  • Already available via Spring Boot starter dependencies
  • No external NLP/AI needed — rule-based matching is sufficient for this domain

8. Risk Assessment

Risk Probability Impact Mitigation
MT940 format variations across banks High Medium Extensive test fixtures from multiple banks; graceful degradation on parse errors
False positive matches (wrong member) Medium High Never auto-confirm without admin review; require ≥90% confidence for green badge
Performance with large files (5000+ transactions) Low Medium Stream-based parsing; batch DB inserts (flush/clear every 100); async processing for large files
IBAN storage GDPR complaints Low Medium Explicit opt-in consent; easy deletion; clear privacy notice
Duplicate import (same file uploaded twice) Medium Low Detect by filename + date range + total; warn but allow (idempotent match re-run)
CSV encoding issues (mojibake) Medium Low Default ISO-8859-1; allow encoding override in mapping template

9. Data Flow

graph TD
    A[Admin uploads bank file] --> B[Format Detection]
    B -->|MT940| C[Mt940Parser]
    B -->|CAMT.053| D[Camt053Parser]
    B -->|CSV| E[CsvBankParser]
    C --> F[List of BankTransaction entities]
    D --> F
    E --> F
    F --> G[PaymentMatchingService]
    G --> H{Confidence Score}
    H -->|>= 90%| I[AUTO_MATCHED - green]
    H -->|60-89%| J[SUGGESTED - yellow]
    H -->|< 60%| K[UNMATCHED - red]
    I --> L[Admin reviews match table]
    J --> L
    K --> L
    L --> M[Confirm matches]
    M --> N[Create Payment + LedgerEntry]
    N --> O[Send PAYMENT_RECEIVED notification]
    M --> P[Categorize as expense]
    P --> Q[Create expense LedgerEntry]

10. Open Questions

  • Should we support MT940 multi-statement files (multiple accounts in one file)?
    • Recommendation: Yes — filter by club's own IBAN, ignore other accounts.
  • Should debit transactions (outgoing) be auto-categorized as expenses?
    • Recommendation: Offer optional categorization, but don't auto-create. Admin selects category.
  • Should we generate a PDF reconciliation report after import completion?
    • Recommendation: Nice-to-have for Sprint 11. Focus on core import/match flow first.
  • Should the matching algorithm learn from previous confirmations (ML)?
    • Recommendation: Out of scope. Rule-based matching is sufficient for club-size data (50-500 members).

11. Integration Points

System Direction What
FinanceService Outgoing recordPayment() called for each confirmed match
NotificationDispatchService Outgoing PAYMENT_RECEIVED to member on confirmation
AuditService Outgoing BANK_IMPORT_STARTED, BANK_IMPORT_COMPLETED events
RetentionService Scheduled Auto-delete raw files after 30 days; retain parsed data 10 years
ConsentService Read Check BANK_DATA consent before storing member IBAN
TierLimitService Read Enforce import count + format restrictions per plan tier