GDPR — TECHNICAL MEASURES

OWNER: julian
UPDATED: 2026-03-24
SCOPE: technical implementation of GDPR requirements — encryption, pseudonymization, data minimization, retention, audit logging, consent, erasure
SERVES: boris (DBA), urszula/maxim (backend devs), all agents handling personal data


ARTICLE_32 — SECURITY_OF_PROCESSING

REQUIRES: implement appropriate technical and organizational measures to ensure security appropriate to the risk
CONSIDERING: state of the art, cost of implementation, nature/scope/context/purposes of processing, risk to individuals

EXPLICITLY_MENTIONED_MEASURES

  • Pseudonymization and encryption of personal data
  • Ability to ensure ongoing confidentiality, integrity, availability, and resilience of processing systems
  • Ability to restore availability and access to personal data in a timely manner in the event of a physical or technical incident
  • Process for regularly testing, assessing, and evaluating effectiveness of measures

ENCRYPTION

AT_REST

DATABASE:
- PostgreSQL: core has no built-in TDE (Transparent Data Encryption) — use filesystem-level encryption or a TDE-capable distribution/extension
- GE_CURRENT: LUKS full disk encryption on Hetzner server
- COLUMNS: sensitive PII fields additionally encrypted at application level (AES-256-GCM)
- ALGORITHM: AES-256 (symmetric), keys managed in Vault
CHECK: verify connection encryption — SELECT ssl, version FROM pg_stat_ssl WHERE pid = pg_backend_pid()

BACKUPS:
- All database backups encrypted before storage
- Backup encryption key stored separately from backup (Vault)
- otto (Backup Guardian) verifies encryption on every backup
CHECK: backup file starts with encryption header, not plaintext SQL

FILE_STORAGE:
- Any file storage containing personal data encrypted at rest
- Container volumes: encrypted underlying storage
- Temporary files: encrypted tmpfs or cleaned on pod termination

IN_TRANSIT

ALL_CONNECTIONS:
- TLS 1.3 minimum for external connections
- TLS 1.2+ for internal k8s service-to-service (service mesh or network policy)
- Redis connections: TLS enabled (requirepass + TLS)
- PostgreSQL connections: SSL required (reject non-SSL)
RULE: never transmit personal data over unencrypted channels
CHECK: test with openssl s_client -connect host:port — verify TLS version and cipher

KEY_MANAGEMENT

OWNER: piotr (Secrets Manager)
IMPLEMENTATION:
- Vault manages all encryption keys
- Key rotation: annually minimum, immediately if compromise suspected
- Key hierarchy: master key → data encryption keys → per-tenant keys (for client isolation)
- Key backup: Vault unseal keys stored securely, separate from Vault backup
RULE: never store encryption keys alongside encrypted data
RULE: never hardcode keys in source code — secrets scanning enforces this


PSEUDONYMIZATION

DEFINITION (Article 4(5))

Processing personal data so it can no longer be attributed to a specific data subject without additional information, provided that additional information is kept separately and subject to technical and organizational measures.

TECHNIQUES

TOKENIZATION:
- Replace PII with random token
- Mapping table stored separately (Vault or separate encrypted database)
- USE_WHEN: need to re-identify (e.g., for data subject rights)
- EXAMPLE: replace email with UUID, mapping stored in lookup table
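A minimal in-memory sketch of this pattern (illustrative only — in production the mapping lives in Vault or a separate encrypted database, per above):

```python
import uuid

class Tokenizer:
    """Tokenization sketch: replace PII with a random token; keep the
    mapping separately so re-identification stays possible for DSARs."""

    def __init__(self):
        self._forward = {}  # pii value -> token (separate encrypted store in production)
        self._reverse = {}  # token -> pii value

    def tokenize(self, value: str) -> str:
        # Stable: the same value always maps to the same token
        if value not in self._forward:
            token = str(uuid.uuid4())
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        # Re-identification path, e.g. for data subject rights requests
        return self._reverse[token]
```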

HASHING:
- One-way hash of PII (SHA-256 with a salt, or an HMAC with a secret key)
- USE_WHEN: need to match/deduplicate but not re-identify
- RULE: always salt or key hashes — unsalted hashes of low-entropy values (emails, phone numbers) are reversible via rainbow tables or dictionary attacks
- EXAMPLE: hash email for analytics linkage
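A sketch of the keyed variant for the analytics-linkage example — deterministic so the same email links across records, but not reversible without the key (assumed to be held in Vault, separate from the data):

```python
import hashlib
import hmac

def pseudonymize_email(email: str, key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA-256): same email -> same token,
    enabling deduplication/linkage without storing the address."""
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode(), hashlib.sha256).hexdigest()
```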

DATA_MASKING:
- Partial redaction of PII
- USE_WHEN: need partial visibility (e.g., last 4 digits of phone)
- EXAMPLE: "***-***-1234", "j***@example.com"
- boris (DBA) implements masking views for non-privileged access
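Application-level masking can follow the same shapes as the examples above — a sketch (function names are illustrative):

```python
def mask_email(email: str) -> str:
    """Keep only the first character of the local part."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "***"
    return f"{local[0]}***@{domain}"

def mask_phone(phone: str) -> str:
    """Keep only the last 4 digits."""
    digits = "".join(c for c in phone if c.isdigit())
    return "***-***-" + digits[-4:]
```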

GE_IMPLEMENTATION_PATTERN

-- Pseudonymized view for analytics
CREATE VIEW analytics_users AS
SELECT
    user_token,              -- tokenized user_id
    age_bracket,             -- generalized from date_of_birth
    region,                  -- generalized from full address
    signup_month             -- generalized from exact date
    -- NO: name, email, phone, full address, date_of_birth
FROM users_pseudonymized;

RULE: pseudonymized data is STILL personal data under GDPR (recital 26) — it CAN be re-identified
RULE: anonymized data (irreversibly) is NOT personal data — but true anonymization is hard to achieve
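The generalized columns in the analytics view above can be derived with small helpers like these (names and bracket widths are illustrative):

```python
from datetime import date

def age_bracket(dob: date, today: date) -> str:
    """Generalize date_of_birth to a 10-year bracket."""
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def signup_month(signup: date) -> str:
    """Generalize an exact signup date to year-month."""
    return f"{signup.year}-{signup.month:02d}"
```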


DATA_MINIMIZATION

PRINCIPLE (Article 5(1)(c))

Personal data shall be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed.

SCHEMA_DESIGN_RULES

RULE: for every personal data field, document WHY it is necessary
IF field is "nice to have" but not necessary THEN do not collect it
IF field collected for one purpose but used for another THEN separate legal basis needed

ANTI-PATTERNS:
- Collecting full date of birth when only age verification needed → collect year only or boolean (over_18)
- Collecting full address when only country/region needed → collect only what is needed
- Storing raw IP addresses when only geolocation needed → resolve to region, discard IP
- Retaining form submissions indefinitely → define retention per purpose

SCHEMA_REVIEW_CHECKLIST (boris reviews)

FOR each table containing personal data:
  FOR each column:
    1. Is this personal data? (directly or indirectly identifying)
    2. What is the purpose? (documented in ROPA)
    3. Is it necessary for that purpose? (if no → remove)
    4. What is the lawful basis? (consent/contract/etc.)
    5. What is the retention period? (define and enforce)
    6. Who can access it? (RBAC, need-to-know)
    7. Is it adequately protected? (encryption, pseudonymization)

RETENTION_POLICIES

PRINCIPLE (Article 5(1)(e))

Personal data kept in identifiable form no longer than necessary for the purposes of processing.

GE_RETENTION_SCHEDULE

Data Category              Retention Period                Basis                           Deletion Method
Active user account data   Duration of account + 30 days   Contract                        Hard delete
Inactive user data         2 years after last activity     Legitimate interest             Hard delete
Transaction records        7 years                         Legal obligation (tax)          Hard delete after period
Application logs with PII  90 days                         Legitimate interest             Automated rotation
Backups containing PII     30 days rolling                 Security                        Overwritten by rotation
Consent records            Duration of processing + 5 yrs  Legal obligation (proof)        Hard delete
DSAR records               3 years after completion        Legal obligation (proof)        Hard delete
Breach records             5 years                         Legal obligation                Hard delete
Employee data              Duration + 5 years              Legal obligation (tax/pension)  Hard delete

AUTOMATED_DELETION

IMPLEMENTATION:

-- Example: automated deletion job (run daily by cron)
-- Delete inactive users after 2 years, respecting legal holds,
-- then log metadata only (never the deleted data itself)
WITH deleted AS (
    DELETE FROM users
    WHERE last_activity_at < NOW() - INTERVAL '2 years'
      AND status = 'inactive'
      AND user_id NOT IN (
          SELECT user_id FROM legal_holds  -- respect legal holds
      )
    RETURNING user_id
)
INSERT INTO deletion_log (table_name, record_count, deletion_date, reason)
SELECT 'users', COUNT(*), NOW(), 'retention_policy_inactive_2yr'
FROM deleted;

RULE: deletion jobs run automatically — not manually triggered
RULE: legal holds override retention policy (litigation, regulatory investigation)
RULE: deletion log preserves METADATA only — not the deleted personal data
CHECK: otto verifies deletion jobs execute on schedule


AUDIT_LOGGING_FOR_DATA_ACCESS

WHAT_TO_LOG

RULE: log every access to personal data — who, what, when, why, from where

LOG_FIELDS:
- Timestamp (UTC, millisecond precision)
- Actor (agent ID or user ID)
- Action (read, create, update, delete, export)
- Resource (table.column or API endpoint)
- Data subject ID (pseudonymized in log)
- Source IP / agent identity
- Justification (task ID, request ID)
- Outcome (success/failure)

WHAT_NOT_TO_LOG

RULE: NEVER log the personal data itself in audit logs
BAD: User john@example.com accessed by agent-123
GOOD: User [user_id:a1b2c3] accessed by agent-123 for task [task_id:xyz]
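A sketch of a log-entry builder following the LOG_FIELDS list and the GOOD example above — the data subject ID is pseudonymized with a keyed hash so no raw identifier reaches the log (field and parameter names are illustrative):

```python
import hashlib
import hmac
from datetime import datetime, timezone

def audit_entry(actor: str, action: str, resource: str,
                subject_id: str, task_id: str, source: str,
                outcome: str, pseudo_key: bytes) -> dict:
    """Build one audit record: who, what, when, why, from where.
    The data subject ID is pseudonymized; no raw PII enters the log."""
    pseudo = hmac.new(pseudo_key, subject_id.encode(), hashlib.sha256).hexdigest()[:12]
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "actor": actor,
        "action": action,
        "resource": resource,
        "data_subject": f"user_id:{pseudo}",
        "source": source,
        "justification": f"task_id:{task_id}",
        "outcome": outcome,
    }
```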

IMPLEMENTATION_PATTERN

-- Audit trigger on personal data tables
CREATE OR REPLACE FUNCTION audit_personal_data()
RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO data_access_log (
        timestamp, actor, action, table_name,
        record_id, source, task_id
    ) VALUES (
        NOW(),
        current_setting('app.current_agent', true),  -- missing_ok: NULL rather than error
        TG_OP, TG_TABLE_NAME,
        CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END,
        current_setting('app.source_ip', true),
        current_setting('app.task_id', true)
    );
    IF TG_OP = 'DELETE' THEN
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach to every table holding personal data, e.g.:
CREATE TRIGGER trg_audit_users
    AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION audit_personal_data();

LOG_RETENTION: 2 years minimum (for demonstrating compliance)
LOG_PROTECTION: append-only, integrity-protected, access restricted to amber (auditor) and julian (compliance)
CHECK: verify audit triggers active on all personal data tables


CONSENT_MANAGEMENT

REQUIREMENTS (Articles 7, 8)

CONSENT_MUST_BE:
- Freely given (no bundling with service, no imbalance of power)
- Specific (per purpose, not blanket)
- Informed (clear language, identified controller, purposes stated)
- Unambiguous (clear affirmative action, NOT pre-ticked boxes)
- Withdrawable (as easy to withdraw as to give)

CONSENT_RECORD_SCHEMA

CREATE TABLE consent_records (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    purpose VARCHAR(100) NOT NULL,        -- e.g., 'marketing_email', 'analytics'
    lawful_basis VARCHAR(20) DEFAULT 'consent',
    granted_at TIMESTAMPTZ NOT NULL,
    withdrawn_at TIMESTAMPTZ,
    consent_version VARCHAR(20) NOT NULL,  -- links to privacy notice version
    collection_method VARCHAR(50) NOT NULL, -- 'web_form', 'api', 'in_app'
    ip_address INET,                       -- for proof of consent
    user_agent TEXT,                        -- for proof of consent
    proof_reference TEXT                    -- screenshot, form submission ID
);

CREATE INDEX idx_consent_user_purpose ON consent_records(user_id, purpose);
CREATE INDEX idx_consent_active ON consent_records(user_id) WHERE withdrawn_at IS NULL;

CONSENT_CHECK (pseudocode):

FUNCTION has_valid_consent(user_id, purpose):
    consent = SELECT * FROM consent_records
              WHERE user_id = $user_id
              AND purpose = $purpose
              AND withdrawn_at IS NULL
              AND consent_version = current_privacy_notice_version
              ORDER BY granted_at DESC LIMIT 1
    IF consent EXISTS THEN return TRUE
    ELSE return FALSE

RULE: check consent BEFORE processing, not after
RULE: if privacy notice changes, re-consent may be required (depends on materiality of change)

WITHDRAWAL_MECHANISM

  • User-facing: settings page with toggle per purpose
  • API: DELETE /api/consent/{purpose} or PATCH /api/consent/{purpose} {withdrawn: true}
  • Effect: processing stops within 24 hours of withdrawal
  • Existing data: may need deletion unless another lawful basis applies
  • Proof: retain consent record (with withdrawal date) for compliance evidence

RIGHT_TO_ERASURE_IN_POSTGRESQL

HARD_DELETE_VS_SOFT_DELETE

HARD_DELETE:
- DELETE FROM table WHERE user_id = $id
- Data physically removed (after VACUUM)
- PREFERRED for GDPR compliance — data truly gone
- RISK: cascade delete must be complete — check ALL related tables

SOFT_DELETE:
- UPDATE table SET deleted_at = NOW() WHERE user_id = $id
- Data still physically present but flagged
- PROBLEM: data subject's data still exists — not truly erased
- ACCEPTABLE_ONLY_IF: processing stops completely AND data deleted within defined period (e.g., 30 days)
- USE_CASE: grace period for accidental deletion, then hard delete by cleanup job

CASCADE_DELETE_IMPLEMENTATION

-- Define foreign keys with CASCADE for personal data relationships
ALTER TABLE user_profiles
    ADD CONSTRAINT fk_user_profiles_user_id
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

ALTER TABLE user_addresses
    ADD CONSTRAINT fk_user_addresses_user_id
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;

-- For tables that should NOT cascade (e.g., financial records with legal retention):
ALTER TABLE transactions
    ADD CONSTRAINT fk_transactions_user_id
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL;
    -- user_id becomes NULL but transaction record preserved for legal obligation

ERASURE_PROCEDURE

1. VERIFY identity of requesting data subject
2. CHECK for legal holds or legal obligation exceptions
   IF legal hold THEN deny erasure, inform data subject of reason
   IF legal obligation (tax records) THEN partially erase (remove PII, keep financial records anonymized)
3. IDENTIFY all tables containing data subject's personal data
   → Run data mapping query against all personal data tables
4. DELETE from all identified tables (cascade handles related records)
5. VERIFY deletion complete — run data subject search, expect zero results
6. PURGE from backups — document that backups will be overwritten within retention cycle
7. NOTIFY sub-processors to delete (if data was shared)
8. LOG erasure (metadata: data subject pseudonymized ID, date, scope, executor)
9. CONFIRM to data subject within 1 month
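Steps 2, 4, and 5 of the procedure above can be sketched against a toy SQLite schema (the tables users, user_profiles, legal_holds are assumptions standing in for the real data map; production would run the full step 3 data-mapping query first):

```python
import sqlite3

def erase_user(conn: sqlite3.Connection, user_id: str) -> bool:
    """Erasure sketch: deny on legal hold (step 2), hard delete with
    FK cascade (step 4), then verify zero results remain (step 5)."""
    hold = conn.execute(
        "SELECT 1 FROM legal_holds WHERE user_id = ?", (user_id,)
    ).fetchone()
    if hold:
        return False  # legal hold: deny erasure, inform data subject of reason
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()
    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]
    return remaining == 0
```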

BACKUP_ERASURE_CHALLENGE

PROBLEM: deleted data persists in backups
SOLUTIONS:
- Short backup retention (30 days rolling) — data naturally aged out
- Document in privacy notice that backup deletion occurs within retention cycle
- If backup restored, re-apply deletion before going live
RULE: do NOT attempt to selectively delete from backup files — technically infeasible and error-prone
GE_APPROACH: 30-day rolling backups + documented policy that backup overwrite = deletion completion


DATA_PORTABILITY_IMPLEMENTATION

EXPORT_FORMAT

STANDARD: JSON (machine-readable, commonly used)
ALTERNATIVE: CSV for tabular data

EXPORT_SCOPE

INCLUDE: data provided by the data subject (input data)
INCLUDE: data generated by the data subject's activity (observed data)
EXCLUDE: data derived or inferred by the controller (e.g., credit scores, AI predictions)
EXCLUDE: data about other data subjects mixed in

IMPLEMENTATION_PATTERN

API: GET /api/users/{id}/export
AUTH: authenticated as the data subject or authorized representative
RESPONSE: application/json

{
  "export_date": "2026-03-24T12:00:00Z",
  "data_controller": "Growing Europe B.V.",
  "data_subject_id": "user-123",
  "data_categories": {
    "account": {
      "name": "...",
      "email": "...",
      "created_at": "..."
    },
    "profile": {
      "preferences": {...},
      "settings": {...}
    },
    "activity": [
      {"date": "...", "action": "...", "details": "..."}
    ]
  }
}

RULE: export within 1 month of request
RULE: provide in commonly used format — JSON preferred
RULE: if technically feasible and requested, transmit directly to another controller
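A sketch of assembling the export in the JSON shape shown above (field selection is illustrative; only input and observed data are included, per the scope rules):

```python
import json
from datetime import datetime, timezone

def build_export(user: dict, activity: list) -> str:
    """Assemble a machine-readable portability export.
    Derived/inferred fields and other subjects' data are excluded."""
    export = {
        "export_date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data_controller": "Growing Europe B.V.",
        "data_subject_id": user["id"],
        "data_categories": {
            "account": {k: user[k] for k in ("name", "email", "created_at")},
            "activity": activity,  # observed data: the subject's own actions
        },
    }
    return json.dumps(export, indent=2)
```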


DATA_MINIMIZATION_IN_LLM_PROMPTS

THE_PROBLEM

AI agents process tasks that may involve personal data in prompts sent to LLM providers (Anthropic, OpenAI, Google).
Personal data in prompts = data transfer to third party = GDPR implications.

MITIGATION_STRATEGIES

STRATEGY_1: AVOID — do not include personal data in prompts
- Use placeholders: "User [USER_123] requested..."
- Reference by ID, resolve locally
- GE_IMPLEMENTATION: agent runner strips PII before LLM call where feasible

STRATEGY_2: MINIMIZE — if PII in prompt is unavoidable, use minimum necessary
- Only include fields directly relevant to the task
- Never include full datasets in prompts

STRATEGY_3: CONTRACTUAL — ensure DPA covers LLM provider processing
- Verify provider's data handling policy (no training on customer data)
- Verify DPF certification or SCCs in place
- Document the processing in ROPA

STRATEGY_4: TECHNICAL — implement PII detection and stripping
- Pre-prompt PII scanner (regex + NER model)
- Log when PII passes through to LLM (for audit trail)
- Post-response PII check (ensure LLM did not expose other users' data)
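A regex-only sketch of the pre-prompt scanner (the NER layer is omitted; patterns are illustrative and deliberately incomplete — real coverage needs more classes and a model-based pass):

```python
import re

# Illustrative patterns only — production needs broader coverage plus NER
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d \-]{7,}\d"),
}

def strip_pii(prompt: str) -> tuple[str, list[str]]:
    """Replace detected PII with placeholders before the LLM call;
    return the categories found so the pass-through can be audited."""
    findings = []
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(prompt):
            findings.append(name)
            prompt = pattern.sub(f"[{name.upper()}_REDACTED]", prompt)
    return prompt, findings
```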

RULE: Anthropic Claude API — customer data not used for training (verify current policy)
RULE: OpenAI API — business tier does not use data for training (verify current policy)
RULE: Google Gemini API — check Vertex AI data processing terms
CHECK: verify provider data handling policies quarterly


SEE_ALSO: gdpr-implementation.md, dpa-landscape.md, iso27001-annex-a.md (A.5.34, A.8.10, A.8.11)
READ_ALSO: domains/privacy/index.md, domains/database/index.md