GDPR — TECHNICAL MEASURES¶
OWNER: julian
UPDATED: 2026-03-24
SCOPE: technical implementation of GDPR requirements — encryption, pseudonymization, data minimization, retention, audit logging, consent, erasure
SERVES: boris (DBA), urszula/maxim (backend devs), all agents handling personal data
ARTICLE_32 — SECURITY_OF_PROCESSING¶
REQUIRES: implement appropriate technical and organizational measures to ensure security appropriate to the risk
CONSIDERING: state of the art, cost of implementation, nature/scope/context/purposes of processing, risk to individuals
EXPLICITLY_MENTIONED_MEASURES¶
- Pseudonymization and encryption of personal data
- Ability to ensure ongoing confidentiality, integrity, availability, and resilience of processing systems
- Ability to restore availability and access to personal data in a timely manner in the event of a physical or technical incident
- Process for regularly testing, assessing, and evaluating effectiveness of measures
ENCRYPTION¶
AT_REST¶
DATABASE:
- PostgreSQL: enable TDE (Transparent Data Encryption) or filesystem-level encryption
- GE_CURRENT: LUKS full disk encryption on Hetzner server
- COLUMNS: sensitive PII fields additionally encrypted at application level (AES-256-GCM)
- ALGORITHM: AES-256 (symmetric), keys managed in Vault
CHECK: verify encryption active — SHOW ssl; confirms the server setting; SELECT ssl FROM pg_stat_ssl WHERE pid = pg_backend_pid(); confirms the current connection is encrypted
BACKUPS:
- All database backups encrypted before storage
- Backup encryption key stored separately from backup (Vault)
- otto (Backup Guardian) verifies encryption on every backup
CHECK: backup file starts with encryption header, not plaintext SQL
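The backup CHECK above can be automated with a small heuristic — a sketch of what otto's verification might look like (function name and markers are illustrative, not the actual tooling):

```python
# Flag a backup file that begins with readable SQL or an unencrypted
# pg_dump header instead of an encrypted blob. Heuristic only.
from pathlib import Path

# Plaintext pg_dump starts with SQL comments/statements; custom-format
# dumps start with the PGDMP magic — both mean "not encrypted".
UNENCRYPTED_MARKERS = (b"--", b"SET ", b"CREATE ", b"COPY ", b"PGDMP")

def looks_unencrypted(path: str, sniff_bytes: int = 64) -> bool:
    """True if the file's first bytes look like an unencrypted dump."""
    head = Path(path).read_bytes()[:sniff_bytes]
    return any(head.startswith(m) for m in UNENCRYPTED_MARKERS)
```

An encrypted backup should start with the encryption tool's header (effectively random-looking bytes), so this check fails loudly when a job accidentally ships plaintext.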
FILE_STORAGE:
- Any file storage containing personal data encrypted at rest
- Container volumes: encrypted underlying storage
- Temporary files: encrypted tmpfs or cleaned on pod termination
IN_TRANSIT¶
ALL_CONNECTIONS:
- TLS 1.3 minimum for external connections
- TLS 1.2+ for internal k8s service-to-service traffic (service mesh mTLS; network policies restrict reachability but do not themselves encrypt)
- Redis connections: TLS enabled (requirepass + TLS)
- PostgreSQL connections: SSL required (reject non-SSL)
RULE: never transmit personal data over unencrypted channels
CHECK: test with openssl s_client -connect host:port — verify TLS version and cipher
KEY_MANAGEMENT¶
OWNER: piotr (Secrets Manager)
IMPLEMENTATION:
- Vault manages all encryption keys
- Key rotation: annually minimum, immediately if compromise suspected
- Key hierarchy: master key → data encryption keys → per-tenant keys (for client isolation)
- Key backup: Vault unseal keys stored securely, separate from Vault backup
RULE: never store encryption keys alongside encrypted data
RULE: never hardcode keys in source code — secrets scanning enforces this
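The key hierarchy above (master key → data encryption keys → per-tenant keys) can be sketched with HKDF-style HMAC derivation. Illustration only — in production Vault holds the master key and performs derivation; the labels and placeholder key here are hypothetical:

```python
# Shape of the key hierarchy: each child key is derived from its parent
# plus a context label, so per-tenant keys never need separate storage.
import hashlib
import hmac

def derive_key(parent_key: bytes, label: str) -> bytes:
    """Derive a 256-bit child key from a parent key and a context label."""
    return hmac.new(parent_key, label.encode(), hashlib.sha256).digest()

master = b"\x00" * 32                         # placeholder — never hardcode real keys
dek = derive_key(master, "dek/users-db")      # data encryption key for one datastore
tenant_key = derive_key(dek, "tenant/acme")   # per-tenant isolation key
```

Derivation is deterministic, so rotating the master key rotates every key below it; compromising one tenant key reveals nothing about siblings.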
PSEUDONYMIZATION¶
DEFINITION (Article 4(5))¶
Processing personal data so it can no longer be attributed to a specific data subject without additional information, provided that additional information is kept separately and subject to technical and organizational measures.
TECHNIQUES¶
TOKENIZATION:
- Replace PII with random token
- Mapping table stored separately (Vault or separate encrypted database)
- USE_WHEN: need to re-identify (e.g., for data subject rights)
- EXAMPLE: replace email with UUID, mapping stored in lookup table
HASHING:
- One-way hash of PII (SHA-256 with salt)
- USE_WHEN: need to match/deduplicate but not re-identify
- RULE: always salt (or key) hashes — unsalted hashes of low-entropy PII such as emails or phone numbers can be reversed via rainbow tables or brute force
- EXAMPLE: hash email for analytics linkage
DATA_MASKING:
- Partial redaction of PII
- USE_WHEN: need partial visibility (e.g., last 4 digits of phone)
- EXAMPLE: "***-***-1234", "j***@example.com"
- boris (DBA) implements masking views for non-privileged access
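The three techniques above in one stdlib-only Python sketch. Salt handling is deliberately simplified — per the rules above, the salt/key and the token mapping live in Vault or a separate encrypted store, not next to the data:

```python
# Tokenization (reversible), keyed hashing (one-way), masking (partial).
import hashlib
import hmac
import uuid

TOKEN_MAP: dict[str, str] = {}   # stands in for the separate mapping table

def tokenize(email: str) -> str:
    """Tokenization: random UUID, re-identifiable only via the mapping table."""
    token = str(uuid.uuid4())
    TOKEN_MAP[token] = email
    return token

def pseudonymous_hash(email: str, salt: bytes) -> str:
    """Keyed one-way hash for matching/deduplication; normalized first
    so the same address always yields the same digest."""
    return hmac.new(salt, email.lower().encode(), hashlib.sha256).hexdigest()

def mask_email(email: str) -> str:
    """Partial masking: only the first character of the local part survives."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"
```

HMAC is used instead of plain salted SHA-256 so the hash cannot be brute-forced without the key, which matters for low-entropy inputs like emails.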
GE_IMPLEMENTATION_PATTERN¶
-- Pseudonymized view for analytics
CREATE VIEW analytics_users AS
SELECT
user_token, -- tokenized user_id
age_bracket, -- generalized from date_of_birth
region, -- generalized from full address
signup_month -- generalized from exact date
-- NO: name, email, phone, full address, date_of_birth
FROM users_pseudonymized;
RULE: pseudonymized data is STILL personal data under GDPR (recital 26) — it CAN be re-identified
RULE: anonymized data (irreversibly) is NOT personal data — but true anonymization is hard to achieve
DATA_MINIMIZATION¶
PRINCIPLE (Article 5(1)(c))¶
Personal data shall be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed.
SCHEMA_DESIGN_RULES¶
RULE: for every personal data field, document WHY it is necessary
IF field is "nice to have" but not necessary THEN do not collect it
IF field collected for one purpose but used for another THEN separate legal basis needed
ANTI-PATTERNS:
- Collecting full date of birth when only age verification needed → collect year only or boolean (over_18)
- Collecting full address when only country/region needed → collect only what is needed
- Storing raw IP addresses when only geolocation needed → resolve to region, discard IP
- Retaining form submissions indefinitely → define retention per purpose
SCHEMA_REVIEW_CHECKLIST (boris reviews)¶
FOR each table containing personal data:
FOR each column:
1. Is this personal data? (directly or indirectly identifying)
2. What is the purpose? (documented in ROPA)
3. Is it necessary for that purpose? (if no → remove)
4. What is the lawful basis? (consent/contract/etc.)
5. What is the retention period? (define and enforce)
6. Who can access it? (RBAC, need-to-know)
7. Is it adequately protected? (encryption, pseudonymization)
RETENTION_POLICIES¶
PRINCIPLE (Article 5(1)(e))¶
Personal data kept in identifiable form no longer than necessary for the purposes of processing.
GE_RETENTION_SCHEDULE¶
| Data Category | Retention Period | Basis | Deletion Method |
|---|---|---|---|
| Active user account data | Duration of account + 30 days | Contract | Hard delete |
| Inactive user data | 2 years after last activity | Legitimate interest | Hard delete |
| Transaction records | 7 years | Legal obligation (tax) | Hard delete after period |
| Application logs with PII | 90 days | Legitimate interest | Automated rotation |
| Backup containing PII | 30 days rolling | Security | Overwritten by rotation |
| Consent records | Duration of processing + 5 years | Legal obligation (proof) | Hard delete |
| DSAR records | 3 years after completion | Legal obligation (proof) | Hard delete |
| Breach records | 5 years | Legal obligation | Hard delete |
| Employee data | Duration + 5 years | Legal obligation (tax/pension) | Hard delete |
AUTOMATED_DELETION¶
IMPLEMENTATION:
-- Example: automated deletion job (run daily by cron)
-- Delete inactive users after 2 years, respecting legal holds,
-- and log metadata (not the deleted data) in the same statement
WITH deleted AS (
DELETE FROM users
WHERE last_activity_at < NOW() - INTERVAL '2 years'
AND status = 'inactive'
AND user_id NOT IN (
SELECT user_id FROM legal_holds -- respect legal holds
)
RETURNING user_id
)
INSERT INTO deletion_log (table_name, record_count, deletion_date, reason)
SELECT 'users', COUNT(*), NOW(), 'retention_policy_inactive_2yr'
FROM deleted;
RULE: deletion jobs run automatically — not manually triggered
RULE: legal holds override retention policy (litigation, regulatory investigation)
RULE: deletion log preserves METADATA only — not the deleted personal data
CHECK: otto verifies deletion jobs execute on schedule
AUDIT_LOGGING_FOR_DATA_ACCESS¶
WHAT_TO_LOG¶
RULE: log every access to personal data — who, what, when, why, from where
LOG_FIELDS:
- Timestamp (UTC, millisecond precision)
- Actor (agent ID or user ID)
- Action (read, create, update, delete, export)
- Resource (table.column or API endpoint)
- Data subject ID (pseudonymized in log)
- Source IP / agent identity
- Justification (task ID, request ID)
- Outcome (success/failure)
WHAT_NOT_TO_LOG¶
RULE: NEVER log the personal data itself in audit logs
BAD: User john@example.com accessed by agent-123
GOOD: User [user_id:a1b2c3] accessed by agent-123 for task [task_id:xyz]
IMPLEMENTATION_PATTERN¶
-- Audit trigger on personal data tables
CREATE OR REPLACE FUNCTION audit_personal_data()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO data_access_log (
timestamp, actor, action, table_name,
record_id, source, task_id
) VALUES (
NOW(), current_setting('app.current_agent'),
TG_OP, TG_TABLE_NAME,
CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END, -- NEW is not assigned on DELETE
current_setting('app.source_ip'),
current_setting('app.task_id')
);
RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;
-- Attach to each personal data table:
CREATE TRIGGER trg_audit_users
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_personal_data();
LOG_RETENTION: 2 years minimum (for demonstrating compliance)
LOG_PROTECTION: append-only, integrity-protected, access restricted to amber (auditor) and julian (compliance)
CHECK: verify audit triggers active on all personal data tables — query pg_trigger joined to pg_class (e.g., tgname LIKE 'trg_audit_%' AND tgenabled = 'O')
CONSENT_MANAGEMENT¶
REQUIREMENTS (Articles 7, 8)¶
CONSENT_MUST_BE:
- Freely given (no bundling with service, no imbalance of power)
- Specific (per purpose, not blanket)
- Informed (clear language, identified controller, purposes stated)
- Unambiguous (clear affirmative action, NOT pre-ticked boxes)
- Withdrawable (as easy to withdraw as to give)
CONSENT_RECORD_SCHEMA¶
CREATE TABLE consent_records (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id),
purpose VARCHAR(100) NOT NULL, -- e.g., 'marketing_email', 'analytics'
lawful_basis VARCHAR(20) DEFAULT 'consent',
granted_at TIMESTAMPTZ NOT NULL,
withdrawn_at TIMESTAMPTZ,
consent_version VARCHAR(20) NOT NULL, -- links to privacy notice version
collection_method VARCHAR(50) NOT NULL, -- 'web_form', 'api', 'in_app'
ip_address INET, -- for proof of consent
user_agent TEXT, -- for proof of consent
proof_reference TEXT -- screenshot, form submission ID
);
CREATE INDEX idx_consent_user_purpose ON consent_records(user_id, purpose);
CREATE INDEX idx_consent_active ON consent_records(user_id) WHERE withdrawn_at IS NULL;
CONSENT_CHECK_PATTERN¶
FUNCTION has_valid_consent(user_id, purpose):
consent = SELECT * FROM consent_records
WHERE user_id = $user_id
AND purpose = $purpose
AND withdrawn_at IS NULL
AND consent_version = current_privacy_notice_version
ORDER BY granted_at DESC LIMIT 1
IF consent EXISTS THEN return TRUE
ELSE return FALSE
RULE: check consent BEFORE processing, not after
RULE: if privacy notice changes, re-consent may be required (depends on materiality of change)
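The consent-check pseudocode above as a runnable sketch against SQLite (production uses the PostgreSQL schema shown earlier; column names match it, the notice version constant is illustrative):

```python
# Consent gate: valid = granted for this purpose, not withdrawn, and
# tied to the current privacy notice version.
import sqlite3

CURRENT_NOTICE_VERSION = "2026-01"   # illustrative

def has_valid_consent(conn: sqlite3.Connection, user_id: str, purpose: str) -> bool:
    row = conn.execute(
        """SELECT 1 FROM consent_records
           WHERE user_id = ? AND purpose = ?
             AND withdrawn_at IS NULL
             AND consent_version = ?
           ORDER BY granted_at DESC LIMIT 1""",
        (user_id, purpose, CURRENT_NOTICE_VERSION),
    ).fetchone()
    return row is not None
```

Callers gate processing on this check (RULE above: before processing, not after); absence of a record is treated as no consent.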
WITHDRAWAL_MECHANISM¶
- User-facing: settings page with toggle per purpose
- API: DELETE /api/consent/{purpose} or PATCH /api/consent/{purpose} {"withdrawn": true}
- Effect: processing stops within 24 hours of withdrawal
- Existing data: may need deletion unless another lawful basis applies
- Proof: retain consent record (with withdrawal date) for compliance evidence
RIGHT_TO_ERASURE_IN_POSTGRESQL¶
HARD_DELETE_VS_SOFT_DELETE¶
HARD_DELETE:
- DELETE FROM table WHERE user_id = $id
- Data physically removed (after VACUUM)
- PREFERRED for GDPR compliance — data truly gone
- RISK: cascade delete must be complete — check ALL related tables
SOFT_DELETE:
- UPDATE table SET deleted_at = NOW() WHERE user_id = $id
- Data still physically present but flagged
- PROBLEM: data subject's data still exists — not truly erased
- ACCEPTABLE_ONLY_IF: processing stops completely AND data deleted within defined period (e.g., 30 days)
- USE_CASE: grace period for accidental deletion, then hard delete by cleanup job
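The grace-period pattern above, sketched against SQLite (production would be the equivalent PostgreSQL statement in the cron-driven cleanup job; table and column names are illustrative):

```python
# Cleanup job: hard-delete rows whose soft-delete flag is older than
# the grace period, so "soft delete" never becomes "never deleted".
import sqlite3

GRACE_DAYS = 30

def purge_expired_soft_deletes(conn: sqlite3.Connection) -> int:
    """Return the number of rows hard-deleted this run."""
    cur = conn.execute(
        "DELETE FROM users "
        "WHERE deleted_at IS NOT NULL "
        "AND deleted_at < datetime('now', ?)",
        (f"-{GRACE_DAYS} days",),
    )
    conn.commit()
    return cur.rowcount
```

Returning the count lets the job write its metadata-only deletion log entry, matching the retention rules above.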
CASCADE_DELETE_IMPLEMENTATION¶
-- Define foreign keys with CASCADE for personal data relationships
ALTER TABLE user_profiles
ADD CONSTRAINT fk_user_profiles_user_id
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;
ALTER TABLE user_addresses
ADD CONSTRAINT fk_user_addresses_user_id
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;
-- For tables that should NOT cascade (e.g., financial records with legal retention):
ALTER TABLE transactions
ADD CONSTRAINT fk_transactions_user_id
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL;
-- user_id becomes NULL but transaction record preserved for legal obligation
ERASURE_PROCEDURE¶
1. VERIFY identity of requesting data subject
2. CHECK for legal holds or legal obligation exceptions
IF legal hold THEN deny erasure, inform data subject of reason
IF legal obligation (tax records) THEN partially erase (remove PII, keep financial records anonymized)
3. IDENTIFY all tables containing data subject's personal data
→ Run data mapping query against all personal data tables
4. DELETE from all identified tables (cascade handles related records)
5. VERIFY deletion complete — run data subject search, expect zero results
6. PURGE from backups — document that backups will be overwritten within retention cycle
7. NOTIFY sub-processors to delete (if data was shared)
8. LOG erasure (metadata: data subject pseudonymized ID, date, scope, executor)
9. CONFIRM to data subject within 1 month
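Step 5 (verify deletion complete) can be sketched as a scan over every table known to hold personal data, expecting zero rows for the subject. Table names here are illustrative; the real list comes from the data mapping in step 3:

```python
# Post-erasure verification: report any table that still references
# the data subject. Empty result == erasure verified.
import sqlite3

PERSONAL_DATA_TABLES = ["users", "user_profiles", "user_addresses"]

def residual_records(conn: sqlite3.Connection, user_id: str) -> dict[str, int]:
    """Per-table counts of rows still referencing the data subject."""
    counts = {}
    for table in PERSONAL_DATA_TABLES:
        n = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE user_id = ?", (user_id,)
        ).fetchone()[0]
        if n:
            counts[table] = n
    return counts
```

A non-empty result means the cascade in step 4 missed a table — exactly the RISK flagged for hard deletes above.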
BACKUP_ERASURE_CHALLENGE¶
PROBLEM: deleted data persists in backups
SOLUTIONS:
- Short backup retention (30 days rolling) — data naturally aged out
- Document in privacy notice that backup deletion occurs within retention cycle
- If backup restored, re-apply deletion before going live
RULE: do NOT attempt to selectively delete from backup files — technically infeasible and error-prone
GE_APPROACH: 30-day rolling backups + documented policy that backup overwrite = deletion completion
DATA_PORTABILITY_IMPLEMENTATION¶
EXPORT_FORMAT¶
STANDARD: JSON (machine-readable, commonly used)
ALTERNATIVE: CSV for tabular data
EXPORT_SCOPE¶
INCLUDE: data provided by the data subject (input data)
INCLUDE: data generated by the data subject's activity (observed data)
EXCLUDE: data derived or inferred by the controller (e.g., credit scores, AI predictions)
EXCLUDE: data about other data subjects mixed in
IMPLEMENTATION_PATTERN¶
API: GET /api/users/{id}/export
AUTH: authenticated as the data subject or authorized representative
RESPONSE: application/json
{
"export_date": "2026-03-24T12:00:00Z",
"data_controller": "Growing Europe B.V.",
"data_subject_id": "user-123",
"data_categories": {
"account": {
"name": "...",
"email": "...",
"created_at": "..."
},
"profile": {
"preferences": {...},
"settings": {...}
},
"activity": [
{"date": "...", "action": "...", "details": "..."}
]
}
}
RULE: export within 1 month of request
RULE: provide in commonly used format — JSON preferred
RULE: if technically feasible and requested, transmit directly to another controller
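A sketch of the export payload builder. Field names follow the response example above; the input dicts stand in for the database queries that would populate them:

```python
# Build the data-portability export: provided + observed data only,
# derived/inferred data excluded per the scope rules above.
import json
from datetime import datetime, timezone

def build_export(user_id: str, account: dict, profile: dict, activity: list) -> str:
    payload = {
        "export_date": datetime.now(timezone.utc).isoformat(),
        "data_controller": "Growing Europe B.V.",
        "data_subject_id": user_id,
        "data_categories": {
            "account": account,     # provided data
            "profile": profile,     # provided data
            "activity": activity,   # observed data
        },
    }
    return json.dumps(payload, indent=2)
```

Serving this from the authenticated GET endpoint keeps the format machine-readable and stable, which also satisfies the direct controller-to-controller transmission case.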
DATA_MINIMIZATION_IN_LLM_PROMPTS¶
THE_PROBLEM¶
AI agents process tasks that may involve personal data in prompts sent to LLM providers (Anthropic, OpenAI, Google).
Personal data in prompts = data transfer to third party = GDPR implications.
MITIGATION_STRATEGIES¶
STRATEGY_1: AVOID — do not include personal data in prompts
- Use placeholders: "User [USER_123] requested..."
- Reference by ID, resolve locally
- GE_IMPLEMENTATION: agent runner strips PII before LLM call where feasible
STRATEGY_2: MINIMIZE — if PII in prompt is unavoidable, use minimum necessary
- Only include fields directly relevant to the task
- Never include full datasets in prompts
STRATEGY_3: CONTRACTUAL — ensure DPA covers LLM provider processing
- Verify provider's data handling policy (no training on customer data)
- Verify DPF certification or SCCs in place
- Document the processing in ROPA
STRATEGY_4: TECHNICAL — implement PII detection and stripping
- Pre-prompt PII scanner (regex + NER model)
- Log when PII passes through to LLM (for audit trail)
- Post-response PII check (ensure LLM did not expose other users' data)
RULE: Anthropic Claude API — customer data not used for training (verify current policy)
RULE: OpenAI API — business tier does not use data for training (verify current policy)
RULE: Google Gemini API — check Vertex AI data processing terms
CHECK: verify provider data handling policies quarterly
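STRATEGY_1 and STRATEGY_4 above can be combined in a pre-prompt scrubber. The regexes here are deliberately simple illustrations — the production scanner pairs regex with an NER model, per STRATEGY_4:

```python
# Pre-prompt PII scrubber: replace obvious PII patterns with
# placeholders before the LLM call, and report whether anything was
# found (that flag feeds the audit trail).
import re

PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\+?\d[\d \-()]{7,}\d"), "[PHONE]"),
]

def scrub_prompt(prompt: str) -> tuple[str, bool]:
    """Return (scrubbed_prompt, pii_found)."""
    found = False
    for pattern, placeholder in PII_PATTERNS:
        prompt, n = pattern.subn(placeholder, prompt)
        found = found or n > 0
    return prompt, found
```

Agents resolve placeholders back to real values locally after the LLM responds, so the personal data never leaves the controller's infrastructure.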
SEE_ALSO: gdpr-implementation.md, dpa-landscape.md, iso27001-annex-a.md (A.5.34, A.8.10, A.8.11)
READ_ALSO: domains/privacy/index.md, domains/database/index.md