
DOMAIN:PRIVACY:PRIVACY_BY_DESIGN

OWNER: julian
ALSO_USED_BY: aimee, eric, victoria
UPDATED: 2026-03-26
SCOPE: architectural privacy patterns for all client projects


OVERVIEW

Privacy by Design is a binding legal obligation under GDPR Art. 25, not merely a voluntary best practice.
The Sambla Group was fined EUR 950,000 specifically for violating Art. 25.
Every GE project must embed privacy into architecture from the design phase.

LEGAL_BASIS: GDPR Art. 25 (Data Protection by Design and by Default)
GUIDANCE: EDPB Guidelines 4/2019


SEVEN FOUNDATIONAL PRINCIPLES

Originally formulated by Ann Cavoukian, then Information and Privacy Commissioner of Ontario.
Adopted into the GDPR framework via Art. 25 and EDPB guidelines.
Jaap-Henk Hoepman (Radboud University, NL) translated these into eight engineering strategies (see index.md).

1. PROACTIVE NOT REACTIVE — PREVENTATIVE NOT REMEDIAL

PRINCIPLE: anticipate and prevent privacy-invasive events before they happen
NOT: detect and respond after the fact

IMPLEMENTATION:
- threat modelling for privacy during architecture design
- privacy risk assessment before coding starts
- security review checkpoints in sprint cycles
- default deny for data access (whitelist, not blacklist)
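
A minimal sketch of the default-deny pattern above: access is granted only when a (role, resource) pair is on an explicit allowlist, and everything else is rejected. Role and resource names here are hypothetical, not from any real project.

```python
# Default-deny access check: only explicitly allowlisted (role, resource)
# pairs pass; anything not listed is denied.
ALLOWED: set[tuple[str, str]] = {
    ("support", "user_profile"),
    ("billing", "invoice"),
}

def can_access(role: str, resource: str) -> bool:
    """Deny by default: unlisted combinations are always refused."""
    return (role, resource) in ALLOWED

assert can_access("billing", "invoice")
assert not can_access("billing", "user_profile")  # not listed, so denied
```

The allowlist lives in one place, so a privacy review can audit every permitted access path instead of hunting for missing deny rules.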

2. PRIVACY AS THE DEFAULT SETTING

PRINCIPLE: maximum privacy protection automatically, without user action
NO user configuration should be needed to achieve privacy.

IMPLEMENTATION:
CHECK: data collection disabled by default — user opts IN to sharing
CHECK: profile visibility set to private by default
CHECK: analytics tracking off until consent given
CHECK: notification preferences set to minimum by default
CHECK: account deletion available without barriers
CHECK: data retention set to minimum necessary by default
CHECK: third-party sharing disabled by default

TRAP: "privacy-friendly defaults" that still collect more than strictly necessary.
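
The checklist above can be encoded so that a newly created account gets the most private option for every setting with no user action. A minimal sketch; the field names are illustrative, not a real schema.

```python
from dataclasses import dataclass

# Privacy-protective defaults in one place: every default is the most
# private option, and sharing requires an explicit opt-in by the user.
@dataclass
class PrivacySettings:
    data_sharing_enabled: bool = False    # user must opt IN
    profile_public: bool = False          # private by default
    analytics_tracking: bool = False      # off until consent given
    third_party_sharing: bool = False     # disabled by default
    retention_days: int = 30              # minimum necessary

settings = PrivacySettings()   # a brand-new user gets maximum privacy
assert not settings.analytics_tracking
```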

3. PRIVACY EMBEDDED INTO DESIGN

PRINCIPLE: privacy is integral to system architecture, not a bolt-on
NOT: adding a cookie banner after the product is built.

IMPLEMENTATION:
- privacy requirements in user stories and acceptance criteria
- data flow diagrams include privacy annotations
- architecture decision records document privacy trade-offs
- privacy-focused definition of done (feature handles PII? mini-DPIA done?)

4. FULL FUNCTIONALITY — POSITIVE-SUM NOT ZERO-SUM

PRINCIPLE: accommodate all legitimate interests without unnecessary trade-offs
NOT: "privacy OR functionality" — instead: "privacy AND functionality."

IMPLEMENTATION:
- privacy-enhancing technologies that maintain service quality
- differential privacy for analytics (useful aggregates, no individual exposure)
- federated learning for ML (model improvement without data centralisation)
- synthetic data for testing (realistic datasets without real PII)

5. END-TO-END SECURITY — LIFECYCLE PROTECTION

PRINCIPLE: strong security throughout entire data lifecycle
FROM collection to deletion — no gaps.

IMPLEMENTATION:
- encryption in transit (TLS 1.3 minimum)
- encryption at rest (AES-256)
- secure deletion procedures (cryptographic erasure for backups)
- access logging at every data touchpoint
- incident response plan covering data at all lifecycle stages
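
A sketch of the cryptographic-erasure idea mentioned above, assuming the third-party `cryptography` package: each user's data is encrypted under a per-user key held in a separate key store, and deleting that key renders every copy of the ciphertext, including backups, unrecoverable. The in-memory dict stands in for a real key-management service.

```python
from cryptography.fernet import Fernet

key_store: dict[str, bytes] = {}   # stands in for a real KMS

def store_user_data(user_id: str, plaintext: bytes) -> bytes:
    """Encrypt under a fresh per-user key; return ciphertext for storage."""
    key_store[user_id] = Fernet.generate_key()
    return Fernet(key_store[user_id]).encrypt(plaintext)

def erase_user(user_id: str) -> None:
    """Cryptographic erasure: key gone means all copies are unreadable."""
    key_store.pop(user_id, None)

blob = store_user_data("u1", b"pii")
erase_user("u1")   # ciphertext lingering in backups is now irrecoverable
```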

6. VISIBILITY AND TRANSPARENCY

PRINCIPLE: keep operations visible and open to independent verification
USERS and regulators should be able to verify privacy claims.

IMPLEMENTATION:
- public privacy policy in plain language
- data processing register (RoPA) maintained and audit-ready
- transparency reporting on data access requests
- open-source components where possible for verifiability
- algorithmic transparency for automated decisions

7. RESPECT FOR USER PRIVACY — USER-CENTRIC

PRINCIPLE: keep interests of the individual paramount
DESIGN for the user, not for the organisation's convenience.

IMPLEMENTATION:
- granular consent controls
- easy-to-use data subject rights mechanisms
- meaningful choices (not "accept all or leave")
- privacy notices at point of data collection (not buried in T&C)
- user dashboard for data visibility and control


DATA MINIMISATION PATTERNS

PRINCIPLE: collect and process only what is strictly necessary for the stated purpose.
LEGAL_BASIS: GDPR Art. 5(1)(c)

COLLECTION MINIMISATION

PATTERN: collect less data at the point of entry.

TECHNIQUES:
- optional vs required fields — make most fields optional
- progressive profiling — collect data incrementally as relationship develops
- purpose-bound collection — each field tied to specific documented purpose
- form design — remove fields that exist "just in case"
- API design — request minimum necessary scopes/permissions

CHECK: can we achieve the purpose with less data?
CHECK: do we really need date of birth or is age range sufficient?
CHECK: do we need full address or is country/city sufficient?
CHECK: do we need name or is a pseudonymous identifier sufficient?
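
The checks above can be enforced at the point of entry: every field a form may collect must map to a documented purpose, and submissions containing unmapped fields are flagged. Field names and purposes below are hypothetical examples.

```python
# Purpose-bound collection: each collectable field is tied to a documented
# purpose; anything else is "just in case" data and gets flagged.
FIELD_PURPOSES: dict[str, str] = {
    "email": "account login and transactional mail",
    "age_range": "age-appropriate content (range, not date of birth)",
    "country": "VAT determination (no full address needed)",
}

def undocumented_fields(submitted: dict[str, str]) -> list[str]:
    """Return submitted fields that have no documented purpose."""
    return [f for f in submitted if f not in FIELD_PURPOSES]

assert undocumented_fields({"email": "a@b.example", "phone": "123"}) == ["phone"]
```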

PROCESSING MINIMISATION

PATTERN: limit which systems and people can access personal data.

TECHNIQUES:
- role-based access control (RBAC) with least privilege
- purpose-specific data views (different services see different subsets)
- aggregation before analysis (count users, don't list them)
- query-level restrictions (redact PII from non-essential queries)
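
A minimal sketch of the purpose-specific views technique listed above: each consuming service sees only the subset of a record its purpose requires. The services, fields, and record shape are illustrative.

```python
# Purpose-specific data views: project a full user record down to the
# fields allowed for a given purpose; everything else stays hidden.
USER = {"id": 7, "name": "Ada", "email": "ada@example.org",
        "address": "Keizersgracht 1", "last_login": "2026-03-01"}

VIEWS: dict[str, set[str]] = {
    "newsletter": {"email"},             # mailing service: address only
    "analytics": {"id", "last_login"},   # pseudonymous usage stats
}

def view_for(purpose: str, record: dict) -> dict:
    """Return only the fields this purpose is allowed to see."""
    return {k: v for k, v in record.items() if k in VIEWS[purpose]}

assert view_for("newsletter", USER) == {"email": "ada@example.org"}
```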

RETENTION MINIMISATION

PATTERN: delete data as soon as purpose is fulfilled.

TECHNIQUES:
- per-purpose retention periods (not one blanket period)
- automated deletion workflows triggered by retention expiry
- retention review during data mapping
- archival vs deletion distinction (archive only if legal basis exists)

COMMON_RETENTION_PERIODS:
- active account data: duration of account + 30 days
- billing records: 7 years (Dutch tax law — Art. 52 AWR)
- log files: 90 days (unless security incident under investigation)
- marketing consent records: duration of consent + 1 year
- CCTV footage: max 4 weeks (Dutch DPA guidance)
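
The retention table above can be kept as data rather than hard-coded into individual deletion jobs. A sketch; the category keys are illustrative, the periods mirror the table.

```python
from datetime import timedelta

# Per-category retention periods, mirroring the table above.
RETENTION: dict[str, timedelta] = {
    "active_account": timedelta(days=30),        # after account closure
    "billing_records": timedelta(days=7 * 365),  # Dutch tax law (Art. 52 AWR)
    "log_files": timedelta(days=90),
    "consent_records": timedelta(days=365),      # after consent ends
    "cctv_footage": timedelta(weeks=4),
}

def retention_for(category: str) -> timedelta:
    # A KeyError on unknown categories is deliberate: no data category
    # may be stored without a documented retention period.
    return RETENTION[category]
```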


PSEUDONYMISATION

DEFINITION (Art. 4(5)): processing so data cannot be attributed to specific person
without use of additional information kept separately.

TECHNIQUES

TOKEN_REPLACEMENT: replace identifiers with random tokens, keep mapping in separate secured store
HASH_WITH_SALT: one-way salted hash of identifiers; NOTE: brute-forceable for low-entropy inputs (emails, national IDs), so prefer a keyed hash (HMAC) with the key stored separately
KEY_BASED: encrypt identifiers with key stored separately from data
CONSISTENT_PSEUDONYMISATION: same input always maps to same pseudonym (for linkage across datasets)
RANDOM_PSEUDONYMISATION: different pseudonym each time (prevents linkage)

IMPLEMENTATION RULES

CHECK: mapping table (or key) stored separately from pseudonymised data
CHECK: access to mapping table restricted (separate access controls)
CHECK: mapping table encrypted at rest
CHECK: pseudonymisation applied as early as possible in data flow
CHECK: re-identification only by authorised personnel for authorised purposes
CHECK: deletion of mapping table = effective anonymisation

PSEUDONYMISED_DATA: still personal data under GDPR (can be re-identified)
INCENTIVE: GDPR considers pseudonymisation a security safeguard (Art. 32)
BENEFIT: reduced risk profile in DPIA, may enable legitimate interest where otherwise insufficient
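
A stdlib sketch of consistent, key-based pseudonymisation: HMAC-SHA-256 with a secret key maps the same identifier to the same pseudonym, supporting linkage across datasets, while destroying the key makes the mapping irreversible. In production the key belongs in a separate, access-controlled store, per the implementation rules above.

```python
import hashlib
import hmac
import secrets

# Secret key; in production this lives in a separate key store with its
# own access controls, never alongside the pseudonymised data.
KEY = secrets.token_bytes(32)

def pseudonymise(identifier: str) -> str:
    """Consistent keyed pseudonym: same input, same output."""
    return hmac.new(KEY, identifier.encode(), hashlib.sha256).hexdigest()

p1 = pseudonymise("ada@example.org")
p2 = pseudonymise("ada@example.org")
assert p1 == p2                              # consistent: enables linkage
assert pseudonymise("bob@example.org") != p1
```

A plain salted hash would not suffice here: email addresses are low-entropy, so an attacker holding the salt could brute-force them; the secret key prevents that.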


ANONYMISATION

DEFINITION: irreversible process — data can never be re-identified.
LEGAL_STATUS: anonymised data is NOT personal data — GDPR does not apply.

TECHNIQUES

AGGREGATION: combine data into groups (minimum group size 5-10)
GENERALISATION: reduce precision (exact age → age range, postcode → region)
K-ANONYMITY: ensure every record is indistinguishable from at least k-1 others
L-DIVERSITY: ensure diversity of sensitive attributes within equivalence classes
DIFFERENTIAL_PRIVACY: add calibrated noise for a quantifiable mathematical bound on what any individual's data can reveal
DATA_MASKING: replace real values with realistic but fake values
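
A minimal sketch of checking the k-anonymity property listed above: count how often each combination of quasi-identifiers occurs; the dataset is k-anonymous if every combination appears at least k times. The records and quasi-identifier names are illustrative.

```python
from collections import Counter

def k_anonymity(records: list[dict], quasi_ids: list[str]) -> int:
    """Return the k the dataset actually achieves (smallest group size)."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())

data = [
    {"age_range": "30-39", "region": "NL-N", "diagnosis": "A"},
    {"age_range": "30-39", "region": "NL-N", "diagnosis": "B"},
    {"age_range": "40-49", "region": "NL-Z", "diagnosis": "A"},
]
# The third record is unique on (age_range, region): k = 1, so this
# dataset can single someone out and is NOT anonymised.
assert k_anonymity(data, ["age_range", "region"]) == 1
```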

CRITICAL WARNINGS

WARNING: EDPB (Apr 2025) — LLMs rarely achieve anonymisation standards.
WARNING: re-identification risk increases as auxiliary data grows (Narayanan & Shmatikov).
WARNING: "anonymised" datasets have been re-identified multiple times (Netflix, NYC taxi, AOL).
WARNING: removing names/emails alone is NOT anonymisation — quasi-identifiers remain.

ANONYMISATION VALIDATION

CHECK: can any individual be singled out from the dataset?
CHECK: can records be linked to the same individual?
CHECK: can information be inferred about an individual?
IF_ANY_YES: data is not truly anonymised — treat as pseudonymised (GDPR applies).


PURPOSE LIMITATION (Art. 5(1)(b))

PRINCIPLE: collect data for specified, explicit, and legitimate purposes.
Do not process for purposes incompatible with original purpose.

COMPATIBILITY ASSESSMENT (Art. 6(4))

WHEN considering new purpose for existing data:
1. link between original and new purpose
2. context of collection (relationship, expectations)
3. nature of data (special categories = stricter)
4. consequences for data subject
5. existence of appropriate safeguards

IF_INCOMPATIBLE: need new lawful basis (typically consent) for new purpose.
EXCEPTION: archiving in public interest, scientific/historical research, statistics (Art. 89(1)).

IMPLEMENTATION

CHECK: each data element linked to specific documented purpose
CHECK: purpose recorded in RoPA
CHECK: new feature processing existing data → compatibility assessment documented
CHECK: API endpoints only return data needed for their specific purpose
CHECK: database queries scoped to purpose (no SELECT * on PII tables)
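
The last two checks can be enforced by generating queries from an explicit per-purpose column allowlist, making SELECT * on PII tables structurally impossible. A sketch; purposes, tables, and columns are hypothetical.

```python
# Purpose-scoped query builder: columns come from a documented allowlist
# per purpose, so a query can never pull more than its purpose permits.
PURPOSE_COLUMNS: dict[str, tuple[str, ...]] = {
    "invoice_generation": ("customer_id", "vat_country"),
    "support_lookup": ("customer_id", "email"),
}

def select_for(purpose: str, table: str) -> str:
    cols = PURPOSE_COLUMNS[purpose]   # unknown purpose raises KeyError
    return f"SELECT {', '.join(cols)} FROM {table}"

assert select_for("support_lookup", "customers") == \
    "SELECT customer_id, email FROM customers"
```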


STORAGE LIMITATION (Art. 5(1)(e))

PRINCIPLE: keep data in identifiable form only as long as necessary for purpose.

IMPLEMENTATION

RETENTION_SCHEDULE: document per data category, per purpose
AUTOMATED_DELETION: implement TTL-based deletion (not manual review)
REVIEW_TRIGGERS: account closure, contract end, consent withdrawal, retention expiry
BACKUP_HANDLING: backups must also respect retention (cryptographic erasure or rotation)

TRAP: "we keep everything for compliance" — which compliance? specify the legal basis.
TRAP: indefinite retention "in case we need it" — not a valid purpose.
TRAP: retaining deleted account data in backups without deletion schedule.
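
A minimal sketch of the automated TTL-based deletion called for above: a periodic sweep removes every record whose retention window has expired, instead of relying on manual review. The record shape and the 90-day log retention are illustrative.

```python
from datetime import datetime, timedelta, timezone

LOG_TTL = timedelta(days=90)   # retention period for log files

def sweep(records: list[dict], now: datetime) -> list[dict]:
    """Keep only records still inside their retention window."""
    return [r for r in records if now - r["created"] < LOG_TTL]

now = datetime(2026, 3, 26, tzinfo=timezone.utc)
logs = [
    {"id": 1, "created": now - timedelta(days=10)},
    {"id": 2, "created": now - timedelta(days=120)},   # expired
]
assert [r["id"] for r in sweep(logs, now)] == [1]
```

Scheduling this as a recurring job (cron, task queue) makes retention enforcement automatic; the same sweep must also reach backups, e.g. via rotation or cryptographic erasure.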


PRIVACY-ENHANCING TECHNOLOGIES (PETs)

MATURE / PRODUCTION-READY

TECHNOLOGY: end-to-end encryption (E2EE) — Signal protocol, AES-256-GCM
USE: messaging, file storage, sensitive data fields
MATURITY: production-ready, widely adopted

TECHNOLOGY: homomorphic encryption (HE) — compute on encrypted data
USE: privacy-preserving analytics, secure multi-party computation
MATURITY: partial HE production-ready (Microsoft SEAL), full HE still slow

TECHNOLOGY: differential privacy — mathematical privacy guarantees via noise injection
USE: aggregate analytics, census data, ML training
MATURITY: production-ready (Apple, Google deploy at scale)
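
A stdlib sketch of the Laplace mechanism, the basic building block of differential privacy: add noise drawn from Laplace(0, sensitivity/epsilon) to an aggregate so no individual's contribution can be distinguished. The sensitivity of 1 (a counting query) and the epsilon value are illustrative choices.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via inverse-CDF on u ~ U(-0.5, 0.5)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float = 1.0,
             sensitivity: float = 1.0) -> float:
    """Noisy count: smaller epsilon means more noise, stronger privacy."""
    return true_count + laplace_noise(sensitivity / epsilon)

noisy = dp_count(1000)   # useful aggregate, individual contribution masked
```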

TECHNOLOGY: secure enclaves (TEE) — hardware-isolated processing
USE: sensitive computation, key management
MATURITY: production-ready (Intel SGX, ARM TrustZone, AWS Nitro)

EMERGING

TECHNOLOGY: federated learning — train ML models without centralising data
USE: mobile keyboard prediction, healthcare ML
MATURITY: growing adoption, Google FL framework

TECHNOLOGY: zero-knowledge proofs — prove statement without revealing underlying data
USE: age verification without revealing birth date, credential verification
MATURITY: growing adoption, eIDAS 2.0 wallet uses selective disclosure

TECHNOLOGY: synthetic data generation — realistic fake data for development/testing
USE: testing, training, analytics without real PII
MATURITY: production-ready tools available (Mostly AI, Gretel, SDV)


CHECKLIST FOR GE PROJECTS

ARCHITECTURE_PHASE:
CHECK: data flow diagram with privacy annotations
CHECK: privacy threat model completed
CHECK: data minimisation review — each field justified
CHECK: pseudonymisation strategy for stored PII
CHECK: retention schedule documented
CHECK: purpose limitation — each processing activity has specific purpose
CHECK: privacy-friendly defaults configured

BUILD_PHASE:
CHECK: RBAC with least privilege implemented
CHECK: encryption in transit and at rest
CHECK: data subject rights mechanisms functional
CHECK: consent management integrated (if applicable)
CHECK: audit logging for PII access
CHECK: automated retention enforcement

LAUNCH_PHASE:
CHECK: privacy policy published and accessible
CHECK: cookie consent mechanism (if web-facing)
CHECK: DPIA completed (if required)
CHECK: DPA signed with all processors


READ_ALSO: domains/privacy/index.md, domains/privacy/gdpr-implementation.md, domains/privacy/pitfalls.md