Architecture White Paper
Source of truth:
- d3chat/backend/app/main.py
- d3chat/backend/app/routers/*.py
- d3chat/backend/app/models/*.py
- d3chat/backend/app/federation/*.py
- d3chat/backend/app/websocket/*.py
- d3chat/frontend/src/crypto/*.ts
- d3chat/frontend/src/store/*.ts
Abstract
d3chat is a self-hosted, federated messaging system with client-side encryption and server-side operational controls. The architecture is designed around four core goals:
- Preserve message confidentiality from server operators.
- Enable practical real-time UX with horizontal scalability.
- Support federated server-to-server communication with cryptographic integrity.
- Provide strong moderation and configuration controls for operators.
At runtime, the system behaves as a layered design:
- client cryptographic and state layer
- HTTP API control plane
- WebSocket realtime data plane
- PostgreSQL persistence layer
- Redis coordination layer
- federation trust boundary
Design Goals and Constraints
Confidentiality model
The server persists payloads as opaque content from the application’s perspective. For encrypted flows, payload generation and payload interpretation happen on clients.
Availability and responsiveness
The system separates API writes from realtime fanout using Redis pub/sub. This avoids coupling client delivery latency to request-response path completion.
Operational control
Moderation, registration policy, and branding are treated as runtime-controlled settings in database state, not build-time constants.
Federation simplicity
Federation uses signed HTTP with explicit replay protection and dedup semantics rather than a heavyweight consensus protocol.
System Context and Boundaries
Trust boundaries
- Client device boundary
- Local server boundary
- Remote server boundary
- Shared transport boundary (TLS, reverse proxy, tunnel)
Principal identities
- User identity: username@server_domain
- Device identity: UUID plus per-device cryptographic bundle
- Server identity: domain + Ed25519 signing key
Data ownership model
- User, channel, membership, and moderation state are local-server authoritative.
- Message routing may cross servers for federated channels.
- Cryptographic secrecy for content depends on client key custody.
Component Architecture
Frontend runtime
Major responsibilities:
- authentication and token lifecycle
- channel/message/member state management
- realtime event reconciliation
- key material bootstrap and ratcheting
- encryption/decryption dispatch by channel type
Primary implementation modules:
- API client and token refresh logic
- WebSocket client with reconnect backoff
- chat/auth/admin stores
- crypto primitives and protocol orchestration
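The reconnect backoff mentioned above can be illustrated with a short sketch. The frontend itself is TypeScript; the Python below is only a language-neutral illustration, and the function name, base delay, cap, and the use of full jitter are assumptions rather than values taken from the d3chat client.

```python
import random


def reconnect_delays(base=1.0, cap=30.0, attempts=6, rng=None):
    """Yield capped exponential backoff delays (seconds) with full jitter.

    All parameters are hypothetical defaults chosen for illustration.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        # The ceiling doubles per attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick uniformly in [0, ceiling] to spread reconnect storms.
        yield rng.uniform(0, ceiling)
```

A client would sleep for each yielded delay before the next connection attempt and restart the sequence after a successful connect.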
Backend application
The backend is a FastAPI application composed of route groups for:
- auth
- users
- devices
- channels
- messages
- keys
- admin
- federation
- websocket endpoint
Cross-cutting responsibilities:
- JWT validation and role checks
- global request rate limiting
- startup migrations and bootstrap
- audit logging
- cache invalidation for public config
Persistence and coordination layers
PostgreSQL persists durable domain state. Redis handles:
- websocket ticket issuance/consumption
- realtime pub/sub fanout
- low one-time pre-key (OTP) warnings
- federation dedup and rate counters
- public-config cache
- presence state
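As an illustration of the ticket issuance/consumption role listed above, here is a minimal in-memory sketch. The class name, the 30-second lifetime, and the strictly single-use semantics are assumptions; the paper only states that Redis handles ticket issuance and consumption.

```python
import secrets
import time


class TicketStore:
    """In-memory stand-in for a Redis-backed single-use ticket store (hypothetical)."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._tickets = {}  # ticket -> (user_id, expires_at)

    def issue(self, user_id):
        """Create an unguessable ticket bound to a user."""
        ticket = secrets.token_urlsafe(32)
        self._tickets[ticket] = (user_id, self._clock() + self._ttl)
        return ticket

    def consume(self, ticket):
        """Return the user id once; None if unknown, already used, or expired."""
        entry = self._tickets.pop(ticket, None)  # pop makes the ticket single-use
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self._clock() <= expires_at else None
```

With Redis, the same shape would typically map to a key written with an expiry and read with an atomic get-and-delete, so that two instances cannot both accept the same ticket.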
Data Model Architecture
Core social graph
- users: local and remote identities, role/moderation flags, profile data
- channels: group/DM metadata, federated marker, encryption type
- channel_members: membership relation with role and join timestamp
Messaging core
- messages: channel-scoped payloads, sender references, protocol version, origin metadata
- indexed by (channel_id, created_at) for pagination
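To show why a (channel_id, created_at) index suits pagination, the sketch below pages backwards through a channel with a "before" cursor. It uses an in-memory SQLite table purely so the example is runnable; the production store is PostgreSQL, and the column set, index name, and `page_before` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, channel_id TEXT, created_at REAL, payload BLOB)"
)
# Composite index matching the (channel_id, created_at) access pattern.
conn.execute(
    "CREATE INDEX ix_messages_channel_created ON messages (channel_id, created_at)"
)
conn.executemany(
    "INSERT INTO messages (channel_id, created_at, payload) VALUES (?, ?, ?)",
    [("general", float(t), b"opaque") for t in range(1, 8)],
)


def page_before(channel_id, before, limit=3):
    """Return up to `limit` (id, created_at) rows older than `before`, newest first."""
    return conn.execute(
        "SELECT id, created_at FROM messages "
        "WHERE channel_id = ? AND created_at < ? "
        "ORDER BY created_at DESC LIMIT ?",
        (channel_id, before, limit),
    ).fetchall()


first = page_before("general", before=100.0)
# The oldest timestamp on one page becomes the cursor for the next page.
second = page_before("general", before=first[-1][1])
```

Cursor-based paging of this kind keeps each query bounded by the index range rather than by an ever-growing offset.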
Device and session model
- devices: per-user endpoints and optional public device key
- sessions: hashed refresh tokens with expiry and device link
Cryptographic support model
- device_keys: identity key, signed pre-key, OTP set
- sender_keys: channel+device sender ratchet material metadata
Governance model
- server_settings: runtime knobs in categorized key-value form
- audit_logs: immutable moderation/config action trail
Federation peer model
- servers: discovered remote peers, key material, endpoint metadata
Runtime Planes
Control plane (HTTP API)
All durable mutations pass through authenticated route handlers and SQL writes. Control-plane responsibilities include:
- account and device lifecycle
- key bundle management
- channel and membership mutation
- moderation actions
- settings mutation
Data plane (WebSocket + Redis)
Data plane responsibilities include:
- realtime message fanout
- typing and presence indicators
- asynchronous channel notifications
- key-health and sender-key notifications
The data plane is eventually consistent with the control plane. Clients should always reconcile with API reads after reconnect or suspected drift.
End-to-End Message Flows
Local channel message flow
- Client encrypts payload according to channel mode.
- Client POSTs message via HTTP.
- Backend validates membership and persists message row.
- Backend publishes message.new on Redis channel topic.
- WebSocket manager fans out event to subscribed user sockets.
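The local flow above can be sketched with in-memory stand-ins for the database and Redis. Everything named here (`InMemoryBus`, `post_message`, the `channel:{id}` topic format, the row fields) is hypothetical; only the order of operations and the message.new event type come from the description.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for Redis pub/sub: topic -> list of subscriber callbacks."""

    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, event):
        for callback in self._subs[topic]:
            callback(event)


def post_message(db, bus, members, channel_id, sender, ciphertext):
    """Validate membership, persist the row, then publish the realtime event."""
    if sender not in members.get(channel_id, set()):
        raise PermissionError("sender is not a channel member")
    # The payload is stored as-is; the server never interprets it.
    row = {
        "id": len(db) + 1,
        "channel_id": channel_id,
        "sender": sender,
        "payload": ciphertext,
    }
    db.append(row)
    # Publishing happens only after the durable write has succeeded.
    bus.publish(f"channel:{channel_id}", {"type": "message.new", "message": row})
    return row
```

Persisting before publishing means a subscriber never sees an event for a message that an API read could not return.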
Federated channel message flow
The first three steps (client encryption, HTTP POST, and backend validation with persistence) are the same as in the local flow. Additional steps:
- Backend determines remote server set from channel members.
- Backend emits signed message.relay events to remote inboxes.
- Remote server verifies signature/replay/dedup.
- Remote server materializes local message row and publishes local realtime event.
DM first-session flow (X3DH-style)
- Initiator fetches recipient bundles.
- Initiator derives session key and stores setup metadata in /keys/channels/{id}/x3dh-setup.
- Initiator encrypts message and sends with protocol marker.
- Responder fetches setup data, derives mirror session key, and decrypts.
Group sender-key flow
- Sender ensures channel sender key exists (local + server-distributed).
- Per message, sender ratchets chain key and encrypts payload.
- Message includes sender device identity for receiver key selection.
- Receiver decrypts sequentially per sender-device stream.
Cryptographic Architecture
Client primitives
Current implementation uses Web Crypto APIs with:
- ECDH P-256 for key agreement
- ECDSA P-256 for signed pre-key signatures
- AES-GCM for payload encryption
- HKDF/HMAC-derived ratcheting behavior
Server primitives
- JWT signing for access tokens
- hashed refresh-token persistence
- Ed25519 request signatures for federation transport authentication
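Hashed refresh-token persistence can be sketched as follows. The choice of SHA-256, the token length, and both function names are assumptions; the paper states only that refresh tokens are stored hashed.

```python
import hashlib
import hmac
import secrets


def issue_refresh_token():
    """Return (token_for_client, hash_for_database); the raw token is never stored."""
    token = secrets.token_urlsafe(48)
    return token, hashlib.sha256(token.encode()).hexdigest()


def verify_refresh_token(presented, stored_hash):
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Because only the hash is persisted, a leaked sessions table does not directly yield usable refresh tokens.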
Important implementation property
The sender-key ratchet is one-way. If local cache or ratchet state is lost and the chain state has advanced past a given message number, that message may be undecryptable. Mobile implementations must therefore persist both ratchet state and the plaintext cache durably.
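The one-way property can be illustrated with a generic HMAC chain ratchet. This is a sketch of the general technique, not the d3chat construction: the 0x01/0x02 derivation labels and both function names are assumptions, and the real client runs on Web Crypto rather than Python.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes):
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def message_key_at(chain_key: bytes, current_index: int, target_index: int) -> bytes:
    """Advance the chain forward to target_index and return that message key.

    Going backwards is impossible: HMAC cannot be inverted to recover an
    earlier chain key, so earlier message numbers are rejected.
    """
    if target_index < current_index:
        raise ValueError("chain has advanced past this message number")
    for _ in range(target_index - current_index):
        _, chain_key = ratchet_step(chain_key)
    message_key, _ = ratchet_step(chain_key)
    return message_key
```

A receiver holding only the chain key for message 5 can derive keys for messages 5, 6, 7 and later, but never for message 4, which is why the surrounding state and any decrypted plaintext need durable storage.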
Federation Security and Reliability
Inbound federation checks
Each inbound request validates:
- required signature headers
- replay window compliance (5 minutes)
- per-origin rate budget
- origin discovery and key retrieval
- Ed25519 signature validity
- event-id deduplication
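A subset of the inbound checks (replay window, signature result, and event-id dedup) can be sketched in a few lines. The `InboundGate` class, the symmetric treatment of clock skew, and the in-memory set are assumptions; signature verification is represented by a boolean so the example stays standard-library only.

```python
import time

REPLAY_WINDOW_SECONDS = 300  # the 5-minute window stated above


class InboundGate:
    """Hypothetical gate applying replay, signature, and dedup checks in order."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._seen_event_ids = set()  # stand-in for a Redis set with expiry

    def check(self, timestamp, event_id, signature_ok):
        """Return a rejection reason, or None when the event is accepted."""
        # Assumption: timestamps too far in the past or the future are both rejected.
        if abs(self._clock() - timestamp) > REPLAY_WINDOW_SECONDS:
            return "outside replay window"
        if not signature_ok:
            return "bad signature"
        if event_id in self._seen_event_ids:
            return "duplicate event"
        # Record the id only after every other check has passed.
        self._seen_event_ids.add(event_id)
        return None
```

Recording the event id last ensures that a rejected request cannot poison the dedup cache for a later valid delivery of the same event.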
Outbound federation behavior
Outbound events are signed per request payload hash and transmitted to remote inbox endpoints with bounded timeout.
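Signing over a payload hash might look like the sketch below. The signing-string layout (method, path, origin, timestamp, SHA-256 body digest), the canonical JSON encoding, and both function names are assumptions; the paper does not specify the exact format. The `sign` argument is any callable taking bytes and returning a signature, for example the `sign` method of an Ed25519 private key from the `cryptography` package.

```python
import hashlib
import json


def build_signing_input(method, path, origin, timestamp, body: bytes) -> bytes:
    """Bind the request metadata to a hash of the exact bytes being sent."""
    digest = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, origin, str(timestamp), digest]).encode()


def sign_request(sign, method, path, origin, timestamp, event):
    """Serialize the event deterministically and sign the derived input."""
    # Sorted keys and compact separators give a stable byte representation.
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    signature = sign(build_signing_input(method, path, origin, timestamp, body))
    return body, signature
```

Including the timestamp in the signed input is what lets the receiver enforce a replay window without trusting an unsigned header.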
Failure posture
- Temporary remote failures are logged and reported as a failure status.
- Dedup cache reduces duplicate side-effects.
- Unknown federation event types are safely ignored.
Access Control and Governance Model
Role model
- user
- admin
- superadmin
Enforcement model
Authorization is applied via dependency guards before route logic executes. High-impact operations require superadmin role.
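A dependency-style guard can be sketched without the framework. The strict ordering user < admin < superadmin, the `require_role` factory, and the dictionary shape of the user are assumptions; in a FastAPI application a guard like this would usually be attached to a route through `Depends`.

```python
ROLE_RANK = {"user": 0, "admin": 1, "superadmin": 2}  # assumed ordering


def require_role(minimum):
    """Build a guard that rejects callers below the given role."""

    def guard(current_user):
        if ROLE_RANK[current_user["role"]] < ROLE_RANK[minimum]:
            raise PermissionError(f"{minimum} role required")
        return current_user

    return guard


require_admin = require_role("admin")
require_superadmin = require_role("superadmin")
```

Because the guard runs before the handler body, a rejected caller never reaches the route logic.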
Moderation invariants
- self-ban/self-suspend/self-delete prevention
- admin targeting restrictions for non-superadmins
- session invalidation on ban
- full audit-log trail on admin actions
Performance and Scaling Characteristics
Read/write patterns
- message reads rely on channel+time index
- each realtime write costs one O(1) Redis publish plus socket fanout proportional to the subscriber count
Horizontal scaling shape
The pub/sub model supports multiple websocket worker instances, as long as each instance registers its own users' channel subscriptions and listens on the shared Redis.
Known hot paths
- high-volume channel fanout
- federation relay bursts on federated group channels
- per-message decryption sequencing on clients
Consistency Model
Realtime consistency
Realtime events are near-immediate but not authoritative in isolation.
Source of truth
HTTP API + database remain authoritative. Clients should rehydrate from API after disconnections.
Idempotency and dedup
- message dedup should occur client-side by message id
- federation dedup occurs server-side by event id
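Client-side dedup by message id can be sketched as a merge of API results with buffered websocket events. The function name and the `id`/`created_at` field names are assumptions; letting the API copy win on conflict follows from the API being the source of truth.

```python
def merge_messages(from_api, from_socket):
    """Merge two message lists, dropping duplicates by id; API copies win."""
    by_id = {m["id"]: m for m in from_socket}
    # API rows overwrite socket copies with the same id.
    by_id.update({m["id"]: m for m in from_api})
    # Stable display order: creation time, then id as a tie-breaker.
    return sorted(by_id.values(), key=lambda m: (m["created_at"], m["id"]))
```

Running this after every reconnect rehydration keeps the local view consistent even when the same message arrives over both HTTP and the websocket.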
Operational Architecture
Startup and bootstrap
Startup performs:
- migrations
- federation identity setup
- default superadmin bootstrap
- upload directory initialization
Runtime configuration
Server settings are mutable at runtime and surfaced publicly through a cached projection endpoint.
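A cached public projection with explicit invalidation might look like this sketch. The `PublicConfigCache` class, the per-setting `public` flag, and the loader callable are assumptions made for illustration; the real cache lives in Redis.

```python
class PublicConfigCache:
    """Hypothetical cache of the public subset of server settings."""

    def __init__(self, load_settings):
        self._load = load_settings  # callable returning {key: {"value", "public"}}
        self._cached = None

    def get(self):
        """Return the public projection, loading from storage only on a miss."""
        if self._cached is None:
            settings = self._load()
            self._cached = {
                key: entry["value"]
                for key, entry in settings.items()
                if entry["public"]
            }
        return self._cached

    def invalidate(self):
        """Call after any settings mutation so the next read reloads."""
        self._cached = None
```

Invalidating on write rather than relying on expiry alone means an operator's change becomes visible on the next public read.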
Observability surface
- action audit logs
- route-level HTTP status patterns
- websocket logs for connect/subscribe/fanout
- federation logs for discovery/signature/dispatch failures
Threat Model Summary
In scope mitigations
- tampered federation requests
- replay attacks on federation transport
- unauthorized admin actions
- excessive request volume abuse
Residual risks and assumptions
- compromise of keys on a client device compromises the confidentiality of messages within that device's scope
- plaintext cache persistence strategy affects historical decryptability
- transport security quality depends on deployment (TLS termination and reverse proxy posture)
Mobile Client Architecture Guidance
A mobile client can be built from the documentation alone, provided it implements:
- strict endpoint contracts and error handling policy
- robust token and refresh lifecycle
- deterministic websocket reconnection and reconciliation
- durable local storage of cryptographic state
- sequential sender-key decrypt processing
- channel/message dedup rules across HTTP and websocket data planes
See companion docs: