WebSocket Real-Time Collaboration Design Guide
Design real-time collaborative features with WebSockets, presence, conflict handling, offline recovery, scaling, observability, and security safeguards.
Prompt Template
You are a senior backend and real-time systems architect. Design a WebSocket-based real-time collaboration feature for [application/product].
System context:
- Collaboration feature: [comments, cursors, document editing, whiteboard, chat, dashboard updates, multiplayer workflow]
- Users and scale: [concurrent users, rooms/workspaces, peak traffic]
- Stack: [frontend framework, backend language, database, cache/pub-sub, hosting]
- Consistency needs: [eventual consistency, strong ordering, conflict-free editing, audit trail]
- Offline/reconnect needs: [mobile clients, flaky networks, resumable sessions]
- Auth model: [sessions, JWT, SSO, tenant roles]
- Data sensitivity: [PII, enterprise data, healthcare, financial, public chat]
- Existing APIs/events: [REST, GraphQL, queues, webhooks]
- Operational constraints: [cost, latency region, compliance, small team, vendor preference]
Deliver:
1. **Recommended architecture**: WebSocket gateway, app services, pub/sub, persistence, and client state
2. **Message protocol** with event names, payload examples, versioning, and validation rules
3. **Presence model** for online users, cursors, typing, room membership, and idle states
4. **Conflict strategy**: optimistic updates, locks, operational transform, CRDT, or last-write rules with rationale
5. **Reconnect and offline recovery flow** with sequence IDs, replay, and stale-session handling
6. **Authorization and tenant isolation checks** for every connection and room join
7. **Scaling plan** for multiple nodes, sticky sessions, Redis/NATS/Kafka, backpressure, and rate limits
8. **Observability plan**: metrics, logs, traces, synthetic tests, and alert thresholds
9. **Security checklist** for abuse, payload limits, origin checks, token expiry, and data leakage
10. **Implementation roadmap** from MVP to production hardening
Flag risky assumptions and include test cases for race conditions and reconnect bugs.
Example Output
Real-Time Design — Collaborative Roadmap Board
Architecture
Use a WebSocket gateway for room connections, Redis pub/sub for cross-node fanout, PostgreSQL for durable board events, and a client-side optimistic store. Each board is a room scoped by tenant_id and board_id.
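The architecture paragraph leaves room authorization and cross-node fanout implicit. The TypeScript sketch below shows one way they fit together, assuming a hypothetical InMemoryPubSub class that stands in for Redis pub/sub (so it runs without a server), a hypothetical Session shape with a per-user board allow-list, and plain callbacks standing in for sockets. It is an illustration, not a production gateway.

```typescript
// Minimal sketch: tenant-scoped rooms plus cross-node fanout.
// InMemoryPubSub is a hypothetical stand-in for Redis pub/sub so the example runs locally.
type Handler = (message: string) => void;

class InMemoryPubSub {
  private channels = new Map<string, Handler[]>();

  subscribe(channel: string, handler: Handler): void {
    const handlers = this.channels.get(channel) ?? [];
    handlers.push(handler);
    this.channels.set(channel, handlers);
  }

  // Returns the number of subscribers reached, similar in spirit to Redis PUBLISH.
  publish(channel: string, message: string): number {
    const handlers = this.channels.get(channel) ?? [];
    handlers.forEach((h) => h(message));
    return handlers.length;
  }
}

// Hypothetical session shape produced by the auth layer.
interface Session {
  userId: string;
  tenantId: string;
  boardIds: Set<string>; // boards this user may open
}

// The room key always embeds the tenant, so two tenants never share a channel.
function roomKey(tenantId: string, boardId: string): string {
  return `tenant:${tenantId}:board:${boardId}`;
}

function authorizeJoin(session: Session, tenantId: string, boardId: string): boolean {
  return session.tenantId === tenantId && session.boardIds.has(boardId);
}

// One gateway node: keeps its local sockets per room and relays bus messages to them.
class GatewayNode {
  private bus: InMemoryPubSub;
  private localRooms = new Map<string, Handler[]>();

  constructor(bus: InMemoryPubSub) {
    this.bus = bus;
  }

  join(session: Session, tenantId: string, boardId: string, send: Handler): boolean {
    if (!authorizeJoin(session, tenantId, boardId)) return false;
    const key = roomKey(tenantId, boardId);
    if (!this.localRooms.has(key)) {
      this.localRooms.set(key, []);
      // Subscribe once per room per node; the bus delivers events published by any node.
      this.bus.subscribe(key, (msg) => this.localRooms.get(key)!.forEach((s) => s(msg)));
    }
    this.localRooms.get(key)!.push(send);
    return true;
  }

  broadcast(tenantId: string, boardId: string, event: object): void {
    this.bus.publish(roomKey(tenantId, boardId), JSON.stringify(event));
  }
}

const bus = new InMemoryPubSub();
const nodeA = new GatewayNode(bus);
const nodeB = new GatewayNode(bus);
const received: string[] = [];

const alice: Session = { userId: "alice", tenantId: "t1", boardIds: new Set(["b1"]) };
const bob: Session = { userId: "bob", tenantId: "t1", boardIds: new Set(["b1"]) };
const mallory: Session = { userId: "mallory", tenantId: "t2", boardIds: new Set(["b1"]) };

nodeA.join(alice, "t1", "b1", (m) => received.push(`alice:${m}`));
nodeB.join(bob, "t1", "b1", (m) => received.push(`bob:${m}`));
console.log(nodeB.join(mallory, "t1", "b1", () => {})); // false: different tenant

nodeA.broadcast("t1", "b1", { type: "card.move.applied", seq: 1 });
console.log(received.length); // 2: delivered to sockets on both nodes
```

Embedding tenant_id in the channel name acts as a second line of defense: even if an authorization check were skipped somewhere, one tenant's room key will not match another tenant's board.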
Event protocol
| Event | Direction | Purpose |
|---|---|---|
| board.join | client → server | Authorize and enter room |
| card.move.requested | client → server | Request optimistic move |
| card.move.applied | server → clients | Broadcast validated move |
| presence.updated | server → clients | Cursor and active user state |
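One way to make these events concrete is a versioned envelope that is validated before any handler runs. This sketch assumes a hypothetical `{ v, type, payload }` envelope, snake_case payload fields, a 16 KiB frame limit, and a 128-character ID cap; none of these values come from the design itself. It rejects oversized frames, unknown protocol versions, malformed fields, and any event type a client should never send.

```typescript
// Hypothetical envelope: { v, type, payload }. The version, frame limit, and field names are assumptions.
const PROTOCOL_VERSION = 1;
const MAX_FRAME_BYTES = 16 * 1024;

type ClientMessage =
  | { v: number; type: "board.join"; payload: { tenant_id: string; board_id: string; last_seen_sequence?: number } }
  | {
      v: number;
      type: "card.move.requested";
      payload: { card_id: string; base_version: number; to_column_id: string; to_index: number; client_mutation_id: string };
    };

type ValidationResult = { ok: true; message: ClientMessage } | { ok: false; error: string };

// Count UTF-8 bytes so the limit applies to what crosses the wire, not to JS string length.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const cp = s.codePointAt(i)!;
    if (cp >= 0x10000) {
      bytes += 4;
      i++; // skip the low half of the surrogate pair
    } else {
      bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;
    }
  }
  return bytes;
}

const isId = (x: unknown): boolean => typeof x === "string" && x.length > 0 && x.length <= 128;
const isNonNegativeInt = (x: unknown): boolean => typeof x === "number" && Number.isInteger(x) && x >= 0;

function validateClientFrame(raw: string): ValidationResult {
  if (utf8ByteLength(raw) > MAX_FRAME_BYTES) return { ok: false, error: "payload_too_large" };
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "invalid_json" };
  }
  if (typeof data !== "object" || data === null) return { ok: false, error: "invalid_envelope" };
  if (data.v !== PROTOCOL_VERSION) return { ok: false, error: "unsupported_version" };
  const p = data.payload;
  if (typeof p !== "object" || p === null) return { ok: false, error: "invalid_payload" };

  switch (data.type) {
    case "board.join":
      if (!isId(p.tenant_id) || !isId(p.board_id)) return { ok: false, error: "invalid_payload" };
      if (p.last_seen_sequence !== undefined && !isNonNegativeInt(p.last_seen_sequence)) {
        return { ok: false, error: "invalid_payload" };
      }
      return { ok: true, message: data as ClientMessage };
    case "card.move.requested":
      if (!isId(p.card_id) || !isId(p.to_column_id) || !isId(p.client_mutation_id)) {
        return { ok: false, error: "invalid_payload" };
      }
      if (!isNonNegativeInt(p.base_version) || !isNonNegativeInt(p.to_index)) {
        return { ok: false, error: "invalid_payload" };
      }
      return { ok: true, message: data as ClientMessage };
    default:
      // Server-only events such as card.move.applied are rejected if a client sends them.
      return { ok: false, error: "unknown_event_type" };
  }
}

const join = validateClientFrame(
  JSON.stringify({ v: 1, type: "board.join", payload: { tenant_id: "t1", board_id: "b1", last_seen_sequence: 41 } }),
);
console.log(join.ok); // true
const spoof = validateClientFrame(JSON.stringify({ v: 1, type: "card.move.applied", payload: {} }));
console.log(spoof.ok ? "accepted" : spoof.error); // unknown_event_type
```

Bumping `v` only when a change is incompatible lets old and new clients coexist during a rollout; the server can keep a validator per supported version.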
Reconnect flow
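Clients include last_seen_sequence when they reconnect. If every missed event is still within the 5-minute replay window, the server replays them in sequence order from the durable event table; otherwise, the client receives board.snapshot.required and reloads the full board state before resuming.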
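The replay decision can be written as a small pure function. This sketch assumes an append-only, per-board event log ordered by a server-assigned sequence (held in memory here rather than in PostgreSQL) and reads the 5-minute rule as: replay only when the oldest missed event is at most 5 minutes old and no sequence numbers are missing; anything else yields board.snapshot.required. The function and type names are hypothetical.

```typescript
// Hypothetical in-memory stand-in for the durable event table, ordered by sequence.
interface BoardEvent {
  sequence: number;
  type: string;
  payload: unknown;
  createdAtMs: number;
}

type ReconnectPlan =
  | { kind: "replay"; events: BoardEvent[] }
  | { kind: "board.snapshot.required"; latestSequence: number };

const REPLAY_WINDOW_MS = 5 * 60 * 1000;

function planReconnect(log: BoardEvent[], lastSeenSequence: number, nowMs: number): ReconnectPlan {
  const latestSequence = log.length > 0 ? log[log.length - 1].sequence : 0;

  // A client that claims to be ahead of the server has stale or corrupt state
  // (for example, after a server-side restore), so it must resync from a snapshot.
  if (lastSeenSequence > latestSequence) return { kind: "board.snapshot.required", latestSequence };

  const missed = log.filter((e) => e.sequence > lastSeenSequence);
  if (missed.length === 0) return { kind: "replay", events: [] };

  // Replay only if the log still starts right after what the client saw (nothing pruned)
  // and the oldest missed event is inside the replay window.
  const oldest = missed[0];
  const contiguous = oldest.sequence === lastSeenSequence + 1;
  const withinWindow = nowMs - oldest.createdAtMs <= REPLAY_WINDOW_MS;
  if (!contiguous || !withinWindow) return { kind: "board.snapshot.required", latestSequence };

  return { kind: "replay", events: missed };
}

const now = 1700000000000;
const log: BoardEvent[] = [
  { sequence: 41, type: "card.move.applied", payload: {}, createdAtMs: now - 120000 },
  { sequence: 42, type: "card.move.applied", payload: {}, createdAtMs: now - 60000 },
  { sequence: 43, type: "card.move.applied", payload: {}, createdAtMs: now - 1000 },
];

const quick = planReconnect(log, 41, now); // replays 42 and 43
const late = planReconnect(log, 41, now + 10 * 60 * 1000); // snapshot: oldest missed event is outside the window
const gap = planReconnect(log, 39, now); // snapshot: sequence 40 is no longer in the log
```

Keeping the decision pure makes the reconnect test cases easy to express as table-driven unit tests, without opening any sockets.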
Test cases
- Two users move the same card at the same time.
- User loses network after optimistic update but before server ack.
- User is removed from tenant while socket remains connected.
- Payload exceeds limit or comes from a disallowed origin.
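The first two test cases become deterministic when the server serializes moves per board. The sketch below uses optimistic concurrency control (a per-card version where the first write wins), one of the conflict strategies the prompt asks about, plus idempotent retries keyed by a client-generated client_mutation_id. The Card and MoveRequest shapes, the field names, and the BoardState class are hypothetical, and the in-memory maps stand in for the durable store.

```typescript
// Hypothetical server-side model: one writer per board, a version on every card,
// and clients that send the version they moved from plus a client_mutation_id.
interface Card {
  id: string;
  columnId: string;
  index: number;
  version: number;
}

interface MoveRequest {
  client_mutation_id: string;
  card_id: string;
  base_version: number;
  to_column_id: string;
  to_index: number;
}

type MoveOutcome =
  | { status: "applied"; sequence: number; card: Card }
  | { status: "rejected"; reason: "version_conflict" | "unknown_card"; current?: Card };

class BoardState {
  private cards = new Map<string, Card>();
  private sequence = 0;
  // Idempotency cache keyed by client_mutation_id; a real system would expire entries.
  private outcomes = new Map<string, MoveOutcome>();

  constructor(cards: Card[]) {
    cards.forEach((c) => this.cards.set(c.id, { ...c }));
  }

  applyMove(req: MoveRequest): MoveOutcome {
    // A retry after a lost ack returns the original outcome instead of moving the card twice.
    const previous = this.outcomes.get(req.client_mutation_id);
    if (previous) return previous;

    const card = this.cards.get(req.card_id);
    let outcome: MoveOutcome;
    if (!card) {
      outcome = { status: "rejected", reason: "unknown_card" };
    } else if (card.version !== req.base_version) {
      // Someone else moved the card first; the client rolls back its optimistic update to `current`.
      outcome = { status: "rejected", reason: "version_conflict", current: { ...card } };
    } else {
      // Re-indexing the other cards in the target column is omitted for brevity.
      const moved: Card = { ...card, columnId: req.to_column_id, index: req.to_index, version: card.version + 1 };
      this.cards.set(moved.id, moved);
      this.sequence += 1;
      outcome = { status: "applied", sequence: this.sequence, card: { ...moved } };
    }
    this.outcomes.set(req.client_mutation_id, outcome);
    return outcome;
  }
}

const board = new BoardState([{ id: "c1", columnId: "todo", index: 0, version: 3 }]);

// Alice and Bob both moved card c1 from version 3; the server processes them one at a time.
const alice = board.applyMove({ client_mutation_id: "alice-1", card_id: "c1", base_version: 3, to_column_id: "doing", to_index: 0 });
const bob = board.applyMove({ client_mutation_id: "bob-1", card_id: "c1", base_version: 3, to_column_id: "done", to_index: 0 });
console.log(alice.status, bob.status); // applied rejected

// Alice's ack was lost, so her client retries with the same client_mutation_id.
const aliceRetry = board.applyMove({ client_mutation_id: "alice-1", card_id: "c1", base_version: 3, to_column_id: "doing", to_index: 0 });
console.log(aliceRetry === alice); // true
```

The same scenarios can later be replayed against the real gateway as integration tests; the in-memory version mainly pins down the expected outcomes.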
Tips for Best Results
- 💡Specify whether you need collaborative editing or simpler real-time status updates; the conflict model changes everything.
- 💡Ask for reconnect flows early — real-time features fail in the messy middle, not the happy path.
- 💡Include tenant isolation and room authorization in every design review.
- 💡Plan observability before launch; debugging ghost sockets (connections that look alive but no longer deliver messages) in production is pure Gremlins-after-midnight energy.