Communication Service Architecture
Overview
The Communication Service is built with Elixir/Phoenix and leverages OTP (Open Telecom Platform) for high concurrency and fault tolerance.
Architecture Diagram
┌─────────────┐
│   Gateway   │
└──────┬──────┘
       │
       ▼
┌─────────────────────┐
│  Communication API  │
│  (Phoenix/GraphQL)  │
└──────┬──────────────┘
       │
       ├──► Oban (Background Jobs)
       │      │
       │      ├──► Email Worker
       │      ├──► SMS Worker
       │      └──► Push Worker
       │
       ├──► PostgreSQL (Templates, Tracking)
       │
       └──► External Providers
              ├──► SMTP Servers
              ├──► Twilio
              ├──► BulkSMS.com
              ├──► AWS SNS
              ├──► FCM
              └──► APNS
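In the diagram, the API hands each message to Oban, and Oban dispatches it to a channel-specific worker. That routing step can be sketched as follows. This is a language-neutral illustration in Python, not the service's Elixir code. The `WORKERS` table, the `Job` type and the `enqueue` function are hypothetical, and a plain list stands in for Oban's persisted queue.

```python
from dataclasses import dataclass

# Hypothetical channel -> worker table, mirroring the three Oban workers in the diagram.
WORKERS = {
    "email": "EmailWorker",
    "sms": "SMSWorker",
    "push": "PushWorker",
}


@dataclass
class Job:
    worker: str  # name of the worker that will process the job
    args: dict   # message payload passed to the worker


def enqueue(channel: str, payload: dict, queue: list) -> Job:
    """Build a job for the channel's worker and append it to the queue.

    The API can return as soon as the job is stored; delivery happens
    later, when a worker picks the job up.
    """
    try:
        worker = WORKERS[channel]
    except KeyError:
        raise ValueError(f"unsupported channel: {channel!r}") from None
    job = Job(worker=worker, args=payload)
    queue.append(job)
    return job
```

Routing through an explicit table means an unsupported channel is rejected when the request arrives, rather than surfacing later as a failed job.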
Components
API Layer (Phoenix)
- GraphQL API - Primary interface using Absinthe
- REST API - Alternative REST endpoints
- WebSocket - Real-time delivery status updates
Background Processing (Oban)
- Email Worker - Handles email sending asynchronously
- SMS Worker - Processes SMS requests with provider failover
- Push Worker - Manages push notification delivery
Storage
- PostgreSQL - Templates, delivery tracking, configuration, and the Oban job queue (Oban persists jobs in the database)
- Redis - Caching
Providers
- Email: SMTP with multiple provider support
- SMS: Twilio, BulkSMS.com, AWS SNS (automatic failover)
- Push: FCM (Android), APNS (iOS)
Design Patterns
Supervisor Tree
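Long-running processes, including Oban's queues, run under supervision trees for fault tolerance. If a supervised process crashes, its supervisor restarts it according to its restart strategy, so a single failure does not take down the service. A failing job is handled differently: Oban records the error and schedules the job for retry (see Retry Logic).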
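As a rough model of the restart behavior, the sketch below reruns a crashing child until it either succeeds or exceeds a restart limit. It is a simplified Python illustration, not OTP. `supervise` and its `max_restarts` parameter are hypothetical. Real OTP supervisors count restarts within a time window (Elixir's `:max_restarts` within `:max_seconds`) and escalate to their own parent once that limit is exceeded.

```python
from typing import Callable


def supervise(child: Callable[[], str], max_restarts: int = 3) -> str:
    """Run `child`, restarting it after each crash.

    Simplification: only the total number of restarts is counted, not
    restarts per time window. Once the limit is exceeded, the last error
    is re-raised, standing in for OTP escalating the failure upward.
    """
    restarts = 0
    while True:
        try:
            return child()
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and escalate
            restarts += 1  # crash observed: start a fresh instance
```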
Retry Logic
Failed deliveries are automatically retried with exponential backoff:
- Initial delay: 1 second
- Max delay: 60 seconds
- Max retries: 5
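The retry schedule follows directly from the three parameters above. The policy does not state a growth factor, so this sketch assumes the delay doubles on each retry. `retry_delay` is a hypothetical helper, written in Python for illustration. In Oban itself, a policy like this would typically be expressed through a worker's `backoff/1` callback. The matching `max_attempts` setting would be 6, because `max_attempts` counts the first attempt as well as the retries.

```python
INITIAL_DELAY_S = 1  # initial delay from the retry policy
MAX_DELAY_S = 60     # max delay from the retry policy
MAX_RETRIES = 5      # max retries from the retry policy
FACTOR = 2           # assumption: the policy does not state the growth factor


def retry_delay(retry: int) -> int:
    """Seconds to wait before the given retry (1 = first retry), capped at MAX_DELAY_S."""
    if retry < 1:
        raise ValueError("retry numbers start at 1")
    return min(MAX_DELAY_S, INITIAL_DELAY_S * FACTOR ** (retry - 1))


# Full schedule for the configured number of retries.
schedule = [retry_delay(n) for n in range(1, MAX_RETRIES + 1)]
print(schedule)  # [1, 2, 4, 8, 16]
```

With a factor of 2, the five retries wait 1, 2, 4, 8 and 16 seconds. The 60-second cap would therefore only take effect from the seventh retry onward. Production backoff also commonly adds random jitter so that many failed jobs do not retry in lockstep.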
Provider Failover
For SMS, if the primary provider fails, the system automatically tries the next provider in the configured order.
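Ordered failover comes down to trying each provider in turn and stopping at the first success. In this Python sketch, the provider list, the `ProviderError` and `AllProvidersFailed` exceptions and `send_with_failover` are all hypothetical. Each provider is modeled as a callable that returns a provider message ID or raises `ProviderError`. The example order (Twilio, BulkSMS.com, AWS SNS) is assumed from the provider list above.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider client when a send attempt fails."""


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""

    def __init__(self, errors):
        super().__init__(f"all {len(errors)} providers failed")
        self.errors = errors  # (provider name, exception) pairs, in the order tried


def send_with_failover(
    providers: Sequence[tuple[str, Callable[[str, str], str]]],
    to: str,
    body: str,
) -> tuple[str, str]:
    """Try providers in configured order; return (provider name, message ID)."""
    errors = []
    for name, send in providers:
        try:
            return name, send(to, body)
        except ProviderError as exc:
            errors.append((name, exc))  # record the failure and fall through to the next provider
    raise AllProvidersFailed(errors)
```

Only `ProviderError` triggers failover here. Any other exception propagates unchanged, as does `AllProvidersFailed`. Under the design described above, either would fail the job and leave the next attempt to the retry policy, although the document does not say exactly how failover and retries are combined.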
Scalability
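- Horizontal Scaling: The stateless design allows multiple instances to run side by side; Oban coordinates job execution across nodes through the shared PostgreSQL database
- Concurrency: Lightweight BEAM processes let each instance handle thousands of concurrent requests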
- Background Jobs: Heavy operations run asynchronously via Oban
Monitoring
- Prometheus Metrics - Request rates, error rates, and delivery success rates
- OpenTelemetry Tracing - Distributed tracing across the service and its outbound provider calls
- Health Checks - Database and Redis connectivity monitoring
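A health endpoint of this kind typically runs each dependency check and reports an overall result. The sketch assumes hypothetical check callables for the database and Redis that return `True` when the dependency is reachable. `health_report` and the `"ok"`/`"degraded"` labels are illustrative, not the service's actual response format.

```python
from typing import Callable, Mapping


def health_report(checks: Mapping[str, Callable[[], bool]]) -> dict:
    """Run every dependency check and summarize the results."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "down"
        except Exception:
            results[name] = "down"  # a check that raises counts as unhealthy
    status = "ok" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": status, "checks": results}
```

Treating an exception as a failed check prevents one broken dependency from turning the health endpoint itself into an error. A load balancer can then map `"degraded"` to an HTTP 503 response.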