Scaling Strategy
The platform automatically scales based on demand. HPA (Horizontal Pod Autoscaler) adjusts the number of pods based on CPU and memory usage.
Scaling Approach
Horizontal scaling (increasing pod count) is the primary method. Vertical scaling (increasing resources) is used for special cases.
HPA Configuration
| Metric | Target | Min | Max |
|---|---|---|---|
| CPU | 70% | 2 | 10 |
| Memory | 80% | 2 | 10 |
| Requests/sec | 1000 | 2 | 20 |
Scaling Scenarios
Normal Load (< 100 users)
| Component | Replica | CPU | Memory |
|---|---|---|---|
| API | 2 | 250m | 512Mi |
| Agent | 2 | 500m | 1Gi |
| DB | 1 | 1000m | 2Gi |
High Load (100-500 users)
| Component | Replica | CPU | Memory |
|---|---|---|---|
| API | 5 | 500m | 1Gi |
| Agent | 4 | 1000m | 2Gi |
| DB | 2 | 2000m | 4Gi |
Peak Load (500+ users)
| Component | Replica | CPU | Memory |
|---|---|---|---|
| API | 10 | 1000m | 2Gi |
| Agent | 8 | 2000m | 4Gi |
| DB | 3 (HA) | 4000m | 8Gi |
Automatic Scaling
HPA automatically scales up when CPU exceeds 70%. Scale-down begins when it drops below 30%.
Database Scaling
| Strategy | Description |
|---|---|
| Connection Pooling | Connection pool with PgBouncer |
| Read Replicas | Read load distribution |
| Partitioning | Table partitioning |
| Archiving | Old data archiving |
Performance Targets
| Metric | Target |
|---|---|
| Response Time | p99 < 200ms |
| Throughput | 1000+ TPS |
| Availability | 99.9% |
| Error Rate | < 0.1% |