Little's Law ...
L = λ × W
In a stable queueing system:
- L = average number of items in the system
- λ = average arrival rate
- W = average time each item spends in the system
Applied to planning for scale, we want to solve for λ given L (a fixed number of CPU cores or hyperthreads) and W (our target or known compute time per request). Rearranging gives λ = L / W.
For example, with a 4-core CPU (L = 4) and a compute task that takes 50 ms (W = 0.05 seconds; for simplicity we'll say it's consistent), we should expect λ = 4 / 0.05 = 80 requests per second at full utilization.
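The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a capacity-planning tool: the function name `max_throughput` and its parameter names are hypothetical, and it assumes every request takes the same compute time and that all cores are kept busy.

```python
def max_throughput(concurrency: float, service_time_s: float) -> float:
    """Little's Law rearranged to solve for the rate: lambda = L / W.

    concurrency    -- L, items in the system at once (here: CPU cores)
    service_time_s -- W, time each item spends in the system, in seconds
    """
    if service_time_s <= 0:
        raise ValueError("service_time_s must be positive")
    return concurrency / service_time_s


# The example from the text: 4 cores, 50 ms of compute per request.
cores = 4
service_time_s = 0.05
rate = max_throughput(cores, service_time_s)
print(f"{rate:.0f} requests per second")
```

Treat the result as an upper bound. If compute time varies between requests, or the cores are shared with other work, the sustainable rate will be lower.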