Little's Law ...

L = λ × W

In a stable queueing system:

  • L = average number of items in the system
  • λ = average arrival rate
  • W = average time each item spends in the system
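
As a rough sanity check of the relationship above, the sketch below measures L, λ, and W independently on a small made-up trace and compares L with λ × W. The function name `little_check`, the `(arrival, departure)` tuples, and the 10-second observation window are all assumptions chosen for illustration. The system is assumed to start and end the window empty.

```python
def little_check(intervals, horizon):
    """Return (L, lam, W) measured from (arrival, departure) pairs.

    Assumes every item arrives and departs inside [0, horizon].
    """
    n = len(intervals)
    lam = n / horizon                              # average arrival rate
    w = sum(d - a for a, d in intervals) / n       # average time in system

    # Time-average number in system: sweep arrival (+1) and departure (-1)
    # events in time order and integrate the item count over time.
    events = sorted(
        [(a, 1) for a, _ in intervals] + [(d, -1) for _, d in intervals]
    )
    area, count, prev = 0.0, 0, 0.0
    for t, delta in events:
        area += count * (t - prev)
        count += delta
        prev = t
    return area / horizon, lam, w


# Hypothetical trace: four items observed over a 10-second window.
trace = [(0, 2), (1, 4), (3, 5), (6, 8)]
L, lam, W = little_check(trace, horizon=10)
print(f"L = {L:.2f}, lam * W = {lam * W:.2f}")
```

The two printed values should agree, because the time-integral of the item count equals the total time all items spent in the system.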

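When planning for scale, we rearrange the law to solve for λ, the request rate the system can sustain: λ = L / W. Here L is fixed by the hardware (the number of CPU cores or hyperthreads, i.e. how many requests can be computed at once) and W is our target or measured compute time per request.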

For example, with a 4-core CPU (L = 4) and a compute task that takes a consistent 50 ms (W = 0.05 seconds), we should expect λ = 4 / 0.05 = 80 requests per second. Treat this as an upper bound: it assumes the work is CPU-bound and every core stays busy.
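
The same arithmetic can be sketched as a small helper for trying other core counts and compute times. The name `max_throughput` and its parameters are hypothetical, and the result is only the ceiling implied by Little's Law under the assumptions above, not a measured capacity.

```python
def max_throughput(concurrency, service_time_s):
    """Sustainable arrival rate (requests/second) from Little's Law.

    concurrency    -- L, items the system can hold in service at once
                      (assumed here to be the number of cores)
    service_time_s -- W, average seconds each request spends in the system
    """
    if service_time_s <= 0:
        raise ValueError("service_time_s must be positive")
    return concurrency / service_time_s   # lambda = L / W


# 4 cores, 50 ms of compute per request.
rate = max_throughput(4, 0.05)
print(f"{rate:.0f} requests per second")
```

Halving the compute time to 25 ms, or doubling the cores to 8, would each double the result.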